Nvidia Driver kernel something stopped responding something recovered
Author | Message |
---|---|
JakeTheDog Joined: 3 Nov 13 Posts: 153 Credit: 2,585,912 RAC: 0 |
I think I downloaded 2 GPU SoG tasks several hours before the servers went down early on the morning of May 31, during that extended downtime. https://setiathome.berkeley.edu/result.php?resultid=4960012801 https://setiathome.berkeley.edu/result.php?resultid=4960012803 I was about 50% through one of them when the screen started going black for a few seconds, followed by a Windows taskbar message along the lines of "Nvidia driver kernel something stopped responding and recovered." I noticed the task list kept saying "postponed 30 seconds", and the event log sometimes said the GPU was not detected. I suspended that task, and the same thing then happened to the other GPU task, which had not started yet. Sorry, but I didn't copy down the exact messages in the logs. I was worried that my GPU or PSU was broken. I reinstalled the drivers and also rolled them back, using DDU to clean up in between. I tried some benchmark tests. The tasks still behaved the same. In CPUID HWMonitor, temperatures were OK; I'm not familiar with the voltages, but they didn't go very high compared to heavy gaming. Heavy gaming, which uses more power, did not have the same issue. The servers came back up. I decided to reset the project and download new GPU tasks. The new GPU tasks are running OK. The 2 I reset are still in my account's task list. Do they stay there until the deadline? Were those 2 tasks somehow corrupted? Possibly related to the server crash? Or random corruption? SoG related (I did many of those successfully, though)? Hopefully nothing is wrong with my hardware. System: Windows 7 64-bit, i5-3570K (not overclocked right now), MSI GTX 650 Ti Boost, 16 GB RAM |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
You suffered a TDR fault in Windows, and unfortunately the tasks that were running when it happened are toast. Any time the Nvidia driver disappears while crunching, the result is a task error. The tasks will just time out after the deadline, but until then they will be "ghosts" in your In Progress tasks. If you go over to the Nvidia graphics card forums and search for TDR faults, you will find a registry hack that extends the TDR timeout, which reduces the driver re-initialization problem. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
|
JakeTheDog Joined: 3 Nov 13 Posts: 153 Credit: 2,585,912 RAC: 0 |
The TDR registry value already seems to be set to 8 by default. I'm not familiar with the settings files in BOINC; I'll take another look at that OpenCL thread if the driver crash happens again. |
The_Matrix Joined: 17 Nov 03 Posts: 414 Credit: 5,827,850 RAC: 0 |
Strange thing: I changed the drivers and set the PCIe bus frequency lower, but with the SoG workunits, since the newest Lunatics 0.45 beta release, crunching stops and the display driver keeps recovering constantly. Known issue? |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"strange thing, i changed the drivers, set pcie bus frequency lower, but ..." http://setiathome.berkeley.edu/forum_thread.php?id=79760 SETI apps news We're not gonna fight them. We're gonna transcend them. |
JakeTheDog Joined: 3 Nov 13 Posts: 153 Credit: 2,585,912 RAC: 0 |
If anyone's interested, this happened to me again. I tried 2 things at the same time, so I don't know which one fixed it. My TDR delay was already 8; I decided to increase it to 10. I restarted BOINC but did not reboot the computer, and it kept happening. https://support.microsoft.com/en-gb/kb/2665946 From Raistmer's thread, I added -sbs 256 -period_iterations_num 100 to some txt files, though I'm not sure they were the right ones. They were mb_cmdline-8.12_windows_intel__opencl_nvidia_sah.txt and mb_cmdline-8.12_windows_intel__opencl_nvidia_SoG.txt. Again, I restarted BOINC but did not reboot the computer, and it still happened. This time my computer crashed and rebooted on its own. After the reboot, the GPU task is running OK so far. So I'm not sure which change fixed it, but possibly raising TDR to 10 followed by a reboot (needed because the registry changed) did it for me. |
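
For anyone who wants to repeat the two changes described in the last post, here is a rough sketch of what they might look like written out. It only builds text and changes nothing on the machine. The function names `tdr_reg_file` and `mb_cmdline` are made up for this example, and the 1 to 60 second range check is an arbitrary assumption. The registry key and the `TdrDelay` value name are taken from Microsoft's TDR documentation, not from the thread. The `-sbs` and `-period_iterations_num` options are the ones quoted in the post.

```python
# Registry key and value name as given in Microsoft's TDR documentation
# (they are not quoted anywhere in the thread itself).
TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
TDR_VALUE = "TdrDelay"


def tdr_reg_file(delay_seconds: int) -> str:
    """Return .reg file text that sets TdrDelay (a DWORD, in seconds)."""
    # Hypothetical sanity range, chosen for this sketch only.
    if not 1 <= delay_seconds <= 60:
        raise ValueError("delay_seconds should be between 1 and 60")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{TDR_KEY}]\n"
        f'"{TDR_VALUE}"=dword:{delay_seconds:08x}\n'
    )


def mb_cmdline(sbs: int = 256, period_iterations_num: int = 100) -> str:
    """Return the one-line option string quoted in the thread."""
    return f"-sbs {sbs} -period_iterations_num {period_iterations_num}"


if __name__ == "__main__":
    # Text for a .reg file that raises the delay to 10 seconds.
    print(tdr_reg_file(10))
    # Text for the mb_cmdline-*.txt files named in the post.
    print(mb_cmdline())
```

The first output could be saved as a `.reg` file and imported by hand, and the second pasted into the `mb_cmdline-*.txt` files named above. Going by the poster's experience, the registry change may only take effect after a reboot.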
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.