Some GPU workunits cause driver reset
Author | Message |
---|---|
DayneC Send message Joined: 12 Dec 14 Posts: 8 Credit: 75,443 RAC: 0 |
It seems like some, but not all, tasks cause my AMD graphics driver to reset. Most recently these two are giving me the issue: http://setiathome.berkeley.edu/result.php?resultid=4014474912 http://setiathome.berkeley.edu/result.php?resultid=4014474916 I aborted the first one but have just suspended GPU processing for now. I have done 3 passes with Memtest on both my RAM sticks, and tested both slots of my motherboard with 3 passes in Memtest. Furmark ran for 30 mins with no issue; a few times I ran Furmark it actually crashed, but it seemed to run fine with all other software closed down. I have run Prime95 for a few hours without error. I have tried AMD driver versions 14.12, 14.4, and pretty sure also 14.9 and 13.12. Sometimes I will come back to find that the GPU no longer has any load, though the task says it is running in BOINC; usually this is overnight. Other times it will reset while I am sitting in front of the computer. Several times when I have tried to restart a task that has stopped in this way, the display has gone black for a moment and not recovered correctly, and the machine has remained unresponsive until a hard reset, although music or video can still be heard playing. I have read another thread on this forum about disabling a feature called TDR delay iirc, and while this seems to stop it from resetting the display driver, the system becomes unresponsive for a few moments every now and then. Worse, it seemingly causes some tasks (Folding@home) to BSOD the computer. Should I just try more driver versions till I find one that works, or is that unlikely to be the issue? What about the optimised applications (Lunatics iirc)? Will they possibly be less likely to crash? If I use those, do you still get credits in the normal way? Unrelated to my main issue: if you disable the option "Should SETI@home show your computers on its web site?", will that prevent stats websites, e.g. BOINCStats, from keeping track of your info?
My Specs (nothing overclocked): Intel DH67GD, i5 2500, 8GB RAM (2x 4GB 1333 iirc), HD 6870, Corsair 650TX, 2x WD Black HDD (1TB and 500GB) |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
@Raistmer: on the visible one, is that stderr truncated or normal ? If truncated I spent a good chunk of last week attempting to raise the core issues with Boinc devs... maybe you'll have better luck there. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Try deleting the compilations the r1831 app made. That revision only built them once; later revisions redo them every time the driver/APP runtime changes. Suspend GPU usage, navigate to the setiathome project directory (should be C:\ProgramData\BOINC\projects\setiathome.berkeley.edu), then delete the compilations. They follow this format: MB_clFFTplan_Capeverde_8_r1831.bin where Capeverde is replaced by your GPU type, and IntelRCoreTMi72600KCPU340GHz is replaced by your CPU type. Once you've deleted those files, unsuspend GPU usage and the app will regenerate them with the current APP runtime. Claggy |
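For anyone who would rather script Claggy's clean-up step than delete the files by hand, here is a minimal Python sketch. It assumes every cached compilation for this app revision ends in `_r1831.bin`, which is only inferred from the single example filename above. The function names and the `dry_run` flag are made up for illustration and are not part of BOINC or the SETI@home apps. Suspend GPU usage (or exit BOINC) before running it.

```python
from pathlib import Path


def find_cached_compilations(project_dir, revision="r1831"):
    """List cached compilation files for one app revision.

    Assumption: the files all end in "_<revision>.bin", as in
    MB_clFFTplan_Capeverde_8_r1831.bin.
    """
    return sorted(Path(project_dir).glob(f"*_{revision}.bin"))


def delete_cached_compilations(project_dir, revision="r1831", dry_run=True):
    """Delete the cached compilations and return the affected file names.

    With dry_run=True (the default) nothing is deleted; the names are
    only reported so they can be checked first.
    """
    names = []
    for path in find_cached_compilations(project_dir, revision):
        if not dry_run:
            path.unlink()
        names.append(path.name)
    return names


# Example (hypothetical call, using the directory quoted in the post):
# delete_cached_compilations(
#     r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu", dry_run=False)
```

Running it once with the default `dry_run=True` shows which files would go, so nothing other than the cached `.bin` files is removed by accident.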
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
@Raistmer: on the visible one, is that stderr truncated or normal ? If truncated I spent a good chunk of last week attempting to raise the core issues with Boinc devs... maybe you'll have better luck there. That's truncated. Claggy |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
@Raistmer: on the visible one, is that stderr truncated or normal ? If truncated I spent a good chunk of last week attempting to raise the core issues with Boinc devs... maybe you'll have better luck there. hmmmm, thought so, *grumble* *grumble* *grumble* LOL (failing the other possible causes in this case, causes of that symptom can crash GPU drivers too...) |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
@Raistmer: on the visible one, is that stderr truncated or normal ? If truncated I spent a good chunk of last week attempting to raise the core issues with Boinc devs... maybe you'll have better luck there. Have a look in the 'No output again (for just one WU)?' thread too!! Claggy |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
@Raistmer: on the visible one, is that stderr truncated or normal ? If truncated I spent a good chunk of last week attempting to raise the core issues with Boinc devs... maybe you'll have better luck there. *gently bangs head on table* yeah, it's been a long week. I'll probably just keep using modified boincapi, and try to figure out how to offer a solid variant that's generally usable. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
If truncated I spent a good chunk of last week attempting to raise the core issues with Boinc devs... maybe you'll have better luck there. I find I have most luck with the BOINC devs if I watch what they're up to, and send off bug reports when I see that their head is in the right bit of code. At the moment Rom seems to be working all hours getting all the international language translations ported to a new CMS: I can sympathise with him not wanting to break away from that for an extended conversation about arcane (as they probably seem to him) details of the M$ C runtime threading model. |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
If truncated I spent a good chunk of last week attempting to raise the core issues with Boinc devs... maybe you'll have better luck there. Part of the problem isn't the Boinc devs but the project itself and its devs: it tends to release apps, then sit on them for their whole life without producing refreshes with fixed libs/APIs and fixes like the blocksum precision maintenance fixes that made it into the source after the 7.01 release. There is an outstanding suspending bug in all the Windows ATI/AMD MBv7 apps, where the app doesn't suspend during CPU benchmarks (but stays in memory). I reported this problem something like 18 months ago and it was fixed shortly afterwards, but new apps never made it to the project(s). Recently I've found out that 'Suspend when non-BOINC CPU usage is above' x should also suspend GPU usage, but it doesn't. While that doesn't really matter to users with mid or higher range ATI/AMD GPUs, it does to users with low end GPUs, where usage of the GPU causes lag and therefore the GPU must suspend correctly. Claggy |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
If truncated I spent a good chunk of last week attempting to raise the core issues with Boinc devs... maybe you'll have better luck there. I can live with that. I wish it were solely an MS C runtime issue, and they had the time/inclination to harden the core functionality. Until that time I just hope they don't lose customers over it. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
If truncated I spent a good chunk of last week attempting to raise the core issues with Boinc devs... maybe you'll have better luck there. Yeah it's tough project-wise I suppose mostly because they've moved onto other work (like possibly drawing up MB8 and GBT stuff, who knows). Well, will just keep toodling away... |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
I have two machines with HD6870s. I'm using 14.4 on one and 14.9 on the other. Other than Cat Control Center crashing sometimes when opting with 14.9, I'm not having issues with either. With 14.12 I did see longer than normal CPU times for tasks when I did some testing with it. It looks like you didn't mention your GPU temps. With the two different cases I have, one of my HD6870s runs around 65-70°C and the other around 68-74°C. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu |
DayneC Send message Joined: 12 Dec 14 Posts: 8 Credit: 75,443 RAC: 0 |
Try deleting the compilations the r1831 app made, that revision only did that once, later revisions redo them every time the driver/APP runtime changes. Once I re-enabled GPU it immediately froze up again and I had to reset. I then disabled GPU, closed BOINC completely, and deleted the files again. Since then it has been running OK, but I will have to see over a longer period. My CPU temp peaks at about 75 usually and GPU maybe a bit higher than 70 when it's hot during the day. I'm experimenting with TThrottle as well to adjust temps and CPU/GPU load. |
DayneC Send message Joined: 12 Dec 14 Posts: 8 Credit: 75,443 RAC: 0 |
Since my last post this (came back to find the WU stalled with no GPU activity) has happened again twice. The first time it didn't cause the machine to freeze up when I started it again; the second time I had to restart my machine for other reasons. Do I just have to delete those files in the project folder whenever this happens and live with it, or is there something else I can try? |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Delete all binaries first. I would also suggest trying a more recent app. The easiest way is to run the Lunatics installer. http://setiathome.berkeley.edu/forum_thread.php?id=71867 With each crime and every kindness we birth our future. |
DayneC Send message Joined: 12 Dec 14 Posts: 8 Credit: 75,443 RAC: 0 |
You mean I should delete the EXEs from the project directory? |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
You mean I should delete the EXEs from the project directory? Mike is referring to these binaries mentioned earlier: Try deleting the compilations the r1831 app made, that revision only did that once, later revisions redo them every time the driver/APP runtime changes. After deleting those files, run the first installer from here: http://mikesworldnet.de/download.html lunatics_win64_v0.43a_setup.exe Make sure to choose the app MB7_win_x86_SSE_OpenCL_ATi_HD5_r2489.exe After installing those files, find the file named mb_cmdline_win_x86_SSE_OpenCL_ATi_HD5.txt and open it in WordPad. Add the line -sbs 256 to the empty file and save it. See how that works. |
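The last step TBar describes (putting `-sbs 256` into the empty command-line file) can also be done with a few lines of Python instead of WordPad. This is only a sketch: the helper name is invented for illustration, and the directory is passed in as a parameter because the post only says to "find" the file, so its location is an assumption you need to check on your own machine.

```python
from pathlib import Path

# File name taken from the post above.
CMDLINE_FILE = "mb_cmdline_win_x86_SSE_OpenCL_ATi_HD5.txt"


def add_cmdline_option(app_dir, option="-sbs 256"):
    """Write `option` into the command-line file, but only if it is empty.

    Returns the option text the file holds after the call. Raises
    FileNotFoundError if the file is not in `app_dir` (for example if
    the installer has not been run yet).
    """
    path = Path(app_dir) / CMDLINE_FILE
    current = path.read_text().strip()
    if current:
        # Something is already configured; leave it untouched.
        return current
    path.write_text(option + "\n")
    return option
```

Leaving a non-empty file alone is a deliberate choice here, so any options already set by hand are not overwritten.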
DayneC Send message Joined: 12 Dec 14 Posts: 8 Credit: 75,443 RAC: 0 |
After installing Lunatics according to your instructions, when restarting BOINC and re-enabling GPU it crashed almost instantly. After I restarted the machine I started BOINC again and it ran fine until later, when it crashed again sometime during the night. I did select MB7_win_x86_SSE_OpenCL_ATi_HD5_r2489.exe. I also selected something else though; I think it was for Astropulse to run on GPU but I'm not 100% sure. Should I uninstall and reinstall Lunatics with only the option you said? |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
... when restarting BOINC and re-enabling GPU it crashed almost instantly. What crashed? The app, driver, BOINC, Windows ...? Try the stability of the computer/GPU/driver/PSU with some test programs: http://setiathome.berkeley.edu/forum_thread.php?id=76878&postid=1650200#1650200 http://setiathome.berkeley.edu/forum_thread.php?id=76878&postid=1651402#1651402 - ALF - "Find out what you don't do well ..... then don't do it!" :) |
DayneC Send message Joined: 12 Dec 14 Posts: 8 Credit: 75,443 RAC: 0 |
Seti@home WU crashed as per my thread title. As per my OP I have done Memtest, Prime95 and Furmark already. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.