Questions and Answers :
GPU applications :
GTX 970 problem
Author | Message |
---|---|
Dave Lampkins Joined: 19 Jan 00 Posts: 3 Credit: 27,683 RAC: 0 |
I am getting a message that says: "Postponed: Cuda runtime, memory related failure, threadsafe temporary Exit". I have no idea what to do. This is a new card, about 4 weeks old. My driver is version 347.88, release date 3/17/15. Thanks, Dave |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
A couple of questions: Are you overclocking? Are you running more than one work unit on it, and if so, how many? Have you tried exiting BOINC and restarting? Have you tried rebooting your computer? Brent ran into the same issue with his 750Ti. Here is Jason's response to that question: http://setiathome.berkeley.edu/forum_thread.php?id=77226&postid=1671799 |
Dave Lampkins Joined: 19 Jan 00 Posts: 3 Credit: 27,683 RAC: 0 |
I have 8 running at a time, not sure why. My card says it's superclocked, but I haven't overclocked anything. I did try to exit the program and restart it, no luck. I haven't tried to reboot yet. |
rob smith Joined: 7 Mar 03 Posts: 22220 Credit: 416,307,556 RAC: 380 |
"Running 8 at a time" - I assume that is 8 CPU tasks, as you would certainly know how you got to 8 GPU tasks at a time (it requires setting up configuration files). The card is overclocked to 1366 MHz (default is 1050/1178 boost). There has been a large number of errors from the GTX 970 in the last few hours, so it's probably running too hot and protesting like crazy. Drop the clocks to the Nvidia defaults and see if the situation improves. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
I'd also drop the number of instances (work units per card) down to 3. The 970s don't have a full 4 GB of RAM, only 3.5 GB, and that causes instabilities when crunching on them. The best and safest number we found is 3. 4 can be done, but occasionally it will crash again. No overclocking of the 970 for SETI. It's fine for gaming but not for crunching. Zalster |
Dave Lampkins Joined: 19 Jan 00 Posts: 3 Credit: 27,683 RAC: 0 |
How would I drop the instances to 3? I have never really set up anything for SETI in the past; all my other cards have always worked well with no worries. As for the overclocking, I think I can change that, unless going to 3 instances will take care of it with no errors. Thanks for the help, Dave |
rob smith Joined: 7 Mar 03 Posts: 22220 Credit: 416,307,556 RAC: 380 |
Change the instances of 0.125 to 0.33 in your app_config.xml file. Before doing that, however, make sure you are running eight tasks on your GPU, and not 8 CPU tasks: check BOINC Manager, Tasks tab, sorted by progress, and count the running tasks by status. Those shown as "running" are running on the CPU; those shown as "Running (x%CPU + y%NVIDIA)" are running on your GPU. By default I would expect to see 8 on your CPU and one on your GPU. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
"How would i drop the instances to 3?" Then don't worry about that: you are very probably running only one instance on the GPU, as is the default. Check (with Windows Task Manager or Process Explorer) how many SETI@home CUDA processes are running. They have names like: setiathome_7.00_windows_intelx86__cuda50.exe setiathome_7.00_windows_intelx86__cuda42.exe The error is: http://setiathome.berkeley.edu/results.php?hostid=7379623&offset=0&show_names=0&state=6&appid= http://setiathome.berkeley.edu/result.php?resultid=4131749131 Thread call stack limit is: 1k uncaptured error before launch (find_pulse_kernel2<fft_n, numthreads/fft_n, 5, true><<<grid, block>>>(best_pulse_score, PulsePoTLen, AdvanceBy, y_offset, numdivs, firstP, lastP)), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_pulsefind.cu, line 1505: unknown error Exiting ... which causes "too many boinc_temporary_exit()s". Post in Number Crunching with a link to this thread for more advanced answers. - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
When he said 8 work units, I thought he meant he was trying to run 8 at a time on the GPU. But if no modifications have been made, then he is probably running 7-8 on the CPU and 1 on the card. "Too many exits" means that the work unit is not progressing and BOINC is terminating the work unit after the 99th attempt to progress. 1) Is there any other activity going on with the computer at the same time, e.g. streaming, working on it, antivirus (make sure your antivirus doesn't scan the BOINC folder)? 2) Did you do a clean install of the driver for the GPU when you put it in? 3) Have you tried turning off the computer and turning it back on? 4) Try removing the GPU and reseating it and see if that helps with the problem. Just a couple of things to try first. Let's see if some of the others have some other ideas. Sorry for the delay. Had a few problems of my own I was working on, lol Zalster |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
I would guess that you are overclocking the card; I think that is why I got the same error. I also removed my command line for more aggressive processing at the same time. And you shouldn't be running 8 GPU tasks at a time; I don't think even a GTX 980 would be happy with that. Go back to 3 tasks. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
The consistency of the issue here seems suspicious, in that the usual first suspects, power and temperature, seem unlikely. After a clean driver install and reboot, I would check the temperatures, PSU and clocks again anyway, and run an artefact scanner. The factory clocks/boost on the GPU core or VRAM (the SC edition was mentioned) may be just a little high, and require a small voltage bump for reliability. The failing portion of code is indeed VRAM access intensive, so consistent failure there could indicate one or more memory chips running on the hairy edge, which a small clock backoff or voltage increase should address easily. These things are sold for gaming, and competition in the mid-high range is fierce, so the manufacturers sometimes err on the side of performance over reliability when setting default clocks and voltages. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
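
Two of the checks suggested in this thread can be sketched in Python: counting the SETI@home CUDA processes (BilBg's suggestion) and changing the 0.125 entries in app_config.xml to 0.33 (rob smith's suggestion). This is only a rough sketch under stated assumptions. The helper names `count_cuda_tasks` and `set_gpu_usage` are made up for illustration, the sample file contents and the app name `setiathome_v7` are assumptions and not taken from Dave's machine, and the idea that a share of 1/N lets roughly N tasks run per GPU is inferred from the 0.125 and 0.33 values mentioned above.

```python
import re
import xml.etree.ElementTree as ET

# Matches the CUDA app names quoted in the thread, e.g.
# setiathome_7.00_windows_intelx86__cuda50.exe
CUDA_APP = re.compile(r"^setiathome_.*cuda\d+\.exe$", re.IGNORECASE)

# Assumed example of an app_config.xml; the real file may differ.
SAMPLE_APP_CONFIG = """<app_config>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <gpu_usage>0.125</gpu_usage>
      <cpu_usage>0.04</cpu_usage>
    </gpu_versions>
  </app>
</app_config>"""


def count_cuda_tasks(process_names):
    """Count SETI@home CUDA processes in a list of process names."""
    return sum(1 for name in process_names if CUDA_APP.match(name))


def set_gpu_usage(app_config_xml, tasks_per_gpu):
    """Return the XML text with every <gpu_usage> set to about 1/tasks_per_gpu."""
    root = ET.fromstring(app_config_xml)
    # Round down to two decimals so the shares of tasks_per_gpu tasks
    # add up to at most one GPU (3 tasks gives "0.33").
    hundredths = 100 // tasks_per_gpu
    share = f"{hundredths / 100:.2f}"
    for node in root.iter("gpu_usage"):
        node.text = share
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    names = [
        "setiathome_7.00_windows_intelx86__cuda50.exe",
        "setiathome_7.00_windows_intelx86.exe",  # assumed CPU app name
        "explorer.exe",
    ]
    print("CUDA tasks running:", count_cuda_tasks(names))
    print(set_gpu_usage(SAMPLE_APP_CONFIG, 3))
```

The list of process names would have to come from something like Task Manager or Process Explorer, as suggested above. After app_config.xml is edited, BOINC has to re-read its config files (or be restarted) before the new value takes effect.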