CUDA task runs 8h at 0.00%, seems to be stuck
Author | Message |
---|---|
Maik Joined: 15 May 99 Posts: 163 Credit: 9,208,555 RAC: 0 |
After closing BOINC and restarting it, I got a compute error as the result in the BOINC Manager overview. After the result was uploaded I found this: task ID: 1092856149, work unit ID: 381499165. Copy of the task details: Work Unit Info: ............... WU true angle range is : 0.010346 Optimal function choices: ----------------------------------------------------- name ----------------------------------------------------- v_BaseLineSmooth (no other) v_GetPowerSpectrum 0.00020 0.00000 v_ChirpData 0.01657 0.00000 v_Transpose4 0.01054 0.00000 FPU opt folding 0.00695 0.00000 SETI@home error -12 Unknown error cudaAcc_find_triplets doesn't support more than MAX_TRIPLETS_ABOVE_THRESHOLD numBinsAboveThreshold in find_triplets_kernel File: c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_pulsefind.cu Line: 232 I searched but did not find any information. I'm also wondering about the "File: c:/sw/....": I can't find this directory or the file on my PC. If you speak German, a copy of your answer in German would be nice ;) Some more info about my PC: 19.12.2008 15:17:31||Starting BOINC client version 6.4.5 for windows_intelx86 19.12.2008 15:17:31||log flags: task, file_xfer, sched_ops 19.12.2008 15:17:31||Libraries: libcurl/7.19.0 OpenSSL/0.9.8i zlib/1.2.3 19.12.2008 15:17:31||Data directory: D:\boinc_data 19.12.2008 15:17:31||Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz [x86 Family 6 Model 23 Stepping 7] 19.12.2008 15:17:31||Processor features: fpu tsc pae nx sse sse2 mmx 19.12.2008 15:17:31||OS: Microsoft Windows XP: Professional x86 Editon, Service Pack 3, (05.01.2600.00) 19.12.2008 15:17:31||Memory: 2.00 GB physical, 4.85 GB virtual 19.12.2008 15:17:31||Disk: 455.76 GB total, 155.01 GB free 19.12.2008 15:17:31||Local time is UTC +1 hours 19.12.2008 15:17:31||Not using a proxy 19.12.2008 15:17:31||CUDA devices found 19.12.2008 15:17:31||Coprocessor: GeForce 9600 GT (1) The GPU drivers are up to date; I downloaded and installed them yesterday ... |
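For anyone comparing setups, the startup log quoted above follows a regular `date time||message` layout. The sketch below is only an illustration of splitting such lines into a timestamp and a message; `parse_log_line` is a hypothetical helper, not something BOINC provides, and the day.month.year format is inferred from the log shown.

```python
from datetime import datetime


def parse_log_line(line):
    """Split 'DD.MM.YYYY HH:MM:SS||message' into (datetime, message).

    Hypothetical helper; the date format is assumed from the quoted log.
    """
    stamp, separator, message = line.partition("||")
    if not separator:
        raise ValueError("no '||' separator in line")
    when = datetime.strptime(stamp.strip(), "%d.%m.%Y %H:%M:%S")
    return when, message.strip()


# Two lines copied from the log in the post above.
lines = [
    "19.12.2008 15:17:31||Starting BOINC client version 6.4.5 for windows_intelx86",
    "19.12.2008 15:17:31||Coprocessor: GeForce 9600 GT (1)",
]

for when, message in map(parse_log_line, lines):
    print(when.isoformat(), message)
```

Having the timestamp as a real `datetime` makes it easy to sort or filter messages when logs from several restarts are pasted together.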
Maik Joined: 15 May 99 Posts: 163 Credit: 9,208,555 RAC: 0 |
I've updated the graphics driver again and now it seems to work. |
BMaytum Joined: 3 Apr 99 Posts: 104 Credit: 4,382,041 RAC: 2 |
Yesterday I had my first (and so far my ONLY) SETI@home Beta Test WU terminate with a Compute Error: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=5020623 The stderr output reported: Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error. This WU was crunched using the v6.06-cuda application (BOINC v6.4.5), with the nVidia 32-bit v180.60 driver package (includes CUDA v2.1 Beta) on WinXP32 SP3. This WU likewise showed 0.000% progress and an increasing (not decreasing) Time To Complete whilst it ran for ~5 minutes of wall-clock time. Sabertooth Z77, i7-3770K@4.2GHz, GTX680, W8.1Pro x64 P5N32-E SLI, C2D E8400@3Ghz, GTX580, Win7SP1Pro x64 & PCLinuxOS2015 x64 |
Maik Joined: 15 May 99 Posts: 163 Credit: 9,208,555 RAC: 0 |
Your BOINC is using the v6.06-cuda application?! Hmm... mine is 6.05... I'll check that. I'm using a GeForce 9600GT with the nVidia 32-bit v180.48 driver package / CUDA v2.0 on WinXP SP3. WUs are still getting stuck (around 5% of all CUDA WUs), some at 5% progress, some at 58%... Next thing: some WUs are reported with: "SETI@Home Informational message -9 result_overflow NOTE: The number of results detected exceeds the storage space allocated." Take a look at this: http://setiathome.berkeley.edu/workunit.php?wuid=382225403 The non-CUDA applications report "normal" results for the same WU. How can that be? |
Eric Korpela Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
Sounds like general numerical problems. Have you checked your GPU temperature? @SETIEric@qoto.org (Mastodon) |
Maik Joined: 15 May 99 Posts: 163 Credit: 9,208,555 RAC: 0 |
The GPU temperature looks fine; Everest currently reports around 35° (while CUDA is running). BOINC and CUDA ran fine for about 24h, then I tried the beta drivers. WUs got stuck while loading (0.000% for minutes or hours) again... so I'm back at CUDA v2.0 and display drivers v180.48 with the MB_6.04_Winx86_CUDA application. That was running fine before I installed the beta drivers. Hmm... I was typing that CUDA is running at the moment... I'm wrong. A WU is stuck again, at 0.000% for 20 minutes. Windows Task Manager reports 00 CPU time and an unchanging amount of memory usage (72,964K). I aborted the last WU after it got stuck 4 times. There is a debugging log. Could you take a look at it to find out what the problem is? I don't understand any of it ;) http://setiathome.berkeley.edu/result.php?resultid=1097953246 Next problem: I get a lot of lag when CUDA is running, no matter which drivers or application version I use. The system freezes for around 5 seconds and then carries on normally. |
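The symptom described here (progress frozen at 0.000% while CPU time does not move) can be checked mechanically. The sketch below is a minimal illustration, assuming the observations were collected by hand or by some script; the `Sample` type, its field names, and the 20-minute window are hypothetical and not part of BOINC.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One observation of a running task (hypothetical structure)."""
    wall_seconds: float   # wall-clock time of the observation
    fraction_done: float  # progress shown by the manager, 0.0 to 1.0
    cpu_seconds: float    # CPU time shown by the task manager


def looks_stuck(samples: List[Sample], window_seconds: float = 20 * 60) -> bool:
    """Return True if neither progress nor CPU time changed over the window.

    Samples must be ordered by wall_seconds. Returns False when the
    history is shorter than the window.
    """
    if len(samples) < 2:
        return False
    latest = samples[-1]
    # Pick the most recent sample that is at least one window old.
    baseline_index = None
    for index, sample in enumerate(samples):
        if latest.wall_seconds - sample.wall_seconds >= window_seconds:
            baseline_index = index
    if baseline_index is None:
        return False
    # Stuck means every sample since the baseline shows identical values.
    return all(
        sample.fraction_done == latest.fraction_done
        and sample.cpu_seconds == latest.cpu_seconds
        for sample in samples[baseline_index:]
    )


stuck = [Sample(0, 0.0, 0.0), Sample(600, 0.0, 0.0), Sample(1200, 0.0, 0.0)]
healthy = [Sample(0, 0.0, 0.0), Sample(600, 0.02, 300.0), Sample(1200, 0.05, 600.0)]
print(looks_stuck(stuck), looks_stuck(healthy))
```

Requiring both values to stay flat avoids flagging a task that is merely slow to update its progress figure while still consuming CPU time.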
Spectrum Joined: 14 Jun 99 Posts: 468 Credit: 53,129,336 RAC: 0 |
I have abandoned CUDA as a bad joke. Until it's proven 100% stable I will let both my CPUs do the work; I keep getting units stuck at 0 progress after 8+ hours. I am not saying CUDA is a bad idea, just that it should have stayed in beta a while longer. |
Byron S Goodgame Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
http://setiathome.berkeley.edu/result.php?resultid=1097953246 The example you've given is a VLAR (very low angle range) task, which is a known bug in CUDA for everyone. If this is the AR that you're getting on the errors you're having, it's probably the reason for the problems. There's also this message that can show you how to spot VLAR in the client_state.xml file. |
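As a rough illustration of verifying the angle range from a task's stderr, the sketch below looks for the `WU true angle range is :` line quoted in the first post of this thread. The 0.05 cutoff for "very low" and the function names are assumptions made for the example, not values taken from the SETI@home application.

```python
import re

# Matches the line format quoted in the first post:
#   "WU true angle range is : 0.010346"
AR_PATTERN = re.compile(r"WU true angle range is\s*:\s*([0-9]*\.?[0-9]+)")

# Hypothetical cutoff for "very low angle range" (VLAR); an assumption.
VLAR_CUTOFF = 0.05


def extract_angle_range(stderr_text):
    """Return the reported angle range as a float, or None if absent."""
    match = AR_PATTERN.search(stderr_text)
    return float(match.group(1)) if match else None


def is_vlar(stderr_text, cutoff=VLAR_CUTOFF):
    """True if the stderr text reports an angle range below the cutoff."""
    angle_range = extract_angle_range(stderr_text)
    return angle_range is not None and angle_range < cutoff


sample = "Work Unit Info: ............... WU true angle range is : 0.010346"
print(extract_angle_range(sample), is_vlar(sample))
```

Returning `None` when the line is missing keeps "no information" separate from "not VLAR", which matters if a task crashed before it printed its work unit info.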
Maik Joined: 15 May 99 Posts: 163 Credit: 9,208,555 RAC: 0 |
Thanks, that must be the answer to my problem. I've checked that. Most results with a compute error have an AR lower than 0.00. And the only solution to this problem is to abort the WUs / delete the files manually o_O? |
Byron S Goodgame Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
"Thanks, that must be the answer to my problem." I don't get that many, and most of the time they just get stuck, so for me they're easy to see. When I see one I usually go into the stderr and verify. Once I've done that I just abort them and then report them. You could go through the client state file and see whether it's going to get out of that AR or has considerably more to do. At least you'll know what to expect, and if it's that bad you could match them up in BM and abort them before they start and freeze up your system. If you do, just make sure to post here about the ones you found. |
Maik Joined: 15 May 99 Posts: 163 Credit: 9,208,555 RAC: 0 |
I checked client_state.xml as described in that post and stopped all the files I found. CUDA is now running very well without errors. Thanks for the hint ;) |
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.