Tasks hanging on CUDA 6.08

Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 856282 - Posted: 22 Jan 2009, 4:33:46 UTC

Overall, CUDA 6.08 works fine with the exception of occasional hanging workunits. This one is an example:

http://setiathome.berkeley.edu/workunit.php?wuid=398292134

It doesn't crash my system or anything, it simply doesn't progress. Unlike most other stalled workunits, even pausing / resuming the task doesn't have any effect. I had to abort it. Once I did that, BOINC started working on AP instead of going back to CUDA.

Tried pausing the AP tasks and BOINC went nuts downloading tons of work for other projects. :-/

After restarting BOINC, it appears to be chugging away with the 6.08 CPU client instead of CUDA.

So it seems there is one small bug left in the CUDA app, and I'm still waiting anxiously for a version of BOINC that properly schedules CUDA work. But since I now have tons of work downloaded for other projects, it seems I have plenty of time. :-P
You will be assimilated...bunghole!

ID: 856282 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 856324 - Posted: 22 Jan 2009, 7:02:39 UTC - in response to Message 856282.  

Update - Restarting BOINC client seems to get stuck WUs running again.
You will be assimilated...bunghole!

ID: 856324 · Report as offensive
Profile RandyC
Joined: 20 Oct 99
Posts: 714
Credit: 1,704,345
RAC: 0
United States
Message 856543 - Posted: 22 Jan 2009, 21:30:19 UTC

I aborted this WU after it ran for 9 hrs and was less than 60% complete. Started running 6.08 last night; I'd been running Raistmer's app up until then.
ID: 856543 · Report as offensive
Profile MeglaW
Volunteer tester

Joined: 21 Jun 00
Posts: 36
Credit: 479,460
RAC: 0
Sweden
Message 856761 - Posted: 23 Jan 2009, 11:02:20 UTC

New version? My BOINC is not updating!
ID: 856761 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 856791 - Posted: 23 Jan 2009, 13:41:08 UTC

Please people, when you report hanging tasks do tell the basics about your system, OS, driver version, CUDA card used, application version, BOINC client version used and that link to your hanging task.
ID: 856791 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 856853 - Posted: 23 Jan 2009, 17:36:29 UTC - in response to Message 856791.  

Please people, when you report hanging tasks do tell the basics about your system, OS, driver version, CUDA card used, application version, BOINC client version used and that link to your hanging task.


OS - XP 32 Bit
Drivers - 181.20 w/ GeForce 9600 GT
CUDA App 6.08
BOINC 6.4.5

Failed Task:

http://setiathome.berkeley.edu/workunit.php?wuid=398292134
You will be assimilated...bunghole!

ID: 856853 · Report as offensive
Maik

Joined: 15 May 99
Posts: 163
Credit: 9,208,555
RAC: 0
Germany
Message 857076 - Posted: 24 Jan 2009, 5:48:07 UTC - in response to Message 856853.  
Last modified: 24 Jan 2009, 5:48:39 UTC

What does it look like when a task is stuck?
- Is the CPU time of the CUDA process (in Windows Task Manager) 0?
- What is the temperature of your GPU when a task is stuck versus running?
ID: 857076 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 857087 - Posted: 24 Jan 2009, 6:15:15 UTC - in response to Message 857076.  

What does it look like when a task is stuck?
- Is the CPU time of the CUDA process (in Windows Task Manager) 0?
- What is the temperature of your GPU when a task is stuck versus running?



The CPU time of the hanging tasks is often at zero, but not always. The temperature of the GPU drops to idle as soon as the task hangs (50-51 °C at full load, ~40 °C idle).
You will be assimilated...bunghole!

ID: 857087 · Report as offensive
javamann

Joined: 14 Oct 02
Posts: 14
Credit: 2,877,279
RAC: 0
United States
Message 857372 - Posted: 24 Jan 2009, 22:52:26 UTC - in response to Message 857087.  

Same here. I just upgraded BOINC from version 6.2.19 to 6.6.2 and the CUDA job is not progressing. It shows "Running (0.18 CPUs, 1 CUDA)". I am also running four Enhanced 6.03 tasks on the CPUs.

Video Card: BFG GTX280 1G Ram / Water Cooled
Video Driver: 181.22 (Just installed)
OS: Windows XP SP3
BOINC: 6.6.2
CUDA: 6.08
CPU: AMD Phenom II X4 (3.62GHz)

NOTE: When I added this cc_config (program and data directories) my GPU started processing units. I also did another BOINC restart, which might have fixed it too.

cc_config:
<cc_config>
  <options>
    <ncpus>5</ncpus>
  </options>
</cc_config>


ID: 857372 · Report as offensive
Profile startrekforever

Joined: 5 Dec 99
Posts: 36
Credit: 17,772,420
RAC: 0
United States
Message 857706 - Posted: 25 Jan 2009, 17:49:38 UTC

My 6.08 CUDA task also hung:

OS - XP 32-bit
Drivers - 181.20 w/ 9600 GT
CUDA App 6.08
BOINC 6.4.5
(Seems to be same setup as Borgholio, hmm.)

http://setiathome.berkeley.edu/workunit.php?wuid=398428488

CPU is a Q6600 overclocked to 3.2 GHz. It should not be a factor, but I thought I'd mention it. This system is totally stable.

Non-overclocked video card. Main monitor on non-CUDA capable 9300GT. This CUDA task was chugging along for a while and showed 7 hours to go before I noticed it was stuck at about 53% complete.

6.08 is much more stable than 6.06 was (no compute errors so far, just this one hang). I have another system (Q6600 at 3.5 GHz) running CUDA on a 260 card under XP 64-bit (everything else the same, including the main monitor on a 7300GT) with no issues yet.

ID: 857706 · Report as offensive
Maik

Joined: 15 May 99
Posts: 163
Credit: 9,208,555
RAC: 0
Germany
Message 857719 - Posted: 25 Jan 2009, 18:35:23 UTC

@ Al:
I have a 9600GT too, same driver, quad-core ...
I still have the same kind of problems with tasks as you and Borgholio.
Here is a log from a script I've written:
24.01.2009 14:32:32 > 15dc08ag.4820.11933.9.8.153_0 runtime 31s -> compute error?
24.01.2009 17:16:50 > 16dc08aa.9825.5389.12.8.87_0 runtime 357s -> stuck ?
24.01.2009 19:06:19 > 16dc08aa.9825.5389.12.8.83_1 runtime 465s -> stuck ?
24.01.2009 19:16:24 > 15dc08ah.10456.1299.14.8.54_0 runtime 419s -> stuck ?
24.01.2009 22:52:57 > 15dc08ag.25773.23794.15.8.30_0 runtime 248s -> stuck ?
25.01.2009 00:15:35 > 15dc08ah.26302.24203.15.8.218_0 runtime 340s -> stuck ?
25.01.2009 04:01:56 > 16dc08aa.26261.24612.13.8.12_0 runtime 186s -> stuck ?
25.01.2009 04:35:47 > 16dc08aa.26261.24612.13.8.2_0 runtime 31s -> compute error?
25.01.2009 05:11:10 > 15dc08ag.25773.24203.15.8.255_0 runtime 31s -> compute error?
25.01.2009 05:20:12 > 15dc08ag.25773.24203.15.8.253_0 runtime 31s -> compute error?
25.01.2009 12:36:14 > 15dc08ah.26302.24612.15.8.116_1 runtime 303s -> stuck ?
25.01.2009 17:51:04 > 15dc08ah.26302.24612.15.8.252_1 runtime 31s -> compute error?

If you want to try it, go to the Lunatics GPU forum.
The script detects an idling feeder process when a WU is stuck and terminates it. After that, BM (BOINC Manager) restarts the task and it runs to 100% ...
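
Maik's actual script is not shown in this thread, so the following is only a minimal Python sketch of the idea described above: sample a process's cumulative CPU time at intervals and treat the process as stuck once that time has stopped advancing for long enough. The names `Sample`, `idle_seconds`, and `looks_stuck`, the 0.05-second tolerance, and the 180-second limit are all hypothetical. As Borgholio noted earlier, the CPU time of a hung task is often but not always zero, so this is a heuristic at best.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    wall_s: float  # wall-clock time of the sample, in seconds
    cpu_s: float   # cumulative CPU time of the watched process, in seconds


def idle_seconds(samples: List[Sample], min_cpu_delta: float = 0.05) -> float:
    """Wall-clock seconds, counted back from the newest sample, during which
    the process's CPU time advanced by no more than min_cpu_delta."""
    if not samples:
        return 0.0
    newest = samples[-1]
    idle_since = newest.wall_s
    # Walk backwards until we find a sample where the CPU time was still
    # noticeably lower, i.e. the process was doing work after that point.
    for older in reversed(samples[:-1]):
        if newest.cpu_s - older.cpu_s > min_cpu_delta:
            break
        idle_since = older.wall_s
    return newest.wall_s - idle_since


def looks_stuck(samples: List[Sample], idle_limit_s: float = 180.0) -> bool:
    """Heuristic only: True if the CPU time has been flat for idle_limit_s."""
    return idle_seconds(samples) >= idle_limit_s


if __name__ == "__main__":
    # One sample per minute; CPU time stops advancing after the second sample.
    history = [Sample(0, 1.0), Sample(60, 2.0), Sample(120, 2.0),
               Sample(180, 2.0), Sample(240, 2.0), Sample(300, 2.0)]
    print(idle_seconds(history))  # 240.0
    print(looks_stuck(history))   # True
```

The sketch deliberately stops at detection. What to do next (for example, terminating the process so that BOINC restarts the task, as the post describes) and how to collect the samples on a given OS are left out because the thread does not give those details.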
ID: 857719 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 858507 - Posted: 27 Jan 2009, 7:13:43 UTC
Last modified: 27 Jan 2009, 7:42:50 UTC

Here is another hanging task:

http://setiathome.berkeley.edu/result.php?resultid=1136193414

It hung at 6.742%. I'll let it run overnight and see if it un-sticks itself.
You will be assimilated...bunghole!

ID: 858507 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 858534 - Posted: 27 Jan 2009, 9:33:51 UTC - in response to Message 858507.  

Here is another hanging task:

http://setiathome.berkeley.edu/result.php?resultid=1136193414

It hung at 6.742%. I'll let it run overnight and see if it un-sticks itself.


Well, it unstuck itself. However, since my last post, BOINC Manager suspended and resumed computation in order to run CPU benchmarks. I wonder if that is what unstuck the task.
You will be assimilated...bunghole!

ID: 858534 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 858642 - Posted: 27 Jan 2009, 17:21:06 UTC
Last modified: 27 Jan 2009, 17:38:54 UTC

Ok here's another one:

http://setiathome.berkeley.edu/result.php?resultid=1136193376

Started at 1:48am and is stuck at 98.817%. Been that way for nearly 8 hours.

Edit - Pausing and resuming the task helped. When I resumed it, however, the % completion dropped to 87.683%. It doesn't seem to have checkpointed before the task froze.
You will be assimilated...bunghole!

ID: 858642 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 859213 - Posted: 29 Jan 2009, 4:57:23 UTC

Another stuck task:

http://setiathome.berkeley.edu/result.php?resultid=1138268168

Task resumed after pausing / unpausing.
You will be assimilated...bunghole!

ID: 859213 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 859796 - Posted: 30 Jan 2009, 17:51:24 UTC

Another one:

http://setiathome.berkeley.edu/result.php?resultid=1139774244

Seemed to hang at the same moment that BOINC suspended an AP task and resumed one from the World Community Grid.
You will be assimilated...bunghole!

ID: 859796 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 859853 - Posted: 30 Jan 2009, 21:59:55 UTC - in response to Message 859796.  

And another:

http://setiathome.berkeley.edu/result.php?resultid=1141356737

As an aside, should I keep reporting these here? I don't want to keep spamming stuck workunits if it's already being looked at and / or this is the wrong place to do it...
You will be assimilated...bunghole!

ID: 859853 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 859854 - Posted: 30 Jan 2009, 22:01:38 UTC - in response to Message 859853.  

Yes please, so Eric can send those tasks to the Nvidia developer.
ID: 859854 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 860165 - Posted: 31 Jan 2009, 8:31:23 UTC

Task hung at 5.261%.

http://setiathome.berkeley.edu/result.php?resultid=1141356725

Suspended / resumed task and it picked up at 4.613%. Crunching fine now.
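
The two resume figures reported in this thread give a rough idea of how much work gets redone when a hung task is suspended and resumed. Here is a small Python sketch of that arithmetic. `lost_progress` is a hypothetical helper, and it assumes the displayed percentages reflect the task restarting from its last checkpoint, which the thread suggests but does not confirm.

```python
def lost_progress(percent_at_hang: float, percent_after_resume: float) -> float:
    """Percentage points of progress that had to be redone after a resume."""
    return round(percent_at_hang - percent_after_resume, 3)


# (progress when the task hung, progress shown after suspend / resume),
# taken from the two reports in this thread that give both numbers.
reports = {
    "resultid 1136193376": (98.817, 87.683),
    "resultid 1141356725": (5.261, 4.613),
}

for task, (hung_at, resumed_at) in reports.items():
    print(f"{task}: {lost_progress(hung_at, resumed_at)} percentage points redone")
```

So the cost of a resume ranged from well under one percentage point to a bit over eleven in these two cases.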
You will be assimilated...bunghole!

ID: 860165 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 860228 - Posted: 31 Jan 2009, 14:15:07 UTC
Last modified: 31 Jan 2009, 14:21:48 UTC

Another stuck task:

http://setiathome.berkeley.edu/result.php?resultid=1141507512

Hung at ~35%.

Suspended / resumed and is working now.
You will be assimilated...bunghole!

ID: 860228 · Report as offensive