Perhaps my 7th wingman will be the charm! (or maybe the 8th)
Author | Message |
---|---|
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
I'm posting this just for fun. WU #1561509069 seems to have been a real hot potato for about 7 weeks now, with a whole bunch of different reasons for getting dropped. My host is the first one, patiently waiting for another reliable host to come along. Even the one that finally "finished", to trigger the inconclusive, is a runaway machine that got a 30/30 overflow! The real irony is that, after all is said and done, since my host found 0 single pulses and 0 repetitive pulses, all this churning will be for naught, anyway. ;^) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I was sent one of those, http://setiathome.berkeley.edu/workunit.php?wuid=1594867559, which led me to find TWO other 'Old' ATI cards running the ATI CAL App. Similar to others, the Driver installed doesn't even have OpenCL: driver: 1.4.1417 & driver: 1.4.1385. So, why is the server sending OpenCL tasks to Hosts that don't have OpenCL, which then have to Abort them? Along the same line, why is the Server sending ATI_CAL tasks to Hosts with ATI Driver 1.4.1734 when it has been known 'forever' that Driver 1734 (Legacy Driver) doesn't work with the ATI_CAL App? It all results in Aborted/Error results which clog the System... |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
I was sent one of those, http://setiathome.berkeley.edu/workunit.php?wuid=1594867559 And that looks like another WU with no pulses to be found, despite all the back and forth it'll end up with. I seem to recall that in June of last year, there was a major fiasco with one of the AP apps for ATI that was causing Computation errors just as fast as the scheduler could send them out. A lot of WUs were failing with too many errors after 6 wingmen crapped out. I think that's the most wingmen I've ever run across, until now. Also, the only time I ever got a "Completed, can't validate" status for one of my tasks. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I was sent one of those, http://setiathome.berkeley.edu/workunit.php?wuid=1594867559 Yes, it was the same App. As I recall, the problem was caused by a corrupted copy being placed on the server due to a failing Flash Drive...I could be wrong. It was a while ago. I'm Not wrong about those Old Drivers Not having OpenCL, or that the ATI_CAL App WILL NOT WORK with the ATI Legacy Driver 1734 (Legacy 13.1 & Legacy 13.9). The App works fine with the Intended drivers. One has to question why a task titled "ati_nocal" is being sent to an App titled "cal_ati". opencl_ati_nocal_100 was meant for the New ATI cards that DON'T have CAL, NOT the Old ATI cards that ONLY have CAL. These 2 Don't work either, GeForce 9800 GT (511MB) driver: 340.32. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Yep, another Invalid for the 9800 GT running driver 340.xx. From "@Pre-FERMI nVidia GPU users: Important warning": "So, if you want to use your pre-FERMI nVidia hardware for AstroPulse crunching, stay with pre-340.xx drivers." |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
I'm posting this just for fun. WU #1561509069 seems to have been a real hot potato for about 7 weeks now, with a whole bunch of different reasons for getting dropped. My host is the first one, patiently waiting for another reliable host to come along. Even the one that finally "finished", to trigger the inconclusive, is a runaway machine that got a 30/30 overflow! The real irony is that, after all is said and done, since my host found 0 single pulses and 0 repetitive pulses, all this churning will be for naught, anyway. ;^) 4 days and counting until the 7th wingman times out, as well (no contact from that host since 8 Oct). I think that's the last chance this WU will have. It's definitely jinxed! This is just the sort of WU that could conceivably drag out the demise of AP v6 for months, too. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
And now I have my 8th wingman, after the 7th one timed out. I thought 7 would be the limit, but I guess we'll soon find out if it stops at 8, since number 8's task summary doesn't indicate a particularly successful host. State: All (70) · In progress (8) · Validation pending (0) · Validation inconclusive (0) · Valid (1) · Invalid (0) · Error (61) This is downright comical! Edit: Although now that I take a second look at his task list, the one and only Valid task he has is an AP v6. Maybe there's still hope! |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
And now I have my 8th wingman, after the 7th one timed out. I thought 7 would be the limit, but I guess we'll soon find out if it stops at 8, since number 8's task summary doesn't indicate a particularly successful host. The BOINC temporary exits on the AP v7 tasks because the drivers are too old should be ensuring the host actually makes some progress on that AP v6 task each time the GPU gets some crunching time. Still, the on-again, off-again processing probably means at least a couple of days to finish, even though the run time is likely to be around half a day. Joe |
David S Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12 |
Okay, you can quit worrying now. You got validated 20 minutes ago, less than 13 hours after the last host was assigned the task. David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
LOL Well, it was certainly entertaining while it lasted! I see one last bit of mystery in that last host's Stderr, which looks to be truncated, after multiple restarts, with no pulse counts included. A fitting finish. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
And you are lucky, because the last valid result could easily have been invalid: OpenCL 1.1 AMD-APP-SDK-v2.4 (650.9). It's an unsupported SDK version. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
And you are lucky, because the last valid result could easily have been invalid: Unsupported for "Windows x86 rev 1832, V6 match, by Raistmer" build? I've been assuming your switch to the newer SDK took place after that build. The host is of course erroring on all AP v7 tasks, and the user will soon need to update it with a later Catalyst version to have it remain productive. I did send a PM. Joe |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
I did send a PM. Joe If he takes notice of your PM and updates his host, at least something positive will come out of this little episode. I think at least a few of the other wingmen on this WU could use a similar nudge. From time to time, I've tried sending PMs to other users when I saw a wingman's host that had just recently appeared to go off the rails, but only one of them ever responded, and I think even that took about a month. It's a shame that the project doesn't have some functionality in one of the servers that would automatically generate an email to a user when a host crosses some defined threshold of Invalids and Errors, perhaps when those results exceed 50% of Valid results. The email wouldn't have to diagnose the problem, just point out to the user that a problem appears to exist and direct them to the Message Boards if they need assistance. I can't help but feel that such a process could enable a whole lot of hosts to regain lost productivity, which would surely be a good thing for the project. The current "system", which relies on individual users to occasionally PM other users, with what I suspect are widely varying degrees of diplomacy, doesn't seem like it accomplishes much. Then, too, quite a few of the wayward rigs belong to Anonymous users who can't be PM'd in the first place. Only the project administrators, or an automated system they implement, can reach the Anonymous ones. Automatically generating some emails, perhaps once every couple of weeks or once a month, can't be that big a deal, can it? |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
And you are lucky, because the last valid result could easily have been invalid: Hm, good question. There were some v6 builds with SDK 2.6 but without the code that blocks execution in case of SDK 2.4 (as in v7). I can't recall which range 1832 falls in, though. |
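
For what it's worth, the threshold check Jeff Buck suggests in this thread (flag a host once its Invalid plus Error results exceed 50% of its Valid results) could be sketched roughly as below. This is only an illustration, not anything the project servers are known to run: the names `HostTaskSummary` and `needs_warning_email` are hypothetical, and the `min_bad` floor is an added assumption so that a host with only a handful of results isn't flagged.

```python
from dataclasses import dataclass


@dataclass
class HostTaskSummary:
    """Per-host result counts, as listed on a host's task summary page (hypothetical type)."""
    valid: int
    invalid: int
    error: int


def needs_warning_email(summary: HostTaskSummary,
                        ratio: float = 0.5,
                        min_bad: int = 10) -> bool:
    """Return True when Invalid + Error results exceed `ratio` times the Valid results.

    `min_bad` is an assumption not taken from the thread: hosts with fewer
    bad results than this are never flagged, whatever the ratio.
    """
    bad = summary.invalid + summary.error
    if bad < min_bad:
        return False
    return bad > ratio * summary.valid


# Counts quoted in the thread for the 8th wingman: Valid (1), Invalid (0), Error (61)
eighth_wingman = HostTaskSummary(valid=1, invalid=0, error=61)

# A made-up host with a few bad results among many good ones
healthy_host = HostTaskSummary(valid=200, invalid=3, error=5)

print(needs_warning_email(eighth_wingman))
print(needs_warning_email(healthy_host))
```

A periodic job (every couple of weeks or once a month, as suggested above) could run such a check per host and send a short note to the owner's registered address, which would also reach Anonymous users who can't be PM'd.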
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.