Invalids.

Message boards : Number crunching : Invalids.

Herb Smith
Volunteer tester

Joined: 28 Jan 07
Posts: 76
Credit: 31,615,205
RAC: 0
United States
Message 1740165 - Posted: 6 Nov 2015, 15:18:28 UTC

283 invalids as of this morning and another 81 inconclusive. But at least the office is being kept warm.

Herb
ID: 1740165
Michael Cruz
Joined: 23 Jan 00
Posts: 35
Credit: 323,653,343
RAC: 30
United States
Message 1740176 - Posted: 6 Nov 2015, 15:48:56 UTC

I'm at 1056 invalids and 414 inconclusives... wtf is going on?
All my computers are generating invalids...


Seti Classic: 204,777 WU /113.636 Yrs
ID: 1740176
rob smith
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22200
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1740183 - Posted: 6 Nov 2015, 16:48:43 UTC

There was an issue with the splitters just after this week's scheduled outage: some incorrect updates were applied, and as a result most (all?) of the MultiBeam WUs split over about 24 hours were incorrectly formatted, so they should/will return "invalid".

I would guess it is going to take a week or more to clear up the debris from this problem.
It is worth noting that NEW tasks (suffix _0 and _1) produced from November 5th are unaffected, as are those produced before November 3rd.




Note - There times on the two "black" days that are OK, I think it is before 17:00UTC November 3rd and after 19:00UTC on November 4th - but I'm not totally confident about those times.
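The rule of thumb above boils down to a timestamp check on when a workunit was split. A minimal Python sketch, assuming you can look up the workunit's creation (split) time in UTC; the function name is hypothetical, and the window bounds are only the approximate, admittedly uncertain times given above.

```python
from datetime import datetime, timezone

# Approximate bad-split window from the note above. The poster was not sure of
# these bounds, so treat them as placeholders, not authoritative values.
SUSPECT_START = datetime(2015, 11, 3, 17, 0, tzinfo=timezone.utc)
SUSPECT_END = datetime(2015, 11, 4, 19, 0, tzinfo=timezone.utc)


def split_in_suspect_window(created: datetime) -> bool:
    """Return True if a workunit created (split) at `created` falls in the suspect window.

    `created` must be timezone-aware; it is converted to UTC before comparing.
    The end bound is exclusive, matching "after 19:00 UTC is OK".
    """
    if created.tzinfo is None:
        raise ValueError("created must be a timezone-aware datetime")
    created_utc = created.astimezone(timezone.utc)
    return SUSPECT_START <= created_utc < SUSPECT_END
```

This only narrows things down: fresh _0/_1 tasks split from November 5th on are unaffected, but resends of the bad WUs will keep turning up while the debris clears.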
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1740183
Michael Cruz
Joined: 23 Jan 00
Posts: 35
Credit: 323,653,343
RAC: 30
United States
Message 1740204 - Posted: 6 Nov 2015, 18:06:24 UTC

Bob, thanks for the information. I'm glad to know it's not something I'm doing wrong :)


Seti Classic: 204,777 WU /113.636 Yrs
ID: 1740204
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1740250 - Posted: 6 Nov 2015, 21:39:56 UTC - in response to Message 1740239.  

Will these tasks be put back through to be processed again, or dismissed?

My understanding is that they're duds. I'd expect them just to re-split the affected files with the repaired splitters.
So the data will still get processed.
Grant
Darwin NT
ID: 1740250
rob smith
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22200
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1740252 - Posted: 6 Nov 2015, 21:41:28 UTC

Last time this happened, the tapes were re-run after the vast majority of the tasks produced had "errored out" because they had been re-sent too often. One can but assume the same will happen this time around. I would guess the timescale for this will be months rather than days.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1740252
Jeff Buck
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1740263 - Posted: 6 Nov 2015, 22:49:25 UTC - in response to Message 1740252.  

Last time this happened the tapes were re-run after the vast majority of the tasks produced had "errored out"....

No, they weren't, at least according to my records. When the problem occurred in January, there were only two source files ("tapes") that generated the bad tasks. Initially it was only "19ap11ad" coughing up the fur balls, but then about a week later, "01jl12ad" did the same, though to a lesser degree. I haven't seen either of those "tapes" reenter the system since.

Perhaps Richard can confirm or refute my observation from his tape distribution database.
ID: 1740263
Richard Haselgrove
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1740273 - Posted: 6 Nov 2015, 23:04:23 UTC - in response to Message 1740263.  
Last modified: 6 Nov 2015, 23:13:33 UTC

Perhaps Richard can confirm or refute my observation from his tape distribution database.

Tape		First processed	Last processed
01jl12ad	15-Jul-2012	12-Mar-2015
19ap11ad	05-Aug-2011	17-Feb-2015

That looks like long-tail stragglers from the fur balls, not (yet) a concerted re-split.

Edit: that 12-Mar-2015 date is a single rogue outlier, and only a _2 replication at that, from a very different splitter PID. 01jl12ad was mostly done by 02-Feb-2015, including replications up to _9. I did 147 tasks in the re-run starting 14-Jan-2015.
ID: 1740273
Jeff Buck
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1740277 - Posted: 6 Nov 2015, 23:13:16 UTC - in response to Message 1740273.  

Tape		First processed	Last processed
01jl12ad	15-Jul-2012	12-Mar-2015
19ap11ad	05-Aug-2011	17-Feb-2015

That looks like long-tail stragglers from the fur balls, not (yet) a concerted re-split.

I would agree. I got my first bad task from that batch on January 10, 2015, and the last one didn't clear from my task list until March 3, 2015. I would expect that there were still at least a few lurking in the shadows for quite a while beyond that.
ID: 1740277
Brent Norman
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1740282 - Posted: 6 Nov 2015, 23:22:06 UTC - in response to Message 1740273.  

I'm glad you are keeping tabs on that Richard.

This last time there were, I think, about 20 files that went through with errors. So my math says that is somewhere around 43M tasks to hit the 10-resend limit... wow.

Anything I get as a MB resend is an ABORT, trying to get them through the system.
ID: 1740282
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1740284 - Posted: 6 Nov 2015, 23:33:53 UTC - in response to Message 1740282.  

Anything I get as a MB resend is an ABORT, trying to get them through the system.

_4 and higher I abort straight off.
_3s and _2s I do a search on. Most of the _3s have been automatic Invalids, but I've had a couple that weren't, so I kept them. A couple of the _2s have been automatic Invalids, but most haven't, and those I've kept.
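The triage rule above is simple enough to put in a few lines. A minimal Python sketch, assuming task names end in an "_N" replication suffix; triage_resend is a hypothetical helper, and "look up" stands for checking the workunit page for existing automatic invalids before deciding.

```python
import re

# Assumption: task names end in "_N", where N is the replication number
# (_0 and _1 are the original pair; _2 and up are resends).
_REPLICATION_SUFFIX = re.compile(r"_(\d+)$")


def triage_resend(task_name: str) -> str:
    """Apply the resend rule of thumb to a task name.

    Returns "abort" for _4 and higher, "look up" for _2 and _3,
    and "run" for _0/_1, which this rule does not cover.
    """
    match = _REPLICATION_SUFFIX.search(task_name)
    if match is None:
        raise ValueError(f"no _N replication suffix in {task_name!r}")
    replication = int(match.group(1))
    if replication >= 4:
        return "abort"
    if replication >= 2:
        return "look up"
    return "run"
```

A _0 or _1 task can still be a dud if it was split during the bad window, so "run" only means this particular rule says nothing about it.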
Grant
Darwin NT
ID: 1740284
Jeff Buck
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1740288 - Posted: 6 Nov 2015, 23:45:35 UTC - in response to Message 1740284.  

_4 and higher I abort straight off.
_3s and _2s I do a search on. Most of the _3s have been automatic Invalids, but I've had a couple that weren't, so I kept them. A couple of the _2s have been automatic Invalids, but most haven't, and those I've kept.

What I've been doing is periodically running a text search on the entire S@h data directory looking for "<autocorr_fftlen>0</autocorr_fftlen>". I use PSPad, which gives me a nice list of all those files which match. Then I can simply compare that list to what's in my queue to see what to abort. It's a fairly quick process.
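The same search can be scripted instead of done in an editor. A minimal Python sketch, assuming the workunit files sit directly in the project data directory as plain text (on a typical BOINC install that is the projects/setiathome.berkeley.edu folder, but check your own setup); find_bad_workunits is a hypothetical helper, and it only matches the exact string quoted above.

```python
from __future__ import annotations

from pathlib import Path

# Exact string quoted above as the tell-tale of a malformed MultiBeam workunit.
MARKER = b"<autocorr_fftlen>0</autocorr_fftlen>"


def find_bad_workunits(data_dir: Path) -> list[str]:
    """Return the sorted names of files directly inside data_dir that contain MARKER."""
    hits = []
    for path in sorted(data_dir.iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        try:
            contents = path.read_bytes()
        except OSError:
            continue  # file vanished or is locked by the BOINC client; skip it
        if MARKER in contents:
            hits.append(path.name)
    return hits
```

For example, print each name returned by find_bad_workunits(Path("projects/setiathome.berkeley.edu")) from the BOINC data directory, then compare that list against your queue to see what to abort.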
ID: 1740288
Kathy
Joined: 5 Jan 03
Posts: 338
Credit: 27,877,436
RAC: 0
United States
Message 1740339 - Posted: 7 Nov 2015, 6:34:15 UTC

http://setiathome.berkeley.edu/workunit.php?wuid=1954236680

92 invalids, not sure what is happening.
ID: 1740339
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1740341 - Posted: 7 Nov 2015, 6:39:48 UTC - in response to Message 1740339.  

As mentioned earlier in the thread, there was an issue with the splitters after the weekly outage, and all the affected WUs are marked Invalid as soon as they are returned.
The issue has been fixed, but it will take a few months until all of the faulty WUs are out of the system.
The vast majority of them should be gone in the next couple of days.
Grant
Darwin NT
ID: 1740341
Cavalary

Joined: 15 Jul 99
Posts: 104
Credit: 7,507,548
RAC: 38
Romania
Message 1740511 - Posted: 8 Nov 2015, 2:41:52 UTC - in response to Message 1740341.  

Oy, I was scared there for a moment, seeing 9 invalids and 3 new inconclusives today. But then I checked and saw the WUs showing instant invalids on all but the most recent result, some already with several invalids listed. So yeah, obviously a WU problem. Annoying, though, and it resets everybody's consecutive valids again...
ID: 1740511
petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1740548 - Posted: 8 Nov 2015, 8:10:58 UTC

There seems to be a way to detect them early (within 2 seconds) in CUDA code.
Like this WU
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1740548
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1740549 - Posted: 8 Nov 2015, 8:32:21 UTC - in response to Message 1740548.  

The OpenCL app I built back in mid-summer nails them immediately and calls them an error: http://setiathome.berkeley.edu/results.php?hostid=6796479&state=6&appid=
Unfortunately, the CPU app I built around the same time wastes hours of time and energy on them: http://setiathome.berkeley.edu/results.php?hostid=6796479&state=5&appid=
Strange; I built both apps from the same Berkeley code...
ID: 1740549
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1740557 - Posted: 8 Nov 2015, 9:23:22 UTC - in response to Message 1740549.  
Last modified: 8 Nov 2015, 9:24:06 UTC

The OpenCL app I built back in mid-summer nails them immediately and calls them an error: http://setiathome.berkeley.edu/results.php?hostid=6796479&state=6&appid=
Unfortunately, the CPU app I built around the same time wastes hours of time and energy on them: http://setiathome.berkeley.edu/results.php?hostid=6796479&state=5&appid=
Strange; I built both apps from the same Berkeley code...


Not so strange.

These apps are from 2 different repositories.
The CPU apps are optimized by Joe W Segur whilst the OpenCL apps are from Raistmer.
Only the base code is identical.


With each crime and every kindness we birth our future.
ID: 1740557
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1740563 - Posted: 8 Nov 2015, 9:42:05 UTC - in response to Message 1740557.  

Not so strange.

These apps are from 2 different repositories.
The CPU apps are optimized by Joe W Segur whilst the OpenCL apps are from Raistmer.
Only the base code is identical.

I used the same folder for both builds: https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/AKv8
The only difference is that I didn't use OpenCL for the CPU app.
Hmmm, up to 3121 now. Maybe I should build a couple of new apps...
ID: 1740563
Louis Loria II
Volunteer tester
Joined: 20 Oct 03
Posts: 259
Credit: 9,208,040
RAC: 24
United States
Message 1740612 - Posted: 8 Nov 2015, 16:44:22 UTC

All right, I don't understand the reasons behind invalids, but timed out? My rig runs 24/7. What has happened this round, especially with GPU WUs? WTH?
ID: 1740612


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.