Message boards :
Number crunching :
Invalids.
Author | Message |
---|---|
Herb Smith Send message Joined: 28 Jan 07 Posts: 76 Credit: 31,615,205 RAC: 0 |
283 invalids as of this morning and another 81 inconclusive. But at least the office is being kept warm. Herb |
Michael Cruz Send message Joined: 23 Jan 00 Posts: 35 Credit: 323,653,343 RAC: 30 |
I'm at 1056 invalids and 414 inconclusive...wtf is going on? All my computers are generating invalids... Seti Classic: 204,777 WU /113.636 Yrs |
rob smith Send message Joined: 7 Mar 03 Posts: 22218 Credit: 416,307,556 RAC: 380 |
There was an issue with the splitters just after this week's scheduled outage - some incorrect updates were applied and as a result most (all?) of the MultiBeam WUs split for about 24 hours were incorrectly formatted, and so should/will return "invalid". I would guess it is going to take a week or more to clear up the debris from this problem. It is worth noting that NEW tasks (suffix _0 and _1) produced from November 5th are unaffected, as are those produced before November 3rd. Note - There are times on the two "black" days that are OK, I think it is before 17:00 UTC November 3rd and after 19:00 UTC on November 4th - but I'm not totally confident about those times. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
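For anyone who wants to apply those cut-off times mechanically, here is a rough Python sketch. It assumes you already have the split time of a task as a timezone-aware UTC `datetime`; the function names are made up for illustration, the year is passed in because the post does not state it, and the boundaries are only the tentative ones given above.

```python
from datetime import datetime, timezone


def suspect_window(year):
    """Return the (start, end) of the tentative bad-split window, in UTC.

    Boundaries are the poster's own estimate: OK before 17:00 UTC on
    November 3rd and OK after 19:00 UTC on November 4th.
    """
    start = datetime(year, 11, 3, 17, 0, tzinfo=timezone.utc)
    end = datetime(year, 11, 4, 19, 0, tzinfo=timezone.utc)
    return start, end


def split_in_suspect_window(split_time, year):
    """True if split_time falls inside the tentative bad-split window.

    split_time must be timezone-aware; comparing a naive datetime
    against the aware boundaries raises TypeError.
    """
    start, end = suspect_window(year)
    return start <= split_time <= end
```

Since the times themselves are uncertain, a result near either boundary should be treated as "maybe" rather than a firm answer.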
Michael Cruz Send message Joined: 23 Jan 00 Posts: 35 Credit: 323,653,343 RAC: 30 |
Bob, Thanks for the information. I'm glad to know it's not something that I'm doing wrong :) Seti Classic: 204,777 WU /113.636 Yrs |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
Will these Tasks be put back through to be processed again or dismissed? My understanding is that they're duds. I'd expect them just to re-split the affected files with the repaired splitters. So the data will still get processed. Grant Darwin NT |
rob smith Send message Joined: 7 Mar 03 Posts: 22218 Credit: 416,307,556 RAC: 380 |
Last time this happened the tapes were re-run after the vast majority of the tasks produced had "errored out" because they had been re-sent too often. One can but assume the same will happen this time around. I would guess the timescale for this will be months rather than days. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Last time this happened the tapes were re-run after the vast majority of the tasks produced had "errored out".... No, they weren't, at least according to my records. When the problem occurred in January, there were only two source files ("tapes") that generated the bad tasks. Initially it was only "19ap11ad" coughing up the fur balls, but then about a week later, "01jl12ad" did the same, though to a lesser degree. I haven't seen either of those "tapes" reenter the system since. Perhaps Richard can confirm or refute my observation from his tape distribution database. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
Perhaps Richard can confirm or refute my observation from his tape distribution database. Tape (first processed, last processed): 01jl12ad (15-Jul-2012, 12-Mar-2015); 19ap11ad (05-Aug-2011, 17-Feb-2015). That looks like long-tail stragglers from the fur balls, not (yet) a concerted re-split. Edit: that 12-Mar-2015 date is a single rogue outlier, and yet only a _2 replication from a very different splitter PID. 01jl12ad was mostly done by 02-Feb-2015, including replications up to _9. I did 147 tasks in the re-run starting 14-Jan-2015. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Perhaps Richard can confirm or refute my observation from his tape distribution database. I would agree. I got my first bad task from that batch on January 10, 2015, and the last one didn't clear from my task list until March 3, 2015. I would expect that there were still at least a few lurking in the shadows for quite a while beyond that. |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
I'm glad you are keeping tabs on that, Richard. This last time there were, I think, about 20 files that went through with errors. So my math says that is somewhere around 43M tasks to hit the 10 resend limit ... wow. Anything I get as a MB resend is an ABORT, trying to get them through the system. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
Anything I get as a MB resend is an ABORT, trying to get them through the system. _4 and higher I abort straight off. _3 and _2 I do a search on. Most of the _3s have been automatic Invalids, but I've had a couple that weren't, so I kept them. A couple of the _2s have been automatic Invalids, but most haven't, and those I've kept. Grant Darwin NT |
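The suffix rule described above can be written down as a small Python sketch. It assumes the replication number is the trailing `_N` on the task name; the function names and the labels returned are made up for illustration, and the `initial` label for _0 and _1 is an assumption based on the earlier note that those are newly issued tasks.

```python
import re


def replication_index(task_name):
    """Return the trailing _N replication number as an int, or None if absent."""
    match = re.search(r"_(\d+)$", task_name)
    return int(match.group(1)) if match else None


def triage(task_name):
    """Map a task name to a suggested action based on its replication suffix.

    _4 and higher -> "abort"   (aborted straight off in the post above)
    _2 and _3     -> "search"  (check the workunit before deciding)
    _0 and _1     -> "initial" (assumption: first issue, not a resend)
    """
    index = replication_index(task_name)
    if index is None:
        return "unknown"
    if index >= 4:
        return "abort"
    if index >= 2:
        return "search"
    return "initial"
```

Called as `triage("example_task_3")` (a made-up name), this returns `"search"`, meaning the workunit still needs to be looked up by hand.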
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Anything I get as a MB resend is an ABORT, trying to get them through the system. What I've been doing is periodically running a text search on the entire S@h data directory looking for "<autocorr_fftlen>0</autocorr_fftlen>". I use PSPad, which gives me a nice list of all those files which match. Then I can simply compare that list to what's in my queue to see what to abort. It's a fairly quick process. |
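For those without PSPad, the same text search could be done with a few lines of Python. This is only a sketch of the idea, not the poster's actual method: the function name is made up, the data directory path is whatever your own installation uses, and it assumes the workunit files can be read as text.

```python
from pathlib import Path

# The string searched for in the post above.
MARKER = "<autocorr_fftlen>0</autocorr_fftlen>"


def find_suspect_files(data_dir, marker=MARKER):
    """Return the sorted names of files under data_dir whose text contains marker."""
    hits = []
    for path in Path(data_dir).rglob("*"):
        if not path.is_file():
            continue
        try:
            # errors="ignore" skips bytes that are not valid UTF-8
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            # File locked or removed while scanning; skip it
            continue
        if marker in text:
            hits.append(path.name)
    return sorted(hits)
```

The returned names still have to be compared against the task queue by hand, as described above; the script only produces the list of matching files.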
Kathy Send message Joined: 5 Jan 03 Posts: 338 Credit: 27,877,436 RAC: 0 |
http://setiathome.berkeley.edu/workunit.php?wuid=1954236680 92 invalids, not sure what is happening. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
As mentioned earlier in the thread, there was an issue after the weekly outage with the splitters, and all those WUs are Invalid as soon as they are returned. The issue has been fixed, but it will take a few months until all of the faulty WUs are out of the system. The vast majority of them should be gone in the next couple of days. Grant Darwin NT |
Cavalary Send message Joined: 15 Jul 99 Posts: 104 Credit: 7,507,548 RAC: 38 |
Oy, was scared there for a moment, seeing 9 invalids and 3 new inconclusives today. But then checked and saw the WUs reporting instant invalid on all but the most recent result, and some already having several invalids listed. So yeah, obviously a WU problem. Annoying though, and resets everybody's consecutive valids again... |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
There seems to be a way to detect them early (2 sec) in Cuda code. Like this WU To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The OpenCL App I built back in mid-summer Nails them Immediately and calls them an Error, http://setiathome.berkeley.edu/results.php?hostid=6796479&state=6&appid= Unfortunately My CPU App I built around the same time wastes Hours of time & energy on them, http://setiathome.berkeley.edu/results.php?hostid=6796479&state=5&appid= Strange, I built both Apps from the same Berkeley code... |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
The OpenCL App I built back in mid-summer Nails them Immediately and calls them an Error, http://setiathome.berkeley.edu/results.php?hostid=6796479&state=6&appid= Not so strange. These apps are from 2 different repositories. The CPU apps are optimized by Joe W Segur whilst the OpenCL apps are from Raistmer. Only the base code is identical. With each crime and every kindness we birth our future. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The OpenCL App I built back in mid-summer Nails them Immediately and calls them an Error, http://setiathome.berkeley.edu/results.php?hostid=6796479&state=6&appid= I used the same folder for both builds, https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/AKv8 The only difference is I didn't use OpenCL for the CPU App. Hmmm, up to 3121 now. Maybe I should build a couple new Apps... |
Louis Loria II Send message Joined: 20 Oct 03 Posts: 259 Credit: 9,208,040 RAC: 24 |
All right, I don't understand the reasons behind invalids, but timed out? My rig runs 24/7. What has happened this round? GPU WUs especially. WTH? |