Double Oopsie (Apr 12 2007)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Okay - I messed up. My workunit zombie cleanup process was querying against the replica database, unbeknownst to me (even though I wrote the script). So when the replica went offline my script started errantly removing workunits. That meant many users were getting "file not found" errors when trying to download work. Of course I'm smart enough to not actually delete files of such importance, and upon discovering the exact problem I was able to immediately move the mistakenly removed files back into place (they were simply moved into an analogous directory one level up). So all's well there, more or less. The good news is the replica issues of yesterday (and earlier) were fixed sometime last night/this morning so we have both servers online and caught up. Once that workunit fire was put out I wrapped up work on the "nag" scripts and am now sending e-mails to users who signed up relatively recently but have failed to successfully send any work back. Directions about getting help were in the e-mail. The validator queue has been a little high - not at panic levels but not really shrinking either. I believe this has to do with the extra stress the validators have now that there is less redundancy. They have to process results 25% faster than before (as long as work is continually coming in/going out). I just added 2 extra validators to the backend. Let's see if that helps. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Pooh Bear 27 Joined: 14 Jul 03 Posts: 3224 Credit: 4,603,826 RAC: 0 |
I know this is going to be asked, and I think I understood the answer for myself, but others may not have understood as well. Does this zombie killer go after the old stale results that have been in the database for the longest time, i.e. the August 2005 and February 2006 ones that plague many people? Or is this just to keep the current set clean? My movie https://vimeo.com/manage/videos/502242 |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
One other tech question: Is there going to be a new stat run before the weekend? Alinator |
Wander Saito Joined: 7 Jul 03 Posts: 555 Credit: 2,136,061 RAC: 0 |
No harm done. It's good to know that the project is being managed by people humble enough to recognize (and sometimes make fun of) their own mistakes. Thanks for your efforts, Matt. Regards, Wander EDIT: One other thing: I noticed that some WUs were canceled due to an excess of errors, in this case download errors. Some errors were raised by the client while others were aborted by users (like me). Maybe you guys could reset them in order to send out this data again. |
Viking Warrior Joined: 14 Jul 00 Posts: 57 Credit: 469,343 RAC: 0 |
Why is my BOINC not requesting more work? It is set to request work and communication is available all the time, yet my cached work unit list is shrinking rapidly. Please assist with this. Thank you, Rob SETI WARRIORS are totally committed to locating ET and use 100% of our computing power 24/7 to do so. Click below and join only if your computer is up to it. SETI WARRIORS |
Byron Leigh Hatch @ team Carl Sagan Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4 |
"Why is my BOINC not requesting more work? It is set to request work and communication is available all the time, yet my cached work unit list is shrinking rapidly. Please assist with this." I had the same problem as you and found a fix for it in the following thread: __settings which affect caching for BOINC 5.8.15 any ideas ?__ _http://setiathome.berkeley.edu/forum_thread.php?id=38529_ I hope this helps you. Byron |
speedimic Joined: 28 Sep 02 Posts: 362 Credit: 16,590,653 RAC: 0 |
wasn't the plan to move all services of kryten over to bruno? maybe it's time to start that... mic. mic. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"Why is my BOINC not requesting more work? It is set to request work and communication is available all the time, yet my cached work unit list is shrinking rapidly. Please assist with this." ... probably because it has work with short deadlines. When it finishes that work it'll get more. This is not a bug, it's a feature. |
Pappa Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
One of the things that was needed to complete that operation was a 24-port Gigabit switch to go in the server closet... They should have that shortly.
Please consider a Donation to the Seti Project. |
Richard Williams Joined: 14 Jan 04 Posts: 10 Credit: 101,524 RAC: 0 |
So, I don't understand something. I am processing work units on all four of my boxes at the same rate I always do, none of them are showing any service errors, and the Pending Credit is showing that I am sending back results but don't have a massive backlog (certainly not a 3 day backlog). But just as on the 6th/7th, now for the 11th, 12th, and 13th I am seeing zero credit on my Stats. I look at the World Position, and I can see other people have received credit in the 'credit/day' column, so I assume the servers are processing and aren't down for stats. Also, I don't see any message telling us to expect an outage as on the 4th. So, I'm wondering: what's the scoop? Thx, Richard. |
Wander Saito Joined: 7 Jul 03 Posts: 555 Credit: 2,136,061 RAC: 0 |
"So, I don't understand something. I am processing work units on all four of my boxes at the same rate I always do, none of them are showing any service errors, and the Pending Credit is showing that I am sending back results but don't have a massive backlog (certainly not a 3 day backlog). But just as on the 6th/7th, now for the 11th, 12th, and 13th I am seeing zero credit on my Stats." Hi Richard, The reason your stats are not being updated is that SAH is not producing the file containing all the data on credits, users, teams, etc., that sites like BOINCstats use to generate those charts and lists. The credits you're seeing being added to other users are probably coming from other projects like CPDN, EAH and others. If you check the world position list for SAH only, you'll see that nobody has received any credits for the past 3 days or so. So my advice to you is to be patient. Regards, Wander |
Pappa Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
Richard, The process that calculates the stats for Seti was being run from the replica database (which crashed, see the Oopsie thread). Currently that function is not ready to be turned back on, as work is being done to ensure the crash does not happen again... When the process is turned back on, the credits will show up on the stats sites... "So, I don't understand something. I am processing work units on all four of my boxes at the same rate I always do, none of them are showing any service errors, and the Pending Credit is showing that I am sending back results but don't have a massive backlog (certainly not a 3 day backlog). But just as on the 6th/7th, now for the 11th, 12th, and 13th I am seeing zero credit on my Stats." Please consider a Donation to the Seti Project. |
Richard Williams Joined: 14 Jan 04 Posts: 10 Credit: 101,524 RAC: 0 |
Thx Wander .. although, I wasn't being _impatient_, I was merely wondering what was busted in case something had gone wrong with my install ... you know, like when the zombie 'update' took over people's installs and started crediting somebody else for a ton of people's units :)) Thx again guys for the explanation. --R. "So, I don't understand something. I am processing work units on all four of my boxes at the same rate I always do, none of them are showing any service errors, and the Pending Credit is showing that I am sending back results but don't have a massive backlog (certainly not a 3 day backlog). But just as on the 6th/7th, now for the 11th, 12th, and 13th I am seeing zero credit on my Stats." |
kittyman Joined: 9 Jul 00 Posts: 51469 Credit: 1,018,363,574 RAC: 1,004 |
"So, I don't understand something. I am processing work units on all four of my boxes at the same rate I always do, none of them are showing any service errors, and the Pending Credit is showing that I am sending back results but don't have a massive backlog (certainly not a 3 day backlog). But just as on the 6th/7th, now for the 11th, 12th, and 13th I am seeing zero credit on my Stats." Dammit Matt, we're givin' it all we got! How 'bout an update, cap'n!!! No offense Matt, just tryin' to inject some humour into the situation. And the kitties say.......'Matt is doin' the best that he can!!' "Freedom is just Chaos, with better lighting." Alan Dean Foster |
littlegreenmanfrommars Joined: 28 Jan 06 Posts: 1410 Credit: 934,158 RAC: 0 |
I'm just looking forward to the day I get an enormous lift on the stats sites! It should be a personal best! lol |
brialex42 Joined: 6 Aug 99 Posts: 8 Credit: 395,298 RAC: 0 |
Waiting for the stats update as well to BOINC. Should be past 100k on Seti around midnight central time and have only been crunching this time around for ~20 days so far after last using classic a few years back. |
Cherokee150 Joined: 11 Nov 99 Posts: 192 Credit: 58,513,758 RAC: 74 |
Matt, Perhaps I missed it, but is there a thread or a link that explains the change to the SETI SOP that occurred on April 12, from four Initial Replications and a Minimum Quorum of three per result to three IRs and an MQ of two? Since that is probably one of the top procedural changes to date, and as it will most likely have a major impact on everyone at both ends of the project, I am very interested in this. Thanks! |
Fuzzy Hollynoodles Joined: 3 Apr 99 Posts: 9659 Credit: 251,998 RAC: 0 |
Can I suggest to the kitties to take a nap while Matt is doing what he can? "I'm trying to maintain a shred of dignity in this world." - Me |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Matt, You did miss it - for some reason, it was posted in the Staff Blog: Heads Up: Quorum Change |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
You're lucky your kitty didn't decide to take his nap on the keyboard. Mine used to… Unfortunately, they're all now deceased. (One also nibbled on cables… and severed a few! He's lucky that he missed the power cable…) . Hello, from Albany, CA!... |
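
The failure mode Matt describes at the top of the thread (a cleanup script that trusts a database which has gone offline, and so treats live workunit files as zombies) can be guarded against in a few lines. What follows is a minimal Python sketch, not the actual SETI@home script: the names `find_zombies`, `quarantine`, `restore`, `CleanupAborted`, the `fetch_known_names` callback and the 5% threshold are all assumptions made for illustration. It keeps the one thing the thread confirms worked well, which is moving files aside rather than deleting them, and adds two checks that refuse to run when the database view looks untrustworthy.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional, Set


class CleanupAborted(Exception):
    """Raised when the cleanup cannot trust its view of the database."""


def find_zombies(
    download_dir: Path,
    fetch_known_names: Callable[[], Optional[Set[str]]],
    max_zombie_fraction: float = 0.05,  # assumed threshold, tune to taste
) -> List[Path]:
    """Return files in download_dir that the database does not list.

    fetch_known_names is a hypothetical callback: it returns the set of
    workunit file names known to the database, or None if the database
    cannot be reached.
    """
    known = fetch_known_names()
    if known is None:
        raise CleanupAborted("database unreachable; refusing to guess")

    files = sorted(p for p in download_dir.iterdir() if p.is_file())
    zombies = [p for p in files if p.name not in known]

    # An offline or lagging database can make nearly every file look
    # orphaned, so an implausibly large zombie count stops the run.
    if files and len(zombies) / len(files) > max_zombie_fraction:
        raise CleanupAborted(
            f"{len(zombies)} of {len(files)} files look orphaned; aborting"
        )
    return zombies


def quarantine(zombies: List[Path], holding_dir: Path) -> List[Path]:
    """Move suspected zombies into holding_dir instead of deleting them."""
    holding_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in zombies:
        target = holding_dir / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved


def restore(holding_dir: Path, download_dir: Path) -> int:
    """Move every quarantined file back; returns how many were restored."""
    count = 0
    for path in sorted(holding_dir.iterdir()):
        if path.is_file():
            shutil.move(str(path), str(download_dir / path.name))
            count += 1
    return count
```

The fraction check is the part that would have mattered here: a query that silently returns nothing makes every file on disk look like a zombie, and that is far more likely to mean a broken query than a download directory full of orphans.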
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.