Double Oopsie (Apr 12 2007)

Message boards : Technical News : Double Oopsie (Apr 12 2007)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 545164 - Posted: 12 Apr 2007, 22:59:49 UTC

Okay - I messed up. My workunit zombie cleanup process was querying against the replica database, unbeknownst to me (even though I wrote the script). So when the replica went offline my script started errantly removing workunits. That meant many users were getting "file not found" errors when trying to download work. Of course I'm smart enough to not actually delete files of such importance, and upon discovering the exact problem I was able to immediately move the mistakenly removed files back into place (they had simply been moved into an analogous directory one level up). So all's well there, more or less. The good news is the replica issues of yesterday (and earlier) were fixed sometime last night/this morning, so we have both servers online and caught up.
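
The move-instead-of-delete idea above can be sketched roughly as follows. This is only an illustration and not the actual cleanup script: the function names, the `live_names` set and the `db_reachable` flag are all hypothetical, and the real process queries the BOINC database rather than taking a ready-made set of names.

```python
import shutil
from pathlib import Path


def quarantine_zombies(download_dir, holding_dir, live_names, db_reachable):
    """Move files the database no longer lists into holding_dir.

    Nothing is deleted. If the database could not be reached, the set of
    live names cannot be trusted, so no file is touched at all.
    """
    if not db_reachable:
        return []

    download_dir = Path(download_dir)
    holding_dir = Path(holding_dir)
    holding_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for path in sorted(download_dir.iterdir()):
        if path.is_file() and path.name not in live_names:
            shutil.move(str(path), str(holding_dir / path.name))
            moved.append(path.name)
    return moved


def restore_all(holding_dir, download_dir):
    """Undo a mistaken cleanup by moving every held file back."""
    restored = []
    for path in sorted(Path(holding_dir).iterdir()):
        if path.is_file():
            shutil.move(str(path), str(Path(download_dir) / path.name))
            restored.append(path.name)
    return restored
```

Refusing to act when the database is unreachable would guard against the failure described here, and keeping the files in a holding directory is what makes a mistaken run recoverable with a single `restore_all` call.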

Once that workunit fire was put out I wrapped up work on the "nag" scripts and am now sending e-mails to users who signed up relatively recently but have not yet sent any work back. Directions for getting help are in the e-mail.

The validator queue has been a little high - not at panic levels but not really shrinking either. I believe this has to do with the extra stress the validators have now that there is less redundancy. They have to process results 25% faster than before (as long as work is continually coming in/going out). I just added 2 extra validators to the backend. Let's see if that helps.
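
A rough way to check that figure, assuming it refers to the quorum change mentioned later in this thread (four initial replications and a quorum of three, down to three and two). The function names are made up for illustration, and this is only back-of-the-envelope arithmetic, not how the validators measure load:

```python
from fractions import Fraction


def results_saved_per_workunit(old_replication, new_replication):
    """Fraction of result copies no longer sent out for each workunit."""
    return 1 - Fraction(new_replication, old_replication)


def extra_workunits_per_result(old_replication, new_replication):
    """Relative rise in workunits completed for the same number of results."""
    return Fraction(old_replication, new_replication) - 1


saved = results_saved_per_workunit(4, 3)
extra = extra_workunits_per_result(4, 3)
print(f"results per workunit drop by {float(saved):.0%}")
print(f"workunits per returned result rise by {float(extra):.0%}")
```

Under that assumption, 25% fewer copies go out per workunit, while the same flow of returned results completes about a third more workunits.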

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 545164

Pooh Bear 27
Volunteer tester
Joined: 14 Jul 03
Posts: 3224
Credit: 4,603,826
RAC: 0
United States
Message 545174 - Posted: 12 Apr 2007, 23:29:17 UTC

I know this is going to be asked, and I think I understood the answer for myself, but others may not have understood as well.

Does this zombie killer go after the old stale results that have been in the database for the longest time, i.e. the August 2005 and February 2006 ones that plague many people? Or is this just to keep the current set clean?



My movie https://vimeo.com/manage/videos/502242
ID: 545174

Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 545187 - Posted: 13 Apr 2007, 0:07:12 UTC

One other tech question:

Is there going to be a new stat run before the weekend?

Alinator
ID: 545187

Wander Saito
Volunteer tester
Joined: 7 Jul 03
Posts: 555
Credit: 2,136,061
RAC: 0
Brazil
Message 545243 - Posted: 13 Apr 2007, 1:52:23 UTC
Last modified: 13 Apr 2007, 2:43:51 UTC

No harm done. It's good to know that the project is being managed by people humble enough to recognize (and sometimes make fun of) their own mistakes. Thanks for your efforts, Matt.

Regards,
Wander

EDIT: One other thing: I noticed that some WUs were canceled because of too many errors, in this case download errors. Some errors were raised by the client, while other WUs were aborted by users (like me). Maybe you guys could reset them so this data gets sent out again.
ID: 545243

Viking Warrior
Joined: 14 Jul 00
Posts: 57
Credit: 469,343
RAC: 0
United Kingdom
Message 545378 - Posted: 13 Apr 2007, 8:27:45 UTC

Why is my BOINC not requesting more work??? It is set to request work and communication is available all the time. Yet my cached work unit list is shrinking rapidly. Please assist with this.

Thank you
Rob
SETI WARRIORS are totally committed to locating ET and use 100% of our computing power 24/7 to do so. Click below and join only if your computer is up to it.

SETI WARRIORS
ID: 545378

Byron Leigh Hatch @ team Carl Sagan
Volunteer tester
Joined: 5 Jul 99
Posts: 4548
Credit: 35,667,570
RAC: 4
Canada
Message 545456 - Posted: 13 Apr 2007, 13:41:20 UTC - in response to Message 545378.  



Why is my BOINC not requesting more work??? It is set to request work and communication is available all the time. Yet my cached work unit list is shrinking rapidly. Please assist with this.

Thank you
Rob

I had the same problem as you, and I found a fix for it in the following thread:

settings which affect caching for BOINC 5.8.15 any ideas ?

http://setiathome.berkeley.edu/forum_thread.php?id=38529

I hope this helps you.

Byron

ID: 545456

speedimic
Volunteer tester
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 545503 - Posted: 13 Apr 2007, 15:40:11 UTC


The validator queue has been a little high - not at panic levels but not really shrinking either. I believe this has to do with the extra stress the validators have now that there is less redundancy. They have to process results 25% faster than before (as long as work is continually coming in/going out). I just added 2 extra validators to the backend. Let's see if that helps.


Wasn't the plan to move all services of kryten over to bruno?
Maybe it's time to start that...


mic.


ID: 545503

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 545527 - Posted: 13 Apr 2007, 17:10:28 UTC - in response to Message 545378.  

Why is my BOINC not requesting more work??? It is set to request work and communication is available all the time. Yet my cached work unit list is shrinking rapidly. Please assist with this.

Thank you
Rob

... probably because it has work with short deadlines. When it finishes that work it'll get more.

This is not a bug, it's a feature.
ID: 545527

Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 545539 - Posted: 13 Apr 2007, 17:44:45 UTC - in response to Message 545503.  

One of the things needed to complete that operation was a 24-port Gigabit switch to go in the server closet... They should have that shortly.


The validator queue has been a little high - not at panic levels but not really shrinking either. I believe this has to do with the extra stress the validators have now that there is less redundancy. They have to process results 25% faster than before (as long as work is continually coming in/going out). I just added 2 extra validators to the backend. Let's see if that helps.


Wasn't the plan to move all services of kryten over to bruno?
Maybe it's time to start that...


mic.


Please consider a Donation to the Seti Project.

ID: 545539

Richard Williams
Joined: 14 Jan 04
Posts: 10
Credit: 101,524
RAC: 0
United States
Message 545703 - Posted: 13 Apr 2007, 22:23:26 UTC

So, I don't understand something. I am processing work units on all four of my boxes at the same rate I always do, none of them are showing any service errors, and the Pending Credit is showing that I am sending back results but don't have a massive backlog (certainly not a 3 day backlog). But just as on the 6th/7th, now for the 11th, 12th, and 13th I am seeing zero credit on my Stats.

I look at the World Position, and I can see other people have received credit in the 'credit/day' column, so I assume the servers are processing and aren't down for stats. Also, I don't see any message telling us to expect an outage as on the 4th.

So, I'm wondering: what's the scoop?

Thx,

Richard.

ID: 545703

Wander Saito
Volunteer tester
Joined: 7 Jul 03
Posts: 555
Credit: 2,136,061
RAC: 0
Brazil
Message 545712 - Posted: 13 Apr 2007, 22:48:27 UTC - in response to Message 545703.  

So, I don't understand something. I am processing work units on all four of my boxes at the same rate I always do, none of them are showing any service errors, and the Pending Credit is showing that I am sending back results but don't have a massive backlog (certainly not a 3 day backlog). But just as on the 6th/7th, now for the 11th, 12th, and 13th I am seeing zero credit on my Stats.

I look at the World Position, and I can see other people have received credit in the 'credit/day' column, so I assume the servers are processing and aren't down for stats. Also, I don't see any message telling us to expect an outage as on the 4th.

So, I'm wondering: what's the scoop?

Thx,

Richard.


Hi Richard,

The reason your stats are not being updated is that SAH is not producing the file containing all the data on credits, users, teams, etc. that sites like BOINCstats use to generate those charts and lists. The credits you're seeing being added to other users are probably coming from other projects like CPDN, EAH and others.

If you check the world position list but only for SAH, you'll see that nobody received any credits for the past 3 days or so. So my advice to you is to be patient.

Regards,
Wander
ID: 545712

Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 545713 - Posted: 13 Apr 2007, 22:49:29 UTC - in response to Message 545703.  

Richard

The process that calculates the stats for Seti was being run from the replica database (which crashed, see the Oopsie thread). Currently that function is not ready to be turned back on, as work was being done to ensure the crash does not happen again... When the process is turned back on, the credits will show up on the stats sites...


So, I don't understand something. I am processing work units on all four of my boxes at the same rate I always do, none of them are showing any service errors, and the Pending Credit is showing that I am sending back results but don't have a massive backlog (certainly not a 3 day backlog). But just as on the 6th/7th, now for the 11th, 12th, and 13th I am seeing zero credit on my Stats.

I look at the World Position, and I can see other people have received credit in the 'credit/day' column, so I assume the servers are processing and aren't down for stats. Also, I don't see any message telling us to expect an outage as on the 4th.

So, I'm wondering: what's the scoop?

Thx,

Richard.


Please consider a Donation to the Seti Project.

ID: 545713

Richard Williams
Joined: 14 Jan 04
Posts: 10
Credit: 101,524
RAC: 0
United States
Message 545742 - Posted: 13 Apr 2007, 23:37:22 UTC - in response to Message 545712.  

Thx Wander .. although, I wasn't being _impatient_, I was merely wondering what was busted in case something had gone wrong with my install ... you know, like when the zombie 'update' took over people's installs and started crediting somebody else for a ton of people's units :))

Thx again guys for the explanation.

--R.

So, I don't understand something. I am processing work units on all four of my boxes at the same rate I always do, none of them are showing any service errors, and the Pending Credit is showing that I am sending back results but don't have a massive backlog (certainly not a 3 day backlog). But just as on the 6th/7th, now for the 11th, 12th, and 13th I am seeing zero credit on my Stats.

I look at the World Position, and I can see other people have received credit in the 'credit/day' column, so I assume the servers are processing and aren't down for stats. Also, I don't see any message telling us to expect an outage as on the 4th.

So, I'm wondering: what's the scoop?

Thx,

Richard.


Hi Richard,

The reason your stats are not being updated is that SAH is not producing the file containing all the data on credits, users, teams, etc. that sites like BOINCstats use to generate those charts and lists. The credits you're seeing being added to other users are probably coming from other projects like CPDN, EAH and others.

If you check the world position list but only for SAH, you'll see that nobody received any credits for the past 3 days or so. So my advice to you is to be patient.

Regards,
Wander


ID: 545742

kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 545772 - Posted: 14 Apr 2007, 0:40:30 UTC - in response to Message 545712.  

So, I don't understand something. I am processing work units on all four of my boxes at the same rate I always do, none of them are showing any service errors, and the Pending Credit is showing that I am sending back results but don't have a massive backlog (certainly not a 3 day backlog). But just as on the 6th/7th, now for the 11th, 12th, and 13th I am seeing zero credit on my Stats.

I look at the World Position, and I can see other people have received credit in the 'credit/day' column, so I assume the servers are processing and aren't down for stats. Also, I don't see any message telling us to expect an outage as on the 4th.

So, I'm wondering: what's the scoop?

Thx,

Richard.


Hi Richard,

The reason your stats are not being updated is that SAH is not producing the file containing all the data on credits, users, teams, etc. that sites like BOINCstats use to generate those charts and lists. The credits you're seeing being added to other users are probably coming from other projects like CPDN, EAH and others.

If you check the world position list but only for SAH, you'll see that nobody received any credits for the past 3 days or so. So my advice to you is to be patient.

Regards,
Wander


Dammit Matt, we're givin' it all we got! How 'bout an update, capn'!!!

No offense Matt, just tryin' to inject some humour into the situation.

And the kitties say.......'Matt is doin' the best that he can!!'
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 545772

littlegreenmanfrommars
Volunteer tester
Joined: 28 Jan 06
Posts: 1410
Credit: 934,158
RAC: 0
Australia
Message 545813 - Posted: 14 Apr 2007, 2:01:35 UTC

I'm just looking forward to the day I get an enormous lift on the stats sites! It should be a personal best! lol
ID: 545813

brialex42
Volunteer tester
Joined: 6 Aug 99
Posts: 8
Credit: 395,298
RAC: 0
United States
Message 545862 - Posted: 14 Apr 2007, 4:05:57 UTC

Waiting for the stats update as well to BOINC. Should be past 100k on Seti around midnight central time and have only been crunching this time around for ~20 days so far after last using classic a few years back.
ID: 545862

Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 546022 - Posted: 14 Apr 2007, 15:13:17 UTC - in response to Message 545164.  


The validator queue has been a little high - not at panic levels but not really shrinking either. I believe this has to do with the extra stress the validators have now that there is less redundancy. They have to process results 25% faster than before (as long as work is continually coming in/going out). I just added 2 extra validators to the backend. Let's see if that helps.

- Matt


Matt,
Perhaps I missed it, but is there a thread or a link that explains the April 12 change to the SETI SOP, from four Initial Replications (IR) and a Minimum Quorum (MQ) of three per result to three IRs and an MQ of two?

Since that is probably one of the top procedural changes to date, and as it will most likely have major impact on everyone at both ends of the project, I am very interested in this.

Thanks!
ID: 546022

Fuzzy Hollynoodles
Volunteer tester
Joined: 3 Apr 99
Posts: 9659
Credit: 251,998
RAC: 0
Message 546026 - Posted: 14 Apr 2007, 15:21:35 UTC - in response to Message 545772.  



Dammit Matt, we're givin' it all we got! How 'bout an update, capn'!!!

No offense Matt, just tryin' to inject some humour into the situation.

And the kitties say.......'Matt is doin' the best that he can!!'


Can I suggest to the kitties to take a nap while Matt is doing what he can?





"I'm trying to maintain a shred of dignity in this world." - Me

ID: 546026

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 546035 - Posted: 14 Apr 2007, 15:55:21 UTC - in response to Message 546022.  

Matt,
Perhaps I missed it, but is there a thread or a link that explains the April 12 change to the SETI SOP, from four Initial Replications (IR) and a Minimum Quorum (MQ) of three per result to three IRs and an MQ of two?

Since that is probably one of the top procedural changes to date, and as it will most likely have major impact on everyone at both ends of the project, I am very interested in this.

Thanks!

You did miss it - for some reason, it was posted in the Staff Blog: Heads Up: Quorum Change
ID: 546035

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 546659 - Posted: 15 Apr 2007, 15:18:38 UTC - in response to Message 546026.  



Dammit Matt, we're givin' it all we got! How 'bout an update, capn'!!!

No offense Matt, just tryin' to inject some humour into the situation.

And the kitties say.......'Matt is doin' the best that he can!!'


Can I suggest to the kitties to take a nap while Matt is doing what he can?

[image snipped]



You're lucky your kitty didn't decide to take his nap on the keyboard. Mine used to… Unfortunately, they're all now deceased. (One also nibbled on cables… and severed a few! He's lucky that he missed the power cable…)
.

Hello, from Albany, CA!...
ID: 546659


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.