Panic Mode On (88) Server Problems?

Message boards : Number crunching : Panic Mode On (88) Server Problems?
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13786
Credit: 208,696,464
RAC: 304
Australia
Message 1555341 - Posted: 11 Aug 2014, 23:22:12 UTC - in response to Message 1555296.  

18fe09ag has claimed its 2nd splitter, I see this morning, so we'll just have to make do running on 3 splitters until someone kicks that file loose again.

Cheers.

I think 22fe09ah might be stuck as well, or it could just be running very slowly. Other tapes have advanced while it has not.

Whatever has happened, for the last 3 hours splitter output has been lower than it was. It's dropped from the low 20s per second to the mid teens.
As a result, the Ready-to-send buffer is (at this stage) very, very slowly running down.
Grant
Darwin NT
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1555342 - Posted: 11 Aug 2014, 23:27:47 UTC - in response to Message 1555341.  

22fe09ah did finally start moving, but 18fe09ag has hold of 2 splitters, so we are running on 3 out of 5 cylinders. We might just make it to maintenance, in about 16ish hours, without running dry, if no one touches anything & nothing else goes wonky...
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13786
Credit: 208,696,464
RAC: 304
Australia
Message 1555349 - Posted: 11 Aug 2014, 23:38:23 UTC - in response to Message 1555342.  

At the present rate the Ready-to-send buffer is running down, we'd probably get close to 48 hours (from the time it started running down) until it's empty.
As you say, as long as nothing else happens...
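The estimate is just the buffer size divided by the net drain rate. A minimal sketch, assuming constant splitter output and a constant send rate (in practice neither is constant); the buffer size and rates in the example are made up for illustration, not read off the Server Status page.

```python
def hours_until_empty(ready_to_send: int, split_per_sec: float, sent_per_sec: float) -> float:
    """Hours until the Ready-to-send buffer reaches zero at a constant net drain rate.

    Returns infinity if the splitters keep up with (or exceed) demand.
    """
    net_drain = sent_per_sec - split_per_sec  # results per second leaving the buffer
    if net_drain <= 0:
        return float("inf")
    return ready_to_send / net_drain / 3600.0


# Hypothetical example: 300,000 results in the buffer, splitters producing
# 15/s while hosts pull 17/s, i.e. a net drain of 2/s.
print(round(hours_until_empty(300_000, 15.0, 17.0), 1))  # 41.7
```

Small changes in either rate move the answer a lot when the two are close, which is why the estimate can swing from "2 days" to "12 hours" quickly.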
Grant
Darwin NT
Wiggo
Joined: 24 Jan 00
Posts: 35392
Credit: 261,360,520
RAC: 489
Australia
Message 1555358 - Posted: 11 Aug 2014, 23:54:28 UTC - in response to Message 1555296.  

18fe09ag has claimed its 2nd splitter, I see this morning, so we'll just have to make do running on 3 splitters until someone kicks that file loose again.

Cheers.

I think 22fe09ah might be stuck as well, or it could just be running very slowly. Other tapes have advanced while it has not.

If that is the case then we'll just have to get by on 2 splitters.

Cheers.

22fe09ah has made it to 3 done now so it must be alive.

Cheers.
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1555389 - Posted: 12 Aug 2014, 1:19:04 UTC - in response to Message 1555342.  


22fe09ah did finally start moving, but 18fe09ag has a hold of 2 splitters.

I am unsure how you can tell how many splitters a tape is taking up. Could somebody please explain this to me?
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13786
Credit: 208,696,464
RAC: 304
Australia
Message 1555393 - Posted: 12 Aug 2014, 1:31:46 UTC - in response to Message 1555389.  
Last modified: 12 Aug 2014, 1:32:34 UTC

There are 7 PFB (Multi Beam) splitters. I've never seen splitters 0 & 14 running (they're always disabled), so that leaves 5 splitters available.

Under Splitter Status are all the "tapes" loaded for work.
They show as being Done, Channels in progress, Completed channels, Channels with errors.

At the moment, all the AP work has been split, so they all show as Done.
For MB, all the light green blocks show completed channels (they have been split). The dark green ones are those that are in the process of being split. As there are 5 splitters running, there will be 5 dark green blocks to show which channels are being split.
Generally, as the channels are split they go from dark green (in progress) to light green (completed). If a block remains dark green after several hours, you can be sure it's a "stuck tape". Since it's sitting there and not being split, that splitter isn't producing any work.

At the moment 18fe09ag is the problem child: it shows 2 channels being split, and unfortunately it's been that way for quite a few hours now. That means those 2 splitters aren't producing any work, leaving only the other 3 to pump out new WUs.
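That rule of thumb can be sketched as a small check. This assumes you have, for each tape, the number of channels in progress and the time its completed-channel count last went up; the `TapeStatus` record, the 3-hour threshold and the timestamps are hypothetical, since the Server Status page only shows the coloured blocks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TapeStatus:
    name: str
    in_progress: int         # dark green blocks = splitters currently working this tape
    last_progress: datetime  # when the completed-channel count last increased


def find_stuck(tapes, now, threshold=timedelta(hours=3)):
    """Return {tape name: splitters held} for tapes with no progress for `threshold`."""
    return {
        t.name: t.in_progress
        for t in tapes
        if t.in_progress > 0 and now - t.last_progress >= threshold
    }


def productive_splitters(tapes, now, available=5, threshold=timedelta(hours=3)):
    """Splitters still producing work: those available minus those held by stuck tapes."""
    return available - sum(find_stuck(tapes, now, threshold).values())


# Hypothetical snapshot: 18fe09ag holds 2 splitters and hasn't moved for 8 hours.
now = datetime(2014, 8, 12, 1, 30, tzinfo=timezone.utc)
tapes = [
    TapeStatus("18fe09ag", 2, now - timedelta(hours=8)),
    TapeStatus("22fe09ah", 1, now - timedelta(minutes=20)),
]
print(find_stuck(tapes, now))            # {'18fe09ag': 2}
print(productive_splitters(tapes, now))  # 3
```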
Grant
Darwin NT
Wiggo
Joined: 24 Jan 00
Posts: 35392
Credit: 261,360,520
RAC: 489
Australia
Message 1555394 - Posted: 12 Aug 2014, 1:34:42 UTC - in response to Message 1555389.

Notice how only 4 files are showing as being in progress instead of the usual 5, and also notice that the dark green "channels in progress" block beside the file 18fe09ag is twice the size of the others? ;-)

Cheers.
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1555403 - Posted: 12 Aug 2014, 1:56:31 UTC

Thank you so much, Grant and Wiggo. It certainly makes sense to have one splitter on 20au08ag, as it is close to finishing. Currently at (14).
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14658
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1555543 - Posted: 12 Aug 2014, 7:40:42 UTC - in response to Message 1555358.  

And 22fe09ah seems to be producing mainly shorties...
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13786
Credit: 208,696,464
RAC: 304
Australia
Message 1555627 - Posted: 12 Aug 2014, 11:55:33 UTC - in response to Message 1555543.  

I hope the weekly outage clears the blockages.
What looked like almost 2 days worth of work Ready-to-send now looks like not much more than 12 hours worth, if that.
Grant
Darwin NT
juan BFP (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1555680 - Posted: 12 Aug 2014, 14:10:52 UTC

The 18fe09ag tape is stuck and the new work buffer is slowly going down. Hopefully they can kick it during today's outage.
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1555718 - Posted: 12 Aug 2014, 15:39:45 UTC

Yesterday, I looked at the Cricket and saw a big jump. I thought, oh good, APs are flowing. Then I got to the SSP and saw that APs are not flowing. So what's all that data going out?
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1555723 - Posted: 12 Aug 2014, 15:45:37 UTC - in response to Message 1555718.  

Probably the processed data going from the colo back to the lab, so they can free up space to dump more data to the colo.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1555794 - Posted: 12 Aug 2014, 19:36:46 UTC

Now let's see if 18fe09ag gets stuck on the 3rd channel. Maybe it will take off like the other tape did (once it was kicked enough).
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
juan BFP (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1555876 - Posted: 12 Aug 2014, 20:57:31 UTC - in response to Message 1555794.  
Last modified: 12 Aug 2014, 20:58:08 UTC

I'm not so confident. Normally, without problems, production is about 30/s, and now it's at about 21/s, which is what happens when we have one tape with a problem. But let's wait a few more hours to be sure.

BTW, today's outage was really fast. Now we need more AP WUs to fill our caches.
Wiggo
Joined: 24 Jan 00
Posts: 35392
Credit: 261,360,520
RAC: 489
Australia
Message 1555879 - Posted: 12 Aug 2014, 21:03:04 UTC - in response to Message 1555876.  
Last modified: 12 Aug 2014, 21:05:36 UTC

It certainly looks like that file is stuck again and I'm not getting many of my requests for more work answered. :-(

Cheers.
juan BFP (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1555887 - Posted: 12 Aug 2014, 21:21:18 UTC - in response to Message 1555879.  
Last modified: 12 Aug 2014, 21:37:14 UTC

Could somebody who has access to the lab people ping them about it? They should still be around for a couple of hours, since it must be about 14:30 in CA now (if I haven't made a mistake on the time zone conversion again).
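A quick check of that conversion, using this post's own UTC timestamp and assuming California is on Pacific Daylight Time (UTC-7) in August:

```python
from datetime import datetime, timedelta, timezone

# Assumption: California observes PDT (UTC-7) in mid-August.
PDT = timezone(timedelta(hours=-7), "PDT")

posted_utc = datetime(2014, 8, 12, 21, 21, 18, tzinfo=timezone.utc)
local = posted_utc.astimezone(PDT)
print(local.strftime("%H:%M %Z"))  # 14:21 PDT
```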
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13786
Credit: 208,696,464
RAC: 304
Australia
Message 1555925 - Posted: 12 Aug 2014, 22:43:17 UTC - in response to Message 1555887.  

MB splitter output still borked. Initially started off OK after the outage, but didn't take long to drop down to 20/s or less again.
At least there's some AP work going out, so that will help reduce the demand for MB work.
Grant
Darwin NT
juan BFP (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1555948 - Posted: 12 Aug 2014, 23:22:30 UTC - in response to Message 1555925.  
Last modified: 12 Aug 2014, 23:25:54 UTC

Of course it's broken; the 18fe09ag tape is still stuck.

And since nobody has kicked it, slow hosts are starting to get AP WUs, and that is bad for us, since they normally return the crunched WUs close to the time limit, which makes our pendings rise.

I still don't understand how anybody could write software that theoretically works stand-alone and not provide a watchdog precisely to avoid things like this.

That's one of the first things you learn in elementary software school.
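For illustration, the kind of watchdog described here could look like the sketch below: each worker reports progress, and anything silent for longer than a timeout gets restarted. This is a generic pattern, not how the SETI@home splitters are actually managed; the `Watchdog` class, the worker names and the 3-hour timeout are all hypothetical.

```python
import time
from typing import Callable, Dict, List


class Watchdog:
    """Hypothetical watchdog: restarts any worker that hasn't reported
    progress within `timeout` seconds."""

    def __init__(self, timeout: float, restart: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.restart = restart  # called with the stalled worker's name
        self.clock = clock      # injectable so the example can fake time
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, worker: str) -> None:
        """A worker calls this each time it finishes a unit of work (e.g. a channel)."""
        self.last_seen[worker] = self.clock()

    def check(self) -> List[str]:
        """Restart every stalled worker and return their names."""
        now = self.clock()
        stalled = [w for w, t in self.last_seen.items() if now - t > self.timeout]
        for w in stalled:
            self.restart(w)
            self.last_seen[w] = now  # start a fresh timeout after the restart
        return stalled


# Usage with a fake clock (seconds), a 3-hour timeout and two hypothetical splitters.
fake_now = [0.0]
restarted: List[str] = []
dog = Watchdog(timeout=3 * 3600, restart=restarted.append, clock=lambda: fake_now[0])
dog.heartbeat("splitter-1")
dog.heartbeat("splitter-2")
fake_now[0] = 2 * 3600
dog.heartbeat("splitter-2")  # splitter-2 finished another channel
fake_now[0] = 4 * 3600
print(dog.check())           # ['splitter-1']
```

Injecting the clock keeps the check deterministic and testable; a real deployment would call `check()` periodically from a timer or cron job.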
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1556189 - Posted: 13 Aug 2014, 14:56:37 UTC

http://setiathome.berkeley.edu/show_host_detail.php?hostid=7352368

How is it possible for "Number of times client has contacted server" to be 0? Doesn't it have to contact at least once to get the 127 tasks that are probably going to start timing out in another 10 days?

(This is just one of the TWO cases I currently have where an inconclusive has gone out to a _2 host, only to have said host disappear. The other one was working for five years, though, so I have a small bit of hope for its return.)
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.