Switcheroo (Mar 21 2007)

Message boards : Technical News : Switcheroo (Mar 21 2007)
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 534777 - Posted: 21 Mar 2007, 22:39:01 UTC
Last modified: 21 Mar 2007, 22:39:45 UTC

Just after I posted yesterday's tech news message we had to reboot kryten and penguin as they both lost NFS mounts. In fact, we had to boot kryten twice (as it immediately came up unable to mount bruno's disks). I really wish I knew what was causing these to happen, but perhaps this problem will simply "time out."

The first technical issue for today was that the hill shuttle bus broke down, so I got in a few minutes later than expected. This at least afforded me an extra few minutes to complete a rather pesky sudoku puzzle. Take that, unruly numbers!

So what happened with the replica yesterday? Turns out, for some (currently) inexplicable reason the .MYD files under data/mysql were all zero length. None of the other files were affected, just the .MYD's. Oddly their time stamps were sane (they were rather old as they haven't been updated in a while). So what emptied out these very specific files but didn't update their time stamps? In any case, we're forced to recover the replica from scratch (not that big a deal). Bob was finally able to wiggle his way in to at least clean out the current database so we can drop everything and reload. We might have an outage soon to dump the current data for such a reload.
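For anyone wanting to check their own MySQL data directory for the same symptom, here is a minimal sketch in Python. It only walks a directory tree and lists .MYD files whose size is zero, along with their modification time. The function name and the default path are illustrative assumptions, not anything taken from our setup scripts.

```python
from pathlib import Path


def find_empty_myd(data_dir):
    """Return sorted paths of zero-length .MYD files found under data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*.MYD")
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    import sys
    # The default path is a placeholder; pass the real data directory as an argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "data/mysql"
    for path in find_empty_myd(target):
        print(f"{path}  mtime={path.stat().st_mtime:.0f}")
```

Printing the mtime next to each empty file makes it easy to spot the odd combination described above: an emptied file with an old time stamp.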

Meanwhile, bruno progresses. Making it the new upload server was held up until we could compile a working fastcgi-enabled file_upload_handler. Jeff finally got one to compile. So we embarked on what should have been a quick transition - basically just moving a cable from one jack to another and updating DNS. However, the file_upload_handler didn't work. Refusing to debug it, I suggested we just use a normal garden-variety handler without the fastcgi hooks. All the fastcgi was buying us was relief from process spawning overhead. This was a major necessity on our old n' slow 3500, but bruno didn't even break a sweat once we fired it up. So bruno is now our upload server!

But wait! After a half hour or so I noticed the traffic graphs were a bit "dampened." Why weren't we sending out as much data as before? After finding no obvious bottlenecks we dug out a gigabit switch and split the Hurricane link so both kryten and bruno could act as simultaneous upload servers. Sure enough, a third of our clients were still trying to connect to the kryten address. This is odd as the DNS entry has a 5 minute TTL (time to live). Perhaps we're seeing the effect of DNS caching (in Windows or otherwise). Fair enough - we'll leave both kryten and bruno up as "mirror" servers as DNS (hopefully) corrects itself over the coming days. I'll reflect the changes in the server status page eventually.
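To illustrate why a 5 minute TTL should have moved clients over fairly quickly, here is a toy sketch in Python of a cache that honors the TTL. The class, the host name, and the way time is passed in are all made up for illustration; this is not how any particular resolver is implemented.

```python
class TtlCache:
    """Toy DNS-style cache: an answer is reused until its TTL expires."""

    def __init__(self, lookup, ttl_seconds):
        self.lookup = lookup      # function: name -> address (the authoritative answer)
        self.ttl = ttl_seconds
        self.entries = {}         # name -> (address, expiry time)

    def resolve(self, name, now):
        cached = self.entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]      # still fresh: reuse the cached answer
        address = self.lookup(name)
        self.entries[name] = (address, now + self.ttl)
        return address


# Hypothetical name and addresses, for illustration only.
records = {"upload.example.org": "kryten"}
cache = TtlCache(records.get, ttl_seconds=300)

print(cache.resolve("upload.example.org", now=0))     # kryten (fetched)
records["upload.example.org"] = "bruno"               # the DNS entry is changed
print(cache.resolve("upload.example.org", now=120))   # kryten (still cached)
print(cache.resolve("upload.example.org", now=301))   # bruno (TTL expired)
```

A client or intermediate cache that holds answers longer than the TTL says it should would keep returning the old address well past the 300 second mark, which would be consistent with the stragglers still hitting kryten.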

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 534777 · Report as offensive
Profile Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 534783 - Posted: 21 Mar 2007, 22:49:52 UTC - in response to Message 534777.  


> good work @ Berkeley - Thanks for the Post Matt . . .

note: 'IF' you have any time - look @ this Thread
"Client error Aborted . . ." it would be appreciated - some strange 'anomalies'

ID: 534783 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20323
Credit: 7,508,002
RAC: 20
United Kingdom
Message 534827 - Posted: 21 Mar 2007, 23:53:02 UTC - in response to Message 534777.  

... we had to reboot kryten and penguin as they both lost NFS mounts. ... but perhaps this problem will simply just "time out."

Silly guesses time...

An overheated or even an overloaded switch?...

With too much simultaneous traffic, they can run out of table space or even saturate their backplane...

And, any error packets reported anywhere by any machine's ifconfig?
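On a Linux box, the same counters that ifconfig reports can also be read from /proc/net/dev. A minimal sketch in Python follows, assuming the usual /proc/net/dev layout of two header lines and then eight receive and eight transmit fields per interface. The function name is made up for illustration, and this would not apply as-is to the Solaris machines.

```python
import os


def parse_net_dev(text):
    """Parse /proc/net/dev-style text into {interface: error and drop counters}."""
    stats = {}
    for line in text.splitlines()[2:]:      # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        if len(fields) < 12:
            continue
        stats[name.strip()] = {
            "rx_errs": fields[2],
            "rx_drop": fields[3],
            "tx_errs": fields[10],
            "tx_drop": fields[11],
        }
    return stats


if __name__ == "__main__" and os.path.exists("/proc/net/dev"):
    with open("/proc/net/dev") as f:
        for iface, counters in parse_net_dev(f.read()).items():
            if any(counters.values()):      # only show interfaces with problems
                print(iface, counters)
```

Running it on each machine and comparing the output a few minutes apart would show whether any error or drop counter is climbing.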

Good luck,
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 534827 · Report as offensive
Profile speedimic
Volunteer tester
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 534838 - Posted: 22 Mar 2007, 0:14:24 UTC
Last modified: 22 Mar 2007, 0:14:40 UTC

Well done Bob and Matt!

Keep us updated if you find out what happened to the .MYD's... interesting...

Matt:
talking about the server status page - how is that done technically? Would be cool to have that for my servers too!

keep up the good work!

mic.


ID: 534838 · Report as offensive
Profile Pooh Bear 27
Volunteer tester
Joined: 14 Jul 03
Posts: 3224
Credit: 4,603,826
RAC: 0
United States
Message 534853 - Posted: 22 Mar 2007, 0:37:10 UTC

As I suspected, Matt is another Sudoku fan. Is the rest of the group there into it, also?

I love the professional puzzles. *Most* of the time I finish them in 20-25 minutes, but there are a few that take a little longer. I now need to get into the Hex ones, or the 26 letter ones and see what I can do with those puppies.



My movie https://vimeo.com/manage/videos/502242
ID: 534853 · Report as offensive
Wander Saito
Volunteer tester

Joined: 7 Jul 03
Posts: 555
Credit: 2,136,061
RAC: 0
Brazil
Message 534863 - Posted: 22 Mar 2007, 0:43:05 UTC

Very good news indeed! Thanks for all your efforts. Let's hope that Bruno lives up to the expectation we all have for it. Congrats to all the people involved in the switch.

Regards,
Wander
ID: 534863 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 534950 - Posted: 22 Mar 2007, 4:04:39 UTC - in response to Message 534853.  

As I suspected, Matt is another Sudoku fan. Is the rest of the group there into it, also?

I love the professional puzzles. *Most* of the time I finish them in 20-25 minutes, but there are a few that take a little longer. I now need to get into the Hex ones, or the 26 letter ones and see what I can do with those puppies.


I am a programmer at heart. I have written a 9x9 sudoku solver - that finds all the solutions for a given puzzle (some of them do indeed have more than one). It took me most of a weekend.
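For readers curious what such a solver might look like, here is a minimal backtracking sketch in Python. It is not the program described above, just one plain way to enumerate every solution: fill the first empty cell with each digit that fits, recurse, and undo. The function names and the sample grid are made up for illustration.

```python
def solve_all(grid):
    """Yield every completed 9x9 grid consistent with the given puzzle.

    grid is a list of 9 lists of 9 ints, with 0 marking an empty cell.
    The grid is modified in place while searching and restored once the
    generator has been fully consumed.
    """
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for digit in range(1, 10):
                    if allowed(grid, r, c, digit):
                        grid[r][c] = digit
                        yield from solve_all(grid)
                        grid[r][c] = 0   # undo and try the next digit
                return                   # every digit tried for this cell
    yield [row[:] for row in grid]       # no empty cells left: a solution


def allowed(grid, r, c, digit):
    """True if digit does not already appear in the row, column, or 3x3 box."""
    if digit in grid[r]:
        return False
    if any(grid[i][c] == digit for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[br + i][bc + j] != digit for i in range(3) for j in range(3))


if __name__ == "__main__":
    # Build a valid filled grid from a simple shifting pattern, then blank a few cells.
    full = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
    puzzle = [row[:] for row in full]
    for r, c in [(0, 0), (0, 4), (4, 4), (8, 8)]:
        puzzle[r][c] = 0
    print(len(list(solve_all(puzzle))), "solution(s) found")
```

Because it yields each solution as it is found, a caller can stop after the second one if all that matters is whether the puzzle is unique. This version does no clever pruning, so a puzzle with very few givens can take a long time.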


BOINC WIKI
ID: 534950 · Report as offensive
Profile MikeSW17
Volunteer tester

Joined: 3 Apr 99
Posts: 1603
Credit: 2,700,523
RAC: 0
United Kingdom
Message 535029 - Posted: 22 Mar 2007, 9:29:40 UTC

In case anyone's missed the obvious, a Google for 'zero length MYD files' does find several discussions re empty MYD files.

ID: 535029 · Report as offensive
gomeyer
Volunteer tester

Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 535222 - Posted: 22 Mar 2007, 21:13:19 UTC

Matt,

I don't know if anyone has made you or any member of your team aware of this, or if anything can be done about it, but yesterday and the day before there were a fairly considerable number of results marked as invalid. I'll guess that it was due to Kryten losing its mounts or a glitch during the switchover to Bruno, but whatever the cause I wanted to be sure you knew about it in case adjustments can be made.

Many of these have been posted in Number Crunching in the "Validate Errors - Please post them" thread. Of course I know that you and your team don't have time to read all of the boards so I'm bringing this to your attention here in the hopes that it will be noticed.

One issue is that the earliest of these have already been resent and reported, and so are in danger of being deleted. So, in the event that credit may be granted I guess time is becoming a factor.

Whatever, thanks for your hard work and that of the rest of the Berkeley crew in the effort to keep this project going despite all of the limitations; hardware, manpower and otherwise.

Regards,

Gus Obermeyer (gomeyer)
ID: 535222 · Report as offensive
Profile Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 535245 - Posted: 22 Mar 2007, 22:26:11 UTC - in response to Message 535222.  
Last modified: 22 Mar 2007, 22:29:20 UTC


> @ Gus - see Matt's NEW POST - Ups and Downs (Mar 22 2007)

edit - sorry 'bout Link (fixed)
ID: 535245 · Report as offensive
gomeyer
Volunteer tester

Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 535265 - Posted: 22 Mar 2007, 23:01:03 UTC - in response to Message 535245.  

> @ Gus - see Matt's NEW POST - Ups and Downs (Mar 22 2007)

edit - sorry 'bout Link (fixed)


Already saw it, but thanks.
ID: 535265 · Report as offensive
Bellator

Joined: 3 Sep 04
Posts: 15
Credit: 50,270
RAC: 0
France
Message 537778 - Posted: 28 Mar 2007, 9:03:50 UTC - in response to Message 534777.  
Last modified: 28 Mar 2007, 9:09:00 UTC

this was not my message and I have deleted it.
ID: 537778 · Report as offensive
Bellator

Joined: 3 Sep 04
Posts: 15
Credit: 50,270
RAC: 0
France
Message 537780 - Posted: 28 Mar 2007, 9:07:08 UTC

Perhaps my problem has something to do with this, perhaps not. At any rate, my completed Seti task does not upload because it is "locked by file upload handler". I had this problem with the previous task and eventually I just detached and reattached, which meant I lost all credit.
I have also had problems with Climateprediction, i.e. unable to contact server. Are the problems related? Is there a solution for a poor computer semi-illiterate?
ID: 537780 · Report as offensive
Profile Fuzzy Hollynoodles
Volunteer tester
Joined: 3 Apr 99
Posts: 9659
Credit: 251,998
RAC: 0
Message 538057 - Posted: 28 Mar 2007, 23:16:23 UTC - in response to Message 534863.  

Very good news indeed! Thanks for all your efforts. Let's hope that Bruno lives up to the expectation we all have for it. Congrats to all the people involved in the switch.

Regards,
Wander


And a big thank you to all who have donated the parts for Bruno.



"I'm trying to maintain a shred of dignity in this world." - Me

ID: 538057 · Report as offensive
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 538084 - Posted: 29 Mar 2007, 1:23:20 UTC - in response to Message 537780.  
Last modified: 29 Mar 2007, 1:24:31 UTC

Bellator

What version of the BOINC core client are you using? With problems on both Seti and ClimatePrediction, I would suspect the BOINC core client. For best results, post in the Number Crunching forum, where it may get attention faster...

Perhaps my problem has something to do with this, perhaps not. At any rate, my completed Seti task does not upload because it is "locked by file upload handler". I had this problem with the previous task and eventually I just detached and reattached, which meant I lost all credit.
I have also had problems with Climateprediction, i.e. unable to contact server. Are the problems related? Is there a solution for a poor computer semi-illiterate?


Please consider a Donation to the Seti Project.

ID: 538084 · Report as offensive
Profile Uioped1
Volunteer tester
Joined: 17 Sep 03
Posts: 50
Credit: 1,179,926
RAC: 0
United States
Message 540934 - Posted: 4 Apr 2007, 18:12:36 UTC - in response to Message 537780.  
Last modified: 4 Apr 2007, 18:13:30 UTC

Perhaps my problem has something to do with this, perhaps not. At any rate, my completed Seti task does not upload because it is "locked by file upload handler". I had this problem with the previous task and eventually I just detached and reattached, which meant I lost all credit.
I have also had problems with Climateprediction, i.e. unable to contact server. Are the problems related? Is there a solution for a poor computer semi-illiterate?



I had that problem as well yesterday. Restarting the boinc service did not help, but a reboot did fix it.

I was having intermittent internet access at the time, so I assumed it was related to that.
ID: 540934 · Report as offensive
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 543155 - Posted: 9 Apr 2007, 13:54:26 UTC

What is the retirement plan for the old suns? (No I don't mean 401k's) With the modern servers, do you really need all those others running? Are they going to be gutted for parts? Reducing complexity is always good.
May this Farce be with You
ID: 543155 · Report as offensive
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 543361 - Posted: 9 Apr 2007, 22:15:06 UTC - in response to Message 543155.  

What is the retirement plan for the old suns? (No I don't mean 401k's)


Hmmm... and I was going to say black holes or white dwarfs.
ID: 543361 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65759
Credit: 55,293,173
RAC: 49
United States
Message 543667 - Posted: 10 Apr 2007, 7:59:57 UTC - in response to Message 543361.  

What is the retirement plan for the old suns? (No I don't mean 401k's)


Hmmm... and I was going to say black holes or white dwarfs.

Of course a neutron star's gravity well may not equal a black hole's, but it would strip the thing of its electrons and such, leaving just its neutrons, crushed and all.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 543667 · Report as offensive
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 543808 - Posted: 10 Apr 2007, 16:18:45 UTC - in response to Message 543361.  

What is the retirement plan for the old suns? (No I don't mean 401k's)


Hmmm... and I was going to say black holes or white dwarfs.


Good one. I completely missed the opportunity to be clever and am glad you didn't!
May this Farce be with You
ID: 543808 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.