Promenade au Fond d'un Canal (Nov 19 2007)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
As we warned, we had a major outage today to do some massive cleaning/organization in our server closet. It went well: with dozens of cable ties and power strips on hand we got rid of about 95% of the spaghetti dangling from the backs of the racks and spilling into several piles on the closet floor.
But that wasn't the main reason for this outage. We also installed a new UPS to replace a broken one, so jocelyn and isaac are protected again, and we put everything on some kind of power switch so that when we have our lab-wide outage it'll be easy to just flick things on/off (as opposed to reaching behind big, heavy things to yank plugs from the wall). With the power off we were able to move racks around to open enough of a gap to finally get the old E3500 out of there (the late, great galileo) - it had been collecting dust in the corner for years. Speaking of dust, we also vacuumed.
But of course there were issues, which is to be expected when powering many massive servers off and on. We discovered jocelyn had lost contact with its fibre-channel RAID (where the BOINC database resides). After some head scratching we realized this was due to fibre-channel support being lost in the recently upgraded kernel. We booted to an older kernel and it was fine.
As I write this, both ewen (Eric's hydrogen database server) and thumper are doing forced checks of large disk volumes. That might take all night, during which certain parts of our project will have to remain offline. We'll probably run out of work before too long. Apparently we need to turn off the forced checks.
We also had some routing problems upon rebooting the Cisco, but we quickly remembered that you have to do a "magic ping" to wake up the next hop, and then traffic pushed through.
- Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Orgil Joined: 3 Aug 05 Posts: 979 Credit: 103,527 RAC: 0 |
Thanks for the feedback. Even in Antarctica people were waiting for servers. ;) Mandtugai! |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Sounds like you guys got a lot accomplished this afternoon. With all those cables organized instead of in a big mess, you may notice better air flow in the closet too. I would investigate the kernel upgrade...why did it happen originally? Perhaps there is a kernel module or patch that can be added to include the Fibre support, if you still need this upgrade for some other reason. Lucky for me, I got 6 WU before the pipes dried up. :) Good luck for the real upgrades tomorrow. I'm sure the hard work done today will be helpful. |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
Thanks for the update Matt / Berkeley . . . You All had your work cut out for you Today eh! Matt - is that from Producer Keith Souza: 'Kiss Me Red CD' - Like Stars or is 'Promenade au Fond d'un Canal' a reference to Analyse de l'oeuvre de David Lynch ;) BOINC Wiki . . . Science Status Page . . . |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Title of a song by Belgian group Present. Highly recommended for people who like chords with both major sevenths and minor ninths. - Matt |
Pooh Bear 27 Joined: 14 Jul 03 Posts: 3224 Credit: 4,603,826 RAC: 0 |
Matt wrote: "We also had some routing problems upon rebooting the Cisco but we quickly remembered that you have to do a 'magic ping' to wake up the next hop and then traffic pushed through." Have you looked into an IOS update for this? Recently we had a Cisco router that was doing funky things, and nothing we tried corrected the issue. On a whim (after 6 months of it being tested over and over, even by Cisco) we did an IOS update and all the issues disappeared. Just a thought. |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
Thanks Matt . . . now I just have to get the sheets . . . btw - really great music: Present, 'A Great Inhumane Adventure' (CD on Cuneiform). Track listing: 1. Delusions (14:46) 2. Alone (10:58) 3. Le Poison Qui Rend Fou (10:16) 4. Laundry Blues (13:01) 5. Promenade au Fond d'un Canal (23:33)
'Vital Weekly' is published by Frans de Waard |
JLDun Joined: 21 Apr 06 Posts: 573 Credit: 196,101 RAC: 0 |
Matt wrote: "We also had some routing problems upon rebooting the Cisco but we quickly remembered that you have to do a 'magic ping' to wake up the next hop and then traffic pushed through." So it appears that there is no "easy" replacement for doing this? (How about a .BAT file that executes upon reboot?, asks the relatively uninformed cruncher.) |
ML1 Joined: 25 Nov 01 Posts: 20359 Credit: 7,508,002 RAC: 20 |
JLDun wrote: "...How about a .BAT file that executes upon reboots?..." Ooooer... That's so Windows-esque :-/ But still a good idea. There's the option of a (Vixie) cron "@reboot ping wherever" entry, or even a ping in rc.local or its equivalent. Or there is the very good excuse of setting up ping traces charted by Nagios to monitor all manner of LAN stuff... ;-) And then where is there time for a beer? :-) Cheers, Martin (OK, so I guess Matt is likely already onto that one.) See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
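
For anyone curious what the boot-time ping Martin describes might look like, here is a minimal sketch in Python. It is not what the SETI@home staff actually run. The address `192.0.2.1` is a documentation placeholder rather than the real next hop, the helper names `build_ping_command` and `wake_next_hop` are made up for illustration, and the `-c` / `-W` flags assume the Linux iputils version of `ping`.

```python
import subprocess
import time

# Placeholder from the TEST-NET-1 documentation range; NOT the real next hop.
NEXT_HOP = "192.0.2.1"


def build_ping_command(host, count=1, timeout_s=2):
    """Build the argv for one ping run.

    Assumes Linux iputils ping: -c is the packet count,
    -W is the time to wait for a reply, in seconds.
    """
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def wake_next_hop(host, attempts=5, delay_s=1.0,
                  run=subprocess.call, sleep=time.sleep):
    """Ping host until one run succeeds.

    Returns the number of the attempt that succeeded, or 0 if every
    attempt failed. `run` and `sleep` are injectable so the retry
    logic can be exercised without touching the network.
    """
    for attempt in range(1, attempts + 1):
        # ping exits with status 0 when at least one reply came back.
        if run(build_ping_command(host)) == 0:
            return attempt
        if attempt < attempts:
            sleep(delay_s)
    return 0
```

A small script that calls `wake_next_hop(NEXT_HOP)` could then be launched from an `@reboot` crontab entry or from rc.local, as suggested above. The retry loop is there because the network interface may not be fully up the instant the boot hook fires; whether that matters on any given machine is a guess.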