Strangely Normal (Sep 10 2007)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
So it was a busy weekend, with our focus mostly on thumper (the science database server). There were actually two separate problems. Three drives within four days failed somewhat spuriously. We are fairly convinced at this point that they didn't actually fail - I actually took them out of RAID control this morning and am heavily exercising them without any errors. Why they seemed to fail is still a mystery. We are running an older version of Fedora Core on this system and therefore an older version of mdadm. Or is it drive controller issues? Or just error-level thresholds that need tweaking to be less hypersensitive to transient I/O issues? Meanwhile, perhaps due to all the above, an index in the database got corrupted and needed to be dropped/rebuilt, which took all of Thursday night to Friday afternoon to complete. Add all this up and we weren't able to create/assimilate new work for most of the weekend. I did get the assimilators going on Friday night, and when the smoke cleared Jeff got the splitters running on Saturday. So far so good. We were expecting more spurious disk failures, but so far nothing. In fact today has been strangely normal. Tomorrow we may try implementing a method of distributing workunits around our local network so we aren't so choked on that one NAS server which can only do so much. We need to get more headroom before we can try to win participants back. As it stands now, given our current level of redundancy, we can barely keep up with demand. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Wasabi Peanut Joined: 14 Jul 99 Posts: 62 Credit: 32,646,911 RAC: 0 |
Thanx for the news, Matt! Things have moved towards smooth sailing a great deal on my boxes over the last 24 hours - excellent!! But as you say, keeping up with demand appears to be a challenge for the current architecture. I'm curious to see the effects of distributed WU-storage in action. Speaking of it: ever thought about a SAN? Seems to me like it would give you a great deal of welcome flexibility beyond the much needed performance boost. Just a thought... Kudos to everyone @ Berkeley for their hard work! Cheers, Ron |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Thanx for the news, Matt! Things have moved towards smooth sailing a great deal on my boxes over the last 24 hours - excellent!! Correct me if I'm wrong but isn't a SAN failure what Rosetta just recovered from? PROUD MEMBER OF Team Starfire World BOINC |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
Thank You Matt - and as Well to the others @ Berkeley > note: RAC is finally startin' to count right ;) |
Agnostic Pope Joined: 25 May 99 Posts: 20 Credit: 118,354 RAC: 0 |
Thanx for the news, Matt! Things have moved towards smooth sailing a great deal on my boxes over the last 24 hours - excellent!! Yup. Be careful what you wish for ... you might get it! |
Howard Joined: 20 Dec 01 Posts: 1 Credit: 171,270 RAC: 0 |
Well talking of getting "participants back", I'm one. Joined the project a long, long time ago. I stopped some time around when BOINC came along. Over the past 6 months or so I have been using Linux a lot more and have decided to help out again. Came to look at the forums when my client stopped working; nice to see the community still here. Sad to see that I'm still reasonably high on the stats for my join date even though I haven't been part of the project in over a year. |
ML1 Joined: 25 Nov 01 Posts: 20359 Credit: 7,508,002 RAC: 20 |
Well talking of getting "participants back", I'm one. Joined the project a long, long time ago. I stopped some time around when BOINC came along. ... Welcome back to the great crunch! We're now on very new data... Until Arecibo gets closed that is... Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
[DPC]TeamGrazzie~MoMurdaSquad~Oet Joined: 19 Mar 02 Posts: 3 Credit: 3,619,841 RAC: 0 |
Hi Matt, Appreciate all the time and effort you guys put into this project, it's great :) Meanwhile, it may help if you could make some kind of graphical representation of the current infrastructure (meaning the SETI servers, networking and data flow). There might be a handful of participants with enough knowledge who are willing to help you come up with an improved design? It's just a thought though.. |
ML1 Joined: 25 Nov 01 Posts: 20359 Credit: 7,508,002 RAC: 20 |
Meanwhile, it may help if you could make some kind of graphical representation of the current infrastructure (meaning the SETI servers, networking and data flow)... I think that'd need a webcam on their lab whiteboard! Also, beware of too many 'chefs' causing confusion and spoiling the 'broth'... I'm sure Matt et al. will contact off-forum/off-list whenever there is a need. Enjoy the ride! Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
speedimic Joined: 28 Sep 02 Posts: 362 Credit: 16,590,653 RAC: 0 |
I did that once for my network - took me a whole day... The problem at the lab is not the lack of knowledge or good ideas, but the lack of TIME and MONEY! If Matt is stuck somewhere, I'm sure he will ask and we can all try to help. mic. |
haddock29 Joined: 18 Sep 99 Posts: 36 Credit: 26,012,417 RAC: 0 |
So it was a busy weekend, with our focus mostly on thumper (the science database server). There were actually two separate problems. Three drives within four days failed somewhat spuriously. We are fairly convinced at this point that they didn't actually fail - I actually took them out of RAID control this morning and am heavily exercising them without any errors. Why they seemed to fail is still a mystery. We are running an older version of Fedora Core on this system and therefore an older version of mdadm. Or is it drive controller issues? Or just error-level thresholds that need tweaking to be less hypersensitive to transient I/O issues? Meanwhile, perhaps due to all ... Some years ago I ran into a similar mystery with a 3ware ATA RAID card and 120 GB ST (maybe WD) disks. The vendor was unable to fix it. In fact the problem was documented, and we had to change the firmware of the disks; it was something related to vibrations. The symptom was random disk failures (after 2 years of continuous operation), and the disks tested alone were apparently good, as were the cards, the cables and so on. Everything has been OK since the firmware upgrade (there was a tool to do that on the 3ware web site), which means 2 more years of continuous operation. I will be happy if that can help you. Alain |
Hawksfollow Joined: 23 Feb 03 Posts: 9 Credit: 19,348 RAC: 0 |
Well talking of getting "participants back", I'm one. Joined the project a long, long time ago. I stopped some time around when BOINC came along. Over the past 6 months or so I have been using Linux a lot more and have decided to help out again. Came to look at the forums when my client stopped working; nice to see the community still here. Sad to see that I'm still reasonably high on the stats for my join date even though I haven't been part of the project in over a year. I'm also one who came back, and with a new PC; the other one caught fire: smoke, sparks and flame. Have a dumb question, hope this is the right place to ask. I linked my account using my first email, which had only 14 WU. Can't figure out how to add or change to the email with over 500 WU. Any way to do this? Thanks in advance. Glad to be back running SETI again! |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
Well talking of getting "participants back", I'm one. Joined the project a long, long time ago. I stopped some time around when BOINC came along. Over the past 6 months or so I have been using Linux a lot more and have decided to help out again. Came to look at the forums when my client stopped working; nice to see the community still here. Sad to see that I'm still reasonably high on the stats for my join date even though I haven't been part of the project in over a year. You should ask this question again in the number crunching forums; you're more likely to get a reply... Hello, from Albany, CA!... |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.