Movin' Along (Apr 18 2007)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Yesterday we started creating a new index in the science database on a field in the Gaussian table. While an index is being created the table is locked, so you can't insert anything, and therefore we disabled the assimilators. This is a step towards developing the near-time persistency checker (the thing that actually hunts for ET automatically in the background as signals come in, without waiting for our intervention - we might get some science done after all!). However, during the post-outage recovery yesterday and when starting up the assimilators this morning, we found bruno was dropping TCP connections. Eric adjusted various TCP parameters last night and again this morning to alleviate this bottleneck. That helped a bit, but it wasn't until I bumped up MaxClients in the Apache config that the dam really broke open. As is common with such problems, I'm not sure why we were choked in the first place, as the previous TCP/Apache settings were more than adequate 24 hours earlier. In brighter news, db_dump seems to be working again. Cool. Today's batch is being generated as I type. Stats all around! - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
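Matt's point that building an index locks the table, so the assimilators (the processes that insert new results) have to be paused, can be illustrated with a small analogy. This sketch uses Python's built-in `sqlite3` module, not the project's actual science database, and SQLite locks the whole database file rather than one table, so treat it only as an illustration of the blocking behaviour. The table and index names (`gaussian`, `idx_gaussian_peak`) are hypothetical.

```python
import os
import sqlite3
import tempfile


def insert_blocked_during_lock() -> bool:
    """Return True if a second connection's INSERT is refused while the
    first connection holds an exclusive lock (an analogy for index building)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "science.db")

        # Hypothetical table standing in for the Gaussian table.
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE gaussian (id INTEGER PRIMARY KEY, peak REAL)")
        setup.commit()
        setup.close()

        # "Index builder": manual transaction control, exclusive lock held
        # for the whole index build.
        builder = sqlite3.connect(path, isolation_level=None)
        builder.execute("BEGIN EXCLUSIVE")
        builder.execute("CREATE INDEX idx_gaussian_peak ON gaussian (peak)")

        # "Assimilator": timeout=0 makes it fail at once instead of waiting.
        assimilator = sqlite3.connect(path, timeout=0)
        try:
            assimilator.execute("INSERT INTO gaussian (peak) VALUES (1.5)")
            assimilator.commit()
            blocked = False
        except sqlite3.OperationalError:  # "database is locked"
            blocked = True
        finally:
            assimilator.close()
            builder.execute("COMMIT")  # index finished, lock released
            builder.close()
        return blocked


print(insert_blocked_during_lock())
```

The insert fails immediately with `sqlite3.OperationalError`, which is why it is simpler to stop the inserting processes until the index exists than to let them pile up errors.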
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Well, I might be the only participant who's a happy camper about the bottleneck. It may end up letting a few more of the detached results from my inaccessible host get credit, because everyone else got slowed down big time and it got a chance to catch up. ;-) In any event, hopefully things will simmer down a bit over in NC. :-) <edit> Oh yeah, new personal high day for me on SAH! Alinator |
Mugsy Joined: 10 Jan 03 Posts: 2 Credit: 669,231 RAC: 0 |
Why no credit for the last seven days for me? Problem on my end or does your message explain why? Marc Mugmon |
ML1 Joined: 25 Nov 01 Posts: 20397 Credit: 7,508,002 RAC: 20 |
[quote]... it wasn't until I bumped up MaxClients in the Apache config that the dam really broke open. As is common with such problems, I'm not sure why we were choked in the first place, as the previous TCP/Apache settings were more than adequate 24 hours earlier.[/quote] Is that possibly a knock-on effect of the BOINC clients needing to open a series of connections to complete an upload/download transaction? For example, the first enquiry succeeds, and then a subsequent connection request to actually make the upload/download fails because no more free connections are available. The whole process must then start again from the beginning... Just a wild-guess hypothesis, in case MaxClients was actually being hit... Cheers, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
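Martin's hypothesis can be made concrete with a simplified model. Assume (an assumption, not a description of BOINC's actual protocol) that one transaction needs `k` connections in a row, that each connection attempt succeeds independently with probability `p`, and that any failure sends the client back to the first step. The expected number of attempts per completed transaction is then the standard result for waiting on k consecutive successes:

```python
def expected_attempts(p: float, k: int) -> float:
    """Expected connection attempts to complete one transaction.

    Model (an assumption): the transaction needs k consecutive successful
    connections, each attempt succeeds independently with probability p,
    and any failure restarts the transaction from its first connection.
    Closed form: (1 - p**k) / ((1 - p) * p**k).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    if p == 1.0:
        return float(k)  # no failures: exactly one attempt per step
    return (1.0 - p**k) / ((1.0 - p) * p**k)


for steps in (1, 2, 3):
    print(steps, expected_attempts(0.5, steps))  # 2.0, 6.0, 14.0
```

With half of all connection attempts failing, a one-step transaction costs 2 attempts on average but a three-step one costs 14, so restarting from scratch multiplies the load exactly when the server is already short of free connections.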
Pappa Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
[quote]Why no credit for the last seven days for me? Problem on my end or does your message explain why?[/quote] Marc, welcome to SETI! One of the previous posts explained that the replica database had issues, and the db_dump process runs against that database. So after the database repair, the db_dump process was refusing to run properly... Today it decided it was done messing with Matt... So no, it was not on your end... Please consider a Donation to the Seti Project. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
[quote]Why no credit for the last seven days for me? Problem on my end or does your message explain why?[/quote] If this work unit is typical, you're the first one to return it, and you'll get credit when it is validated against at least one other result. I didn't study all of your results, and I didn't look to see what the validator backlog might be. |
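The credit rule described above (a result is credited only once it has been validated against at least one other result) can be sketched as a quorum check. Everything here is hypothetical: the `Result` fields, the tolerance-based comparison and the quorum of two are assumptions for illustration, not SETI@home's actual validator code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Result:
    host: str
    signals: Tuple[float, ...]  # hypothetical summary of the signals a host reported


def results_agree(a: Result, b: Result, tol: float = 1e-3) -> bool:
    """Assumed comparison: same number of signals, each value within `tol`."""
    return len(a.signals) == len(b.signals) and all(
        abs(x - y) <= tol for x, y in zip(a.signals, b.signals)
    )


def credited_hosts(results: List[Result], min_quorum: int = 2) -> List[str]:
    """Return the hosts whose results form an agreeing quorum.

    An empty list means the workunit is still pending: either too few results
    have come back, or the ones that have do not agree with each other.
    """
    for candidate in results:
        agreeing = [r for r in results if results_agree(candidate, r)]
        if len(agreeing) >= min_quorum:
            return sorted(r.host for r in agreeing)
    return []


first = Result("host_a", (1.25, 7.5))
print(credited_hosts([first]))          # [] -> pending, first one back
second = Result("host_b", (1.2504, 7.5))
print(credited_hosts([first, second]))  # ['host_a', 'host_b']
```

Being the first to return a result therefore just means waiting: nothing is wrong on the client side until a matching result arrives.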
Walla Joined: 14 May 06 Posts: 329 Credit: 177,013 RAC: 0 |
[quote]and I didn't look to see what the validator backlog might be.[/quote] A really big number at the moment. |
*Viking* Joined: 2 Nov 03 Posts: 17 Credit: 1,051,900 RAC: 1 |
And the Validator backlog keeps getting bigger, not smaller, over the last 4 hours or so... 138,728 at the moment. * Viking * |
JohnAlton Joined: 28 Aug 01 Posts: 54 Credit: 164,417,653 RAC: 369 |
[quote]And the Validator backlog keeps getting bigger, not smaller, over the last 4 hours or so... 138,728 at the moment.[/quote] Is it validating at all? Mine are really mounting up. |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
I have had some validate in the last few hours, so it must still be doing something, just not very quickly. Alinator |
Haos.PL Joined: 18 Mar 04 Posts: 63 Credit: 3,268,546 RAC: 0 |
This is quite a backlog. I'm a small-scale cruncher, and having three WUs pending concurrently... this is a rare sight for me. |
W-K 666 Joined: 18 May 99 Posts: 19114 Credit: 40,757,560 RAC: 67 |
According to Scarecrow's graphs, the number of results awaiting validation is coming down slowly. Andy |
M4rtyn Joined: 4 Aug 03 Posts: 48 Credit: 799,965 RAC: 0 |
Strange! In the last 24 hours I've seen my pending credit plummet from around my usual 2000 to 450, lower than it's been for a long time, and no real change in my work pattern either. m4rtyn |
Mugsy Joined: 10 Jan 03 Posts: 2 Credit: 669,231 RAC: 0 |
Thanks. Today I see over 10,000 credited, so I guess all is well. I've been on SETI for a long time, but only became active again in December, after adding two screamingly fast Macs to my collection. Marc |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
It's time for a reality check, folks. The reason SETI has separate processes for different purposes (validators, schedulers, upload servers, download servers, etc.) is so that parts of the system can be shut down with very little impact on the project as a whole. If the validator is shut down, work is queued, and when the validator comes back it catches up. If the scheduler is down, work can still be uploaded and reported later. If the upload server is down, work can queue on the BOINC clients. This is not intended to be a real-time system. It usually is near-real-time, but when things slow down it isn't automatically a crisis. In other news: Matt explained why the validators are down. It has to do with the real-time persistency checker we're all interested in -- the code that will tell us if the signals we've found are in the same place over, and over, and over. |
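The decoupling described above can be shown with a toy model in which a single counter stands in for the queue between the upload servers and the validator. The rates and the outage window are made-up numbers; the point is that work keeps arriving during the outage, and the validator then works through the backlog only as fast as its spare capacity (capacity minus arrival rate) allows.

```python
from typing import List


def backlog_over_time(arrivals: int, capacity: int, down: range, ticks: int) -> List[int]:
    """Return the queue length after each tick.

    arrivals: results uploaded per tick (uploads continue during the outage)
    capacity: results the validator can process per tick while it is running
    down:     ticks during which the validator is shut down
    """
    backlog = 0
    history = []
    for tick in range(ticks):
        backlog += arrivals  # uploads still land in the queue
        if tick not in down:
            backlog -= min(backlog, capacity)  # validator drains what it can
        history.append(backlog)
    return history


print(backlog_over_time(arrivals=10, capacity=15, down=range(2, 5), ticks=11))
# [0, 0, 10, 20, 30, 25, 20, 15, 10, 5, 0]
```

A three-tick outage leaves a backlog of 30, which takes six more ticks to clear at 5 spare results per tick. In this model, a backlog that keeps rising even while the validator is running would mean arrivals currently exceed its throughput.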
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Actually, I believe it was the assimilators he mentioned, because modifications were being made to the MSD for the RTPC, but your point is well taken in any event. Alinator |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
[quote]It's time for a reality check, folks.[/quote] Nice points, thanks. I was thinking about it in terms of hardware fault tolerance, but yours says it better. The whole "real-time persistency checker" issue sends chills down my spine; I vote that it takes precedence.... Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Jason, the design concepts behind BOINC start with the idea that fault tolerance and redundancy are expensive. We saw this with Classic: the Classic screensaver downloaded one work unit, processed it, and uploaded the result. If SETI was down, it waited. ... and if you're someone like Amazon or Google, you address this by spreading servers across many datacenters, having multiple connections to the net, etc. At a minimum, SETI would have to buy two or three connections to the net, buy a bunch more servers, and bring the connections onto campus by diverse paths. Expensive. The alternative is to build a BOINC client that tolerates outages. By doing that, SETI can avoid things like hot failover and redundant connections, and the crunching keeps going. If things are down for a few hours, it's no big deal. It means that SETI can be done on a shoestring budget. It means that we can actually have BOINC-based projects run by individuals out of their own pockets. ... and that's very cool. We need to look at SETI stability and let go of the "Amazon" model. If I want to buy something and Amazon can't take my order, someone else will. If the BOINC servers are down, the BOINC client will deal with it later. We need to think of 90% uptime as a reasonable goal. -- Ned |
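A client that "deals with it later" usually does so by retrying with a growing delay. Below is a minimal sketch of randomized exponential backoff; the base delay, the four-hour cap and the jitter range are assumptions chosen for illustration, not BOINC's actual settings. For scale, 90% uptime still means about 2.4 hours of downtime per day on average (10% of 24 hours), so retries need to span gaps of that order.

```python
import random
from typing import List, Optional


def backoff_delays(attempts: int, base: float = 60.0, cap: float = 4 * 3600.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Return the wait (in seconds) before each of `attempts` successive retries.

    The ceiling doubles after every failure (base, 2*base, 4*base, ...) but never
    exceeds `cap`; each actual delay is drawn uniformly from [ceiling/2, ceiling]
    so that thousands of clients do not all retry at the same instant.
    """
    if rng is None:
        rng = random.Random()
    delays = []
    ceiling = min(base, cap)
    for _ in range(attempts):
        delays.append(rng.uniform(ceiling / 2, ceiling))
        ceiling = min(cap, ceiling * 2)  # double the ceiling, up to the cap
    return delays


for wait in backoff_delays(10, rng=random.Random(42)):
    print(f"retry in {wait / 60:.1f} min")
```

Capping the delay keeps a client from sleeping for days after a long outage, while the jitter spreads the reconnection surge out instead of having every client hit the upload server the moment it returns.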
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
[quote]We need to think of 90% uptime as a reasonable goal.[/quote] Can't see any reason to argue with that. One approach the BOINC developers may or may not have thought of is a "minimal feedback" approach from the system. Okay, there are those of us who like to look at our credit graphs etc.; that's important and should stay IMO... but BOINC logs tend to induce panic, even though BOINC handles most stuff (pretty much everything) well on its own. One small example: when I flicked to the Messages pane, although everything was running smoothly, there was a bold red line indicating that a certain workunit file could not be deleted. BOINC, in its fault-tolerant manner, handled it fine and removed the file later. Of course I frantically rechecked that my antivirus wasn't locking files in the BOINC folder etc... Not necessary; the log made me do it :D If BOINC can't upload a workunit, it tries again later. Do we really need to see the verbose logs? I guess this question is partially answered by the introduction of the simple interface. Do we need to ride the Suspend network activity button? Probably not, but it's there. Have option, will fiddle. Jason |
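The "minimal feedback" idea amounts to filtering: keep the full log for those who want it, but hide events the client will fix by itself from a simple view. A minimal sketch with Python's standard `logging` module, assuming a hypothetical `transient` flag that the code emitting a message sets when it will retry on its own; the messages are made up, and BOINC's own message system works differently.

```python
import logging
from typing import List, Tuple


class HideSelfHealing(logging.Filter):
    """Drop records marked transient=True; everything else passes through."""

    def filter(self, record: logging.LogRecord) -> bool:
        return not getattr(record, "transient", False)


class ListHandler(logging.Handler):
    """Collect messages in a list (stands in for a GUI message pane)."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def simple_view(name: str = "simple_view") -> Tuple[logging.Logger, ListHandler]:
    """Return a logger plus the handler backing its 'simple' message pane."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.propagate = False  # keep these messages out of the root logger
    pane = ListHandler()
    pane.addFilter(HideSelfHealing())
    logger.addHandler(pane)
    return logger, pane


logger, pane = simple_view()
logger.warning("Could not delete file; will retry later", extra={"transient": True})
logger.error("Disk full")
print(pane.messages)  # ['Disk full']
```

A verbose view would simply attach a second handler without the filter, so nothing is lost; the simple view just stops showing problems that resolve themselves.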
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
[quote]We need to think of 90% uptime as a reasonable goal.[/quote] There are some projects where I could wish for 5% uptime... BOINC WIKI |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.