Harsh, but could help

Message boards : Number crunching : Harsh, but could help
Paul Shellien
Joined: 6 Sep 03
Posts: 8
Credit: 1,171,683
RAC: 0
United Kingdom
Message 77517 - Posted: 8 Feb 2005, 12:54:31 UTC

Just a suggestion. Why do they not take the project down for 48 hours and let the splitters work away and build up a good cache of WUs? That way, when they come back online and start accepting connections, everyone who requests a WU will get one, and we will not be stuck in the situation we are in now, with the splitters playing 'catch-up' all the time.

Does anyone else have a view on my idea? Good or bad, please comment.
ID: 77517
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 77518 - Posted: 8 Feb 2005, 12:59:39 UTC
Last modified: 8 Feb 2005, 13:18:35 UTC

From my understanding, the splitters are working fine. After a tape is split, the data is written to the database. Therein lies the problem. Half of the DB server's I/O is being taken up by MySQL's own housekeeping, and that is limiting the number of split units the DB can handle. This, I think, is what's limiting our ability to get work.

tony

Matt Lebofsky wrote the following in the "Are we back, looks like it" thread. There's more info there. This was written AFTER the migration. For those who don't know Matt, he's the person who physically places the tapes on the splitters, and I'm sure he does other things.

Matt Lebofsky
Joined: Mar 2, 1999
Posts: 60
ID: 122079
Posted: 8 Feb 2005 4:18:40 UTC

Well, the actual migration wasn't terribly exciting. We fired off a mysqldump that copied everything onto the new server. About 9 hours later it was done. Initial data checks look okay.

But the fact it didn't crash during the data transfer is exciting. We'll run some tests tomorrow and then add the plumbing to make it a replica. If all goes well, it'll be the master by the end of the week.

- Matt
BOINC/SETI@home


Matt Lebofsky
Joined: Mar 2, 1999
Posts: 60
ID: 122079
Posted: 8 Feb 2005 4:27:39 UTC

> good news!...so we should expect things to function more or less the way they
> have been until the replica becomes the master?

Yup. I know it's slooow, but at least things work. There is still the perfectly valid hope that MySQL's internal merge/purge, which is eating up a lot of I/O, could finish at any second, rendering our current database more than adequate for now. Until then, well, we'll just have to deal with a bit of sluggishness here and there.

- Matt
BOINC/SETI@home


Matt Lebofsky
Joined: Mar 2, 1999
Posts: 60
ID: 122079
Posted: 8 Feb 2005 4:45:16 UTC

> Any further info on the strange I/O activity? Will it continue to impact the
> current server performance for a while, and is it expected to continue on the
> new hardware?

We're pretty convinced at this point it has to do with InnoDB caching that we didn't have tuned perfectly (thanks to low memory and disk space on the server). Bob and Jeff understand this part better than me, but the basic gist of it is: the data were logically up to date, but not committed to disk in a manner that MySQL finds pleasing (or efficient). Eventually this hits a limit where MySQL goes, "okay - I'm going to take 50% of your I/O power and take care of some garbage collection," and there's nothing you can do about it until it's done.

- Matt
BOINC/SETI@home



ID: 77518
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 77558 - Posted: 8 Feb 2005, 16:54:51 UTC - in response to Message 77517.  

> Just a suggestion. Why do they not take the project down for 48 hours and let
> the splitters work away and build up a good cache of WUs? That way when they
> come back online and start accepting connections, everyone who requests a WU
> will get one and we will not be stuck in the situation we are in now with the
> splitters playing 'catch-up' all the time.
>
> Does anyone else have a view on my idea? Good or bad, please comment.

Based on what Matt said, the problem will remain as long as MySQL is struggling with garbage collection. Taking the rest of the project down may or may not let the splitters run.

... and as I read his messages, it's mostly RAM -- the current database server just doesn't have enough.

So, given the way most people panic at the slightest glitch, the best thing for our friends in Berkeley is to continue with the move to the new server as quickly as possible while also making sure that everything goes perfectly.
ID: 77558
Paul Shellien
Joined: 6 Sep 03
Posts: 8
Credit: 1,171,683
RAC: 0
United Kingdom
Message 77647 - Posted: 8 Feb 2005, 22:11:17 UTC - in response to Message 77558.  

I now stand corrected. The snags, it appears, are with the database. The same premise still stands. Take it down and stop accepting requests until it is all sorted out. Surely that has to be the best option?

Everyone who is multi-project on BOINC can then direct their SETI resources to the other projects during the downtime, and then redivert them when SETI has sorted itself out and is ready to distribute WUs.
ID: 77647
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 77648 - Posted: 8 Feb 2005, 22:16:55 UTC - in response to Message 77647.  

> The same premise still stands. Take it down and stop accepting requests until it
> is all sorted out. Surely that has to be the best option?
>
I have no problem with this.

tony

ID: 77648
MattDavis
Volunteer tester
Joined: 11 Nov 99
Posts: 919
Credit: 934,161
RAC: 0
United States
Message 77652 - Posted: 8 Feb 2005, 22:31:33 UTC

Seti needs a vacation 8)
ID: 77652
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 77664 - Posted: 8 Feb 2005, 23:30:30 UTC - in response to Message 77647.  

> I now stand corrected. The snags, it appears, are with the database. The
> same premise still stands. Take it down and stop accepting requests until it
> is all sorted out. Surely that has to be the best option?

... because BOINC/SETI is successfully sending out work (or I would have run out last week).

But the biggest reason is: if they are going to test under load, they need a load for testing.

... and here we are, providing that load.
ID: 77664
baalthazaar
Joined: 23 Mar 03
Posts: 1
Credit: 10,194
RAC: 0
United States
Message 77746 - Posted: 9 Feb 2005, 4:35:27 UTC - in response to Message 77558.  

Maybe they ought to consider switching databases..... I hear that Ingres is open source now...

> > Just a suggestion. Why do they not take the project down for 48 hours and let
> > the splitters work away and build up a good cache of WUs? That way when they
> > come back online and start accepting connections, everyone who requests a WU
> > will get one and we will not be stuck in the situation we are in now with the
> > splitters playing 'catch-up' all the time.
> >
> > Does anyone else have a view on my idea? Good or bad, please comment.
>
> Based on what Matt said, the problem will remain as long as MySQL is
> struggling with garbage collection. Taking the rest of the project down may
> or may not let the splitters run.
>
> ... and as I read his messages, it's mostly RAM -- the current database server
> just doesn't have enough.
>
> So, given the way most people panic at the slightest glitch, the best thing
> for our friends in Berkeley is to continue with the move to the new server as
> quickly as possible while also making sure that everything goes perfectly.
ID: 77746
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 77753 - Posted: 9 Feb 2005, 5:41:28 UTC - in response to Message 77746.  

> Maybe they ought to consider switching databases..... I hear that Ingres is
> open source now...

Garbage collection is one of a class of problems that just seems to exist everywhere in some form or another.

Ingres may very well have the same kind of issue somewhere -- especially if run on "small" hardware.
ID: 77753
