Should we start over from scratch?
Author | Message |
---|---|
Geek@Play Send message Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
It appears to me, after 3 attempts to migrate the DB to the new server hardware, that the DB is greatly compromised and that parts of it may not be recoverable. Since admin has been unable to migrate the DB to the new server successfully, I propose the following. Wheel in the new hardware, hook it up and start it up creating a new DB. Users would be asked to reattach each client computer. Admin would then have all the time they need to unravel the old DB and credit those users with the credit they have earned from the old DB. Anyone else agree with this? Boinc....Boinc....Boinc....Boinc.... |
Speedy67 & Friends Send message Joined: 14 Jul 99 Posts: 335 Credit: 1,178,138 RAC: 0 |
|
Hans Dorn Send message Joined: 3 Apr 99 Posts: 2262 Credit: 26,448,570 RAC: 0 |
> It appears to me, after 3 attempts to migrate the DB to the new server > hardware, that the DB is greatly compromised and that parts of it may not be > recoverable. > > Since admin has been unable to migrate the DB to the new server successfully, > I propose the following. Wheel in the new hardware, hook it up and start it > up creating a new DB. Users would be asked to reattach each client computer. > Admin would then have all the time they need to unravel the old DB and credit > those users with the credit they have earned from the old DB. > > Anyone else agree with this? > > Yep. Just move over the user accounts and discard all work in progress. Regards Hans |
Divide Overflow Send message Joined: 3 Apr 99 Posts: 365 Credit: 131,684 RAC: 0 |
> Anyone else agree with this? Others might, but I certainly don't. |
Hans Dorn Send message Joined: 3 Apr 99 Posts: 2262 Credit: 26,448,570 RAC: 0 |
> > Anyone else agree with this? > > Others might, but I certainly don't. > > OK, I guess this suggestion may sound a bit harsh. But since the data on the old servers isn't lost, it may be possible to slowly move it over to the new hardware later. Regards Hans |
EclipseHA Send message Joined: 28 Jul 99 Posts: 1018 Credit: 530,719 RAC: 0 |
The real question is "what is the DB busy doing" - Per the news, it's too busy to add new WU's from the splitter - It also seems that it's not busy validating, as that's not happening - Is it busy sending out WU's? (guess not, as they are rare) - How about doing transition? - well, I guess not - does the file_deleter have anything to delete if things aren't validated or transitioned? - did the Borg assimilator the DB? It just seems the DB is all tied up doing something other than what the DB should be doing! (kind of like finding "important stuff" when the Honey-do list gets too long) If the DB is spending all its time refusing connections and doing nothing more, there does seem to be a big bug in the DB code! |
Rom Walton (BOINC) Send message Joined: 28 Apr 00 Posts: 579 Credit: 130,733 RAC: 0 |
The DB is currently IO bound because a backup job is in progress. I would guess it is part of the migration process. The machine the DB currently runs on is pretty old. It's one of the Sun D220R (2 x 440MHz Sparc, 2 GB RAM). The machine it is being migrated to is a new dual proc opteron ( 2.4GHz, I think ) with 6GB-8GB of ram. It is expandable to four processors and 16GB of ram if we need it. I just think that we finally topped out on what the old machine was capable of handling, heck I just bought a PocketPC with a 600+MHz RISC processor that is probably able to keep up with it. I'm surprised it lasted this long. ----- Rom BOINC Development Team, U.C. Berkeley My Blog |
Speedy67 & Friends Send message Joined: 14 Jul 99 Posts: 335 Credit: 1,178,138 RAC: 0 |
> The real question is "what is the DB busy doing" > > - Per the news, it's too busy to add new WU's from the splitter > - It also seems that it's not busy validating, as that's not happening > - Is it busy sending out WU's? (guess not, as they are rare) > - How about doing transition? - well, I guess not > - does the file_deleter have anything to delete if things aren't validated or > transitioned? > - did the Borg assimilator the DB? > > It just seems the DB is all tied up doing something other than what the DB > should be doing! (kind of like finding "important stuff" when the Honey-do > list gets too long) > > If the DB is spending all its time refusing connections and doing nothing > more, there does seem to be a big bug in the DB code! > This was in the Technical News section on January 26, and has been referred to a couple of times since, as it is still happening: --quote-- Meanwhile the current database is being artificially slowed for reasons we have yet to determine. Basically, something internal to mysql caused it to suddenly read 5 megabytes/sec from the data disks. This started last Friday and hasn't stopped since. Even when there are no queries happening there are major amounts of disk I/O. Everything is working, just a little slower than it should. --unquote-- So maybe the last one of your options is the right one. :) |
FloridaBear Send message Joined: 28 Mar 02 Posts: 117 Credit: 6,480,773 RAC: 0 |
--quote-- Basically, something internal to mysql caused it to suddenly read 5 megabytes/sec from the data disks. --quote-- I was actually kind of amused by this when I first read it. My striped RAID (2x120 GB SATA 7200 RPM) in my PC benchmarks at 99 MB/sec for sequential reads ("only" 53 random) in Sandra, so 5 MB/sec is practically nothing these days. All from a 2-year-old $100 mainboard and two $100 hard drives. More evidence that their hardware is just not up to the task. I'm very happy to hear that they're moving away from Sparc...the bang/buck is so much better with Intel or AMD now. I've dealt with Sun for years, and just can't understand why they are still selling 0.5 GHz CPUs (64-bit or not). It sure seems to me that they have fallen behind in the hardware game...perhaps that's one reason why their stock chart looks so awful ;) |
EclipseHA Send message Joined: 28 Jul 99 Posts: 1018 Credit: 530,719 RAC: 0 |
> --quote-- > Basically, something internal to mysql caused it to suddenly read 5 > megabytes/sec from the data disks. > --quote-- > > I was actually kind of amused by this when I first read it. My striped RAID > (2x120 GB SATA 7200 RPM) in my PC benchmarks at 99 MB/sec for sequential reads > ("only" 53 random) in Sandra, so 5 MB/sec is practically nothing these days. > All from a 2-year-old $100 mainboard and two $100 hard drives. More evidence > that their hardware is just not up to the task. I'm very happy to hear that > they're moving away from Sparc.. I guess you weren't running the project when they tried to move to the Snap Appliance box for the DB... That was a hoot! Something they can't ID hitting the DB at 5mb/sec? It's either the Borg or there's a bug in some code they can't find! As MySql is used by many, I doubt that by itself, it would suddenly start slamming the DB! Dollars to donuts, it's some boinc code that is scanning the DB and ignoring (or not seeing) an error return, and like the Eveready Bunny, just keeps going, and going, and going..... |
EclipseHA Send message Joined: 28 Jul 99 Posts: 1018 Credit: 530,719 RAC: 0 |
> The DB is currently IO bound because a backup job is in progress. > > I would guess it is part of the migration process. > > The machine the DB currently runs on is pretty old. It's one of the Sun D220R > (2 x 440MHz Sparc, 2 GB RAM). The machine it is being migrated to is a new > dual proc opteron ( 2.4GHz, I think ) with 6GB-8GB of ram. It is expandable > to four processors and 16GB of ram if we need it. > > I just think that we finally topped out on what the old machine was capable of > handling, heck I just bought a PocketPC with a 600+MHz RISC processor that is > probably able to keep up with it. I'm surprised it lasted this long. > > Rom.. It's interesting that you didn't note the unexplained 5mb/sec hits that others on this thread are referencing! Maybe it's not that the HW is "topped out"! :) |
ghstwolf Send message Joined: 14 Oct 04 Posts: 322 Credit: 55,806 RAC: 0 |
Maybe I'm missing something important that would stop this from working. But why can't the RAID array from the current DB machine be pulled and the data cloned? The new DB machine is ready to run, so everything is in it now: add one drive, clone the data, remove it, and repeat. Or (if possible) install the array all at once (then clone to the new DB's array). The only issue I see is that it means shutting everything down. This isn't for a couple of hours either, maybe a day or so. Feel free to tell me what is wrong with this plan, other than that it means shutting down (we are already at a crawl). Still looking for something profound or inspirational to place here. |
Rom Walton (BOINC) Send message Joined: 28 Apr 00 Posts: 579 Credit: 130,733 RAC: 0 |
> Rom.. It's interesting that you didn't note the unexplained 5mb/sec hits that > others on this thread are referencing! Maybe it's not that the HW is "topped > out"! :) I didn't try to explain it because I haven’t been part of the investigation team, therefore I don’t have any information on it. David has had me focusing in on the client-side part of the software stack, after we get the next release out the door, he might re-task me, who knows. I figure that if we went through another round of optimization with the server-side stack we could probably get another 100% out of the software, but it still wouldn’t be enough to handle the load of classic shutting down. So in any event, it was still logical to upgrade the database server. If we top out the ability of that server when it is completely decked out, then we’ll have to scale horizontally. Personally, I would really love to redesign the database interface based on stored procedures, but until mysql supports them it’s not possible. Guess we’ll see what the future holds. ----- Rom BOINC Development Team, U.C. Berkeley My Blog |
Jim Baize Send message Joined: 6 May 00 Posts: 758 Credit: 149,536 RAC: 0 |
Rom, I think a lot of us look to you as being all-knowing about BOINC partly because you come here and talk to us. We do appreciate your feedback on the projects and your work and dedication. Thank you. Jim > I didn't try to explain it because I haven’t been part of the investigation > team, therefore I don’t have any information on it. > > David has had me focusing in on the client-side part of the software stack, > after we get the next release out the door, he might re-task me, who knows. > > I figure that if we went through another round of optimization with the > server-side stack we could probably get another 100% out of the software, but > it still wouldn’t be enough to handle the load of classic shutting down. So > in any event, it was still logical to upgrade the database server. If we top > out the ability of that server when it is completely decked out, then we’ll > have to scale horizontally. > > Personally, I would really love to redesign the database interface based on > stored procedures, but until mysql supports them it’s not possible. > > Guess we’ll see what the future holds. > > |
FloridaBear Send message Joined: 28 Mar 02 Posts: 117 Credit: 6,480,773 RAC: 0 |
> Rom, > > I think a lot of us look to you as being all-knowing about BOINC partly > because you come here and talk to us. We do appreciate your feedback on the > projects and your work and dedication. > > Thank you. > > Jim Absolutely. I think feedback from the development team is vital to the health of a project like this, and you've certainly done your part. A big thank you from me too :) |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
> Something they can't ID hitting the DB at 5mb/sec? It's either the Borg or > there's a bug in some code they can't find! As MySql is used by many, I doubt > that by itself, it would suddenly start slamming the DB! Just so I understand this: In your distinguished career as a developer, you've never seen a system that behaved in strange and inexplicable ways right up to the moment that you and/or your team finally saw the reason? |
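
For a sense of scale on the figures quoted in this thread, here is a rough back-of-the-envelope sketch in Python. It only restates two numbers from the posts above (the sustained 5 MB/sec read rate from the Technical News quote and the 99 MB/sec desktop benchmark); the function names are made up for illustration, and decimal units (1 GB = 1000 MB) are an assumption. The thread does not say what the old server's disks can sustain, so this says nothing about how close to its limit that machine was.

```python
# Back-of-the-envelope arithmetic for the read rates quoted in the thread.
# Assumption: decimal units (1 GB = 1000 MB). Function names are illustrative only.

SECONDS_PER_DAY = 24 * 60 * 60


def gb_read_per_day(rate_mb_per_sec: float) -> float:
    """Data volume in GB read over one day at a sustained rate in MB/sec."""
    return rate_mb_per_sec * SECONDS_PER_DAY / 1000


def fraction_of(rate_mb_per_sec: float, reference_mb_per_sec: float) -> float:
    """Ratio of one throughput figure to a reference throughput figure."""
    return rate_mb_per_sec / reference_mb_per_sec


if __name__ == "__main__":
    print(f"{gb_read_per_day(5):.0f} GB read per day at a sustained 5 MB/sec")
    print(f"{fraction_of(5, 99):.1%} of the quoted 99 MB/sec desktop benchmark")
```

So a constant 5 MB/sec is small next to a striped desktop array, yet it still adds up to roughly 432 GB of reads per day if it never stops.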