Feature Rich (Jan 14 2008)
Author | Message |
---|---|
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Things ran quite well over the weekend. Looks like we added the right index to the mysql database to reduce the slow "validator fix" queries. A note about general BOINC/mysql implementation/design: there are a lot of features in BOINC that are seemingly excessive from a single-project perspective, but are there because every project has different needs. Project-specific factors (server power, workunit processing times, number of active users, min quorum, etc.) make some features less helpful. In the case of "resend lost workunits" (see last thread) this feature, implemented mostly for the benefit of Einstein@home, was most definitely weighing down our database server. We turned this off and have been running smoothly since. There were assumptions this would lead to greater problems down the line (fearing many results will be sitting on disk longer waiting for their redundant pairing to return) but in fact our "results returned and waiting for validation" number has been stable (if not slowly decreasing) since I made the change. Nevertheless, at some point soon we will see if we could optimize/reimplement this code, and Eric is actually making adjustments to the splitter which will perhaps create less "fast runners." Our new-hardware-to-obtain priorities are shifting. Namely, we need a router (we're not ignoring discussion about this on other threads but we are limited to what we can use for various configuration/policy reasons). We also need a new KVM - our current one in the closet is maxed out and we'd like to get more stuff in there ASAP. We also need three new desktop systems. Dan's using an old, sloooow solaris system which is out of support. Bob is on a slightly faster solaris system, but needs a safe mysql test sandbox. Josh's old super-cheap windows/intel box is basically a glorified console server. Had some minor issues due to the root drive on bruno filling up on Sunday. I scanned the drive and found only 4GB of stuff, while "df" was showing 40GB.
Eric eventually found a deleted-yet-open file - an infinitely growing httpd log. Apparently httpd log rotation broke at some point, but we cleaned this up. Annoying, but harmless. Due to increased load in general, I changed the server db stats to update every hour (instead of half hour). Actually it's becoming clearer as we increase active user load and I'm populating credited_job, etc. that the mysql database might be our bottleneck du jour any jour now. There were also some issues with the user-of-the-day selection process which I tracked down and fixed this morning. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
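For anyone who runs into the same mismatch between "df" and a scan of the drive, here is a minimal sketch of one way to look for deleted-yet-open files. It assumes a Linux host with /proc mounted; the post does not say how Eric actually found the log, and `find_deleted_open_files` is a hypothetical helper name. On Linux, an open descriptor whose path has been unlinked appears as a symlink under /proc/<pid>/fd with a target ending in " (deleted)", and its disk blocks stay allocated until the owning process closes it.

```python
import os


def find_deleted_open_files(proc_root="/proc"):
    """Return (pid, fd, target, size_bytes) for open files whose path was unlinked.

    Assumes the Linux /proc layout. size_bytes is None when the size cannot be read.
    """
    found = []
    try:
        pids = os.listdir(proc_root)
    except OSError:
        return found  # no /proc here (non-Linux host) or it is unreadable
    for pid in pids:
        if not pid.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
            except OSError:
                continue
            if not target.endswith(" (deleted)"):
                continue
            try:
                # On a real /proc, stat() follows the link to the still-open file.
                size = os.stat(link).st_size
            except OSError:
                size = None
            found.append((int(pid), int(fd), target, size))
    return found


if __name__ == "__main__":
    # Largest files first; unknown sizes sort last.
    for pid, fd, target, size in sorted(
        find_deleted_open_files(), key=lambda row: row[3] or 0, reverse=True
    ):
        print(f"pid={pid} fd={fd} size={size} {target}")
```

Run it as root to see other users' processes. The space is only released once the owning process closes the descriptor, for example when the daemon is restarted.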
Dr. C.E.T.I. Send message Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
Thanks for the Post Matt - good job being done by All others @ Berkeley too . . . < best of luck with the router prob. - hopefully it's a fix in the nearby future BOINC Wiki . . . Science Status Page . . . |
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
In the case of "resend lost workunits" (see last thread) this feature, implemented mostly for the benefit of Einstein@home, was most definitely weighing down our database server. We turned this off and have been running smoothly since. There were assumptions this would lead to greater problems down the line (fearing many results will be sitting on disk longer waiting for their redundant pairing to return) but in fact our "results returned and waiting for validation" number has been stable (if not slowly decreasing) since I made the change. Nevertheless, at some point soon we will see if we could optimize/reimplement this code, and Eric is actually making adjustments to the splitter which will perhaps create less "fast runners." I assume you missed my "complaint" about a problem with 22fe07ah? I had 23 results from that dataset, and all 23 together completed in under 90 seconds (3.7ish seconds apiece). The reduction in the stat you're referencing would be greatly influenced by something such as this, as they were all "fast runners" anyway (they already had short deadlines), and thus probably bumped people with larger caches into "High Priority" / "Earliest Deadline First", so you'd get a whole slew of results coming back in pretty quickly, thus dropping that figure and giving a transient drop in the turnaround times. Also, the resends only matter if someone has had a problem and blown away their local data files while the server still knows that they are supposed to have a result, or if there were blips in the download process like those experienced here back in May/June '07, where the server tagged a host as having a task but the download server never sent it to the host. IOW, the stability and/or drop in that stat may not mean anything at all in the grand scheme of things... Way too early to tell...IMO. I decided to grab some more work on my AMD today and out of 16, 5 of them are guaranteed "fast runners" (short deadline).
Beyond that, who knows how many may be noisy and overflow? It could be 0, or it could be 5-10 more... (I've finished 1 out of the set so far).
The 3800 series would be good, but you may want the 7200 series... I wouldn't stay within the 2800 series, but politics may come into play...
The professor at school said that "nobody" would run an enterprise operation on anything but either SQL Server 2005, Oracle, DB2, or equivalent...and that mySQL would be "Mom and Pop" types... I don't know enough to debate it, but perhaps you could do some rebuttal? :-) |
Neil Blaikie Send message Joined: 17 May 99 Posts: 143 Credit: 6,652,341 RAC: 0 |
Since getting new work in the early hours of today, I have returned 44 results, all with completion times of just over an hour and deadlines on January 22nd. Of the 165 tasks still to process, a lot are going to finish in just over an hour on my machine. Looking briefly at some of them as well, my wingman on some of them will not complete before the 22nd January deadline. It is a little annoying that a fast machine that is on 24/7 running BOINC is not being sent work that will at least tax the dual processors a bit. Not that I really care much about RAC, but that is dropping like a stone as well with so many fast results. Anyways the results are all from 12ja07ad 8995 and 12ja07af 8995 if anyone cares. Nite nite from Montreal, off to get some shut eye and then be probed by a doctor during a medical exam tomorrow. Maybe if aliens came and took me away tonight they could save me the hassle of driving so damn far to the docs and beam the results to my doc. |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
What kind of router would be a suitable replacement? Anything particular in mind that would be "approved" by campus? |
Tony Li Send message Joined: 21 May 01 Posts: 6 Credit: 1,337,747 RAC: 0 |
What kind of router would be a suitable replacement? Anything particular in mind that would be "approved" by campus? Indeed. Please let me know what could work and let me see what I can do. |
seti@elrcastor.com Send message Joined: 30 Jan 00 Posts: 35 Credit: 4,879,559 RAC: 0 |
What kind of router would be a suitable replacement? Anything particular in mind that would be "approved" by campus? A Cisco 3825 would probably be a step in the right direction |
Dr. C.E.T.I. Send message Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
What kind of router would be a suitable replacement? Anything particular in mind that would be "approved" by campus? > best price: CISCO 3825 INTEGRATED SERVICE ROUTER W/AC PWR (MPN: CISCO3825) $5,518.00 - No Tax - Free Shipping from Corporate Computer Solutions right? BOINC Wiki . . . Science Status Page . . . |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
What kind of router would be a suitable replacement? Anything particular in mind that would be "approved" by campus? But is that "campus approved"? |
seti@elrcastor.com Send message Joined: 30 Jan 00 Posts: 35 Credit: 4,879,559 RAC: 0 |
probably, if they liked the 2811 they should definitely like the 3825 |
Jesse Viviano Send message Joined: 27 Feb 00 Posts: 100 Credit: 3,949,583 RAC: 0 |
Cisco's 3800 series model comparison shows that the 3800 series of routers would be inadequate for someone who needs at least 100Mbps of routing speed, because the 3825 maxes out at 1/2 the speed of a T3 (22.5Mbps), and the 3845 maxes out at the speed of a full T3 (45Mbps). I think that you may want to investigate the 7200 series of routers. The 7600 series of routers seems to be overkill for your application, unless you have plans to get Internet speeds beyond OC-48. |
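The arithmetic behind that comparison, as a rough sketch. The throughput ceilings are the figures quoted in the post, not values checked against Cisco's data sheets, and the function name `shortfall` is made up for illustration.

```python
# Throughput ceilings in Mbps exactly as quoted in the post (unverified).
QUOTED_MAX_MBPS = {"3825": 22.5, "3845": 45.0}
REQUIRED_MBPS = 100.0  # the "at least 100Mbps" target named in the post


def shortfall(quoted, required):
    """Map each model to (meets_requirement, required / quoted_ceiling)."""
    return {
        model: (mbps >= required, required / mbps)
        for model, mbps in quoted.items()
    }


for model, (ok, ratio) in shortfall(QUOTED_MAX_MBPS, REQUIRED_MBPS).items():
    status = "meets" if ok else "falls short of"
    print(f"{model}: {status} {REQUIRED_MBPS:.0f} Mbps "
          f"(target is {ratio:.1f}x its quoted ceiling)")
```

On those quoted numbers, neither 3800 model reaches the target; it is more than twice the 3845's ceiling and more than four times the 3825's.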
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
For the record: 1. We have a 7301 on the other end of the tunnel. It's handling the load just fine (average 40% cpu load, compared to 90% of the 2811). So the 7301 may seem like overkill but we plan on eventually doubling (tripling? quadrupling?) our bandwidth capabilities at some point. 2. Part of the consideration is that in the closet we have a 2811, a switch for machines in the closet going into the 2811, and a switch for inter-closet traffic with an uplink to the campus LAN. One hefty unit can combine 2 of these functions, if not all three (needs at least 36 ports, though, if not 48). 3. Just so we're clear, campus isn't the entity holding us back in our selection process. They have suggestions about which brandname/specs of hardware to use in any given situation, but so far have been quite willing to work with whatever we come up with. In some cases they are more strict about what to do and what to use, but we tend to agree with them. 4. If there's any politics, policy, etc. it has to do with stuff I'm less comfortable about discussing publicly - namely our various benefactors who helped us in the past and may perhaps again in the future. I (actually all of us) have been bogged down with other fires lately, but we'll address this issue at some point, and will let you know exactly what we want if we need help. Until then, I vastly appreciate the informative comments. I have zero time to do any research. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
For the record: Yes, always a balancing act between too much and too little info... :-) I was leaning more towards the 7200 series with a possible "overkill" of the 7600 series, but since you're already talking those classes, you might want to go ahead and go for 7600s and/or Catalysts... |
whawn Send message Joined: 11 Apr 00 Posts: 18 Credit: 1,053,191 RAC: 2 |
The professor at school said that "nobody" would run an enterprise operation on anything but either SQL Server 2005, Oracle, DB2, or equivalent...and that mySQL would be "Mom and Pop" types... I don't know enough to debate it, but perhaps you could do some rebuttal? :-) Well, I dunno if Matt will want to, but Sun Microsystems will probably have a little to say on the subject, since it looks like that company is buying mySQL: press release |
Dr. C.E.T.I. Send message Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
The professor at school said that "nobody" would run an enterprise operation on anything but either SQL Server 2005, Oracle, DB2, or equivalent...and that mySQL would be "Mom and Pop" types... I don't know enough to debate it, but perhaps you could do some rebuttal? :-) full text: Sun Microsystems Announces Agreement to Acquire MySQL BOINC Wiki . . . Science Status Page . . . |
Jan Schotsmans Send message Joined: 27 Oct 00 Posts: 98 Credit: 92,693 RAC: 0 |
The professor at school said that "nobody" would run an enterprise operation on anything but either SQL Server 2005, Oracle, DB2, or equivalent...and that mySQL would be "Mom and Pop" types... I don't know enough to debate it, but perhaps you could do some rebuttal? :-) Considering Google, Nokia, Facebook and a lot of other big names use MySQL, I think your professor is full of the brown stuff and talking out of his old outdated behind. |
Wedge009 Send message Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553 |
In the case of "resend lost workunits" (see last thread) this feature, implemented mostly for the benefit of Einstein@home, was most definitely weighing down our database server. We turned this off and have been running smoothly since... Nevertheless, at some point soon we will see if we could optimize/reimplement this code... I realise the whole team is always under a lot of stress and heavy workloads, but I was just wondering if this feature could be switched back on? Or is the database server still in a tricky state? I found this feature really useful back when it was still active, and right now I have eight 'lost' workunits which are otherwise going to end with a "No Reply"... :( It's not a tragedy if you can't switch it back on right now, but I would appreciate it if you would, even for a moment, consider the possibility. Thanks. Soli Deo Gloria |
Andy Lee Robinson Send message Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
The professor at school said that "nobody" would run an enterprise operation on anything but either SQL Server 2005, Oracle, DB2, or equivalent...and that mySQL would be "Mom and Pop" types... I don't know enough to debate it, but perhaps you could do some rebuttal? :-) He is wrong. While good for "Mom and Pop" types, it is also good for industrial enterprise operations and is more accessible than MSSQL etc. A large number of programmers are familiar with it, which drives down development costs and makes it more attractive. However, real MySQL gurus are still worth their weight in platinum... I am CTO of a company developing social networking applications, and set up a mysql cluster of multiple masters dealing with thousands of queries a second. The company's future is staked on it, and like Facebook et al, it is hardly a Mom and Pop operation! |
Yellow Horror Send message Joined: 10 Jun 03 Posts: 3 Credit: 10,157,045 RAC: 7 |
I was just wondering if this feature could be switched back on? Or is the database server still in a tricky state? I found this feature really useful back when it was still active, and right now I have eight 'lost' workunits which are otherwise going to end with a "No Reply"... :( +1. I've returned to the SETI@home project after a long gap and, being unfamiliar with the new interface and too curious, pressed the "Reset Project" button before the pop-up warning about losing my workunits appeared. Then I pressed "Yes" in the plain Yes/No dialogue (why in the world is there no BIG RED WARNING symbol in this dialogue?!) and whoa - all my 26 downloaded tasks disappeared into the thin Æther and are therefore stuck on the server awaiting their deadlines. I know well that this is not a tragedy for the project or for me. But why not just allow me to correct the consequences of my mistake somehow? |
Dr. C.E.T.I. Send message Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
. . . Here's a Prayer for Matt (he's been out sick) and the Hope that he's Better Today BOINC Wiki . . . Science Status Page . . . |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.