Benchmark Estimates Way Wrong

Wesley Johnston Send message Joined: 4 Jun 99 Posts: 44 Credit: 5,494,065 RAC: 0	Message 107845 - Posted: 5 May 2005, 17:47:49 UTC The benchmark for my PC consistently estimates that units will take about 8 1/2 hours, when the reality is that they take just over 5 hours. I'm not sure what this means about the credibility of my statistics (or anyone else's), but it has been wrong about this for many months. So the problem is not just a recent fluke. ID: 107845 ·

jshenry1963 Send message Joined: 17 Nov 04 Posts: 182 Credit: 68,878 RAC: 0	Message 107847 - Posted: 5 May 2005, 17:51:57 UTC oh boy, I can see it also coming... aren't benchmarks used in computing the whetstone,drystone,which also compute credits? ooops, wish I hadn't said that, here come the ..... Thanks, and Keep on crunchin' John Henry KI4JPL Sevierville TN I started with nothing, and I still have some of it left. <img src="http://www.boincstats.com/stats/banner.php?cpid=989478996ebd8eadba8f0809051cdde2"> ID: 107847 ·

ampoliros Volunteer tester Send message Joined: 24 Sep 99 Posts: 152 Credit: 3,542,579 RAC: 5	Message 107849 - Posted: 5 May 2005, 18:00:56 UTC Both my client and application on this computer are fully optimized, but I still get an estimate of 7.5 hours while it only takes just over 2 hours average. >I'm not sure what this means about the credibility of my statistics (or anyone else's), but it has been wrong about this for many months. Always has been wrong. Just take heart in the fact that it's wrong across the board. :) 7,049 S@H Classic Credits ID: 107849 ·

Astro Volunteer tester Send message Joined: 16 Apr 02 Posts: 8026 Credit: 600,015 RAC: 0	Message 107872 - Posted: 5 May 2005, 18:40:39 UTC - in response to Message 107847. Last modified: 5 May 2005, 19:13:06 UTC <blockquote>oh boy, I can see it also coming...</blockquote> Are you OK, John Henry? You are asking a question that requires a lot of time to answer. The benchmarking question is complex. Suffice it to say that you might be better off reading about it from an already published source. I'd recommend that you start your search here. This is Paul Buck's website dedicated to explaining BOINC to users. This link will take you to the "B" page; scroll down to Benchmark, and enjoy your reading. Does this help? tony ID: 107872 ·

jshenry1963 Send message Joined: 17 Nov 04 Posts: 182 Credit: 68,878 RAC: 0	Message 107883 - Posted: 5 May 2005, 19:09:14 UTC yes, I'm perfectly fine. I was just reading this, and wondering how many remarks will come up or arguments start over this issue, since one of the variables used in computing credits is the output of the benchmarks. Therefore all credits are inherently bogus... And some are here ONLY to get credits, I can see the steam rising from someone's ears right now, "WHAT? MY CREDITS AREN'T ACCURATE DOWN TO THE MICROPOINT???????" But then again, everyone else's is off too, so does it matter? Sorry, I just thought it was funny. But then I'm twisted sometimes. Thanks, and Keep on crunchin' John Henry KI4JPL Sevierville TN I started with nothing, and I still have some of it left. <img src="http://www.boincstats.com/stats/banner.php?cpid=989478996ebd8eadba8f0809051cdde2"> ID: 107883 ·

Dorsai Send message Joined: 7 Sep 04 Posts: 474 Credit: 4,504,838 RAC: 0	Message 107887 - Posted: 5 May 2005, 19:12:13 UTC My benchmarks give me a time of 5:15... I take 3:50 (average)... I forgot to bookmark it, but I found a PC here using a 3 GHz Pentium that benchmarks lower than my XP2000 AMD. Needless to say he actually gets the WUs done in about half the time I do... But his benchmark is lower than mine. I don't care really... I do a WU. I return it, report it, and get credit... But those with PCs that get poor BMs, but crunch fast, result in me getting a low credit (but then they do too)... I claim a credit based on my benchmark. He claims a credit based on a benchmark parsecs away from the real performance of his PC.... The Credit allocation system is "the system". SAH uses it. I have a simple choice. Accept it, or don't. I might think it's not perfect, but it's the best game in town. I am still here. I accept it. It's not perfect, but it's the only system SAH uses. Foamy is "Lord and Master". (Oh, + some Classic WUs too.) ID: 107887 ·

Astro Volunteer tester Send message Joined: 16 Apr 02 Posts: 8026 Credit: 600,015 RAC: 0	Message 107892 - Posted: 5 May 2005, 19:18:01 UTC - in response to Message 107883. Last modified: 5 May 2005, 19:21:58 UTC <blockquote>yes, I'm perfectly fine.</blockquote> I knew what you were thinking, then I thought, Since John said it, then they won't give you the satisfaction, by doing it. LOL I think it depends on how the user responds to answers to questions, Not the questions themselves. This user just happened to ask a difficult, multi-faceted question. There is no easy answer to the question. Except to say the difference between actual (CPU time) and Estimated time is normal. ID: 107892 ·

Digger Volunteer tester Send message Joined: 4 Dec 99 Posts: 614 Credit: 21,053 RAC: 0	Message 107894 - Posted: 5 May 2005, 19:23:48 UTC LOL @ Tony... right on the money. Everyone knows the benchmarks are a little whacky, there's no need to instigate another Credit War over it. My machine benchmarks rather poorly, but then it completes a work unit pretty quickly so i tend to claim lower credit. The devs are aware there are issues in the benchmarking, and at some point in the future i'm sure they'll take another look into it. It's just not a high priority right now. :) Dig ID: 107894 ·

Jord Volunteer tester Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3	Message 107896 - Posted: 5 May 2005, 19:26:56 UTC - in response to Message 107883. <blockquote>Since one of the variables used in computing credits, is the output of the benchmarks. Therefore all credits are inherently bogus... And some are here ONLY to get credits, I can see the steam rising from someones ears right now, "WHAT? MY CREDITS AREN'T ACCURATE DOWN TO THE MICROPOINT???????"</blockquote>I pointed this out a while ago to the developers, since I always claim too much credit. Or better said, for the Seti project I claim more credit than I get. All other projects I am attached to I claim credit and get something in the neighborhood of what I claim. It's in the benchmarks, yes. But as David Anderson has said, it needs a complete rewrite of the benchmarks. The present benchmarks are based on the Level 2 cache amount. If you have little L2 cache, the benchmarks run fast, but the amount of memory that can be addressed is low, the way in which this memory is addressed is slow so the unit takes a long time to crunch. "Credit should reflect floating-point operations, not memory accesses, so the lower credit is the correct one. Of course it's irritating for owners of small-cache machines, who feel like they're being cheated. The way to fix this is to change the benchmarks so that they are more like the typical apps, i.e. so that they access large arrays. Does anyone know of a standard benchmark that does this? I don't want to develop one." That's what he added to the developers list about 3 weeks ago. Unless someone can come up with a good way for benchmarking and post that on the developers email list, I doubt it'll be fixed any time soon. It's low priority. ID: 107896 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 107907 - Posted: 5 May 2005, 20:10:43 UTC - in response to Message 107896. <blockquote> It's in the benchmarks, yes. But as David Anderson has said, it needs a complete rewrite of the benchmarks. The present benchmarks are based on the Level 2 cache amount. If you have little L2 cache, the benchmarks run fast, but the amount of memory that can be addressed is low, the way in which this memory is addressed is slow so the unit takes a long time to crunch. "Credit should reflect floating-point operations, not memory accesses, so the lower credit is the correct one. Of course it's irritating for owners of small-cache machines, who feel like they're being cheated. The way to fix this is to change the benchmarks so that they are more like the typical apps, i.e. so that they access large arrays. Does anyone know of a standard benchmark that does this? I don't want to develop one." That's what he added to the developers list about 3 weeks ago. Unless someone can come up with a good way for benchmarking and post that on the developers email list, I doubt it'll be fixed any time soon. It's low priority. </blockquote> No matter what you do, the benchmarks are estimates. Typically, a benchmark is a fairly tight loop so it fits inside even the smallest cache, and that doesn't help. The saving grace is that credit is averaged, and over the long haul, the answers are right. ID: 107907 ·

Paul D. Buck Volunteer tester Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 107931 - Posted: 5 May 2005, 21:13:11 UTC - in response to Message 107847. <blockquote>oh boy, I can see it also coming... aren't benchmarks used in computing the whetstone,drystone,which also compute credits? ooops, wish I hadn't said that, here come the ..... </blockquote> John, There is also a copy of my lecture notes on performance, which speak to benchmarks. It is a little dated, and I do give references, which I have and you are welcome to stop by and read them ... :) The one real and true answer is that there is only one valid measurement of the performance of a computer, and that is to run it doing the work for which it was intended. The problem with BOINC is that we have what can be called (or is called) in benchmarking a "mixed" workload. For example, look at my data on average processing time, which shows some interesting oddities. If you contrast the performance of SETI@Home and Einstein@Home, they do not show the same proportional gain/loss on the same CPUs. Specifically, the Xeon 3.4 GHz for SETI@Home is 12 minutes faster than the G5, yet the same two systems for Einstein@Home have a 3 hour difference with the G5 doing significantly better. Of course, you have to also note that the G5 is only doing one WU per physical CPU while the Xeon is doing 2. Yet the Xeon also has 4 times the L2 cache. So, none of this is easy or completely makes obvious sense. ID: 107931 ·

Astro Volunteer tester Send message Joined: 16 Apr 02 Posts: 8026 Credit: 600,015 RAC: 0	Message 107934 - Posted: 5 May 2005, 21:18:01 UTC - in response to Message 107845. <blockquote>The benchmark for my PC consistently estimates that units will take about 8 1/2 hours, when the reality is that they take just over 5 hours.</blockquote> Wesley Johnston, are you sorry you asked?? LOL Your question is very valid. It's just a long winded answer, and not an absolutely clear one at that. ID: 107934 ·

Paul D. Buck Volunteer tester Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 108146 - Posted: 6 May 2005, 11:06:27 UTC - in response to Message 107934. <blockquote> Wesley Johnston, are you sorry you asked?? LOL Your question is very valid. It's just a long winded answer, and not an absolutely clear one at that.</blockquote> Well, I am not sure that the answer is all that unclear. "It depends." There! Now how much more clearly can it be said? ID: 108146 ·

Kajunfisher Volunteer tester Send message Joined: 29 Mar 05 Posts: 1407 Credit: 126,476 RAC: 0	Message 108904 - Posted: 8 May 2005, 13:15:48 UTC Good Morning Paul :-) I'm going to throw another wrench at the system. My stats page on the site states 1546 & 2607... During the outage BOINC 4.30 decided it was time to run benchmarks... Results are this: 955 & 1780 (while posting to another thread) I know you've said in previous posts/threads not to hit the mouse, etc. while it's running to get an accurate benchmark value.... Values will be lower (as they are now) when benchmarking is taking place while another application is in use, correct? The only reason I happen to notice that it was running the benchmark while I was posting was because I happened to have BOINCMgr "up" and displaying the messages (seeing if I had managed to ul/dl yet). If these values are lower would that mean more credit claimed later? I try not to run alot of other applications so it will crunch quicker. As to whether I get that credit or not... I already understand that whole can of worms. If I had "touched nothing" while it was running the benchmark the values would have been larger, but, if I'm using another application while something is crunching then obviously (IMO) it will take longer to crunch no matter what the benchmark values are and also claim less credit, am i right here? (just woke up, on first cup of coffee...) No matter where you go, there you are... ID: 108904 ·

Paul D. Buck Volunteer tester Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 108923 - Posted: 8 May 2005, 14:35:36 UTC - in response to Message 108904. <blockquote>Good Morning Paul :-)</blockquote> Good morning <blockquote> I'm going to throw another wrench at the system. My stats page on the site states 1546 & 2607... During the outage BOINC 4.30 decided it was time to run benchmarks... Results are this: 955 & 1780 (while posting to another thread) I know you've said in previous posts/threads not to hit the mouse, etc. while it's running to get an accurate benchmark value.... </blockquote> That is to prevent events from consuming CPU. Even though the benchmark is run at a normal or higher priority (or it should be), things like mouse and keyboard events should be running at the highest priority so that they will get serviced in all cases. Or how else could you stop the system if it goes awry? <blockquote> Values will be lower (as they are now) when benchmarking is taking place while another application is in use, correct? </blockquote> Yes, because there is something else consuming CPU time. <blockquote> The only reason I happen to notice that it was running the benchmark while I was posting was because I happened to have BOINCMgr "up" and displaying the messages (seeing if I had managed to ul/dl yet). If these values are lower would that mean more credit claimed later? I try not to run alot of other applications so it will crunch quicker. As to whether I get that credit or not... I already understand that whole can of worms. </blockquote> I cannot recall and am too brain dead right now to look it up. But lower scores should mean lower credit. This is perhaps my greatest complaint with the system, the fact that there is so much instability in the benchmark runs. I can understand changes in the Least Significant Digit, but not more than one or two. Yet, if you run benchmarks several times in a row, you are likely to see changes in the second and third most significant digits. These are two runs on two different computers. My Apple Dual G5, and Xeon 3.4 ...

5/8/2005 7:19:49 AM||Running CPU benchmarks
5/8/2005 7:19:49 AM||Suspending computation and network activity - running CPU benchmarks
5/8/2005 7:20:48 AM||Benchmark results:
5/8/2005 7:20:48 AM|| Number of CPUs: 4
5/8/2005 7:20:48 AM|| 1499 double precision MIPS (Whetstone) per CPU
5/8/2005 7:20:48 AM|| 1823 integer MIPS (Dhrystone) per CPU
5/8/2005 7:20:48 AM||Finished CPU benchmarks
5/8/2005 7:20:48 AM||Resuming computation and network activity
5/8/2005 7:20:57 AM||Running CPU benchmarks
5/8/2005 7:20:57 AM||Suspending computation and network activity - running CPU benchmarks
5/8/2005 7:21:57 AM||Benchmark results:
5/8/2005 7:21:57 AM|| Number of CPUs: 4
5/8/2005 7:21:57 AM|| 1503 double precision MIPS (Whetstone) per CPU
5/8/2005 7:21:57 AM|| 1837 integer MIPS (Dhrystone) per CPU
5/8/2005 7:21:57 AM||Finished CPU benchmarks
5/8/2005 7:21:57 AM||Resuming computation and network activity
5/8/2005 7:22:50 AM||Running CPU benchmarks
5/8/2005 7:22:50 AM||Suspending computation and network activity - running CPU benchmarks
5/8/2005 7:23:49 AM||Benchmark results:
5/8/2005 7:23:49 AM|| Number of CPUs: 2
5/8/2005 7:23:49 AM|| 1072 double precision MIPS (Whetstone) per CPU
5/8/2005 7:23:49 AM|| 2729 integer MIPS (Dhrystone) per CPU
5/8/2005 7:23:49 AM||Finished CPU benchmarks
5/8/2005 7:23:49 AM||Resuming computation and network activity
5/8/2005 7:24:09 AM||Running CPU benchmarks
5/8/2005 7:24:09 AM||Suspending computation and network activity - running CPU benchmarks
5/8/2005 7:25:08 AM||Benchmark results:
5/8/2005 7:25:08 AM|| Number of CPUs: 2
5/8/2005 7:25:08 AM|| 1071 double precision MIPS (Whetstone) per CPU
5/8/2005 7:25:08 AM|| 2751 integer MIPS (Dhrystone) per CPU
5/8/2005 7:25:08 AM||Finished CPU benchmarks
5/8/2005 7:25:09 AM||Resuming computation and network activity

Everything else about the system I agree with. Just so we are clear on that. I have no troubles with the concept of using a benchmark to gauge workload or to calculate credit claims. I just have trouble with the instability of the benchmark in this system. During the Beta test when we were beating this up pretty hard I suggested an averaging system to remove the instability, or a system where the database collected the numbers on a per-CPU-type basis, and I was told the combinations would be a killer. Not having access to the database, I cannot say whether this is a true condition or not. I know in the data I am collecting now I don't see that large of a dataset, or at least I have not to this point. <blockquote> If I had "touched nothing" while it was running the benchmark the values would have been larger, but, if I'm using another application while something is crunching then obviously (IMO) it will take longer to crunch no matter what the benchmark values are and also claim less credit, am i right here? (just woke up, on first cup of coffee...) </blockquote> Yes, using your computer for anything else is an obvious waste of good BOINC Processing time. And all I can say about that is "HOW COULD YOU?" :) Just for the record, the second pair of tests above were of the G5; during the first one I was playing music through iTunes and during the second one I was not. Time to quit, my typing is going. ID: 108923 ·
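For anyone who wants to see how much the back-to-back runs in that log actually move, here is a minimal Python sketch. It is not anything the BOINC client does; `relative_spread` and `averaged_benchmark` are hypothetical helper names, and the five-run window is an arbitrary assumption. It only illustrates the kind of averaging suggested during the Beta test, using the figures from the log.

```python
from statistics import mean

def relative_spread(scores):
    """Gap between the best and worst run, as a fraction of the mean score."""
    return (max(scores) - min(scores)) / mean(scores)

def averaged_benchmark(history, window=5):
    """Average the most recent `window` runs (hypothetical smoothing scheme)."""
    return mean(history[-window:])

# Per-CPU figures from the log: the 4-CPU pair is the Xeon, the 2-CPU pair the G5.
runs = {
    "Xeon Whetstone": [1499, 1503],
    "Xeon Dhrystone": [1823, 1837],
    "G5 Whetstone": [1072, 1071],
    "G5 Dhrystone": [2729, 2751],
}

for name, scores in runs.items():
    print(f"{name}: spread {relative_spread(scores):.2%}, "
          f"average {averaged_benchmark(scores):.0f}")
```

With only two runs per machine this says little statistically, but it shows how a client could report a smoothed figure instead of whatever the latest run happened to produce.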

Kajunfisher Volunteer tester Send message Joined: 29 Mar 05 Posts: 1407 Credit: 126,476 RAC: 0	Message 108928 - Posted: 8 May 2005, 14:55:10 UTC Thanks Paul :-) I've seen the same thing myself when I've run the benchmarks "back-to-back", the values change... I agree with you about the instability of the benchmark system. But I imagine that that whole issue is real far back on the "back burner". If I ever manage to get this other computer (desktop that was given to me by a friend from work) up & running I will probably try to crunch on it too, I have doubts though... Dell Dimension P166V, 16MB RAM, 1.2GB HD, Win95...it will be a creeper...If nothing else I'll use it to do all my other work while my laptop crunches away. I've been looking really hard at some newer systems...Complete & Bare-Bone... Might even try my hand at a different OS... No matter where you go, there you are... ID: 108928 ·

Paul D. Buck Volunteer tester Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 108937 - Posted: 8 May 2005, 15:19:14 UTC - in response to Message 108928. <blockquote>Thanks Paul :-)</blockquote> You are welcome <blockquote> I've seen the same thing myself when I've run the benchmarks "back-to-back", the values change... I agree that you about the instability of the benchmark system. But I imagine that that whole issue is real far back on the "back burner". </blockquote> Yes, which is why I don't usually say that stuff, don't want to start the wars over again. But that is the biggest weakness in the current system. Once we get to where we have a reasonably stable system across platforms I think this will move back up as one of the more interesting problems. Dr. Anderson asked a question about the benchmarks and how to improve them, and I found one that has been run on the three target platforms and is available as source. <blockquote> If I ever manage to get this other computer (desktop that was given to me by a friend from work) up & running I will probably try to crunch on it too, I have doubts though... Dell Dimension P166V, 16MB RAM, 1.2GB HD, Win95...it will be a creeper...If nothing else I'll use it to do all my other work while my laptop crunches away. I've been looking really hard at some newer systems...Complete & Bare-Bone... Might even try my hand at a different OS...</blockquote> Me too ... ID: 108937 ·

Ned Slider Send message Joined: 12 Oct 01 Posts: 668 Credit: 4,375,315 RAC: 0	Message 109099 - Posted: 8 May 2005, 21:19:17 UTC Last modified: 8 May 2005, 21:22:21 UTC Claimed credit is a function of Whetstone and Dhrystone:

claimed credit = ([whetstone]+[dhrystone])/1000 * 100 / (2 * secs_per_day) * wu_cpu_time

whereas estimated cpu time is purely a function of Whetstone:

estimated time (sec) = 27,900,000/Whetstone

See here: http://www.pperry.f2s.com/boinc-credit.htm

This got me thinking - it would be very easy for me to produce corrected clients that estimate cpu time more accurately but still claim the same amount of credit. We may have a new improved boinc client for linux coming out soon (found some more optimizations) so I may correct this in my next (Linux) release. For me, cpu time is consistently out by about the same factor on 3 very different speed machines. All are AMD though, so I'm not sure how accurately my corrections would translate to Intel processors, but it's got to be a LOT better than it is at the moment. What do you all think? Ned * My Guide to Compiling Optimised BOINC and SETI Clients * * Download Optimised BOINC and SETI Clients for Linux Here * ID: 109099 ·
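As a rough illustration, the two formulas quoted in that post can be written out in Python. This is only a sketch of the formulas as stated in this thread, not the BOINC client's actual code; the function names are made up for the example, and `secs_per_day` is assumed to be 86,400.

```python
SECS_PER_DAY = 86_400  # assumed value of secs_per_day in the quoted formula

def claimed_credit(whetstone, dhrystone, wu_cpu_time):
    """Claimed credit for a work unit, per the formula quoted in the thread.

    whetstone and dhrystone are the per-CPU benchmark scores in MIPS;
    wu_cpu_time is the CPU time spent on the work unit, in seconds.
    """
    return (whetstone + dhrystone) / 1000 * 100 / (2 * SECS_PER_DAY) * wu_cpu_time

def estimated_time_secs(whetstone):
    """Estimated crunch time in seconds; depends on the Whetstone score only."""
    return 27_900_000 / whetstone

# A machine scoring 1000 on both benchmarks:
print(estimated_time_secs(1000) / 3600)          # estimated hours per work unit
print(claimed_credit(1000, 1000, SECS_PER_DAY))  # credit claimed for one day of CPU time
```

Note that the Dhrystone score never enters the time estimate, so by these formulas a client could rescale the estimate without touching the credit it claims.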

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13752 Credit: 208,696,464 RAC: 304	Message 109114 - Posted: 8 May 2005, 22:19:32 UTC - in response to Message 109099. <blockquote>For me, cpu time is consistantly out by about the same factor on 3 very different speed machines. All are AMD though, so I'm not sure how accurately my corrections would translate to Intel processors, but it's got to be a LOT better than it is at the moment.</blockquote> Even the official Dhrystone & Whetstone benchmarks are out when it comes to actual performance. Dhrystones are quite close, but the Whetstones are out by a massive amount. Dhrystones Whetstones As you can see, Dhrystones are quite close to performance in actual applications, but the Whetstones would have to be one of the most meaningless indicators ever. Grant Darwin NT ID: 109114 ·

Jord Volunteer tester Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3	Message 109421 - Posted: 9 May 2005, 20:58:51 UTC I've just reinstalled CC4.37, now the scheduler problems should be over. The first benchmarks it ran were definitely too low. I then ran 3 new benchmarks in a row, all the same way, press benchmark and sit with arms crossed waiting and looking. ;)

Automatic:
09/05/2005 22:42:28||Running CPU benchmarks
09/05/2005 22:43:25||Benchmark results:
09/05/2005 22:43:25|| Number of CPUs: 1
09/05/2005 22:43:25|| 935 double precision MIPS (Whetstone) per CPU
09/05/2005 22:43:25|| 1903 integer MIPS (Dhrystone) per CPU
09/05/2005 22:43:25||Finished CPU benchmarks

Manual 1:
09/05/2005 22:44:20||Running CPU benchmarks
09/05/2005 22:45:17||Benchmark results:
09/05/2005 22:45:17|| Number of CPUs: 1
09/05/2005 22:45:17|| 1158 double precision MIPS (Whetstone) per CPU
09/05/2005 22:45:17|| 2288 integer MIPS (Dhrystone) per CPU
09/05/2005 22:45:17||Finished CPU benchmarks

Manual 2:
09/05/2005 22:53:29||Running CPU benchmarks
09/05/2005 22:54:27||Benchmark results:
09/05/2005 22:54:27|| Number of CPUs: 1
09/05/2005 22:54:27|| 1172 double precision MIPS (Whetstone) per CPU
09/05/2005 22:54:27|| 2440 integer MIPS (Dhrystone) per CPU
09/05/2005 22:54:27||Finished CPU benchmarks

Manual 3:
09/05/2005 22:55:22||Running CPU benchmarks
09/05/2005 22:56:19||Benchmark results:
09/05/2005 22:56:19|| Number of CPUs: 1
09/05/2005 22:56:19|| 1168 double precision MIPS (Whetstone) per CPU
09/05/2005 22:56:19|| 2443 integer MIPS (Dhrystone) per CPU
09/05/2005 22:56:19||Finished CPU benchmarks

If "claimed credit = ([whetstone]+[dhrystone])/1000 * 100 / (2 * secs_per_day) * wu_cpu_time" then the first benchmark it did will ask way off-beat credit compared to the manual runs. ID: 109421 ·
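To put a number on that, here is a small Python sketch that plugs the automatic run and the third manual run into the claimed-credit formula quoted in the post. The function name and the one-hour CPU time are assumptions for illustration, and `secs_per_day` is taken to be 86,400; only the benchmark figures come from the log.

```python
SECS_PER_DAY = 86_400  # assumed value of secs_per_day in the quoted formula

def claimed_credit(whetstone, dhrystone, wu_cpu_time):
    """Claimed credit per the formula quoted in the thread (hypothetical helper)."""
    return (whetstone + dhrystone) / 1000 * 100 / (2 * SECS_PER_DAY) * wu_cpu_time

CPU_TIME = 3600  # one hour of CPU time, chosen arbitrarily; it cancels in the ratio

automatic = claimed_credit(935, 1903, CPU_TIME)   # first, automatic benchmark run
manual_3 = claimed_credit(1168, 2443, CPU_TIME)   # third manual benchmark run

ratio = automatic / manual_3
print(f"automatic run claims {ratio:.1%} of the manual-run credit")
```

Because the formula is linear in the sum of the two scores, the ratio is just (935 + 1903) / (1168 + 2443): for the same CPU time, the automatic benchmark would claim roughly a fifth less credit than the manual one.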
