GPU FLOPS: Theory vs Reality

Grant (SSSF) | Volunteer tester | Joined: 19 Aug 99 | Posts: 13751 | Credit: 208,696,464 | RAC: 304
Message 1806606 - Posted: 2 Aug 2016, 10:49:32 UTC - in response to Message 1806602.

> Validation pendings aren't valid, so they're of no use when it comes to determining how well a GPU is performing.

> There's a 99+% chance that my "Validation pending" will become "Valid".

Yours, yes. Many others, no.

> Obviously for hosts that have many "invalids" my suggested task tally approach would not be recommended.

Exactly.

> IBM's World Community Grid already uses 3 metrics and one of them is raw task count...so I don't think a variant (that doesn't give much weight to shorties) should be dismissed outright.

Seti used to use raw task count, and that's why the Cobblestone & Credit was developed, due to the rampant cheating of people running the shortest possible WUs over & over again to get their count up. Shame they didn't return anything of use.
Rewarding people for doing the minimal amount of work (such as only shorties), when all the work needs to be done (including Guppies), is not in the project's best interest.

I've said it before & I'll say it again: Seti is a scientific project, valid results are what is important. If you want Credit, there are other projects you can do; or you can help fix Credit New, or help with the development of new applications.
Valid work is the only useful indicator of work done. If it's not valid then it's of no use to the project.

Grant
Darwin NT

Stubbles | Volunteer tester | Joined: 29 Nov 99 | Posts: 358 | Credit: 5,909,255 | RAC: 0
Message 1806718 - Posted: 3 Aug 2016, 3:31:37 UTC - in response to Message 1806606.

> Valid work is the only useful indicator of work done. If it's not valid then it's of no use to the project.

For hosts that consistently return valid work, I think a tally of returned tasks per day (rather than anything related to credit) should be a better metric when doing small changes (as long as shorties are not given anywhere close to as much weight as other tasks).

Shaggie76 | Joined: 9 Oct 09 | Posts: 282 | Credit: 271,858,118 | RAC: 196
Message 1807174 - Posted: 5 Aug 2016, 0:08:23 UTC

Today I tried some hacks to try to automatically identify how many concurrent tasks were being executed (I used the "Time reported" field with the run-time fields to identify active task intervals and checked them for overlaps with other tasks).

Tragically this doesn't work because the time-stamp is on upload, not on completion; even for computers that upload "immediately" there's a 5 minute cool-down and you can get a bunch of shorties in that time that look concurrent.

I could try to ignore shorties but it would still break for people who only allow network usage on a schedule.
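
The overlap check described in this post, and the way it breaks, can be sketched roughly as follows. This is a minimal illustration, not Shaggie76's actual script: the function name and the `(time_reported, run_time)` tuple layout are assumptions, and each task is assumed to have run during the interval ending at its reported time.

```python
from typing import List, Tuple


def max_concurrency(tasks: List[Tuple[float, float]]) -> int:
    """Estimate the peak number of tasks running at once.

    tasks: (time_reported, run_time) pairs in seconds (hypothetical layout).
    Assumes each task ran during [time_reported - run_time, time_reported],
    which is only true if results are reported the moment they finish.
    """
    events = []
    for reported, run_time in tasks:
        events.append((reported - run_time, +1))  # inferred task start
        events.append((reported, -1))             # inferred task end
    # At equal timestamps, process ends (-1) before starts (+1) so that
    # back-to-back tasks are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


# Two 100 s shorties run one after the other on a single-instance host.
# If the timestamps were true completion times, the estimate is correct:
print(max_concurrency([(100, 100), (200, 100)]))

# If both results are uploaded together at t=300, the inferred intervals
# coincide and the host wrongly appears to run two tasks at once:
print(max_concurrency([(300, 100), (300, 100)]))
```

The second call shows the failure mode from the post: the upload time stands in for the completion time, so tasks that queue up during the cool-down look concurrent.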

b101uk | Volunteer tester | Joined: 11 Jun 01 | Posts: 37 | Credit: 282,931 | RAC: 0
Message 1807229 - Posted: 5 Aug 2016, 5:09:00 UTC

If it's any use: with my stock ASUS GTX 1070 FE, GPU-Z is reporting average GPU power consumption for a >2 h period, across a mixed bunch of cuda42/50 & OpenCL SoG/sah workunits, of ~45% TDP, with trends as low as 43% TDP and as high as 47% TDP.

The peak TDP reported by GPU-Z was 71.3%, the lowest 23.3%.

The ASUS GTX 1070 FE TDP = 150 W, so ~45% TDP = ~66 W just for the GPU.

BTW, running one WU at a time on the GPU and no WU on the CPU.
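
As a quick check of the arithmetic in this post, here is a small sketch that converts a GPU-Z "% TDP" reading into watts and watt-hours. It assumes, as the post does, that the percentage is relative to a 150 W TDP; the function names are illustrative only. Watts are already a rate, so energy over a period comes out in watt-hours.

```python
def power_watts(tdp_watts: float, tdp_percent: float) -> float:
    """Average power draw implied by a '% TDP' reading."""
    return tdp_watts * tdp_percent / 100.0


def energy_wh(avg_watts: float, hours: float) -> float:
    """Energy used over a period, in watt-hours."""
    return avg_watts * hours


TDP_W = 150.0  # TDP figure quoted in the post

# Lowest, trend low, average, trend high and peak readings from the post.
for pct in (23.3, 43.0, 45.0, 47.0, 71.3):
    print(f"{pct:5.1f}% TDP is about {power_watts(TDP_W, pct):6.2f} W")

# Energy for a 2 hour run at the ~45% average.
print(f"{energy_wh(power_watts(TDP_W, 45.0), 2.0):.1f} Wh over 2 h")
```

At 45% this gives 67.5 W, close to the ~66 W quoted, which would correspond to an average nearer 44%.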

Shaggie76 | Joined: 9 Oct 09 | Posts: 282 | Credit: 271,858,118 | RAC: 196
Message 1807257 - Posted: 5 Aug 2016, 11:10:39 UTC

What did it say your load was? My experience with CUDA/1 WU was it would use ~30% of my 980 Ti, which is probably correlated with the lower power consumption.

I found that with a fresh stock install I got a few CUDA tasks at first but they stopped coming and my RAC, power, and GPU utilization went up.

jason_gee | Volunteer developer | Volunteer tester | Joined: 24 Nov 06 | Posts: 7489 | Credit: 91,093,184 | RAC: 0
Message 1807260 - Posted: 5 Aug 2016, 12:36:05 UTC - in response to Message 1807174. Last modified: 5 Aug 2016, 12:57:23 UTC

> Today I tried some hacks to try to automatically identify how many concurrent tasks were being executed (I used the "Time reported" field with the run-time fields to identify active task intervals and checked them for overlaps with other tasks).
> Tragically this doesn't work because the time-stamp is on upload, not on completion; even for computers that upload "immediately" there's a 5 minute cool-down and you can get a bunch of shorties in that time that look concurrent.
> I could try to ignore shorties but it would still break for people who only allow network usage on a schedule.

One way that might work (with enough numbers for a GPU over many hosts) is to filter for Arecibo shorties (known fixed #operations, per the splitter code) and plot #operations divided by (peak_flops x elapsed time) [probably needed per app version, e.g. Cuda50]. This will give a per-instance 'compute efficiency' value, typically around 5% for a single instance. Double instance should be a bit over half that figure (3-4% per instance with wider variance), 3 instances lower and wider again (say 1.5-3%).

Y axis = compute efficiency; X axis would be the proportion of tasks (population bins).
(Example giving ~5% total efficiency in the single-instance case, 6-8% for 2 instances, 4.5-9% for 3 instances ---> growing variance per host.)

Probably due to differing systems/CPUs feeding them, all plotted at once may look like an amorphous blob, but overlaying a few known host values on the chart might show whether there is a compute efficiency relationship reflected in the mass data (clear groupings of widening variance & lowering peaks by number of instances), or whether it's just buried in system and system-usage variation, too overlapped to make use of.

[Edit:] *could work for CPU too, though you'd need to factor in Boinc Whetstone being a fraction of what an AVX or SSE application can achieve, so efficiencies would falsely show more than 50-100%+ in some cases.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
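
The 'compute efficiency' ratio proposed in this post can be written down as a short sketch. The formula (operations divided by peak_flops times elapsed time) is the one from the post; the function names and the sample operation count, peak FLOPS and run time are made-up placeholders, picked only so that the result lands near the ~5% single-instance figure mentioned.

```python
def compute_efficiency(operations: float, peak_flops: float,
                       elapsed_seconds: float) -> float:
    """Per-instance efficiency: work done / work the GPU could have done."""
    return operations / (peak_flops * elapsed_seconds)


def total_efficiency(per_instance: float, instances: int) -> float:
    """Whole-GPU efficiency when several instances run side by side."""
    return per_instance * instances


# Placeholder numbers (not measured): a shorty of 3.0e13 operations on a
# GPU with 6.0e12 peak FLOPS, finishing in 100 s of elapsed time.
eff = compute_efficiency(operations=3.0e13, peak_flops=6.0e12,
                         elapsed_seconds=100.0)
print(f"single instance: {eff:.1%}")

# Per-instance ranges quoted in the post, scaled to whole-GPU totals.
for instances, (low, high) in {2: (0.03, 0.04), 3: (0.015, 0.03)}.items():
    print(f"{instances} instances: "
          f"{total_efficiency(low, instances):.1%} to "
          f"{total_efficiency(high, instances):.1%} total")
```

Scaling the quoted per-instance ranges by the instance count reproduces the 6-8% and 4.5-9% totals given in the post's example.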

b101uk | Volunteer tester | Joined: 11 Jun 01 | Posts: 37 | Credit: 282,931 | RAC: 0
Message 1807263 - Posted: 5 Aug 2016, 13:01:55 UTC

The GPU load was:

For Cuda 42/50, >40% for most WU, with some that would stay at >85%. Some of the cuda 42/50 WU, though they would stay at >40% for 2/3 of their length, would go up to 55% to 65% GPU load, while the ones that were at >85% for 2/3 of their length may drop down to <45% at their latter stages. There were other WU types that had regularly spaced changes, e.g. from ~45% to ~90% for 15-20 sec, then back to ~45% for 60 sec.

For SoG/sah, GPU load was mostly >90%; some would stay at 85% while others would stay as high as 99%.

GPU VDDC, GPU core clock and GPU memory clock were at constant levels, while GPU temperature only had a flux of +-1C regardless of the above, giving some indication that the energy being consumed was relatively constant, as the % TDP would suggest.

jason_gee | Volunteer developer | Volunteer tester | Joined: 24 Nov 06 | Posts: 7489 | Credit: 91,093,184 | RAC: 0
Message 1807265 - Posted: 5 Aug 2016, 13:07:34 UTC - in response to Message 1807263.

With Cuda50 on a 1070, I would probably put no less than 3 instances.

Shaggie76 | Joined: 9 Oct 09 | Posts: 282 | Credit: 271,858,118 | RAC: 196
Message 1807267 - Posted: 5 Aug 2016, 13:07:42 UTC

I'd considered looking at it in a similar way -- we actually have a nearly identical statistical problem at work where we have a multi-modal distribution in harmonic relation and we want to figure out the base frequency. Sadly the last stats course I took was about 20 years ago and my google-fu is too weak for me to steal a smarter person's math.

Combining the two techniques might work -- schedule-based evidence + proximity to predicted harmonic to build confidence.

jason_gee | Volunteer developer | Volunteer tester | Joined: 24 Nov 06 | Posts: 7489 | Credit: 91,093,184 | RAC: 0
Message 1807269 - Posted: 5 Aug 2016, 13:13:15 UTC - in response to Message 1807267.

> I'd considered looking at it in a similar way -- we actually have a nearly identical statistical problem at work where we have a multi-modal distribution in harmonic relation and we want to figure out the base frequency. Sadly the last stats course I took was about 20 years ago and my google-fu is too weak for me to steal a smarter person's math.
> Combining the two techniques might work -- schedule-based evidence + proximity to predicted harmonic to build confidence.

lol, similar situation. 20 years ago I went hard into the statistics, loved it and did really well, but burned out on it after a couple of years. Now it's a bit of a case where I need to revive some of those skills for other purposes, but the neurons don't fire that well in that region :). Probably once application development settles down I will crack out Matlab or similar, and some stats books again.

Shaggie76 | Joined: 9 Oct 09 | Posts: 282 | Credit: 271,858,118 | RAC: 196
Message 1807270 - Posted: 5 Aug 2016, 13:16:47 UTC - in response to Message 1807265.

> With Cuda50 on a 1070, I would probably put no less than 3 instances.

But he's running stock -- so eventually it'll decide that OpenCL SoG is faster for him and they should go away.

jason_gee | Volunteer developer | Volunteer tester | Joined: 24 Nov 06 | Posts: 7489 | Credit: 91,093,184 | RAC: 0
Message 1807271 - Posted: 5 Aug 2016, 13:22:03 UTC - in response to Message 1807270.

> With Cuda50 on a 1070, I would probably put no less than 3 instances.

> But he's running stock -- so eventually it'll decide that OpenCL SoG is faster for him and they should go away.

True. Not sure how/if the server would factor in if he set app_config for 3 instances of Cuda50. Probably doesn't.

b101uk | Volunteer tester | Joined: 11 Jun 01 | Posts: 37 | Credit: 282,931 | RAC: 0
Message 1807279 - Posted: 5 Aug 2016, 13:59:03 UTC

Yes, I am running stock, though I am using some settings in the mb_cmdline.txt and mb_cmdline.txt, along with an app_config.xml to set the CPU to 1, i.e. I am not trying to force use of cuda42/50 or SoG/sah by exclusion from an app_info.

I did briefly try running 2 cuda50 WU at once, once I had got two files to 20% completed using suspend task so I could be sure how long they would take to complete. However, the time to complete both at once was only just shorter than the combined time of doing them individually.

So while 3 cuda42 or 50 may be good on GPUs which only reach <~33% utilisation per WU, it differs when you have WU that are often at >40% or more utilisation; even the memory controller spends most of its time at >=36%.

Grant (SSSF) | Volunteer tester | Joined: 19 Aug 99 | Posts: 13751 | Credit: 208,696,464 | RAC: 304
Message 1807374 - Posted: 5 Aug 2016, 21:53:06 UTC - in response to Message 1807271.

> With Cuda50 on a 1070, I would probably put no less than 3 instances.

> But he's running stock -- so eventually it'll decide that OpenCL SoG is faster for him and they should go away.

> true. Not sure how/if the server would factor in if he set app_confg for 3 instances of Cuda50. probably doesn't.

Running multiple instances results in a lower APR (even if you are actually doing more WUs per hour). Once your APR drops down below that for other applications the manager will start running the other applications again until it decides which is giving the best throughput.
As I found on Beta, the available work mix can result in the slower application being picked over the faster one.

Grant (SSSF) | Volunteer tester | Joined: 19 Aug 99 | Posts: 13751 | Credit: 208,696,464 | RAC: 304
Message 1807376 - Posted: 5 Aug 2016, 21:57:05 UTC - in response to Message 1807265.

> With Cuda50 on a 1070, I would probably put no less than 3 instances.

I've found 3 instances was best for my GTX 750Tis, although with Guppies it's probably closer to 2. I suspect my GTX 1070 would be best with 4; however, since I've got a GTX 750Ti running with it, I'm limited to running 3 at a time. The gain of running 4 on the GTX 1070 would be offset by the larger loss of running 4 on the GTX 750Ti.

Al | Joined: 3 Apr 99 | Posts: 1682 | Credit: 477,343,364 | RAC: 482
Message 1807380 - Posted: 5 Aug 2016, 22:02:47 UTC - in response to Message 1807376.

Think there'll be a time in the not too distant future where it will be granular enough to get it down to specific instructions for specific cards? That would be a neat trick.

Grant (SSSF) | Volunteer tester | Joined: 19 Aug 99 | Posts: 13751 | Credit: 208,696,464 | RAC: 304
Message 1807391 - Posted: 5 Aug 2016, 22:14:44 UTC - in response to Message 1807380.

> Think there'll be a time in the not too distant future where it will be granular enough to get it down to specific instructions for specific cards? That would be a neat trick.

From what I've seen it can be done now, but it's nowhere near as easy to do as <gpu_usage>0.25</gpu_usage> <cpu_usage>1.00</cpu_usage> in app_config.xml.
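
To relate the `<gpu_usage>` value in this post to a number of concurrent tasks, here is a rough sketch. It assumes the BOINC client keeps starting GPU tasks until their `gpu_usage` fractions add up to one whole GPU; the helper names are hypothetical and not part of BOINC.

```python
def gpu_usage_for(instances: int) -> float:
    """gpu_usage fraction to request so `instances` tasks share one GPU."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return round(1.0 / instances, 2)


def instances_for(gpu_usage: float) -> int:
    """Number of tasks that fit on one GPU at a given gpu_usage fraction."""
    if not 0.0 < gpu_usage <= 1.0:
        raise ValueError("gpu_usage must be in (0, 1]")
    # Small epsilon guards against floating-point results just under a
    # whole number.
    return int(1.0 / gpu_usage + 1e-9)


print(instances_for(0.25))  # the value quoted in the post
print(gpu_usage_for(3))     # the fraction for 3 tasks per GPU
```

Under that assumption, the 0.25 in the post corresponds to four tasks per GPU, and three tasks per GPU would use a fraction of about 0.33.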

Al | Joined: 3 Apr 99 | Posts: 1682 | Credit: 477,343,364 | RAC: 482
Message 1807400 - Posted: 5 Aug 2016, 22:44:49 UTC - in response to Message 1807391.

Well, yeah, that's kinda what I meant, something that us mere mortals can do, and the app_config/info is about as deep as I feel comfortable going. Having a (gasp) graphical front end would be almost a dream come true, but hey, I'm just happy with all the great work that these volunteers are putting in to making it work with the new hardware and software that seems to be dropping almost weekly nowadays.

Oh for the good ol' days of XP, when it was on its way to being a 6-7 year old OS before it had serious competition. Talk about a solid platform to develop for, I bet everyone got things optimized to a gnat's arse with that stability. Change is good, to a point, if it is truly for the better, but that doesn't seem to be the case very often any more.

Well, thanks again to all of you who are doing things that I can't even hope to achieve, you are greatly appreciated for all your hard work and dedication!!!

jason_gee | Volunteer developer | Volunteer tester | Joined: 24 Nov 06 | Posts: 7489 | Credit: 91,093,184 | RAC: 0
Message 1807403 - Posted: 5 Aug 2016, 23:02:12 UTC - in response to Message 1807400. Last modified: 5 Aug 2016, 23:06:14 UTC

In essence, the mix of different solutions, both those that work and those running through alpha, is pointing to the need for some flexibility. So I'd say it will become what you're calling 'more granular', but more along the lines of automatic dispatch like the Stock CPU app does for different instruction sets, as opposed to many applications and complicated options. [i.e. more sophisticated, but simpler to use]

BilBg | Volunteer tester | Joined: 27 May 07 | Posts: 3720 | Credit: 9,385,827 | RAC: 0
Message 1807434 - Posted: 6 Aug 2016, 1:36:36 UTC - in response to Message 1806239.

> I was doing SETI@home before BOINC many years ago under my starman2020 user name, but the accounts are not connected today and I do not have access to that yahoo email anymore.

All the 3 facts I underscored show that you CAN connect accounts.

While you are logged in to your current account:
- go to this page and paste your old "yahoo email":
http://setiathome.berkeley.edu/sah_classic_link.php

- ALF - "Find out what you don't do well ..... then don't do it!" :)

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.