GPU FLOPS: Theory vs Reality

Message boards : Number crunching : GPU FLOPS: Theory vs Reality

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13851
Credit: 208,696,464
RAC: 304
Australia
Message 1806606 - Posted: 2 Aug 2016, 10:49:32 UTC - in response to Message 1806602.  

Validation pendings aren't valid, so they're of no use when it comes to determining how well a GPU is performing.

There's a 99+% chance that my "Validation pending" will become "Valid".

Yours, yes. Many others, no.

Obviously for hosts that have many "invalids" my suggested task tally approach would not be recommended.

Exactly.

IBM's World Community Grid already uses 3 metrics, and one of them is raw task count... so I don't think a variant (one that doesn't give much weight to shorties) should be dismissed outright.

Seti used to use raw task count, and that's why the Cobblestone & Credit system was developed, due to the rampant cheating by people running the shortest possible WUs over & over again to get their count up.
Shame they didn't return anything of use.
Rewarding people for doing the minimal amount of work (such as only shorties), when all the work needs to be done (including Guppies), is not in the project's best interest.
I've said it before & I'll say it again - Seti is a scientific project; valid results are what is important.
If you want Credit, there are other projects you can do; or you can help fix Credit New, or help with the development of new applications.


Valid work is the only useful indicator of work done. If it's not valid then it's of no use to the project.
Grant
Darwin NT
Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1806718 - Posted: 3 Aug 2016, 3:31:37 UTC - in response to Message 1806606.  

Valid work is the only useful indicator of work done. If it's not valid then it's of no use to the project.

For hosts that consistently return valid work, I think a tally of returned tasks per day (rather than anything related to credit) would be a better metric when making small changes (as long as shorties are not given anywhere close to as much weight as other tasks).
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1807174 - Posted: 5 Aug 2016, 0:08:23 UTC

Today I tried some hacks to automatically identify how many concurrent tasks were being executed (I used the "Time reported" field with the run-time fields to identify active task intervals and checked them for overlaps with other tasks).

Tragically this doesn't work because the time-stamp is on upload not on completion; even for computers that upload "immediately" there's a 5 minute cool-down and you can get a bunch of shorties in that time that look concurrent. I could try to ignore shorties but it would still break for people who only allow network usage on a schedule.
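For what it's worth, the overlap check described above could be sketched as below. This is purely illustrative: the tuple fields stand in for the "Time reported" and run-time columns, and it assumes the report timestamp marks completion, which (as noted) it doesn't, so real data would over-count.

```python
# Sketch of interval-overlap concurrency estimation: treat each task as
# the interval [reported - run_time, reported] and find the maximum
# number of intervals active at the same instant. Field names are
# illustrative, not the actual SETI@home result-table schema.

def estimate_concurrency(tasks):
    """tasks: list of (time_reported, run_time) in seconds.
    Returns the peak number of simultaneously active intervals."""
    events = []
    for reported, run_time in tasks:
        start = reported - run_time  # assumes report happens at completion
        events.append((start, +1))
        events.append((reported, -1))
    # Sort ends before starts at equal timestamps, so back-to-back
    # tasks are not counted as concurrent.
    events.sort(key=lambda e: (e[0], e[1]))
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak
```

The upload delay would make back-to-back shorties share a reported timestamp, which is exactly the failure mode described in this post.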
b101uk
Volunteer tester
Joined: 11 Jun 01
Posts: 37
Credit: 282,931
RAC: 0
United Kingdom
Message 1807229 - Posted: 5 Aug 2016, 5:09:00 UTC

if it's any use: with my stock ASUS GTX 1070 FE, GPU-Z is reporting average GPU power consumption for a >2 h period, across a mixed bunch of cuda42/50 & OpenCL SoG/sah workunits, of ~45% TDP, with trends as low as 43% and as high as 47% TDP.

the peak TDP reported by GPU-Z was 71.3%, the lowest 23.3%

the ASUS GTX 1070 FE TDP = 150 W, so ~45% TDP works out to ~67 W average draw just for the GPU


btw, running one WU at a time on the GPU and no WU on the CPU
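As a quick sanity check of the arithmetic, using the TDP and average %TDP figures quoted in the post above (the 24/7 energy figure is my extrapolation, not a measurement):

```python
# Average GPU power from GPU-Z's %TDP figures quoted above.
TDP_W = 150          # ASUS GTX 1070 FE board power limit
avg_pct = 0.45       # ~45% average TDP over the >2 h window

avg_watts = TDP_W * avg_pct          # ~67.5 W average draw
kwh_per_day = avg_watts * 24 / 1000  # ~1.62 kWh/day if crunching 24/7
```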
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1807257 - Posted: 5 Aug 2016, 11:10:39 UTC

What did it say your load was? My experience with CUDA and 1 WU was that it would use ~30% of my 980 Ti, which is probably correlated with the lower power consumption.

I found that with a fresh stock install I got a few CUDA tasks at first but they stopped coming and my RAC, power, and gpu-utilization went up.
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1807260 - Posted: 5 Aug 2016, 12:36:05 UTC - in response to Message 1807174.  
Last modified: 5 Aug 2016, 12:57:23 UTC

Today I tried some hacks to automatically identify how many concurrent tasks were being executed (I used the "Time reported" field with the run-time fields to identify active task intervals and checked them for overlaps with other tasks).

Tragically this doesn't work because the time-stamp is on upload not on completion; even for computers that upload "immediately" there's a 5 minute cool-down and you can get a bunch of shorties in that time that look concurrent. I could try to ignore shorties but it would still break for people who only allow network usage on a schedule.


one way that *might* work (with enough numbers for a GPU over many hosts) is to filter for Arecibo shorties (a known, fixed #operations per the splitter code), then plot #operations divided by (peak_flops x elapsed time) [per app version probably needed, e.g. Cuda50]. This gives a per-instance 'compute efficiency' value, typically around 5% for a single instance. Two instances should each be a bit over half that figure (3-4% per instance, with wider variance), 3 instances lower and wider again (say 1.5-3%). Y axis = compute efficiency, X axis = proportion of tasks (population bins)

(Example: ~5% total efficiency in the single instance case, 6-8% total with 2 instances, 4.5-9% with 3 ---> growing variance per host)

Probably due to differing systems/CPUs feeding them, all hosts plotted at once may look like an amorphous blob, but overlaying a few known host values on the chart might show whether there is a compute-efficiency relationship reflected in the mass data (clear groupings of widening variance & lowering peaks by number of instances), or whether it's just buried in system and system-usage variation, too overlapped to make use of.

[Edit:] *could work for CPU too, though you'd need to factor in BOINC Whetstone being a fraction of what an AVX or SSE application can achieve, so efficiencies would falsely show as more than 50-100%+ in some cases.
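The proposed metric could be sketched as below. The numbers in the test are placeholders, not real SETI@home task data, and `peak_flops` here means the theoretical peak the server records per device:

```python
# Sketch of the per-task compute-efficiency metric proposed above:
# efficiency = achieved operations / theoretical operations available
# during the task's elapsed time. Multiple instances per GPU should
# show up as separate clusters at lower per-instance efficiency.

def compute_efficiency(task_flops, peak_flops, elapsed_s):
    """Fraction of theoretical peak actually achieved on one task."""
    return task_flops / (peak_flops * elapsed_s)

def bin_efficiencies(efficiencies, bin_width=0.01):
    """Crude histogram: map each efficiency to an integer bin index,
    forming the population bins for the proposed X axis."""
    bins = {}
    for e in efficiencies:
        key = int(e / bin_width)
        bins[key] = bins.get(key, 0) + 1
    return bins
```

For example, a task of 4.5e13 operations taking 150 s on a card with a nominal 6 TFLOPS peak comes out at 5% efficiency, right around the single-instance figure suggested above.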
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
b101uk
Volunteer tester
Joined: 11 Jun 01
Posts: 37
Credit: 282,931
RAC: 0
United Kingdom
Message 1807263 - Posted: 5 Aug 2016, 13:01:55 UTC

the GPU load was:

for Cuda 42/50: >40% for most WUs, with some that would stay at >85%.

some of the cuda 42/50 WUs, though they would stay at ~40% for 2/3 of their length, would go up to 55-65% GPU load, while the ones that were at >85% for 2/3 of their length might drop down to <45% in their latter stages. There were other WU types that had regularly spaced changes, e.g. from ~45% to ~90% for 15-20 sec, then back to ~45% for 60 sec

for SoG/sah, GPU load was mostly >90%; some would stay at 85% while others would stay as high as 99%

GPU VDDC, GPU core clock and GPU memory clock were at constant levels, while GPU temperature only fluctuated ±1 °C regardless of the above, giving some indication that the energy being consumed was relatively constant, as the %TDP figures would suggest.
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1807265 - Posted: 5 Aug 2016, 13:07:34 UTC - in response to Message 1807263.  

With Cuda50 on a 1070, I would probably put no less than 3 instances.
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1807267 - Posted: 5 Aug 2016, 13:07:42 UTC

I'd considered looking at it in a similar way -- we actually have a nearly identical statistical problem at work, where we have a multi-modal distribution in harmonic relation and we want to figure out the base frequency. Sadly the last stats course I took was about 20 years ago, and my google-fu is too weak for me to steal a smarter person's math.

Combining the two techniques might work -- schedule-based evidence + proximity to predicted harmonic to build confidence.
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1807269 - Posted: 5 Aug 2016, 13:13:15 UTC - in response to Message 1807267.  

I'd considered looking at it in a similar way -- we actually have a nearly identical statistical problem at work, where we have a multi-modal distribution in harmonic relation and we want to figure out the base frequency. Sadly the last stats course I took was about 20 years ago, and my google-fu is too weak for me to steal a smarter person's math.

Combining the two techniques might work -- schedule-based evidence + proximity to predicted harmonic to build confidence.


lol, similar situation. 20 years ago I went hard into statistics, loved it and did really well, but burned out on it after a couple of years. Now it's a bit of a case where I need to revive some of those skills for other purposes, but the neurons don't fire that well in that region :). Probably once application development settles down I'll crack out Matlab or similar, and some stats books again.
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1807270 - Posted: 5 Aug 2016, 13:16:47 UTC - in response to Message 1807265.  

With Cuda50 on a 1070, I would probably put no less than 3 instances.

But he's running stock -- so eventually it'll decide that OpenCL SoG is faster for him and they should go away.
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1807271 - Posted: 5 Aug 2016, 13:22:03 UTC - in response to Message 1807270.  

With Cuda50 on a 1070, I would probably put no less than 3 instances.

But he's running stock -- so eventually it'll decide that OpenCL SoG is faster for him and they should go away.


true. Not sure how/if the server would factor it in if he set app_config for 3 instances of Cuda50. Probably doesn't.
b101uk
Volunteer tester
Joined: 11 Jun 01
Posts: 37
Credit: 282,931
RAC: 0
United Kingdom
Message 1807279 - Posted: 5 Aug 2016, 13:59:03 UTC

yes, I am running stock, though I am using some settings in the mb_cmdline.txt file, along with an app_config.xml to set the CPU usage to 1.

i.e. I am not trying to force use of cuda42/50 or SoG/sah by exclusion from an app_info

I did briefly try 2 cuda50 WUs running at once, after first getting both to 20% completed using suspend, so I could be sure how long they would take to complete; however, the time to complete both at once was only just shorter than the combined time of doing them individually.

so while 3 cuda42 or 50 instances may be good on GPUs which only reach <~33% utilisation per WU, it differs when you have WUs that are often at >40% or more utilisation - even the memory controller spends most of its time at >=36%
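A quick way to compare single- vs multi-instance throughput from timings like those described above (the numbers are illustrative, not measured):

```python
def throughput(tasks_completed, wall_seconds):
    """Tasks per hour for a given wall-clock window."""
    return tasks_completed / wall_seconds * 3600

# Illustrative: two 900 s tasks run back-to-back, vs. the same two run
# concurrently and finishing together in 1700 s. When the concurrent
# time is only just short of the sequential total, as reported above,
# multiple instances gain very little.
sequential = throughput(2, 2 * 900)   # 4.0 tasks/hour
concurrent = throughput(2, 1700)      # ~4.24 tasks/hour
```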
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13851
Credit: 208,696,464
RAC: 304
Australia
Message 1807374 - Posted: 5 Aug 2016, 21:53:06 UTC - in response to Message 1807271.  

With Cuda50 on a 1070, I would probably put no less than 3 instances.

But he's running stock -- so eventually it'll decide that OpenCL SoG is faster for him and they should go away.


true. Not sure how/if the server would factor in if he set app_confg for 3 instances of Cuda50. probably doesn't.

Running multiple instances results in a lower APR (even if you are actually doing more WUs per hour).
Once your APR drops below that of the other applications, the manager will start running those applications again, until it decides which is giving the best throughput.
As I found on Beta, the available work mix can result in the slower application being picked over the faster one.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13851
Credit: 208,696,464
RAC: 304
Australia
Message 1807376 - Posted: 5 Aug 2016, 21:57:05 UTC - in response to Message 1807265.  

With Cuda50 on a 1070, I would probably put no less than 3 instances.

I've found 3 instances was best for my GTX 750Tis, although with Guppies it's probably closer to 2.
I suspect my GTX 1070 would be best with 4, however since I've got a GTX 750Ti running with it I'm limited to running 3 at a time. The gain from running 4 on the GTX 1070 would be offset by the larger loss from running 4 on the GTX 750Ti.
Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1807380 - Posted: 5 Aug 2016, 22:02:47 UTC - in response to Message 1807376.  

Think there'll be a time in the not too distant future where it will be granular enough to get it down to specific instructions for specific cards? That would be a neat trick.

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13851
Credit: 208,696,464
RAC: 304
Australia
Message 1807391 - Posted: 5 Aug 2016, 22:14:44 UTC - in response to Message 1807380.  

Think there'll be a time in the not too distant future where it will be granular enough to get it down to specific instructions for specific cards? That would be a neat trick.

From what I've seen it can be done now, but it's nowhere near as easy to do as putting
<gpu_usage>0.25</gpu_usage>
<cpu_usage>1.00</cpu_usage>
in app_config.xml.
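For anyone who hasn't tried it, a minimal complete app_config.xml along those lines might look like the sketch below. The app name setiathome_v8 is an assumption here; check the <name> entries in your own client_state.xml before using it:

```xml
<!-- Minimal app_config.xml sketch: run 3 tasks at once per GPU
     (1/3 of a GPU each), reserving a full CPU core per task.
     App name assumed; verify against client_state.xml. -->
<app_config>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>1.00</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

The client re-reads this file on "Read config files", so no restart is needed to change the instance count.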
Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1807400 - Posted: 5 Aug 2016, 22:44:49 UTC - in response to Message 1807391.  

Well, yeah, that's kinda what I meant: something that us mere mortals can do, and app_config/app_info is about as deep as I feel comfortable going. Having a *gasp* graphical front end would be almost a dream come true, but hey, I'm just happy with all the great work that these volunteers are putting in to make it work with the new hardware and software that seems to be dropping almost weekly nowadays.

Oh for the good ol' days of XP, when it was on its way to being a 6-7 year old OS before it had serious competition. Talk about a solid platform to develop for - I bet everyone got things optimized to a gnat's arse with that stability. Change is good, to a point, if it is truly for the better, but that doesn't seem to be the case very often any more. Well, thanks again to all of you who are doing things that I can't even hope to achieve; you are greatly appreciated for all your hard work and dedication!!!

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1807403 - Posted: 5 Aug 2016, 23:02:12 UTC - in response to Message 1807400.  
Last modified: 5 Aug 2016, 23:06:14 UTC

In essence, the mix of different solutions, both those that work and those running through alpha, is pointing to the need for some flexibility. So I'd say it will become what you're calling 'more granular', but more along the lines of automatic dispatch, like the stock CPU app does for different instruction sets, as opposed to many applications and complicated options. [i.e. more sophisticated, but simpler to use]
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1807434 - Posted: 6 Aug 2016, 1:36:36 UTC - in response to Message 1806239.  

I was doing SETI@home before BOINC many years ago under my starman2020 user name, but the accounts are not connected today and I do not have access to that yahoo email anymore.

All three of the facts I underscored show that you CAN connect accounts.

While you are logged-in your current account:
- go to this page and paste your old "yahoo email":
http://setiathome.berkeley.edu/sah_classic_link.php

- ALF - "Find out what you don't do well ..... then don't do it!" :)
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.