Difference between AP Cuda_opencl_100 and 7.10 (opencl_nvidia_100) for APs?

Message boards : Number crunching : Difference between AP Cuda_opencl_100 and 7.10 (opencl_nvidia_100) for APs?
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1716155 - Posted: 20 Aug 2015, 21:57:50 UTC

Never seen this before on my machine.

Usually the nvidia cards will run 7.10 (opencl_nvidia_100) for the APs

But today I saw this one, cuda_opencl_100. I don't know if it will validate, but I'm curious as to why it would run this over the nvidia one.

Will it make any difference? Anyone?
ID: 1716155

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1716161 - Posted: 20 Aug 2015, 22:10:35 UTC - in response to Message 1716155.  

Never seen this before on my machine.

Usually the nvidia cards will run 7.10 (opencl_nvidia_100) for the APs

But today I saw this one, cuda_opencl_100. I don't know if it will validate, but I'm curious as to why it would run this over the nvidia one.

Will it make any difference? Anyone?

No. It's just a different plan class. Same binary.
ID: 1716161

Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1716162 - Posted: 20 Aug 2015, 22:11:22 UTC - in response to Message 1716161.  

Thanks, Raistmer, for the answer.
ID: 1716162

JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1724655 - Posted: 11 Sep 2015, 15:20:16 UTC - in response to Message 1716161.  

Well, I still have this burning question, about that and these:

I trimmed these from the Applications page -

7.10 (opencl_nvidia_100) 6,809 GigaFLOPS

7.10 (cuda_opencl_100) 276 GigaFLOPS

7.09 (opencl_intel_gpu_102) 757 GigaFLOPS

And my *question is: do these GigaFLOPS ratings have anything to do with actual runtime performance per se - or are they just potentials?

ID: 1724655

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1724657 - Posted: 11 Sep 2015, 15:28:44 UTC - in response to Message 1724655.  

Well, I still have this burning question, about that and these:

I trimmed these from the Applications page -

7.10 (opencl_nvidia_100)	6,809 GigaFLOPS 

7.10 (cuda_opencl_100)		  276 GigaFLOPS 

7.09 (opencl_intel_gpu_102)	  757 GigaFLOPS

And my *question is: do these GigaFLOPS ratings have anything to do with actual runtime performance per se - or are they just potentials?

They will be totals over the population of computers running those particular applications - whether on slow or fast GPUs.

The main thing you learn is that more computers are running the modern BOINC v7 clients (which understand the opencl_nvidia_100 plan_class) than the old BOINC v6 client (which wraps up the same binary under a faux-cuda plan_class).

The third line is for intel GPUs, which are a different beast entirely.
ID: 1724657

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1724658 - Posted: 11 Sep 2015, 15:28:54 UTC - in response to Message 1724655.  

Well, I still have this burning question, about that and these:

I trimmed these from the Applications page -

7.10 (opencl_nvidia_100) 6,809 GigaFLOPS

7.10 (cuda_opencl_100) 276 GigaFLOPS

7.09 (opencl_intel_gpu_102) 757 GigaFLOPS

And my *question is: do these GigaFLOPS ratings have anything to do with actual runtime performance per se - or are they just potentials?

Neither. The title of the column is "Average computing". It is the average amount of all work being done for that application/plan class. At the bottom of the page you will see Total average computing: 651,739 GigaFLOPS, which tells us the project is running about 651 TeraFLOPS at the moment.
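[Editor's note: the Giga-to-Tera conversion above is just a divide by 1,000; a minimal sketch using the figure quoted in this post:]

```python
# Total average computing from the Applications page footer (GigaFLOPS).
total_gigaflops = 651_739

# 1 TeraFLOPS = 1,000 GigaFLOPS.
total_teraflops = total_gigaflops / 1_000
print(f"{total_teraflops:.1f} TeraFLOPS")  # 651.7 TeraFLOPS
```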
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1724658

JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1724669 - Posted: 11 Sep 2015, 16:02:23 UTC

Thanks for the illumination, gentlemen.

I was viewing/interpreting the chart and these listings in terms of potential horsepower per app.

Mentioning that I *do want to enable my iGD and "test" its performance -
- wishing I could use/try/test the stock apps *and the Lunatics versions; but fearing that BOINC may get upset with me and hiccup my otherwise pleasant output/performance to date.

Is there a testbed that I'm not aware of yet? Maybe a virtual environment or such?

I think my 7589235 is capable of such a mix.

ID: 1724669

qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1724706 - Posted: 11 Sep 2015, 17:03:44 UTC

So there is much more crunching done on iGPUs than on Nvidia cards??
ID: 1724706

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1724718 - Posted: 11 Sep 2015, 17:24:56 UTC - in response to Message 1724706.  

So there is much more crunching done on iGPUs than on Nvidia cards??

No - where on earth did you get that idea?

Summing all plan classes for Windows users, I get

MB	CPU	260,631	GigaFLOPS
	Nvidia	197,380	
	ATI	 46,176	
	Intel	 17,881	
			
AP	CPU	  2,337	GigaFLOPS
	Nvidia	  6,407	
	ATI	  2,778	
	Intel	    660

which puts iGPU firmly in last place for both application types. On this rather unrepresentative snapshot (because there hasn't been much AP work recently), there's still more crunching done on CPUs than anything else.
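[Editor's note: the tally above can be sketched as follows, with the figures copied straight from the table; Python is used purely for illustration:]

```python
# Windows-only snapshot from the table above (GigaFLOPS).
mb = {"CPU": 260_631, "Nvidia": 197_380, "ATI": 46_176, "Intel": 17_881}
ap = {"CPU": 2_337, "Nvidia": 6_407, "ATI": 2_778, "Intel": 660}

# Intel iGPU is firmly in last place for both application types.
print(min(mb, key=mb.get))  # Intel
print(min(ap, key=ap.get))  # Intel

# And across both application types, CPUs still do the most crunching.
totals = {k: mb[k] + ap[k] for k in mb}
print(max(totals, key=totals.get))  # CPU
```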
ID: 1724718

qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1724742 - Posted: 11 Sep 2015, 17:48:33 UTC

Ah, I guess 6,809 GigaFLOPS means 6809 GigaFLOPS then. The comma somehow confused me.
ID: 1724742 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1724744 - Posted: 11 Sep 2015, 17:51:12 UTC - in response to Message 1724742.  

America uses a comma as the thousands separator in numerical values, where much of Europe uses a period.
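[Editor's note: a minimal sketch of the separator difference that caused the confusion above:]

```python
# US-style grouping uses a comma as the thousands separator.
n = 6809
us_style = f"{n:,}"
print(us_style)  # 6,809

# Many European locales would render the same value as "6.809",
# which is why "6,809 GigaFLOPS" can misread as roughly 6.8 GigaFLOPS.
```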
ID: 1724744

qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1724748 - Posted: 11 Sep 2015, 18:03:40 UTC

Yeah, I should have known.
ID: 1724748

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1724772 - Posted: 11 Sep 2015, 19:11:29 UTC - in response to Message 1724669.  

Thanks for the illumination, gentlemen.

I was viewing/interpreting the chart and these listings in terms of potential horsepower per app.

Mentioning that I *do want to enable my iGD and "test" its performance -
- wishing I could use/try/test the stock apps *and the Lunatics versions; but fearing that BOINC may get upset with me and hiccup my otherwise pleasant output/performance to date.

Is there a testbed that I'm not aware of yet? Maybe a virtual environment or such?

I think my 7589235 is capable of such a mix.

If you want to give your iGPU a test, then be sure to watch your CPU times. Haswell & Ivy Bridge based CPUs have been shown to slow down CPU processing by ~50% when using the iGPU to crunch SETI@home tasks.
For more information see:
A journey: iGPU slowing CPU processing, iGPU tuning, & Loading APU to the limit: performance considerations - ongoing research

Here are the specific performance numbers for each iGPU.
i7-3770K HD Graphics 4000 @ 1.15GHz 294.4 GFLOPS
i7-4790K HD Graphics 4600 @ 1.25GHz 400.0 GFLOPS
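[Editor's note: both figures are consistent with the usual peak formula for these Intel parts, execution units x 16 FLOP per EU per cycle x clock. A hedged sketch; the EU counts (16 for HD Graphics 4000, 20 for HD Graphics 4600) are an editorial assumption, not from the post:]

```python
def igpu_peak_gflops(eus: int, ghz: float, flop_per_eu_cycle: int = 16) -> float:
    """Peak single-precision GFLOPS: EUs x FLOP/EU/cycle x clock (GHz)."""
    return eus * flop_per_eu_cycle * ghz

print(igpu_peak_gflops(16, 1.15))  # HD Graphics 4000: 294.4
print(igpu_peak_gflops(20, 1.25))  # HD Graphics 4600: 400.0
```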
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1724772

JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1724804 - Posted: 11 Sep 2015, 21:03:27 UTC - in response to Message 1724772.  

Goodness Gracious! HAL/Richard/Raistmer - not going Near it (iGPU) for APs.

I'm practically ecstatic over v7 7.10 opencl_nvidia_100 performance on my 960 and especially my *970. (Thanks, Raistmer et al, for that.)

If the AP mix (availability) is "shorted" towards GPU apps and hardware at any time and only available for CPU apps, I'd like to consider having an app that would handle it - that is, a CPU app that would be *comparable (unlike the older CPU app, v7.03 or 05, wasn't it?) to the 7.10 runtimes.
(I guess the Intel opencl isn't classified as a CPU app, since it taps the iGPU?)

May *consider it when Intel opencl MB apps are ready and released (either Stock or Anonymous platform versions), but not at the expense of longer runtimes (if that is the case) - that's just counterproductive and the opposite direction of my goals/expectations.

Reducing the CPU clockspeed to 'get it done' is fine by me - that's software technique - the end result (valids with shorter runtimes) is the prize.

ID: 1724804

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1724835 - Posted: 11 Sep 2015, 21:53:13 UTC - in response to Message 1724804.  

May *consider it when Intel opencl MB apps are ready and released (either Stock or Anonymous platform versions), but not at the expense of longer runtimes (if that is the case) - that's just counterproductive and the opposite direction of my goals/expectations.

They are deployed already:

MB: 7.07 (opencl_intel_gpu_sah) 14 Jul 2015, 21:40:44 UTC
AP: 7.09 (opencl_intel_gpu_102) 23 Apr 2015, 18:50:41 UTC

and included in the Lunatics installer. Check that your Intel iGPU driver produces valid results, though.
ID: 1724835

JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1724852 - Posted: 11 Sep 2015, 23:30:03 UTC - in response to Message 1724835.  
Last modified: 11 Sep 2015, 23:49:22 UTC

Gosh, I feel like I've been under a rock!
Was *following the testing for a while with a friend who was doing so at Beta, but was "told" not to expect them 'til September+ - unaware they had been released.
Thanks for the info.

I *am still a bit unsettled after Win 10 Pro upgrades on both machines.
And I made a mess of things the last time I manually installed an app switch.

Seems straightforward enough to get Intel's latest driver for my HD4600, then enable it in BIOS; followed by Lunatics 43b and an app_info.xml edit (offline).

Does that sound like the right sequence?
=
Edit> most certainly will be messaging Mike for a commandline!

ID: 1724852

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1724910 - Posted: 12 Sep 2015, 3:37:09 UTC
Last modified: 12 Sep 2015, 4:07:57 UTC

Another factor in reading APRs (GFlops), whether aggregate 'averages' for the project app or an individual host, is a scaling problem.

[tech version - can skip this if your eyes start to roll back in your head :)]
Here is the peak device flops listed for the 780 in my Mac Pro:
4698 GFlops (peak theoretical)

Arbitrarily chosen MB task, runs in 196 seconds (one task at a time for now)
Flopcounter: 16015281710943.390625

That gives (rounding heavily) flopcounter/seconds as ~82 GFlops, about a 5% compute efficiency (GFlops/peak), which is a typical value for a not terribly optimised Cuda app, and is realistic. [For baseline comparison, regression testing nv's highly optimised CUFFT library reveals an impressive 10% compute efficiency, probably reflecting a multimillion dollar investment and intimate knowledge of their own hardware.]

Next, normalising out that 5%: 82 x 20 = 1640 GFlops (normalised APR).

So APR should show either ~82 GFlops (direct, for estimating task durations) or ~1640 GFlops (normalised, used for scaling credit), but it shows:
428 GFlops
which is either ~5x too high (if you want to use the number for estimating throughput and times) or ~3.8x too low (if it's for normalised credit purposes).

You can argue the flopcounter can be out, but that's only +/-10% or so, depending on some fudges for counting flops on GPUs. The main discrepancy for the multibeam app is that we normalise against a different app - mostly the CPU AVX one (in a complex mix with hosts with no AVX) - that is scaled against its non-SIMD Boinc Whetstone, so it 'claims' about 3.5 times too low.
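[Editor's note: the arithmetic in this post can be sketched as follows, using only the figures quoted above; the 5% compute efficiency is the value the post states, not derived here:]

```python
# Figures from the post (GTX 780 in the Mac Pro, one arbitrarily chosen MB task).
flopcounter = 16015281710943.39   # flops counted for the task
runtime_s = 196                   # elapsed seconds, one task at a time

# Achieved throughput: flopcounter / seconds.
achieved_gflops = flopcounter / runtime_s / 1e9
print(round(achieved_gflops))     # ~82 GFlops actually delivered

# Normalising out the ~5% compute efficiency quoted in the post.
efficiency = 0.05
normalised_gflops = achieved_gflops / efficiency
print(round(normalised_gflops))   # ~1634, i.e. the "82 x 20 = 1640" figure

# The displayed APR of 428 GFlops matches neither interpretation:
print(round(428 / achieved_gflops, 1))    # ~5.2x too high for throughput/times
print(round(normalised_gflops / 428, 1))  # ~3.8x too low for credit scaling
```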
[tech version ends]

So IOW [simple version]: treat the displayed figures for any app with care, and factor in how you want to use the numbers. 'Actual throughput' will be [much] lower than displayed pretty much across the board (and estimates correspondingly wonky for new machines), and credits will be [far too] low when compared against a verbatim interpretation of the creditnew spec.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1724910

©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.