Message boards :
Number crunching :
Difference between AP Cuda_opencl_100 and 7.10 (opencl_nvidia_100) for APs?
Author | Message |
---|---|
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Never seen this before on my machine. Usually the nvidia cards will run 7.10 (opencl_nvidia_100) for the APs, but today I saw this one: cuda_opencl_100. I don't know if it will validate, but I'm curious as to why it would run this over the nvidia one. Will it make any difference? Anyone? |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Never seen this before on my machine. No. It's just a different plan class. Same binary. |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Thanks, Raistmer, for the answer |
JBird Send message Joined: 3 Sep 02 Posts: 297 Credit: 325,260,309 RAC: 549 |
Well, I still have this burning question, about that and these. I trimmed these from the Applications page: 7.10 (opencl_nvidia_100) 6,809 GigaFLOPS; 7.10 (cuda_opencl_100) 276 GigaFLOPS; 7.09 (opencl_intel_gpu_102) 757 GigaFLOPS. And my question is: do these GigaFLOPS ratings have anything to do with actual runtime performance per se, or are they just potentials? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Well, I still have this burning question, about that and these: They will be totals over the population of computers running those particular applications - whether on slow or fast GPUs. The main thing you learn is that more computers are running the modern BOINC v7 clients (which understand the opencl_nvidia_100 plan_class) than the old BOINC v6 client (which wraps up the same binary under a faux-cuda plan_class). The third line is for Intel GPUs, which are a different beast entirely. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Well, I still have this burning question, about that and these: Neither. The title of the column is "Average computing". It is the average amount of all work being done for that application/plan class. At the bottom of the page you will see Total average computing: 651,739 GigaFLOPS, which tells us the project is running at about 651 TeraFLOPS at the moment. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu) |
JBird Send message Joined: 3 Sep 02 Posts: 297 Credit: 325,260,309 RAC: 549 |
Thanks for the illumination, gentlemen. I was viewing/interpreting the chart and these listings in terms of potential horsepower per app. Mentioning that I *do want to enable my iGPU and "test" its performance - wishing I could use/try/test the stock apps *and the Lunatics versions, but I fear that BOINC may get upset with me and hiccup my otherwise pleasant output/performance to date. Is there a testbed that I'm not aware of yet? Maybe a virtual environment or such? I think my 7589235 is capable of such a mix |
qbit Send message Joined: 19 Sep 04 Posts: 630 Credit: 6,868,528 RAC: 0 |
So there is much more crunching done on iGPUs than on Nvidia cards?? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
So there is much more crunching done on iGPUs than on Nvidia cards?? No - where on earth did you get that idea? Summing all plan classes for Windows users, I get:

MB: CPU 260,631 GigaFLOPS; Nvidia 197,380; ATI 46,176; Intel 17,881
AP: CPU 2,337 GigaFLOPS; Nvidia 6,407; ATI 2,778; Intel 660

which puts the iGPU firmly in last place for both application types. On this rather unrepresentative snapshot (because there hasn't been much AP work recently), there's still more crunching done on CPUs than anything else. |
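The ranking in those figures can be double-checked with a quick sum (a throwaway sketch; the numbers are copied verbatim from the post, and the snapshot caveat above still applies):

```python
# GigaFLOPS per plan class for Windows hosts, as quoted in the post above.
mb = {"CPU": 260_631, "Nvidia": 197_380, "ATI": 46_176, "Intel": 17_881}
ap = {"CPU": 2_337, "Nvidia": 6_407, "ATI": 2_778, "Intel": 660}

# Intel iGPU comes last for MB and is the smallest AP contributor too.
print(sorted(mb, key=mb.get, reverse=True))  # ['CPU', 'Nvidia', 'ATI', 'Intel']

# Combined across both application types, CPUs still lead overall.
totals = {k: mb[k] + ap[k] for k in mb}
print(max(totals, key=totals.get))  # CPU
```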
qbit Send message Joined: 19 Sep 04 Posts: 630 Credit: 6,868,528 RAC: 0 |
Ah, I guess 6,809 GigaFLOPS means 6809 GigaFLOPS then. The comma somehow confused me. |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
American convention uses a comma as the thousands separator (and a period for the decimal point); many other countries do it the other way round. |
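For anyone else caught out by this, a minimal illustration of the US thousands-separator convention ("6,809" is the figure from the Applications page):

```python
# "6,809" uses the US convention: comma = thousands separator, period = decimal.
# In many European locales the same quantity would be written "6.809".
us_style = "6,809"

# Parse it the US way by stripping the thousands separator.
value = int(us_style.replace(",", ""))
print(value)  # 6809
```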
qbit Send message Joined: 19 Sep 04 Posts: 630 Credit: 6,868,528 RAC: 0 |
Yeah, I should have known. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Thanks for the illumination, gentlemen. If you want to give your iGPU a test, be sure to watch your CPU times. Haswell and Ivy Bridge based CPUs have been shown to slow down by ~50% when using the iGPU to crunch SETI@home tasks. For more information see: A journey: iGPU slowing CPU processing, iGPU tuning, & Loading APU to the limit: performance considerations - ongoing research. Here are the specific performance numbers for each iGPU: i7-3770K HD Graphics 4000 @ 1.15GHz 294.4 GFLOPS; i7-4790K HD Graphics 4600 @ 1.25GHz 400.0 GFLOPS. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu) |
JBird Send message Joined: 3 Sep 02 Posts: 297 Credit: 325,260,309 RAC: 549 |
Goodness gracious! HAL/Richard/Raistmer - not going near it (the iGPU) for APs. I'm practically ecstatic over the v7 7.10 opencl_nvidia_100 performance on my 960 and especially my *970 (thanks, Raistmer et al, for that). If the AP mix (availability) is ever "shorted" towards GPU apps and hardware and only available for CPU apps, I'd like to consider having an app that would handle it - that is, a CPU app that would be *comparable to the 7.10 runtimes (unlike the older CPU app, v7.03 or 05, wasn't it?). (I guess the Intel OpenCL isn't classified as a CPU app, since it taps the iGPU?) I may *consider it when the Intel OpenCL MB apps are ready and released (either stock or anonymous-platform versions), but not at the expense of longer runtimes, if that is the case - that's just counterproductive and the opposite direction from my goals/expectations. Reducing the CPU clock speed to "get it done" is fine by me - that's software technique - the end result (valids with shorter runtimes) is the prize. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
May *consider it when Intel opencl MBs apps are ready and released(either Stock or Anonymous platform versions) but not at the expense of longer runtimes(if that is the case) - that's just counterproductive and the opposite direction of my goals/expectations. They are deployed already: MB: 7.07 (opencl_intel_gpu_sah) 14 Jul 2015, 21:40:44 UTC AP: 7.09 (opencl_intel_gpu_102) 23 Apr 2015, 18:50:41 UTC and included in the Lunatics installer. Check that your Intel iGPU driver produces valid results, though. |
JBird Send message Joined: 3 Sep 02 Posts: 297 Credit: 325,260,309 RAC: 549 |
Gosh, feel like I've been under a rock! I was *following the testing for a while with a friend who was doing so at Beta, but was "told" not to expect them till September+ - unaware they had been released. Thanks for the info. I *am still a bit unsettled after Win 10 Pro upgrades on both machines, and I made a mess of things last time I manually installed an app switch. Seems straightforward enough: get Intel's latest driver for my HD4600, then enable it in BIOS, followed by Lunatics 43b and an app_info.xml edit (offline). Does that sound like the right sequence? = Edit> most certainly will be messaging Mike for a commandline! |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Another factor with reading APRs (GFlops), whether aggregate 'averages' for the project app or an individual host, is a scaling problem.

[tech version - can skip this if your eyes start to roll back in your head :)]

Here is the peak device flops listed for the 780 in my Mac Pro: 4698 GFlops (peak theoretical). An arbitrarily chosen MB task runs in 196 seconds (one task at a time for now), with Flopcounter: 16015281710943.390625. That gives (rounding heavily) flopcounter/seconds as ~82 GFlops - about a 5% compute efficiency (GFlops/Peak), which is a typical value for a not terribly optimised Cuda app, and is realistic. [For baseline comparison, regression testing nv's highly optimised CUFFT library reveals an impressive 10% compute efficiency, probably reflecting a multimillion dollar investment and intimate knowledge of their own hardware.]

Next, normalising out that 5%: 82x20 = 1640 GFlops (normalised APR). So APR should show either ~82 GFlops (direct, for estimating task durations) or ~1640 GFlops (normalised, used for scaling credit), but it shows: 428 GFlops. That is either ~5x too high (if you want to use the number for estimating throughput and times) or ~3.8x too low (if it's for normalised credit purposes). You can argue the flopcounter can be out, but that's only +/-10% or so, depending on some fudges for counting flops on GPUs. The main discrepancy for the multibeam app is that we normalise against a different app - mostly the CPU AVX one (in a complex mix with hosts with no AVX) - which is scaled against its non-SIMD Boinc Whetstone, so it 'claims' about 3.5 times too low.

[tech version ends]

So IOW [simple version]: treat the displayed figures for any app with care, and factor in how you want to use the numbers. 'Actual throughput' will be [much] lower than displayed pretty much across the board (and estimates correspondingly wonky for new machines), and credits will be [far too] low when compared against a verbatim interpretation of the creditnew spec.
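To make the arithmetic easy to follow, here is the same calculation as a throwaway sketch (all figures are copied from the post above; the ~5% efficiency is the post's own estimate, not something derived here):

```python
# Illustrative sketch of the APR arithmetic in the post above.
# All numbers are quoted from the post; nothing here is an official formula.

peak_gflops = 4698.0                  # GTX 780 peak theoretical GFLOPS (as quoted)
flopcounter = 16015281710943.390625   # flops counted for one MB task
runtime_s = 196.0                     # task runtime in seconds

# Effective rate the task actually achieved: flopcounter / seconds
effective_gflops = flopcounter / runtime_s / 1e9   # ~82 GFLOPS

# Normalising by the ~5% compute efficiency quoted in the post
normalised_gflops = effective_gflops / 0.05        # ~1640 GFLOPS

displayed_apr = 428.0  # what the server page actually shows

print(f"effective:  ~{effective_gflops:.0f} GFLOPS")
print(f"normalised: ~{normalised_gflops:.0f} GFLOPS")
print(f"displayed APR is ~{displayed_apr / effective_gflops:.1f}x the effective rate,")
print(f"and ~{normalised_gflops / displayed_apr:.1f}x above the displayed value")
```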
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.