setiathome v7 7.00 MultiBeam

Message boards : Number crunching : setiathome v7 7.00 MultiBeam
Profile JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1684919 - Posted: 27 May 2015, 23:17:11 UTC
Last modified: 27 May 2015, 23:21:38 UTC

I've noticed (in Process Lasso) that the AKv8c_r2549_winx86-64_AVXxjfs app runs at 24% CPU usage.

Is there a way to improve the runtimes of these MBs by, say, increasing the CPU usage?
i.e. - adding another app_config entry, such as:
=
<cpu_versions>
<cpu_usage>0.75</cpu_usage>
</cpu_versions>
=
Could this help? Would it work? Or should it be just <cpu_usage>1</cpu_usage>?
=
Where did 24% usage come from anyway?

ID: 1684919
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1684922 - Posted: 27 May 2015, 23:21:51 UTC - in response to Message 1684919.  

Is that on a quad-core machine, by any chance?
ID: 1684922
Profile JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1684923 - Posted: 27 May 2015, 23:25:49 UTC - in response to Message 1684922.  
Last modified: 27 May 2015, 23:31:56 UTC

Yes, Richard. i5-2500 quad with 4 threads
all cores available in prefs
And I run 4x/4up
That is, always 4 of them running - 1 on each core

ID: 1684923
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1684930 - Posted: 27 May 2015, 23:54:20 UTC
Last modified: 28 May 2015, 0:00:23 UTC

I didn't really understand the issue when you PMed me asking about this previously. However, after looking at the run times for some of your CPU tasks:
Run time(sec)	CPU time (sec)	Credit 	Application
2,344.23 	1,957.52 	43.18 	SETI@home v7 Anonymous platform (CPU)
2,506.23 	1,973.60 	47.56 	SETI@home v7 Anonymous platform (CPU)
7,234.75 	5,661.46 	119.40 	SETI@home v7 Anonymous platform (CPU)
2,401.64 	1,854.82 	40.20 	SETI@home v7 Anonymous platform (CPU)

I think I understand your issue.

I think the question is really "Why are your run times so high compared to your CPU times?"
The answer to that is normally that you have another application running, causing the SETI@home CPU app to wait for CPU cycles, or that you have in some other way overcommitted your system resources.
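
To put rough numbers on it from the table above: 2,506.23 / 1,973.60 ≈ 1.27, so that task's run time was about 27% longer than its CPU time, and the other three work out to roughly 20-30% as well - a fair slice of wall-clock time spent waiting for a CPU rather than crunching.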

I don't have a Core i CPU from that generation, but I suppose it could also be that the AVX app is not as efficient on AVX v1.0 hardware? But I don't think that is as likely.
ID: 1684930
Profile JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1684932 - Posted: 28 May 2015, 0:10:07 UTC - in response to Message 1684930.  

Thanks HAL--
I see where you're coming from there

My contention is: since this *is a CPU app and I have 4 cores at 3.3GHz, why isn't it *using the whole thing vs 24% of it?
AVX is pretty strong, but why cripple it by calling 24% instead of the whole ball o' wax?
Is app_info or app_config messed up somehow?

Would my "suggested" app_config entry fix it?
=
Yes, I do run GPU versions alongside (CUDA50) - they only draw 1-4% CPU usage.

ID: 1684932
OzzFan Project Donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1684933 - Posted: 28 May 2015, 0:17:09 UTC - in response to Message 1684932.  
Last modified: 28 May 2015, 0:35:05 UTC

In Windows, the CPU resource is looked at as a whole, with 100% being the sum total of all cores and sockets in a system.

So on your single-socket, quad-core CPU, 100% is all 4 cores, which means a single-threaded task can never account for more than 25% of that total. Note that Task Manager shows decaying averages for CPU usage.

Which ultimately means: no, you cannot change any configuration in BOINC to make a task on a single core show more than 24-25% CPU on your machine.
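
Put as a quick worked equation: 1 busy core / 4 logical CPUs = 25% of the whole machine, so the ~24% reported in Process Lasso is simply one core running essentially flat out.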
ID: 1684933
Profile JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1684940 - Posted: 28 May 2015, 0:48:24 UTC - in response to Message 1684933.  
Last modified: 28 May 2015, 0:53:40 UTC

Man! Thanks for the feedback on this, y'all. It illuminated my misconceptions about what I *thought I was working with.
=
Darn it! Every time I *think I'm onto something/getting somewhere, I get punked by Windows/WinTel I should say - *Thinking(and Marketing, of course).
=
Well, dunno *what to expect from the upgrade I'm working on, then.
Just got it, and building it out next week:
Intel Core i7-4790K Devil's Canyon Quad-Core 4.0GHz, 8 Threads with HT

Which Windows and Boinc will *Read/See-Show as 8 processors
(should I expect it has only 4 FPUs? And will the AVX apps still run at 24%? Whether it's on a Physical *or Virtual core?)

The i7 does boast AVX 2.0 - maybe that'll help
And +700MHz surely will too
Dunno how Hyperthreading will *act (as far as my above questions go)

ID: 1684940
OzzFan Project Donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1684946 - Posted: 28 May 2015, 1:17:17 UTC - in response to Message 1684940.  
Last modified: 28 May 2015, 1:31:59 UTC

Which Windows and Boinc will *Read/See-Show as 8 processors
(should I expect it has only 4 FPUs? And will the AVX apps still run at 24%? Whether it's on a Physical *or Virtual core?)


You will have 4 real FPUs and 4 virtual ones, for a total of 8 FPUs to go along with your 8 ALUs.

And again, since Windows sees the CPU resource as a whole of 100% (Windows doesn't really care about virtual or real cores), with a single socket and 8 total threads your 100% is now divided by 8, so no single thread can show more than 12.5%.

My Core i7 3930K has 6 cores and 6 Hyperthreaded ones, so each CPU will not go above 8.33% (100 / 12). My Xeon X5660 has two sockets, 6 cores in each CPU, and each CPU has 6 Hyperthreaded cores, so a single CPU will not go above 4.16% (100 / 24).

Mind you, you are not losing performance. 12.5% is 100% of one individual CPU core on an 8-CPU system, just as 25% is 100% of one individual core on a quad-core system, and 4.16% is 100% of one individual core on a 24-"CPU" system.

Darn it! Every time I *think I'm onto something/getting somewhere, I get punked by Windows/WinTel I should say - *Thinking(and Marketing, of course).


This wasn't designed and/or implemented by the Marketing team, for once. The way the OS handles CPU resources as a whole dates all the way back to the first Symmetric Multi-Processor (SMP) enabled OS co-developed by IBM and Microsoft, OS/2 2.11, and it continued on when Microsoft forked from that and started Windows NT (which of course is the precursor to your Windows 7 OS). I'm sure that if I expanded my studies beyond the x86 market, the design philosophy would date even earlier than IBM and Microsoft's attempt too.
ID: 1684946
Profile JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1684957 - Posted: 28 May 2015, 2:01:53 UTC - in response to Message 1684946.  
Last modified: 28 May 2015, 2:10:06 UTC

Ah! I'm beginning to catch on - to the fractional *nature / the gist of it, anyway.

Thanks for 'splainin to me that way.

So from a *performance standpoint - that is, runtimes and the like - brute force, i.e. +700MHz at stock, is the performance boost I'll *see (compared to my quad 3.3GHz cores anyway)
AVX 2.0 Boost = unknown benefit til I Launch/crunch/analyze and see.
=
OS is 64bit; hardware is 64bit; and bigger, faster busses with more lanes are coming as well, with the new Z97 board and its PCIe 3.0 bus. Plus the advent of Hyperthreading increases I/O generally, as I understand it, as well as igniting my Maxwell GPU's Unified Memory (another I/O enhancement for *it)
=
Experimenting with just how many cores to use (in Prefs) and/or experimenting with P Lasso affinity is next. My initial thought about affinities is to map CPU0 to Windows/Boinc and run everything else (apps) on the remaining 7 cores; unless of course it should just be Windows on CPU0 and Boinc *must be on the remaining - uncertain. Other idea is separate core groups for CPU and GPU apps.
Beats me -- just venturing into the fray. I still don't *get why I would need to "free a core" for my GPU apps, for example.
Onward thru the Fog!
Thanks again for your Feedback!
=
Edit: I do Disable Speedstep in BIOS and TurboBoost ON - because I don't want *any Down-clocking *anywhere.
And Jason G caught me up on why the 64bit app is slower than 32bit (at this time due to app dependencies/latency, I think he said) -- unless maybe the bigger busses on this new board will *allow double-wide traffic without a performance hit?

ID: 1684957
OzzFan Project Donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1684959 - Posted: 28 May 2015, 2:13:08 UTC - in response to Message 1684957.  
Last modified: 28 May 2015, 2:35:48 UTC

So from a *performance standpoint - that is, runtimes and the like - brute force, i.e. +700MHz at stock, is the performance boost I'll *see (compared to my quad 3.3GHz cores anyway)


Well, raw clock speeds are only comparable within the same CPU generation. Every other release (at least in Intel's Tick/Tock cadence for CPU releases) increases the number of instructions executed per clock cycle, known as Instructions Per Cycle or IPC.

So a 3.3GHz 5xxx-series Core i7/i5 is going to be faster than a 3.3GHz 2xxx-series Core i7/i5, and faster still than, say, a 3.3GHz Intel Core 2 series.

So you're gaining more than just 700MHz in raw clock speed - but only about 5-10% (depending on the application and optimizations) on a clock-for-clock basis.

Other idea is separate core groups for CPU and GPU apps.
Beats me -- just venturing into the fray. I still don't *get why I would need to "free a core" for my GPU apps, for example.


It has to do with resource contention and how any modern OS handles multitasking and thread-level priorities. A CPU core that is 100% busy working on a thread and has to constantly stop to respond to other active threads (such as feeding the GPU) won't provide all threads the level of attention they need to work efficiently. Given that the GPU is far more efficient than a single CPU core (or all of them put together for that matter), it is recommended to sacrifice the performance of a single core so as to allow it to be responsive to the GPU thread... to sort of "feed" it.

Edit: I do Disable Speedstep in BIOS and TurboBoost ON - because I don't want *any Down-clocking *anywhere.


Yes, that's correct.

And Jason G caught me up on why the 64bit app is slower than 32bit (at this time due to app dependencies/latency, I think he said) -- unless maybe the bigger busses on this new board will *allow double-wide traffic without a performance hit?


Nope. PCI Express (PCI-e / PCIe) lanes are the same width from PCIe 1.0 through PCIe 3.0. It is a serial bus that gains its speed increases from pushing data faster through the same leads, and from optimizations in the protocols that send the data. Also, the CPU doesn't sit on the PCIe bus, so the faster PCIe 3.0 bus won't offer faster performance versus the number of bits in the CPU registers.
ID: 1684959
Profile JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1684965 - Posted: 28 May 2015, 3:09:39 UTC - in response to Message 1684959.  

Aha! (again) - Thanks, OzzFan.
So to take advantage on my current machine, I should go to .75/75% cores in Prefs?

And the same percent on the new one (more, I think), since I only have one discrete GPU,
which I run CUDAs on at a .25 GPU config - but the AP-OpenCL NV is configured with 1+1.
=
I'm currently planning to DVI my new iGD HD 4600 to my monitor only - to free up the discrete GPU's VRAM for crunching only.
I'll "unlist"/not use the Intel GPU/iGD in Prefs - until I see OC options for it in the BIOS, that is (options that would come close to the 960 GPU's base clock of 540MHz). Who knows? Not I, til I *see this BIOS.

ID: 1684965
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1684975 - Posted: 28 May 2015, 3:49:18 UTC - in response to Message 1684965.  
Last modified: 28 May 2015, 3:53:53 UTC

Aha! (again) - Thanks, OzzFan.
So to take advantage on my current machine, I should go to .75/75% cores in Prefs?

And the same percent on the new one (more, I think), since I only have one discrete GPU,
which I run CUDAs on at a .25 GPU config - but the AP-OpenCL NV is configured with 1+1.
=
I'm currently planning to DVI my new iGD HD 4600 to my monitor only - to free up the discrete GPU's VRAM for crunching only.
I'll "unlist"/not use the Intel GPU/iGD in Prefs - until I see OC options for it in the BIOS, that is (options that would come close to the 960 GPU's base clock of 540MHz). Who knows? Not I, til I *see this BIOS.

Telling BOINC to only use 3 of the 4 CPU cores in your current machine is likely to increase its overall performance.
If you set BOINC CPU usage to 75%, the setting in your app_config telling the OpenCL AP app to use a whole core will still apply. So while AP tasks are running, only 2 CPU cores will be used for CPU tasks. You may want to update your app_config settings so that does not occur. Alternatively, you could modify your MB GPU app settings to also reserve a whole core, instead of changing the BOINC CPU setting.
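
For illustration only - a rough app_config.xml sketch of that second option. The app names here are assumptions; check them against the names in your own app_info.xml / client_state.xml before using anything like this:

<app_config>
   <app>
      <name>astropulse_v7</name>            <!-- assumed AP app name -->
      <gpu_versions>
         <gpu_usage>1.0</gpu_usage>         <!-- one AP task per GPU -->
         <cpu_usage>1.0</cpu_usage>         <!-- AP reserves a whole core (your current 1+1) -->
      </gpu_versions>
   </app>
   <app>
      <name>setiathome_v7</name>            <!-- assumed MB app name -->
      <gpu_versions>
         <gpu_usage>0.25</gpu_usage>        <!-- four MB tasks per GPU, like your .25 setting -->
         <cpu_usage>0.25</cpu_usage>        <!-- 4 x 0.25 = one whole core reserved while they run -->
      </gpu_versions>
   </app>
</app_config>

Set up that way, BOINC budgets the reserved core itself whenever GPU tasks are running, and the global "use 100% of the CPUs" preference can stay where it is.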

I'm not 100% sure what you are referring to in regards to the iGPU clock vs the GTX 960. Most of the iGPUs have a clock that runs 1.0-1.3 GHz. That doesn't really mean anything as they have far fewer shader units & thus a much lower GFLOPS benchmark. The iGPU in the i7-4790K should be rated at 400 GFLOPS according to the formula they use.
ID: 1684975
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1684997 - Posted: 28 May 2015, 5:17:51 UTC - in response to Message 1684981.  
Last modified: 28 May 2015, 5:25:14 UTC

Aha! (again) - Thanks, OzzFan.
So to take advantage on my current machine, I should go to .75/75% cores in Prefs?

And the same percent on the new one (more, I think), since I only have one discrete GPU,
which I run CUDAs on at a .25 GPU config - but the AP-OpenCL NV is configured with 1+1.
=
I'm currently planning to DVI my new iGD HD 4600 to my monitor only - to free up the discrete GPU's VRAM for crunching only.
I'll "unlist"/not use the Intel GPU/iGD in Prefs - until I see OC options for it in the BIOS, that is (options that would come close to the 960 GPU's base clock of 540MHz). Who knows? Not I, til I *see this BIOS.

Telling BOINC to only use 3 of the 4 CPU cores in your current machine is likely to increase its overall performance.
If you set BOINC CPU usage to 75%, the setting in your app_config telling the OpenCL AP app to use a whole core will still apply. So while AP tasks are running, only 2 CPU cores will be used for CPU tasks. You may want to update your app_config settings so that does not occur. Alternatively, you could modify your MB GPU app settings to also reserve a whole core, instead of changing the BOINC CPU setting.

I'm not 100% sure what you are referring to in regards to the iGPU clock vs the GTX 960. Most of the iGPUs have a clock that runs 1.0-1.3 GHz. That doesn't really mean anything as they have far fewer shader units & thus a much lower GFLOPS benchmark. The iGPU in the i7-4790K should be rated at 400 GFLOPS according to the formula they use.


The GPU in the i7-4790K is rated much lower than that by BOINC, as can be seen from the startup of my new computer:

2015-05-28 03:05:47 | | CUDA: NVIDIA GPU 0: GeForce GTX 980 (driver version 352.86, CUDA version 7.5, compute capability 5.2, 4096MB, 3066MB available, 5237 GFLOPS peak)
2015-05-28 03:05:47 | | OpenCL: NVIDIA GPU 0: GeForce GTX 980 (driver version 352.86, device version OpenCL 1.2 CUDA, 4096MB, 3066MB available, 5237 GFLOPS peak)
2015-05-28 03:05:47 | | OpenCL: Intel GPU 0: Intel(R) HD Graphics 4600 (driver version 10.18.14.4170, device version OpenCL 1.2, 1400MB, 1400MB available, 56 GFLOPS peak)
2015-05-28 03:05:47 | | OpenCL CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 4.2.0.148, device version OpenCL 1.2 (Build 148))

BOINC gets mine equally wrong as well.
OpenCL: Intel GPU 0: Intel(R) HD Graphics 4600 (driver version 10.18.14.4156, device version OpenCL 1.2, 1195MB, 1195MB available, 32 GFLOPS peak)
OpenCL CPU: Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 4.2.0.148, device version OpenCL 1.2 (Build 148))

The formula used for seventh-generation graphics (HD Graphics 4000, 5000) is EU * 8 * 2 * clock speed.
Given there are 20 execution units: 20 * 8 * 2 * 1200 MHz = 384,000 MFLOPS, or 384 GFLOPS. The extra 50MHz by default on the i7-4790K makes it 400 GFLOPS.

BOINC is much more optimistic about my HD6870 GPU.
OpenCL: ATI 0: ATI Radeon HD 6870 (Barts XT) (driver version 1573.4 (VM), device version OpenCL 1.2 AMD-APP (1573.4), 1024MB, 991MB available, 4032 GFLOPS peak)

When in reality it is rated at half of that.

Unless your GTX 980s are overclocked, it looks like BOINC is a bit over their rated 4612 GFLOPS as well.

NVIDIA & ATI both use the formula shaders * 2 * clock speed to compute the SP GFLOP rating. Trying to apply that to the iGPUs doesn't seem to come up with the numbers BOINC is spitting out either.

[sarcasm]It is troubling. As I know we all deeply depend on the benchmark values provided by BOINC for so many things.[/sarcasm]
ID: 1684997
Profile JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1685170 - Posted: 28 May 2015, 16:06:23 UTC - in response to Message 1684975.  
Last modified: 28 May 2015, 16:11:58 UTC

Glad you brought it up HAL
This, from Intel ARK:
Processor Graphics ‡ Intel® HD Graphics 4600
Graphics Base Frequency 350 MHz
Graphics Max Dynamic Frequency 1.25 GHz
Graphics Video Max Memory 1.7 GB
Graphics Output eDP/DP/HDMI/VGA
Execution Units 20
=
NVidia GTX 960 SC *Base clock is 540 MHz
There is a *multiplier at play here - for the life of me I can't pin it down
NV lists "Direct Compute" = 5.2 in one util / 5.0 in another (which I have always associated with Shaders) - the 960 has 1024 Unified shaders but says SM5.0
Yet, there are 8 Multiprocs - *Read as 8 CUs in the AP stderr

(So, what and which *multiplier do I use to get a clean number *here? (Intel))

Nor do I *understand the Intel Execution Units (20) - for comparative purposes

Memory clocks are a little more obvious.
Intel uses my DDR3 sys RAM, which will be 1600 MHz, but what's the bandwidth there?
NV = the 2048 GDDR5 VRAM, 128bit / 112 GB/s stuff

So *close/no cigar on the comparisons, albeit whatever I find OC-wise in the ASUS BIOS may exceed the ARK specs.

ID: 1685170
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1685220 - Posted: 28 May 2015, 17:36:45 UTC - in response to Message 1685170.  

Glad you brought it up HAL
This, from Intel ARK:
Processor Graphics ‡ Intel® HD Graphics 4600
Graphics Base Frequency 350 MHz
Graphics Max Dynamic Frequency 1.25 GHz
Graphics Video Max Memory 1.7 GB
Graphics Output eDP/DP/HDMI/VGA
Execution Units 20
=
NVidia GTX 960 SC *Base clock is 540 MHz
There is a *multiplier at play here - for the life of me I can't pin it down
NV lists "Direct Compute" = 5.2 in one util / 5.0 in another (which I have always associated with Shaders) - the 960 has 1024 Unified shaders but says SM5.0
Yet, there are 8 Multiprocs - *Read as 8 CUs in the AP stderr

(So, what and which *multiplier do I use to get a clean number *here? (Intel))

Nor do I *understand the Intel Execution Units (20) - for comparative purposes

Memory clocks are a little more obvious.
Intel uses my DDR3 sys RAM, which will be 1600 MHz, but what's the bandwidth there?
NV = the 2048 GDDR5 VRAM, 128bit / 112 GB/s stuff

So *close/no cigar on the comparisons, albeit whatever I find OC-wise in the ASUS BIOS may exceed the ARK specs.

The Intel "Execution Units" are the shaders. I'm not sure what you are talking about in regards to a multiplier. The "base Frequency" listed for the GPU is GPUs idle frequency when it slows down to to save power. Not to be confused with a CPU "Base Clock" value.
In the case of the GTX 960 it's default clock is 1127MHz when actively under load. That's where the default FLOP rating comes from with 1024 shaders. 1024*2*1128=2308 GFLOPS in SP. The 540MHz I imagine is your GPUs idle frequency. Which someone labeled as "base clock"? Base Clock in that instance could be used to describe the GPUs frequency floor. Rather than a root clock frequency.
ID: 1685220
Profile JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1685243 - Posted: 28 May 2015, 18:37:16 UTC - in response to Message 1685220.  

Ah. The Core Graphics (Base) clock is associated with the NV RAMDAC - the heart of the thing - which until recently (Fermi, then Kepler, and now Maxwell) has been 400 MHz. Jumped about 50MHz per change.

The multiplier I refer to is associated with the number of Proc Cores, each with its own RAMDAC backbone; similar to the SRAM cache on a CPU.

Shader Units figure into it *somewhere - as a bandwidth thing - a carburetor or engine of a sort, as I understand it - the SM module. Although the Direct Compute unit numbers don't match the number of Procs - rather, the SM module;
when it comes to discussions of *which unit is a CU - will the real CU please stand up?
=
Yes, the stark difference between Intel and NV in the Shaders and Base Clocks departments is pretty close to radical.

Add GDDR5 vs sys RAM DDR3 and the difference widens even more.
Comes down to apples-and-oranges comparisons and expectations,
ie don't expect a Volkswagen to beat that Vette off the line - live with it!

ID: 1685243
Profile JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1685262 - Posted: 28 May 2015, 19:29:14 UTC - in response to Message 1684975.  

Well, I did *try reducing cores to 3/75% in Boinc, and therefore on both the SETI and Einstein sites.
An overnight (about 8 hrs +/-) experiment involving about 30 GPU/CPU tasks.
=
Zero *observed benefits - of course, nearly 50% of the tasks were _0 trailers and went to Pending, so it was difficult to track runtime improvements
=
Seemingly smoother, faster ops with CUDAs by moving down to a .33 (from .25) config - but nothing dynamic/dramatic.
=
I terminated the Plan and reverted to 100% Cores
=
The computer case I'm waiting on arrived in Houston at noon - hope they're kidding that I must wait til June 1 for it to drive 200 miles to San Antonio and get to my porch!

ID: 1685262
rob smith Project Donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22202
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1685273 - Posted: 28 May 2015, 19:42:03 UTC

"_0" indicates that this task is the first of the initial replicate of a work unit, the other being "_1".
To the user there is no significance in these two, however if you see "_5" upwards it is a good indication of a problematic work unit. You will personally only ever get one "version" of a given work unit.

The best way of looking at changes in performance is to look at trends over several days (a week at least) as this will allow a wide range of work units to be processed and so give you a reasonable average. This is particularly true when there is a "shorty" storm (tasks which run abnormally quickly), or a pile of very slow tasks.

"Pending" indicates that validation has not been completed on that task.
When validation is complete the status will change to "valid" (and will vanish from your visible list after 24 hours), or, "inconclusive" (which means the two of you didn't agree on the result), or, "invalid" (which means your result has been rejected for one of a number of reasons.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1685273
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1685286 - Posted: 28 May 2015, 19:56:32 UTC - in response to Message 1685243.  

Ah. The Core Graphics (Base) clock is associated with the NV RAMDAC - the heart of the thing - which until recently (Fermi, then Kepler, and now Maxwell) has been 400 MHz. Jumped about 50MHz per change.

The multiplier I refer to is associated with the number of Proc Cores, each with its own RAMDAC backbone; similar to the SRAM cache on a CPU.

Shader Units figure into it *somewhere - as a bandwidth thing - a carburetor or engine of a sort, as I understand it - the SM module. Although the Direct Compute unit numbers don't match the number of Procs - rather, the SM module;
when it comes to discussions of *which unit is a CU - will the real CU please stand up?
=
Yes, the stark difference between Intel and NV in the Shaders and Base Clocks departments is pretty close to radical.

Add GDDR5 vs sys RAM DDR3 and the difference widens even more.
Comes down to apples-and-oranges comparisons and expectations,
ie don't expect a Volkswagen to beat that Vette off the line - live with it!

The shader units are a result of how many SM units there are in the GPU. For Maxwell there are 32 shaders per scheduler & 4 schedulers per SM. For the GTX 960 with 1024 shaders: 1024/32/4, or more simply 1024/128, = 8 SM. A view with clinfo, or the output of the OpenCL apps, will refer to the SM count as "Max compute units:"
For Intel GPUs the shaders are the same as what you see for "Max compute units:" in the OpenCL apps.
Then for ATI it can be found by taking Texture mapping units/4.
So everyone is doing it a completely different way!!!
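
(Worked example: the GTX 980 from the log above has 2048 shaders, so 2048 / 128 = 16 SM, which is what its "Max compute units:" line should report.)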

The memory speed is important, but isn't used when calculating FLOPS. FLOPS just gives us the maximum potential of the GPU. The memory coupled to it then determines how much of that potential you can actually use - just like using different speeds of memory with a CPU.

I'm not sure, but I think there are 1 or 2 Volkswagens that can take a Vette off of the line. :P


ID: 1685286
Profile JBird Project Donor
Joined: 3 Sep 02
Posts: 297
Credit: 325,260,309
RAC: 549
United States
Message 1685329 - Posted: 28 May 2015, 22:11:00 UTC - in response to Message 1685286.  

Ya, the math is impossible to figure out and get *reported numbers to *match.
Either sandbagging, or an unknown multiplier, or some other formula afoot.
ie - GPU-Z v0.8.3 reports the Maxwell GM206-A as SM5/Direct Compute 5
=
Actually, the best numbers I've seen from a what's-what-and-how-many utility, by far, have to be from GPU Caps Viewer (Geeks 3D). Do scroll down to the yellow banner for v1.23.0.2
Very comprehensive
I can't do math without Fractions in my results to save me......*!*
=
It is refreshing to see the actual live clock speed in stderr, even though it refers to a Kepler; at least it's not referencing the card's Base
Tuning tuning schmooning ;)

ID: 1685329