CUDA 4.2 app, don't want to split in two

The_Matrix
Volunteer tester

Joined: 17 Nov 03
Posts: 414
Credit: 5,827,850
RAC: 0
Germany
Message 1764050 - Posted: 11 Feb 2016, 13:25:46 UTC

Hi there.

I've got a GT 640 Nvidia card here. While it's crunching, GPU usage sits at about 75%.

I don't want to split it 50%/50%. Is there a way to let the graphics card crunch at a full 100% usage?

I've tried the *.cfg files; nothing happened.

Any ideas?

Greetings
ID: 1764050
Jeanette
Volunteer tester

Joined: 25 Apr 15
Posts: 55
Credit: 7,827,469
RAC: 0
Denmark
Message 1764053 - Posted: 11 Feb 2016, 14:16:35 UTC - in response to Message 1764050.  
Last modified: 11 Feb 2016, 14:22:10 UTC

I use an app_config.xml file placed in the ..\BOINC\projects\setiathome.berkeley.edu folder.

For running 2 concurrent WUs on my GTX 970M card with an i7 (8 cores), together with 5 CPU WUs, it contains:

<app_config>
  <app>
    <name>astropulse_v7</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.50</gpu_usage>
      <cpu_usage>0.50</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>setiathome_v8</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.50</gpu_usage>
      <cpu_usage>0.50</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>setiathome_v7</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.50</gpu_usage>
      <cpu_usage>0.50</cpu_usage>
    </gpu_versions>
  </app>
</app_config>


I am experimenting with running 3 concurrent GPU WUs, to see if that performs better than running 2.
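A 3-at-a-time experiment would presumably use the same file with the reservations cut to a third. A sketch of what that might look like (my guess at the values, not an actual file from this thread):

```xml
<app_config>
  <app>
    <name>setiathome_v8</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <!-- reserve a third of the GPU and a third of a CPU core per task,
           so the BOINC scheduler will start three at once -->
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>0.33</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

Note that gpu_usage only tells the BOINC scheduler how much of the GPU each task reserves; it does not enforce any actual utilization level.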
ID: 1764053
The_Matrix
Volunteer tester

Joined: 17 Nov 03
Posts: 414
Credit: 5,827,850
RAC: 0
Germany
Message 1764055 - Posted: 11 Feb 2016, 14:32:40 UTC

OK, that's the way I already know about, but it won't do.

I'd prefer to increase the usage of ONE workunit; so far I've failed.
ID: 1764055
Jeanette
Volunteer tester

Joined: 25 Apr 15
Posts: 55
Credit: 7,827,469
RAC: 0
Denmark
Message 1764056 - Posted: 11 Feb 2016, 14:39:57 UTC - in response to Message 1764055.  
Last modified: 11 Feb 2016, 14:42:52 UTC

Sorry, I misunderstood you.

I don't think it's possible to get 100% utilization of a GPU running one WU.
You'd probably have to code your own application. Even the Lunatics optimized applications cannot get the GPU to run at 100% (at least not on my GPUs), but try Lunatics to see if they improve GPU utilization.
ID: 1764056
rob smith
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22189
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1764058 - Posted: 11 Feb 2016, 14:53:14 UTC

Jeanette is correct - there is no way to get a single work unit to utilize 100% of a GPU's cores - it is all to do with the way that the GPU does its own internal management of memory, cores etc.

Which is why we run multiple tasks per GPU
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1764058
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1764073 - Posted: 11 Feb 2016, 15:49:04 UTC

Jason mentioned an apparently unknown option of the CUDA apps:
http://setiathome.berkeley.edu/forum_thread.php?id=79019&postid=1763301#1763301
The command-line option is "-poll", which makes the app actively poll the status of the GPU. This may increase GPU usage, although at the cost of nearly an entire CPU core.
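For anyone running the anonymous platform, one place such a switch can live is the <cmdline> element of an app_version block in app_info.xml. The app name, version number, and plan class below are illustrative only, not taken from this thread:

```xml
<app_version>
  <app_name>setiathome_v7</app_name>
  <version_num>700</version_num>
  <plan_class>cuda42</plan_class>
  <!-- extra arguments passed to the science app at startup -->
  <cmdline>-poll</cmdline>
  <coproc>
    <type>CUDA</type>
    <count>1</count>
  </coproc>
</app_version>
```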
Aloha, Uli

ID: 1764073
The_Matrix
Volunteer tester

Joined: 17 Nov 03
Posts: 414
Credit: 5,827,850
RAC: 0
Germany
Message 1764105 - Posted: 11 Feb 2016, 17:37:24 UTC
Last modified: 11 Feb 2016, 18:09:40 UTC

Thanks! On CUDA 5.0 it's about 90-100% (jumping around), on CUDA 4.2 about 85-99%.

I'll keep an eye on the running times.

My goodness, everywhere it takes SO long ...
ID: 1764105
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1765152 - Posted: 15 Feb 2016, 10:52:29 UTC - in response to Message 1764058.  

there is no way to get a single work unit to utilize 100% of a GPU's cores - it is all to do with the way that the GPU does its own internal management of memory, cores etc.

I've been pondering this statement since I read it, and I don't think it's correct.

As far as I know:
- any task will always be divided into equal parts over all available cores inside a GPU or its compute units.
- the GPU Load seen in an application such as GPU-Z will not be 100% with a single task.

The significance of that statement is different from what you are saying, Rob.

The biggest problem with the GPU Load figure in GPU-Z is that no one really knows what it means, not least because GPU vendors don't explain how it is determined. You can read it directly off the GPU, but there is no explanation of what the numbers actually signify.
ID: 1765152
The_Matrix
Volunteer tester

Joined: 17 Nov 03
Posts: 414
Credit: 5,827,850
RAC: 0
Germany
Message 1765153 - Posted: 15 Feb 2016, 10:57:05 UTC

Is there any other software that can display the usage correctly? I'm using NvidiaInspector too, but is it "good"?
ID: 1765153
Jeanette
Volunteer tester

Joined: 25 Apr 15
Posts: 55
Credit: 7,827,469
RAC: 0
Denmark
Message 1765157 - Posted: 15 Feb 2016, 11:27:40 UTC - in response to Message 1765153.  
Last modified: 15 Feb 2016, 11:29:41 UTC

I use Open Hardware Monitor (http://openhardwaremonitor.org). It seems to give good results, and you can let it draw a graph to show usage over time (up to 24 hours).
ID: 1765157
Bill G
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1765187 - Posted: 15 Feb 2016, 15:20:35 UTC - in response to Message 1765157.  

I have used these handy graphing programs for a long time: http://8gadgetpack.net/
Sometimes a bit of a pain to load, but worth the effort. I use the All CPU Meter and the GPU Meter: the CPU meter shows all cores, and the GPU meter shows one GPU at a time (if you have more than one).
They used to be gadgets in W7 but now work in W10 as well.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1765187
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1765359 - Posted: 16 Feb 2016, 4:25:44 UTC - in response to Message 1765152.  

The biggest problem with the GPU Load in GPU-Z is that no one really knows what it means, also because GPU vendors don't explain how it is determined. So one can read it directly off of the GPU, but there is no explanation on what the numbers actually signify.

I figured it was a measurement of used clock cycles like is used for CPU load.

Also, just because all the clock cycles are used doesn't mean the processor is under full load. Much like you can rev an engine to max RPM while not in gear.
I demonstrated this to a friend of mine in the '90s by making an app that just did a simple calculation, like 1+1, indefinitely. Apps would show 100% CPU usage, but the CPU temp would not change much from idle, whereas something like setiathome.exe would show a significant increase in CPU temp.
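That demo is easy to reproduce. A minimal sketch of the idea (mine, not the original '90s app) in Python:

```python
import time

def busy_loop(seconds):
    """Spin on a trivial '1+1'-style addition for the given wall time.

    Task Manager reports ~100% usage for the core running this loop,
    even though the work done per cycle is negligible.
    """
    deadline = time.monotonic() + seconds
    n = 0
    while time.monotonic() < deadline:
        n = n + 1  # the trivially cheap "calculation"
    return n

iterations = busy_loop(1.0)  # pegs one core for a second
```

Watching CPU temperature while this runs, versus while a real number-cruncher runs, shows the difference described above: "busy" is not the same as "fully loaded".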
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1765359
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1765392 - Posted: 16 Feb 2016, 8:24:53 UTC - in response to Message 1765359.  

I figured it was a measurement of used clock cycles like is used for CPU load.

CPU Load is an average of the load of all the CPU cores per time unit. And even then, it's not that simple. Read this Wiki article for that.

But CPU Load cannot simply be compared to GPU Load. For example, when throwing kernels at an Nvidia GPU, it's not that half its CUDA cores can be fully loaded, a quarter half loaded and a quarter not loaded. That's not how a GPU works: the cores are either all loaded or all idle. On or off, all cores at the same time.

So one Seti task running on the CPU will fully load one CPU core.
That same Seti task will fully load one GPU with 1280 cores, running as 1280 parts of the task divided over 1280 kernels. This is where the speed-up of the calculations comes from: the ability to do all the work simultaneously on multiple cores.

When running two tasks on the GPU, it's not that task one runs on half the cores and task two on the other half. Instead the tasks are switched in and out of memory: task one takes all the cores, then task two takes all the cores, then task one again, and so on.
ID: 1765392
The_Matrix
Volunteer tester

Joined: 17 Nov 03
Posts: 414
Credit: 5,827,850
RAC: 0
Germany
Message 1765397 - Posted: 16 Feb 2016, 8:36:30 UTC

@Ageless

So if I understand it correctly, 2 GPU workunits cost extra calculation time and energy that wouldn't be necessary if the GPU cores really were split 50% each?
ID: 1765397
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1765400 - Posted: 16 Feb 2016, 8:51:28 UTC
Last modified: 16 Feb 2016, 8:52:16 UTC

Sometimes running 2 or more tasks at a time will help.

GPU kernel synchronisation (CUDASYNC in the current code) forces the CPU to wait for completion of all GPU processing in that process. Then the GPU waits for the CPU to send in new data over the PCIe bus, or for a new kernel to be launched. That leaves plenty of time to process other tasks on the GPU, if they are there waiting, ready to be processed.

There is an overhead in task switching.

One way to keep the GPU busy is to use GPU queues. Data transfer to and from main memory (CPU) can overlap processing, and processing can be performed in multiple queues, allowing many kernels to run side by side if those kernels do not utilize the GPU fully. That happens sometimes when the data is such that it cannot be processed by all cores or SMM/SMX units, or when algorithmic limitations prevent the use of all available resources.

Overlapping data transfers and kernels (chirp, pulse find, gauss fitting, autocorrelations, spike finding, sending data to the GPU, receiving results, ...) can result in massive speed-ups. Shorties can be done in 67 seconds, 0.06 AR tasks in 14 minutes, and mid-range tasks in 3 minutes (under 200 sec).

One major gain in performance is related to autocorrelations. The current implementation needs to repack (expand, actually) existing data to four times its original size. That strains the GPU memory (read, expand-write, then re-read for processing).
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1765400
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1765403 - Posted: 16 Feb 2016, 9:04:50 UTC - in response to Message 1765400.  

Yep, Sooo much fun ahead :)
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1765403

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.