Message boards :
Number crunching :
CUDA 4.2 app, don't want to split in two
The_Matrix (Joined: 17 Nov 03, Posts: 414, Credit: 5,827,850, RAC: 0)
Hi there. I have an Nvidia GT 640 card. While it's crunching, GPU usage sits at about 75%. I don't want to split it 50%/50%; is there a way to let the graphics card crunch a single work unit at a full 100% usage? I've tried the *.cfg files, but nothing happened. Any ideas? Greetings
Jeanette (Joined: 25 Apr 15, Posts: 55, Credit: 7,827,469, RAC: 0)
I use an app_config.xml file placed in the ..\BOINC\projects\setiathome.berkeley.edu folder. For running 2 concurrent WUs on my GTX 970M card with an i7 (8 cores), together with 5 CPU WUs, it contains:

<app_config>
  <app>
    <name>astropulse_v7</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>setiathome_v8</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>setiathome_v7</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

I am experimenting with running 3 concurrent GPU WUs, to test whether it performs better than running 2.
The_Matrix (Joined: 17 Nov 03, Posts: 414, Credit: 5,827,850, RAC: 0)
OK, that's the way I already know, but it won't do. I'd prefer to raise the usage for ONE work unit; so far I've failed.
Jeanette (Joined: 25 Apr 15, Posts: 55, Credit: 7,827,469, RAC: 0)
Sorry, I misunderstood you. I don't think it's possible to get 100% utilization of a GPU running one WU. You'd probably have to code your own application; even the Lunatics optimized applications cannot drive the GPU to 100%, at least not on my GPUs. But try Lunatics and see if they improve GPU utilization.
rob smith (Joined: 7 Mar 03, Posts: 22189, Credit: 416,307,556, RAC: 380)
Jeanette is correct: there is no way to get a single work unit to utilize 100% of a GPU's cores. It comes down to the way the GPU does its own internal management of memory, cores, etc., which is why we run multiple tasks per GPU.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Ulrich Metzner (Joined: 3 Jul 02, Posts: 1256, Credit: 13,565,513, RAC: 13)
Jason mentioned an apparently little-known option of the CUDA apps: http://setiathome.berkeley.edu/forum_thread.php?id=79019&postid=1763301#1763301 The command-line option is "-poll", which actively polls the status of the GPU. This may increase GPU usage, although at the cost of almost an entire CPU core.
Aloha, Uli
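For anyone running the CUDA apps under BOINC's anonymous platform, a switch like this is normally passed through the <cmdline> element of the matching <app_version> block in app_info.xml. A hedged sketch only: the app name, version number, and plan class below are placeholders, so copy the real values from your own app_info.xml rather than these.

```xml
<app_version>
  <app_name>setiathome_v7</app_name>  <!-- placeholder: use your app's name -->
  <version_num>700</version_num>      <!-- placeholder -->
  <plan_class>cuda42</plan_class>     <!-- placeholder -->
  <cmdline>-poll</cmdline>            <!-- actively poll GPU status -->
  <!-- keep your existing file_ref / coproc entries unchanged -->
</app_version>
```

The client reads app_info.xml only at startup, so restart BOINC after editing it.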
The_Matrix Send message Joined: 17 Nov 03 Posts: 414 Credit: 5,827,850 RAC: 0 |
Greatfull, on cuda5.0 its about 90-100% (jumping), on cuda4.2 about 85-99 %. I keep looking on the runing times. My goodness, everywhere it takes to LONG ... |
Jord (Joined: 9 Jun 99, Posts: 15184, Credit: 4,362,181, RAC: 3)
rob smith wrote: "there is no way to get a single work unit to utilize 100% of a GPU's cores - it is all to do with the way that the GPU does its own internal management of memory, cores etc."
I've been pondering this statement since I read it, and I don't think it's correct. As far as I know:
- any task will always be divided in equal parts over all available cores inside a GPU, or its compute units;
- the GPU Load as seen in an application such as GPU-Z will not be 100% with a single task.
The significance of that statement is different from what you are saying, Rob. The biggest problem with the GPU Load in GPU-Z is that no one really knows what it means, partly because GPU vendors don't explain how it is determined. So one can read it directly off the GPU, but there is no explanation of what the numbers actually signify.
The_Matrix (Joined: 17 Nov 03, Posts: 414, Credit: 5,827,850, RAC: 0)
Is there any other software that can display the usage correctly? I'm using NvidiaInspector too, but is it any good?
Jeanette (Joined: 25 Apr 15, Posts: 55, Credit: 7,827,469, RAC: 0)
I use Open Hardware Monitor (http://openhardwaremonitor.org). It seems to give good results, and you can let it draw a graph to show usage over time (up to 24 hours).
Bill G (Joined: 1 Jun 01, Posts: 1282, Credit: 187,688,550, RAC: 182)
I have used these handy graphing programs for a long time: http://8gadgetpack.net/ Sometimes a bit of a pain to load, but worth the effort. I use the All CPU Meter and the GPU Meter: the CPU meter shows all cores, and the GPU meter shows one GPU at a time (if you have more than one). They used to be gadgets in Windows 7 but now work in Windows 10 as well.
SETI@home classic workunits: 4,019
SETI@home classic CPU time: 34,348 hours
HAL9000 (Joined: 11 Sep 99, Posts: 6534, Credit: 196,805,888, RAC: 57)
rob smith wrote: "there is no way to get a single work unit to utilize 100% of a GPU's cores - it is all to do with the way that the GPU does its own internal management of memory, cores etc."
I figured it was a measurement of used clock cycles, like is used for CPU load. Also, just because all the clock cycles are used doesn't mean that a processor is under full load, much like you can rev an engine to max RPM while not in gear. I demonstrated this to a friend of mine in the 90s by making an app that just did a simple calculation, like 1+1, indefinitely. Apps would show 100% CPU usage, but the CPU temperature would not change much from idle, whereas something like setiathome.exe would show a significant increase in CPU temperature.
SETI@home classic workunits: 93,865
CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
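HAL9000's 1+1 demonstration is easy to reproduce. A minimal Python sketch (the half-second duration is arbitrary): a loop spinning on a trivial calculation keeps the core billed as ~100% busy, even though almost none of the chip's execution resources are doing useful work.

```python
import time

def busy_loop(seconds: float) -> float:
    """Spin on a trivial calculation and return the fraction of
    wall-clock time the OS billed to this process as CPU time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    x = 0
    while time.perf_counter() - wall_start < seconds:
        x = 1 + 1  # trivial work, yet the core never idles
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall

if __name__ == "__main__":
    ratio = busy_loop(0.5)
    print(f"CPU time / wall time: {ratio:.2f}")  # close to 1.00: "100% usage"
```

A task manager reads essentially the same ratio, which is why "100% CPU" says nothing about how hard the silicon is actually working.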
Jord (Joined: 9 Jun 99, Posts: 15184, Credit: 4,362,181, RAC: 3)
HAL9000 wrote: "I figured it was a measurement of used clock cycles like is used for CPU load."
CPU Load is an average of the load of all the CPU cores per time unit, and even then it's not that simple; read the Wiki article on it for that. But CPU Load cannot simply be compared to GPU Load. For example, when throwing kernels at an Nvidia GPU, it's not the case that half its CUDA cores can be fully loaded, a quarter half loaded, and a quarter not loaded. That's not how a GPU works: they're either all fully loaded or all not loaded. On or off, all cores at the same time.

So one Seti task running on the CPU will fully load one CPU core. That same Seti task will fully load one GPU with 1280 cores, running as 1280 parts of the task divided over 1280 kernels. This is where the speed-up of the calculations comes from: the ability to do all the work simultaneously on multiple cores.

When running two tasks on the GPU, it's not that task one runs on half the cores and task two on the other half. Instead, the tasks are switched in and out of memory: task one takes all the cores, then task two takes all the cores, then task one again, and so on.
The_Matrix (Joined: 17 Nov 03, Posts: 414, Credit: 5,827,850, RAC: 0)
@Ageless: So, as I understand it, 2 GPU work units cost extra calculation time and energy that would not be necessary if the GPU cores really were split 50% each?
petri33 (Joined: 6 Jun 02, Posts: 1668, Credit: 623,086,772, RAC: 156)
Sometimes running 2 or more tasks at a time will help.

GPU kernel synchronization: CUDASYNC; in the current code forces the CPU to wait for completion of all GPU processing in that process. Then the GPU waits for the CPU to send in new data over the PCIe bus, or waits for a new kernel to be launched. That leaves plenty of time to process other tasks on the GPU, if they are sitting there ready to be processed. There is an overhead in task switching.

One way to keep the GPU busy is to use GPU queues. Data transfer to and from main memory (CPU) can overlap processing, and processing can be performed in multiple queues, allowing many kernels to run side by side if those kernels do not utilize the GPU fully. That happens sometimes when the data is such that it cannot be processed by all cores or SMM/SMX units, or when algorithmic limitations prevent the use of all available resources. Overlapping data transfers and kernels (chirp, pulse finding, Gaussian fitting, autocorrelations, spike finding, sending data to the GPU, receiving results, ...) can result in massive speed-ups. Shorties can be done in 67 seconds, 0.06 AR tasks can be run in 14 minutes, and mid-range tasks in 3 minutes (under 200 sec).

One major gain in performance is related to autocorrelations. The current implementation needs to repack (actually expand) the existing data to four times its original size, which strains GPU memory (read - (expand) - write - reread (for processing)).

To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
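The queue idea petri33 describes is, at heart, double buffering: while chunk N is being processed, chunk N+1 is already in transfer. This is not GPU code, just a language-agnostic Python sketch of the scheduling pattern, with a thread and a one-slot queue standing in for a CUDA copy stream; the toy "transfer" and "process" callables are made up for illustration.

```python
import threading
import queue

def overlapped_pipeline(chunks, transfer, process):
    """Process every chunk, overlapping the transfer of chunk N+1
    with the processing of chunk N (double buffering)."""
    ready = queue.Queue(maxsize=1)  # one transferred chunk "in flight"

    def transfer_worker():
        for chunk in chunks:
            ready.put(transfer(chunk))  # stand-in for host->device copy
        ready.put(None)                 # sentinel: no more chunks

    threading.Thread(target=transfer_worker, daemon=True).start()

    results = []
    while (buf := ready.get()) is not None:
        results.append(process(buf))    # stand-in for the GPU kernels
    return results

# Toy usage: "transfer" doubles each chunk, "process" adds one.
print(overlapped_pipeline([1, 2, 3], lambda c: c * 2, lambda b: b + 1))
# [3, 5, 7]
```

The payoff is the same as with real CUDA streams: the processor is never stalled waiting for the next batch of data, because the transfer of that batch happened while the previous one was being crunched.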
jason_gee (Joined: 24 Nov 06, Posts: 7489, Credit: 91,093,184, RAC: 0)
Yep, sooo much fun ahead :)
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.