Feeding APs to nvidia GPU: CPU usage
Author | Message |
---|---|
atlov Joined: 11 Aug 12 Posts: 35 Credit: 32,718,664 RAC: 34 |
So, I decided to crunch some APs with my GTX660 for the first time. I'm using the Lunatics apps without any change to app_info.xml or to the cmdline parameters. I noticed that the task uses one full CPU core to feed the GPU, which is not the case when crunching MBs. GPU load stays above 97% the whole time and the memory controller load fluctuates strongly (that seems legit, but with 2 MBs it sits at a steady 70% or so). Is this CPU usage normal behaviour? I've seen several wingmen showing the same thing; on the other hand, some wingmen seem to use much less CPU time, but I can't see any pattern (GPU model, driver, OS, etc.). |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Most OpenCL GPU apps, including the AP one, need a full CPU core to feed the GPU. What you are seeing is normal. Cheers, Keith Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
atlov Joined: 11 Aug 12 Posts: 35 Credit: 32,718,664 RAC: 34 |
How can you say it's normal? Your machines don't seem to use much CPU time while crunching AP. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
atlov wrote: "So, I decided to crunch some APs with my GTX660 for the first time. I'm using the Lunatics apps without any change to app_info.xml or to the cmdline parameters."
The NVIDIA implementation of OpenCL does tend to do that with the AP7 OpenCL_NV app. The -use_sleep option can reduce CPU usage drastically, but it adds some latency, so run time grows somewhat. Increasing -unroll and the -ffa_block* settings helps counteract that runtime increase. For your GTX660, the 750Ti settings from the Readme might be fairly good:
-use_sleep -unroll 10 -oclFFT_plan 256 16 512 -ffa_block 12288 -ffa_block_fetch 6144
Joe |
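One way to picture the trade-off Joe describes, assuming (this thread does not confirm it) that the CPU load comes from the host thread spin-waiting for the GPU to finish: the Python sketch below fakes a "kernel" with a timer thread and compares a spin loop against a loop that sleeps between polls. It does not touch OpenCL or the AP app; `wait_for_fake_kernel` and `poll_interval` are hypothetical names used only for illustration.

```python
from __future__ import annotations

import threading
import time


def wait_for_fake_kernel(kernel_seconds: float, use_sleep: bool,
                         poll_interval: float = 0.001) -> tuple[float, float]:
    """Wait for a simulated GPU kernel; return (cpu_seconds, extra_latency_seconds).

    The "kernel" is just a timer thread that sets an Event; no GPU is involved.
    use_sleep=False spins on the flag (busy-wait); use_sleep=True sleeps between
    checks, trading CPU time for a slightly later wake-up.
    """
    done = threading.Event()
    finished_at = [0.0]

    def kernel_complete() -> None:
        finished_at[0] = time.perf_counter()  # moment the "GPU" finishes
        done.set()

    timer = threading.Timer(kernel_seconds, kernel_complete)
    cpu_start = time.process_time()  # CPU time of the whole process
    timer.start()
    if use_sleep:
        while not done.is_set():
            time.sleep(poll_interval)  # give the core away between polls
    else:
        while not done.is_set():
            pass  # spin: keeps one core busy until the flag flips
    noticed_at = time.perf_counter()  # moment the host notices completion
    cpu_used = time.process_time() - cpu_start
    timer.join()
    return cpu_used, noticed_at - finished_at[0]


if __name__ == "__main__":
    for mode in (False, True):
        cpu, lag = wait_for_fake_kernel(0.2, use_sleep=mode)
        print(f"use_sleep={mode}: cpu={cpu:.3f} s, extra latency={lag * 1000:.2f} ms")
```

The spin loop burns roughly the full wait in CPU time, while the sleeping loop uses very little but can notice completion up to one `poll_interval` late; the real app's latency cost per kernel is not measured here.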
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
What I failed to state was that I use the -use_sleep option along with customized -unroll settings for my GPUs, to get maximum performance on AP tasks. I use these settings for my 970s:
-use_sleep -unroll 18 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1 -hp
These go in the ap_cmdline_win_x86_SSE2_OpenCL_NV.txt file. As Josef pointed out, the sleep option dramatically reduces CPU time but incurs a latency hit. I also run with only 0.5 CPU usage per 0.33 GPU usage in my app_config.xml file. I'm willing to accept longer AP GPU runtimes so I can keep crunching CPU tasks at the same time. If I were looking for the shortest GPU runtimes, I wouldn't run CPU tasks at all; it just seems a waste not to use the CPU, since it has to feed the GPUs anyway. The CPU consumes much more power per task than the GPU, but I now only run during the day, while I'm making free solar power, and summer temps have returned. Cheers, Keith |
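Keith's "0.5 CPU per 0.33 GPU" maps onto the `gpu_usage`/`cpu_usage` pair in BOINC's app_config.xml. The sketch below writes such a file and does the arithmetic: at 0.33 GPU per task, three tasks fit on one GPU, which nominally budgets 1.5 CPU cores. The app name `astropulse_v7` is an assumption (check the `<name>` in your own app_info.xml), and how the BOINC client turns a fractional CPU budget into reserved cores is not modelled here.

```python
import math
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Return a minimal app_config.xml body setting GPU/CPU shares for one app."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed app name, see lead-in
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    ET.indent(root)  # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


def tasks_per_gpu(gpu_usage: float) -> int:
    # Tasks whose GPU shares sum to at most 1.0; tiny epsilon guards float rounding.
    return math.floor(1.0 / gpu_usage + 1e-9)


if __name__ == "__main__":
    print(build_app_config("astropulse_v7", 0.33, 0.5))
    n = tasks_per_gpu(0.33)
    print(f"{n} tasks per GPU, nominal CPU budget {n * 0.5} cores")
```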
atlov Joined: 11 Aug 12 Posts: 35 Credit: 32,718,664 RAC: 34 |
Thanks, folks, for your quick answers. I played with some AP resends that came in a few days ago, using the cmdline Josef suggested. Running 1xAP gives approx. the same crunching time for comparable workunits (0% blanking): 2250 s with the new cmdline vs. 2270 s with the default one, and CPU usage is significantly reduced. The drawback is that screen lag appears. 2xAPs running simultaneously still scale quite well (4666 s each), but the screen lag gets worse. Running 1xAP+1xMB slows down the MB, so it takes as long as the AP. |
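Turning atlov's timings into throughput makes "scale quite well" concrete. The figures are the ones reported in the post; `aps_per_hour` is just an illustrative helper. One AP every 2250 s is 1.6 APs/hour, while two at 4666 s each is about 1.543 APs/hour (roughly 2333 s per AP), i.e. about 3.6% lower throughput on these workunits.

```python
def aps_per_hour(seconds_per_task: float, concurrent: int) -> float:
    """Tasks finished per hour when `concurrent` tasks each take seconds_per_task."""
    return concurrent * 3600.0 / seconds_per_task


single = aps_per_hour(2250, 1)  # 1xAP, Josef's cmdline
double = aps_per_hour(4666, 2)  # 2xAP running side by side
print(f"1xAP: {single:.3f}/h, 2xAP: {double:.3f}/h, "
      f"effective {3600 / double:.0f} s per AP when doubled")
# prints: 1xAP: 1.600/h, 2xAP: 1.543/h, effective 2333 s per AP when doubled
```

Throughput alone ignores the screen lag atlov reports, which is the practical reason to prefer 1xAP on a GPU that also drives the display.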
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.