Massive reduction in GPU processing speed - help needed

Questions and Answers : GPU applications : Massive reduction in GPU processing speed - help needed
chris lee

Joined: 30 May 04
Posts: 9
Credit: 22,759,278
RAC: 0
United Kingdom
Message 1787648 - Posted: 15 May 2016, 6:01:26 UTC

Hi

Over the last week or two, the time taken to complete a work unit (v8 cuda 50) on my 980 Ti has increased from about 25 minutes to an hour or more. I have not changed any operating variables.

For several months, the GPU has been working on 5 units at a time without a problem. CPU usage per work unit is set to 0.04, and BOINC's processor preferences are set to use at most 80% of the CPUs and at most 80% of CPU time.

I had experimented with running 3, 4 or 5 work units concurrently on the GPU and found peak output to be at 5.

RAC for the computer has fallen from about 35,000 to about 25,000.

I am currently trying just a single work unit on the GPU and it does appear to be a lot faster, but then it should be!

Link to my computers: http://setiathome.berkeley.edu/hosts_user.php?userid=7868408

I have checked things like GPU clock speeds and operating temperatures, and all are as they should be (1,354 MHz core, 3,304 MHz memory, 68 °C, fan at 70% duty).

My laptop has not experienced a similar reduction in processing output, so the problem does appear to be localised to my desktop.

Any ideas or suggestions appreciated.



System: i7-3770K @ 4.4 GHz, 16 GB 2133 MHz RAM, Asus P8Z77 motherboard, Palit Super Jet 980 Ti GPU, 850 W PSU
ID: 1787648
rob smith
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1787652 - Posted: 15 May 2016, 6:10:06 UTC

Five tasks at a time on a GTX 980 Ti is far too many; reduce this to two or three and you will find performance increases dramatically. With a GTX 980 I was seeing about 18-20 minutes per task, but when "VLARs" were released this shot up to nearer an hour; after dropping back to 2 tasks the time has come back down to about 20 minutes.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1787652
chris lee

Joined: 30 May 04
Posts: 9
Credit: 22,759,278
RAC: 0
United Kingdom
Message 1787657 - Posted: 15 May 2016, 6:55:16 UTC - in response to Message 1787652.  

Thanks for the reply, Bob.

I have changed the number of tasks to 3 and will see what effect that has. So far, it does appear to have brought the time back down to about 20 minutes per work unit.

It seems that these 'VLARs' have reduced output quite significantly. For quite a few months I was crunching 5 at a time quite happily, so having to cut back to 3 is quite a reduction.
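
For anyone else who wants to do the same: the usual way to set how many tasks run at once on the GPU is an app_config.xml file in the project's folder under the BOINC data directory. A minimal sketch, assuming the SETI@home v8 application name is setiathome_v8 - check the name your own client actually reports before copying this:

    <app_config>
      <app>
        <name>setiathome_v8</name>
        <gpu_versions>
          <gpu_usage>0.33</gpu_usage>
          <cpu_usage>0.04</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>

Here gpu_usage is the fraction of a GPU each task reserves, so 0.33 gives three tasks per card and 0.5 gives two; cpu_usage is the CPU fraction budgeted per GPU task. The client picks the file up when you tell the manager to re-read config files, or after a restart.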
ID: 1787657
tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1787695 - Posted: 15 May 2016, 15:34:50 UTC

I am crunching one GPU task at a time, both on the GTX 750 Ti in my Windows 10 PC and on the AMD HD 7770 in a Linux host. So when I installed the new Nvidia driver on the Windows 10 PC I lost only one of the three Einstein@home GPU tasks: only one was running, the other 2 were waiting to start.
Tullio
ID: 1787695
chris lee

Joined: 30 May 04
Posts: 9
Credit: 22,759,278
RAC: 0
United Kingdom
Message 1787711 - Posted: 15 May 2016, 17:13:56 UTC - in response to Message 1787695.  
Last modified: 15 May 2016, 17:31:02 UTC

When running 2 tasks at a time, it takes between 11 and 15 minutes per task to complete.

When running 3, it normally takes between 15 and 22 minutes per task.

I have tested both settings with about 25 tasks completed, which I realise is too small a sample to give any real evidence, but that is why I am raising this post.

There are a bunch of tasks that seem to take 2 or 3 times longer to complete, regardless of how many are being worked on. From what I can see, they all have a name that starts 'blc2_2bit_guppi' and take 35 to 55 minutes each.

Prior to making my OP this morning, I had changed no settings and my recent RAC had dropped from about 45,000 to 35,000 as can be seen here http://www.teamocuk.co.uk/cpartcred.php?p=SAH&u=7868408

All a bit confusing knowing what settings to apply!

EDIT:
A look at the properties of some of the work units shows that one which took 35 minutes required about 184,000 GFLOPs, whereas one that took 10 minutes required 181,000 GFLOPs. More head scratching!!!
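
Putting those two figures side by side gives a rough idea of how far the effective processing rate falls on the slow tasks. A back-of-the-envelope sketch using only the numbers above (BOINC's GFLOPs value is just an estimate of the work in a task, so treat the result as indicative):

    # Rough effective throughput from the two example tasks above.
    def effective_rate(estimated_gflop, minutes):
        """Average GFLOP/s over the task's elapsed time."""
        return estimated_gflop / (minutes * 60)

    slow = effective_rate(184_000, 35)   # guppi/VLAR-type task
    fast = effective_rate(181_000, 10)   # ordinary task
    print(f"slow: {slow:.0f} GFLOP/s, fast: {fast:.0f} GFLOP/s, "
          f"ratio: {fast / slow:.1f}x")
    # Roughly 88 vs 302 GFLOP/s - about a 3.4x difference in effective speed.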
ID: 1787711
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1787737 - Posted: 15 May 2016, 22:03:21 UTC - in response to Message 1787711.  

More head scratching!!!

If you want to read:
"Average Credit Decreasing?"
http://setiathome.berkeley.edu/forum_thread.php?id=79418

"GBT ('guppi') .vlar tasks will be send to GPUs, what you think about this?"
http://setiathome.berkeley.edu/forum_thread.php?id=79548
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1787737
chris lee

Joined: 30 May 04
Posts: 9
Credit: 22,759,278
RAC: 0
United Kingdom
Message 1787842 - Posted: 16 May 2016, 8:31:58 UTC - in response to Message 1787737.  

Thanks for the links BilBg.

I had a (quick) read through them and it's quite interesting that these VLARs are causing a perceived problem.

I'm not overly worried about RAC, as at the end of the day it is just a number. My personal concern comes from a desire to contribute to SETI and therefore to know that my efforts are as substantial as possible. It is BOINC / SETI who introduced RAC as a way to measure that, so that is what I naturally use to monitor efficiency.

All I can see is that the VLAR work units appear to cause a massive reduction in computing output as measured by RAC. The number of GFLOPs is more or less the same, but the running time is 3 times longer. That represents a threefold reduction in computational output, whatever the unit of measure used.

I will just have to delete VLAR units manually and download / upload in batches, if that is the only currently viable workaround, and hope that a better fix is found.
ID: 1787842
rob smith
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1787856 - Posted: 16 May 2016, 13:19:07 UTC

Eric has warned that there may be periods when the only data coming through will be from the GBT, and most of that is VLAR, so you could well shoot yourself in the foot by deleting them.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1787856
chris lee

Joined: 30 May 04
Posts: 9
Credit: 22,759,278
RAC: 0
United Kingdom
Message 1787867 - Posted: 16 May 2016, 14:52:25 UTC - in response to Message 1787856.  

If there is no other data than VLARs then I will happily work on those and still know it is the best I can do. It does not make an awful lot of difference to me other than the stated desire to contribute what I can.

The shooting of oneself in the foot may be more appropriate for the person / organisation that reduces the effective performance of their own system by a factor of 3....

In the meantime, I have a workaround thanks to the help you have given me regarding the number of processes to run on my GPU - thank you.

I am now running 3 tasks concurrently with an average time of 13 minutes per task, versus the original 5 at a time at 20 minutes each, which works out to almost the same throughput.

I did try 4 at a time, but the average time then rose to about 18 minutes, and tasks are up to 10 times slower if a VLAR does get worked on.
ID: 1787867
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1787900 - Posted: 16 May 2016, 16:40:50 UTC - in response to Message 1787867.  
Last modified: 16 May 2016, 17:22:57 UTC

Very Low Angle Range (VLAR) = WU recorded when the Telescope is looking at one point in the sky (a dedicated observation) = the most valuable data

"Normal" Angle Range (~0.42) = WU recorded when the Telescope is "not moving", just rotating with the Earth
- in the 107 seconds a WU covers, the Telescope passes 0.42° of the sky:
  (360/86400)*107 = 0.44°
[I'm not sure why I have "Normal" Angle Range as ~0.42 in my mind while this calculation gives 0.44° - maybe because 86,400 seconds (24 h) is the period for the Earth to turn back towards the Sun rather than towards the distant stars...]

Very High Angle Range (VHAR) = WU recorded when someone moves the Telescope quickly across the sky (VHAR WUs are also known in this forum as "shorties"; they take much less time to compute)

P.S.
If you want a very technical explanation from one of the developers/programmers of why GPUs are slow on VLARs:
http://setiathome.berkeley.edu/forum_thread.php?id=79418&postid=1787909#1787909
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1787900
chris lee

Joined: 30 May 04
Posts: 9
Credit: 22,759,278
RAC: 0
United Kingdom
Message 1787978 - Posted: 16 May 2016, 21:20:03 UTC - in response to Message 1787900.  
Last modified: 16 May 2016, 21:31:04 UTC

Thank you for the explanation. I hadn't even worked out what VLAR stood for and was a bit nervous to ask!

The science involved is beyond my understanding (although I do now at least understand what the different work units are!).


Having people monitor their own contributions, and possibly even compete in a friendly manner with each other, is one of the reasons distributed computing is both popular and useful, in my opinion. The developers of BOINC and SETI perhaps need to understand that the average user (me) does not understand, nor perhaps even want to understand, the exact science that goes on here, but does want a simple way to measure the contribution.

If we were talking about the exchange of labour or services, then currency would be the correct unit of measure. I don't need to be an economist to use money or even earn it; I just need to know its relative value.

Anyway, the science is fascinating, I do support the project, and I am happy that I am able to contribute in what I feel is a meaningful way by putting the processing power of my PC to use when I'm not using it. A win-win scenario.
ID: 1787978
chris lee

Joined: 30 May 04
Posts: 9
Credit: 22,759,278
RAC: 0
United Kingdom
Message 1788390 - Posted: 18 May 2016, 11:20:51 UTC

Been doing some experimenting.

When not running VLAR units, the 980 Ti is most efficient when working on 4 units concurrently: the GPU outputs an average of 14.7 units per hour.

At 3 concurrent work units, that drops to 10.6 per hour.

At 5 concurrent work units, it is 13.

I am allocating 0.04 CPUs per work unit.

With VLAR work units, what I have noticed is that the CPU usage is far greater than the allocated 0.04 CPUs: Task Manager shows 13% for EACH (opencl_nvidia_SoG) process that is running.

I believe this is what caused the vast reduction in output I had noticed originally. Running 4 or 5 VLAR processes was requiring 60 to 70% of CPU power, rather than the tiny amount implied by the 5 x 0.04 CPUs allocated.

What I am now doing is reducing CPU usage for non-GPU work to about 30%, leaving plenty of headroom.

The GPU has just completed 4 VLAR units concurrently at an average time of 59 minutes each. That is, however, using 4 x 13% = 52% of available CPU processing power in addition to the GPU.

Hope this information may be helpful.
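
As a side note, anyone who wants to check the per-process CPU load without watching Task Manager can total it up with a small script. A minimal sketch using the third-party psutil package; the "SoG" name filter is taken from the process name quoted above and may differ on other installs:

    import time

    import psutil  # third-party package: pip install psutil

    # Find every running process whose name contains "SoG" (the OpenCL
    # SETI@home app mentioned above) and report its CPU usage.
    procs = [p for p in psutil.process_iter(['name'])
             if 'SoG' in (p.info['name'] or '')]

    for p in procs:
        p.cpu_percent(None)   # first call just starts the per-process counter

    time.sleep(1.0)           # sample over a one-second window

    for p in procs:
        try:
            # Note: this is percent of a single core; divide by the number of
            # logical cores for a Task-Manager-style share of the whole CPU.
            print(f"{p.info['name']} (pid {p.pid}): {p.cpu_percent(None):.1f}%")
        except psutil.NoSuchProcess:
            pass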
ID: 1788390
Galacticminor

Joined: 28 Apr 02
Posts: 7
Credit: 9,526,884
RAC: 11
United Kingdom
Message 1789341 - Posted: 21 May 2016, 22:20:14 UTC

This may be a silly question, but does more work get done by processing 2-3 work units on the GPU at once as opposed to just one at a time? If it's the same, then it's probably simpler and more efficient just to leave it doing one at a time...

-Andrew
ID: 1789341
Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1789358 - Posted: 21 May 2016, 23:37:04 UTC - in response to Message 1789341.  

It depends on your GPU, but put simply, yes.

You will need to run multiple instances on your GPU and see what your times are.

Take the time it takes to do 2 at once and divide that by 2, then compare this to the time it took to do 1 work unit at a time.

Do the same for 3: take the time to run 3 at once, divide by 3, and compare it to the time to do 1 at a time.

Whichever gives you the lowest number is the best setting.
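
The same comparison written out as a small script - a minimal sketch with made-up timings, just to show the bookkeeping (substitute your own measured averages):

    # Average elapsed minutes per task when running n tasks at once
    # (hypothetical numbers - replace with your own observations).
    observed = {1: 8.0, 2: 13.0, 3: 17.0}

    for n, minutes in observed.items():
        per_task = minutes / n   # effective minutes per task
        print(f"{n} at a time: {per_task:.1f} min/task, "
              f"{60 / per_task:.1f} tasks/hour")

    best = min(observed, key=lambda n: observed[n] / n)
    print(f"Best setting: {best} at a time")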
ID: 1789358
chris lee

Joined: 30 May 04
Posts: 9
Credit: 22,759,278
RAC: 0
United Kingdom
Message 1789743 - Posted: 23 May 2016, 9:53:10 UTC - in response to Message 1789358.  

Just remember to compare like for like (which was my original problem).

The VLAR work units take an hour or more, whereas the non-VLARs take 10 to 20 minutes depending on how many you crunch at a time.
ID: 1789743
