Fastest MB/AP cmdline settings for a NV GTX980Ti?

Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1735539 - Posted: 20 Oct 2015, 0:21:43 UTC
Last modified: 20 Oct 2015, 0:21:43 UTC

It would be nice if someone could tell me the fastest cmdline settings for a NV GTX980Ti VGA card.

Which...
pfblockspersm = N
pfperiodsperlaunch = N

...settings for MB, and which...
...settings for AP?

How many MB and AP tasks should run simultaneously? Both at 0.33 (3 per GPU), yes?

Thanks.
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1735543 - Posted: 20 Oct 2015, 0:35:24 UTC - in response to Message 1735539.  
Last modified: 20 Oct 2015, 0:37:05 UTC

Not sure if it's the fastest, but here is my command line for APs:

-use_sleep -unroll 18 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1 -hp



For MBs, replace the last few lines of mbcuda.cfg with this:

;[bus1slot0]
;;; Optional GPU specific control (requires Cuda 3.2 or newer app), example
processpriority = normal
pfblockspersm = 16
pfperiodsperlaunch = 400

Edit:

Make sure you have enough CPU for all your work units, and if running more than 2 GPUs, leave an extra core free.

Zalster
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1735556 - Posted: 20 Oct 2015, 1:23:44 UTC - in response to Message 1735543.  
Last modified: 20 Oct 2015, 1:38:22 UTC

On my J1900 CPU (incl. iGPU) with a NV GT730, I don't reserve CPU threads to support the GPU apps (on this PC it wouldn't make sense).
On my two E5-2630v2 CPUs (HT off) with four R9 Fury X's, every GPU app gets its own CPU thread for support.

How would it be with a NV GTX980Ti and an i7-5930K CPU (6 cores/12 threads)?
Does each AP GPU app get its own CPU thread?
Does each MB GPU app also get its own CPU thread?

...or for both (app_info.xml file entry):
<avg_ncpus>0.34</avg_ncpus>
<max_ncpus>0.34</max_ncpus>

...so that 3 GPU apps share 1 CPU thread?

And with HT off, so just 6 CPU cores?

Soon there will also be a 2nd GTX980Ti in this PC.

Thanks.
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1735569 - Posted: 20 Oct 2015, 1:52:58 UTC - in response to Message 1735556.  
Last modified: 20 Oct 2015, 2:07:07 UTC

When you use a single GPU you don't need to reserve a second CPU core.

When you do add that second 980Ti, then yes, you will need to reserve a second CPU core to help support the GPUs.

The debate on hyperthreading goes both ways. I would say leave it on, but for simplicity let's say leave it off.


For both (app_info.xml file entry), I would use 0.33 instead of 0.34. Here is why. If you use 0.34

<avg_ncpus>0.34</avg_ncpus>
<max_ncpus>0.34</max_ncpus>

then with just 1 GPU that would be 1.2 cores (at this point someone is going to say the computer is going to round up to 2 full cores). So let us use 0.33 instead. When you do that, it's 0.99, rounded to 1.

When you add your second GPU that will be 2 cores of your 6 cores.

You would want to leave 2 of the remaining 4 cores free.

That would allow 2 CPU work units if you choose to do that.

If you are planning on crunching on the CPU as well, then you might want to look at adding

project_max_concurrent

to an app_config.xml so that you can limit the total number of work units. In this case, 3 work units per GPU on 2 GPUs is 6, plus the 2 CPU work units, so the max concurrent would be 8.

<app_config>
   <project_max_concurrent>8</project_max_concurrent>
</app_config>

You won't need this when you are only crunching on 1 GPU.

If you choose to use the hyperthread then you have to rethink the math.

Edited to show where project_max_concurrent goes in an app_config.xml.
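The core budget and the project_max_concurrent arithmetic above can be sketched in Python. This is a minimal sketch under this post's own assumptions (one CPU core reserved per GPU, two cores left idle, 3 tasks per GPU), not a rule taken from BOINC; `plan_host` and `app_config_xml` are hypothetical helper names.

```python
def plan_host(cores: int, gpus: int, tasks_per_gpu: int, idle_cores: int = 2) -> dict:
    """Core budget following the post above.

    Assumed rules from this post, not BOINC's: 1 core per GPU, 2 idle cores.
    """
    reserved_for_gpus = gpus  # one CPU core per GPU, shared by that GPU's tasks
    cpu_tasks = cores - reserved_for_gpus - idle_cores
    if cpu_tasks < 0:
        raise ValueError("not enough CPU cores for this layout")
    return {
        "reserved_for_gpus": reserved_for_gpus,
        "cpu_tasks": cpu_tasks,
        # all GPU tasks plus all CPU tasks = work units allowed at once
        "project_max_concurrent": gpus * tasks_per_gpu + cpu_tasks,
    }


def app_config_xml(max_concurrent: int) -> str:
    """Minimal app_config.xml holding only project_max_concurrent."""
    return (
        "<app_config>\n"
        f"   <project_max_concurrent>{max_concurrent}</project_max_concurrent>\n"
        "</app_config>\n"
    )


# i7-5930K with HT off (6 cores), two GTX980Ti, 3 tasks per GPU
plan = plan_host(cores=6, gpus=2, tasks_per_gpu=3)
print(plan)  # {'reserved_for_gpus': 2, 'cpu_tasks': 2, 'project_max_concurrent': 8}
print(app_config_xml(plan["project_max_concurrent"]))
```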
AyalaZero
Volunteer tester
Joined: 14 Aug 05
Posts: 21
Credit: 10,910,119
RAC: 0
United States
Message 1735615 - Posted: 20 Oct 2015, 4:04:34 UTC - in response to Message 1735543.  
Last modified: 20 Oct 2015, 4:04:34 UTC

How do I change the command line for APs? In other words, where do I put this line...

"-use_sleep -unroll 18 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1 -hp"

I ran the Lunatics installer with the correct check boxes marked. It is running pretty fast right now, but... if I can make it go faster, why not?? :)

Thank you,
AyalaZero
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1735617 - Posted: 20 Oct 2015, 4:21:15 UTC - in response to Message 1735615.  
Last modified: 20 Oct 2015, 4:22:15 UTC

This requires the ProgramData folder to be unhidden.

If you have done that already, here is the series of folders you need to go through to get to the command line file. If not, skip down to the bottom of this post and follow the instructions there.

Click Computer, then Local Disk --> ProgramData --> BOINC --> projects --> setiathome.berkeley.edu

You will now be in the setiathome folder.

Look for the following file:

ap_cmdline_win_x86_SSE2_OpenCL_NV

When you find it, RIGHT-click the file. This brings up a pop-up menu; select EDIT (not Open).

It will ask you what program you want to use to edit this file.

Select Notepad.

Copy the commands I listed exactly. In other words, highlight the entire line, right-click, and select Copy. I suggest this because an extra or missing space will affect how it works.

Once you have the file open, click inside it, right-click, and select Paste.

Select Save from the pull-down menu, then close the window.

That should do it.

The next time APs are split it should read these new commands.
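The same edit can also be scripted instead of done in Notepad. The Python sketch below is only an illustration: it assumes the default C:\ProgramData\BOINC data folder on the C: drive, and it matches the file by its name without extension because Explorer may hide the extension. `write_ap_cmdline` is a hypothetical helper, not part of BOINC or the Lunatics installer. It writes the command line as one line with single spaces, which is the whole point of copying it exactly.

```python
from pathlib import Path

# Assumed default BOINC data folder on drive C: (not checked on your system).
SETI_DIR = Path(r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu")
CMDLINE_NAME = "ap_cmdline_win_x86_SSE2_OpenCL_NV"  # name as shown in Explorer

AP_ARGS = ("-use_sleep -unroll 18 -oclFFT_plan 256 16 256 -ffa_block 16384 "
           "-ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1 -hp")


def write_ap_cmdline(project_dir: Path, args: str = AP_ARGS) -> Path:
    """Overwrite the AP cmdline file with `args` on a single line."""
    # Explorer may hide the extension, so match on the name without it.
    matches = [p for p in project_dir.iterdir()
               if p.is_file() and p.stem == CMDLINE_NAME]
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected one {CMDLINE_NAME} file, found {len(matches)}")
    # Collapse doubled spaces or line breaks picked up while copying.
    matches[0].write_text(" ".join(args.split()), encoding="ascii")
    return matches[0]


# Usage on the crunching PC:
# write_ap_cmdline(SETI_DIR)
```

If a backup copy with the same base name sits in the folder, the helper refuses to guess and raises instead.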

If you have never unhidden your folders, here is how to do it:

Follow these steps to display hidden files and folders.
Open Folder Options by clicking the Start button, clicking Control Panel, clicking Appearance and Personalization, and then clicking Folder Options.
Click the View tab.
Under Advanced settings, click Show hidden files and folders, and then click OK.

If you intend to modify mbcuda.cfg, do the same: use Edit and Notepad.

Good luck....
AyalaZero
Volunteer tester
Joined: 14 Aug 05
Posts: 21
Credit: 10,910,119
RAC: 0
United States
Message 1735624 - Posted: 20 Oct 2015, 4:32:20 UTC - in response to Message 1735617.  
Last modified: 20 Oct 2015, 4:32:20 UTC

Ok, thanks, I have done that; now I will have to wait until I receive more AP tasks. As for the mbcuda.cfg file, do I delete what is already in there? I replaced the last few lines and left the instructions in there as well. This is what I have in the mbcuda.cfg file:

;;; This configuration file is for optional control of Cuda Multibeam x41zc
;;; Currently, the available options are for
;;; application process priority control (without external tools), and
;;; per gpu priority control (useful for multiple Cuda GPU systems)
[mbcuda]
;;;;; Global applications settings, to apply to all Cuda devices
;;; You can uncomment the processpriority line below, by removing the ';', to engage machine global priority control of x41x
;;; possible options are 'belownormal' (which is the default), 'normal', 'abovenormal', or 'high'
;;; For dedicated crunching machines, 'abovenormal' is recommended
;;; raising global application priorities above the default
;;; may have system dependent usability effects, and can have positive or negative effects on overall throughput
;processpriority = abovenormal
;;; Pulsefinding: Advanced options for long pulsefinds (affect display usability & long kernel runs)
;;; defaults are conservative.
;;; WARNING: Excessive values may induce display lag, driver timeout & recovery, or errors.
;;; pulsefinding blocks per multiprocessor (1-16), default is 1 for Pre-Fermi, 4 for Fermi or newer GPUs
;pfblockspersm = 8
;;; pulsefinding maximum periods per kernel launch (1-1000), default is 100, as per 6.09
;pfperiodsperlaunch = 200

;[bus1slot0]
;;; Optional GPU specific control (requires Cuda 3.2 or newer app), example
processpriority = normal
pfblockspersm = 16
pfperiodsperlaunch = 400

Is this correct? I'm new to this optimization stuff. :\

Thanks so much for your help!!

AyalaZero
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1735626 - Posted: 20 Oct 2015, 4:35:48 UTC - in response to Message 1735624.  
Last modified: 20 Oct 2015, 4:35:48 UTC

Yes that is fine.

You can leave all the other stuff; only the last 3 lines matter.
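One way to see why only the last three lines matter is to read the file with Python's standard configparser. This rests on an assumption: that the x41zc app parses mbcuda.cfg like an ordinary INI file in which every line starting with ';' is a comment. Under that reading `;[bus1slot0]` is commented out, so the three settings land in the global [mbcuda] section and apply to every Cuda device. The fallbacks (belownormal, 4, 100) and the allowed ranges come from the comments in the file itself; `effective_mbcuda` is a hypothetical helper.

```python
import configparser

# The tail of the mbcuda.cfg posted above, with most comment lines trimmed.
MBCUDA_CFG = """\
[mbcuda]
;processpriority = abovenormal
;pfblockspersm = 8
;pfperiodsperlaunch = 200

;[bus1slot0]
processpriority = normal
pfblockspersm = 16
pfperiodsperlaunch = 400
"""


def effective_mbcuda(text: str) -> dict:
    """Active [mbcuda] settings, with defaults taken from the cfg comments.

    Assumption: the app reads mbcuda.cfg like a standard INI file.
    """
    parser = configparser.ConfigParser(comment_prefixes=(";",))
    parser.read_string(text)
    sec = parser["mbcuda"]
    settings = {
        "processpriority": sec.get("processpriority", fallback="belownormal"),
        "pfblockspersm": sec.getint("pfblockspersm", fallback=4),
        "pfperiodsperlaunch": sec.getint("pfperiodsperlaunch", fallback=100),
    }
    # Ranges stated in the cfg comments: 1-16 and 1-1000.
    if not 1 <= settings["pfblockspersm"] <= 16:
        raise ValueError("pfblockspersm must be 1-16")
    if not 1 <= settings["pfperiodsperlaunch"] <= 1000:
        raise ValueError("pfperiodsperlaunch must be 1-1000")
    return settings


print(effective_mbcuda(MBCUDA_CFG))
# {'processpriority': 'normal', 'pfblockspersm': 16, 'pfperiodsperlaunch': 400}
```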

Be aware that these settings are for pure crunching machines.

If you use these on daily drivers, they may cause lag or stuttering when you try to use the machine for activities other than crunching.

These changes give work units a higher priority for the system's resources than they would normally have.

Good Luck.
AyalaZero
Volunteer tester
Joined: 14 Aug 05
Posts: 21
Credit: 10,910,119
RAC: 0
United States
Message 1735631 - Posted: 20 Oct 2015, 4:39:30 UTC - in response to Message 1735626.  
Last modified: 20 Oct 2015, 4:39:30 UTC

Ok, thank you so much. I have saved backups of the original files in case I have to revert. But my system is set to suspend when non-BOINC usage is above 4%, i.e. whenever I am using my computer for everyday stuff. :)

AyalaZero
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1735651 - Posted: 20 Oct 2015, 7:48:08 UTC - in response to Message 1735569.  
Last modified: 20 Oct 2015, 7:48:08 UTC

> For both (app_info.xml file entry), I would use 0.33 instead of 0.34. Here is why. If you use 0.34
>
> <avg_ncpus>0.34</avg_ncpus>
> <max_ncpus>0.34</max_ncpus>
>
> then with just 1 GPU that would be 1.2 cores (at this point someone is going to say the computer is going to round up to 2 full cores). So let us use 0.33 instead. When you do that, it's 0.99, rounded to 1

No
0.99 is rounded to 0
1.20 is rounded to 1
1.99 is rounded to 1

And why 1.2?!
0.34 * 3 = 1.02 which is rounded to 1
0.34 * 4 = 1.36 which is rounded to 1

(Unless this kind of rounding/truncating was changed in latest BOINC versions)
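BilBg's truncation rule is easy to check with a few lines of Python. This only models the rule as stated in this post (multiply avg_ncpus by the number of running GPU tasks, then drop the fractional part); it was not checked against BOINC's own scheduler code, which, as noted, may have changed. `cpus_reserved` is a hypothetical name. The last two rows apply the same arithmetic to two GPUs running 3 tasks each.

```python
import math


def cpus_reserved(avg_ncpus: float, gpu_tasks: int) -> int:
    """Whole CPU cores set aside for GPU tasks.

    Models BilBg's truncation rule; not checked against BOINC's scheduler.
    """
    total = avg_ncpus * gpu_tasks
    # Small epsilon so float noise such as 0.9999999999 is not truncated to 0.
    return math.floor(total + 1e-9)


for avg_ncpus, tasks in [(0.33, 3), (0.34, 3), (0.34, 4), (0.33, 6), (0.34, 6)]:
    print(f"{avg_ncpus} x {tasks} = {avg_ncpus * tasks:.2f} -> "
          f"{cpus_reserved(avg_ncpus, tasks)} core(s)")
# 0.33 x 3 = 0.99 -> 0 core(s)
# 0.34 x 3 = 1.02 -> 1 core(s)
# 0.34 x 4 = 1.36 -> 1 core(s)
# 0.33 x 6 = 1.98 -> 1 core(s)
# 0.34 x 6 = 2.04 -> 2 core(s)
```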
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1735680 - Posted: 20 Oct 2015, 11:02:30 UTC - in response to Message 1735631.  
Last modified: 20 Oct 2015, 11:09:24 UTC

> Ok, thank you so much. I have saved backups of the original files in case I have to revert. But my system is set to suspend when non-BOINC usage is above 4%, i.e. whenever I am using my computer for everyday stuff. :)
>
> AyalaZero

If you set such a low % for suspending crunching, then every time you open software or a tool that uses CPU time, BOINC will suspend crunching of the project tasks (CPU & GPU).
When that software/tool uses less than the set %, all project tasks restart from their last checkpoint.
So in the worst case, while you are using the PC, this will happen in a loop and the project tasks will make no progress.

AFAIK, in the worst case the project tasks could end with the 'too many exits' error.

I wouldn't use these settings.
The CPU project tasks use the lowest process priority.

If everything is set as above (with HT off)...
1 CPU core for 3 GPU apps
5 CPU cores for crunching project tasks
...then you have 5 CPU cores immediately ready for your daily work (the up to 5 project tasks are not suspended; they just wait for CPU time, so no checkpoint restart is needed).

On my J1900 CPU (incl. iGPU) and NV GT730 PC (my daily work PC)...
I have no CPU core reserved for GPU app crunching (on this PC it wouldn't make sense: the iGPU and VGA card are not very fast; the iGPU performs about like 1 CPU core, and the CPU and VGA card have about the same performance).
All GPU apps have process priority 'high'. All apps optimized.
24/7 fully loaded (CPU & GPU).
All daily work is running smoothly. :-)
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1735687 - Posted: 20 Oct 2015, 13:02:12 UTC - in response to Message 1735651.  
Last modified: 20 Oct 2015, 13:13:34 UTC


> And why 1.2?!


It's called being awake too long and too tired to go back and recheck my math. Yes, you are correct, it is 1.02.

Edit:
Since the title of this thread is "Fastest MB/AP cmdline settings for a NV GTX980Ti?", I took this to mean a full-time cruncher.

The settings I gave are for a machine dedicated to crunching 24/7.

I even pointed out that if you use the computer for anything other than crunching, it might slow down other work done on that computer. If that is a concern, then yes, don't use them.

How do I know these work? Look at the stats pages. I can't speak for Petri, since he has some special mods on his system, but I know Perano uses these on his machine, and it has been #1 for the last several months. My own machine traded spots with him at that position until I moved it over to Einstein. But again, the choice is yours.
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1735696 - Posted: 20 Oct 2015, 14:03:44 UTC
Last modified: 20 Oct 2015, 14:09:20 UTC

> Not sure if it's the fastest, but here is my command line for APs:
>
> -use_sleep -unroll 18 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1 -hp


It also depends on how many tasks are running at once.
If -use_sleep is in place, I'd suggest increasing -unroll to 22.
That's for the 980 Ti only, because it has 22 compute units.
If you experience lag or other issues, reduce the -ffa_block and -ffa_block_fetch values to 12288 and 6144.
This is faster on the 980 Ti.
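Applied to Zalster's line, these suggestions give two variants. The Python sketch below just rebuilds the string; `ap_cmdline` and its `low_lag` flag are hypothetical names, and reading "12288 6144" as the new -ffa_block and -ffa_block_fetch values respectively is an interpretation of the post.

```python
def ap_cmdline(unroll: int, low_lag: bool = False) -> str:
    """Zalster's AP command line with Mike's suggested tweaks applied."""
    # Interpretation: '12288 6144' replaces -ffa_block / -ffa_block_fetch.
    ffa_block, ffa_block_fetch = (12288, 6144) if low_lag else (16384, 8192)
    return " ".join([
        "-use_sleep",
        f"-unroll {unroll}",
        "-oclFFT_plan 256 16 256",
        f"-ffa_block {ffa_block}",
        f"-ffa_block_fetch {ffa_block_fetch}",
        "-tune 1 64 4 1",
        "-tune 2 64 4 1",
        "-hp",
    ])


print(ap_cmdline(unroll=22))                # suggested for a 980 Ti with -use_sleep
print(ap_cmdline(unroll=22, low_lag=True))  # fallback if lag or other issues appear
```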


With each crime and every kindness we birth our future.
AyalaZero
Volunteer tester
Joined: 14 Aug 05
Posts: 21
Credit: 10,910,119
RAC: 0
United States
Message 1735717 - Posted: 20 Oct 2015, 20:07:30 UTC - in response to Message 1735696.  
Last modified: 20 Oct 2015, 20:07:30 UTC

I want settings as though it is a full-time cruncher. :) I will change -unroll to 22 when I get home. HT is currently on, though; do I NEED to turn it off? I guess I can leave it on for a week with the current settings, then turn HT off and see which gives me a stable PC throughout. So far I haven't run into any major hiccups. I'll let you guys know what happens, for the next guy.

AyalaZero
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1735720 - Posted: 20 Oct 2015, 20:20:59 UTC - in response to Message 1735717.  
Last modified: 20 Oct 2015, 20:20:59 UTC

You can leave Hyperthreading on.

I do with my system.

The main thing to look at is whether run times are longer with HT on than with it off.
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1735721 - Posted: 20 Oct 2015, 20:25:09 UTC - in response to Message 1735696.  
Last modified: 20 Oct 2015, 20:25:09 UTC


> It also depends on how many tasks are running at once.
> If -use_sleep is in place, I'd suggest increasing -unroll to 22.
> That's for the 980 Ti only, because it has 22 compute units.
> If you experience lag or other issues, reduce the -ffa_block and -ffa_block_fetch values to 12288 and 6144.
> This is faster on the 980 Ti.


Would you recommend 22 for the Titan Xs, since they have 24 CUs?
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1735731 - Posted: 20 Oct 2015, 21:25:11 UTC - in response to Message 1735721.  


>> It also depends on how many tasks are running at once.
>> If -use_sleep is in place, I'd suggest increasing -unroll to 22.
>> That's for the 980 Ti only, because it has 22 compute units.
>> If you experience lag or other issues, reduce the -ffa_block and -ffa_block_fetch values to 12288 and 6144.
>> This is faster on the 980 Ti.
>
>
> Would you recommend 22 for the Titan Xs, since they have 24 CUs?


The Titan X can handle -unroll 24 easily.
In principle, you can say 1 unroll per compute unit.
Just to make it clear: this doesn't apply to mid-range cards, only high-end ones.
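This rule of thumb fits in a small lookup. In the Python sketch below, `suggested_unroll` is a hypothetical helper; the 22 and 24 compute-unit counts are the ones given in this thread, and returning None for mid-range cards only reflects that the post gives no rule for them.

```python
from typing import Optional

# Compute units of the two high-end cards named in this thread.
COMPUTE_UNITS = {"GTX 980 Ti": 22, "GTX TITAN X": 24}


def suggested_unroll(compute_units: int, high_end: bool) -> Optional[int]:
    """Rule of thumb from the post: 1 unroll per compute unit, high-end only."""
    if not high_end:
        return None  # no rule given here for mid-range cards; keep your current value
    return compute_units


for card, cus in COMPUTE_UNITS.items():
    print(card, "-> -unroll", suggested_unroll(cus, high_end=True))
```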


With each crime and every kindness we birth our future.
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1735732 - Posted: 20 Oct 2015, 21:27:27 UTC - in response to Message 1735720.  
Last modified: 20 Oct 2015, 21:28:37 UTC

> You can leave Hyperthreading on.
>
> I do with my system.
>
> The main thing to look at is whether run times are longer with HT on than with it off.


The sharper the timings, the bigger the chance that the app will stall if no CPU cores are left free.
But that's on you.


With each crime and every kindness we birth our future.
AyalaZero
Volunteer tester
Joined: 14 Aug 05
Posts: 21
Credit: 10,910,119
RAC: 0
United States
Message 1735839 - Posted: 21 Oct 2015, 2:52:42 UTC - in response to Message 1735732.  

I have another question. Reading the Astropulse_OpenCL_NV README file, it states the max work group size is 1024. Is that the CUDA cores? If so, the 980 Ti has 2816; would I be able to modify my settings differently, or am I completely wrong in my assumption? Here are the Nvidia specs: http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-980-ti/specifications

The README states the following:

Tune values must be equal or less than max work group size.
Most modern Nvidia cards have work group size of 1024.

possible values:

-tune 1 256 4 1
-tune 1 128 8 1
-tune 1 64 16 1
-tune 1 32 32 1
-tune 1 16 64 1


Intensive testing highlighted -tune 1 64 8 1 -tune 2 64 8 1 to be fastest on mid-range and high-end GPUs.
On entry level cards -tune 1 128 8 1 -tune 2 128 8 1 should be fastest.

Thanks in advance. Sorry, this is waaay over my knowledge base.

AyalaZero
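On the work group question: in OpenCL the maximum work group size limits how many work items run together in one group; it is not a count of the card's CUDA cores, so the 2816 cores do not raise the 1024 figure. Every alternative listed in the README multiplies out to exactly 1024 (256 x 4 x 1, 128 x 8 x 1, and so on), which suggests the limit applies to the product of the three numbers after the first. The Python sketch below checks a command line on that assumption; `parse_tunes` and `check_tunes` are hypothetical helpers, not part of the AP app.

```python
from typing import List, Tuple

MAX_WORK_GROUP_SIZE = 1024  # "most modern Nvidia cards", per the README quote


def parse_tunes(cmdline: str) -> List[Tuple[int, int, int, int]]:
    """Collect every '-tune N X Y Z' group from an AP command line."""
    tokens = cmdline.split()
    tunes = []
    for i, tok in enumerate(tokens):
        if tok == "-tune":
            n, x, y, z = (int(t) for t in tokens[i + 1:i + 5])
            tunes.append((n, x, y, z))
    return tunes


def check_tunes(cmdline: str, limit: int = MAX_WORK_GROUP_SIZE) -> None:
    """Raise ValueError if any tune's X*Y*Z product exceeds the limit.

    Assumption: the limit applies to the product (every README example = 1024).
    """
    for n, x, y, z in parse_tunes(cmdline):
        if x * y * z > limit:
            raise ValueError(f"-tune {n}: {x}*{y}*{z} = {x * y * z} > {limit}")


zalster = "-use_sleep -unroll 18 -tune 1 64 4 1 -tune 2 64 4 1 -hp"
print(parse_tunes(zalster))  # [(1, 64, 4, 1), (2, 64, 4, 1)]
check_tunes(zalster)                          # 64*4*1 = 256, passes
check_tunes("-tune 1 64 8 1 -tune 2 64 8 1")  # README's pick: 512, passes
```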
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1735842 - Posted: 21 Oct 2015, 2:59:51 UTC - in response to Message 1735839.  
Last modified: 21 Oct 2015, 3:00:17 UTC

The README file is probably old. Mike, who answered suggesting -unroll 22, is the "go-to guy" on these matters.

In fact, I believe he wrote the Read me file.

The values I listed were given to me by him sometime early this year.

I would listen and follow any advice he has to offer on tuning the cards.


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.