Message boards : Number crunching : Fastest MB/AP cmdline settings for a NV GTX980Ti?
Author | Message |
---|---|
Sutaru Tsureku (Joined: 6 Apr 07, Posts: 7105, Credit: 147,663,825, RAC: 5)

It would be nice if someone could tell me the fastest cmdline settings for an NV GTX980Ti card.

Which settings for MB:

pfblockspersm = N
pfperiodsperlaunch = N

...and which settings for AP?

How many MB and AP tasks should run simultaneously? Both at 0.33, i.e. three at a time? Thanks.
Zalster (Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242)

Not sure if it's the fastest, but here is my commandline for APs:

-use_sleep -unroll 18 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1 -hp

For MBs, replace the last few lines of the mbcuda.cfg with this:

;[bus1slot0]
;;; Optional GPU specifc control (requires Cuda 3.2 or newer app), example
processpriority = normal
pfblockspersm = 16
pfperiodsperlaunch = 400

Edit: Make sure you have enough CPU for all your work units, and if running more than 2 GPUs, leave an extra core free.

Zalster
Sutaru Tsureku (Joined: 6 Apr 07, Posts: 7105, Credit: 147,663,825, RAC: 5)

On my J1900 CPU (with iGPU) and NV GT730 I don't reserve CPU threads to support the GPU apps (on that PC it wouldn't make sense). On my machine with two E5-2630v2 CPUs (HT off) and four R9 Fury X's, every GPU app gets its own CPU thread for support.

How would it be with an NV GTX980Ti and an i7-5930K CPU (6 cores / 12 threads)? Does each AP GPU app get its own CPU thread? Does each MB GPU app also get its own CPU thread? Or should I use, for both (app_info.xml entry):

<avg_ncpus>0.34</avg_ncpus>
<max_ncpus>0.34</max_ncpus>

...so that 3 GPU apps share 1 CPU thread? And HT off, so just 6 CPU cores? Soon there will be a 2nd GTX980Ti in this PC as well. Thanks.
Zalster (Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242)

When you use a single GPU you don't need to reserve a second CPU core. When you do add that second 980Ti, then yes, you will need to reserve a second CPU core to help support the GPUs.

The debate on hyperthreading goes both ways. I would say leave it on, but for simplicity let's say you leave it off.

For both (app_info.xml entry) I would use 0.33 instead of 0.34. Here is why. If you use 0.34:

<avg_ncpus>0.34</avg_ncpus>
<max_ncpus>0.34</max_ncpus>

then with just 1 GPU that would be 1.2 cores. (At this point someone is going to say the computer is going to round up to 2 full cores.) So let us use 0.33 instead. When you do that it's 0.99, rounded to 1.

When you add your second GPU that will be 2 of your 6 cores. You would want to leave 2 of the remaining 4 cores free, which would allow 2 CPU work units if you choose to run them. If you are planning on crunching on the CPU as well, then you might want to look at adding a project_max_concurrent to an app_config.xml so that you can limit the total number of work units. In this case 3 work units per GPU times 2 GPUs is 6, plus the 2 CPU work units, so the max concurrent would be 8:

<project_max_concurrent>8</project_max_concurrent>

You won't need this while you are only crunching on 1 GPU. If you choose to use hyperthreading, then you have to redo the math.

Edited to show where project_max_concurrent goes in an app_config.xml.
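For context, the project_max_concurrent element Zalster mentions is a top-level entry in BOINC's app_config.xml, placed in the project's directory. A minimal sketch (the value 8 matches his worked example above; adjust it to your own GPU and CPU task counts):

```xml
<app_config>
  <!-- cap total concurrent tasks: 3 per GPU x 2 GPUs + 2 CPU tasks = 8 -->
  <project_max_concurrent>8</project_max_concurrent>
</app_config>
```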
AyalaZero (Joined: 14 Aug 05, Posts: 21, Credit: 10,910,119, RAC: 0)

How do I change the commandline for APs? I.e., where do I put this line:

-use_sleep -unroll 18 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1 -hp

I installed the Lunatics installer with the correct check boxes marked. It is running pretty fast right now, but if I can make it go faster, why not?? :)

Thank you,
AyalaZero
Zalster (Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242)

This requires having the ProgramData folder unhidden. If you have done that already, here is the series of folders you need to go through to reach the commandline file; if not, skip to the bottom of this post and follow the instructions there.

Computer, then click on Local Disk --> ProgramData --> BOINC --> projects --> setiathome.berkeley.edu

You will now be in the SETI@home folder. Look for the following file:

ap_cmdline_win_x86_SSE2_OpenCL_NV

When you find it, RIGHT-click on the file; this will bring up a pop-up menu. Select EDIT (not Open). It will ask what program you want to use to edit the file; select Notepad.

Copy the commands that I listed exactly. In other words, highlight the entire line, right-click, and copy. The reason I suggest this is that an extra space, or a missing one, will affect how it works. Once you have the file open, right-click in it and select Paste. Select Save from the pull-down menu, then close the window. That should do it. The next time AP tasks start, they should pick up these new commands.

If you have never unhidden your folders, here is how to do it. Follow these steps to display hidden files and folders:

Open Folder Options by clicking the Start button, clicking Control Panel, clicking Appearance and Personalization, and then clicking Folder Options. Click the View tab. Under Advanced settings, click Show hidden files and folders, and then click OK.

If you intend to modify the mbcuda.cfg, do the same using Edit and Notepad.

Good luck....
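After following these steps, the ap_cmdline file should contain just the single line Zalster posted earlier, with no quotes and no line breaks:

```
-use_sleep -unroll 18 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1 -hp
```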
AyalaZero (Joined: 14 Aug 05, Posts: 21, Credit: 10,910,119, RAC: 0)

Ok, thanks, I have done that; now I will have to wait until I receive more AP tasks.

As for the mbcuda.cfg file, do I delete what is already in there? I replaced the last few lines and left the instructions in there as well. This is what I have in the mbcuda.cfg file:

;;; This configuration file is for optional control of Cuda Multibeam x41zc
;;; Currently, the available options are for
;;; application process priority control (without external tools), and
;;; per gpu priority control (useful for multiple Cuda GPU systems)
[mbcuda]
;;;;; Global applications settings, to apply to all Cuda devices
;;; You can uncomment the processpriority line below, by removing the ';', to engage machine global priority control of x41x
;;; possible options are 'belownormal' (which is the default), 'normal', 'abovenormal', or 'high'
;;; For dedicated crunching machines, 'abovenormal' is recommended
;;; raising global application priorities above the default
;;; may have system dependant usability effects, and can have positive or negative effects on overall throughput
;processpriority = abovenormal
;;; Pulsefinding: Advanced options for long pulsefinds (affect display usability & long kernel runs)
;;; defaults are conservative.
;;; WARNING: Excessive values may induce display lag, driver timeout & recovery, or errors.
;;; pulsefinding blocks per multiprocessor (1-16), default is 1 for Pre-Fermi, 4 for Fermi or newer GPUs
;pfblockspersm = 8
;;; pulsefinding maximum periods per kernel launch (1-1000), default is 100, as per 6.09
;pfperiodsperlaunch = 200
;[bus1slot0]
;;; Optional GPU specifc control (requires Cuda 3.2 or newer app), example
processpriority = normal
pfblockspersm = 16
pfperiodsperlaunch = 400

Is this correct? I'm new to this optimization stuff. :\

Thanks so much for your help!!
AyalaZero
Zalster (Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242)

Yes, that is fine. You can leave all the other stuff; it's only the last 3 lines that matter.

Be aware that these settings are for pure crunching machines. If you try them on a daily driver, you may see lagging or stuttering when you use the machine for activities other than crunching. These changes give work units higher priority for the system's resources than they would normally have.

Good luck.
AyalaZero (Joined: 14 Aug 05, Posts: 21, Credit: 10,910,119, RAC: 0)

Ok, thank you so much. I have saved backups of the original files in case I have to revert. But my system is set to suspend when non-BOINC usage is beyond 4%, i.e. whenever I am using my computer for everyday stuff. :)

AyalaZero
BilBg (Joined: 27 May 07, Posts: 3720, Credit: 9,385,827, RAC: 0)

> for both (app_info.xml file entry): I would use 0.33 instead of 0.34. Here is why. If you use 0.34

No:
0.99 is rounded to 0
1.20 is rounded to 1
1.99 is rounded to 1

And why 1.2?!
0.34 * 3 = 1.02, which is rounded to 1
0.34 * 4 = 1.36, which is rounded to 1

(Unless this kind of rounding/truncating was changed in the latest BOINC versions)

- ALF - "Find out what you don't do well ..... then don't do it!" :)
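BilBg's arithmetic can be checked with a few lines of Python. The premise (his description, not verified against BOINC source) is that the client truncates the summed avg_ncpus toward zero rather than rounding to the nearest integer:

```python
# Sketch of the truncation BilBg describes; that the BOINC client of that
# era computed busy cores this way is an assumption, not verified source.
def busy_cores(ncpus_per_task: float, n_tasks: int) -> int:
    """Whole CPU cores counted as busy by n_tasks GPU tasks."""
    return int(ncpus_per_task * n_tasks)  # int() truncates toward zero

print(busy_cores(0.33, 3))  # 0.99 -> 0
print(busy_cores(0.34, 3))  # 1.02 -> 1
print(busy_cores(0.34, 4))  # 1.36 -> 1
```

So with truncation, three or even four tasks at 0.34 still only count as one busy core, matching BilBg's numbers.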
Sutaru Tsureku (Joined: 6 Apr 07, Posts: 7105, Credit: 147,663,825, RAC: 5)

> Ok, thank you so much. I have saved backups of the original files in case I have to revert back. But, my system is set to suspend when NON-Boinc usage is beyond 4%... aka whenever I am using my computer for everyday stuff. :)

If you set such a low % for suspending crunching, then every time you open some software or tool that uses CPU time, BOINC will suspend the project tasks (CPU & GPU). When that software drops back below the set %, all project tasks restart from their last checkpoint. So in the worst case, while you are at the PC, this happens in a loop and your project tasks make no progress. AFAIK, in the worst case the tasks could even finish with a 'too many exits' error. I wouldn't use those settings.

The CPU project tasks run at the lowest process priority. With everything set as above (HT off):

1 CPU core for 3 GPU apps
5 CPU cores for crunching project tasks

Then you have 5 CPU cores immediately ready for your daily work (the up to 5 project tasks will not be suspended; they just wait for CPU time, so no checkpoint restart is needed).

On my J1900 CPU (with iGPU) and NV GT730 PC (my daily-work PC), I have no CPU core reserved for GPU crunching (on that PC it wouldn't make sense: the iGPU and the card are not very fast, and the iGPU plus 1 CPU core gives about the same performance as the CPU plus the card). All GPU apps have process priority 'high'. All apps are optimized. The machine is fully loaded (CPU & GPU) 24/7, and all daily work runs smoothly. :-)
Zalster (Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242)

It's called being awake too long and too tired to go back and recheck my math. Yes, you are correct, it is 1.02.

Edit: Since the title of this thread is "Fastest MB/AP cmdline settings for a NV GTX980Ti", I took it to mean a full-time cruncher. The settings I gave are for a machine dedicated to crunching 24/7. I even pointed out that if you use the computer for anything other than crunching, these settings might slow down other work done on it. If that is a concern, then yes, don't use them.

How do I know these work? Look at the stats pages. I can't speak for Petri, since he has some special mods on his system, but I know Perano uses these on his machine and it has been #1 for the last several months. My own machine traded spots with his at that position until I took it over to Einstein. But again, the choice is yours.
Mike (Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80)

> Not sure if it's the fastest but here is my commandline for APs

It also depends on how many tasks are running at once. If use_sleep is in place, I'd suggest increasing -unroll to 22. That's for the 980Ti only, because it has 22 compute units. If you experience lags or other issues, reduce the ffa_block values to 12288 and 6144. This is faster on the 980Ti.

With each crime and every kindness we birth our future.
AyalaZero (Joined: 14 Aug 05, Posts: 21, Credit: 10,910,119, RAC: 0)

I want settings as though it is a full-time cruncher. :) I will revise -unroll to 22 when I get home. HT is currently on, though; do I NEED to turn it off? I guess I can run the current settings for a week, then turn off HT and see which gives me a stable PC throughout. So far I haven't run into any major hiccups. I'll let you guys know what happens, for the next guy.

AyalaZero
Zalster (Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242)

You can leave hyperthreading on; I do with my system. The main thing is to check whether run times are longer with HT on versus off.
Zalster (Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242)

Would you recommend -unroll 22 for the Titan Xs, since they have 24 CUs?
Mike (Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80)

The Titan X can handle -unroll 24 easily. In principle you can say one unroll per compute unit. Just to be clear, this applies only to high-end cards, not mid-range ones.

With each crime and every kindness we birth our future.
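Mike's rule of thumb (one -unroll per compute unit on high-end cards) lines up with the SM counts: a Maxwell SM has 128 CUDA cores, so NVIDIA's published core counts give the CU numbers quoted in this thread. A small illustrative sketch; the helper function is mine, not part of any SETI app:

```python
MAXWELL_CORES_PER_SM = 128  # CUDA cores per Maxwell streaming multiprocessor

def suggested_unroll(cuda_cores: int) -> int:
    """Rule-of-thumb -unroll for a high-end Maxwell card: one per SM."""
    return cuda_cores // MAXWELL_CORES_PER_SM

print(suggested_unroll(2816))  # GTX 980 Ti -> 22
print(suggested_unroll(3072))  # Titan X (Maxwell) -> 24
```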
Mike (Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80)

> You can leave the Hyperthread on.

The sharper the timings, the bigger the chance that the app will stall if no CPU cores are freed. But that's on you.

With each crime and every kindness we birth our future.
AyalaZero (Joined: 14 Aug 05, Posts: 21, Credit: 10,910,119, RAC: 0)

I have another question. Reading the Astropulse_OpenCL_NV README file, it states the max group size to be 1024. Is that the CUDA cores? If so, the 980 Ti has 2816; would I be able to modify my settings differently, or am I completely wrong in my assumption? Here are the Nvidia specs: http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-980-ti/specifications

The README states the following:

Tune values must be equal or less than max work group size.
Most modern Nvidia cards have work group size of 1024.
possible values:
-tune 1 256 4 1
-tune 1 128 8 1
-tune 1 64 16 1
-tune 1 32 32 1
-tune 1 16 64 1
Intensive testing highlighted
-tune 1 64 8 1 -tune 2 64 8 1
to be fastest on mid range and high end GPU`s.
On entry level cards
-tune 1 128 8 1 -tune 2 128 8 1
should be fastest.

Thanks in advance. Sorry, this is waaay over my knowledge-base.

AyalaZero
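On AyalaZero's question above: the README's 1024 is the OpenCL per-work-group thread limit (CL_DEVICE_MAX_WORK_GROUP_SIZE), not the card's CUDA-core count, which is why the 980 Ti's 2816 cores don't enter into it. My reading of the -tune triples (an assumption about how the app maps them to a local work size) is that the product of the three numbers must stay within that limit, which every example in the README does:

```python
MAX_WORK_GROUP_SIZE = 1024  # typical CL_DEVICE_MAX_WORK_GROUP_SIZE on NVIDIA

def tune_fits(x: int, y: int, z: int) -> bool:
    """Check a -tune triple against the work-group size limit (my reading)."""
    return x * y * z <= MAX_WORK_GROUP_SIZE

for dims in [(256, 4, 1), (128, 8, 1), (64, 16, 1), (32, 32, 1), (64, 8, 1)]:
    print(dims, tune_fits(*dims))  # every triple listed in the README fits
```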
Zalster (Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242)

The README file is probably old. Mike, who answered above about using -unroll 22, is the "go to guy" on these matters; in fact, I believe he wrote the README file. The values I listed were given to me by him sometime early this year. I would listen to and follow any advice he has to offer on tuning the cards.
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.