Best way to get more processing from GTX 1050ti alongside GTX 750ti/ GTX 950
Author | Message |
---|---|
ralphw Send message Joined: 7 May 99 Posts: 78 Credit: 18,032,718 RAC: 38 |
Hello, I have three models of NVIDIA GPU,
|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
What's the best strategy for having the fast cards (950,1050) process more data? Run Petri's development application, or Tbar's recent applications based on it. There are some clients that have loop unrolling options, but is running multiple workunits on a GPU - by setting up an app_info.xml file - really taking advantage of the extra CUDA cores? Running more than 1 WU is only of benefit for some high end cards running the SoG application, or for cards that are running the older CUDA applications. If running Petri's development application, or some of Tbar's based on that application, 1 WU at a time is best. You don't need an app_info to run multiple WUs (with the more recent BOINC managers). App_info allows you to run a non stock, or a specific stock, application. When running the SoG application there are several command line values you can use to significantly boost output from the default settings. Grant Darwin NT |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
I was looking at the machines (8278492) you have listed. You have one with 3 Gtx 1050 Ti's. It is listing a speed of 183.52 GFLOPS. Before I moved a GTX 750 TI to another box, it had stabilized at 157.14 GFLOPS. The 1050 Ti's are supposed to be up to 40% faster than a 750 Ti. The TI version of the 1050 has more CUDA cores than the 750 TI. And you have 3 of them. Unless you just started up that machine, your GFlops "should" be 'on the order of' 3 X 157 = 471 Gflops. So (I think) you aren't getting the production you are paying for. There are confounding factors in my data but you should be getting north of 300 Gflops at least. Since you are running Linux I can't offer any Linux ideas but it might make sense to post your equivalent of the Windows MB*Sog.txt command line file in case we have some ideas on how to improve them. Here is the one I am running the Gtx 750 Ti with: "-sbs 512 -spike_fft_thresh 2048 -tune 1 64 1 4 -period_iterations_num 4". If your command line for the 1050 isn't something similar to this (or has larger parameters) that might be slowing it down. I don't understand all I think I know, but I ran across a 2 Gtx 750 Ti setup where the command line had this for each card: -tune 1 64 1 4 -tune 2 64 1 4 Apparently the first number refers to each card? HTH, Tom A proud member of the OFA (Old Farts Association). |
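Tom's back-of-the-envelope estimate can be written out as a short Python sketch. The two GFLOPS figures are the ones quoted in the post; the function name is hypothetical, and the assumption that every card contributes the same throughput is a deliberate simplification, not a statement about how BOINC computes the listed speed.

```python
def expected_host_gflops(per_card_gflops, card_count):
    """Naive estimate: assume every card contributes the same throughput."""
    return per_card_gflops * card_count


single_750ti = 157.14  # GFLOPS Tom reported for one GTX 750 Ti
listed_host = 183.52   # GFLOPS listed for host 8278492

# Three cards, each assumed to be at least as fast as a single 750 Ti
estimate = expected_host_gflops(single_750ti, 3)

# Fraction by which the listed figure falls short of the naive estimate
shortfall = 1 - listed_host / estimate

print(f"naive estimate: {estimate:.2f} GFLOPS, listed: {listed_host:.2f} GFLOPS")
print(f"listed figure is {shortfall:.0%} below the naive estimate")
```

The estimate is only as good as the card count fed into it, so it is worth confirming which GPUs are actually installed before reading much into the gap.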
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Hello, I would put the 3 GTX 750 Ti cards on one system and the GTX 950 Ti's on the 2nd system and leave the 1050 out of the mix while you try to come up with some reasonable parameters and locations to put them in for the 1050. Don't forget to set up the MB*SOG.txt command lines/files/parameters. Tom A proud member of the OFA (Old Farts Association). |
Wiggo Send message Joined: 24 Jan 00 Posts: 34841 Credit: 261,360,520 RAC: 489 |
Tom, don't go by the computer's details as that only shows the number of GPU's listed by what BOINC considers to be the most powerful. ;-) Look instead at the result details of a completed GPU task and you'll see that the rig listed with the 1050Ti also has a pair of 750Ti's in it while the other just has a pair of 950's (I don't believe that Nvidia ever released a Ti version of the 950's and I can't see the 3rd 750Ti anywhere). ralphw your rig with the 1050Ti/750Ti combo will have to be tuned to suit the 750Ti's performance as tuning to the 1050Ti performance may not suit the 750Ti's at all and I'm sure that someone will supply those tuning settings for both of your rigs. Cheers. |
ralphw Send message Joined: 7 May 99 Posts: 78 Credit: 18,032,718 RAC: 38 |
Thanks. That is my primary configuration (GTX 1050 Ti alongside two GTX 750 Ti systems). The third 750 Ti (from MSI) is currently in an inactive system. |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Tom, don't go by the computer's details as that only shows the number of GPU's listed by what BOINC considers to be the most powerful. ;-) Wiggo, I was replying to how to "maximize" production based on the theory that using the same cards on a single PC will maximize the production for those cards. He had offered a list of cards. I didn't look any further. I just looked at 8278492 and I must say I can come close to the listed Gflops with a single Gtx 750 Ti running Lunatics under Windows, so something is out of tune. As for the "best" SOG command line for the GTX 750 Ti, I am using: -sbs 512 -spike_fft_thresh 2048 -tune 1 64 1 4 -period_iterations_num 4 I ran across the following variation for multiple card machines: -sbs 512 -spike_fft_thresh 2048 -tune 1 64 1 4 -tune 2 64 1 4 -tune 3 64 1 4 -period_iterations_num 4 I can't swear that the multiple -tune commands, one for each card, work exactly like that. Nor can I swear that whichever one turns out to be the 1050 shouldn't be different. If you add the -hp it makes the system laggy but does seem to load up my gpu a bit more (from low 90's to high 90's). I suspect that the above command line(s) MIGHT improve the overall production on the mixed 750/1050 machine. Tom A proud member of the OFA (Old Farts Association). |
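The two command lines Tom quotes share one pattern, so they can be assembled from a few parameters. This is a minimal Python sketch that only builds the strings exactly as quoted in the post; the helper name `build_sog_cmdline` and its parameter names are hypothetical, and the sketch makes no claim about what each flag does inside the application.

```python
def build_sog_cmdline(sbs=512, spike_fft_thresh=2048, tune_ids=(1,),
                      period_iterations_num=4):
    """Assemble a SoG command line in the order used in the posts above."""
    parts = ["-sbs", str(sbs), "-spike_fft_thresh", str(spike_fft_thresh)]
    for tune_id in tune_ids:
        # Each -tune entry reuses the "64 1 4" values quoted in the thread
        parts += ["-tune", str(tune_id), "64", "1", "4"]
    parts += ["-period_iterations_num", str(period_iterations_num)]
    return " ".join(parts)


single = build_sog_cmdline()
multi = build_sog_cmdline(tune_ids=(1, 2, 3))

print(single)
print(multi)
```

Keeping the values in one place makes it easier to try one change at a time (for example a different `-sbs` value) and compare run times before and after.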
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
For the 750TI's I suggest -sbs 684 -spike_fft_thresh 2048 -tune 1 64 1 4 -period_iterations_num 4 Tune 2 and tune 3 don't give much benefit on those cards. But this is host dependent. With each crime and every kindness we birth our future. |
ralphw Send message Joined: 7 May 99 Posts: 78 Credit: 18,032,718 RAC: 38 |
I ended up moving an MSI GTX 750 Ti back into this system. I was expecting to put four GPUs on this motherboard, but I apparently need all of my 750 Ti systems to be the shorter 5-6" long cards. Only the first slot of this Gigabyte motherboard really accommodates a full-length card such as MSI's dual-fan GTX 750Ti. The fan shroud and card length really don't fit well with the other heat sinks and other connectors. I will have to limit myself to the smaller form factor GPUs to mechanically use all of my remaining motherboard slots. I'll see how well the WU averages keep up. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Petri's app definitely. . . Actually Grant both my GTX950 and GTX1050ti give/gave better results under SoG by running 2 at a time. But that is r3557, I can't speak for r3584 (V8.22). Stephen .. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I was looking at the machines (8278492) you have listed. You have one with 3 Gtx 1050 Ti's. It is listing a speed of 183.52 GFLOPS. Before I moved a GTX 750 TI to another box, it had stabilized at 157.14 GFLOPS. . . Hmmm, something strange there. Oh he has moved some cards around ... . . But if he is willing to tackle running CUDA80 with Petri's special app it will definitely improve his productivity. It is not hard to do when he is already running Linux. Stephen ?? |
TheHoosh Send message Joined: 17 Aug 12 Posts: 12 Credit: 11,693,138 RAC: 0 |
Two days ago I added a KFA² 1050 Ti to my main cruncher, which is already housing a Palit 750 Ti StormX Dual (using the CUDA80 app under Linux, driver version 381.22). So far, everything is looking good. Running several hundred WUs for Milkyway and Einstein confirmed that the 1050 Ti is about 35%-40% faster than the 750 Ti. However, with SETI the 1050 Ti runs only ~15% faster than the 750 Ti, which is odd considering the results for Milkyway and Einstein. I'm crunching only 1 WU per GPU and have set the unroll option to "autotune". My 750 Ti needs 640s to crunch one WU, whereas the 1050 Ti needs 550s, although I would expect it to be around 400s per WU. Is there anything I need to adjust in my app_info.xml in order to leverage the 1050 Ti's full potential for SETI? As far as I've understood, the command line options that have been discussed in this thread only apply to the OpenCL application. |
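The run times quoted here can be turned into a speedup figure with a few lines of Python. This is a sketch under one stated assumption: "X% faster" is read as a throughput ratio of 1 + X, so the expected run time is the 750 Ti time divided by that ratio. The function and variable names are illustrative only.

```python
def speedup(slow_seconds, fast_seconds):
    """Throughput ratio of the faster card relative to the slower one."""
    return slow_seconds / fast_seconds


t_750ti = 640.0   # seconds per WU on the 750 Ti (from the post)
t_1050ti = 550.0  # seconds per WU on the 1050 Ti (from the post)

observed = speedup(t_750ti, t_1050ti)

# Run time the 1050 Ti would need if it were 35% or 40% faster on SETI,
# as it is on Milkyway and Einstein according to the post
expected_at_35 = t_750ti / 1.35
expected_at_40 = t_750ti / 1.40

print(f"observed speedup: {observed:.2f}x")
print(f"expected run time at 35%-40% faster: "
      f"{expected_at_40:.0f}s to {expected_at_35:.0f}s")
```

Under that reading, a 35%-40% advantage corresponds to roughly 457 to 474 seconds per WU, so the 400 second expectation is more optimistic than the Milkyway and Einstein ratio alone would suggest.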
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Linux Questions are better here http://setiathome.berkeley.edu/forum_thread.php?id=80636 I run autotune -nobs ... it uses a full core but increases performance. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Hi Hoosh, . . I am running a 1050ti on a C2D E7600 rig using Linux and CUDA80, it is in a x16 slot and is the only GPU in that rig. I am not sure what differences may exist between our two setups but my runtimes are as follows: . . Normal AR Arecibo tasks ... 250 to 300 secs . . Halflings (VHAR) ... 125 to 160 secs . . GBT Blc05 tasks ... 480 to 520 secs. . . I am at a loss to understand why yours are so much longer. The only 'tweaks' I know of for CUDA80 (special) are -nobs to disable blocking sync and -unroll autotune which is very necessary with two different cards with different abilities which you say you have already set. If you want to use -nobs be sure you have plenty of CPU resources for it to use, it will use 100% of a CPU core and then some, so with 2 x GPUs you would need three spare CPU cores to get the full benefit of it. . . Good luck Stephen . |
rob smith Send message Joined: 7 Mar 03 Posts: 22219 Credit: 416,307,556 RAC: 380 |
The default is blocking sync disabled, so the command line should look something like: -unroll autotune To use blocking sync (which can help reduce temperature problems) the command line becomes: -bs -unroll autotune Or, if you want to fiddle around, the unroll option will also take an integer value in the range 1 to ?? (possibly 64?): -unroll x When I had two GTX1080 and a GTX980 in the same system (in the days before autotune) I had to use a compromise value for unroll which was OK for the GTX980, but too low for the GTX1080s - using the value for the GTX1080s resulted in the GTX980 not playing ball at all. It would appear that having unroll set too low is "OK", but too high is "very bad". Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
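Rob's compromise rule for a mixed-GPU host without autotune (too low is tolerable, too high is not) amounts to taking the smallest value that any card in the box tolerates. A minimal Python sketch of that rule follows; the per-card numbers are hypothetical placeholders rather than measured or recommended settings, and the helper name is invented for illustration.

```python
def compromise_unroll(max_unroll_per_card):
    """Pick the largest unroll value that no card in the host rejects."""
    if not max_unroll_per_card:
        raise ValueError("need at least one card")
    # Too low is merely slower on the big cards; too high breaks the small one
    return min(max_unroll_per_card.values())


# Hypothetical per-card limits, for illustration only
cards = {"GTX 1080 #1": 20, "GTX 1080 #2": 20, "GTX 980": 16}

value = compromise_unroll(cards)
cmdline = f"-unroll {value}"

print(cmdline)
```

With `-unroll autotune` available, this manual compromise should only matter on application versions that predate it.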
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Hey Rob, What are you running for 1080 cards? I'm curious as to the specs as I seem to outperform you. I have EVGA 6188 Hybrids. Maybe it's thermal throttling, don't know. |
rob smith Send message Joined: 7 Mar 03 Posts: 22219 Credit: 416,307,556 RAC: 380 |
Just basic single fan cards, no overclocking or anything like that - and the computer does spend a few hours a day off doing other things. It may be thermal issues as the "room" is currently at >40C (outside it's <20C) - I need to sort the air-con (again). Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hi, you could add the option -pfb 32. The latest version has blocking sync on by default, and it can be disabled with -nobs. The -unroll autotune option should give the best performance in mixed GPU setups. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.