Message boards : Number crunching : GPU FLOPS: Theory vs Reality
Author | Message |
---|---|
Wiggo Send message Joined: 24 Jan 00 Posts: 34896 Credit: 261,360,520 RAC: 489 |
Three of these? Wow they're a lot shorter than mine, but they'll work. Cheers. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
I was going to tell him to go with the 6GB 1060, but I don't own either, so take that for what it's worth, lol... On paper the 6GB version (1280 cores) has about 11% more performance than the 3GB version (1152 cores), but is about 25% more expensive, so in GFLOPS per $ the 3GB wins out. In real-world performance that 11% advantage would probably be closer to 5%, too.

| GPU | GFLOPS | MSRP | TDP (W) | GFLOPS/$ | GFLOPS/W |
|---|---|---|---|---|---|
| GT 1030 | 942 | $70 | 30 | 13.46 | 31.40 |
| GTX 1050 | 1733 | $109 | 75 | 15.90 | 23.11 |
| GTX 1050 Ti | 1981 | $139 | 75 | 14.25 | 26.41 |
| GTX 1060 3GB | 3470 | $199 | 120 | 17.44 | 28.92 |
| GTX 1060 6GB | 3855 | $249 | 120 | 15.48 | 32.13 |
| GTX 1070 | 5783 | $379 | 150 | 15.26 | 38.55 |
| GTX 1080 | 8228 | $599 | 180 | 13.74 | 45.71 |
| GTX 1080 Ti | 10609 | $699 | 250 | 15.18 | 42.44 |

Given the base specs, the 1060 3GB seems like a better overall choice than the 1050 Ti for the most efficient cruncher. SETI@home classic workunits: 93,865 · CPU time: 863,447 hours · Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
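The GFLOPS-per-dollar and GFLOPS-per-watt figures above are easy to recompute yourself; a minimal Python sketch using a few of the MSRP and TDP numbers from the table (theoretical single-precision GFLOPS, launch prices):

```python
# (GFLOPS, MSRP in USD, TDP in W) per card, taken from the table above.
cards = {
    "GTX 1050 Ti":  (1981, 139, 75),
    "GTX 1060 3GB": (3470, 199, 120),
    "GTX 1060 6GB": (3855, 249, 120),
}

for name, (gflops, msrp, tdp) in cards.items():
    # Price efficiency and power efficiency, as in the table.
    print(f"{name}: {gflops / msrp:.2f} GFLOPS/$, {gflops / tdp:.2f} GFLOPS/W")
```

This reproduces the table's conclusion: the 1060 3GB leads on GFLOPS/$ while the 6GB edges it on GFLOPS/W.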
Carlos Send message Joined: 9 Jun 99 Posts: 29872 Credit: 57,275,487 RAC: 157 |
Cool thanks guys. That really helps. |
Shaggie76 Send message Joined: 9 Oct 09 Posts: 282 Credit: 271,858,118 RAC: 196 |
It's been a while since I ran a scan and there are enough 1080 Ti's in circulation now to get a sense of how fast they are: |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13750 Credit: 208,696,464 RAC: 304 |
It's been a while since I ran a scan and there are enough 1080 Ti's in circulation now to get a sense of how fast they are: And, wow! Imagine what a Titan Xp can do. After all this time, it's interesting to see the GTX 750 Ti still in the top section of the Credit/WH list. And it's good to see the Radeon RX 470 and 460/480s have improved AMD's Credit/WH position considerably. Grant Darwin NT |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Assume that I will put $700 into upgrading. What would give me the best credit per hour? I can only fit 3 cards into my main cruncher. Suggestions? . . Yep, they are very good producers. But maybe 3 of the single-slot 1050 Tis would give him very good bang for his buck! . . Also the cooler width would be a non-event if he chose water cooling :) Stephen :) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I was going to tell him to go with the 6GB 1060 but I don't own either so take that for what it's worth, lol... . . I have a brace of 1060 6GBs, and Wiggo's 1060 3GBs do very well by comparison. For crunching (and bang for your buck) they would be the better value. Stephen :) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I was going to tell him to go with the 6GB 1060 but I don't own either so take that for what it's worth, lol... . . It was not the memory difference that led me to the 6GB versions but the extra SM and 128 more CUDA cores. That should have made them much more productive, but in the real world there seems to be little benefit from them. So for dollar value, just for crunching, the 3GB would be the go. Stephen <shrug> |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Three of these? . . That would make a pretty hefty crunching machine. But remember they draw about 80W to 90W each at full crunch, probably a bit more than double your 750 Tis', so be sure your PSU can cope. The good thing is that, unlike many of the more powerful GPUs, they only require one external power connector each, so your current hardware can probably support them. Stephen :) |
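Stephen's PSU warning translates into a quick power-budget check. A rough sketch, where every number is an illustrative assumption (three cards at the upper 90 W estimate, an assumed CPU draw, and the common rule of thumb of keeping sustained load under ~80% of the PSU rating), not a measurement:

```python
# Rough PSU headroom estimate -- all figures are assumptions for illustration.
gpu_draw_w = 90      # per-card draw at full crunch (upper estimate from the post)
num_gpus = 3
cpu_draw_w = 95      # assumed CPU package power under full load
other_draw_w = 50    # assumed fans, drives, motherboard, conversion losses

total_w = gpu_draw_w * num_gpus + cpu_draw_w + other_draw_w
# Rule of thumb: keep sustained load under ~80% of the PSU's rated output.
recommended_psu_w = total_w / 0.8

print(f"Estimated load: {total_w} W; suggested PSU rating: {recommended_psu_w:.0f} W+")
```

With these assumptions a quality 550-600 W unit would have comfortable headroom for a three-card 1050 Ti/1060-class build.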
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Three of these? . . The benefit(?) of single fan design over twin fan. Stephen :) |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Hey Shaggie, is there by chance any way to pull Linux CUDA results out of your dataset for a chart? |
Shaggie76 Send message Joined: 9 Oct 09 Posts: 282 Credit: 271,858,118 RAC: 196 |
Hey Shaggie, is there by chance any way to pull Linux CUDA results out of your dataset for a chart? I'm guessing you mean Petri's special app and want to know just how much faster it is. As I've said before, including the anonymous platform would defeat the purpose of this comparison; I deliberately filter for only the stock app running one job at a time, so that you can make meaningful comparisons and get a sense of the relative performance and power consumption of each card. The other problem with the anonymous platform is that it's not clear how many jobs are being run concurrently per card; the regular CUDA app only really performs if you double- or triple-job it, but the data I have to work with can't see the concurrency, so I can't tell if a result is really slow (because concurrent), really fast (because it's Petri's app), or just normal (a Lunatics build). People running stock tend not to mess around with multiple jobs, so those that do are eliminated as outliers by the median window (plus there's a clue in the output from the OpenCL app that I can sometimes use to detect when they're doubling up, so I can reject them). I'm also opposed to encouraging what I see as basically cheating -- if Petri's app isn't accurate enough for everybody to use, then the extra credit it awards those who use it comes at the extra validation cost to those of us running stock, who have to double- (and possibly triple-) check the work that it does. When it's part of the stock app set I'll be happy to report on the relative performance of the OpenCL SoG vs CUDA apps (as I've done before). |
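The median-window outlier rejection Shaggie76 describes can be sketched roughly like this. This is a hypothetical simplification -- the actual scan script, its window size, and its threshold are not shown in the thread; the `tolerance` value here is an assumption:

```python
import statistics

def reject_outliers(runtimes, tolerance=2.0):
    """Keep only runtimes within `tolerance` x the median absolute
    deviation (MAD) of the sample median. `tolerance` is an assumed
    value, not the one the real scan script uses."""
    med = statistics.median(runtimes)
    mad = statistics.median(abs(t - med) for t in runtimes) or 1e-9
    return [t for t in runtimes if abs(t - med) <= tolerance * mad]

# A host running 2 jobs per card reports roughly double the per-task
# runtime; those samples fall far from the median and are dropped.
samples = [610, 595, 620, 605, 1190, 1210, 600]
print(reject_outliers(samples))  # → [610, 595, 620, 605, 600]
```

A median-based filter like this is robust as long as single-job hosts are the majority, which matches Shaggie76's observation that stock users rarely run multiple jobs per card.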
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
It's really not as bad as one might think. Looking at your computers you're at 3-5% inconclusive, and the 2 computers I looked through didn't have one anonymous-platform result in the list. It seems quite typical for the latest CUDA8 app to be in the 4-7% range, so it really is not far off the mark from the stock/Lunatics apps. Heck, my Astropulse is sitting at 10.8% right now, and those are very well accepted apps. It would just be nice to see a Linux CUDA8 comparison vs |
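The percentages being traded here are just inconclusive results as a share of completed results. A minimal illustration (exactly which task states the posters counted isn't specified in the thread, so this is an assumed definition):

```python
def inconclusive_rate(valid, inconclusive):
    """Inconclusive results as a percentage of completed results.
    Simplified: assumes 'completed' = valid + inconclusive."""
    return 100.0 * inconclusive / (valid + inconclusive)

# e.g. 54 inconclusives against 946 valid results:
print(f"{inconclusive_rate(946, 54):.1f}%")  # → 5.4%
```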
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hey Shaggie, is there by chance any way to pull Linux CUDA results out of your dataset for a chart? And as to the cheating.. I'm doing that. I do not have 16 1080 Ti graphics cards. I have only 4: 3x1080 + 1x1080Ti. But it would still be interesting to know, since the top 10 hosts are full of Linux anonymous apps, how they perform... To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Hey Petri, I was about to PM you. I was wondering if you thought it would be a good idea to submit the Linux zi3v to Beta. Along with zi3t2b it appears to be well within the 5% Inconclusive rate requested by the project. Looking over a few hosts, it would appear the gross rate is around 3.5%, with the net Inconclusive rate a bit lower. So far the only problem is that zi3v uses a little more VRAM, and I'm seeing problems on my Mac again with the 2 GB card. Any ideas? |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hey Petri, The t2b is kind of an original. It is my code and it does not try to recheck the pulses. The zi3v scans the WU and, if it finds any suspects, runs that part of the WU again with unroll 1. That idea came from jason_gee; I tried it and coded it, and that is what I'm running now. It may be more accurate and a bit slower. Just keep testing. To tell you all, I'd like to stay a developer/experimenter/propeller hat/tin foil hat escapee/a man, and let the others make the political decisions. I release my code and you can do whatever you want with it. This is a hobby for me. I'd like to keep it that way. I was a SW/DB engineer for 20 years. Now I'm a teacher, a teacher for children/adults with special needs. So TBar, it is entirely up to you. A <5% level is good enough. You decide. I'll provide the updates when I feel like it. Thank you TBar for all the testing. p.s. I read that there is a V9 MB coming. I'll wait for that and do whatever is needed. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Shaggie76 Send message Joined: 9 Oct 09 Posts: 282 Credit: 271,858,118 RAC: 196 |
I'd like to stay as a developer/experimenter/propel hat/tin foil hat escapee/a man; and let the others do the political decisions. I release my code and you can do what ever you want to. This is totally fine (and appreciated!) -- I'm just a little vexed at people's enthusiasm for the glory of more internet points rather than for getting your version finished and certified for the stock set, by checking their inconclusives and providing diagnostics to make it conform. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13750 Credit: 208,696,464 RAC: 304 |
Even if it doesn't make it as a stock application (does it run on pre-Maxwell or pre-Kepler hardware, and what are the minimum VRAM requirements?), it would be good if it were available for general use under Anonymous Platform for all OSs. But it does need to keep the Inconclusives below 5% to be made available for general use. If the current version is good for less than 5% Inconclusives, it would be nice to see a Windows version made available for some testing, to see whether it can keep the Inconclusives below that 5% threshold across the many versions of Windows and the many versions of video drivers. Grant Darwin NT |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Even if it doesn't make it as a stock application (does it run on pre Maxwell or pre Kepler hardware, and minimum VRAM requirements?), it would be good if it were available for general use under Anonymous Platform for all OSs. But it does need to keep the Inconclusives below 5% to be able to make it available for general use. Those are the current rubs for stock, mainly BOINC server limitations on the distribution side. That's where I step in when I can. I'm confident most of the refinements can be propagated back through the generations, with varying levels of benefit. With the majority of validation concerns apparently addressed, that helps a lot. In the meantime it'll be suitable for 'Advanced-User' anonymous-platform distribution until appropriate dispatch code can be embedded to support all CUDA devices at some level. Once it does, though, options open up for, in no particular order: stock distribution (via Beta test first), retooling for CUDA 9 inclusion, and then incorporating some more modern feature-recognition methods. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.