GPU FLOPS: Theory vs Reality

Micky Badgero

Joined: 26 Jul 16
Posts: 44
Credit: 21,373,673
RAC: 83
United States
Message 1806180 - Posted: 31 Jul 2016, 22:17:14 UTC - in response to Message 1805996.  

When I started, I was also running climateprediction.net for half the resources. Both were running fairly slowly, so I stopped SETI@home. The climate software loaded more tasks and had calculation errors on half of them at just a few percent complete.

I stopped the climate software and reloaded SETI@home. It is noticeably faster now, but SETI@home may only be seeing 4 GB of GPU RAM. I can't tell, because I updated my GPU driver and the card no longer shows up when I look at my computer on SETI@home.

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1806182 - Posted: 31 Jul 2016, 22:31:24 UTC - in response to Message 1806180.  

When I started, I was also running climateprediction.net for half the resources. Both were running fairly slowly, so I stopped SETI@home. The climate software loaded more tasks and had calculation errors on half of them at just a few percent complete.

I stopped the climate software and reloaded SETI@home. It is noticeably faster now, but SETI@home may only be seeing 4 GB of GPU RAM. I can't tell, because I updated my GPU driver and the card no longer shows up when I look at my computer on SETI@home.


In Windows, most of the applications are 32-bit, so memory amounts above 4 GiB make no difference to them. A 64-bit build would allow access to large amounts of memory that the current work/apps don't need. Windows on Windows (i.e. running 32-bit apps on a 64-bit system) uses smaller addresses, which on the GPU saves precious register space, and the host CPU has a feature called 'register renaming' that lets the 32-bit apps run quite efficiently.

Things are changing: new hardware has so many registers that a lot of that doesn't matter anymore, and there are moves to deprecate 32-bit OS support entirely. The rub then becomes whether to supply a slower native 64-bit application, or multiple builds. Linux made this choice pretty easy by breaking 32-bit app compilation on 64-bit hosts, and by deprecating various CUDA devices early.
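
A back-of-the-envelope illustration of that 4 GiB ceiling (a minimal Perl sketch, not from the SETI@home apps; $Config{ptrsize} is the pointer size of the running Perl build): pointer width fixes the addressable range, so a 32-bit build tops out at 2^32 bytes no matter how much RAM is installed.

    use strict;
    use warnings;
    use Config;

    # Pointer width of this Perl build: 4 bytes when 32-bit, 8 when 64-bit.
    my $bits = 8 * $Config{ptrsize};

    # Addressable bytes = 2**$bits; divide by 2**30 to express that in GiB.
    printf "%d-bit build: %s GiB of address space\n", $bits, 2 ** ($bits - 30);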
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1806190 - Posted: 31 Jul 2016, 22:55:00 UTC

climateprediction.net doesn't have, has never had, and probably will never have any GPU applications (VM apps are more likely).

The climateprediction.net failures are unlikely to have any direct connection with the SETI@home GPU applications. Indirect connections are certainly possible (CPDN saves large files to hard disk, frequently, for example), but I don't think they justify this juxtaposition.

Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1806227 - Posted: 1 Aug 2016, 1:03:42 UTC

I updated my script to check which API was used for each work unit; this revealed that a small number of hosts were running CUDA when most others were running OpenCL (those running CUDA tasks were generally earning lower credit and brought the average down).

After the regular scan didn't find enough work units to qualify, I ran a second, complementary scan to collect data from different hosts; interestingly, this gathered enough data from the rarer GPU types to include them, so my graphs are bigger than normal.


Micky Badgero
Joined: 26 Jul 16
Posts: 44
Credit: 21,373,673
RAC: 83
United States
Message 1806238 - Posted: 1 Aug 2016, 1:39:29 UTC - in response to Message 1806182.  

Is SETI@home a 32 bit app?

Micky Badgero
Joined: 26 Jul 16
Posts: 44
Credit: 21,373,673
RAC: 83
United States
Message 1806239 - Posted: 1 Aug 2016, 1:45:03 UTC - in response to Message 1806190.  

Too bad about climateprediction.net not using GPU. Testing out my new GPU was one of my reasons for joining BOINC.

I was doing SETI@home before BOINC many years ago under my starman2020 user name, but the accounts are not connected today and I do not have access to that yahoo email anymore.

Whatever the case, SETI@home was doing 4,000 credits a day while running alongside climateprediction.net, and without it, it is easily doing twice that.

rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22227
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1806262 - Posted: 1 Aug 2016, 5:00:16 UTC

For MultiBeam there are 32-bit and 64-bit CPU applications; however, the CPU part of the GPU applications is 32-bit (for Windows).
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Rockhount
Joined: 29 May 00
Posts: 34
Credit: 31,935,954
RAC: 29
Germany
Message 1806340 - Posted: 1 Aug 2016, 11:13:05 UTC

Hi Shaggie76,
great work with your script. How big is your database already?

Could you check these results too? They were made with Lunatics 0.44.

Host:
https://setiathome.berkeley.edu/results.php?hostid=7450977

Intel Xeon L5640 HT on, but 7 tasks max.
https://setiathome.berkeley.edu/result.php?resultid=4970894174

Nvidia Quadro 4000, 1 task only
https://setiathome.berkeley.edu/result.php?resultid=4970804541
https://setiathome.berkeley.edu/result.php?resultid=4970804543

I'm switching BOINC between two machines @home (up & download) and the Xeon @work (crunching).
Regards from northern Germany
Roman

SETI@home classic workunits 207,059
SETI@home classic CPU time 1,251,095 hours


Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1806349 - Posted: 1 Aug 2016, 12:35:52 UTC - in response to Message 1806340.  

How big is your database already?

I reset the collection every week when I get a fresh host DB from SETI.

Roughly speaking, here's how many hosts it found for each GPU in the last scan (minus 1 or 2 lines for the CSV headers):

wc -l *.csv
      6 AMD Radeon R9 200 Series.csv
     69 Bonaire.csv
      4 Carrizo.csv
     23 Ellesmere.csv
     20 Fiji.csv
     23 GeForce GTX 1060.csv
     81 GeForce GTX 1070.csv
     93 GeForce GTX 1080.csv
     48 GeForce GTX 260.csv
     34 GeForce GTX 275.csv
      6 GeForce GTX 280.csv
     27 GeForce GTX 285.csv
     75 GeForce GTX 460.csv
     12 GeForce GTX 465.csv
     55 GeForce GTX 470.csv
     27 GeForce GTX 480.csv
     91 GeForce GTX 550 Ti.csv
     10 GeForce GTX 555.csv
     68 GeForce GTX 560 Ti.csv
     77 GeForce GTX 560.csv
     73 GeForce GTX 570.csv
     57 GeForce GTX 580.csv
     65 GeForce GTX 645.csv
     79 GeForce GTX 650 Ti.csv
     74 GeForce GTX 650.csv
     74 GeForce GTX 660 Ti.csv
     80 GeForce GTX 660.csv
     87 GeForce GTX 670.csv
     86 GeForce GTX 680.csv
     81 GeForce GTX 745.csv
     86 GeForce GTX 750 Ti.csv
     92 GeForce GTX 750.csv
     96 GeForce GTX 760.csv
     81 GeForce GTX 770.csv
     68 GeForce GTX 780 Ti.csv
     65 GeForce GTX 780.csv
     76 GeForce GTX 950.csv
     87 GeForce GTX 960.csv
     87 GeForce GTX 970.csv
     78 GeForce GTX 980 Ti.csv
     81 GeForce GTX 980.csv
      9 GeForce GTX TITAN Black.csv
     43 GeForce GTX TITAN X.csv
     25 GeForce GTX TITAN.csv
     20 Hainan.csv
     71 Hawaii.csv
      9 Iceland.csv
     54 Kalindi.csv
     56 Mullins.csv
     70 Oland.csv
     73 Spectre.csv
     65 Tonga.csv
   2997 total


Could you check these results too? They were made with Lunatics 0.44.

Host:
https://setiathome.berkeley.edu/results.php?hostid=7450977

Intel Xeon L5640 HT on, but 7 tasks max.
https://setiathome.berkeley.edu/result.php?resultid=4970894174

Nvidia Quadro 4000, 1 task only
https://setiathome.berkeley.edu/result.php?resultid=4970804541
https://setiathome.berkeley.edu/result.php?resultid=4970804543

I'm switching BOINC between two machines @home (up & download) and the Xeon @work (crunching).


The website only shows 2 validated tasks so there isn't much to take an average of:

C:\SETI>aggregate.pl -anon -v 7450977
Host, API, Device, Credit, Seconds, Credit/Hour, Work Units
wuid=2178135221 25.1364450165141 CR/h cpu
wuid=2178093785 29.3952630946179 CR/h gpu
7450977, cpu, Intel Core i5-2500K @ 3.30GHz, 98.24, 3517.4425, 100.545780066057, 1
7450977, gpu, NVIDIA GeForce GTX 750 Ti, 89.65, 10979.32, 29.3952630946179, 1

Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1806358 - Posted: 1 Aug 2016, 13:14:36 UTC

Someone privately asked me how the data was extracted and I figured I'd share the reply here as well.

The scripts are pretty simple (which has serious limitations).

  • get a list of all hosts from the SETI XML dump
  • distill that list into a list of Host Ids for each GPU
  • filter list to single-GPU setups with a minimum activity and total credit
  • select 50 random hosts for each GPU and scan the website task list to get credit & time stats
  • ignore hosts without a minimum number of work-units validated (10)
  • discard GPUs that have less than 10 hosts with enough work validated
  • "winsorize" the stats for each GPU: sort them by credit/hr and average the top 25% if there are 20+ hosts or the top 50% otherwise (trying to avoid multiprocessing).
  • fart around in Excel for a few minutes to turn the CSV into pictures
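
A minimal sketch of that "winsorize" step (an illustration only, not the published script -- the real Perl is on Shaggie76's GitHub, mentioned below): sort the per-host credit/hour figures and average just the top slice, so hosts quietly running several tasks at once fall into the discarded tail.

    use strict;
    use warnings;

    # Average the top 25% of samples when there are 20+ hosts, else the top
    # 50% -- hosts running multiple tasks concurrently report lower
    # credit/hour and land in the tail that gets dropped.
    sub top_slice_mean {
        my @rates = sort { $b <=> $a } @_;       # fastest hosts first
        my $keep  = @rates >= 20 ? 0.25 : 0.50;
        my $n     = int(@rates * $keep) || 1;    # keep at least one sample
        my $sum   = 0;
        $sum += $rates[$_] for 0 .. $n - 1;
        return $sum / $n;
    }

    # Hypothetical credit/hour samples for one GPU model:
    printf "%.1f CR/h\n", top_slice_mean(29.4, 31.0, 14.8, 30.2, 15.1);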


Note that my graphs currently omit tasks run under the anonymous platform:


  • anon users are more likely to run multiple tasks concurrently (and I can't tell how many they're doing at once)
  • anon users might be using the GUPPI rescheduler (which misleads the server about whether the task actually ran on CPU).


Although Perl tends to be write-only, I keep the source published on GitHub in case anyone wants to see how I did it.


Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1806461 - Posted: 1 Aug 2016, 21:03:21 UTC - in response to Message 1806358.  

Hey Shaggie,
Great stats again...but this time I have a few Qs!

Don't know if I understand the big picture wrt your GPU count stats above.
Are there only ~3k GPUs out of the 100k+ active hosts crunching for S@h?

As for:
    * select 50 random hosts for each GPU and scan the website task list to get credit & time stats

Why are you not selecting those with the top RAC?
I'm certain there's a good reason. I just can't seem to see one! :-S
Are you trying to avoid getting too many "anonymous platforms"?

Cheers,
Rob :-)

Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1806472 - Posted: 1 Aug 2016, 21:59:27 UTC - in response to Message 1806461.  
Last modified: 1 Aug 2016, 21:59:53 UTC


Don't know if I understand the big picture wrt your GPU count stats above.
Are there only ~3k GPUs out of the 100k+ active hosts crunching for S@h?

No no -- I'm just taking a random sampling and that was how many samples I collected (because I was asked "How big is your database already?").

Why are you not selecting those with the top RAC?
I'm certain there's a good reason. I just can't seem to see one! :-S
Are you trying to avoid getting too many "anonymous platforms"?

I guess I'm not looking for the best hosts but rather a measure of central tendency -- the fastest hosts might be overclocking, doing rescheduling tricks or something else I can't account for. The only reason I winsorize the mean at the end is to try to cull multiprocessing dragging the average down.

Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1806484 - Posted: 1 Aug 2016, 23:30:18 UTC - in response to Message 1806472.  

The only reason I winsorize the mean at the end is to try to cull multiprocessing dragging the average down.


Can you explain that statement?

My experience has shown multiprocessing to be faster than single processing.

Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1806503 - Posted: 2 Aug 2016, 0:28:38 UTC - in response to Message 1806484.  

The only reason I winsorize the mean at the end is to try to cull multiprocessing dragging the average down.


Can you explain that statement?

My experience has shown multiprocessing to be faster than single processing.


Because I can't tell how many tasks users are running concurrently, I can't account for it -- their tasks just seem to take longer, which makes it look like the GPU doesn't perform very well.
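
To make that concrete, a tiny worked example with made-up numbers (not data from the scans): running n tasks at once stretches each task's wall-clock time by roughly a factor of n, so per-task credit/hour shrinks even though total throughput is unchanged -- and the task list only reveals the per-task figure.

    use strict;
    use warnings;

    my $credit  = 90;      # credit per task (assumed)
    my $seconds = 3600;    # run time with the GPU to itself (assumed)

    for my $n (1, 2, 3) {
        my $elapsed  = $seconds * $n;                 # naive linear-scaling assumption
        my $per_task = $credit / ($elapsed / 3600);   # what the task list shows
        printf "%d at once: %5.1f CR/h per task, %.1f CR/h total\n",
            $n, $per_task, $per_task * $n;
    }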

Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1806522 - Posted: 2 Aug 2016, 1:31:56 UTC - in response to Message 1806503.  
Last modified: 2 Aug 2016, 1:32:54 UTC

Instead of looking at credit, what about just doing a tally of tasks/day?
(and possibly ignoring shorties)
[edit] that way your stats wouldn't be affected by multiple tasks on GPUs [/edit]

That could give you better stats since:
1. CreditNew is skewed and screwed up; and
2. you're only capturing a subset of tasks: those that have been validated.

This way you could include all tasks during the last 24-or-so hrs.
Just a thought... but I have no idea what level of coding effort would be required.
R :-)

Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1806527 - Posted: 2 Aug 2016, 2:01:14 UTC - in response to Message 1806522.  

Instead of looking at credit, what about just doing a tally of tasks/day?
(and possibly ignoring shorties)
[edit] that way your stats wouldn't be affected by multiple tasks on GPUs [/edit]

That could give you better stats since:
1. CreditNew is skewed and screwed up; and
2. you're only capturing a subset of tasks: those that have been validated.

This way you could include all tasks during the last 24-or-so hrs.
Just a thought... but I have no idea what level of coding effort would be required.

Not everyone crunches 24/7 and this approach would favor full-time hosts and dilute the stats for everyone else.

I could bend over backwards and do silly stuff, but I expect it would take dramatically more PHP queries (i.e. slower for me, and more annoying for the people running the servers).

And again: people running the GUPPI rescheduler cause the server to mislead my stats, so I'm extremely apprehensive about trying to analyze anonymous data in bulk.

If task.gz is ever a thing I might dig deeper, but for now I'm pretty content -- my scripts give data that applies to the vast majority of users who don't mess around with the settings.

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1806572 - Posted: 2 Aug 2016, 7:24:53 UTC - in response to Message 1806522.  

2. you're only capturing a subset of tasks: those that have been validated.

Makes sense as only valid tasks matter.
Doesn't matter how fast you pump them out; if they aren't valid they're of no use, so there's not much point pumping out all those errors.
Grant
Darwin NT

Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1806581 - Posted: 2 Aug 2016, 7:49:30 UTC - in response to Message 1806572.  

2. you're only capturing a subset of tasks: those that have been validated.

Makes sense as only valid tasks matter.
Doesn't matter how fast you pump them out; if they aren't valid they're of no use, so there's not much point pumping out all those errors.

There's a group named "Validation pending" whose count is often bigger than the "Valid" one.

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1806582 - Posted: 2 Aug 2016, 7:56:18 UTC - in response to Message 1806581.  

There's a group named "Validation pending" whose count is often bigger than the "Valid" one.

The validated WUs get cleared from the database after a set time so it doesn't come to a grinding halt.
Validation pendings aren't valid, so they're of no use when it comes to determining how well a GPU is performing.
Generally, the smaller the cache & the more powerful the GPU, the larger the number of pendings will be.
Grant
Darwin NT

Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1806602 - Posted: 2 Aug 2016, 10:06:54 UTC - in response to Message 1806582.  

Validation pendings aren't valid, so they're of no use when it comes to determining how well a GPU is performing.

There's a 99+% chance that my "Validation pending" will become "Valid".
Obviously for hosts that have many "invalids" my suggested task tally approach would not be recommended.
IBM's World Community Grid already uses three metrics, and one of them is raw task count... so I don't think a variant (one that doesn't give much weight to shorties) should be dismissed outright.
Cheers,
Rob :-)