Energy Efficiency of ARM-Based Systems over x86 or GPU-Based Systems

mavrrick
Joined: 12 Apr 00
Posts: 17
Credit: 1,894,993
RAC: 4
United States
Message 1552274 - Posted: 4 Aug 2014, 17:49:04 UTC

I brought this up a few months ago and didn't get very far with the analysis. For some reason my interest in the topic was sparked again recently.

Since the last time I brought this up I have been letting my Ouya run SETI@home full time, and it has racked up a fair amount of credit. I am not really interested in going crazy with it; I am just letting it run. The only thing I did was move the little box next to a system with an active cooling fan. It has run well, and for the little power it uses (4.5-5 watts) I am very impressed.

So now to the point of all this. While looking over the host information I found a figure labeled Average GFLOPS for each app that has run. I also found what is labeled as Device Peak FLOPS on the task information for my computer and the processed work units. Interestingly enough, the two aren't very close.

The one that seems to correlate best with the time a WU takes is the Average GFLOPS number. So my first question is: does anyone clearly understand what that number is, i.e. what is used to calculate it?

My math is pretty simple: take the Average GFLOPS value, multiply it by the number of cores the system has, then divide that by the approximate power usage of the device, measured as closely as I can.

I don't expect this to be exact, but it should be a fairly decent approximation of performance per watt, or more specifically GFLOPS per watt.

Unfortunately I was only able to get fairly precise power numbers for a couple of systems: my desktop and the Ouya. The desktop is a six-core Phenom II system with a Radeon 4870. Nothing cutting edge, but a big enough system to churn through some work if I want it to.

So the numbers:
The Ouya averaged 1.515 GFLOPS across the 4 apps it had used; with 4 cores running at about 5 watts, that works out to roughly 1.212 GFLOPS per watt.

My desktop averaged 8.06 GFLOPS across 1 app and has 6 cores. It ran at around 360 watts, which gives it 0.13433 GFLOPS per watt. The Radeon 4870 did an AstroPulse WU and achieved an average of 101.38 GFLOPS while using about 60 watts, giving it the best performance per watt at 1.689. The catch, though, is that you have to have the rest of the computer running. If you combine the performance values for the CPU and GPU, the performance per watt drops to 0.3565.
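To make the arithmetic explicit, here is a minimal sketch of the calculation using the numbers above (the watt figures are my own approximate meter readings, so treat the results as ballpark only):

# Performance per watt = (average GFLOPS per core * cores) / system watts.
# All inputs below are the approximate figures quoted in this post.

def gflops_per_watt(avg_gflops, cores, watts):
    return avg_gflops * cores / watts

ouya        = gflops_per_watt(1.515, 4, 5.0)     # ~1.212 GFLOPS/W
desktop_cpu = gflops_per_watt(8.06, 6, 360.0)    # ~0.134 GFLOPS/W
radeon_4870 = 101.38 / 60.0                      # ~1.689 GFLOPS/W (card alone)

# CPU + GPU combined: total GFLOPS over total draw (~360 W + ~60 W).
combined = (8.06 * 6 + 101.38) / (360.0 + 60.0)  # ~0.3565 GFLOPS/W

print(ouya, desktop_cpu, radeon_4870, combined)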

I suspect the most efficient option is a lower-power CPU with a really good GPU. As long as the CPU can feed the GPU enough data to keep it chugging away, that may produce the best performance-per-watt results.

Of course this doesn't account for RAC at all, just the amount of processing work a device can do compared to the power consumed.
mavrrick
Joined: 12 Apr 00
Posts: 17
Credit: 1,894,993
RAC: 4
United States
Message 1552292 - Posted: 4 Aug 2014, 18:54:39 UTC - in response to Message 1552274.  

A few things I thought of after posting.

1. Depending on the GPU, a significant hike in power usage can occur just by installing a dedicated card. The 4870 adds about 90-100 watts to the base system's power draw, so if the card were removed my desktop's power efficiency would increase a fair amount, up to around 0.193 GFLOPS per watt.

2. There are obviously more power-efficient CPUs now than my six-core Phenom II. It would be nice to get some comparable numbers from some newer higher-end systems.


Here is an interesting point to think about too. If you take the "Average GFLOPS" as a way to indicate the processing speed of the device, then in theory 8 Ouyas could generate the same SETI@home processing power as my desktop (8 x 6.06 GFLOPS is about 48.5 GFLOPS, versus 48.36 GFLOPS for the desktop CPU), and they would only need about 40 watts to do so. That is a fair amount of power savings, and possibly heat savings, and if they are doing the same work they should receive the same RAC.
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1552293 - Posted: 4 Aug 2014, 19:01:09 UTC
Last modified: 4 Aug 2014, 19:10:41 UTC

Watts per FLOP, or FLOPS per watt, is the name of the game.
I think it might be best to calculate the performance of each app separately.

For one of my i5-4670K systems.
Application      GFLOPS   Cores   Total GFLOPS   System Watts   GFLOPS/Watt
SETI@home v7      42.81       4         171.24             90         1.903
AstroPulse v6    106.52       4         426.08             90         4.734


Compared to my Bay Trail-D system.
Application      GFLOPS   Cores   Total GFLOPS   System Watts   GFLOPS/Watt
SETI@home v7      10.25       4          41.00             25         2.050
AstroPulse v6     21.30       4          85.20             25         4.260


For an ARM system I would do each app version for comparison. One might perform better than the others, and that information could be fed back to home base.

EDIT:
I will also add that my low-powered system is drawing more than it should, as I have an oversized PSU that is not very efficient. Ideally it would be in the 10-20 W range for power consumption.
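For anyone who wants to reproduce the per-app numbers, here is a minimal sketch using the i5-4670K figures from the first table (the 90 W value is an approximate whole-system reading, so the results are ballpark only):

# Per-app GFLOPS-per-watt from (per-core GFLOPS, cores, system watts).
# Figures below are the i5-4670K numbers quoted above; swap in your own.

apps = {
    "SETI@home v7": 42.81,    # average GFLOPS per core for this app
    "AstroPulse v6": 106.52,
}
cores = 4
system_watts = 90.0

for app, gflops in apps.items():
    total = gflops * cores
    print(f"{app:15s} {total:7.2f} GFLOPS  {total / system_watts:.3f} GFLOPS/W")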
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1552368 - Posted: 4 Aug 2014, 22:12:37 UTC - in response to Message 1552292.  
Last modified: 4 Aug 2014, 22:16:13 UTC

If you take the "Average GFLOPS" as a way to indicate the processing speed of the device...

There's the problem. GFLOPS is an indicator, but a very, very poor one. Depending on the application being run, a card with a lower GFLOPS rating can process more work per hour than one with a much higher rating.
The number of WUs per hour is a better indicator, but the mix of WUs (VHARs, shorties) makes it difficult to compare things.
Average Processing Rate is a good one, as it directly relates to the work being done. Unfortunately it isn't accurate when processing more than one WU at a time, since that results in a lower APR even though the work done per hour is much higher than with a single WU at a time.
RAC is probably the best indicator; however, due to the nature of Credit New (almost completely borked) you can only compare MB to MB and AP to AP. And people who run a mix of the two can't really be compared to either (or even to each other, due to the different mixes).
mavrrick
Joined: 12 Apr 00
Posts: 17
Credit: 1,894,993
RAC: 4
United States
Message 1552384 - Posted: 4 Aug 2014, 23:11:45 UTC - in response to Message 1552368.  

So are you saying that field is a rating and not a measured value? The fact that it is labeled an average suggests some level of analysis is being done.

To me it looks more like an indication of how many GFLOPS that host is able to achieve when running that application.

WUs per hour is useless for what I am getting at here. I am looking at actual computational performance rather than relating it to WUs; we all know not all WUs are the same. There has to be some way to calculate the amount of work done per second, and I suspect GFLOPS is it.

Doesn't GFLOPS refer to billions of floating-point operations per second? That is by definition a measurement of computational work per second and should be usable to compare devices across a range of configurations. It of course isn't perfect, as each system build has its nuances.

Nothing is going to be exactly perfect. The math I presented assumed 100% use of each core. My expectation is that the reported value represents only 1 core, since each application runs as a single thread, so in the math I set up the Average GFLOPS value is multiplied by the number of cores in the system. If any processing within the app spills off that core, the number will fluctuate a little.

The point here is to evaluate the potential of low-powered systems compared to giant number crunchers. Can you make up for the lower CPU performance with raw numbers and still maintain a good electrical footprint?
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1552389 - Posted: 4 Aug 2014, 23:20:45 UTC - in response to Message 1552293.  

Compared to my Bay Trail-D system.
Application      GFLOPS   Cores   Total GFLOPS   System Watts   GFLOPS/Watt
SETI@home v7      10.25       4          41.00             25         2.050
AstroPulse v6     21.30       4          85.20             25         4.260


Hmm, my machine is running somewhat fewer FLOPS than yours for both MB and AP. I haven't worked out how to enable the iGPU for crunching under Linux yet.

I took delivery of an Nvidia Tegra TK1 "Jetson" SDK tonight and should have all the bits needed to run it (HDMI->DVI cable, USB hub, Keyboard+mouse) on next-day delivery tomorrow. First plan is to work out how it runs (it's an ARM version of Ubuntu) and install the latest CUDA libraries. Then, after I've got my hologram reconstructions running on the 192-core Kepler, I'll see if there are all the resources needed to compile BOINC & S@H on it. Watch, as they say, this space.

Must take my Wattmeter back into work next time I have to power down this rig (which is running 143 W ATM, it's usually around 250 W when the GPUs have APs to crunch).
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1552395 - Posted: 4 Aug 2014, 23:48:07 UTC - in response to Message 1552384.  
Last modified: 4 Aug 2014, 23:49:09 UTC

Doesn't GFLOPS refer to billions of floating-point operations per second? That is by definition a measurement of computational work per second and should be usable to compare devices across a range of configurations. It of course isn't perfect, as each system build has its nuances.

Not all FLOPS are equal; different operations have different overheads.
That's why FLOPS, just like the number of WUs per hour (due to the different types of WUs), isn't a good indicator of actual performance.
My present video cards have a much higher FLOPS rating than the cards they replaced; however, the older cards can actually process more WUs per hour than the new ones, because the present applications aren't optimised for the new video cards.
However, I can run 3 of my new video cards for less power than one of the old ones used.


As badly screwed up as Credit New is, and even with the very laggy nature of RAC, RAC is the best indicator of work done that we have.
Unfortunately it's not as good as it once was for comparing between different types of WU, and it's of no use at all for comparing between MB & AP, but it is very good for comparing between similar types of WU.


Can you make up for the lower CPU performance with raw numbers and still maintain a good electrical footprint?

Without a doubt (as I mentioned with my new video cards). However, similar to games, there will be instances where more CPU power is required to keep the faster video cards busy.
AP WUs are a good example: many people running high-end (and multiple high-end) video cards have to leave 1, 2 or even more CPU cores free just to feed the GPUs.
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1552453 - Posted: 5 Aug 2014, 3:05:19 UTC - in response to Message 1552368.  

If you take the "Average GFLOPS" as a way to indicate the processing speed of the device...

There's the problem. GFLOPS is an indicator, but a very, very poor one. Depending on the application being run, a card with a lower GFLOPS rating can process more work per hour than one with a much higher rating.
The number of WUs per hour is a better indicator, but the mix of WUs (VHARs, shorties) makes it difficult to compare things.
Average Processing Rate is a good one, as it directly relates to the work being done. Unfortunately it isn't accurate when processing more than one WU at a time, since that results in a lower APR even though the work done per hour is much higher than with a single WU at a time.
RAC is probably the best indicator; however, due to the nature of Credit New (almost completely borked) you can only compare MB to MB and AP to AP. And people who run a mix of the two can't really be compared to either (or even to each other, due to the different mixes).

The way they said "Average GFLOPS" I figured they were talking about APR, which is displayed in GFLOPS. While it may not be the most accurate, it is a measure of the application's output on that device, so it does seem a valid measure to use.
It will be lower when running more than one task at a time on a device; however, (GFLOPS * instances) should reflect the increased output.
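A tiny sketch of that point, with made-up APR figures purely for illustration:

# Illustrative only: the per-task APR drops when a device runs several
# WUs at once, but total throughput is roughly APR * instances.
apr_single = 100.0   # hypothetical APR (GFLOPS) running 1 WU at a time
apr_double = 65.0    # hypothetical lower per-task APR running 2 WUs at once

throughput_single = apr_single * 1   # 100 effective GFLOPS
throughput_double = apr_double * 2   # 130 effective GFLOPS despite lower APR
print(throughput_single, throughput_double)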
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1552454 - Posted: 5 Aug 2014, 3:09:14 UTC - in response to Message 1552389.  

Compared to my Bay Trail-D system.
Application      GFLOPS   Cores   Total GFLOPS   System Watts   GFLOPS/Watt
SETI@home v7      10.25       4          41.00             25         2.050
AstroPulse v6     21.30       4          85.20             25         4.260


Hmm, my machine is running somewhat fewer FLOPS than yours for both MB and AP. I haven't worked out how to enable the iGPU for crunching under Linux yet.

I took delivery of an Nvidia Tegra TK1 "Jetson" SDK tonight and should have all the bits needed to run it (HDMI->DVI cable, USB hub, Keyboard+mouse) on next-day delivery tomorrow. First plan is to work out how it runs (it's an ARM version of Ubuntu) and install the latest CUDA libraries. Then, after I've got my hologram reconstructions running on the 192-core Kepler, I'll see if there are all the resources needed to compile BOINC & S@H on it. Watch, as they say, this space.

Must take my Wattmeter back into work next time I have to power down this rig (which is running 143 W ATM, it's usually around 250 W when the GPUs have APs to crunch).


I am running optimized apps, and my system seems to like to stay at its burst frequency all of the time. Either or both of those could be the reason for my system's higher numbers.
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1552676 - Posted: 5 Aug 2014, 20:30:30 UTC - in response to Message 1552389.  

I took delivery of an Nvidia Tegra TK1 "Jetson" SDK tonight and should have all the bits needed to run it (HDMI->DVI cable, USB hub, Keyboard+mouse) on next-day delivery tomorrow. First plan is to work out how it runs (it's an ARM version of Ubuntu) and install the latest CUDA libraries. Then, after I've got my hologram reconstructions running on the 192-core Kepler, I'll see if there are all the resources needed to compile BOINC & S@H on it. Watch, as they say, this space.

Well, I've got this far:
05-Aug-2014 16:03:28 [---] cc_config.xml not found - using defaults
05-Aug-2014 16:03:28 [---] Starting BOINC client version 7.2.42 for armv7l-unknown-linux-gnueabihf
05-Aug-2014 16:03:28 [---] log flags: file_xfer, sched_ops, task
05-Aug-2014 16:03:28 [---] Libraries: libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 librtmp/2.3
05-Aug-2014 16:03:28 [---] Data directory: /home/ubuntu/BOINC
05-Aug-2014 16:03:28 [---] CUDA: NVIDIA GPU 0: GK20A (driver version unknown, CUDA version 6.0, compute capability 3.2, 1746MB, 141MB available, 327 GFLOPS peak)
05-Aug-2014 16:03:28 [---] Host name: tegra-ubuntu
05-Aug-2014 16:03:28 [---] Processor: 1 ARM ARMv7 Processor rev 3 (v7l)
05-Aug-2014 16:03:28 [---] Processor features: swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
05-Aug-2014 16:03:28 [---] OS: Linux: 3.10.24-g6a2d13a
05-Aug-2014 16:03:28 [---] Memory: 1.71 GB physical, 0 bytes virtual
05-Aug-2014 16:03:28 [---] Disk: 11.69 GB total, 5.63 GB free
05-Aug-2014 16:03:28 [---] Local time is UTC +0 hours
05-Aug-2014 16:03:28 [---] No general preferences found - using defaults
05-Aug-2014 16:03:28 [---] Preferences:
05-Aug-2014 16:03:28 [---]    max memory usage when active: 873.11MB
05-Aug-2014 16:03:28 [---]    max memory usage when idle: 1571.60MB
05-Aug-2014 16:03:28 [---]    max disk usage: 5.55GB
05-Aug-2014 16:03:28 [---]    don't use GPU while active
05-Aug-2014 16:03:28 [---]    suspend work if non-BOINC CPU load exceeds 25%
05-Aug-2014 16:03:28 [---]    (to change preferences, visit a project web site or select Preferences in the Manager)
05-Aug-2014 16:03:28 [---] Not using a proxy
05-Aug-2014 16:03:28 [---] This computer is not attached to any projects
05-Aug-2014 16:03:28 [---] Visit http://boinc.berkeley.edu for instructions
05-Aug-2014 16:03:29 Initialization completed
05-Aug-2014 16:03:29 [---] Suspending GPU computation - computer is in use
05-Aug-2014 16:04:00 [---] Received signal 2
05-Aug-2014 16:04:01 [---] Exit requested by user

As with the Celeron I bought recently, I had a lot of trouble with the graphics, especially finding the GL, GLU and GLUT libraries -- compounded by the fact that neither install (the Celeron is CentOS 7) had g++ by default, and ./configure doesn't really point that out to you. The big showstopper is wxWidgets. I need to compile it myself, and it looks like the BOINC code isn't compatible with anything past 2.8.3 -- but 2.8.3 won't compile with gcc 4.8.3, apparently. So I haven't got boincmgr running on either yet.
Now to try to attach to S@H since the project is up again!
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1552687 - Posted: 5 Aug 2014, 20:52:09 UTC - in response to Message 1552676.  


05-Aug-2014 16:03:28 [---] This computer is not attached to any projects
05-Aug-2014 16:03:28 [---] Visit http://boinc.berkeley.edu for instructions
05-Aug-2014 16:03:29 Initialization completed
05-Aug-2014 16:03:29 [---] Suspending GPU computation - computer is in use
05-Aug-2014 16:04:00 [---] Received signal 2
05-Aug-2014 16:04:01 [---] Exit requested by user

Now to try to attach to S@H since the project is up again!

05-Aug-2014 20:31:55 [---] Suspending GPU computation - computer is in use
05-Aug-2014 20:39:33 [---] Running CPU benchmarks
05-Aug-2014 20:39:33 [---] Suspending computation - CPU benchmarks in progress
05-Aug-2014 20:39:33 [---] Running CPU benchmarks
05-Aug-2014 20:39:33 [---] Running CPU benchmarks
05-Aug-2014 20:39:33 [---] Running CPU benchmarks
05-Aug-2014 20:39:33 [---] Running CPU benchmarks
05-Aug-2014 20:40:05 [---] Benchmark results:
05-Aug-2014 20:40:05 [---]    Number of CPUs: 4
05-Aug-2014 20:40:05 [---]    966 floating point MIPS (Whetstone) per CPU
05-Aug-2014 20:40:05 [---]    6829 integer MIPS (Dhrystone) per CPU
05-Aug-2014 20:40:06 [---] Resuming computation
05-Aug-2014 20:40:12 [http://setiathome.berkeley.edu/] Master file download succeeded
05-Aug-2014 20:40:17 [---] Number of usable CPUs has changed from 4 to 1.
05-Aug-2014 20:40:17 [http://setiathome.berkeley.edu/] Sending scheduler request: Project initialization.
05-Aug-2014 20:40:17 [http://setiathome.berkeley.edu/] Requesting new tasks for CPU and NVIDIA
05-Aug-2014 20:40:22 [SETI@home] Scheduler request completed: got 0 new tasks
05-Aug-2014 20:40:22 [SETI@home] This project doesn't support computers of type armv7l-unknown-linux-gnueabihf
05-Aug-2014 20:40:24 [SETI@home] Started download of arecibo_181.png
05-Aug-2014 20:40:24 [SETI@home] Started download of sah_40.png
05-Aug-2014 20:40:27 [SETI@home] Finished download of arecibo_181.png
05-Aug-2014 20:40:27 [SETI@home] Finished download of sah_40.png
05-Aug-2014 20:40:27 [SETI@home] Started download of sah_banner_290.png
05-Aug-2014 20:40:27 [SETI@home] Started download of sah_ss_290.png
05-Aug-2014 20:40:29 [SETI@home] Finished download of sah_banner_290.png
05-Aug-2014 20:40:29 [SETI@home] Finished download of sah_ss_290.png
05-Aug-2014 20:43:41 [---] Resuming GPU computation
05-Aug-2014 20:44:27 [---] Suspending GPU computation - computer is in use
:-)
Ah, there it is!
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1552955 - Posted: 6 Aug 2014, 16:53:29 UTC - in response to Message 1552687.  
Last modified: 6 Aug 2014, 16:54:28 UTC

Ah, there it is!
Well, I got both my hologram reconstruction and s@h compiled and running on the Jetson today. The holograms run about 10x slower than on my GTX 750 Ti (1.5 frames/sec for a 4Kx4K reconstruction). No real problems with the s@h, just the missing include I reported last January, and I had to edit the config file to remove the old compute capabilities that nvcc didn't like and put in 3.2 for the Tegra.
The first WU has just finished; Run time 50 min 57 sec, CPU time 21 min 32 sec. Not validated yet. Run time is just about twice what I'm currently achieving with the 750 Ti, but that's running two at once.
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1552966 - Posted: 6 Aug 2014, 17:19:47 UTC - in response to Message 1552955.  


The first WU has just finished; Run time 50 min 57 sec, CPU time 21 min 32 sec. Not validated yet. Run time is just about twice what I'm currently achieving with the 750 Ti, but that's running two at once.

From your stderr out:

setiathome enhanced x41zc, Cuda 6.00


Where did you get this version from?
mavrrick
Joined: 12 Apr 00
Posts: 17
Credit: 1,894,993
RAC: 4
United States
Message 1552975 - Posted: 6 Aug 2014, 17:32:51 UTC - in response to Message 1552966.  

I suspect he compiled it himself based on his previous posts.

I am really curious about how he set this up, and whether he will share directions later for those adventurous enough to attempt it after him :). That is pretty awesome performance for a system that Nvidia says uses 5 watts under real workloads. It would be nice to see that validated, but even at 10 watts that is some awesome crunching.

This topic is actually making me a little annoyed at my own gear now. I am seeing how bad my desktop really is, and I am at the point where I would rather not turn it on at all. I am considering just selling it off for parts to build something more energy efficient.

Now all you need to do is set up about 8 of them as a cluster and let them churn out some WUs.
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1552992 - Posted: 6 Aug 2014, 18:57:13 UTC - in response to Message 1552966.  


The first WU has just finished; Run time 50 min 57 sec, CPU time 21 min 32 sec. Not validated yet. Run time is just about twice what I'm currently achieving with the 750 Ti, but that's running two at once.

From your stderr out:

setiathome enhanced x41zc, Cuda 6.00


Where did you get this version from?

As mavrrick says, I compiled it myself. However, the hard part isn't s@h; the hard part is compiling BOINC. It has so many prerequisites. The basic instructions are here.
git clone git://boinc.berkeley.edu/boinc-v2.git boinc
cd boinc
git tag [Note the version corresponding to the latest recommendation.]
git checkout client_release/<required release>; git status
./_autosetup
./configure --disable-server --enable-manager
make -j n [where n is the number of cores/threads at your disposal]

The problems you will have are first finding the libraries and utilities that _autosetup wants, then ensuring that you have g++ installed, and then finding all the libraries and development packages that configure wants (you need the -dev packages for the header definition files). The final hurdle, if you want to use the boincmgr graphical interface, is getting wxWidgets. It tends not to be included in the repositories of modern distributions now, so you have to try to compile it yourself -- which I haven't managed lately, as BOINC wants an old version which was (apparently) badly coded and gives lots of problems with the newest, smartest gcc/g++ compilers. You may need to just learn how to use the boinccmd command-line controller...
The simplest way to then compile s@h was detailed back in January, in this thread.
cd <directory your boinc directory is in>
svn checkout -r1921 https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt/Xbranch
cd Xbranch
[edit client/analyzeFuncs.h and add the line '#include <unistd.h>']
sh ./_autosetup
sh ./configure BOINCDIR=../boinc --enable-sse2 --enable-fast-math
make -j n

This assumes you've installed the CUDA SDK and added the appropriate locations to your PATH and LD_LIBRARY_PATH environment variables, but that's well-covered in the Nvidia documentation. As I alluded to above, you will probably have to edit the configure file too, to make sure obsolete gencode entries are removed and appropriate ones for your kit are included. Oh, and drop the --enable-sse2 if you're compiling for other than Intel/AMD CPUs.
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1553003 - Posted: 6 Aug 2014, 19:32:19 UTC - in response to Message 1552975.  

I suspect he compiled it himself based on his previous posts.

I am really curious about how he set this up, and whether he will share directions later for those adventurous enough to attempt it after him :). That is pretty awesome performance for a system that Nvidia says uses 5 watts under real workloads. It would be nice to see that validated, but even at 10 watts that is some awesome crunching.

This topic is actually making me a little annoyed at my own gear now. I am seeing how bad my desktop really is, and I am at the point where I would rather not turn it on at all. I am considering just selling it off for parts to build something more energy efficient.

Now all you need to do is set up about 8 of them as a cluster and let them churn out some WUs.

A home-built cluster is not much of an advantage for SETI@home or BOINC, as you have to run the app on each node in the cluster.

Most of my computers are on for other reasons, so I run SETI@home on them. My past several system upgrades have been made to increase efficiency more than to increase performance.
mavrrick
Joined: 12 Apr 00
Posts: 17
Credit: 1,894,993
RAC: 4
United States
Message 1553006 - Posted: 6 Aug 2014, 19:49:17 UTC - in response to Message 1552975.  

That is pretty awesome performance for a system that Nvidia says uses 5 watts under real workloads.


The more I look at Nvidia's support page, the less sense this makes. I don't think that figure applies to pushing the CUDA cores to their limit. It would be interesting to get a power meter on it to see what its usage actually is.
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1553011 - Posted: 6 Aug 2014, 20:19:34 UTC - in response to Message 1553006.  
Last modified: 6 Aug 2014, 20:21:17 UTC

That is pretty awesome performance for a system that Nvidia says uses 5 watts under real workloads.


The more I look at Nvidia's support page, the less sense this makes. I don't think that figure applies to pushing the CUDA cores to their limit. It would be interesting to get a power meter on it to see what its usage actually is.

"the Kepler GPU in Tegra K1 consists of 192 CUDA cores and consumes less than two watts*.
*Average power measured on GPU power rail while playing a collection of popular mobile games."

Under full load from SETI@home the GPU's average power consumption may be much higher, as the load will not be as varied as when playing a game.
I just did a quick test with an i5-3470. The iGPU averages 3.6 W in GPU-Z under full load from an app called HeavyLoad, while playing Flash games the average reading in GPU-Z is 0.4 W. I don't have much else game-wise to test with on this system, as it is my cubicle system.
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1553058 - Posted: 6 Aug 2014, 21:49:55 UTC - in response to Message 1553006.  

That is pretty awesome performance for a system that Nvidia says uses 5 watts under real workloads.

The more I look at Nvidia's support page, the less sense this makes. I don't think that figure applies to pushing the CUDA cores to their limit. It would be interesting to get a power meter on it to see what its usage actually is.

The Jetson docs I was reading yesterday said that total at-the-wall consumption was (IIRC) 10.something W. Then it went through the chain describing the inefficiencies (20% loss in the power brick, etc.). It did stress that, because it is a development board, the peripherals hadn't been chosen for low power consumption. Next time I power down my home system I'll remove my power-meter and apply it to the Jetson instead.
Remember, though, that the Jetson runs its cores at a lower frequency than many PCI-e video cards and the memory bus is narrower, which drops power consumption.
/home/ubuntu/CUDA-SDK/NVIDIA_CUDA-6.0_Samples/bin/armv7l/linux/release/gnueabihf/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GK20A"
  CUDA Driver Version / Runtime Version          6.0 / 6.0
  CUDA Capability Major/Minor version number:    3.2
  Total amount of global memory:                 1746 MBytes (1831051264 bytes)
  ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
  GPU Clock rate:                                852 MHz (0.85 GHz)
  Memory Clock rate:                             924 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 131072 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 1, Device0 = GK20A
Result = PASS

mavrrick
Joined: 12 Apr 00
Posts: 17
Credit: 1,894,993
RAC: 4
United States
Message 1553085 - Posted: 6 Aug 2014, 22:43:06 UTC - in response to Message 1553058.  

Well, I was just looking over some docs and saw a peak power draw into the 40s of watts. I am not as much into the hardware as I used to be, and it got me thinking about what would probably be the biggest consumer of power. The peripherals are a given, as that was brought up in the doc I was looking over about power, but I was also wondering how they quantified a typical real workload.

I was just thinking that running a CUDA SETI@home app isn't a typical real-world load.

10 watts seems very reasonable to me.

The comment about a Jetson TK1 cluster was really about the two ways I see to increase efficiency. You either get faster CPUs/GPUs that do the work in a shorter amount of time, or you get many smaller CPUs that don't use as much energy and do more WUs at one time, with each one just taking longer.

It goes back to the question of what is more efficient. If it only takes 4 TK1s to complete the same amount of work as a regular high-end desktop, and they use 1/5 the power to run those WUs, then that would be the most energy-efficient way to go (a rough sketch of that comparison is below). I was being sarcastic, but that was my thought when I said it. Someone on the TK1 developer forums has a cluster of about 9 nodes set up.
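Here is a minimal sketch of what "more efficient" means in that comparison; all of the numbers in it are hypothetical placeholders, not measurements:

# Compare energy per work unit: watts * seconds per WU. Lower is better.
# All figures are hypothetical placeholders, not measured values.

def joules_per_wu(watts, wus_per_hour):
    return watts * 3600 / wus_per_hour

desktop = joules_per_wu(watts=360, wus_per_hour=12)    # hypothetical big box
tk1     = joules_per_wu(watts=10, wus_per_hour=1.5)    # hypothetical TK1 node

# A cluster of identical nodes has the same joules-per-WU as a single node:
# adding nodes scales throughput and power together, so the efficiency
# question is settled per node, regardless of cluster size.
print(f"desktop: {desktop:.0f} J/WU, TK1 node: {tk1:.0f} J/WU")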

I may be in the minority on this site, but I don't run my systems 24/7 anymore. I have one box at my house that is always on, and it is my server, so it has to be up. The rest are just clients, so I have all the power-saving goodness set up on them and let them sleep normally when not being used. The second most used system is an HTPC, which was built to be rather power efficient, although I am sure I could do better now. This is what drives this for me. I would love to still contribute, but I want to do it in a rather green manner. Low-power ARM devices seem to be the best option, and I am just hoping they can keep the power company from raiding my wallet. You could almost say this is research to see whether I can find a way to get back into contributing much again :).

I also like what you said about the Bay Trail-D system. If you don't mind me asking, which one do you have?