Linux boinc version 7.6.31 can't run optimized applications

Zytra

Joined: 29 Aug 16
Posts: 36
Credit: 58,532,935
RAC: 0
United States
Message 1820629 - Posted: 29 Sep 2016, 20:45:24 UTC

Good tip on suspending tasks prior to making changes/restarting the client!
I don't mind trying the app that's being worked on; is it posted on this board? I've searched Google, but I must be using the wrong keywords...
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1820641 - Posted: 29 Sep 2016, 22:06:21 UTC - in response to Message 1820629.  

Well, petri33 is the creator and keeper of the Special code, and it sounds as though he may be building a new version shortly. If you just want to run the version that's at Beta until then, you can download the files here:
http://boinc2.ssl.berkeley.edu/beta/download/setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_sah
http://boinc2.ssl.berkeley.edu/beta/download/MultiBeam_Kernels_r3430.cl
You'll need to make an mb_cmdline_pc-linux_opencl_nvidia_sah.txt file and add an app_info.xml section for the app to your current app_info.xml.
Zytra

Joined: 29 Aug 16
Posts: 36
Credit: 58,532,935
RAC: 0
United States
Message 1820662 - Posted: 29 Sep 2016, 23:05:55 UTC
Last modified: 29 Sep 2016, 23:15:59 UTC

Thanks, I got both files, and here's my first attempt at making the app_info file...
(I haven't added the CPU stuff yet, and I haven't tested this yet.)

<app_info>
  <app>
     <name>setiathome_v8</name>
  </app>
     <file_info>
        <name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_sah</name>
        <executable/>
     </file_info>
     <file_info>
        <name>MultiBeam_Kernels_r3430.cl</name>
     </file_info>
     <file_info>
        <name>mb_cmdline-opencl_nvidia_sah.txt</name>
     </file_info>
    <app_version>
      <app_name>setiathome_v8</app_name>
      <platform>x86_64-pc-linux-gnu</platform>
      <version_num>810</version_num>
      <plan_class>opencl_ati5_nocal</plan_class>
        <coproc>
          <type>ATI</type>
          <count>1</count>
        </coproc>
      <avg_ncpus>0.1</avg_ncpus>
      <max_ncpus>0.1</max_ncpus>
     <file_ref>
         <file_name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_sah</file_name>
         <main_program/>
     </file_ref>
     <file_ref>
         <file_name>MultiBeam_Kernels_r3430.cl</file_name>
     </file_ref>
     <file_ref>
         <file_name>mb_cmdline-opencl_nvidia_sah.txt</file_name>
         <open_name>mb_cmdline.txt</open_name>
     </file_ref>
  </app_version>
</app_info>




I've used an OpenCL ATI file as a draft. Two things:
1. I don't know how to edit the "plan_class", so I've left the ATI entries in for now, but I know that won't work.
2. I've created a txt file, mb_cmdline-opencl_nvidia_sah.txt, containing these command-line modifiers:

-tt 1500 -hp -period_iterations_num 3 -sbs 768 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64


Also, what should I have in mb_cmdline.txt? I looked at the ATI OpenCL setup I ran before, and that file was empty.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1820677 - Posted: 29 Sep 2016, 23:49:20 UTC - in response to Message 1820662.  

Thanks, I got both files, and here's my first attempt at making the app_info file...
(I haven't added the CPU stuff yet, and I haven't tested this yet.)

<app_info>
  <app>
     <name>setiathome_v8</name>
  </app>
     <file_info>
        <name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_sah</name>
        <executable/>
     </file_info>
     <file_info>
        <name>MultiBeam_Kernels_r3430.cl</name>
     </file_info>
     <file_info>
        <name>mb_cmdline-opencl_nvidia_sah.txt</name>
     </file_info>
    <app_version>
      <app_name>setiathome_v8</app_name>
      <platform>x86_64-pc-linux-gnu</platform>
      <version_num>810</version_num>
      <plan_class>opencl_nvidia_sah</plan_class>
        <coproc>
          <type>NVIDIA</type>
          <count>0.5</count>
        </coproc>
      <avg_ncpus>0.1</avg_ncpus>
      <max_ncpus>0.1</max_ncpus>
     <file_ref>
         <file_name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_sah</file_name>
         <main_program/>
     </file_ref>
     <file_ref>
         <file_name>MultiBeam_Kernels_r3430.cl</file_name>
     </file_ref>
     <file_ref>
         <file_name>mb_cmdline-opencl_nvidia_sah.txt</file_name>
         <open_name>mb_cmdline.txt</open_name>
     </file_ref>
  </app_version>
</app_info>



I've used an OpenCL ATI file as a draft. Two things:
1. I don't know how to edit the "plan_class", so I've left the ATI entries in for now, but I know that won't work.
2. I've created a txt file, mb_cmdline-opencl_nvidia_sah.txt, containing these command-line modifiers:

-sbs 768 -spike_fft_thresh 4096 -oclfft_tune_gr 256 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -period_iterations_num 3 -instances_per_device 2


Also, what should I have in mb_cmdline.txt? I looked at the ATI OpenCL setup I ran before, and that file was empty.

I changed it a little. It should work after you add the CPU section. There shouldn't be a mb_cmdline.txt file, just an mb_cmdline-opencl_nvidia_sah.txt file with the altered commands in it. The line <open_name>mb_cmdline.txt</open_name> just tells the program to open the mb_cmdline-opencl_nvidia_sah.txt file under that name. The changes above (<count>0.5</count> in the coproc section and -instances_per_device 2 in the cmdline) will have the GPU run 2 tasks at once.
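The CPU part is just another file_info/app_version pair inside the same <app_info>, under the existing setiathome_v8 app. A rough sketch only; the executable name below is a placeholder, so substitute whichever CPU app and version number you actually use:

<file_info>
   <name>CPU_APP_PLACEHOLDER</name> <!-- placeholder, use your actual CPU executable's file name -->
   <executable/>
</file_info>
<app_version>
   <app_name>setiathome_v8</app_name>
   <platform>x86_64-pc-linux-gnu</platform>
   <version_num>800</version_num> <!-- match the version of the CPU app you have -->
   <avg_ncpus>1</avg_ncpus>
   <max_ncpus>1</max_ncpus>
   <file_ref>
      <file_name>CPU_APP_PLACEHOLDER</file_name>
      <main_program/>
   </file_ref>
</app_version>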

So far it looks as though the RX 480s are running 2 tasks without any problems. If there were going to be problems, you should have seen a few Inconclusives and maybe an Error or two by now. The R9 290s would produce a large number of Triplets when running 2 tasks at once.
Zytra

Joined: 29 Aug 16
Posts: 36
Credit: 58,532,935
RAC: 0
United States
Message 1820703 - Posted: 30 Sep 2016, 1:24:08 UTC

That worked right away, so the 980Ti is now running 2 at a time.

The RX 480s were not running 2 at a time earlier (I had just installed the second GPU).

Now, though, I've added the modifiers for 2 instances each and it seems to be going well. I'm not sure yet if it's better than just running one; I'll try to monitor that...

I'm still new to this, and for me the best monitoring is watching how the RAC evolves daily... :/

Nothing new on the 1080 side of things. I'm almost tempted to switch it to the Linux machine, but we'll see how this goes first.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1820728 - Posted: 30 Sep 2016, 2:48:18 UTC - in response to Message 1820703.  

Hmmmm, OK. I thought running 2 in the high 600 sec range was a little too fast.
So, if it runs one BLC4 task in around 675 secs, you would need to finish 2 tasks in under 1350 secs to come out ahead. I'm not sure what these numbers are showing; they show two tasks taking over 4000 secs and a few faster than that, which doesn't make sense. The longest a task should take when running 2 at a time is around 1350 secs. At least none were Inconclusive... yet.

The best way to monitor your results is to watch the completed tasks as they appear on the webpage, https://setiathome.berkeley.edu/results.php?hostid=8089460&offset=240 If the just-completed tasks are listed as Inconclusive or Error, that's not good. Those run times just don't look right compared to the times when running one task at a time.
Oh well, we'll see how the next ones run.
Urs Echternacht
Volunteer tester

Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1820733 - Posted: 30 Sep 2016, 3:09:04 UTC - in response to Message 1820728.  

Set the priority higher on Linux if you run more than one task at a time per GPU (see cc_config.xml in your BOINC directory and check <no_priority_change>).
_\|/_
U r s
Zytra

Joined: 29 Aug 16
Posts: 36
Credit: 58,532,935
RAC: 0
United States
Message 1820764 - Posted: 30 Sep 2016, 5:50:23 UTC
Last modified: 30 Sep 2016, 6:03:04 UTC

Which one should I look at: CPU time or Run time?
If Run time, then yes, I agree it looks like I should go back to 1 unit per GPU. Too bad.
Oh well, maybe there are some more little tweaks to make with the command-line modifiers.


It's hard to say if the 980 is doing any good with 2 units per GPU. I'm hoping it is; in the top ranking of computers there are a few machines with 4 of those cards and ridiculous RAC, running various OSes, so there must be a trick. But across 3 valid units they're around 900 seconds for each of the 2 instances on that GPU, so that's pretty good, much better than the 1800+ the RX was doing on that machine. [edit: I looked at the completed/not-yet-validated tasks and this GPU is super steady; all units are crunched within 20 seconds of an 860s average.] The Run Times of the CPUs on this machine look pretty terrible, though.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1820780 - Posted: 30 Sep 2016, 8:06:11 UTC - in response to Message 1820764.  

Looking at the RX 480 Windows host, it's hard to understand why the GPU Run Times are so long. You could add the -hp switch to the cmdline file, but I don't think that will help much. It would probably be best to go back to one task at a time unless you can get the times down. The -hp switch won't work in Linux; there you'll have to use the no_priority_change option instead. The Linux machine seems to be working normally, though. See Urs' post above; basically the file will look similar to:
<cc_config>
  <log_flags>
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
  <options>
    <no_priority_change>1</no_priority_change>
    <use_all_gpus>1</use_all_gpus>
    <save_stats_days>365</save_stats_days>
  </options>
</cc_config>


I finally got the new HDD installed, with Ubuntu 14.04.4 on it; it does have kernel 4.2. So, if anyone wants to see whether their RX 480 will show the correct Compute Unit count and clock rate in Ubuntu, that would be the system to install.
Zytra

Joined: 29 Aug 16
Posts: 36
Credit: 58,532,935
RAC: 0
United States
Message 1820885 - Posted: 30 Sep 2016, 17:54:14 UTC
Last modified: 30 Sep 2016, 18:05:33 UTC

I've removed the 2-instances-per-GPU setting on the machine with the two RXs.
What's really odd, though, is that the other machine (also Windows 10), running the single RX, seems to be performing well with 2 instances (around 1000s per unit).

edit: the 1080 is going; with no additional command-line settings it's doing about 4-6 min per unit. I've just added the command line suggested earlier and will try running 2 units at once. We'll see how that goes.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1820896 - Posted: 30 Sep 2016, 18:26:15 UTC - in response to Message 1820885.  

...What's really odd, though, is that the other machine (also Windows 10), running the single RX, seems to be performing well with 2 instances (around 1000s per unit).

Yes, that is interesting. A quick comparison from this end shows the single-GPU machine is using more CPU time and doesn't appear to have any cmdline settings. So, I'd suggest trying two at a time on the dual-GPU machine, making sure you have at least 3 or maybe 4 free CPU cores to feed the GPUs. After trying that for an hour or so, you could copy the single-GPU machine's cmdline settings over to the dual-GPU machine.
See if that makes a difference.
Zytra

Joined: 29 Aug 16
Posts: 36
Credit: 58,532,935
RAC: 0
United States
Message 1820897 - Posted: 30 Sep 2016, 18:32:13 UTC - in response to Message 1820896.  

...What's really odd, though, is that the other machine (also Windows 10), running the single RX, seems to be performing well with 2 instances (around 1000s per unit).

Yes, that is interesting. A quick comparison from this end shows the single-GPU machine is using more CPU time and doesn't appear to have any cmdline settings. So, I'd suggest trying two at a time on the dual-GPU machine, making sure you have at least 3 or maybe 4 free CPU cores to feed the GPUs. After trying that for an hour or so, you could copy the single-GPU machine's cmdline settings over to the dual-GPU machine.
See if that makes a difference.

I was wondering about that... Do you just use the BOINC preferences to assign only 4 of the 8 CPUs to crunching?

Wouldn't any gain from having free CPUs feeding those GPUs be countered by not having all 8 CPUs crunching?

CPUs are very slow compared to modern GPUs, but looking at the single-RX machine, the gain from running 2 instances instead of 1 is relatively small.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1820909 - Posted: 30 Sep 2016, 19:14:21 UTC - in response to Message 1820897.  

Well, you could try some complicated math formulas, or you could use visuals. So, look at the bottom of these pages. This Mac is running 3 tasks at a time with the appropriate CPU cores freed, and it has 12 CPUs: http://setiathome.berkeley.edu/top_hosts.php?sort_by=expavg_credit&offset=20
This Mac is the closest similar competitor that is running single GPU tasks, and it has 24 CPUs: http://setiathome.berkeley.edu/top_hosts.php?sort_by=expavg_credit&offset=120
Mac #1 - 41k
Mac #2 - 27k

There are a few different ways to reserve CPU cores. SETI uses the <max_ncpus>0.76</max_ncpus> tags; a setting of 0.76 frees about 3 cores when running 4 instances (4 x 0.76 is roughly 3), and that way the reserved CPUs are available again when the GPUs aren't running. Others change the BOINC setting "Use at most __ % of the CPUs" to free the appropriate number of cores. There is also a flag that can be placed in the cc_config.xml file to set the number of CPUs to use. All of those methods have their advantages.
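A minimal cc_config.xml sketch of that last approach, assuming the flag meant here is <ncpus> (the post doesn't name it); on an 8-thread CPU this would leave 3 threads free for the GPUs:

<cc_config>
  <options>
    <!-- assumption: <ncpus> is the flag referred to above; BOINC acts as if the
         host has 5 CPUs, so only 5 CPU tasks run and 3 threads stay free on an
         8-thread machine -->
    <ncpus>5</ncpus>
  </options>
</cc_config>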
Zytra

Joined: 29 Aug 16
Posts: 36
Credit: 58,532,935
RAC: 0
United States
Message 1820918 - Posted: 30 Sep 2016, 19:44:36 UTC

I'm trying this on the 1080 machine before I try it on the RXs.

I changed the <max_ncpus> tag from 0.2 to 0.5 (I'm only running 2 instances) so that 1 CPU thread is dedicated to the 2 GPU tasks.
But I also changed the <avg_ncpus> tag to 0.5... is that OK, or is it better to leave it at 0.04?
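The relevant app_version lines now look roughly like this (2 instances x 0.5 = 1 full CPU thread reserved):

        <coproc>
          <type>NVIDIA</type>
          <count>0.5</count>
        </coproc>
        <avg_ncpus>0.5</avg_ncpus>
        <max_ncpus>0.5</max_ncpus>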

Right now the 1080 is still crunching 2 WUs in under 10 minutes each...


Here's a screenshot; it shows 0.5 CPU and 0.5 GPU per task.

https://s17.postimg.org/7i7k5fzun/1080screen.png
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1821313 - Posted: 2 Oct 2016, 15:40:13 UTC - in response to Message 1820918.  

Everything seems to be working well. The 2 RX 480s are giving consistent times. I missed what happened when it was running with the cmdline settings and freed cores; it doesn't have any cmdlines now. Have you tried it with just the basic settings?

I tried building a newer Linux AMD App, but every build using r3532 resulted in many Computation Errors; strangely, that doesn't happen with the r3505 builds. I also found the current nVidia Linux build, p_zi+, doesn't care much for driver 364.19. I went back to 367.44 to see how that works.
Zytra

Joined: 29 Aug 16
Posts: 36
Credit: 58,532,935
RAC: 0
United States
Message 1821457 - Posted: 3 Oct 2016, 6:22:13 UTC

The times went up to 4000s instead of the roughly 1300s they should have been when running 2 instances; single instances were doing 650s on average.
I moved a few things around and I may have removed the command line, I'm not sure.
Yes, everything seems to be working well. I've got a Fury X card to test next. It's been lying around for a while and I can't remember if it even still works. If it's good, I think it should perform well.