Message boards :
Number crunching :
Lunatics_x41g_linux64_cuda32.7z
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Looking good. None of the new WUs have validated yet, but I am not seeing any overflows, which is a good sign. And Boom! Published XBranch on Github; the last commit is me shovelling in the existing x41zc code. Please fork, modify, test and submit pull requests for discussion/collaboration on getting them back into the master :D "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Andrew Send message Joined: 28 Mar 15 Posts: 47 Credit: 1,053,596 RAC: 0 |
So far the compile of the Xbranch I did is working smoothly and performance seems ok. Nothing has errored out, but BOINC is completely out to lunch on how to judge estimated computation size and times... http://setiathome.berkeley.edu/results.php?hostid=7533120 |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
So far the compile of the Xbranch I did is working smoothly and performance seems ok. Nothing has errored out, but BOINC is completely out to lunch on how to judge estimated computation size and times... Static'd be good, if it can get more static. What I'd suggest is we all consolidate (yourself, Petri etc), and pile into the Github thing. So if you have specific modifications to the makefiles, do a fork and pull request to master, and we can all discuss it there. That's a new thing for me too, so a bit of an adventure, but I'm getting that jangling feeling that it's the right way to go (which happens sometimes). Then we can choose to field specific tests to wider audiences pretty quickly. There's a lot I want to change going into x42. Some of that is process for involvement/collaboration, testing and publication; another part is no-compromise re-jigging to prepare for the next application, which includes abstracting/wrapping some problematic BoincApi code that isn't going to be fixed. Estimate-wise, yeah, lots (and I mean LOTS) of research and development has been spent over about 5-7 years on that (specifically the 'CreditNew' mechanism). A customised client with improved estimation/prediction isn't out of the question, though enough is understood to offer a superior mechanism to any project that wants it for server-side improvement down the road (better estimates and fairer RAC, normalising to the COBBLESTONE_SCALE as intended). "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Andrew Send message Joined: 28 Mar 15 Posts: 47 Credit: 1,053,596 RAC: 0 |
So far the compile of the Xbranch I did is working smoothly and performance seems ok. Nothing has errored out, but BOINC is completely out to lunch on how to judge estimated computation size and times... I'll see what I can come up with. Depending on what library routines the program relies on, it could be easy or hell... Trying right now to see what is causing the binary to keep crashing when copied over to the laptop. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
So far the compile of the Xbranch I did is working smoothly and performance seems ok. Nothing has errored out, but BOINC is completely out to lunch on how to judge estimated computation size and times... I gather your app_info.xml file has set the CUDA app to do tasks sent as SETI@home v7 (anonymous platform, CPU) as well as those sent as SETI@home v7 (anonymous platform, NVIDIA GPU). That does add even more uncertainty to BOINC's estimate methodology. On the Application details for host 7533120 the "CPU" side has 45 completed, which have produced an "Average processing rate 58.11 GFLOPS". However, that includes several .vlar tasks which are relatively slow on the CUDA app, so when an average is eventually established for the "GPU" tasks it will probably be somewhat higher. The In progress tasks for computer 7533120 does show 9 GPU tasks were sent April 1, but it takes 11 completed for BOINC to start using the average. If it were my system, I'd put a <flops>8e10</flops> in the app_info.xml <app_version> section for the GPU app, and set the project preferences to not send CPU tasks. That way the existing work sent as "CPU" but done on the GPU would continue to be done, but all new tasks would be properly assigned to the GPU and estimated based on that <flops> value until the server average takes over. The Quadro FX 4800 is similar hardware to a GTX 260, and by the NVIDIA formula rated at about 462 GFLOPS single precision. Dividing that by 6 probably gives a reasonable approximation of where the x41zc Average processing rate will settle when .vlars are not included. Joe |
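Joe's <flops> suggestion would sit inside the GPU <app_version> entry of app_info.xml, something like the fragment below (the plan class, version number and file name are illustrative, taken from the cuda55 build discussed later in the thread; match them to your own file):

```xml
<app_version>
  <app_name>setiathome_v7</app_name>
  <version_num>701</version_num>
  <plan_class>cuda55</plan_class>
  <!-- 8e10 = ~80 GFLOPS: roughly the FX 4800's 462 GFLOPS peak divided by 6 -->
  <flops>8e10</flops>
  <coproc>
    <type>CUDA</type>
    <count>1</count>
  </coproc>
  <file_ref>
    <file_name>setiathome_x41zc_x86_64-pc-linux-gnu_cuda55</file_name>
    <main_program/>
  </file_ref>
</app_version>
```

The <flops> value only seeds the runtime estimate; once 11 validated GPU tasks are returned, the server-side average takes over.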
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14671 Credit: 200,643,578 RAC: 874 |
I wish all participants in this thread the best of success with your endeavours, especially if it results in a long-overdue stock Linux application for CUDA. But I would urge a modicum of caution. The binaries you are trying to compile are only useful when they run in the context of the BOINC framework: you have to understand at least the basic operation of that framework, and how your application fits into it, if you hope to get the final product of your collaborations accepted as a project stock binary. Three points have arisen in the course of this thread, which have prompted me to write those words. 1) A couple of days ago, Andrew wrote "For the missing boinc_temporary_exit, i commented them out and it finished compiling". For a test piece to validate the build system, that's absolutely fine - but the temporary exit hooks are part of BOINC API library which enables the application to run in the real world, in the way volunteers expect BOINC applications to run. If that function call isn't operating properly, then it is likely that other parts of the BOINC API are also missing or sub-optimal. These include things like ensuring that the application runs - and more importantly, stops running - when BOINC tells it to; that the application runs on the correct hardware when multiple GPUs are present; and so on. This bit of code isn't the easiest to work with, and tends to be neglected, but it is important and you need to persevere with it until the code compiles without commenting out important calls. 2) dsh commented on the 'nice' values (19 for CPU apps, 10 for GPU apps) for the running tasks. These are default values set by the BOINC client when it launches applications: you would probably see different values when the apps are run standalone for testing. 
They should be appropriate for general use, but Jason's X-branch code provides for the default priorities to be over-ridden by the application itself, in response to a configuration file: I suspect that's one area where the code may need to be modified to achieve the same effect in the Linux context, and there may be others before the application, BOINC, and operating system are operating in complete harmony. 3) The dreaded initial run-time estimates. Estimate-wise, yeah lots (and I mean LOTS) of research and development been spent over about 5-7 years on that (specifically the 'CreditNew' mechanism). A customised client with improved estimation/prediction isn't out of the question, though plenty is understood to offer a superior mechanism to any project that wants it for server side improvement down the road (better estimates and fairer RAC normalising to the COBBLESTONE_SCALE as intended). This is one area where your applications, and the computers they're running on, can be absolved of all blame. The runtime estimations, especially in the early stages, are entirely (but indirectly) set by the SETI server in Berkeley, and you can safely ignore them for development purposes. But you should have some understanding of how they work, so you are not taken by surprise when the estimate (as it will) changes, and possibly results in your computer fetching more work than you really intended during testing. Basically, the estimate will jump suddenly to more realistic values when you have returned 11 tasks which have successfully validated at the server against other people's work. You can monitor your progress via the 'Application details' link in your computer's detail page on this website. Here are Andrew's Application details for host 7533120: it's a little worrying that it's still showing no 'completed' tasks in the 'SETI@home v7 (anonymous platform, NVIDIA GPU)' section at the bottom, from the batch which were allocated two days ago. 
Thinking about these issues has prompted me to finally write an email to David Anderson, which has been on the back burner for the last week or so. If you think your estimates are bad, try the ones on another project where my tasks are getting a 5-week estimate to go with a 5-day deadline! I've suggested - not for the first time - that the BOINC developers themselves should re-visit the niggles in that runtime estimation process, in time for the fifth anniversary of CreditNew next month. |
Andrew Send message Joined: 28 Mar 15 Posts: 47 Credit: 1,053,596 RAC: 0 |
I wish all participants in this thread the best of success with your endeavours, especially if it results in a long-overdue stock Linux application for CUDA. From my experience I don't think you'll ever be able to do a stock app for all linux distros. In the 15 years I've been using it, it seems inevitable that you need to do a release for each distro you want to support. Right now I'm trying to copy it between my desktop and laptop, both running Gentoo, and the laptop refuses to work... All I can determine at this point through GDB is that it's failing with a SIGILL in the boinc-api's COPROCS function. I'm guessing with x86_64 there shouldn't be enough differences between Intel and AMD to cause an issue. 1) A couple of days ago, Andrew wrote "For the missing boinc_temporary_exit, i commented them out and it finished compiling". For a test piece to validate the build system, that's absolutely fine - but the temporary exit hooks are part of the BOINC API library which enables the application to run in the real world, in the way volunteers expect BOINC applications to run. If that function call isn't operating properly, then it is likely that other parts of the BOINC API are also missing or sub-optimal. These include things like ensuring that the application runs - and more importantly, stops running - when BOINC tells it to; that the application runs on the correct hardware when multiple GPUs are present; and so on. This bit of code isn't the easiest to work with, and tends to be neglected, but it is important and you need to persevere with it until the code compiles without commenting out important calls. This is no longer an issue. I was using the api from the SVN repository, but have since pulled down a newer version, 7.2.42, from the git repo, which has eliminated that issue. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14671 Credit: 200,643,578 RAC: 874 |
I wish all participants in this thread the best of success with your endeavours, especially if it results in a long-overdue stock Linux application for CUDA. That's great, and useful information for the next coder who tries to follow your lead. |
Andrew Send message Joined: 28 Mar 15 Posts: 47 Credit: 1,053,596 RAC: 0 |
So far the compile of the Xbranch I did is working smoothly and performance seems ok. Nothing has errored out, but BOINC is completely out to lunch on how to judge estimated computation size and times... The (anonymous platform, NVIDIA GPU) tasks were from when I was trying to use the file from Jason's website; those tasks were wiped off the machine while configuring the app_info.xml file to use the app I built from source. Unless someone is able to cancel those pending tasks, they will end up expiring. Actually, at this point anything that isn't done and doesn't fall under SETI@home v7 (anonymous platform, CPU) can be manually expired. Right now BOINC has somehow gone from overestimating by 40+ hours to underestimating by an hour or so, which isn't as bad... I've added the <flops> to the app_info.xml and will see how that affects things. Thanks |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
I really meant to add the <flops> to the GPU app_version, but from your recent posts I guess there isn't one. If you'd post a copy of your app_info.xml we can figure out why not. Processing tasks which the servers think are being done on CPU with your GPU skews the statistics. A few hundred done that way won't make any long-term difference, but if you ever decide to actually do tasks on CPU they'll have short estimates for quite a while. Joe |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14671 Credit: 200,643,578 RAC: 874 |
Yes, I agree with Joe that if you are doing the crunching on the GPU, you should declare the app as a GPU app in app_info.xml - work with BOINC's scheduling tools, rather than try to fight against them. It will make for a much more scalable and better-behaved application in the end. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
The basic principles involved are that if you have a robust system (referring to Boinc) then it will cope with all sorts of odd situations, while if a flea farting in Brazil sends it into meltdown, then probably more care is warranted. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
kittyman Send message Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
The basic principles involved are that if you have a robust system (referring to Boinc) then it will cope with all sorts of odd situations, while if a flea farting in Brazil sends it into meltdown, then probably more care is warranted. ROFLMAO, Jason..... "Time is simply the mechanism that keeps everything from happening all at once." |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14671 Credit: 200,643,578 RAC: 874 |
He could always try running it under Synecdoche instead. |
Andrew Send message Joined: 28 Mar 15 Posts: 47 Credit: 1,053,596 RAC: 0 |
Yes, I agree with Joe that if you are doing the crunching on the GPU, you should declare the app as a GPU app in app_info.xml - work with BOINC's scheduling tools, rather than try to fight against them. It will make for a much more scalable and better-behaved application in the end. This is what I have adapted from Jason's sample in the app package he provided.

<app_info>
  <app>
    <name>setiathome_v7</name>
  </app>
  <file_info>
    <name>setiathome_x41zc_x86_64-pc-linux-gnu_cuda55</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <version_num>701</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <avg_ncpus>1</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <file_ref>
      <file_name>setiathome_x41zc_x86_64-pc-linux-gnu_cuda55</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <file_info>
    <name>setiathome_x41zc_x86_64-pc-linux-gnu_cuda55</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <version_num>701</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <plan_class>cuda55</plan_class>
    <flops>8e10</flops>
    <avg_ncpus>1</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <coproc>
      <type>CUDA</type>
      <count>1.0</count>
    </coproc>
    <file_ref>
      <file_name>setiathome_x41zc_x86_64-pc-linux-gnu_cuda55</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info> |
Andrew Send message Joined: 28 Mar 15 Posts: 47 Credit: 1,053,596 RAC: 0 |
I do have some news regarding the debugging I've been at... I've narrowed it down: somehow the binary I'm building on my machine is getting AVX instructions optimized in, which explains the faulting on the Intel processor, which doesn't support AVX. Now I have to figure out how to stop GCC from generating the AVX optimizations. If anyone wants to take a crack at running the existing binary I'm running, it should work as long as you have an AMD processor with the AVX instruction set; I would assume it should work with any Intel that has it too. If you're really curious, this is the offending code, from COPROCS::COPROCS:

00000000004c9e40 <_ZN7COPROCSC1Ev>:
          4c9e40: 41 55                   push   %r13
->>>>>>>> 4c9e42: c5 f0 57 c9             vxorps %xmm1,%xmm1,%xmm1
          4c9e46: 4c 8d 8f 88 a6 00 00    lea    0xa688(%rdi),%r9
          4c9e4d: 41 54                   push   %r12
          4c9e4f: c5 f8 57 c0             vxorps %xmm0,%xmm0,%xmm0
          4c9e53: 55                      push   %rbp
          4c9e54: 48 8d 6f 08             lea    0x8(%rdi),%rbp
          4c9e58: 53                      push   %rbx
          4c9e59: 48 89 ea                mov    %rbp,%rdx
          4c9e5c: 48 89 fb                mov    %rdi,%rbx
          4c9e5f: 48 81 ec f8 14 00 00    sub    $0x14f8,%rsp
          4c9e66: c5 f8 28 15 c2 37 04    vmovaps 0x437c2(%rip),%xmm2   # 50d630 <_ZL14XML_MAX_INDENT+0x1590> |
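The AVX-vs-baseline difference above is easy to reproduce in miniature and to scan for in a build. The sketch below (file names and the toy source are hypothetical, purely for illustration) compiles the same code with and without -mavx, then greps the disassembly for VEX-encoded (v-prefixed) instructions like the vxorps/vmovaps in the dump:

```shell
# Toy source the compiler will emit float SSE/AVX code for.
cat > /tmp/avxprobe.c <<'EOF'
float dot(const float *a, const float *b) {
    float s = 0;
    for (int i = 0; i < 64; i++) s += a[i] * b[i];
    return s;
}
EOF

# Same source, two builds: one allowed to use AVX, one restricted to baseline x86-64.
gcc -O3 -mavx    -c /tmp/avxprobe.c -o /tmp/avx.o
gcc -O3 -mno-avx -c /tmp/avxprobe.c -o /tmp/noavx.o

# Count VEX-encoded (v-prefixed) float instructions in each object:
# the AVX build contains them, the -mno-avx build does not.
objdump -d /tmp/avx.o   | grep -cE '\bv(mul|add|mov|xor)[a-z]*s'
objdump -d /tmp/noavx.o | grep -cE '\bv(mul|add|mov|xor)[a-z]*s' || true
```

The same objdump-plus-grep check can be pointed at the full setiathome binary to confirm whether a given build will SIGILL on a non-AVX CPU.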
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
He could always try running it under Synecdoche instead. Looks awesome. I might need to have a conference with those guys in due course ;) "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14671 Credit: 200,643,578 RAC: 874 |
That - subject to further checking after more coffee - looks to be properly structured, although I'm still a bit worried about the <platform> tags. Leave them for now while you have work on board. As you get things sorted out, you should - eventually - remove the first app_version section, which declares the cuda55 executable to be a CPU app. But again, not while you have work on board. But so far, it looks as if you have only been allocated CPU work - which suggests that possibly BOINC hasn't "requested new work for NVIDIA GPU". Is BOINC reporting successful detection of your GPU at startup, shown in the initial messages in the Event Log? If BOINC doesn't detect the GPU first, it won't request work for the second <app_version>, the one with the cuda55 plan_class. |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
This is no longer an issue. I was using the api from the SVN repository, but have since pulled down a newer version 7.2.42 from the git repo which has eliminated that issue. A word of warning about building the api from a tagged version: not all the api changesets get applied to them. David recommends building the api from head; then you do get the most up-to-date api. Some people don't like that, as there is no version control, but it's what we have at the moment. Claggy |
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
I do have some news regarding the debugging I've been at... I've narrowed it down to somehow the binary I'm building on my machine is getting AVX instructions optimized in so that explains the faulting on the Intel processor which doesn't support AVX. Did you say you run Gentoo? Where everything is compiled from source and carefully fine-tuned to match your hardware and needs? I'm guessing you have -march=native or something equivalent somewhere. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.