Tests of new scheduler features.

Message boards : News : Tests of new scheduler features.
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 46288 - Posted: 10 Jun 2013, 17:15:04 UTC - in response to Message 46277.  


One number to rule them all, one number to find them,
one number to bring them all and in the darkness bind them.

:DDDDDDDDDDDDD
ID: 46288
Profile Raistmer
Volunteer tester
Message 46289 - Posted: 10 Jun 2013, 17:20:08 UTC - in response to Message 46278.  


OTOH nobody is going to complain if they get more credit!


EXACTLY! That's the very nature of this purely sociologically-driven feature: always give more credit, NOT less, and you will have happy users. And feel free to inflate; there's no gold reserve needed to back all the credits issued (AFAIK there's no gold reserve backing every $ issued either, but that's another story :P ;D )
ID: 46289
jason_gee
Volunteer tester

Joined: 11 Dec 08
Posts: 198
Credit: 658,573
RAC: 0
Australia
Message 46293 - Posted: 10 Jun 2013, 20:02:44 UTC - in response to Message 46285.  
Last modified: 10 Jun 2013, 20:05:44 UTC

For multithreaded apps, the elapsed time is useless, and CPU time is king - sorry, queen. I do hope you find that properly catered for in the code.


There are certainly formulae relating elapsed time to uniformly multithreaded implementations, but even the simple ones contain non-linear scaling and algorithm-dependent communication-overhead factors. I highly doubt they'd be included. Then you'd have to model alternative parallel mappings/topologies with different communication costs, like a hypercube.
ID: 46293
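A rough sketch of the kind of formula mentioned in the post above (an illustrative model, not anything from the SETI@home code): elapsed time for a uniformly multithreaded app modeled as serial work, plus parallel work divided across threads, plus a non-linear communication term. The n·log2(n) overhead shape is a hypothetical choice, e.g. for a hypercube-style exchange.

```python
import math

def elapsed_time(t_serial, t_parallel, n_threads, c_comm):
    """Wall-clock estimate: serial part + parallel part / threads + overhead.

    The c_comm * n * log2(n) term is a hypothetical communication-cost
    model (e.g. a hypercube-style exchange); real apps need measurement.
    """
    compute = t_serial + t_parallel / n_threads
    comm = c_comm * n_threads * math.log2(n_threads) if n_threads > 1 else 0.0
    return compute + comm
```

Past some thread count the overhead term dominates, which is one reason elapsed time alone says little about the work actually done, and why summed CPU time is the more robust measure for multithreaded apps.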
Josef W. Segur
Volunteer tester

Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 46298 - Posted: 11 Jun 2013, 14:26:44 UTC - in response to Message 46273.  

...
Still very far from convergence.
Since the same app is used, I would expect more similar APRs if v7 task credit granting were OK. It's obviously still not OK.

and I would have thought that with larger rsc_fpops_est to account for longer runtimes APR would automatically be smaller.

That said, APR is really the ratio of estimated operations to actual runtime.

IOW runtimes increased more than estimated.

With CreditNew if apps appear less efficient, that would certainly account for less credit. Ouch.

Eric, can you perhaps apply another 30% increase of rsc_fpops_est across all ARs?

I think that's the one screw you can turn to change credit awarded.
We certainly know that when other projects use insanely high rsc_fpops_est values, the tasks do get awarded a LOT of credit. Inversely, if rsc_fpops_est is small, it gives little credit. So if rsc_fpops_est wasn't increased as much as runtimes increased...

As I remember from the last time Eric was kind enough to post a graph of the project pfc average for SaH, it was clearly settling toward a value around 0.2, which I took to be the same old effect of David treating the Whetstone benchmark as a peak-FLOPS measurement.

I think it's possible that gradually adjusting the rsc_fpops_est values to move that pfc average toward 1.0 might be what is needed for CreditNew to work better. Certainly adapting to the underlying assumptions of the method shouldn't hurt.

A 30% increase at all angle ranges would be a start in that direction. Having new tasks' estimated runtimes increased by 30% would affect work fetch, and host averages would take some time to adapt, but it ought to increase granted credit at least temporarily. If part of the increase persisted after a week or two, further adjustment could be considered.
Joe
ID: 46298
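A minimal sketch of the quantities Joe is juggling (illustrative only; the real CreditNew code normalizes across hosts and app versions): APR is the fpops estimate divided by actual runtime, and to first order granted credit scales with rsc_fpops_est, so a 30% estimate bump means roughly 30% more credit until host averages adapt. COBBLESTONE_SCALE here reflects BOINC's Cobblestone definition of 200 credits per day at 1 GFLOPS.

```python
# Illustrative only -- not the actual CreditNew implementation.

COBBLESTONE_SCALE = 200.0 / (86400.0 * 1e9)  # credits per FLOP: 200/day at 1 GFLOPS

def apr(rsc_fpops_est, elapsed_seconds):
    """App Performance Rate: estimated operations per second of actual runtime."""
    return rsc_fpops_est / elapsed_seconds

def first_order_credit(rsc_fpops_est):
    """First-order view: credit proportional to the fpops estimate."""
    return rsc_fpops_est * COBBLESTONE_SCALE

base = first_order_credit(2e13)          # a 2e13-fpops task -> roughly 46 credits
bumped = first_order_credit(2e13 * 1.3)  # 30% higher estimate -> ~30% more credit
```

If runtimes grow but rsc_fpops_est doesn't, APR falls, the app looks less efficient, and CreditNew pays less: exactly the "runtimes increased more than estimated" effect described above.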
Alex Storey
Volunteer tester
Joined: 10 Feb 12
Posts: 107
Credit: 305,151
RAC: 0
Greece
Message 46299 - Posted: 12 Jun 2013, 13:40:47 UTC
Last modified: 12 Jun 2013, 13:48:20 UTC

Any chance it's the lack of those old Intel apps (the ones that had to be taken offline) that is causing the drop? Those were almost 100% faster than stock. So if some credit-giving Boinc machine somewhere doesn't know that CPUs can do twice the work, then of course it is gonna award half the credit and rightly so.

In other words, is there a chance that CreditNew is (pretty much) working, that any rsc_fpops_est changes/compensation made for V7 were correct, and it's just a simple case of BOINC not knowing how fast the CPUs are? I mean, how would it?

As a mental exercise (completely theoretical question):
If an optimized app that behaved exactly like the withdrawn (for good reason) optimized V6 CPU app was introduced into Seti Main's ecosystem right now... would everything fall into place? Would all the numbers make sense all of a sudden?


Edit: this whole post assumes that GPU credit is using CPU credit as a benchmark. If that is not the case, then this whole post is wrong out of the gate.
ID: 46299
Profile Raistmer
Volunteer tester
Message 46300 - Posted: 12 Jun 2013, 14:38:19 UTC - in response to Message 46299.  

Hardly possible.
The anonymous-platform fraction is not that big.
ID: 46300
zombie67 [MM]
Volunteer tester
Joined: 18 May 06
Posts: 280
Credit: 26,477,429
RAC: 0
United States
Message 46302 - Posted: 12 Jun 2013, 17:43:33 UTC

I haven't been following this thread, so sorry if this is a duplicate.

My 7970s are getting issued cal_ati tasks. I think the 7970s don't support CAL or Brook+ or whatever it's called. In any case, the tasks run on and on, but there's no load on the GPUs.
Dublin, California
Team: SETI.USA

ID: 46302
Profile Raistmer
Volunteer tester
Message 46305 - Posted: 13 Jun 2013, 7:40:41 UTC - in response to Message 46302.  

I haven't been following this thread, so sorry if this is a duplicate.

My 7970s are getting issued cal_ati tasks. I think the 7970s don't support cal or brooke+ or whatever it's called. In any case, the tasks run on and on, but no load on the GPUs.

If it doesn't error out, let it go. There have already been reports of success on HD7xxx cards (surprisingly). So don't abort the task if you see even slow progress, and post a link to the result when it finishes.
ID: 46305
Profile Raistmer
Volunteer tester
Message 46306 - Posted: 13 Jun 2013, 7:45:13 UTC - in response to Message 46302.  
Last modified: 13 Jun 2013, 7:53:02 UTC

I haven't been following this thread, so sorry if this is a duplicate.

My 7970s are getting issued cal_ati tasks. I think the 7970s don't support cal or brooke+ or whatever it's called. In any case, the tasks run on and on, but no load on the GPUs.

I see lots of 203 (0xcb) EXIT_ABORTED_VIA_GUI errors on at least one of your hosts. Please keep in mind that aborting tasks this way doesn't help beta testing in ANY way. If you don't want to participate in testing, opt out of AstroPulse; otherwise, try to crunch what the server offers your host.

EDIT: and more on this:
Here is a result from your host: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=14146946. As you can see, the Tahiti GPU you use can crunch CAL AP.
Since that's already proven, there's no need for further AP testing on these hosts for now; better, then, to stop AP work fetch on SETI Beta. Your mass task abortions just waste server bandwidth (and your own time spent making them).
ID: 46306
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 46366 - Posted: 17 Jun 2013, 10:47:14 UTC

@ Eric,

Did you turn off 'VLAR to Kepler' here at Beta, too? I'm seeing VLAR active on a CPU host, but 'got 0 new tasks' (no reason given) for Kepler/CUDA requests.

Although VLARs were disruptive on the Main project, it would be helpful to have them allowed here so that tests on possible solutions can continue.
ID: 46366
Profile Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 46368 - Posted: 17 Jun 2013, 15:23:36 UTC - in response to Message 46366.  

Right now they're both running the same scheduler binary. Maybe I need to add an app option "send VLAR to GPU".
ID: 46368
Richard Haselgrove
Volunteer tester

Message 46370 - Posted: 17 Jun 2013, 15:32:30 UTC - in response to Message 46368.  

Right now they're both running the same scheduler binary. Maybe I need to add an app option "send VLAR to GPU".

Yes please, if it could be done without too much bother. Default to 'no', ideally.
ID: 46370
Profile Raistmer
Volunteer tester
Message 46376 - Posted: 17 Jun 2013, 21:02:20 UTC - in response to Message 46368.  
Last modified: 17 Jun 2013, 21:02:42 UTC

Right now they're both running the same scheduler binary. Maybe I need to add an app option "send VLAR to GPU".


That would indeed be the best solution.
A default of "no" would let people without big GUI lags opt in and process VLARs on their GPUs, helping the project balance load while keeping everyone else away from the "laggy" tasks.
ID: 46376
Claggy
Volunteer tester

Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 0
United Kingdom
Message 46396 - Posted: 20 Jun 2013, 11:04:46 UTC
Last modified: 20 Jun 2013, 11:53:03 UTC

Eric, there seems to have been a spike in errored AP tasks/WUs with "Too many errors (may have bug)" since the AP apps were released last night:

This is not fair

Looking at one of the hosts that errored on the ATI OpenCL AP app shows it going to a host with too old a driver:

http://setiathome.berkeley.edu/show_host_detail.php?hostid=5215447

1.4.900 may or may not have OpenCL support, depending on whether they used the Cat 10.12 APP edition, the normal Cat 10.12 edition, or the Cat 11.1 edition (where OpenCL support is included); they all report the same CAL version.
The minimum needs to be at least Cat 11.2 (1.4.1016) for ati_opencl_100 tasks, since that is when OpenCL support was always included in the driver, and possibly later.

I don't have any ideas why the cal_ati AP apps are erroring.

Claggy
ID: 46396
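The driver-version gate Claggy describes could look something like this (an illustrative sketch, not the actual scheduler plan-class code; the function and constant names are made up):

```python
# Gate ati_opencl_100 work on a minimum CAL driver version, per the
# observation that OpenCL was only reliably bundled from Catalyst 11.2
# (CAL 1.4.1016) onward. Illustrative sketch, not real scheduler code.
MIN_CAL_VERSION = (1, 4, 1016)  # Catalyst 11.2

def parse_cal_version(version_str):
    """Turn a string like '1.4.900' into a comparable tuple (1, 4, 900)."""
    return tuple(int(part) for part in version_str.split("."))

def eligible_for_ati_opencl(cal_version_str):
    """True if the host's reported CAL driver is new enough for OpenCL tasks."""
    return parse_cal_version(cal_version_str) >= MIN_CAL_VERSION
```

Tuple comparison handles the numeric ordering correctly (1.4.1016 > 1.4.900), where a plain string comparison would not.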
Profile Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Message 46397 - Posted: 20 Jun 2013, 13:47:31 UTC - in response to Message 46396.  

I'll take a look and grant credit for the failures.
ID: 46397
Profile Raistmer
Volunteer tester
Message 46398 - Posted: 20 Jun 2013, 14:01:50 UTC

<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
CreateProcess() failed -
</message>
]]>

No idea what summoned such a failure either.
ID: 46398
Profile Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Message 46399 - Posted: 20 Jun 2013, 15:24:34 UTC - in response to Message 46398.  

I'm deprecating the cal_ati version. The executable on the USB flash drive I used yesterday to transfer the version is now unreadable. I'm guessing that the executable was damaged when I wrote it to the flash drive.


ID: 46399
Profile Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Message 46400 - Posted: 20 Jun 2013, 18:28:31 UTC - in response to Message 46399.  

It looks like the Brook DLLs were corrupted as well. Which raises the question: how do I release a new version without running into the same versioning problems as with the CUDA22 and CUDA23 versions of SETI@home?

Releasing a new version will overwrite the bad DLLs on the server, but machines that already have the bad DLLs won't download the new ones and will fail with a checksum error. We could add a version number to the Brook DLLs, but machines would still find the old versions first.

Unless there's some way to rebuild the Brook DLLs under new names, there's going to be trouble.
ID: 46400
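One generic way out of the stale-DLL trap described above, sketched here as hypothetical release tooling (not BOINC code): derive each released file's physical name from its contents, so a re-release can never collide with a bad cached copy.

```python
# Hypothetical sketch: content-addressed file names for released binaries.
# A corrupted upload and its fixed replacement get different names, so
# clients holding the bad copy fetch the new file instead of failing a
# checksum against the old name.
import hashlib
import shutil
from pathlib import Path

def versioned_name(path):
    """Return a content-addressed name like 'brook_1a2b3c4d.dll'."""
    p = Path(path)
    digest = hashlib.sha256(p.read_bytes()).hexdigest()[:8]
    return f"{p.stem}_{digest}{p.suffix}"

def publish(src, download_dir):
    """Copy a file into the download area under its content-addressed name."""
    dest = Path(download_dir) / versioned_name(src)
    shutil.copyfile(src, dest)
    return dest
```

The cost is that app-version metadata must reference the new physical names on every release, which is effectively the "rebuild the DLLs with new names" option.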
Profile Raistmer
Volunteer tester
Avatar

Send message
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 46401 - Posted: 20 Jun 2013, 20:55:51 UTC
Last modified: 20 Jun 2013, 20:58:43 UTC

They are part of the Brook+ runtime, so there's no easy way, at least. Maybe it's worth discussing this flaw with David?
There should be some backup path in the design for situations like this...

EDIT:
There was some mechanism to delete files from the client; Einstein uses it to delete files from time to time.
Can it be applied to a DLL?
That way, on the first update cal_ati would be deprecated, on the next (or the same) update the client would receive the file-deletion request, and on the next update the new version would be issued.
ID: 46401
Profile Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Message 46403 - Posted: 20 Jun 2013, 22:23:44 UTC - in response to Message 46401.  


There is a way to delete files remotely, but it doesn't seem to be working too well for cudart.dll and cufft.dll. There's no message from the client that indicates whether it was successful or not.


I did a test with a revised version for a couple of hours. 2291 cal_ati results went out. So far, 3 have come back completed (overflows) and 127 have come back with errors.

Depending on the numbers, I might send out the delete message for brook.dll and brook_cal.dll to hosts that have returned errors.

BOINC really needs a "delete old app version" message for cases like this.


ID: 46403


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.