Message boards :
News :
Tests of new scheduler features.
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
:DDDDDDDDDDDDD
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
EXACTLY! That's the very nature of this purely sociologically-driven feature. Always give more credit, NOT less, and you will have happy users. And feel free to inflate; there is no gold reserve needed to cover all the credits issued (AFAIK there is no gold reserve covering every $ issued either, but that's another story :P ;D )
Joined: 11 Dec 08 Posts: 198 Credit: 658,573 RAC: 0
For multithreaded apps, the elapsed time is useless and CPU time is king - sorry, queen. I do hope that's properly catered for in the code. There are certainly formulae relating elapsed time to uniformly multithreaded implementations, but even the simple ones contain non-linear scaling and algorithm-dependent communication overhead factors. I highly doubt they'd be included. Then you'd have to model alternative parallel mappings/topologies with different communication costs, like the hypercube.
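A rough sketch of the point above (this is illustrative arithmetic, not CreditNew's actual code; the assumption is that an elapsed-time-based estimate charges for every reserved core over the whole wall-clock run, while a CPU-time-based one charges only for CPU seconds actually consumed):

```python
# Hypothetical sketch: why elapsed time overestimates work done by a
# multithreaded app with imperfect parallel scaling.
def flops_from_elapsed(elapsed_s, ncpus, peak_flops_per_core):
    # Charges for all reserved cores for the whole wall-clock time.
    return elapsed_s * ncpus * peak_flops_per_core

def flops_from_cpu_time(cpu_time_s, peak_flops_per_core):
    # Charges only for CPU seconds actually consumed.
    return cpu_time_s * peak_flops_per_core

# A 4-thread task at 75% parallel efficiency: 3000 CPU-seconds of work
# finish in 1000 s of wall time on a core rated at 5 GFLOPS peak.
est_elapsed = flops_from_elapsed(1000.0, 4, 5e9)  # 2.0e13
est_cpu = flops_from_cpu_time(3000.0, 5e9)        # 1.5e13
print(est_elapsed / est_cpu)  # elapsed-based estimate is 4/3 too high
```

The gap grows with every extra thread and every bit of communication overhead, which is the poster's point.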
Joined: 14 Oct 05 Posts: 1137 Credit: 1,848,733 RAC: 0
... As I remember from the last time Eric was kind enough to post a graph of the project pfc average for SaH, it was clearly settling toward a value around 0.2, which I took to be the same old effect of David considering the Whetstone benchmark a peak FLOPS measurement. I think it possible that gradually adjusting the rsc_fpops_est values to move that pfc average toward 1.0 might be what is needed for CreditNew to work better. Certainly adapting to the underlying assumptions of the method shouldn't hurt.

A 30% increase at all angle ranges would be a start in that direction. Having new task estimated runtimes increased by 30% would affect work fetch, and host averages would take some time to adapt, but it ought to increase granted credit at least temporarily. If part of the increase persisted after a week or two, further adjustment could be considered.

Joe
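A quick illustration of the compounding in the suggestion above (pure arithmetic, under the hypothetical assumption that the pfc average moves in direct proportion to the rsc_fpops_est adjustments):

```python
import math

# Moving a ratio of ~0.2 to ~1.0 needs a factor of 5 overall.
# How many successive 30% increases does that take?
current, target, step = 0.2, 1.0, 1.30
n = math.log(target / current) / math.log(step)
print(n)  # ~6.1, so roughly six rounds of 30% adjustments
```

So the "start in that direction" would need several follow-up rounds, consistent with the week-or-two observation period proposed.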
Joined: 10 Feb 12 Posts: 107 Credit: 305,151 RAC: 0
Any chance it's the lack of those old Intel apps (the ones that had to be taken offline) that is causing the drop? Those were almost 100% faster than stock. So if some credit-granting BOINC machine somewhere doesn't know that CPUs can do twice the work, then of course it's going to award half the credit, and rightly so. In other words, is there a chance that CreditNew is (pretty much) working, that any rsc_fpops_est changes/compensation made for V7 were correct, and it's just a simple case of BOINC not knowing how fast the CPUs are? I mean, how would it?

As a mental exercise (a completely theoretical question): if an optimized app that behaved exactly like the withdrawn (for good reason) optimized V6 CPU app were introduced into SETI Main's ecosystem right now... would everything fall into place? Would all the numbers suddenly make sense?

Edit: this whole post assumes that GPU credit uses CPU credit as a benchmark. If that is not the case, then this whole post is wrong out of the gate.
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
Hardly possible. The anonymous platform fraction is not that big.
Joined: 18 May 06 Posts: 280 Credit: 26,477,429 RAC: 0
I haven't been following this thread, so sorry if this is a duplicate. My 7970s are getting issued cal_ati tasks. I think the 7970s don't support CAL or Brook+ or whatever it's called. In any case, the tasks run on and on, but there is no load on the GPUs. Dublin, California Team: SETI.USA
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
I haven't been following this thread, so sorry if this is a duplicate. If it doesn't error out, let it keep going. There have already been reports of success on HD7xxx cards (surprisingly). So don't abort the task if you see even slow progress. And report a link to the result when it finishes.
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
I haven't been following this thread, so sorry if this is a duplicate. I see lots of 203 (0xcb) EXIT_ABORTED_VIA_GUI, at least on one of your hosts. Please keep in mind that aborting tasks this way doesn't help beta testing in ANY way. If you don't want to participate in testing, opt out of AstroPulse. Otherwise, try to crunch what the server offers your host.

EDIT: and more on this. Here is a result from your host: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=14146946. As you can see, the Tahiti GPU you use can crunch CAL AP. Since that's already proven, there is no need for further AP testing on these hosts for now. Better, then, to stop AP work fetch on SETI Beta. Your mass task abortions just waste server bandwidth (and your own time spent making them).
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0
@ Eric, did you turn off 'VLAR to Kepler' here at Beta, too? I'm seeing VLAR active on a CPU host, but 'got 0 new tasks' (no reason given) for Kepler/CUDA requests. Although VLARs were disruptive on the Main project, it would be helpful to have them allowed here so that tests of possible solutions can continue.
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0
Right now they're both running the same scheduler binary. Maybe I need to add an app option, "send VLAR to GPU".
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0
Right now they both running the same scheduler binary. Maybe I need to add an app option "send VLAR to GPU". Yes please, if it can be done without too much bother. Default to 'no', ideally.
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
Right now they both running the same scheduler binary. Maybe I need to add an app option "send VLAR to GPU". It would indeed be the best solution. A default of "no" would let people without big GUI lags opt in and process VLARs on GPU, helping the project balance load while keeping everyone else away from the "laggy" tasks.
Joined: 29 May 06 Posts: 1037 Credit: 8,440,339 RAC: 0
Eric, there seems to have been a spike in errored AP tasks/WUs with "Too many errors (may have bug)" since the AP apps were released last night. This is not fair.

Looking at one of the hosts that errored on the ATI OpenCL AP app shows it went out to a host with too old a driver: http://setiathome.berkeley.edu/show_host_detail.php?hostid=5215447. 1.4.900 may or may not have OpenCL support, depending on whether they used the Cat 10.12 APP edition, the normal Cat 10.12 edition, or the Cat 11.1 edition (where OpenCL support is included); they all use the same CAL version. The minimum needs to be at least Cat 11.2 (1.4.1016) for ati_opencl_100 tasks, since that is when OpenCL support was always included in the driver, and possibly later. I don't have any ideas why the cal_ati AP apps are erroring.

Claggy
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0
I'll take a look and grant credit for the failures.
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
<core_client_version>6.10.18</core_client_version> <![CDATA[ <message> CreateProcess() failed - </message> ]]> No idea what summoned such a failure, either.
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0
I'm deprecating the cal_ati version. The executable on the USB flash drive I used yesterday to transfer the version is now unreadable. I'm guessing the executable was damaged when I wrote it to the flash drive.
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0
It looks like the Brook DLLs were corrupted as well. Which raises the question: how do I release a new version without running into the same versioning problems as with the CUDA22 and CUDA23 versions of SETI@home? Releasing a new version will overwrite the bad DLLs on the server, but machines that already have the bad DLLs won't download the new ones and will fail with a checksum error. We could add a version number to the Brook DLLs, but machines would still find the old versions first. Unless there's some way to rebuild the Brook DLLs with new names, there's going to be trouble.
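One generic way around the collision described above (a sketch, not BOINC's actual release mechanism; the function name and layout are illustrative) is to embed a short content hash in each distributed filename, so a rebuilt or repaired file can never be mistaken for a previously downloaded copy with the same name:

```python
import hashlib
import pathlib
import shutil

def release_with_hash(src, dest_dir):
    """Copy a file into dest_dir under a name embedding a short content
    hash, so a rebuilt file never collides with an earlier release that
    clients may already have cached under the old name."""
    src = pathlib.Path(src)
    digest = hashlib.md5(src.read_bytes()).hexdigest()[:8]
    dest = pathlib.Path(dest_dir) / f"{src.stem}_{digest}{src.suffix}"
    shutil.copy2(src, dest)
    return dest.name

# e.g. release_with_hash("brook.dll", "download/")
# -> something like "brook_1a2b3c4d.dll"
```

This only helps if the app version's file references use the hashed names, of course; files already shipped under the plain names still need the remote-delete route discussed below.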
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
They are part of the Brook+ runtime, so there's no easy way, at least. Maybe it's worth discussing this flaw with David? There should be some backup path in the design for situations like this... EDIT: There was some mechanism to delete files from the client - Einstein uses it to delete files from time to time. Can it be applied to DLLs? That way, on the first update cal_ati is deprecated, on the next one (or the same one) the client receives the file deletion request, and on the next update the new version is issued.
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0
There is a way to delete files remotely, but it doesn't seem to be working too well for cudart.dll and cufft.dll. There's no message from the client indicating whether it was successful or not. I did a test with a revised version for a couple of hours. 2291 cal_ati results went out. So far 3 have come back completed (overflows) and 127 have come back with errors. Depending on the numbers, I might send out the delete message for brook.dll and brook_cal.dll to hosts that have returned errors. BOINC really needs a "delete old app version" message for cases like this.
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.