Tests of new scheduler features.

Message boards : News : Tests of new scheduler features.
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 46288 - Posted: 10 Jun 2013, 17:15:04 UTC - in response to Message 46277.  


One number to rule them all, one number to find them,
one number to bring them all and in the darkness bind them.

:DDDDDDDDDDDDD
ID: 46288
Profile Raistmer
Volunteer tester
Message 46289 - Posted: 10 Jun 2013, 17:20:08 UTC - in response to Message 46278.  


OTOH nobody is going to complain if they get more credit!


EXACTLY! That's the very nature of this purely sociologically-driven feature: always give more credit, NOT less, and you will have happy users. And feel free to inflate; there's no gold reserve needed to back all the credits issued (AFAIK there's no gold reserve backing every $ issued either, but that's another story :P ;D )
ID: 46289
jason_gee
Volunteer tester

Joined: 11 Dec 08
Posts: 198
Credit: 658,573
RAC: 0
Australia
Message 46293 - Posted: 10 Jun 2013, 20:02:44 UTC - in response to Message 46285.  
Last modified: 10 Jun 2013, 20:05:44 UTC

For multithreaded apps, the elapsed time is useless, and CPU time is king - sorry, queen. I do hope you find that properly catered for in the code.


There are certainly formulae relating elapsed time to uniformly multithreaded implementations, but even the simple ones contain non-linear scaling and algorithm-dependent communication-overhead factors. I highly doubt they'd be included. Then you'd have to model alternative parallel mappings/topologies with different communication costs, like a hypercube.
ID: 46293
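A rough sketch of the kind of formula mentioned in the post above (an illustrative model, not anything from the SETI@home code): elapsed time for a uniformly multithreaded app modeled as serial work, plus parallel work divided across threads, plus a non-linear communication term. The n·log2(n) overhead shape is a hypothetical choice, e.g. for a hypercube-style exchange.

```python
import math

def elapsed_time(t_serial, t_parallel, n_threads, c_comm):
    """Wall-clock estimate: serial part + parallel part / threads + overhead.

    The c_comm * n * log2(n) term is a hypothetical communication-cost
    model (e.g. a hypercube-style exchange); real apps need measurement.
    """
    compute = t_serial + t_parallel / n_threads
    comm = c_comm * n_threads * math.log2(n_threads) if n_threads > 1 else 0.0
    return compute + comm
```

Past some thread count the overhead term dominates, which is one reason elapsed time alone says little about the work actually done, and why summed CPU time is the more robust measure for multithreaded apps.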
Josef W. Segur
Volunteer tester

Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 46298 - Posted: 11 Jun 2013, 14:26:44 UTC - in response to Message 46273.  

...
Still very far from convergence.
Since the same app is used, I would expect more similar APRs if v7 task credit granting were OK. It's obviously still not OK.

and I would have thought that with larger rsc_fpops_est to account for longer runtimes APR would automatically be smaller.

That said, APR is really the ratio of estimated operations to actual runtime.

IOW runtimes increased more than estimated.

With CreditNew if apps appear less efficient, that would certainly account for less credit. Ouch.

Eric, can you perhaps apply another 30% increase of rsc_fpops_est across all ARs?

I think that's the one screw you can turn to change credit awarded.
We certainly know that when other projects use insanely high rsc_fpops_est values, the tasks do get awarded a LOT of credit. Inversely, if rsc_fpops_est is small, it gives little credit. So if rsc_fpops_est wasn't increased as much as runtimes increased...

As I remember from the last time Eric was kind enough to post a graph of the project pfc average for SaH, it was clearly settling toward a value around 0.2, which I took to be the same old effect of David treating the Whetstone benchmark as a peak-FLOPS measurement.

I think it's possible that gradually adjusting the rsc_fpops_est values to move that pfc average toward 1.0 might be what is needed for CreditNew to work better. Certainly adapting to the underlying assumptions of the method shouldn't hurt.

A 30% increase at all angle ranges would be a start in that direction. Having new tasks' estimated runtimes increased by 30% would affect work fetch, and host averages would take some time to adapt, but it ought to increase granted credit at least temporarily. If part of the increase persisted after a week or two, further adjustment could be considered.
Joe
ID: 46298
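A minimal sketch of the quantities Joe is juggling (illustrative only; the real CreditNew code normalizes across hosts and app versions): APR is the fpops estimate divided by actual runtime, and to first order granted credit scales with rsc_fpops_est, so a 30% estimate bump means roughly 30% more credit until host averages adapt. COBBLESTONE_SCALE here reflects BOINC's Cobblestone definition of 200 credits per day at 1 GFLOPS.

```python
# Illustrative only -- not the actual CreditNew implementation.

COBBLESTONE_SCALE = 200.0 / (86400.0 * 1e9)  # credits per FLOP: 200/day at 1 GFLOPS

def apr(rsc_fpops_est, elapsed_seconds):
    """App Performance Rate: estimated operations per second of actual runtime."""
    return rsc_fpops_est / elapsed_seconds

def first_order_credit(rsc_fpops_est):
    """First-order view: credit proportional to the fpops estimate."""
    return rsc_fpops_est * COBBLESTONE_SCALE

base = first_order_credit(2e13)          # a 2e13-fpops task -> roughly 46 credits
bumped = first_order_credit(2e13 * 1.3)  # 30% higher estimate -> ~30% more credit
```

If runtimes grow but rsc_fpops_est doesn't, APR falls, the app looks less efficient, and CreditNew pays less: exactly the "runtimes increased more than estimated" effect described above.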
Alex Storey
Volunteer tester
Joined: 10 Feb 12
Posts: 107
Credit: 305,151
RAC: 0
Greece
Message 46299 - Posted: 12 Jun 2013, 13:40:47 UTC
Last modified: 12 Jun 2013, 13:48:20 UTC

Any chance it's the lack of those old Intel apps (the ones that had to be taken offline) that is causing the drop? Those were almost 100% faster than stock. So if some credit-giving Boinc machine somewhere doesn't know that CPUs can do twice the work, then of course it is gonna award half the credit and rightly so.

In other words, is there a chance that CreditNew is (pretty much) working, that any rsc_fpops_est changes/compensation made for V7 were correct, and it's just a simple case of BOINC not knowing how fast the CPUs are? I mean, how would it?

As a mental exercise (completely theoretical question):
If an optimized app that behaved exactly like the withdrawn (for good reason) optimized V6 CPU app was introduced into Seti Main's ecosystem right now... would everything fall into place? Would all the numbers make sense all of a sudden?


Edit: this whole post assumes that GPU credit is using CPU credit as a benchmark. If that is not the case, then this whole post is wrong out of the gate.
ID: 46299
Profile Raistmer
Volunteer tester
Message 46300 - Posted: 12 Jun 2013, 14:38:19 UTC - in response to Message 46299.  

Hardly possible.
The anonymous-platform fraction is not that big.
ID: 46300
zombie67 [MM]
Volunteer tester
Joined: 18 May 06
Posts: 280
Credit: 26,477,429
RAC: 0
United States
Message 46302 - Posted: 12 Jun 2013, 17:43:33 UTC

I haven't been following this thread, so sorry if this is a duplicate.

My 7970s are getting issued cal_ati tasks. I think the 7970s don't support CAL or Brook+ or whatever it's called. In any case, the tasks run on and on, but there's no load on the GPUs.
Dublin, California
Team: SETI.USA

ID: 46302
Profile Raistmer
Volunteer tester
Message 46305 - Posted: 13 Jun 2013, 7:40:41 UTC - in response to Message 46302.  

I haven't been following this thread, so sorry if this is a duplicate.

My 7970s are getting issued cal_ati tasks. I think the 7970s don't support cal or brooke+ or whatever it's called. In any case, the tasks run on and on, but no load on the GPUs.

If it doesn't error out, let it go. There have already been reports of success on HD7xxx cards (surprisingly). So don't abort the task if you see even slow progress, and post a link to the result when it finishes.
ID: 46305
Profile Raistmer
Volunteer tester
Message 46306 - Posted: 13 Jun 2013, 7:45:13 UTC - in response to Message 46302.  
Last modified: 13 Jun 2013, 7:53:02 UTC

I haven't been following this thread, so sorry if this is a duplicate.

My 7970s are getting issued cal_ati tasks. I think the 7970s don't support cal or brooke+ or whatever it's called. In any case, the tasks run on and on, but no load on the GPUs.

I see lots of 203 (0xcb) EXIT_ABORTED_VIA_GUI errors on at least one of your hosts. Please keep in mind that aborting tasks this way doesn't help beta testing in ANY way. If you don't want to participate in testing, opt out of AstroPulse; otherwise, try to crunch what the server offers your host.

EDIT: and more on this:
Here is a result from your host: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=14146946. As you can see, the Tahiti GPU you use can crunch CAL AP.
Since that's already proven, there's no need for further AP testing on these hosts for now; better, then, to stop AP work fetch on SETI Beta. Your mass task abortions just waste server bandwidth (and your own time spent making them).
ID: 46306
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 46366 - Posted: 17 Jun 2013, 10:47:14 UTC

@ Eric,

Did you turn off 'VLAR to Kepler' here at Beta, too? I'm seeing VLAR active on a CPU host, but 'got 0 new tasks' (no reason given) for Kepler/CUDA requests.

Although VLARs were disruptive on the Main project, it would be helpful to have them allowed here so that tests on possible solutions can continue.
ID: 46366
Profile Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 46368 - Posted: 17 Jun 2013, 15:23:36 UTC - in response to Message 46366.  

Right now they're both running the same scheduler binary. Maybe I need to add an app option "send VLAR to GPU".
ID: 46368
Richard Haselgrove
Volunteer tester

Message 46370 - Posted: 17 Jun 2013, 15:32:30 UTC - in response to Message 46368.  

Right now they're both running the same scheduler binary. Maybe I need to add an app option "send VLAR to GPU".

Yes please, if it could be done without too much bother. Default to 'no', ideally.
ID: 46370
Profile Raistmer
Volunteer tester
Message 46376 - Posted: 17 Jun 2013, 21:02:20 UTC - in response to Message 46368.  
Last modified: 17 Jun 2013, 21:02:42 UTC

Right now they're both running the same scheduler binary. Maybe I need to add an app option "send VLAR to GPU".


That would indeed be the best solution.
A default of "no" would let people without big GUI lags opt in and process VLARs on their GPUs, helping the project balance load while keeping everyone else away from the "laggy" tasks.
ID: 46376
Claggy
Volunteer tester

Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 0
United Kingdom
Message 46396 - Posted: 20 Jun 2013, 11:04:46 UTC
Last modified: 20 Jun 2013, 11:53:03 UTC

Eric, there seems to have been a spike in errored AP tasks/WUs with "Too many errors (may have bug)" since the AP apps were released last night:

This is not fair

Looking at one of the hosts that errored on the ATI OpenCL AP app shows it going to a host with too old a driver:

http://setiathome.berkeley.edu/show_host_detail.php?hostid=5215447

1.4.900 may or may not have OpenCL support, depending on whether they used the Cat 10.12 APP edition, the normal Cat 10.12 edition, or the Cat 11.1 edition (where OpenCL support is included); they all report the same CAL version.
The minimum needs to be at least Cat 11.2 (1.4.1016) for ati_opencl_100 tasks, since that is when OpenCL support was always included in the driver, and possibly later.

I don't have any ideas why the cal_ati AP apps are erroring.

Claggy
ID: 46396
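The driver-version gate Claggy describes could look something like this (an illustrative sketch, not the actual scheduler plan-class code; the function and constant names are made up):

```python
# Gate ati_opencl_100 work on a minimum CAL driver version, per the
# observation that OpenCL was only reliably bundled from Catalyst 11.2
# (CAL 1.4.1016) onward. Illustrative sketch, not real scheduler code.
MIN_CAL_VERSION = (1, 4, 1016)  # Catalyst 11.2

def parse_cal_version(version_str):
    """Turn a string like '1.4.900' into a comparable tuple (1, 4, 900)."""
    return tuple(int(part) for part in version_str.split("."))

def eligible_for_ati_opencl(cal_version_str):
    """True if the host's reported CAL driver is new enough for OpenCL tasks."""
    return parse_cal_version(cal_version_str) >= MIN_CAL_VERSION
```

Tuple comparison handles the numeric ordering correctly (1.4.1016 > 1.4.900), where a plain string comparison would not.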
Profile Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Message 46397 - Posted: 20 Jun 2013, 13:47:31 UTC - in response to Message 46396.  

I'll take a look and grant credit for the failures.
ID: 46397
Profile Raistmer
Volunteer tester
Message 46398 - Posted: 20 Jun 2013, 14:01:50 UTC

<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
CreateProcess() failed -
</message>
]]>

No idea what summoned such a failure either.
ID: 46398
Profile Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Message 46399 - Posted: 20 Jun 2013, 15:24:34 UTC - in response to Message 46398.  

I'm deprecating the cal_ati version. The executable on the USB flash drive I used yesterday to transfer the version is now unreadable. I'm guessing that the executable was damaged when I wrote it to the flash drive.


ID: 46399
Profile Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Message 46400 - Posted: 20 Jun 2013, 18:28:31 UTC - in response to Message 46399.  

It looks like the Brook DLLs were corrupted as well. Which raises the question: how do I release a new version without running into the same versioning problems as with the CUDA22 and CUDA23 versions of SETI@home?

Releasing a new version will overwrite the bad DLLs on the server, but machines that already have the bad DLLs won't download the new ones and will fail with a checksum error. We could add a version number to the Brook DLLs, but machines would still find the old versions first.

Unless there's some way to rebuild the Brook DLLs under new names, there's going to be trouble.
ID: 46400
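One generic way out of the stale-DLL trap described above, sketched here as hypothetical release tooling (not BOINC code): derive each released file's physical name from its contents, so a re-release can never collide with a bad cached copy.

```python
# Hypothetical sketch: content-addressed file names for released binaries.
# A corrupted upload and its fixed replacement get different names, so
# clients holding the bad copy fetch the new file instead of failing a
# checksum against the old name.
import hashlib
import shutil
from pathlib import Path

def versioned_name(path):
    """Return a content-addressed name like 'brook_1a2b3c4d.dll'."""
    p = Path(path)
    digest = hashlib.sha256(p.read_bytes()).hexdigest()[:8]
    return f"{p.stem}_{digest}{p.suffix}"

def publish(src, download_dir):
    """Copy a file into the download area under its content-addressed name."""
    dest = Path(download_dir) / versioned_name(src)
    shutil.copyfile(src, dest)
    return dest
```

The cost is that app-version metadata must reference the new physical names on every release, which is effectively the "rebuild the DLLs with new names" option.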
Profile Raistmer
Volunteer tester
Avatar

Send message
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 46401 - Posted: 20 Jun 2013, 20:55:51 UTC
Last modified: 20 Jun 2013, 20:58:43 UTC

They are part of the Brook+ runtime, so there's no easy way, at least. Maybe it's worth discussing this flaw with David?
There should be some backup path in the design for situations like this...

EDIT:
There was some mechanism to delete files from the client; Einstein uses it to delete files from time to time.
Can it be applied to a DLL?
That way, on the first update cal_ati would be deprecated, on the next (or the same) update the client would receive the file-deletion request, and on the next update the new version would be issued.
ID: 46401
Profile Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Message 46403 - Posted: 20 Jun 2013, 22:23:44 UTC - in response to Message 46401.  


There is a way to delete files remotely, but it doesn't seem to be working too well for cudart.dll and cufft.dll. There's no message from the client that indicates whether it was successful or not.


I did a test with a revised version for a couple of hours. 2291 cal_ati results went out. So far, 3 have come back completed (overflows) and 127 have come back with errors.

Depending on the numbers, I might send out the delete message for brook.dll and brook_cal.dll to hosts that have returned errors.

BOINC really needs a "delete old app version" message for cases like this.


ID: 46403


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.