v7 7.00 not getting CPU time; Recent Estimated Credit

Message boards : Number crunching : v7 7.00 not getting CPU time; Recent Estimated Credit
Saz

Joined: 22 Aug 12
Posts: 5
Credit: 513,176
RAC: 0
Canada
Message 1662443 - Posted: 8 Apr 2015, 3:52:38 UTC

I am finding that SETI v7 7.00 is not getting CPU time on a system where I am also running Einstein FGRP4-SSE2 tasks. If I happen to get an Astropulse task then that task will get scheduled normally on my GPUs.

In a discussion over in the Einstein forum, http://einstein.phys.uwm.edu/forum_thread.php?id=11236 , the current BOINC scheduler design documents were referenced. The behaviour I am observing could hypothetically be explained if BOINC's Recent Estimated Credit (REC) for Astropulse on NVIDIA or Intel GPUs is approximately 30 times higher than the credit actually granted. If that were the case, then BOINC might think that the Einstein CPU tasks are balancing out the credits from the Astropulse GPU tasks.

Is it plausible that the BOINC Estimated Credit for SETI Astropulse GPU could be so different from the actual credits?

The behaviour I have observed so far is consistent with the possibility that the v7 7.00 CPU jobs are not going to get scheduled until they hit EDF. :(
Richard Haselgrove
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1662506 - Posted: 8 Apr 2015, 8:11:42 UTC - in response to Message 1662443.  

There's one certain way to find out, and that's to enable the <work_fetch_debug> log flag, and read the actual figures in the Event Log.
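
For anyone who has not set a log flag before, here is a minimal sketch of what the config fragment looks like, generated from Python so it can be checked. It assumes the usual `cc_config.xml` layout (`<cc_config>` containing `<log_flags>`); the helper name `build_cc_config` is made up for illustration, and if you already have a `cc_config.xml` you would merge the flag into it by hand rather than overwrite the file.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text with each named log flag set to 1.

    Hypothetical helper for illustration only; it does not read or
    preserve any existing cc_config.xml.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    # ET.indent only exists on Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


config_text = build_cc_config(["work_fetch_debug"])
print(config_text)
```

The printed text would go into `cc_config.xml` in the BOINC data directory, after which the client needs to re-read its config files (or be restarted) before the extra lines show up in the Event Log.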

My experience is the opposite: I've run GPUGrid and SETI on the same machines for several years. GPUGrid awards much higher credit (and has an impressive list of proper scientific publications too - it's not all about the credit), but the REC for the two projects is essentially the same.
Saz

Joined: 22 Aug 12
Posts: 5
Credit: 513,176
RAC: 0
Canada
Message 1662685 - Posted: 8 Apr 2015, 18:57:38 UTC - in response to Message 1662506.  

It appears that in the last 10 days, I have been credited with about 48500 from Einstein (CPU jobs), and the Recent Estimated Credit for it is currently about 2967. In the same 10 days, I have been credited with about 10000 for SETI (including Astropulse GPU jobs), and the Recent Estimated Credit for it is about 4072.

REC is an average, so BOINC must be estimating that I was granted about 40720 for SETI during those 10 days, and must be estimating that I was granted about 29670 for Einstein. That makes the BOINC credit estimate for the SETI work about 4 times too high, and the BOINC credit estimate for the Einstein work about 0.6 times the actual.
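
The arithmetic behind those ratios, as a small sketch. It uses the same simplification as the paragraph above, namely treating REC as a plain per-day average over the 10 days; that is an assumption, and as I understand it BOINC's real REC is a decaying average rather than a flat one, so the figures are only rough.

```python
DAYS = 10

# Figures quoted in this post: credit actually granted over 10 days,
# and the REC value BOINC currently reports for each project.
projects = {
    "Einstein": {"granted": 48500, "rec": 2967},
    "SETI": {"granted": 10000, "rec": 4072},
}


def estimate_ratio(granted, rec, days=DAYS):
    """Implied estimated credit (rec * days) divided by granted credit."""
    return rec * days / granted


ratios = {
    name: estimate_ratio(p["granted"], p["rec"])
    for name, p in projects.items()
}

for name, ratio in ratios.items():
    implied = projects[name]["rec"] * DAYS
    print(f"{name}: implied estimate {implied}, ratio {ratio:.2f}")
```

A ratio above 1 means BOINC's estimate is higher than what was granted; below 1 means it is lower.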

The difficulty is not in the credits themselves (SETI has always been conservative on credits), but rather that the REC at least appears to be leading to incorrect scheduling.

Using the rr simulation debugging flags, BOINC does think all of the SETI CPU jobs will run normally before their deadline. If the REC explanation is correct, then I calculate that BOINC should start scheduling the SETI CPU jobs in about another 3.7 days, provided that no GPU jobs come in.

I guess I will keep watching...
Brent Norman
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1662714 - Posted: 8 Apr 2015, 20:10:00 UTC - in response to Message 1662685.  

You also have to consider that, YES, you did 10 days' work on a project, but you may not have been granted anything for it yet because it has yet to be validated by a 2nd person.

SETI has a fairly long window for work to be verified, so you may not see credit for another 30 days, or 90 if it has to be resent. A month from now your credits 'should' come up.
Harri Liljeroos
Joined: 29 May 99
Posts: 3991
Credit: 85,281,665
RAC: 126
Finland
Message 1663157 - Posted: 9 Apr 2015, 18:05:19 UTC - in response to Message 1662685.  

I've come to similar conclusions by trying to adjust the project shares between Seti and Einstein so that they would get about the same RAC. But Seti's inability to produce a constant flow of work has made it difficult to track. Currently Einstein's share is 12 and Seti's 50. That was giving almost what I wanted until the current server problems here hit us.

To make it more interesting, I'm running LHC with a share of 150. This keeps the CPU constantly busy with LHC work (when available), and as LHC doesn't have a GPU application, Seti and Einstein use my GTX970 totally: 3 WUs at a time for both Einstein and Seti. Einstein is currently allowed to use the CPU as well, but LHC dominates there, and all Einstein CPU WUs (only very few at a time) basically run only in EDF mode, usually starting just about 20 hours before the deadline.
Saz

Joined: 22 Aug 12
Posts: 5
Credit: 513,176
RAC: 0
Canada
Message 1663403 - Posted: 10 Apr 2015, 4:11:43 UTC

The SETI jobs started running about 22 hours ago, which was earlier than the 3.x days my hypothesis suggested. Whether by coincidence or not, first glance suggests that none of them started until all Einstein jobs with earlier deadlines were either completed or in progress.

15 of the SETI jobs have completed; most of them have validated for about 35 granted credits, with another bunch of validations spread between about 80 and 140 credits. At the moment SETI and Einstein are getting 3 CPUs each. It is not necessarily the Earliest Deadline jobs that are being run for SETI, but as all the jobs appear to have been downloaded the same day, it might be the case that it is running them in Earliest Downloaded order.

I will have a look at the event log later and see if anything of interest showed up that might explain why it decided to run the SETI jobs. Perhaps the REC balanced sooner than I estimated.
Saz

Joined: 22 Aug 12
Posts: 5
Credit: 513,176
RAC: 0
Canada
Message 1663981 - Posted: 11 Apr 2015, 5:47:52 UTC - in response to Message 1663403.  

The event logs showed that at the time that the SETI jobs started running, the REC had not yet balanced for SETI and Einstein (SETI was still showing a fair bit higher), and although the calculated priorities for both projects had been narrowing, Einstein continued to be more negative. There was nothing at all obvious based upon REC and priority and work queues as to why BOINC decided to start running the SETI jobs (and kept running through them until they were all complete.) All that I can find is that it appears that the SETI jobs started running when all Einstein jobs with earlier deadlines had finished.

Einstein tasks usually have much tighter deadlines than SETI v7 7.00 tasks.

At the moment only Einstein tasks are in my queue. Eventually some SETI tasks will flow in; those will have much later deadlines than any of the Einstein tasks currently in the queue. If the deadline scenario turns out to be accurate, then those SETI jobs will not start and the Einstein jobs will keep running. As those Einstein tasks complete and report, the work queue filler will ask Einstein for more tasks to top up the emptying queue. For a period of several days, each time that happens and Einstein provides new work, the shorter deadlines on the new Einstein tasks would, under this scenario, get them scheduled ahead of the SETI tasks. Perhaps eventually the REC will balance or EDF will be triggered. But if not, then under this scenario Einstein would keep control of the CPU job queue until the top-up tasks finally started having deadlines later than some of the SETI CPU tasks.

Once all the earlier-deadline tasks have been worked through, one of the SETI tasks eventually becomes the task with the earliest deadline, and somehow that opens the stuck gateway and allows the SETI tasks to be scheduled normally alongside Einstein tasks, with roughly half of the CPUs effectively devoted to each of the two projects. That lasts until all the SETI tasks get cleared, leaving Einstein still going. The equality mode could potentially continue indefinitely, I guess, but SETI is more prone to periods without available work units, even if only due to the regular weekly maintenance, so the SETI queue is more likely to empty completely. That leads us back to the Einstein-in-control state of the scenario.

Note that at this time I have no explanation within the design documents as to how this locking could occur; maybe REC comes into it. The above scenario is a model consistent with what has been observed on my system; I might not be seeing the whole of the proverbial elephant.
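
To make the first half of that scenario concrete, here is a toy sketch of it. This is NOT BOINC's scheduler: it ignores REC, priorities and resource shares entirely, uses a single CPU that finishes one task per day, and the 14-day and 50-day deadlines and queue sizes are invented for illustration. It only shows how "always run the earliest deadline, and top up Einstein as tasks finish" would keep SETI waiting.

```python
import heapq


def simulate(days=60, einstein_queue=3, seti_tasks=2,
             einstein_deadline=14, seti_deadline=50):
    """Toy one-CPU model: each day, run the queued task with the earliest deadline.

    All parameters are hypothetical illustration values, not BOINC settings.
    Returns a list of (day, project) pairs, one per task run.
    """
    # Heap entries are (deadline_day, project). On equal deadlines the
    # tuple comparison falls back to the name, so "Einstein" wins ties.
    queue = [(einstein_deadline, "Einstein") for _ in range(einstein_queue)]
    queue += [(seti_deadline, "SETI") for _ in range(seti_tasks)]
    heapq.heapify(queue)

    history = []
    for day in range(days):
        if not queue:
            break
        _deadline, project = heapq.heappop(queue)
        history.append((day, project))
        if project == "Einstein":
            # Top-up: each finished Einstein task is replaced by a new one
            # whose deadline counts from the day it is downloaded.
            heapq.heappush(queue, (day + 1 + einstein_deadline, "Einstein"))
    return history


history = simulate()
first_seti_day = next(day for day, project in history if project == "SETI")
print("First SETI task starts on day", first_seti_day)
```

In this toy model the SETI tasks sit untouched until the top-up Einstein tasks finally carry deadlines later than the SETI ones, which is the locking behaviour described above. It says nothing about why the real client would behave this way.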
