How to get one of my computers to ask for CPU work

Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340	Message 1664329 - Posted: 12 Apr 2015, 2:35:34 UTC Last modified: 12 Apr 2015, 2:42:01 UTC Now that things seem to be getting back to normal, I find that one of my machines is only asking for NVIDIA work, while the other asks for NVIDIA and CPU. The first has 200 WUs now (2 GPUs) while the other has 300 (CPU and 2 GPUs). This has been going on for about 9 hours now. No changes were made to any parameters or files by me since before the shutdown of the past few days, when both machines were getting CPU and GPU work. Looking at the Event Log, I see that at first it DID ask for CPU or CPU and NVIDIA, but it gave up on CPU after only a few tries. Is there any likely explanation for this phenomenon? ID: 1664329 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304	Message 1664331 - Posted: 12 Apr 2015, 2:51:27 UTC - in response to Message 1664329. Click on the Projects Tab & then Properties. Any values for the CPU work fetch deferred for/interval? Grant Darwin NT ID: 1664331 ·

Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340	Message 1664338 - Posted: 12 Apr 2015, 3:08:35 UTC - in response to Message 1664331. Last modified: 12 Apr 2015, 3:18:09 UTC Click on the Projects Tab & then Properties. Any values for the CPU work fetch deferred for/interval? Yes - it says CPU work fetch deferral interval 5:20:00 (which makes no sense, since he has none). On the other machine, it has 0:20:00 (which makes sense, since he won't run out in 20 minutes). BTW: Running BOINC 7.0.64 on that machine. But it didn't do this before, so I don't think the version is relevant. ID: 1664338 ·

HAL9000 Volunteer tester Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57	Message 1664349 - Posted: 12 Apr 2015, 3:52:17 UTC - in response to Message 1664338. Click on the Projects Tab & then Properties. Any values for the CPU work fetch deferred for/interval? Yes - it says CPU work fetch deferral interval 5:20:00 (which makes no sense, since he has none). On the other machine, it has 0:20:00 (which makes sense, since he won't run out in 20 minutes). BTW: Running BOINC 7.0.64 on that machine. But it didn't do this before, so I don't think the version is relevant. As of right now both of your machines have 300 tasks. I'm guessing that BOINC sorted this out on its own? SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu) ID: 1664349 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304	Message 1664352 - Posted: 12 Apr 2015, 4:06:45 UTC - in response to Message 1664338. Click on the Projects Tab & then Properties. Any values for the CPU work fetch deferred for/interval? Yes - it says CPU work fetch deferral interval 5:20:00 (which makes no sense, since he has none) Actually it does make sense because it has none. Each time you ask for work, and don't get any, the work request backoff increases. Each time you report completed work, it gets reset. So when there is an outage, after about 5-6 requests when there is no work available the deferral will be up to 4-5 hours before it asks for work again. If it gets work then, well and good, if not the backoff starts increasing again with each unsuccessful attempt. Grant Darwin NT ID: 1664352 ·
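The backoff behaviour Grant describes can be sketched roughly as follows. This is only an illustration, not BOINC's actual code: the class and function names are hypothetical, and the 10-minute base, plain doubling, and one-day cap are assumptions. Doubling from 10 minutes does happen to reproduce the 0:20:00 and 5:20:00 values reported in this thread, but the real client may add randomization or use different limits.

```python
from dataclasses import dataclass

# Assumed constants, not taken from the BOINC source.
MIN_BACKOFF_S = 10 * 60        # assumed base interval: 10 minutes
MAX_BACKOFF_S = 24 * 60 * 60   # assumed cap: 1 day


@dataclass
class ResourceBackoff:
    """Hypothetical per-resource (CPU, NVIDIA, ...) work fetch backoff."""
    interval_s: int = 0

    def on_request_without_work(self) -> int:
        """A work request came back empty: start or double the deferral."""
        if self.interval_s == 0:
            self.interval_s = MIN_BACKOFF_S
        else:
            self.interval_s = min(self.interval_s * 2, MAX_BACKOFF_S)
        return self.interval_s

    def on_task_completed(self) -> None:
        """Finishing or reporting a task clears the deferral."""
        self.interval_s = 0


def format_hms(seconds: int) -> str:
    """Format seconds the way the Properties dialog shows them (h:mm:ss)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    cpu = ResourceBackoff()
    for attempt in range(1, 7):
        print(attempt, format_hms(cpu.on_request_without_work()))
```

Under these assumptions the sixth consecutive empty reply gives a deferral of 5:20:00, which matches "about 5-6 requests" leading to a multi-hour wait.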

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874	Message 1664433 - Posted: 12 Apr 2015, 8:18:58 UTC Reduce your cache size - ask for less work. Your machines both have the maximum allowed number of workunits in progress. If you request new work without at the same time returning completed work, you'll get nothing. If you do return work, you'll just get enough to replace those returns - and the chances are they'll be GPU tasks. Enable work fetch debug - just one cycle will do - and note the value here:
12/04/2015 09:13:52 | | [work_fetch] --- state for NVIDIA GPU ---
12/04/2015 09:13:52 | | [work_fetch] shortfall 13163.73 nidle 0.00 saturated 7572.27 busy 0.00
That's the number of seconds before your GPUs need to request more work. Make your total cache size smaller than that number. When you next finish GPU work, click update: you will request CPU work only (GPU not needed) and you're running again. ID: 1664433 ·
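For anyone who wants to pull that number out of the Event Log without reading it by eye, here is a small sketch. It assumes the value Richard means is the `saturated` field (the post does not name the field explicitly), and the helper names are made up for illustration. It only parses the log line quoted above and converts seconds to hours; compare the result against your cache settings in whatever unit your preferences page uses.

```python
import re
from typing import Dict

# Matches the four name/value pairs on a [work_fetch] state line.
FIELD_RE = re.compile(r"(shortfall|nidle|saturated|busy)\s+([0-9]+(?:\.[0-9]+)?)")


def parse_work_fetch_state(line: str) -> Dict[str, float]:
    """Return the numeric fields found on one work_fetch state line."""
    return {name: float(value) for name, value in FIELD_RE.findall(line)}


def seconds_to_hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600.0


line = ("12/04/2015 09:13:52 | | [work_fetch] shortfall 13163.73 "
        "nidle 0.00 saturated 7572.27 busy 0.00")
state = parse_work_fetch_state(line)

if __name__ == "__main__":
    # Assumption: "saturated" is the figure the total cache should stay below.
    print(f"saturated: {state['saturated']:.2f} s "
          f"= {seconds_to_hours(state['saturated']):.2f} h")
```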

Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340	Message 1664506 - Posted: 12 Apr 2015, 13:31:25 UTC - in response to Message 1664433. Last modified: 12 Apr 2015, 13:37:06 UTC Thanks, Richard. Next time I get in that fix I will try your solution. As others above noted, BOINC did fix itself eventually (after the 5:20 expired). I have been ignoring cache size settings since the project started limiting the number of WUs I could have in my queues. The cache limitations became meaningless for my crunchers (or so I thought). I do have a problem with the interval getting so long. In my case, I had no CPU and the servers certainly had plenty, so the excessive time delay hurt me with no benefit to the project. Perhaps this topic needs more analysis. ID: 1664506 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874	Message 1664523 - Posted: 12 Apr 2015, 14:19:18 UTC - in response to Message 1664506. Backoffs only really come into play in situations like this, recovery after an outage or work shortage. Once a host has an initial loading, and so long as work continues to be available reasonably consistently (doesn't have to be on every request), the important backoffs are cleared every time a task finishes. Backoffs are most visible (and people get most irritated by them!) during recovery phases. But, from the point of view of BOINC and the projects it supports, that's probably when they are most needed to divide the limited amount of available work evenly amongst the population of crunchers. ID: 1664523 ·

Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340	Message 1667069 - Posted: 19 Apr 2015, 1:44:17 UTC Richard: turns out that if I do a MANUAL Update, the NVIDIA work fetch deferral interval gets reset to 10 minutes. Unfortunately, I7-3820-PC is in this state, and since he gets no GPU WUs ("No work available"), he gets the larger and larger deferral interval. He does get occasional CPU work, and KeplerBox, my other machine, is getting both. This really SUCKS. BOINC is screwed up, in my estimation. I understand the 5 minute interval when he asks for more work, but stretching it out when the system is running and generating work for the GPU is just stupid, even if my particular machine happens not to be getting any. It should only be stretched when there is no work being generated, since then the non-asking makes sense. ID: 1667069 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1667143 - Posted: 19 Apr 2015, 8:32:19 UTC - in response to Message 1667069. Last modified: 19 Apr 2015, 8:32:36 UTC Richard: turns out that if I do a MANUAL Update, the NVIDIA work fetch deferral interval gets reset to 10 minutes. Unfortunately, I7-3820-PC is in this state, and since he gets no GPU WUs ("No work available"), he gets the larger and larger deferral interval. He does get occasional CPU work, and KeplerBox, my other machine, is getting both. This really SUCKS. BOINC is screwed up, in my estimation. I understand the 5 minute interval when he asks for more work, but stretching it out when the system is running and generating work for the GPU is just stupid, even if my particular machine happens not to be getting any. It should only be stretched when there is no work being generated, since then the non-asking makes sense. If you must sit on old BOINC versions (BOINC 7.0.64 in this case), you'll get that. If you update to the current recommended BOINC, 7.4.42 at this time, you'll get this useful changeset: http://boinc.berkeley.edu/gitweb/?p=boinc-v2.git;a=commit;h=789637f637753c4e06f7ca58ce2de285d1491cc8 "client: request work from backed-off resources if doing RPC anyway" Claggy ID: 1667143 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1667159 - Posted: 19 Apr 2015, 10:03:28 UTC - in response to Message 1667143. Last modified: 19 Apr 2015, 10:17:21 UTC client: request work from backed-off resources if doing RPC anyway Maybe because I stick to old versions modified by myself, I don't recognise that. Could you tell me what that means please ? Is it in English ? [Edit:] ahhh perhaps it's the semantically challenged version of: client: request work for backed-off resources when doing RPC One more successful humpty-dumptyism demystification, check. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1667159 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874	Message 1667170 - Posted: 19 Apr 2015, 10:32:30 UTC - in response to Message 1667159. client: request work from backed-off resources if doing RPC anyway Maybe because I stick to old versions modified by myself, I don't recognise that. Could you tell me what that means please ? Is it in English ? [Edit:] ahhh perhaps it's the semantically challenged version of: client: request work for backed-off resources when doing RPC Yes, the logic is: If you didn't get work for a given resource last time, slow down the requests. If the project hasn't got an application for your GPU yet, it isn't worth hammering the server every 10 seconds to find out if the programmer has finished writing it yet. But if your CPU is ready for another task anyway, it doesn't cost anything to tag on a GPU request at the same time. That's the general BOINC client picture, which of course is not SETI-specific: other projects are available. Here, applications are available for most hardware types, so the reason for non-allocation of work is usually different. In the OP's case, he'd used up his maximum allocation of 200 tasks on his GPU queue, so SETI was refusing to send CPU tasks. That triggered a backoff: and the backoffs for 'quota reached' or 'feeder empty' are exactly the same as the backoffs for 'programmer hasn't finished writing yet'. I did suggest (many years ago) that the backoff algorithm should take account of the reason for non-allocation, but I know I wouldn't want to design such a function myself. The significant point for this thread is that once you manage to get hold of some work (difficult with the current server gremlins), any backoff caused by failure to receive work when requested is cleared each time you complete any of the tasks you've already got. 
So, if you keep the cache low, and the work requests "little and often" (which means a low or zero 'additional work' setting), you stand a far better chance of continuous running. ID: 1667170 ·
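The "tag on a request for backed-off resources" logic Richard outlines might look something like the sketch below. This is a guess at the behaviour as described in this thread, not a reading of the changeset itself; the function name, parameters, and data layout are all hypothetical.

```python
from typing import Dict, List


def resources_to_request(
    needs_work: Dict[str, bool],
    backoff_until: Dict[str, float],
    now: float,
    piggyback: bool,
) -> List[str]:
    """Pick the resources to ask for in the next scheduler request.

    needs_work:    resource name -> whether its queue is below the cache level
    backoff_until: resource name -> time (seconds) when its backoff expires
    piggyback:     True models the newer behaviour (ask for backed-off
                   resources too, if an RPC is happening anyway)
    """
    # Resources that want work and are not currently backed off.
    ready = [
        name for name, need in needs_work.items()
        if need and backoff_until.get(name, 0.0) <= now
    ]
    if not ready:
        return []  # nothing justifies contacting the server on its own
    if not piggyback:
        return ready  # older behaviour: backed-off resources stay silent
    # Newer behaviour: since we are contacting the server anyway,
    # also ask for every other resource that wants work.
    return [name for name, need in needs_work.items() if need]


if __name__ == "__main__":
    needs = {"CPU": True, "NVIDIA": True}
    backoff = {"NVIDIA": 19200.0}  # NVIDIA backed off, CPU is not
    print(resources_to_request(needs, backoff, now=100.0, piggyback=False))
    print(resources_to_request(needs, backoff, now=100.0, piggyback=True))
```

Note that in this sketch a host whose resources are all backed off still sends nothing until one backoff expires or is cleared, which is the situation the thread starter ran into on the older client.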

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1667174 - Posted: 19 Apr 2015, 10:47:23 UTC - in response to Message 1667170. Can't say I'm 100% happy with that changeset. On my i7-2600K/GTX760/HD7770 host, when I get work I tend to get it for one vendor's device only. For example, the GTX760 can finish an MBv7 shortie in 5 minutes or so; that resets the backoff for the NV device and allows ATI/AMD work to be asked for too, and then I get ATI/AMD work first and no NV work. It can get very one-sided when trying for APv7 only, because the NV device doesn't always get a chance to ask on its own. The workaround I use is to lower the cache level to below the amount the ATI/AMD device already has. Claggy ID: 1667174 ·

Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340	Message 1667370 - Posted: 19 Apr 2015, 21:41:19 UTC Last modified: 19 Apr 2015, 21:44:49 UTC RICHARD: *In the OP's case, he'd used up his maximum allocation of 200 tasks on his GPU queue, so SETI was refusing to send CPU tasks. That triggered a backoff* But why was SETI refusing to ASK FOR (not send) CPU work in that instance? I had no CPU work at all. What does the status of my GPU work have to do with CPU work??? (and vice versa, I might add). ID: 1667370 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874	Message 1667396 - Posted: 19 Apr 2015, 23:40:43 UTC - in response to Message 1667370. RICHARD: *In the OP's case, he'd used up his maximum allocation of 200 tasks on his GPU queue, so SETI was refusing to send CPU tasks. That triggered a backoff* But why was SETI refusing to ASK FOR (not send) CPU work in that instance? I had no CPU work at all. What does the status of my GPU work have to do with CPU work??? (and vice versa, I might add). I thought we'd covered that. It had asked, been refused, and gone into backoff because of the refusal. You mentioned the backoff: that's the only way a (resource) backoff is allowed to accrue. ID: 1667396 ·

bluestar Send message Joined: 5 Sep 12 Posts: 7033 Credit: 2,084,789 RAC: 3	Message 1667409 - Posted: 20 Apr 2015, 0:39:32 UTC The Preferences page needs an update. ID: 1667409 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1667410 - Posted: 20 Apr 2015, 0:56:02 UTC - in response to Message 1667409. The Preferences page needs an update. The Preference has Just had an update. Claggy ID: 1667410 ·

Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340	Message 1667512 - Posted: 20 Apr 2015, 8:12:15 UTC - in response to Message 1667396. RICHARD: *In the OP's case, he'd used up his maximum allocation of 200 tasks on his GPU queue, so SETI was refusing to send CPU tasks. That triggered a backoff* But why was SETI refusing to ASK FOR (not send) CPU work in that instance? I had no CPU work at all. What does the status of my GPU work have to do with CPU work??? (and vice versa, I might add). I thought we'd covered that. It had asked, been refused, and gone into backoff because of the refusal. You mentioned the backoff: that's the only way a (resource) backoff is allowed to accrue. I am questioning that policy. Since the project went to the 5-minute minimum between allowed requests, I contend there is no need for the backoff when the project is producing work; I was refused NOT because I was at my max WUs onboard but because of an artifact in the way the project queues work - NOT MY FAULT, WHY SHOULD I SUFFER? ID: 1667512 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874	Message 1667513 - Posted: 20 Apr 2015, 8:23:02 UTC - in response to Message 1667512. RICHARD: *In the OP's case, he'd used up his maximum allocation of 200 tasks on his GPU queue, so SETI was refusing to send CPU tasks. That triggered a backoff* But why was SETI refusing to ASK FOR (not send) CPU work in that instance? I had no CPU work at all. What does the status of my GPU work have to do with CPU work??? (and vice versa, I might add). I thought we'd covered that. It had asked, been refused, and gone into backoff because of the refusal. You mentioned the backoff: that's the only way a (resource) backoff is allowed to accrue. I am questioning that policy. Since the project went to the 5-minute minimum between allowed requests, I contend there is no need for the backoff when the project is producing work; I was refused NOT because I was at my max WUs onboard but because of an artifact in the way the project queues work - NOT MY FAULT, WHY SHOULD I SUFFER? As Claggy said, you would not have 'suffered' (it's a pretty minor sort of suffering, in my view) if you had been running a more recent version of BOINC: people like Claggy and I (and some, but too few, others) pay attention to how BOINC works, and try to get changes made when we see undesirable side-effects from policies which make sense in other parts of the BOINC community. ID: 1667513 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1667515 - Posted: 20 Apr 2015, 8:28:26 UTC - in response to Message 1667410. The Preferences page needs an update. The Preference has Just had an update. Claggy That should have been: The Preference pages have Just had an update, note the new layout: Computing preferences Claggy ID: 1667515 ·
