How to get one of my computers to ask for CPU work
Author | Message |
---|---|
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
Now that things seem to be getting back to normal, I find that one of my machines is only asking for NVIDIA work, while the other asks for NVIDIA and CPU. The first has 200 WUs now (2 GPUs) while the other has 300 (CPU and 2 GPUs). This has been going on for about 9 hours now. No changes were made to any parameters or files by me since before the shutdown of the past few days, when both machines were getting CPU and GPU work. Looking at the Event Log, I see that at first it DID ask for CPU or CPU and NVIDIA, but it gave up on CPU after only a few tries. Is there any likely explanation for this phenomenon? |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
Click on the Projects Tab & then Properties. Any values for the CPU work fetch deferred for/interval? Grant Darwin NT |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
Click on the Projects Tab & then Properties. Yes - it says CPU work fetch deferral interval 5:20:00 (which makes no sense, since he has none). On the other machine, it has 0:20:00 (which makes sense, since he won't run out in 20 minutes). BTW: Running BOINC 7.0.64 on that machine. But it didn't do this before, so I don't think the version is relevant. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
As of right now both of your machines have 300 tasks. I'm guessing that BOINC sorted this out on its own? SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
Actually it does make sense because it has none. Each time you ask for work and don't get any, the work request backoff increases. Each time you report completed work, it gets reset. So when there is an outage, after about 5-6 requests with no work available, the deferral will be up to 4-5 hours before it asks for work again. If it gets work then, well and good; if not, the backoff starts increasing again with each unsuccessful attempt. Grant Darwin NT |
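The backoff behaviour described above can be sketched roughly as follows. This is an illustrative model only, not the BOINC client's code: the class and method names are hypothetical, and the 10-minute starting value, the doubling factor and the 24-hour cap are assumptions, chosen because doubling from 10 minutes reproduces the 0:20:00 and 5:20:00 deferral intervals reported in this thread.

```python
from dataclasses import dataclass

MIN_BACKOFF_S = 10 * 60        # assumed starting interval (10 minutes)
MAX_BACKOFF_S = 24 * 60 * 60   # assumed upper limit (24 hours)


@dataclass
class ResourceBackoff:
    """Hypothetical per-resource (CPU or GPU) work fetch backoff."""
    interval_s: int = 0

    def request_failed(self) -> int:
        """A work request returned nothing: lengthen the deferral."""
        if self.interval_s == 0:
            self.interval_s = MIN_BACKOFF_S
        else:
            self.interval_s = min(self.interval_s * 2, MAX_BACKOFF_S)
        return self.interval_s

    def work_completed(self) -> None:
        """Completed work clears the backoff."""
        self.interval_s = 0


def as_hms(seconds: int) -> str:
    """Format seconds the way the Properties dialog shows intervals."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


backoff = ResourceBackoff()
# Six unsuccessful requests in a row, as during an outage.
history = [as_hms(backoff.request_failed()) for _ in range(6)]
print(history)
```

Under these assumptions the sixth failed request gives a deferral of 5:20:00, and the second gives 0:20:00.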
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
Reduce your cache size - ask for less work. Your machines both have the maximum allowed number of workunits in progress. If you request new work without at the same time returning completed work, you'll get nothing. If you do return work, you'll just get enough to replace those returns - and the chances are they'll be GPU tasks. Enable work fetch debug - just one cycle will do - and note the value here: 12/04/2015 09:13:52 | | [work_fetch] --- state for NVIDIA GPU --- That's the number of seconds before your GPUs need to request more work. Make your total cache size smaller than that number. When you next finish GPU work, click update: you will request CPU work only (GPU not needed) and you're running again. |
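The comparison suggested above is simple arithmetic: the value in the debug log is in seconds, while the cache settings are in days. A minimal sketch, assuming the total cache is the sum of the two preference settings (the minimum buffer and the 'additional work' setting); the function and parameter names are hypothetical, and the 43,200-second figure is invented for illustration.

```python
SECONDS_PER_DAY = 86_400


def max_total_cache_days(gpu_seconds_before_refetch: float) -> float:
    """Convert the logged number of seconds into days."""
    return gpu_seconds_before_refetch / SECONDS_PER_DAY


def cache_is_small_enough(min_days: float, additional_days: float,
                          gpu_seconds_before_refetch: float) -> bool:
    """True if the total cache (assumed: min + additional) is below the GPU figure."""
    total_cache_s = (min_days + additional_days) * SECONDS_PER_DAY
    return total_cache_s < gpu_seconds_before_refetch


# Illustrative value only: suppose the log reports 43,200 seconds (half a day).
print(max_total_cache_days(43_200))
print(cache_is_small_enough(0.25, 0.0, 43_200))
print(cache_is_small_enough(1.0, 0.5, 43_200))
```

With a cache below the GPU figure, the client has no reason to ask for GPU work, so the next request should be for CPU work only.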
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
Thanks, Richard. Next time I get in that fix I will try your solution. As others above noted, BOINC did fix itself eventually (after the 5:20 expired). I have been ignoring cache size settings since the project started limiting the number of WUs I could have in my queues. The cache limitations became meaningless for my crunchers (or so I thought). I do have a problem with the interval getting so long. In my case, I had no CPU work and the servers certainly had plenty, so the excessive time delay hurt me with no benefit to the project. Perhaps this topic needs more analysis. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
Backoffs only really come into play in situations like this: recovery after an outage or work shortage. Once a host has an initial loading, and so long as work continues to be available reasonably consistently (doesn't have to be on every request), the important backoffs are cleared every time a task finishes. Backoffs are most visible (and people get most irritated by them!) during recovery phases. But, from the point of view of BOINC and the projects it supports, that's probably when they are most needed to divide the limited amount of available work evenly amongst the population of crunchers. |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
Richard: turns out that if I do a MANUAL Update, the NVIDIA work fetch deferral interval gets reset to 10 minutes. Unfortunately, I7-3820-PC is in this state, and since he gets no GPU WUs ("No work available"), he gets the larger and larger deferral interval. He does get occasional CPU work, and KeplerBox, my other machine, is getting both. This really SUCKS. BOINC is screwed up, in my estimation. I understand the 5 minute interval when he asks for more work, but stretching it out when the system is running and generating work for the GPU is just stupid, even if my particular machine happens not to be getting any. It should only be stretched when there is no work being generated, since then the non-asking makes sense. |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Richard: turns out that if I do a MANUAL Update, the NVIDIA work fetch deferral interval gets reset to 10 minutes. Unfortunately, I7-3820-PC is in this state, and since he gets no GPU WUs ("No work available"), he gets the larger and larger deferral interval. He does get occasional CPU work, and KeplerBox, my other machine, is getting both. If you must sit on old BOINC versions, BOINC 7.0.64 in this case, you'll get that. If you update to the current recommended BOINC, 7.4.42 at this time, you'll get this useful changeset: http://boinc.berkeley.edu/gitweb/?p=boinc-v2.git;a=commit;h=789637f637753c4e06f7ca58ce2de285d1491cc8 client: request work from backed-off resources if doing RPC anyway Claggy |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
client: request work from backed-off resources if doing RPC anyway Maybe because I stick to old versions modified by myself, I don't recognise that. Could you tell me what that means please ? Is it in English ? [Edit:] ahhh perhaps it's the semantically challenged version of: client: request work for backed-off resources when doing RPC One more successful humpty-dumptyism demystification, check. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
client: request work from backed-off resources if doing RPC anyway Yes, the logic is: If you didn't get work for a given resource last time, slow down the requests. If the project hasn't got an application for your GPU yet, it isn't worth hammering the server every 10 seconds to find out if the programmer has finished writing it yet. But if your CPU is ready for another task anyway, it doesn't cost anything to tag on a GPU request at the same time. That's the general BOINC client picture, which of course is not SETI-specific: other projects are available. Here, applications are available for most hardware types, so the reason for non-allocation of work is usually different. In the OP's case, he'd used up his maximum allocation of 200 tasks on his GPU queue, so SETI was refusing to send CPU tasks. That triggered a backoff: and the backoffs for 'quota reached' or 'feeder empty' are exactly the same as the backoffs for 'programmer hasn't finished writing yet'. I did suggest (many years ago) that the backoff algorithm should take account of the reason for non-allocation, but I know I wouldn't want to design such a function myself. The significant point for this thread is that once you manage to get hold of some work (difficult with the current server gremlins), any backoff caused by failure to receive work when requested is cleared each time you complete any of the tasks you've already got. So, if you keep the cache low, and the work requests "little and often" (which means a low or zero 'additional work' setting), you stand a far better chance of continuous running. |
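The "tag on a request while contacting the server anyway" logic can be sketched as below. This is a simplified illustration of the idea in the changeset title, not the actual client code: the `Resource` type, the `resources_to_request` function and the `piggyback` switch are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical view of one resource type on a host."""
    name: str
    needs_work: bool
    backoff_until: float  # time (seconds) before which the resource is backed off


def resources_to_request(resources, now, rpc_planned_anyway, piggyback=True):
    """Return the names of resources to request work for in the next RPC."""
    wanting = [r for r in resources if r.needs_work]
    not_backed_off = [r for r in wanting if now >= r.backoff_until]
    # The server is contacted if an RPC is already planned (for example to
    # report results) or if some resource is free to ask for work.
    contacting_server = rpc_planned_anyway or bool(not_backed_off)
    if piggyback and contacting_server:
        # Newer behaviour: backed-off resources ride along with the request.
        return [r.name for r in wanting]
    # Older behaviour: backed-off resources stay silent until the timer expires.
    return [r.name for r in not_backed_off]


# CPU is ready for another task; the GPU is still backed off.
host = [Resource("CPU", True, 0.0), Resource("NVIDIA", True, 1000.0)]
old_behaviour = resources_to_request(host, now=100.0,
                                     rpc_planned_anyway=False, piggyback=False)
new_behaviour = resources_to_request(host, now=100.0,
                                     rpc_planned_anyway=False, piggyback=True)
print(old_behaviour, new_behaviour)
```

In this model the older behaviour asks for CPU work only, while the newer behaviour adds the backed-off GPU to the same request at no extra cost to the server.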
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Can't say I'm 100% happy with that changeset. On my i7-2600K/GTX760/HD7770 host, when I get work I tend to get it for one vendor's device only. That is, the GTX760 can finish an MBv7 shortie in 5 minutes or so; that resets the backoff for the NV device and allows ATI/AMD work to be asked for too, and then I get ATI/AMD work first and no NV work. It can get very one-sided when trying for APv7 only, because the NV device doesn't always get a chance to ask on its own. The workaround I use is to lower the cache level to below the amount the ATI/AMD device already has. Claggy |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
RICHARD: "In the OP's case, he'd used up his maximum allocation of 200 tasks on his GPU queue, so SETI was refusing to send CPU tasks. That triggered a backoff." But why was SETI refusing to ASK FOR (not send) CPU work in that instance? I had no CPU work at all. What does the status of my GPU work have to do with CPU work??? (and vice versa, I might add). |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
I thought we'd covered that. It had asked, been refused, and gone into backoff because of the refusal. You mentioned the backoff: that's the only way a (resource) backoff is allowed to accrue. |
bluestar Send message Joined: 5 Sep 12 Posts: 7033 Credit: 2,084,789 RAC: 3 |
The Preferences page needs an update. |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
The Preferences page needs an update. The Preference has Just had an update. Claggy |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
RICHARD: I am questioning that policy. Since the project went to the 5-minute minimum between allowed requests, I contend there is no need for the backoff when the project is producing work; I was refused NOT because I was at my max WUs onboard but because of an artifact in the way the project queues work - NOT MY FAULT, WHY SHOULD I SUFFER? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
As Claggy said, you would not have 'suffered' (it's a pretty minor sort of suffering, in my view) if you had been running a more recent version of BOINC. People like Claggy and me (and some, but too few, others) pay attention to how BOINC works, and try to get changes made when we see undesirable side-effects from policies which make sense in other parts of the BOINC community. |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
The Preferences page needs an update. That should have been: The Preference pages have Just had an update, note the new layout: Computing preferences Claggy |