All CPU tasks not running. Now all are: - "Waiting to run"
Author | Message |
---|---|
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
[Quote] 'Debt' as a concept and technique was removed from BOINC in 2010, as part of the changes that introduced CreditNew. There is still a concept of Resource Share and balance between projects - both for work fetch and for CPU scheduling - but it's now based on REC. Unless you've updated your cc_config.xml file, REC has a half-life of 10 days - which IMHO is too slow. I usually set 1 day. [/Quote] Yes, I knew "debt" was the wrong term, but couldn't think of the proper terminology at 2 AM. I do have the REC half-life set to 1 day, as someone recommended way back when. I do know the cpu_sched_debug output was completely different when I ran it with only cpu tasks onboard, compared with later when the schedulers finally sent out work and I got gpu tasks again. So the confusion that BOINC gets into with multiple gpu projects, and how much cpu support they tie up, still seems to be the crux of the problem. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
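For anyone who wants to script the REC half-life change discussed above, here is a minimal Python sketch. It assumes the setting is the `rec_half_life_days` element inside the `<options>` section of cc_config.xml; the function name and the sample input are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET


def set_rec_half_life(cc_config_xml: str, days: float) -> str:
    """Return cc_config.xml text with <rec_half_life_days> set to `days`.

    Assumption: the option lives under <cc_config><options>.
    """
    root = ET.fromstring(cc_config_xml)
    options = root.find("options")
    if options is None:
        # No <options> section yet, so create one.
        options = ET.SubElement(root, "options")
    node = options.find("rec_half_life_days")
    if node is None:
        node = ET.SubElement(options, "rec_half_life_days")
    node.text = f"{days:g}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical minimal config, not a real one from this thread.
    print(set_rec_half_life("<cc_config><options/></cc_config>", 1))
```

The client would still have to re-read its config files (or be restarted) before a changed value takes effect.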
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
[Quote] I've added some further analysis to #1677. We'll see what happens. [/Quote] Thanks for the analysis added to the bug. I understand a little better now what the debug output means. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Richard, can you tell me what this means? Numbskull 1055 SETI@home 12/11/2018 5:27:29 PM task postponed 600.000000 sec: Waiting to acquire slot directory lock. Another instance may be running. 1056 SETI@home 12/11/2018 5:27:30 PM task postponed 600.000000 sec: Waiting to acquire slot directory lock. Another instance may be running. No Seti gpu tasks are running, just gpu tasks from other projects. These are Seti cpu tasks that for some reason won't finish running. If I exit BOINC and restart, they run for a few seconds but soon shift back into this "waiting to acquire slot" state. No other instance of BOINC is running. This is with an empty Seti cache other than these two cpu tasks that won't finish. These are the only cpu threads being used other than the four cpu threads supporting the four gpu cards. [Edit] I figured it out. There wasn't any slot cleanup on the two slots that contained those cpu tasks that wouldn't compute and exited after 35 seconds. It seems there was a boinc_lockfile in the slot that wasn't removed when the previous task finished up. Once I removed the boinc_lockfile and the postponement message, the tasks started computing properly. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
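The leftover-lockfile check described in the edit above can be sketched in Python. This is only a sketch under assumptions: it assumes slot directories sit under `slots/` in the BOINC data directory, the data directory path is whatever your install uses, and the function name is hypothetical. It only lists files and deletes nothing, since removing a lockfile is only safe when no BOINC client is running.

```python
from pathlib import Path


def find_slot_lockfiles(data_dir):
    """List boinc_lockfile files found in <data_dir>/slots/<n>/.

    Returns an empty list if there is no slots directory.
    """
    slots = Path(data_dir) / "slots"
    if not slots.is_dir():
        return []
    # One level deep only: slots/0/boinc_lockfile, slots/1/boinc_lockfile, ...
    return sorted(p for p in slots.glob("*/boinc_lockfile") if p.is_file())


if __name__ == "__main__":
    # "." is a placeholder; point this at your own BOINC data directory.
    for lock in find_slot_lockfiles("."):
        print("leftover lockfile:", lock)
```

A lockfile in a slot that belongs to a running task is normal, so the listing is only meaningful with the client stopped.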
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Keith, could you have another look at #1677, please? David would like you to upload the core files from that machine, so that he can use the simulator to work out what's happening. [Quote] I've added some further analysis to #1677. We'll see what happens. Thanks for the analysis added to the bug. Understand a little better what the debug output means. [/Quote] |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
[Quote] Keith, could you have another look at #1677, please? David would like you to upload the core files from that machine, so that he can use the simulator to work out what's happening. I've added some further analysis to #1677. We'll see what happens. Thanks for the analysis added to the bug. Understand a little better what the debug output means. [/Quote] Richard, what are the "core" files that need to be uploaded? Are they the four mentioned on the client emulator page? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Hi Richard, I have attempted to create a scenario for the client simulation. I recreated the conditions that put the cpu tasks into "Waiting to run" by adding my <project_max_concurrent>16</project_max_concurrent> statement back into my app_config.xml file. This has reproduced the original condition: no cpu tasks are running. However, I am unable to create the scenario because, after I upload the requested files, the website throws back an error message: "Unable to handle request. You must specify a client_state.xml file." I am positive I am selecting my client_state.xml file. I have attempted the upload twice now, and each time it complains that it has not received the client_state.xml file. So what to do next? OK, I just tried the client_state_bkup.xml file for S&G and it wasn't liked either. Ideas please? [Edit] Richard, I put the client_state.xml onto my Dropbox account. I posted the link in the bug thread. Maybe you can grab it from there. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
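Switching the `<project_max_concurrent>` statement in and out of app_config.xml, as described above, can be sketched as a small Python helper. This is a sketch only: it assumes the element sits directly under the `<app_config>` root, the function name is hypothetical, and the sample input is a made-up minimal file rather than the one from this thread.

```python
import xml.etree.ElementTree as ET


def set_project_max_concurrent(app_config_xml, limit):
    """Return app_config.xml text with <project_max_concurrent> set or removed.

    limit=None removes the element; an integer sets (or adds) it.
    """
    root = ET.fromstring(app_config_xml)
    node = root.find("project_max_concurrent")
    if limit is None:
        if node is not None:
            root.remove(node)
    else:
        if node is None:
            node = ET.SubElement(root, "project_max_concurrent")
        node.text = str(limit)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    with_limit = set_project_max_concurrent("<app_config></app_config>", 16)
    print(with_limit)                                    # limit in place
    print(set_project_max_concurrent(with_limit, None))  # limit removed
```

Keeping both variants of the file around makes it easier to flip between the failing and the working configuration when reproducing the problem.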
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
[Quote] [Edit] Richard I put the client_state.xml onto my Dropbox account. I posted the link in the bug thread. Maybe you can grab it from there. [/Quote] I'll go take a look. I may need to upload all the four named files in a single session - could you ensure all are present in the same place, please? |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OK, will I have to reconfigure for the problem? I would think that client_state.xml is the only dynamic file. Global_preferences and cc_config should be static between the two different configurations. The only real change is to app_config. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OK, I have uploaded all four files to a new Dropbox folder named Seti client simulator files. I will post the new link to the files in the bug thread. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
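Before sharing a folder like this, a quick Python check can confirm that all four files are present in one place. This is a sketch under an assumption: the thread never lists the four names, so the tuple below (client_state.xml, global_prefs.xml, global_prefs_override.xml, cc_config.xml) is a guess at what the emulator page asks for, and the function name is hypothetical.

```python
from pathlib import Path

# Assumed names of the four files the emulator page asks for;
# adjust the tuple if the page lists something different.
SIMULATOR_FILES = (
    "client_state.xml",
    "global_prefs.xml",
    "global_prefs_override.xml",
    "cc_config.xml",
)


def missing_simulator_files(folder, names=SIMULATOR_FILES):
    """Return the expected file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]


if __name__ == "__main__":
    # "." is a placeholder for the folder you plan to share.
    gaps = missing_simulator_files(".")
    print("all files present" if not gaps else f"missing: {gaps}")
```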
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
I got the simulation to run, after a bit of editing. Over to David now. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
[Quote] I got the simulation to run, after a bit of editing. Over to David now. [/Quote] Thanks Richard. I assume the simulator won't allow my file size, hence the whittling? I guess I can unset NNT (No New Tasks) on my other projects now? Or should I keep reducing their task count in the client_state file for a future upload, to try to keep the file size down? The simulator didn't prove anything, I guess, because it can't duplicate my actual running condition with the <project_max_concurrent> statement in play? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Precisely. I don't think we can add anything right now, so you may as well go back to normal production. If David does manage to add any form of <max_concurrent> (which means he'll have to allow us to upload which project it applies to), I can submit the same files again. On the other hand, if he manages to fix the problem (which he sometimes does quite quickly after a simulator run - that's why it's there) our next problem is to build a running client to test. I can do that for Windows, but you'll have to build your own for Linux. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I believe what you mean by "client" is the emulator software? Is there some guide that will tell me what resources are going to be needed to build the emulator client for the Linux platform? Or are you referring to the BOINC client? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
I meant the BOINC client, so you and I can test it in the field. David will (presumably?) test it in the simulator before he even tells us he's fixed it. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OK, thanks for the clarification. So I would have to clone the BOINC repository with git (git clone https://github.com/BOINC/boinc boinc), correct? And then compile the new client with David's fixes? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
And download/compile the supplementary dependencies for components like curl. Gary Roberts of Einstein could probably guide you through it on Linux (and there are plenty of other posts) - Gary updated his rigs with a patch I wrote for him last week. |
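The clone-and-compile sequence discussed above can be laid out as a dry-run Python sketch. Treat it as an outline under assumptions, not the official procedure: the `_autosetup` script and the `--disable-server` / `--disable-manager` configure flags are what I understand the BOINC Unix build to use for a client-only build, the tool list is a guess at the minimum, and development packages such as libcurl still have to be installed separately. All function names are hypothetical.

```python
import shutil
import subprocess

REPO_URL = "https://github.com/BOINC/boinc"  # URL from the post above


def missing_tools(tools=("git", "make", "autoconf", "automake", "pkg-config")):
    """Return the build tools from `tools` that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def build_commands(checkout="boinc"):
    """Return (argv, working_directory) pairs for a client-only build.

    Assumption: _autosetup, configure and make are run from the checkout.
    """
    return [
        (["git", "clone", REPO_URL, checkout], None),
        (["./_autosetup"], checkout),
        (["./configure", "--disable-server", "--disable-manager"], checkout),
        (["make"], checkout),
    ]


def run_build(checkout="boinc", dry_run=True):
    """Print each step; actually execute it only when dry_run is False."""
    for argv, cwd in build_commands(checkout):
        prefix = f"(in {cwd}) " if cwd else ""
        print(prefix + " ".join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    print("missing tools:", missing_tools())
    run_build(dry_run=True)  # prints the steps without running them
```

Running with `dry_run=True` first shows exactly what would be executed, which is handy for a first-time build.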
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
[Quote] And download/compile the supplementary dependencies for components like curl. Gary Roberts of Einstein could probably guide you through it on Linux (and there are plenty of other posts) - Gary updated his rigs with a patch I wrote for him last week. [/Quote] Thanks for the tip on who to ask for help. I'm sure I will need it, since this would be my first time compiling a major program. Hope David can figure out how to add the app_config to the emulator so my problem can be replicated and a patch developed to cure it. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
In case you need it: Building BOINC on Unix |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
[Quote] In case you need it: Building BOINC on Unix [/Quote] Great. Hadn't seen that one yet. Bookmarked. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Keith, now might be a good moment to take a copy of your client_state.xml file. Set 'NNT', report all the tasks you completed during the outage (the server is accepting them), and allow new tasks again. Take the file copy during the server timeout, when the file will be at its smallest. You probably couldn't get any new tasks yet, until the database caches have had time to refill. |
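Taking the file copy suggested above can be done with a short Python helper that writes a time-stamped copy and reports its size. A minimal sketch: the directory paths are placeholders for your own setup and the function name is hypothetical.

```python
import shutil
import time
from pathlib import Path


def snapshot_client_state(data_dir, dest_dir, stamp=None):
    """Copy <data_dir>/client_state.xml to <dest_dir> under a time-stamped name.

    Returns the path of the copy and its size in bytes.
    """
    src = Path(data_dir) / "client_state.xml"
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"client_state-{stamp}.xml"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # copy2 also keeps the file's timestamps
    return dest, dest.stat().st_size


if __name__ == "__main__":
    # Both paths are placeholders; use your BOINC data directory and a backup folder.
    copy_path, size = snapshot_client_state(".", "./client_state_backups")
    print(f"saved {copy_path} ({size} bytes)")
```

Comparing the reported sizes across a few snapshots is a simple way to confirm the copy was taken while the file was at its smallest.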