All CPU tasks not running. Now all are: - "Waiting to run"

Keith Myers, Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1969352 - Posted: 8 Dec 2018, 17:10:35 UTC - in response to Message 1969297.  

'Debt' as a concept and technique was removed from BOINC in 2010, as part of the changes that introduced CreditNew. There is still a concept of Resource Share and balance between projects - both for work fetch and for CPU scheduling - but it's now based on REC. Unless you've updated your cc_config.xml file, REC has a half-life of 10 days - which IMHO is too slow. I usually set 1 day.

A local CPU % setting will prevent CPU over-commitment leading to over-heating by reducing the number of stressful SETI tasks running concurrently. But it will prevent low-stress tasks from other projects running on the 'spare' CPU cores.

I've never found a way of squaring the complex set of circles that Keith has boxed himself in with, but I'll keep thinking about it.

Yes, I knew "debt" was the wrong term, but I couldn't think of the proper terminology at 2 AM. I do have the REC half-life set to 1 day, as someone recommended way back when.
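
To make the half-life point concrete, here is a minimal sketch of plain exponential decay. It only illustrates what a half-life means and is not taken from BOINC's REC code; the function name `decay_weight` is made up for this example.

```python
def decay_weight(age_days: float, half_life_days: float) -> float:
    """Fraction of a past contribution still counted after age_days.

    Generic exponential decay: the weight halves every half_life_days.
    This illustrates the idea of a half-life only; it is not copied
    from the BOINC source.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return 0.5 ** (age_days / half_life_days)


if __name__ == "__main__":
    # Compare the default 10-day half-life with the 1-day setting.
    for half_life in (10.0, 1.0):
        weight = decay_weight(age_days=10.0, half_life_days=half_life)
        print(f"half-life {half_life:g} d: work from 10 days ago weighs {weight:.4f}")
```

Under this simple model, work done 10 days ago still counts for half with a 10-day half-life, but for only about 0.1% with a 1-day half-life, which is why the shorter setting reacts to changes much faster.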

I do know there was a completely different output from cpu_sched_debug when I ran it with only CPU tasks onboard, compared with later, when the schedulers finally sent out work and I got GPU tasks again. So the confusion that BOINC gets into with multiple GPU projects, and how much CPU support they tie up, still seems to be the crux of the problem.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)

Keith Myers
Message 1969354 - Posted: 8 Dec 2018, 17:16:15 UTC - in response to Message 1969310.  

I've added some further analysis to #1677. We'll see what happens.

Thanks for the analysis added to the bug. I understand a little better now what the debug output means.

Keith Myers
Message 1969939 - Posted: 12 Dec 2018, 1:34:24 UTC
Last modified: 12 Dec 2018, 2:20:19 UTC

Richard, can you tell me what this means?

Numbskull

1055 SETI@home 12/11/2018 5:27:29 PM task postponed 600.000000 sec: Waiting to acquire slot directory lock. Another instance may be running.
1056 SETI@home 12/11/2018 5:27:30 PM task postponed 600.000000 sec: Waiting to acquire slot directory lock. Another instance may be running.


No Seti GPU tasks are running, just GPU tasks from other projects. These are Seti CPU tasks that for some reason won't finish running. If I exit BOINC and restart, they run for a few seconds but soon shift into this "waiting to acquire slot directory lock" state. No other instance of BOINC is running.

This is with an empty Seti cache other than these two cpu tasks that won't finish. These are the only cpu threads being used other than the four cpu threads supporting the four gpu cards.

[Edit] I figured it out. There wasn't any slot cleanup on the two slots that contained those cpu tasks that wouldn't compute and exited after 35 seconds. Seems there was a boinc_lockfile in the slot that wasn't removed when the previous task finished up. Once I removed the boinc_lockfile and the postponement message, the tasks started computing properly.
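
For anyone hitting the same postponement message, the check described in the edit above can be sketched as a small script. It only lists `boinc_lockfile` entries under the `slots` folder of a BOINC data directory and deletes nothing. The function name `find_slot_lockfiles` is hypothetical, and the data directory location is an assumption you must supply yourself, since it differs between installs.

```python
from pathlib import Path


def find_slot_lockfiles(data_dir: str) -> list:
    """Return every slots/<n>/boinc_lockfile found under data_dir.

    data_dir is assumed to be the BOINC data directory; its location
    depends on how BOINC was installed, so it is passed in by the caller.
    """
    slots = Path(data_dir) / "slots"
    if not slots.is_dir():
        return []
    return sorted(slots.glob("*/boinc_lockfile"))


if __name__ == "__main__":
    import sys

    # "." is only a placeholder default, not a known BOINC location.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for lockfile in find_slot_lockfiles(target):
        print(lockfile)
```

A lock file is normal while a task is actually running in that slot, so a listed file is only a candidate. Whether it is stale is still a judgment call, best made with the client stopped.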

Richard Haselgrove, Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1970281 - Posted: 14 Dec 2018, 9:45:30 UTC - in response to Message 1969354.  

Keith, could you have another look at #1677, please? David would like you to upload the core files from that machine, so that he can use the simulator to work out what's happening.

Keith Myers
Message 1970336 - Posted: 14 Dec 2018, 18:10:16 UTC - in response to Message 1970281.  

Richard, what are the "core" files that need to be uploaded? Are they the four mentioned on the client emulator page?

Keith Myers
Message 1970369 - Posted: 14 Dec 2018, 21:14:38 UTC
Last modified: 14 Dec 2018, 21:25:50 UTC

Hi Richard, I have attempted to create a scenario for the client simulation. I restored the conditions that put the CPU tasks into "Waiting to run" by adding my <project_max_concurrent>16</project_max_concurrent> statement back into my app_config.xml file. This has reproduced the original condition: no CPU tasks are running. However, I am unable to create the scenario because, after I upload the requested files, the website throws back an error message.

Unable to handle request
You must specify a client_state.xml file.

I am positive I am selecting my client_state.xml file and I have attempted the upload twice now and it complains it has not received the client_state.xml file.

So what to do next? OK, I just tried the client_state_bkup.xml file for S&G and it wasn't liked either. Ideas, please?

[Edit] Richard I put the client_state.xml onto my Dropbox account. I posted the link in the bug thread. Maybe you can grab it from there.
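
For readers who want to reproduce the configuration mentioned above, here is a minimal sketch that builds the smallest app_config.xml containing only that one setting. The helper name `build_app_config` is hypothetical, and a real app_config.xml will usually contain further entries that are not shown in this thread.

```python
import xml.etree.ElementTree as ET


def build_app_config(project_max_concurrent: int) -> str:
    """Return a minimal app_config.xml body with one project-wide limit."""
    root = ET.Element("app_config")
    limit = ET.SubElement(root, "project_max_concurrent")
    limit.text = str(project_max_concurrent)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 16 is the value quoted in the post above.
    print(build_app_config(16))
```

The file belongs in the project's own folder inside the BOINC data directory, and the client has to re-read its config files (or be restarted) before a change takes effect.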

Richard Haselgrove
Message 1970379 - Posted: 14 Dec 2018, 21:55:55 UTC - in response to Message 1970369.  

I'll go take a look. I may need to upload all the four named files in a single session - could you ensure all are present in the same place, please?

Keith Myers
Message 1970381 - Posted: 14 Dec 2018, 22:11:56 UTC - in response to Message 1970379.  

OK, so I will have to reconfigure to reproduce the problem? I would think client_state.xml is the only dynamic file. Global_preferences and cc_config should be static between the two configurations. The only real change is to app_config.

Keith Myers
Message 1970382 - Posted: 14 Dec 2018, 22:20:37 UTC

OK, I have uploaded all four files to a new Dropbox folder named Seti client simulator files. I will post the new link to the files in the bug thread.

Richard Haselgrove
Message 1970525 - Posted: 15 Dec 2018, 20:27:22 UTC - in response to Message 1970382.  

I got the simulation to run, after a bit of editing. Over to David now.

Keith Myers
Message 1970529 - Posted: 15 Dec 2018, 20:48:16 UTC - in response to Message 1970525.  

Thanks Richard, I assume the simulator won't allow my file size, hence the whittling? I guess I can unset the NNT on my other projects now? Or should I keep reducing their count in the client_state file for a future upload to try to keep the file size down?

I guess the simulator didn't prove anything because it can't duplicate my actual running condition with the <project_max_concurrent> statement in play?

Richard Haselgrove
Message 1970545 - Posted: 15 Dec 2018, 22:05:25 UTC - in response to Message 1970529.  

Precisely. I don't think we can add anything right now, so you may as well go back to normal production. If David does manage to add any form of <max_concurrent> (which means he'll have to allow us to upload which project it applies to), I can submit the same files again.

On the other hand, if he manages to fix the problem (which he sometimes does quite quickly after a simulator run - that's why it's there) our next problem is to build a running client to test. I can do that for Windows, but you'll have to build your own for Linux.

Keith Myers
Message 1970547 - Posted: 15 Dec 2018, 22:28:08 UTC - in response to Message 1970545.  

I believe what you mean by "client" is the emulator software? Is there a guide that will tell me what resources I will need to build the emulator client for the Linux platform?

Or are you referring to the BOINC client?

Richard Haselgrove
Message 1970553 - Posted: 15 Dec 2018, 22:42:18 UTC - in response to Message 1970547.  

I meant the BOINC client, so you and I can test it in the field. David will (presumably?) test it in the simulator before he even tells us he's fixed it.

Keith Myers
Message 1970582 - Posted: 16 Dec 2018, 1:37:38 UTC - in response to Message 1970553.  

OK, thanks for the clarification. So I would have to clone the BOINC repository with git, correct?

git clone https://github.com/BOINC/boinc boinc

And then compile the new client with David's fixes?

Richard Haselgrove
Message 1970637 - Posted: 16 Dec 2018, 8:35:01 UTC - in response to Message 1970582.  

And download/compile the supplementary dependencies for components like curl. Gary Roberts of Einstein could probably guide you through it on Linux (and there are plenty of other posts) - Gary updated his rigs with a patch I wrote for him last week.

Keith Myers
Message 1970645 - Posted: 16 Dec 2018, 9:54:19 UTC - in response to Message 1970637.  

Thanks for the tip on who to ask for help. I'm sure I will need it since this would be my first time compiling a major program. Hope David can figure out how to add the app_config to the emulator so my problem can be replicated and a patch developed to cure the problem.

Richard Haselgrove
Message 1970646 - Posted: 16 Dec 2018, 10:03:01 UTC - in response to Message 1970645.  

In case you need it: Building BOINC on Unix

Keith Myers
Message 1970648 - Posted: 16 Dec 2018, 10:15:11 UTC - in response to Message 1970646.  

Great. Hadn't seen that one yet. Bookmarked.

Richard Haselgrove
Message 1970929 - Posted: 18 Dec 2018, 20:02:53 UTC

Keith, now might be a good moment to take a copy of your client_state.xml file.

Set 'NNT', report all the tasks you completed during the outage (the server is accepting them), and allow new tasks again. Take the file copy during the server timeout, when the file will be at its smallest. You probably couldn't get any new tasks yet, until the database caches have had time to refill.
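
Taking the copy can be sketched as a small helper that stores a timestamped snapshot of client_state.xml, so that later copies do not overwrite earlier ones. The function name `snapshot_client_state` and both directory arguments are assumptions for illustration; point them at your own BOINC data directory and a backup folder of your choice.

```python
import shutil
import time
from pathlib import Path


def snapshot_client_state(data_dir: str, dest_dir: str) -> Path:
    """Copy client_state.xml from data_dir into dest_dir with a timestamp."""
    source = Path(data_dir) / "client_state.xml"
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest / f"client_state-{stamp}.xml"
    # copy2 also preserves the file's modification time.
    shutil.copy2(source, target)
    return target
```

The client rewrites this file regularly, so a copy taken while it is idle between scheduler contacts is less likely to catch it mid-update.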


 