All CPU tasks not running. Now all are: - "Waiting to run"

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1970959 - Posted: 18 Dec 2018, 22:01:07 UTC - in response to Message 1970929.  

Yes. Earlier, when I didn't have any tasks but was unable to report, the client_state was over 3MB. Now, after reporting, the client_state is only 1.1MB, so I am going to attempt the upload again.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Profile Keith Myers Special Project $250 donor
Message 1970961 - Posted: 18 Dec 2018, 22:05:43 UTC

But without any Seti tasks, I can't replicate the "waiting to run" state on the cpu tasks, so I don't think my current state is valid for the simulator.
Profile Keith Myers Special Project $250 donor
Message 1970967 - Posted: 18 Dec 2018, 23:16:35 UTC

And of course when I do get tasks, I get nothing but gpu tasks first. I'm afraid that after I fill up my gpu cache I will already be close to the 2MB limit. Getting my 100 cpu tasks will probably put me over the limit. I have to have cpu tasks onboard to trigger the waiting to run.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1970969 - Posted: 18 Dec 2018, 23:29:28 UTC - in response to Message 1970967.  

Indeed. Well, it was worth a try.

The BOINC server went offline today. It's conceivable (but guesswork) that a reboot was needed to implement the upload size limit increase that failed yesterday. Worth a re-test.
Profile Keith Myers Special Project $250 donor
Message 1970984 - Posted: 19 Dec 2018, 0:23:47 UTC - in response to Message 1970969.  

I was finally successful in getting cpu work. I put my troublesome configuration in play and restarted BOINC, and none of the onboard cpu tasks will run. Just the 4 gpu tasks are running. I enabled cpu_sched_debug and got the message about 4 out of 24 cpu threads running. I had knocked out 100 gpu tasks by then with NNT (No New Tasks) set, and my client_state was finally below the cutoff point.
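For anyone wanting to repeat the cpu_sched_debug step on a real client, here is a minimal sketch that generates a cc_config.xml with that log flag switched on. The helper name build_cc_config is made up for illustration, and pairing it with rr_simulation is just an assumption based on the options discussed in this thread; check the BOINC client configuration documentation before relying on it.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text that switches on the given BOINC log flags.

    build_cc_config is a hypothetical helper, not part of BOINC itself.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each log flag is its own element; the text 1 means "on".
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Flags taken from the discussion in this thread.
    print(build_cc_config(["cpu_sched_debug", "rr_simulation"]))
```

The output would be saved as cc_config.xml in the BOINC data directory and picked up when the client re-reads its config files.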

I uploaded the files and started a simulation, but I didn't know which options I was supposed to choose. I think I chose the cpu_sched and round_robin simulation options; I don't know if that was correct. Another thing: after the files were uploaded, it only acknowledged the original 3 default files as input files. It didn't list my app_config file, but it was uploaded. I used the default simulation duration, which I see is one day. I don't know if I could have shortened it so the simulation would finish sooner.

I don't know how to interpret the output of the simulation, but it seemed to me that it didn't show any cpu tasks running. Richard, could you look at simulation scenario #163 and see if you can make sense of it?
https://boinc.berkeley.edu/sim_web.php?action=show_scenario&name=163
Richard Haselgrove Project Donor
Message 1971038 - Posted: 19 Dec 2018, 11:36:20 UTC - in response to Message 1970984.  
Last modified: 19 Dec 2018, 11:47:33 UTC

Holding response - I'll look at the scenario later.

You'll have noted David's "Oops! I forgot to restart Apache" comment. He's on the case, but this is clearly still a work in progress. We may have to go through more iterations yet.

I think you were right to select 'cpu_sched' from the options. 'round_robin simulation' was probably less necessary, but won't do any harm - maybe just makes the log file a bit harder to read.

OK, having said 'later' - I couldn't resist a quick peek. The timeline is interesting.

It starts with four GPU jobs running, and no CPU jobs. That's exactly the problem we're trying to address, so David has the evidence he wanted.

But even after the GPU tasks run out, the CPU tasks (and they are present in client_state.xml) still don't run. That's different from, and worse than, the problem I originally saw and reported two years ago.
Profile Keith Myers Special Project $250 donor
Message 1971059 - Posted: 19 Dec 2018, 17:37:40 UTC - in response to Message 1971038.  

The difference between cpu tasks "waiting to run" and the state that the client_state captured, where no cpu tasks started to run after being downloaded, is that cpu tasks are the last to fill on that host after the outage. So I had my full 400 task gpu cache downloaded and running, and I had already set the two conditions necessary to cause the cpu tasks to not run, by the time the cpu cache was finally filled. They never even started.

I removed the <project_max_concurrent> statement, reduced my cpu % to 68 in Local Preferences, and re-read the config files. My desired 12 cpu tasks then began to run.
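A rough worked example of how 68% could lead to 12 cpu tasks on this host. It rests on two assumptions that are not confirmed anywhere in this thread: that the client truncates the percentage of the 24 threads to a whole number of CPUs, and that each of the 4 running gpu tasks reserves one full CPU. The function names are made up for illustration.

```python
def usable_cpus(threads: int, cpu_percent: float) -> int:
    """CPUs the client may use, assuming the percentage is truncated (assumption)."""
    return max(1, int(threads * cpu_percent / 100))


def cpu_task_slots(threads: int, cpu_percent: float,
                   gpu_tasks: int, cpu_per_gpu_task: float = 1.0) -> int:
    """CPU-only tasks that fit once the gpu tasks have reserved their CPUs.

    cpu_per_gpu_task = 1.0 is an assumed reservation, not a value from the thread.
    """
    budget = usable_cpus(threads, cpu_percent)
    return max(0, int(budget - gpu_tasks * cpu_per_gpu_task))


if __name__ == "__main__":
    print(usable_cpus(24, 68))        # 16
    print(cpu_task_slots(24, 68, 4))  # 12
```

Under those assumptions, 68% of 24 threads gives 16 usable CPUs, and 16 minus the 4 reserved for gpu tasks leaves 12.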
Richard Haselgrove Project Donor
Message 1971066 - Posted: 19 Dec 2018, 18:12:28 UTC - in response to Message 1971059.  

But in real life, you wouldn't have run in the "problematic" state until all 400 GPU tasks had completed and reported, and with NNT set throughout.

That's what the simulator seems to have done. My assumption (from manual testing two years ago) was that as soon as fewer than 16 GPU tasks were in the 'runnable' state, the queue would start to be topped off with CPU tasks, and we would see those running, until, once the last GPU task had finished, there would be 16 CPU tasks active. Not ideal, but it would have shed more light on the cause of the problem. Instead, it's back to the drawing-board or the think-tank.
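That expectation can be written down as a toy model. This is only an illustration of the assumption described above, not of what the BOINC scheduler actually does; the function name and the reading of 16 as the concurrency limit are assumptions.

```python
def expected_cpu_running(gpu_runnable: int, limit: int = 16) -> int:
    """CPU tasks expected to run if runnable GPU tasks are counted against the limit first.

    Toy model only: limit = 16 is assumed to be the concurrency limit in play.
    """
    return max(0, limit - gpu_runnable)


if __name__ == "__main__":
    # Walk the GPU queue down from a full cache to empty.
    for gpu_left in (400, 16, 10, 0):
        print(gpu_left, "GPU runnable ->", expected_cpu_running(gpu_left), "CPU running")
```

With a full cache of 400 GPU tasks the model gives no CPU tasks at all; they only start to appear once fewer than 16 GPU tasks remain, reaching 16 when the GPU queue is empty.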
Profile Keith Myers Special Project $250 donor
Message 1971068 - Posted: 19 Dec 2018, 18:28:56 UTC

You mentioned that <max_concurrent> can't be used with the same project name on different apps. If that were possible with a code change, would it allow me to set individual task counts for cpu and gpu, and obviate the need for <project_max_concurrent>? Is that a trivial solution to ask for, or one that is difficult and opens a different can of worms?
Richard Haselgrove Project Donor
Message 1971079 - Posted: 19 Dec 2018, 18:59:08 UTC - in response to Message 1971068.  

For any project, there are three levels to consider.

Project
Application
App_version

Here at SETI, the applications are SETI@home v8 and AstroPulse v7 - just the two of them. Nothing else. CPU/GPU are split at the App_version level.

If you look at the manual for app_config.xml files, you see that <max_concurrent> can be set (currently) at the Project and Application levels only - which doesn't help your current predicament. It would be perfectly reasonable to ask David to extend and complete the app_config set by adding <max_concurrent> to the <app_version> level. That would be a feature request rather than a bug fix, but in the end it might be the best way of solving this problem.
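To make the two currently supported levels concrete, here is a minimal sketch that generates an app_config.xml with a project-wide limit and per-application limits. The application short names (setiathome_v8, astropulse_v7) and the numeric limits are assumptions chosen for illustration, and build_app_config is a made-up helper. Note there is no per-app_version limit in the output, which is exactly the gap described above.

```python
import xml.etree.ElementTree as ET


def build_app_config(project_limit, app_limits):
    """Return app_config.xml text with project- and application-level limits.

    build_app_config is a hypothetical helper; names and limits are illustrative.
    """
    root = ET.Element("app_config")
    for app_name, limit in app_limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        # Application level: covers every app_version (CPU and GPU) of this app.
        ET.SubElement(app, "max_concurrent").text = str(limit)
    # Project level: covers all applications of the project together.
    ET.SubElement(root, "project_max_concurrent").text = str(project_limit)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed short names and example limits, not values from this thread.
    print(build_app_config(16, {"setiathome_v8": 16, "astropulse_v7": 2}))
```

Because CPU and GPU versions share the same application name, neither level can express "12 cpu tasks and 4 gpu tasks" on its own.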
Profile Keith Myers Special Project $250 donor
Message 1971099 - Posted: 19 Dec 2018, 20:22:24 UTC - in response to Message 1971079.  

So which venue is the 'best' place to ask for a new 'feature' request from David?
Richard Haselgrove Project Donor
Message 1971106 - Posted: 19 Dec 2018, 20:47:32 UTC - in response to Message 1971099.  

I find it best to ask for just one thing at a time. I'm saving up the next one (that I mentioned at Einstein) until this one is done.

If it turns out to take a long time - especially, if he tries but fails with an attempted solution - that might be the moment to suggest it in the thread where he's responding now. Otherwise, a new 'issue' on Github (because everyone else sees it there).

I'll sleep on it.
Profile Keith Myers Special Project $250 donor
Message 1971107 - Posted: 19 Dec 2018, 20:52:09 UTC - in response to Message 1971106.  

I know either you or Jord answered me before but I can't find the post now. Where is the applications directory on Github for the stock Seti apps? I am trying to help someone with an ATI card. I wanted to point them at the codebase for the stock OpenCL Linux SoG MB app. I will definitely bookmark it this time.
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1971110 - Posted: 19 Dec 2018, 21:18:30 UTC - in response to Message 1971107.  

Not on Github, but on SVN, available via https://setiathome.berkeley.edu/sah_porting.php
Profile Keith Myers Special Project $250 donor
Message 1971111 - Posted: 19 Dec 2018, 21:29:51 UTC - in response to Message 1971110.  

Thanks Jord. That is a better link than what I had attempted to figure out. Bookmarked this time for sure.
Profile Keith Myers Special Project $250 donor
Message 1971112 - Posted: 19 Dec 2018, 21:35:10 UTC
Last modified: 19 Dec 2018, 21:38:40 UTC

Hi again Jord. Can you tell me where the download directory for the stock science apps is located? The person I am helping says that BOINC never sends him a MB app automatically and he doesn't believe one exists for ATI cards. I know that is not the case from the Applications page on the website.
Richard Haselgrove Project Donor
Message 1971121 - Posted: 19 Dec 2018, 23:10:27 UTC - in response to Message 1971112.  

The download directory for pre-compiled binaries is http://boinc2.ssl.berkeley.edu/sah/download_fanout/ - but don't go there.

The listing of files in the directory is hidden from public view. If you know exactly which file you need, you can add the name and it will be downloaded - but you have to get the filename exactly right.
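As a small illustration of "add the name", the sketch below only builds the URL; it does not fetch anything. The filename used is a placeholder, not a real file on the server, and download_url is a made-up helper.

```python
from urllib.parse import quote, urljoin

# Base directory quoted in this post; the listing itself is hidden.
DOWNLOAD_BASE = "http://boinc2.ssl.berkeley.edu/sah/download_fanout/"


def download_url(filename: str) -> str:
    """Append an exact filename to the download directory URL."""
    return urljoin(DOWNLOAD_BASE, quote(filename))


if __name__ == "__main__":
    # "example_app_binary" is a hypothetical placeholder name.
    print(download_url("example_app_binary"))
```

If the name is even slightly wrong, the server has nothing to match it against, which is why the exact filename matters.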

And that's not the end of the story. Some applications require additional supporting library files. Under Windows, you can identify what's needed by using Dependency Walker: I'm sure other operating systems have similar tools. And even once you have ALL the required files, you still have to build an app_info.xml file to turn them into a working cruncher. It can be done, and some of us have done it, but it's not easy.

Start, at the very least, by telling us what operating system your friend is using - we might guess some flavour of Linux from the Q&A area, but certainty is better.

On the thread subject, have you seen David's request for a fresh upload?
Profile Keith Myers Special Project $250 donor
Message 1971123 - Posted: 19 Dec 2018, 23:28:39 UTC - in response to Message 1971121.  
Last modified: 19 Dec 2018, 23:54:25 UTC

Thanks Richard,

The thread I am trying to help in is https://setiathome.berkeley.edu/forum_thread.php?id=83690#1971120 over in Number Crunching.

He doesn't want to unhide his computers which makes things difficult. He did post what his system is identified as:

Sun 09 Dec 2018 06:19:05 PM PST | | OpenCL: AMD/ATI GPU 0: AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.27.0, 4.19.6-300.fc29.x86_64, LLVM 7.0.0) (driver version 18.2.6, device version OpenCL 1.1 Mesa 18.2.6, 8192MB, 8192MB available, 3709 GFLOPS peak)

He is stating he has never received a MB GPU app from Seti in 10 years. He has only received an AP app and it errored out. He states his system runs fine on other OpenCL GPU projects like Einstein.

I wanted to point him at the science app directory for a direct download. I also gave him some links and also a post from a couple of NC members that have successfully built their own ATI MB gpu science app for the latest ATI ROCm drivers which he seems to be using.

On my problem: yes, I saw David's response. First question: what was the error in the simulation? And what was truncated in the client_state file? I am still having trouble understanding what the simulation output files mean.

I uploaded another set of files and started another simulation with just cpu_sched_debug and without rr_simulation. The client_state uploaded fine with a file size over 2.1MB so that issue is resolved it seems.
Profile Jord
Message 1971191 - Posted: 20 Dec 2018, 12:47:03 UTC - in response to Message 1971123.  

The Seti applications cannot use the Mesa OpenCL drivers, as they're not built against that API. Mesa uses something different in its drivers than standard OpenCL. Astropulse also has considerable difficulty running on the RX GPUs, under any platform.
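A quick way to spot this case from the BOINC startup log is to look at the "device version" field of the OpenCL line. The sketch below is an assumption about how one might check it, tested only against the log line quoted earlier in this thread; the function names are made up and the regular expression may not cover every log format.

```python
import re


def opencl_device_version(log_line: str):
    """Extract the text after 'device version' from a BOINC OpenCL log line, or None."""
    match = re.search(r"device version ([^,)]+)", log_line)
    return match.group(1).strip() if match else None


def uses_mesa_opencl(log_line: str) -> bool:
    """True when the reported OpenCL device version mentions Mesa."""
    version = opencl_device_version(log_line)
    return version is not None and "mesa" in version.lower()


if __name__ == "__main__":
    # Log line as posted earlier in the thread (timestamp prefix omitted).
    line = ("OpenCL: AMD/ATI GPU 0: AMD Radeon (TM) RX 480 Graphics "
            "(POLARIS10, DRM 3.27.0, 4.19.6-300.fc29.x86_64, LLVM 7.0.0) "
            "(driver version 18.2.6, device version OpenCL 1.1 Mesa 18.2.6, "
            "8192MB, 8192MB available, 3709 GFLOPS peak)")
    print(opencl_device_version(line))
    print(uses_mesa_opencl(line))
```

For the RX 480 line above, the device version reads "OpenCL 1.1 Mesa 18.2.6", so the check flags it as a Mesa driver.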
Profile Keith Myers Special Project $250 donor
Message 1971212 - Posted: 20 Dec 2018, 16:14:03 UTC - in response to Message 1971191.  

Thanks Jord, yes he and I now know you can't use Mesa drivers for Seti and he is out of luck. So we are dropping the issue in his thread. I told him to move anything further to the Unix/Linux forum.