Message boards : Number crunching : Why not in order of date?
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874
Rob, Keith - we're going a bit off-topic here, but thanks for volunteering to try to investigate. Before you get too far into it, please compare notes on exactly what version of BOINC you're each using - and I mean EXACTLY. And please set the <sched_op_debug> Event Log flag, because...

Here's a log from my new Linux host. By kind permission of the spoofer, I've been allowed to test the spoofed client: this one is set up for two real cards, 16 cards total.

31/08/2019 17:34:36 | SETI@home | Requesting new tasks for NVIDIA GPU
31/08/2019 17:34:36 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
31/08/2019 17:34:36 | SETI@home | [sched_op] NVIDIA GPU work request: 28.35 seconds; 0.00 devices
31/08/2019 17:34:39 | SETI@home | Scheduler request completed: got 16 new tasks
31/08/2019 17:34:39 | SETI@home | [sched_op] estimated total CPU task duration: 0 seconds
31/08/2019 17:34:39 | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 1737 seconds

The machine is currently estimating around 110 seconds per task, so a 28-second request should have got precisely one task. I think that the 1,737-second estimate is because of the 16-task allocation, and precisely 16 tasks have been allocated because of the 16-GPU spoofing. Discuss?
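To put rough numbers on that reading, here is a quick sketch using only the figures quoted in the log above; the "one task per reported GPU" interpretation is the hypothesis under discussion, not confirmed scheduler behaviour:

```python
import math

requested_seconds = 28.35     # NVIDIA GPU work request in the client log above
est_seconds_per_task = 110    # current per-task estimate on this host
reported_gpus = 16            # 2 real cards, spoofed so the client reports 16

# Filled purely by requested seconds, one task would have been enough:
print(math.ceil(requested_seconds / est_seconds_per_task))    # -> 1 task

# Filled at one task per reported GPU instead:
print(reported_gpus * est_seconds_per_task)                   # -> 1760 s, close to the
                                                              #    1737 s total estimate
```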
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874
> A question before I do the actual test - roughly how long does a "typical" Einstein job take to run on a CPU?

And were they assigned to be run on a CPU? From Keith's previous post:

> Einstein Gamma Ray Pulsar tasks on the first connection. An Einstein task on my host crunches in between 465-585 seconds depending on the card.

Einstein Gamma Ray Pulsar tasks have both CPU and GPU applications available. Noting Keith's use of the word 'card' at the end, and the expected runtime, I'd suggest that these tasks were slated to be run on GPUs - hence my request for sched_op_debug output, so we see the numbers and the devices 'in the raw'.
rob smith · Joined: 7 Mar 03 · Posts: 22829 · Credit: 416,307,556 · RAC: 380
Ignore my question - I found a rough answer of three hours from some old notes of mine....

OK, with a very similar set of settings to Keith (apart from using a secondary cache of 0.001) I've just received 21 tasks. Given that the estimated run-time for these tasks is typically a bit under two hours, that's about 40 hours of total CPU time, which equates to about 13.5 hours of work in hand. Which is a lot more than 0.01 days (which is about 15 minutes, and would round up to three tasks, one for each of the CPU cores in use).

Actually the situation is even worse than that described above: a pile of the tasks are estimating at about 12 hours.....

Second and third data requests have just gone through, getting 17 and 5 tasks respectively.

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
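A back-of-the-envelope check of those figures (a sketch using only the numbers quoted above; the ~2 hour estimate and the three CPU cores in use are as described in the post):

```python
# Sanity check of the over-fetch described above, using the post's own figures.
tasks_received = 21
est_hours_per_task = 1.9          # "typically a bit under two hours"
cpu_cores_in_use = 3              # three CPU cores in use

total_cpu_hours = tasks_received * est_hours_per_task      # ~40 hours of CPU time
hours_in_hand = total_cpu_hours / cpu_cores_in_use         # ~13 hours of work in hand

cache_hours = 0.01 * 24                                    # 0.01 days is ~15 minutes
print(total_cpu_hours, hours_in_hand, cache_hours)         # ~39.9, ~13.3, 0.24
```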
rob smith · Joined: 7 Mar 03 · Posts: 22829 · Credit: 416,307,556 · RAC: 380
Sorry, I missed your question, Richard - I'm using stock 7.14.2 on a Windows 7 machine.

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874
It would also help if, during these tests, you grabbed a sample of the Einstein server log (available through the 'last contact' link on the host page on the Einstein web site), so we can match your client Event Log figures (what we assume it asked for) with the same request as seen by the server. Einstein is useful because it's the only project which routinely makes this data available.
rob smith · Joined: 7 Mar 03 · Posts: 22829 · Credit: 416,307,556 · RAC: 380
Having missed Richard's post, I forced an update which, as far as I can see, correctly identified that no more work was needed:

31/08/2019 18:58:18 | Einstein@Home | update requested by user
31/08/2019 18:58:20 | Einstein@Home | sched RPC pending: Requested by user
31/08/2019 18:58:20 | Einstein@Home | [sched_op] Starting scheduler request
31/08/2019 18:58:20 | Einstein@Home | Sending scheduler request: Requested by user.
31/08/2019 18:58:20 | Einstein@Home | Not requesting tasks: don't need (job cache full)
31/08/2019 18:58:20 | Einstein@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
31/08/2019 18:58:21 | Einstein@Home | Scheduler request completed
31/08/2019 18:58:21 | Einstein@Home | [sched_op] Server version 611
31/08/2019 18:58:21 | Einstein@Home | Project requested delay of 60 seconds
31/08/2019 18:58:21 | Einstein@Home | [sched_op] Deferring communication for 00:01:00
31/08/2019 18:58:21 | Einstein@Home | [sched_op] Reason: requested by project

It looks as though I'm going to have a long wait for an "automagic" update - the tasks are about 0.5% done in about 20 minutes..... I'll increase the cache a bit and see what happens.....

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
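For context, a minimal sketch of the sort of comparison behind the "don't need (job cache full)" line above. This is a simplification - the real BOINC work-fetch logic uses a runtime simulation and per-resource shortfalls - so it only illustrates why the request was for 0.00 seconds:

```python
# Simplified illustration only, not the actual BOINC client algorithm.
def cpu_seconds_to_request(work_on_hand_secs, min_days, extra_days, ncpus):
    buffer_secs = (min_days + extra_days) * 86400 * ncpus
    return max(0.0, buffer_secs - work_on_hand_secs)

# With many hours of estimated work already queued across 3 cores and a tiny
# cache, the shortfall is zero - hence "CPU work request: 0.00 seconds".
print(cpu_seconds_to_request(work_on_hand_secs=13 * 3600 * 3,
                             min_days=0.01, extra_days=0.001, ncpus=3))   # -> 0.0
```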
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874
Now the WOW! event is over, I'll give Einstein a quick poke with the spoofed machine and 16-GPU requests. |
rob smith · Joined: 7 Mar 03 · Posts: 22829 · Credit: 416,307,556 · RAC: 380
Is this what you are looking for, Richard?

2019-08-31 18:05:18.6371 [PID=29271] Request: [USER#xxxxx] [HOST#12788495] [IP xxx.xxx.xxx.154] client 7.14.2
2019-08-31 18:05:18.6973 [PID=29271] [debug] have_master:1 have_working: 1 have_db: 1
2019-08-31 18:05:18.6974 [PID=29271] [debug] using working prefs
2019-08-31 18:05:18.6974 [PID=29271] [debug] have db 1; dbmod 1567274694.000000; global mod 1567272183.000000
2019-08-31 18:05:18.6974 [PID=29271] [debug] sending db prefs in reply
2019-08-31 18:05:18.6974 [PID=29271] [send] effective_ncpus 3 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2019-08-31 18:05:18.6974 [PID=29271] [send] effective_ngpus 0 max_jobs_on_host_gpu 999999
2019-08-31 18:05:18.6974 [PID=29271] [send] Not using matchmaker scheduling; Not using EDF sim
2019-08-31 18:05:18.6974 [PID=29271] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2019-08-31 18:05:18.6974 [PID=29271] [send] work_req_seconds: 0.00 secs
2019-08-31 18:05:18.6974 [PID=29271] [send] available disk 99.64 GB, work_buf_min 864000
2019-08-31 18:05:18.6974 [PID=29271] [send] active_frac 0.999365 on_frac 0.841301 DCF 1.000000
2019-08-31 18:05:18.7033 [PID=29271] Sending reply to [HOST#12788495]: 0 results, delay req 60.00
2019-08-31 18:05:18.7034 [PID=29271] Scheduler ran 0.069 seconds

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874
Yes, that's the right place. Most helpful would be to catch an 'overfetch', such as the complained-about behaviour that started this conversation, and compare/contrast the client log and the server log for the same event. Like this:

First effective work request for the spoofed Linux host, after I got it attached and on the right venue.

Sat 31 Aug 2019 19:28:33 BST | Einstein@Home | Sending scheduler request: To fetch work.
Sat 31 Aug 2019 19:28:33 BST | Einstein@Home | Requesting new tasks for NVIDIA GPU
Sat 31 Aug 2019 19:28:33 BST | Einstein@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Sat 31 Aug 2019 19:28:33 BST | Einstein@Home | [sched_op] NVIDIA GPU work request: 24192.00 seconds; 16.00 devices
Sat 31 Aug 2019 19:28:41 BST | Einstein@Home | Scheduler request completed: got 16 new tasks
Sat 31 Aug 2019 19:28:41 BST | Einstein@Home | [sched_op] estimated total NVIDIA GPU task duration: 37052 seconds

And the server said:

2019-08-31 18:28:33.8522 [PID=6591 ] [send] effective_ncpus 6 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2019-08-31 18:28:33.8522 [PID=6591 ] [send] effective_ngpus 8 max_jobs_on_host_gpu 999999
2019-08-31 18:28:33.8522 [PID=6591 ] [send] Not using matchmaker scheduling; Not using EDF sim
2019-08-31 18:28:33.8522 [PID=6591 ] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2019-08-31 18:28:33.8522 [PID=6591 ] [send] CUDA: req 24192.00 sec, 16.00 instances; est delay 0.00
2019-08-31 18:28:33.8522 [PID=6591 ] [send] work_req_seconds: 0.00 secs
2019-08-31 18:28:33.8522 [PID=6591 ] [send] available disk 99.31 GB, work_buf_min 864

and then it went on to pick out a number of both 'Gamma-ray pulsar binary search' and 'Continuous Gravitational Wave search' tasks. Both task types were given an initial estimate of 38:41 runtime (no host history, no DCF). Looks like the pulsars are going to take about 16:25 (half way through the first one now); GW around 52 minutes, guessing from the 20% mark. I'll bodge preferences to allow Gamma-ray only from now on, so DCF doesn't go all over the place and confuse the issue.

Note that both client and server are consistent in showing this as a 16-GPU request.
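As a rough illustration of the DCF worry mentioned above - treating the duration correction factor loosely as the ratio of actual to estimated runtime, whereas the client actually adjusts it gradually - the two task types pull a single shared DCF in opposite directions:

```python
# Why one shared DCF struggles with two task types of very different runtimes.
initial_estimate_secs = 38 * 60 + 41      # 38:41 initial estimate for both types
pulsar_actual_secs    = 16 * 60 + 25      # ~16:25 observed for Gamma-ray pulsar tasks
gw_actual_secs        = 52 * 60           # ~52 minutes guessed for the GW tasks

print(pulsar_actual_secs / initial_estimate_secs)   # ~0.42 - pulsars drag DCF down
print(gw_actual_secs / initial_estimate_secs)       # ~1.34 - GW tasks pull it back up
```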
rob smith · Joined: 7 Mar 03 · Posts: 22829 · Credit: 416,307,556 · RAC: 380
There could be some wait for that (3.3% in 1 hour).

A passing comment - SETI reports server version 709, Beta 715 and Einstein 611. That might be a complete red herring, but it adds a bit of data to the picture.

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
Well, my last experiment was on one of the temporary closet contest machines, as I had pulled it out to update it after a year of non-use. So I was testing the AIO 7.14.2 stock client. No spoofing - I didn't spoof either of the temp machines.

I tried to set 0.001 for additional cache, but every time I updated, the value never changed. But then I found someone's reply to me (Wiggo??) that the additional days value WILL take 0.0. I tried that and it saved and updated. I don't think the value field can handle so many digits after the decimal point.

Seti@Home classic workunits: 20,676  CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874
> A passing comment - SETI reports server version 709, Beta 715 and Einstein 611. That might be a complete red herring, but it adds a bit of data to the picture.

Einstein consciously and deliberately stopped applying Berkeley patches when CreditNew came out, so their server won't have seen a Berkeley version number in the 7xx range. They've updated it themselves, of course, but not using the version number script.
Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
> 2019-08-31 18:05:18.6974 [PID=29271] [send] available disk 99.64 GB, work_buf_min 864000

What does this reply in Rob's Einstein scheduler log mean? Does that mean it was setting the minimum request at 864,000 seconds?

Seti@Home classic workunits: 20,676  CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
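For what it's worth, the figure is at least consistent with work_buf_min being a work-buffer size converted to seconds - an assumption, not something confirmed in this thread (Richard notes later that the question is still open):

```python
# Assumption: work_buf_min = buffer days * 86400 seconds.
print(864000 / 86400)    # -> 10.0 days for the value in Rob's log quoted above
print(0.01 * 86400)      # -> 864.0, matching "work_buf_min 864" in Richard's
                         #    server log earlier in the thread
```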
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874
I've been wondering where my client got that "NVIDIA GPU work request: 24192.00 seconds" from. I'd deliberately set local preferences to 0.01 + 0.01 days, which equates to 1,728 seconds per device. There was plenty of SETI work on board (getting on for 6 hours' worth), but it was only allowed to run on two of the GPUs - the real ones. (This is the spoofed version of the 7.16.1 client, which should have the benefit of the work that Keith and I did on max_concurrent since v7.14.2.)

So, trying 1,728 seconds * 14 apparently idle resources, I get 24,192 - voila. QED.
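The arithmetic, spelled out with the same numbers as the post above:

```python
# Reproducing the arithmetic from the post above.
cache_days = 0.01 + 0.01               # local preference: minimum + additional days
secs_per_device = cache_days * 86400   # -> 1728.0 seconds per device
idle_devices = 14                      # 16 reported GPUs minus the 2 running SETI work

print(secs_per_device)                 # 1728.0
print(secs_per_device * idle_devices)  # 24192.0 - the NVIDIA GPU work request in the log
```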
Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
Yes, you are correct - I was referring only to GPU work. I only do CPU work for SETI. I had three 1070s in the temp contest machine I tested on. I knew I shouldn't have deleted the logs from that test: I had logs for 0.1 and logs for 0.01 days of primary cache. I even had the resource share ratio at one point at 10,000:1 between Seti and Einstein. It didn't make any difference, and I know that resource share is a long-term average that I never let establish. I can pull the machine out of the closet and run the tests again.

On another note, I just added the beta Continuous GW GPU application and received work along with the normal Gamma Ray Pulsar work. But for some reason I have yet to run one of them. Still only running the Gamma Ray tasks, and yet both types have the same deadline of Sept. 12. Curious how you were able to get your host to run the new GW app tasks.

Seti@Home classic workunits: 20,676  CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
> I've been wondering where my client got that "NVIDIA GPU work request: 24192.00 seconds" from. I'd deliberately set local preferences to 0.01 + 0.01 days, which equates to 1,728 seconds per device. There was plenty of SETI work on board (getting on for 6 hours' worth), but it was only allowed to run on two of the GPUs - the real ones. (This is the spoofed version of the 7.16.1 client, which should have the benefit of the work that Keith and I did on max_concurrent since v7.14.2.)

Are you saying you can run a max_concurrent statement with the spoofed 7.16.1 client? I cannot, and neither can Juan or Ian. On prior versions of the spoofed client, namely 7.14.2 and 7.15.0, it was mandatory to have a max_concurrent statement set to the number of spoofed GPUs PLUS the number of desired CPU tasks running, or the CPUs would go idle.

That is no longer the case with the 7.16.1 client. If you run a max_concurrent statement now, you will run on all possible devices and also not request any CPU work. The only way to restrict resource usage now on the spoofed 7.16.1 client is through the Local Preferences CPU usage % settings.

At least now with this new client, without the max_concurrent statement it will ask for CPU work even when there are no more CPU tasks waiting to run. With the previous spoofed client I always had to revert to an older client that did not have the #3076 commit in it, so I could request CPU work when I ran through all my CPU tasks on Tuesdays.

Seti@Home classic workunits: 20,676  CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874
> On another note, I just added the beta Continuous GW GPU application and received work along with the normal Gamma Ray Pulsar work. But for some reason I have yet to run one of them. Still only running the Gamma Ray tasks, and yet both types have the same deadline of Sept. 12. Curious how you were able to get your host to run the new GW app tasks.

Just the order they happened to arrive in. They arrived in the same work request (so deadlines identical to the second), but of course the FIFO buffer has a sequence, and there happened to be a GW at the top (or - just possibly - it was the first app to download the binary and all other required files*).

I think I'll wind down the research for tonight. Burn off that initial batch of Einstein tasks while I eat, drink, and sleep; let SETI get ahead again on resource share (but a tight cache), and see what else I can think of to test in the morning. Including that 'work_buf_min' question - no answer for that one yet.

* - no, not download order. Einstein has a fast pipe and server: all downloads had completed before the first GPU came free from SETI work, so it picked the first one off the queue.
Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
That makes sense - the FIFO. I already had the Gamma app and had to download the new CGW app, so the regular Gamma Ray tasks came down the pipe first.

Seti@Home classic workunits: 20,676  CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874
> Are you saying you can run a max_concurrent statement with the spoofed 7.16.1 client? I cannot, and neither can Juan or Ian. On prior versions of the spoofed client, namely 7.14.2 and 7.15.0, it was mandatory to have a max_concurrent statement set to the number of spoofed GPUs PLUS the number of desired CPU tasks running, or the CPUs would go idle.

I'm running the 20 August version of the v7.16.1 client - I had a bug in the 2--24 version I tried at first. I have max_concurrent statements for both my GPU project (SETI) and my CPU project (not SETI), and each project fetches and runs the expected amount of work for its respective device.

I don't try to run the same project on multiple devices, and I think I would expect that to fail under spoofing - or at least not to work as well as might be expected. That would be something the spoofer would have to explore: we can't expect David to work on that one!
Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
I am running the 7.16.1 client branch from the 10 August snapshot. I haven't checked the repo in a while to see if any more commits have been added since. Yes, running both CPU and GPU tasks on the same host certainly is a challenge for the client. Every client works a little bit differently when spoofed.

I now have the viewpoint that max_concurrent in its proper use hews more closely to the original definition, the one that David espouses. So using the CPU % usage setting for task control is probably the proper way to achieve my goals now. Each client I have used required slight changes in my configuration to achieve my desired goal. This new 7.16.1 client is easy to get along with, and I think I understand how it interacts with projects fairly well now.

Seti@Home classic workunits: 20,676  CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)