Tests of new scheduler features.
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0
Argh, Jeff moved the old logs offline during the reorg. I'll need to go find them.
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0
> Sorry for not warning you. I needed to reset stats and cancel previous results. Your versions should be random again until you get back to 10 results per version.

I was clearing down the run on host 63280, and was left with just two tasks to report when the server closed for maintenance at approx 15:30 UTC. I haven't even fetched any new tasks, let alone reported any except those two, since then. But according to Application details for host 63280, we're practically up to non-random timings on all three apps already. I'm assuming that the 347 invalid tasks in All tasks for computer 63280 - all except five of which were 'Validation pending' seven hours ago - are slowly being semi-validated as 'work in progress' tasks are reported by wingmates. Is that a helpful leg-up for your next stats-gathering run, or might you need to nuke the database even more completely between runs?
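For my own reference, this is roughly how I picture the behaviour being tested - just a sketch with invented names, data structures and threshold, not the actual scheduler code:

```python
# Rough sketch (invented names, not the real scheduler code) of the behaviour
# described above: versions get handed out more or less at random until each
# has enough completed results, then the fastest measured one wins.
import random

MIN_RESULTS = 10   # results per version before its timing is trusted (per this thread)

def choose_version(versions):
    """versions: list of dicts like {'plan_class': 'cuda50',
    'completed': 7, 'apr_gflops': 18.3} - structure assumed for illustration."""
    if any(v["completed"] < MIN_RESULTS for v in versions):
        return random.choice(versions)                       # still gathering timing samples
    return max(versions, key=lambda v: v["apr_gflops"])      # APRs usable: pick the fastest

if __name__ == "__main__":
    print(choose_version([
        {"plan_class": "cuda22", "completed": 12, "apr_gflops": 18.3},
        {"plan_class": "cuda50", "completed": 12, "apr_gflops": 55.0},
    ]))
```

If that picture is wrong, please correct me.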
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0
So far it appears that things will settle out pretty quickly. We're approaching the "100 normal results" for all the GPU setiathomes and the Windows CPU is there already. My major worry is that all the available versions won't go out to some hosts, that some hosts might continue to get the wrong versions even after they've done a couple hundred, and that credit granting might go haywire. I expect to see the usual "15 second deadline" problems, but they should go away quickly.
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0
I'll leave that host on autopilot (other projects - not SETI Beta) overnight, and then try another semi-managed run tomorrow when I'm around to keep an eye on it. See what happens then.
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
And is no fix possible? Some hardwired defaults until mean values are formed, at least...
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
My main ATi host received the full possible mix of tasks: CAL + OpenCL for AP, and HD5 and non-HD5 for MB. So at least it has a chance to form relative speed comparisons. EDIT: and what is not good: the initial estimate for CAL AP is 4 minutes shorter (!) than for OpenCL AP, but it's well known that the CAL version is much slower... EDIT2: and the APR relation between HD5 and non-HD5 for that host is currently wrong.
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
> Argh, Jeff moved the old logs offline during the reorg. I'll need to go find them.

No need to look at the old logs - look at the new ones. That host has already asked for work and received 97 cuda22 tasks. Again, only cuda22... http://setiweb.ssl.berkeley.edu/beta/host_app_versions.php?hostid=63368 Such a number of the slowest possible tasks makes me worry about the sanity of the work fetch algorithm, at least in the initial phase.
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0
I haven't figured out where this one comes from yet, and apparently nobody on the BOINC team cares all that much because it's a temporary problem. I guess I can understand why. On my list of current server problems it's #5 in priority.
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0
I'll check them out first thing in the morning. My priorities right now are to make Angela go to bed, and then go to bed myself.
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0
Oh, and I should point out that I purposely increased the "cal_ati" computation speed estimates in order to check that things eventually get back to normal. The way things were, nobody with OpenCL was getting the cal_ati application and that was preventing me from testing the server logic.
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0
> ... try another semi-managed run tomorrow when I'm around to keep an eye on it. See what happens then.

Opened up for work fetch - first new work since the project reset on Tuesday 14 May.

The good news - got cuda50 at the first attempt (the correct choice).

The bad news - estimates are screwy. Application details for host 63280 showed:

Number of tasks completed 29 (from wingmates reporting against cancelled pending tasks)

But the app_version came out with <flops>474687872384.139160</flops> - three times the APR speed, with a commensurate under-estimate of runtimes.
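A back-of-envelope check of why a triple <flops> figure gives these short estimates. The division below is only my reading of how the estimate is formed (per-task flop count divided by the app_version flops), and the per-task flop count is inferred from the ~7:30 actual runtime rather than read from the workunit:

```python
# If the runtime estimate is roughly rsc_fpops_est / flops, a 3x-too-high
# <flops> gives a 3x-too-short estimate (and a similarly inflated credit claim).
# The per-task flop count is inferred from the ~7:30 actual runtime at APR speed,
# not taken from the workunit itself.
bogus_flops = 474687872384.0        # the value that came down in <flops>
apr_flops   = bogus_flops / 3.0     # roughly what the measured APR implies
fpops_est   = 450 * apr_flops       # ~7:30 of work at APR speed

print("estimate at APR speed : %4.1f min" % (fpops_est / apr_flops / 60))    # ~7.5
print("estimate at 3x APR    : %4.1f min" % (fpops_est / bogus_flops / 60))  # ~2.5
```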
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0
> <flops>474687872384.139160</flops>

The first of those triple-estimated WUs has validated (5313350) ... and been awarded triple credit.
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
I foresee biased APRs when only a few tasks per GPU are used. For example, on my ATi host CAL and non-HD5 are currently paired. Obviously non-HD5 gets a nice boost in speed, its elapsed time drops, and its APR will rise. If the CAL task finishes before an HD5 task comes into play, the APR of the faster app can end up lower than the APR of the slower one. But currently I have no solution for how to avoid this. GPU downclocking provides another source of bias in APR, and possibly wrong app allocations... All this can only be solved by sending the slower app from time to time, to probe whether the best app was chosen correctly (a toy sketch of what I mean is below)... but such probes can be expensive performance-wise on properly configured hosts... So, some AI required ;) (for active users there will be nothing better than good old app_info with a carefully chosen app and its params :) )
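The kind of probing I mean, as a toy sketch only - the names, data structure and the 5% probe rate are all invented, this is not a proposal for the real scheduler code:

```python
# Toy sketch of occasional probing: mostly send the version with the best
# measured APR, but every so often send one of the others so its APR stays
# fresh. Everything here (names, structure, rate) is invented for illustration.
import random

PROBE_RATE = 0.05   # how often to deliberately send a non-best version

def pick_app_version(versions):
    """versions: list of dicts like {'plan_class': 'cuda50', 'apr_gflops': 55.0}"""
    best = max(versions, key=lambda v: v["apr_gflops"])
    others = [v for v in versions if v is not best]
    if others and random.random() < PROBE_RATE:
        return random.choice(others)   # refresh the APR of a "slower" version
    return best
```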
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0
I found out why 63368 is only getting cuda22. It's reporting 360MB of available GPU RAM, but cuda23-cuda50 require 384MB.

```
2013-05-14 00:40:23.2153 [PID=7483 ] [version] [AV#370] Skipping CPU version - user prefs say no CPU
2013-05-14 00:40:23.2153 [PID=7483 ] [version] Checking plan class 'cuda22'
2013-05-14 00:40:23.2158 [PID=7483 ] [version] reading plan classes from file '../plan_class_spec.xml'
2013-05-14 00:40:23.2159 [PID=7483 ] [version] plan_class_spec: host_flops: 2.749583e+09, scale: 1.00, projected_flops: 2.391303e+10, peak_flops: 2.455174e+10
2013-05-14 00:40:23.2159 [PID=7483 ] [quota] [AV#364] scaled max jobs per day: 33
2013-05-14 00:40:23.2159 [PID=7483 ] [version] [AV#364] (cuda22) adjusting projected flops based on PFC avg: 27.72G
2013-05-14 00:40:23.2159 [PID=7483 ] [version] Best app version is now AV364 (18.34 GFLOP)
2013-05-14 00:40:23.2159 [PID=7483 ] [version] Checking plan class 'cuda23'
2013-05-14 00:40:23.2159 [PID=7483 ] [version] plan_class_spec: GPU RAM required min: 402653184.000000, supplied: 377028608.000000
2013-05-14 00:40:23.2159 [PID=7483 ] [version] [AV#365] app_plan() returned false
2013-05-14 00:40:23.2159 [PID=7483 ] [version] Checking plan class 'cuda32'
2013-05-14 00:40:23.2159 [PID=7483 ] [version] plan_class_spec: GPU RAM required min: 402653184.000000, supplied: 377028608.000000
2013-05-14 00:40:23.2159 [PID=7483 ] [version] [AV#366] app_plan() returned false
2013-05-14 00:40:23.2160 [PID=7483 ] [version] Checking plan class 'cuda42'
2013-05-14 00:40:23.2160 [PID=7483 ] [version] plan_class_spec: CUDA version required min: 4020, supplied: 3020
2013-05-14 00:40:23.2160 [PID=7483 ] [version] [AV#367] app_plan() returned false
2013-05-14 00:40:23.2160 [PID=7483 ] [version] Checking plan class 'cuda50'
2013-05-14 00:40:23.2160 [PID=7483 ] [version] plan_class_spec: CUDA version required min: 5000, supplied: 3020
2013-05-14 00:40:23.2160 [PID=7483 ] [version] [AV#368] app_plan() returned false
```

It's possible we've overestimated the memory requirements of the cuda apps. Anyone have a better estimate of what is required?
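For anyone converting the two numbers in the plan_class_spec lines from bytes to MiB, to make the 384 vs 360 comparison explicit:

```python
# The minimum required and the supplied amounts from the log above, in MiB.
required = 402653184    # "GPU RAM required min"
supplied = 377028608    # "supplied" by the host

print("required: %.1f MiB" % (required / 2.0**20))   # 384.0 MiB
print("supplied: %.1f MiB" % (supplied / 2.0**20))   # 359.6 MiB - the "360MB" the host reports
```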
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0
> The bad news - estimates are screwy.

I'm not surprised at this. Hopefully the estimates trend toward better values quickly. The problem is that any time I change the initial estimates, it makes all the measured APR values incorrect by the same amount and screws up the credit calculation for workunits in progress and the APR corrections that result from those calculations (hence the reason I need to cancel all workunits in progress when I do this). I think I've learned enough that I can pick reasonable initial values when we move to the main project.
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0
> The bad news - estimates are screwy.

It's beginning to shift. Latest estimates are 3:00, instead of 2:30 at the time I posted - but still short of the actual 7:30 for the current configuration.
Joined: 14 Feb 13 Posts: 606 Credit: 588,843 RAC: 0
> I found out why 63368 is only getting cuda22. It's reporting 360MB of available GPU RAM, but cuda23-cuda50 require 384MB.

Less than that, actually - it's a bit difficult. All x41zc varieties from 2.2 to 5.0 will squeeze into a 256MiB card with 237 MiB available and run without problems (e.g. host 62763). I don't know if BOINC correctly reports free VRAM back to the scheduler for use. What does it say for that host? If there is more memory available, you run more than one at a time, and you have a Fermi or Kepler class card, it will use more memory per instance, and requirements may not be linear with the number of instances either. We can try and check the memory footprint of the different versions on the available alpha tester cards.

A person who won't read has no advantage over one who can't read. (Mark Twain)
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0
I'll drop it to 237 and will check for failures. Host 62763 reports 256MB on the card, but the logs don't show what it's reporting as available. We'll see if it's greater than 237MB next time it contacts the scheduler.
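If any of the testers want to measure what the tasks actually use, something along these lines should work on an NVIDIA host with a driver new enough to report per-process memory - the nvidia-smi query fields here are from memory, so treat it as a starting point rather than a recipe:

```python
# Rough sketch: list per-process GPU memory use while BOINC tasks are running.
# Requires nvidia-smi and a driver/card combination that supports per-process
# accounting; the query field names should be double-checked locally.
import subprocess

out = subprocess.check_output(
    ["nvidia-smi",
     "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv,noheader"],
    universal_newlines=True)

for line in out.splitlines():
    print(line)   # e.g. "1234, setiathome_x41zc..., 210 MiB"
```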
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
> I found out why 63368 is only getting cuda22. It's reporting 360MB of available GPU RAM, but cuda23-cuda50 require 384MB.

FYI, the card has 384 MB of dedicated GPU RAM (so I don't know where BOINC got 360 MB from) and is fully capable of doing any CUDA task we have had so far. EDIT: I checked what BOINC reports locally: 360 MB of GPU RAM and 330 MB of free GPU RAM. The reported total RAM amount is definitely wrong.
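For a cross-check of what the CUDA driver itself reports for the card (as opposed to what BOINC shows), something like this should do it if the pycuda package is installed - illustration only, and the driver may itself report less than the physical 384 MB because of reserved memory:

```python
# Quick cross-check of the CUDA driver's view of total and free GPU memory,
# independent of BOINC (needs the pycuda package).
import pycuda.autoinit            # creates a context on the default GPU
import pycuda.driver as cuda

free_b, total_b = cuda.mem_get_info()
print("driver-reported total: %.1f MiB, free: %.1f MiB"
      % (total_b / 2.0**20, free_b / 2.0**20))
```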
Joined: 28 Jan 11 Posts: 619 Credit: 2,580,051 RAC: 0
I had a 4850 card with 512MB of RAM, and only 480 of those MB were available for apps. Dunno if that info is relevant here though...