Tests of new scheduler features.

Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 45855 - Posted: 14 May 2013, 21:21:51 UTC - in response to Message 45854.  

Argh, Jeff moved the old logs offline during the reorg. I'll need to go find them.

ID: 45855
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 45856 - Posted: 14 May 2013, 22:35:26 UTC - in response to Message 45854.  

Sorry for not warning you. I needed to reset stats and cancel previous results. Your versions should be random again until you get back to 10 results per version.

I was clearing down the run on host 63280, and was left with just two tasks to report when the server closed for maintenance at approx 15:30 UTC.

I haven't even fetched any new tasks, let alone reported any except those two, since then. But according to Application details for host 63280, we're practically up to non-random timings on all three apps already.

I'm assuming that the 347 invalid tasks in All tasks for computer 63280 - all except five of which were 'Validation pending' seven hours ago - are slowly being semi-validated as 'work in progress' tasks are reported by wingmates.

Is that a helpful leg-up for your next stats-gathering run, or might you need to nuke the database even more completely between runs?
ID: 45856
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 45857 - Posted: 14 May 2013, 22:45:52 UTC - in response to Message 45856.  

So far it appears that things will settle out pretty quickly. We're approaching the "100 normal results" mark for all the GPU setiathome versions, and the Windows CPU is there already.

My major worries are that all the available versions won't go out to some hosts, that some hosts might continue to get the wrong versions even after they've done a couple hundred tasks, and that credit granting might go haywire.

I expect to see the usual "15 second deadline" problems, but they should go away quickly.
ID: 45857
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 45858 - Posted: 14 May 2013, 22:54:09 UTC - in response to Message 45857.  

I'll leave that host on autopilot (other projects - not SETI Beta) overnight, and then try another semi-managed run tomorrow when I'm around to keep an eye on it. See what happens then.
ID: 45858
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 45859 - Posted: 15 May 2013, 3:17:21 UTC - in response to Message 45857.  


I expect to see the usual "15 second deadline" problems, but they should go away quickly.


And is no fix possible? Some hardwired defaults, at least until the mean values are formed...
ID: 45859
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 45860 - Posted: 15 May 2013, 3:23:18 UTC
Last modified: 15 May 2013, 3:31:25 UTC

My main ATi host received the full possible mix of tasks:
CAL + OpenCL for AP, and HD5 and non-HD5 for MB.
So at least it has a chance to form relative speed comparisons.

EDIT: and what is not good: the initial estimate for CAL AP is 4 minutes shorter (!) than for OpenCL AP, but it's well known that the CAL version is much slower...

EDIT2: and the APR relation between HD5 and non-HD5 is currently wrong for that host.
ID: 45860
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 45861 - Posted: 15 May 2013, 3:45:58 UTC - in response to Message 45855.  
Last modified: 15 May 2013, 3:48:20 UTC

Argh, Jeff moved the old logs offline during the reorg. I'll need to go find them.

No need to look at the old logs; look at the new ones.
That host has already asked for work and received 97 cuda22 tasks. Again, only cuda22...

http://setiweb.ssl.berkeley.edu/beta/host_app_versions.php?hostid=63368

Receiving that many of the slowest possible tasks makes me worry about the sanity of the work fetch algorithm, at least in its initial phase.
ID: 45861
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 45862 - Posted: 15 May 2013, 4:59:15 UTC - in response to Message 45859.  

I haven't figured out where this one comes from yet, and apparently nobody on the BOINC team cares all that much because it's a temporary problem. I guess I can understand why. On my list of current server problems it's #5 in priority.


I expect to see the usual "15 second deadline" problems, but they should go away quickly.


And is no fix possible? Some hardwired defaults, at least until the mean values are formed...


ID: 45862
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 45863 - Posted: 15 May 2013, 5:00:32 UTC - in response to Message 45861.  


No need to look at the old logs; look at the new ones.
That host has already asked for work and received 97 cuda22 tasks. Again, only cuda22...

http://setiweb.ssl.berkeley.edu/beta/host_app_versions.php?hostid=63368

Receiving that many of the slowest possible tasks makes me worry about the sanity of the work fetch algorithm, at least in its initial phase.


I'll check them out first thing in the morning. My priorities right now are to make Angela go to bed, and then go to bed myself.
ID: 45863
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 45864 - Posted: 15 May 2013, 5:05:32 UTC - in response to Message 45860.  
Last modified: 15 May 2013, 5:06:04 UTC

Oh, and I should point out that I purposely increased the "cal_ati" computation speed estimates in order to check that things eventually get back to normal. The way things were, nobody with OpenCL was getting the cal_ati application and that was preventing me from testing the server logic.
ID: 45864
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 45869 - Posted: 15 May 2013, 10:17:31 UTC - in response to Message 45858.  

... try another semi-managed run tomorrow when I'm around to keep an eye on it. See what happens then.

Opened up for work fetch - first new work since the project reset Tuesday 14 May.

The good news - got cuda50 at the first attempt (the correct choice).

The bad news - estimates are screwy. Application details for host 63280 showed:

Number of tasks completed 29
Average processing rate 158.29961718573

(from wingmates reporting against cancelled pending tasks)

But the app_version came out with

<flops>474687872384.139160</flops>

- three times the APR speed, with a commensurate under-estimate of runtimes.
ID: 45869
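
A quick sanity check of that factor of three, using the two figures quoted above (APR read as GFLOPS, which is how the application details page reports it):

# Numbers copied from the post above.
flops_estimate = 474_687_872_384.139160   # server-assigned <flops>, in FLOPS
apr_gflops = 158.29961718573              # measured "Average processing rate"

ratio = flops_estimate / (apr_gflops * 1e9)
print(f"estimate / measured = {ratio:.2f}x")   # ~3.00x

# Runtime estimates scale inversely with the flops figure, so a 3x
# overestimate of speed yields runtimes underestimated by the same factor.
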
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 45871 - Posted: 15 May 2013, 13:23:31 UTC - in response to Message 45869.  

<flops>474687872384.139160</flops>

- three times the APR speed, with a commensurate under-estimate of runtimes.

The first of those triple-estimated WUs has validated (5313350) ...

... and been awarded triple credit.
ID: 45871
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 45872 - Posted: 15 May 2013, 15:48:32 UTC

I foresee biased APRs when only a few tasks per GPU are used. For example, on my ATi host CAL and non-HD5 are currently paired. Obviously non-HD5 got a nice boost in speed: its elapsed time dropped, so its APR will rise. If the CAL task finishes before an HD5 task comes into play, its APR will end up lower than the APR for the slower app.
But currently I have no solution for avoiding this. GPU downclocking introduces another bias into APR, and possibly wrong app allocations... All this could be solved only by sending the slower app from time to time, to probe whether the best app was chosen correctly, but such probes can be expensive performance-wise on properly configured hosts...
So, some AI is required ;) (for active users there will be nothing better than good old app_info with a carefully chosen app and its params :) )
ID: 45872
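
The "send the slower app from time to time to probe" idea is essentially an explore/exploit scheme. A minimal sketch of one possible form (the app names, APR values, and the 5% probe rate are all illustrative; this is not something the BOINC scheduler actually implements):

import random

# Hypothetical per-host APR table (GFLOPS); values are made up.
apr = {"cal_ati": 25.0, "opencl_ati_nonhd5": 60.0, "opencl_ati_hd5": 80.0}

PROBE_RATE = 0.05  # fraction of assignments spent re-measuring other versions

def pick_app_version():
    # Usually send the fastest-known version; occasionally probe another,
    # so a stale or downclock-biased APR can be corrected over time.
    if random.random() < PROBE_RATE:
        return random.choice(list(apr))   # exploration: re-measure
    return max(apr, key=apr.get)          # exploitation: best known

print(pick_app_version())

The cost worried about above is exactly the exploration term: every probe knowingly runs a slower version.
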
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 45873 - Posted: 15 May 2013, 17:04:53 UTC - in response to Message 45863.  
Last modified: 15 May 2013, 17:11:00 UTC

I found out why 63368 is only getting cuda22. It's reporting 360MB of available GPU RAM, but cuda23-cuda50 require 384MB.

2013-05-14 00:40:23.2153 [PID=7483 ]    [version] [AV#370] Skipping CPU version - user prefs say no CPU
2013-05-14 00:40:23.2153 [PID=7483 ]    [version] Checking plan class 'cuda22'
2013-05-14 00:40:23.2158 [PID=7483 ]    [version] reading plan classes from file '../plan_class_spec.xml'
2013-05-14 00:40:23.2159 [PID=7483 ]    [version] plan_class_spec: host_flops: 2.749583e+09,    scale: 1.00,    projected_flops: 2.391303e+10,  peak_flops: 2.455174e+10
2013-05-14 00:40:23.2159 [PID=7483 ]    [quota] [AV#364] scaled max jobs per day: 33
2013-05-14 00:40:23.2159 [PID=7483 ]    [version] [AV#364] (cuda22) adjusting projected flops based on PFC avg: 27.72G
2013-05-14 00:40:23.2159 [PID=7483 ]    [version] Best app version is now AV364 (18.34 GFLOP)
2013-05-14 00:40:23.2159 [PID=7483 ]    [version] Checking plan class 'cuda23'
2013-05-14 00:40:23.2159 [PID=7483 ]    [version] plan_class_spec: GPU RAM required min: 402653184.000000, supplied: 377028608.000000
2013-05-14 00:40:23.2159 [PID=7483 ]    [version] [AV#365] app_plan() returned false
2013-05-14 00:40:23.2159 [PID=7483 ]    [version] Checking plan class 'cuda32'
2013-05-14 00:40:23.2159 [PID=7483 ]    [version] plan_class_spec: GPU RAM required min: 402653184.000000, supplied: 377028608.000000
2013-05-14 00:40:23.2159 [PID=7483 ]    [version] [AV#366] app_plan() returned false
2013-05-14 00:40:23.2160 [PID=7483 ]    [version] Checking plan class 'cuda42'
2013-05-14 00:40:23.2160 [PID=7483 ]    [version] plan_class_spec: CUDA version required min: 4020, supplied: 3020
2013-05-14 00:40:23.2160 [PID=7483 ]    [version] [AV#367] app_plan() returned false
2013-05-14 00:40:23.2160 [PID=7483 ]    [version] Checking plan class 'cuda50'
2013-05-14 00:40:23.2160 [PID=7483 ]    [version] plan_class_spec: CUDA version required min: 5000, supplied: 3020
2013-05-14 00:40:23.2160 [PID=7483 ]    [version] [AV#368] app_plan() returned false


It's possible we've overestimated the memory requirements of the cuda apps. Anyone have a better estimate of what is required?
ID: 45873
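
The gating in that log is straightforward to mirror. A toy re-implementation of the checks shown (the 384 MiB requirement and the cuda42/cuda50 CUDA minimums come from the log; the CUDA thresholds for cuda22/23/32 are guesses for illustration):

# Requirements per plan class: bytes of GPU RAM and minimum CUDA version.
# 402653184 bytes = 384 MiB, as in the log.
PLAN_CLASSES = {
    "cuda22": {"min_gpu_ram": 0,         "min_cuda": 2020},  # assumed
    "cuda23": {"min_gpu_ram": 402653184, "min_cuda": 2030},  # assumed
    "cuda32": {"min_gpu_ram": 402653184, "min_cuda": 3020},  # assumed
    "cuda42": {"min_gpu_ram": 402653184, "min_cuda": 4020},  # from the log
    "cuda50": {"min_gpu_ram": 402653184, "min_cuda": 5000},  # from the log
}

# What host 63368 supplied, per the log: ~359.6 MiB and CUDA 3.2.
host = {"gpu_ram": 377028608, "cuda": 3020}

for name, req in PLAN_CLASSES.items():
    ok = (host["gpu_ram"] >= req["min_gpu_ram"]
          and host["cuda"] >= req["min_cuda"])
    print(name, "eligible" if ok else "app_plan() returned false")
# Only cuda22 passes, which is why the host receives nothing but cuda22.
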
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 45874 - Posted: 15 May 2013, 17:24:23 UTC - in response to Message 45869.  

The bad news - estimates are screwy.


I'm not surprised at this. Hopefully the estimates trend toward better values quickly. The problem is that any time I change the initial estimates, it makes all the measured APR values incorrect by the same amount, and it screws up both the credit calculation for workunits in process and the APR corrections that result from those calculations (hence the reason I need to cancel all workunits in process when I do this). I think I've learned enough that I can set reasonable initial values when we move to the main project.
ID: 45874
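
A worked example of that coupling, with a made-up workunit size chosen so the numbers line up with the 2:30-vs-7:30 runtimes reported above (a sketch, not CreditNew's actual bookkeeping):

wu_fpops = 7.1e13            # hypothetical workunit size, in FLOPs
apr = 158.3e9                # measured rate for the host (FLOPS)
flops_sent = 3 * apr         # the inflated <flops> the client was given

est_runtime = wu_fpops / flops_sent   # ~150 s  (~2:30)
true_runtime = wu_fpops / apr         # ~449 s  (~7:30)
print(f"estimated {est_runtime/60:.1f} min vs actual {true_runtime/60:.1f} min")

# Claims normalized against the inflated baseline come out ~3x too high
# (the "triple credit" seen earlier), which is why in-flight workunits
# get cancelled whenever the initial estimates change.
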
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 45875 - Posted: 15 May 2013, 17:40:14 UTC - in response to Message 45874.  

The bad news - estimates are screwy.

I'm not surprised at this. Hopefully the estimates trend toward better values quickly. The problem is that any time I change the initial estimates, it makes all the measured APR values incorrect by the same amount, and it screws up both the credit calculation for workunits in process and the APR corrections that result from those calculations (hence the reason I need to cancel all workunits in process when I do this). I think I've learned enough that I can set reasonable initial values when we move to the main project.

It's beginning to shift. Latest estimates are 3:00, instead of 2:30 at the time I posted - but still short of the actual 7:30 for the current configuration.
ID: 45875
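
The slow shift is what you'd expect when the estimate is a running average over completed results. A toy model (not BOINC's exact update rule; the prior weight of 10 merely echoes the "10 results per version" threshold mentioned earlier):

PRIOR_WEIGHT = 10      # seeded results the average starts with (assumed)
est = 474.7            # GFLOPS: the 3x-too-fast initial figure
n = PRIOR_WEIGHT

true_rate = 158.3      # GFLOPS: what the host actually sustains
for _ in range(20):    # twenty more validated results...
    n += 1
    est += (true_rate - est) / n
print(f"estimate after 20 results: {est:.1f} GFLOPS")  # ~264: still far off

# Early wrong-scale samples stay in the mean, so estimates creep rather
# than jump toward reality - hence 2:30 -> 3:00 instead of straight to 7:30.
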
William
Volunteer tester
Joined: 14 Feb 13
Posts: 606
Credit: 588,843
RAC: 0
Message 45876 - Posted: 15 May 2013, 17:56:11 UTC - in response to Message 45873.  

I found out why 63368 is only getting cuda22. It's reporting 360MB of available GPU RAM, but cuda23-cuda50 require 384MB.

It's possible we've overestimated the memory requirements of the cuda apps. Anyone have a better estimate of what is required?

'less'

It's a bit difficult.
All x41zc varieties from 2.2 to 5.0 will squeeze into a 256MiB card with 237 MiB available and run without problems (e.g. host 62763). I don't know if BOINC correctly reports free VRAM back to the scheduler. What does it say for that host?
If more memory is available, you run more than one task at a time, and you have a Fermi or Kepler class card, it will use more memory per instance, and the requirements may not be linear with the number of instances either.

We can try and check the memory footprint of the different versions on available alpha tester cards.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 45876
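
For gathering those footprints, the free/total figures a card reports can be read directly. A sketch using pycuda (an assumption; any CUDA binding that exposes cudaMemGetInfo would do):

import pycuda.driver as cuda

cuda.init()
ctx = cuda.Device(0).make_context()   # device 0; adjust on multi-GPU hosts
try:
    free_b, total_b = cuda.mem_get_info()   # wraps cudaMemGetInfo()
    print(f"free: {free_b / 2**20:.0f} MiB, total: {total_b / 2**20:.0f} MiB")
finally:
    ctx.pop()   # release the context we created

Running one instance, then two, and diffing the free figure would give the per-instance footprint suggested above.
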
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 45877 - Posted: 15 May 2013, 18:20:12 UTC - in response to Message 45876.  
Last modified: 15 May 2013, 18:34:30 UTC

I'll drop it to 237 and will check for failures.

Host 62763 reports 256MB on the card, but the logs don't show what it's reporting as available. We'll see if it's greater than 237MB next time it contacts the scheduler.
ID: 45877
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 45878 - Posted: 15 May 2013, 19:16:25 UTC - in response to Message 45873.  
Last modified: 15 May 2013, 19:25:41 UTC

I found out why 63368 is only getting cuda22. It's reporting 360MB of available GPU RAM, but cuda23-cuda50 require 384MB.

[scheduler log quoted in Message 45873 above]


It's possible we've overestimated the memory requirements of the cuda apps. Anyone have a better estimate of what is required?


FYI, the card has 384 MB of dedicated GPU RAM (so I don't know where BOINC got 360 MB from) and is fully capable of running any CUDA task we have so far.

EDIT: I checked what BOINC reports locally: 360MB of GPU RAM and 330MB free GPU RAM. The reported total RAM amount is definitely wrong.
ID: 45878
TRuEQ & TuVaLu
Volunteer tester
Joined: 28 Jan 11
Posts: 619
Credit: 2,580,051
RAC: 0
Sweden
Message 45879 - Posted: 15 May 2013, 19:23:09 UTC
Last modified: 15 May 2013, 19:23:24 UTC

I had a 4850 card with 512MB of RAM, and only 480 of those MBs were available for apps.
Dunno if that info is relevant here though...
ID: 45879