BOINC is bad at math, reason why some people can't keep a full cache?


Message boards : Number crunching : BOINC is bad at math, reason why some people can't keep a full cache?

Author Message
Wembley
Volunteer tester
Joined: 16 Sep 09
Posts: 415
Credit: 882,507
RAC: 144
United States
Message 1258056 - Posted: 9 Jul 2012, 15:39:54 UTC

I'm using BOINC 7.0.31.

Here is a recent scheduler request:

BOINC requested around a day's worth of work, and claims to have received over 41 hours' worth.

But it actually got only around 12 hours' worth.


So if it is thinking that it got more than it actually did (41 hours vs 12 hours), is this why people are having problems maintaining a full cache?
____________


Donate with your searches and online buys:
http://www.goodsearch.com/toolbar/university-of-california-setihome

Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3297
Credit: 78,546,580
RAC: 60,359
United States
Message 1258069 - Posted: 9 Jul 2012, 16:03:35 UTC

My first thought is that your system is still learning how long these tasks will take. The server shows you have only 2 tasks completed for AP on the GPU.

It could also be that the server did something weird with your TDCF. I noticed that it doubled on one of my machines for several hours the other day for no apparent reason.

Another guess is that the task duration also includes your work fetch values.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Wembley
Volunteer tester
Joined: 16 Sep 09
Posts: 415
Credit: 882,507
RAC: 144
United States
Message 1258603 - Posted: 10 Jul 2012, 23:17:39 UTC

I still say BOINC is bad at math.

I have my cache settings at 3+.01 and I currently have about 26.5 hours of work according to the 'Remaining' column totals, yet I get this in my log:

2012-07-10 07:13:46 PM | SETI@home | Not requesting tasks: don't need
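
As a quick check of the numbers in this post, here is a minimal sketch of the arithmetic. It assumes that "3+.01" means a minimum buffer of 3 days plus 0.01 additional days. The variable names are made up for illustration and are not BOINC identifiers, and the real work-fetch decision looks at more than this one comparison.

```python
# Hypothetical names; this only checks the arithmetic, not BOINC's real work-fetch logic.
HOURS_PER_DAY = 24

min_buffer_days = 3.0      # assumed meaning of the "3" in "3+.01"
extra_buffer_days = 0.01   # assumed meaning of the ".01" in "3+.01"
queued_hours = 26.5        # total of the 'Remaining' column, as reported above

# Convert the cache setting to hours and compare it with the queued work.
target_hours = (min_buffer_days + extra_buffer_days) * HOURS_PER_DAY
shortfall_hours = target_hours - queued_hours

print(f"target buffer: {target_hours:.2f} h")
print(f"shortfall:     {shortfall_hours:.2f} h")
```

On these figures the queue is roughly 45.7 hours short of the target, which is why the "don't need" message looks wrong at first sight.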


Profile Gundolf Jahn
Joined: 19 Sep 00
Posts: 3101
Credit: 344,452
RAC: 101
Germany
Message 1258764 - Posted: 11 Jul 2012, 6:09:56 UTC - in response to Message 1258603.

Are those 26.5 hours of work for SETI or for all your projects?

Gruß,
Gundolf

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 3969
Credit: 937,921
RAC: 170
United States
Message 1258907 - Posted: 11 Jul 2012, 13:57:55 UTC - in response to Message 1258603.

One thing that often causes such difficulties is that local preferences override the web preferences, and it's fairly easy to have local preferences in effect unintentionally. Unless you've set that 3+.01 from the BOINC Manager Advanced menu | Preferences, it may not actually be in effect.
Joe

Wembley
Volunteer tester
Joined: 16 Sep 09
Posts: 415
Credit: 882,507
RAC: 144
United States
Message 1259046 - Posted: 11 Jul 2012, 18:34:36 UTC - in response to Message 1258764.

I'm not currently running anything except SETI@Home

Wembley
Volunteer tester
Joined: 16 Sep 09
Posts: 415
Credit: 882,507
RAC: 144
United States
Message 1259047 - Posted: 11 Jul 2012, 18:34:50 UTC - in response to Message 1258907.

I always use local prefs.

Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3297
Credit: 78,546,580
RAC: 60,359
United States
Message 1259063 - Posted: 11 Jul 2012, 19:14:01 UTC

I did a test on PrimeGrid with 7.0.31

7/11/2012 3:05:10 PM PrimeGrid [sched_op] Starting scheduler request
7/11/2012 3:05:10 PM PrimeGrid Sending scheduler request: Project initialization.
7/11/2012 3:05:10 PM PrimeGrid Requesting new tasks for CPU and NVIDIA
7/11/2012 3:05:10 PM PrimeGrid [sched_op] CPU work request: 1.00 seconds; 0.00 devices
7/11/2012 3:05:10 PM PrimeGrid [sched_op] NVIDIA work request: 1.00 seconds; 0.00 devices
7/11/2012 3:05:15 PM PrimeGrid Scheduler request completed: got 1 new tasks
7/11/2012 3:05:15 PM PrimeGrid [sched_op] Server version 613
7/11/2012 3:05:15 PM PrimeGrid Project requested delay of 7 seconds
7/11/2012 3:05:15 PM PrimeGrid [sched_op] estimated total CPU task duration: 0 seconds
7/11/2012 3:05:15 PM PrimeGrid [sched_op] estimated total NVIDIA task duration: 9921 seconds
7/11/2012 3:05:15 PM PrimeGrid [sched_op] Deferring communication for 7 sec
7/11/2012 3:05:15 PM PrimeGrid [sched_op] Reason: requested by project

The estimated task time was 2:45:21, which comes out to 9921 seconds, just as the debug output shows.
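
The conversion is easy to verify. A minimal sketch, where `hms_to_seconds` is just an illustrative helper and not anything from BOINC:

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# The estimate shown for the PrimeGrid task in the Manager.
print(hms_to_seconds("2:45:21"))  # 9921
```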

So perhaps it has something to do with the different server version or the data.

Until we are up and running at full speed here again I can't check.

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 33,685,965
RAC: 114,783
Argentina
Message 1259077 - Posted: 11 Jul 2012, 19:35:53 UTC - in response to Message 1259063.

But... it asked for 1.00 seconds and got 9921. So if you were attached to other projects, they would not have any chance to get work for a while. Then, if the other project is SETI and you get a "project has no work" reply followed by a 5-minute delay, BOINC will ask another project for work and... well... you know...
____________

Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3297
Credit: 78,546,580
RAC: 60,359
United States
Message 1259080 - Posted: 11 Jul 2012, 19:38:19 UTC
Last modified: 11 Jul 2012, 19:41:16 UTC

It was a new instance of BOINC and I had just attached to PrimeGrid. Asking for 1 second of work for the initial request is normal.

If you have a project set as a backup project (resource share 0), BOINC will only ask it for work to keep idle devices busy, instead of building up a cache on that project.

The object here was to test the "estimated total NVIDIA task duration", nothing more.

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 7670
Credit: 40,280,089
RAC: 21,434
United Kingdom
Message 1259087 - Posted: 11 Jul 2012, 19:43:16 UTC - in response to Message 1259080.

HAL9000 wrote:
It was a new instance of BOINC and I had just attached to PrimeGrid. Asking for 1 second of work for the initial request is normal.

And getting one workunit, of whatever the standard duration for the project is, in return is also normal.
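
To illustrate why a 1-second request can come back with 9921 seconds of work: tasks are indivisible, so any non-zero request is rounded up to at least one whole task. The sketch below is a simplification under that single assumption. `tasks_to_send` is a hypothetical helper, not the actual scheduler code, which also has to respect limits, deadlines and the like.

```python
import math


def tasks_to_send(requested_seconds: float, seconds_per_task: float) -> int:
    """Smallest whole number of tasks whose total estimate covers the request."""
    if requested_seconds <= 0:
        return 0
    return math.ceil(requested_seconds / seconds_per_task)


# Figures from the PrimeGrid log above: 1.00 second requested, 9921 s per task.
count = tasks_to_send(1.0, 9921)
print(count, count * 9921)  # 1 9921
```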

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 33,685,965
RAC: 114,783
Argentina
Message 1259091 - Posted: 11 Jul 2012, 19:52:27 UTC - in response to Message 1259080.

HAL9000 wrote:
It was a new instance of BOINC and I had just attached to PrimeGrid. Asking for 1 second of work for the initial request is normal.


I know that about the initial request, but anyway, if the RPC asks for 1 sec then it shouldn't get anything longer than that (or at least no more than some margin due to estimation errors).

I've found that assigning each of my hosts to only one project (instead of sharing all the hosts across all the projects) gives me much better results, not only with the RAC, but also in keeping the cache filled.
Also, in SETI, to get the cache to the right size and with enough work for GPU and CPUs, I had to use flops to patch the bad estimates of task length.

I know that all these are very fine details, and for the vast majority of hosts out there it works without any micromanagement, but I'm just too obsessive to let it do its thing... ;D
____________

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 7670
Credit: 40,280,089
RAC: 21,434
United Kingdom
Message 1259095 - Posted: 11 Jul 2012, 19:57:05 UTC - in response to Message 1259080.

HAL9000 wrote:
The object here was to test the "estimated total NVIDIA task duration" nothing more.

There was a discussion a couple of months ago on boinc_alpha which touched on this.

Actually, all these quantities reflect average availability
(on_frac*active_frac, or on_frac*gpu_active_frac in the case of GPUs),
except for "estimated total NVIDIA task duration"
which (confusingly) doesn't; I fixed this.

A request for 327854 seconds of NVIDIA jobs means we want enough jobs
so that their estimated wall time is at least 327854 seconds.
In this case the estimated wall time per job was about 105000 seconds,
so the scheduler sent 4 of them.

-- David

Note that the debug messages (including 'NVIDIA task duration' in the latest BOINC clients) have that correction to wall time, but the runtime estimates displayed by BOINC Manager don't. So, if a task actually takes one hour to run, but is only active 50% of the time, the interval between 'start' and 'finish' will be two hours - and the 'NVIDIA task duration' will need to be 7200 seconds to calculate cache size correctly.
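
A minimal sketch of that correction, assuming the wall-time estimate is simply the run time divided by the availability fraction (on_frac*active_frac, as in David's note). The function name is made up for illustration and the real client code may differ in detail.

```python
import math


def wall_time_seconds(run_seconds: float, on_frac: float, active_frac: float) -> float:
    """Scale a run-time estimate up to elapsed (wall) time using average availability."""
    return run_seconds / (on_frac * active_frac)


# The example above: a one-hour task on a host that is active 50% of the time.
print(wall_time_seconds(3600, 1.0, 0.5))  # 7200.0

# David's figures: 327854 s requested, about 105000 s of wall time per job.
jobs_sent = math.ceil(327854 / 105000)
print(jobs_sent)  # 4
```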

Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3297
Credit: 78,546,580
RAC: 60,359
United States
Message 1259098 - Posted: 11 Jul 2012, 20:05:12 UTC - in response to Message 1259095.

I had forgotten about the duration values. I guess we would need Wembley to look in his client_state for them.

If the values looked like this, it would explain what he was seeing:
<on_frac>1</on_frac>
<connected_frac>1</connected_frac>
<active_frac>1</active_frac>
<gpu_active_frac>0.292682</gpu_active_frac>
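
To show how such values would produce the gap, here is a rough sketch that applies them to the roughly 12 hours of work from the first post. The fractions are the hypothetical ones above, not values read from Wembley's host, and the `<time_stats>` wrapper is only there so the fragment parses as a single XML element.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the wrapper element is a placeholder for parsing.
fragment = """
<time_stats>
    <on_frac>1</on_frac>
    <connected_frac>1</connected_frac>
    <active_frac>1</active_frac>
    <gpu_active_frac>0.292682</gpu_active_frac>
</time_stats>
"""

# Read every child element into a tag -> float mapping.
stats = {child.tag: float(child.text) for child in ET.fromstring(fragment)}

# GPU availability as described earlier in the thread: on_frac * gpu_active_frac.
gpu_availability = stats["on_frac"] * stats["gpu_active_frac"]

run_hours = 12.0  # approximate run time of the work actually received
wall_hours = run_hours / gpu_availability
print(f"{wall_hours:.1f} hours")  # 41.0 hours
```

With these fractions, about 12 hours of GPU run time corresponds to about 41 hours of wall time, which lines up with the two figures in the first post.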

Profile Borgholio
Joined: 2 Aug 99
Posts: 641
Credit: 10,903,960
RAC: 2,811
United States
Message 1261851 - Posted: 18 Jul 2012, 15:29:26 UTC

I have noticed something like this too. I am running a couple dozen projects including SETI. I have noticed that my GPU cache shrinks when I get a lot of CPU tasks from other projects, especially Climateprediction.net. Based on my testing in this thread:

http://setiathome.berkeley.edu/forum_thread.php?id=68453

I came to the conclusion that the number of tasks in the queue for the GPU depends on how many tasks are in the queue for the CPU. That is, the schedulers for CPU and GPU are in fact NOT separate from each other, as they were when individual scheduling for CPU and GPU was first introduced a couple of years ago. I smell a bug.
____________


You will be assimilated...bunghole!


Copyright © 2013 University of California