Message boards :
News :
Tests of new scheduler features.
Joined: 28 Jan 11 · Posts: 619 · Credit: 2,580,051 · RAC: 0
Meanwhile it's every second workunit with low credit. Yes we do.
Joined: 14 Oct 05 · Posts: 1137 · Credit: 1,848,733 · RAC: 0
We don't need no steenking credit. The projects do need credits to keep some participants interested, and they are a convenient rough yardstick of performance for all. However, I'm glad there aren't more detailed statistics like sports fans keep for their favorite teams or players. Back on topic - assuming the scheduler is using the hav->pfc basis for guessing host speed, for a modest CPU system it works out very close to the old method. With 16 results in the averages, the latest <flops> sent to my host 10490 is 3.881207e09, but it would have been 3.880348e09 based on the seconds-per-FLOP elapsed time average. Joe
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0
I've enabled beta work distribution for my NV host again (with CUDA 22, 23 and 32 apps) to test whether BOINC can figure out which is faster now. So far all types still show up in downloaded tasks. 2 of 3 have more than 10 eligible validations. What's a reasonable estimate of when BOINC should react? (IMO it should almost stop sending cuda22 tasks to that host and send mostly cuda23 ones.) This host works in unattended mode, so it reports and asks for new work constantly, not in large chunks as my ATi host does.
Joined: 15 Mar 05 · Posts: 1547 · Credit: 27,183,456 · RAC: 0
It's a little more complicated than that. When the version selection is done, an app version's processing rate is multiplied by (1+f*r/n), where f is a project-specific factor (0.316 for beta), r is a normally distributed random number, and n is the number of non-outlier results done. So if two versions are equally fast you'll typically get half and half. If one version is twice as fast and both have 10 results, it's unlikely you'll get any from the slow version. As time goes on you should get fewer and fewer from the slow versions. If the speed difference is only 1% it'll take longer to see a difference than if the difference is 10%. All this is modified by the variable quotas. If a version is really fast, but it errors out on half the results, it will only get one result per day and you'll get additional ones from slower but better versions. That's good. But if you have normal quotas and you fill the quota of the fast version, your client will get some of the slow version. That's not so good, but since the quotas increase as you return valid results it should be a temporary situation.
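The selection rule Eric describes can be sketched in a few lines of Python. This is a minimal simulation, not BOINC code; the function and variable names are mine. It shows the two behaviours he predicts: equal speeds split roughly half and half, while a 2x speed gap makes the slow version essentially never win.

```python
import random

def pick_version(versions, f=0.316):
    """Choose the 'best' app version: each version's processing rate is
    perturbed by (1 + f*r/n), with r ~ N(0, 1) and n the number of
    non-outlier results done, then the highest perturbed rate wins."""
    best, best_score = None, -1.0
    for name, flops, n in versions:
        score = flops * (1.0 + f * random.gauss(0.0, 1.0) / n)
        if score > best_score:
            best, best_score = name, score
    return best

random.seed(1)
trials = 10000
# Two equally fast versions, 10 results each: expect roughly half and half.
equal = sum(pick_version([("a", 1e10, 10), ("b", 1e10, 10)]) == "a"
            for _ in range(trials))
# One version twice as fast: the slow one should almost never be chosen.
fast = sum(pick_version([("slow", 1e10, 10), ("fast", 2e10, 10)]) == "fast"
           for _ in range(trials))
print(equal / trials)  # close to 0.5
print(fast / trials)   # close to 1.0
```

With n = 10 the perturbation's standard deviation is only f/n ≈ 3.2% of the rate, so a 2x gap is far outside the random factor's reach, while a 1% gap takes many more draws to separate.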
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0
The quota part is not so good indeed, but diminishing the allocation for the slower app gradually - that's what's really needed. For example, on my host both apps have processed a few dozen tasks already... but all of the same AR. It's quite possible that at a different AR their relative speed changes (and this is what we actually see in offline tests for HD5 and non-HD5 builds on some GPUs). So it's good to continue to receive new tasks for the slower app if the speed difference is small. For cuda22 versus the other cuda builds the speed difference is huge, almost 2x. So I expect not to get cuda22 soon. We'll see what happens in reality.
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0
Not too good a result so far... current host state: SETI@home v7 7.00 windows_intelx86 (cuda22). As one can see, cuda22 is more than twice as slow. But today the host got a whole pack of cuda22 tasks. Isn't it time for BOINC to figure out how bad cuda22 is for this particular host and to stop allocating cuda22 work to it? It's well past 10 results for all types; yesterday almost no cuda22 was allocated, so this is a very recent allocation... Eric, could you check this host's logs (http://setiweb.ssl.berkeley.edu/beta/host_app_versions.php?hostid=18439), please, and decide whether it's OK that it still gets many cuda22 allocations, or whether something is wrong with the BOINC server? EDIT: I'm afraid that with BOINC behaving like this we can't release cuda22 and Brook+ in free competition with the other builds. It would be too big a slowdown for the project, with no gain... Could it be that "Average turnaround time" is used anywhere in the algorithm? It shouldn't be, because it doesn't depend directly on app performance! (cuda22 has the smallest value there, so...) EDIT2: quite funny - the fastest app still has the smallest number of completed results... Well done, BOINC "optimization" :D
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0
an app version's processing rate is multiplied by (1+f*r/n), where f is a project-specific factor (0.316 for beta), r is a normally distributed random number, and n is the number of non-outlier results done. Around what value is r distributed? <r> = ?, Dr = ?
Joined: 15 Mar 05 · Posts: 1547 · Credit: 27,183,456 · RAC: 0
It's a normal distribution around r=0 with a standard deviation of 1. So it also has the possibility of making an app seem slower.
Joined: 15 Mar 05 · Posts: 1547 · Credit: 27,183,456 · RAC: 0
Yep, there's still a problem. If it had been the random factor that did it, there would have been a message in the log. There isn't. Time to add more debugging output. App versions 364-368 below are cuda22, cuda23, cuda32, cuda42, and cuda50.

2013-05-11 00:53:43.5332 [PID=2515 ] [send] [HOST#18439] app version 364 is reliable
2013-05-11 00:53:43.5333 [PID=2515 ] [send] [HOST#18439] app version 365 is reliable
2013-05-11 00:53:43.5333 [PID=2515 ] [send] [HOST#18439] app version 366 is reliable
2013-05-11 00:53:43.5334 [PID=2515 ] [quota] effective ncpus 2 ngpus 1
2013-05-11 00:53:43.5334 [PID=2515 ] [quota] max jobs per RPC: 20
2013-05-11 00:53:43.5335 [PID=2515 ] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2013-05-11 00:53:43.5335 [PID=2515 ] [send] NVIDIA GPU: req 432013.21 sec, 0.00 instances; est delay 0.00
2013-05-11 00:53:43.5781 [PID=2515 ] [version] looking for version of setiathome_v7
2013-05-11 00:53:43.5782 [PID=2515 ] [version] [AV#370] Skipping CPU version - user prefs say no CPU
2013-05-11 00:53:43.5782 [PID=2515 ] [version] Checking plan class 'cuda22'
2013-05-11 00:53:43.5788 [PID=2515 ] [version] reading plan classes from file '../plan_class_spec.xml'
2013-05-11 00:53:43.5788 [PID=2515 ] [version] plan_class_spec: host_flops: 2.021320e+09, scale: 1.00, projected_flops: 3.808069e+10, peak_flops: 4.035504e+10
2013-05-11 00:53:43.5788 [PID=2515 ] [quota] [AV#364] scaled max jobs per day: 61
2013-05-11 00:53:43.5789 [PID=2515 ] [version] Checking plan class 'cuda23'
2013-05-11 00:53:43.5789 [PID=2515 ] [version] plan_class_spec: host_flops: 2.021320e+09, scale: 1.00, projected_flops: 3.808069e+10, peak_flops: 4.035504e+10
2013-05-11 00:53:43.5789 [PID=2515 ] [quota] [AV#365] scaled max jobs per day: 51
2013-05-11 00:53:43.5789 [PID=2515 ] [version] Checking plan class 'cuda32'
2013-05-11 00:53:43.5789 [PID=2515 ] [version] plan_class_spec: host_flops: 2.021320e+09, scale: 1.00, projected_flops: 3.808069e+10, peak_flops: 4.035504e+10
2013-05-11 00:53:43.5789 [PID=2515 ] [quota] [AV#366] scaled max jobs per day: 95
2013-05-11 00:53:43.5789 [PID=2515 ] [quota] [AV#366] daily quota exceeded: 100 >= 95
2013-05-11 00:53:43.5789 [PID=2515 ] [version] [AV#366] daily quota exceeded
2013-05-11 00:53:43.5789 [PID=2515 ] [version] Checking plan class 'cuda42'
2013-05-11 00:53:43.5789 [PID=2515 ] [version] plan_class_spec: CUDA version required min: 4020, supplied: 3020
2013-05-11 00:53:43.5789 [PID=2515 ] [version] [AV#367] app_plan() returned false
2013-05-11 00:53:43.5789 [PID=2515 ] [version] Checking plan class 'cuda50'
2013-05-11 00:53:43.5789 [PID=2515 ] [version] plan_class_spec: CUDA version required min: 5000, supplied: 3020
2013-05-11 00:53:43.5790 [PID=2515 ] [version] [AV#368] app_plan() returned false
2013-05-11 00:53:43.5790 [PID=2515 ] [version] [AV#364] (cuda22) setting projected flops based on host_app_version pfc: 58.01G
2013-05-11 00:53:43.5790 [PID=2515 ] [version] [AV#364] (cuda22) comparison pfc: 58.01G et: 58.01G
2013-05-11 00:53:43.5790 [PID=2515 ] [version] Best version of app setiathome_v7 is [AV#364] (58.01 GFLOPS)
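The [quota] lines in the log above amount to a simple threshold check. This is a sketch of that check only (`quota_allows` is my name for it, not BOINC's): a version is ruled out once today's job count reaches its scaled daily maximum.

```python
def quota_allows(jobs_sent_today, scaled_max_jobs_per_day):
    """Mirror of the [quota] check in the log: a version is skipped once
    today's count reaches its scaled daily maximum."""
    return jobs_sent_today < scaled_max_jobs_per_day

# AV#366 (cuda32) in the log: "daily quota exceeded: 100 >= 95"
print(quota_allows(100, 95))  # False -> cuda32 ruled out
print(quota_allows(50, 61))   # True  -> AV#364 (cuda22) stays eligible
```

The "50" above is illustrative; the log only shows cuda22's limit of 61, not its current count.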
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0
366 was ruled out because of quota. OK, that's understandable. But why 364 was preferred over 365 looks strange. EDIT: cuda22 (364) has a bigger quota available - could this influence BOINC's choice?
Joined: 15 Mar 05 · Posts: 1547 · Credit: 27,183,456 · RAC: 0
I'll check on that.
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0
I'll check on that. Any progress on that? Please keep us informed :) EDIT: and regarding the r parameter's distribution - isn't an SD of 1 too small for this purpose? It already makes, say, r==3 quite improbable. And since n>=10 after 10 eligible validations, 3/10*0.316 gives only ~10% change in APR due to the random factor. And that's an upper bound; the usual shift will be even smaller...
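Raistmer's back-of-the-envelope bound checks out. Plugging the beta factor f = 0.316 and n = 10 into (1+f*r/n) with an already-improbable 3-sigma draw r = 3:

```python
f = 0.316   # project-specific factor for beta (from Eric's post)
n = 10      # non-outlier results after 10 eligible validations
r = 3.0     # a ~3-sigma draw from N(0, 1), already quite improbable
shift = f * r / n         # fractional change in the projected rate
print(round(shift, 4))    # 0.0948, i.e. just under 10%
```

So even a 3-sigma draw moves the projected rate by under 10%, and typical draws (|r| around 1) move it by only ~3%.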
Joined: 18 Jan 06 · Posts: 1038 · Credit: 18,734,730 · RAC: 0
I have a host where the estimates have settled, but both app versions, opencl_ati_cat132 and opencl_ati5_cat132, are still being sent to it. opencl_ati_cat132 runs the currently distributed workunits ca. 5 minutes faster than opencl_ati5_cat132. Should it really take thousands of WUs to settle? _\|/_ U r s
Joined: 14 Oct 05 · Posts: 1137 · Credit: 1,848,733 · RAC: 0
Have a host where the estimates have settled since, but both app versions, opencl_ati_cat132 and opencl_ati5_cat132, are still sent to the host. The opencl_ati_cat132 runs the currently distributed workunits ca. 5 minutes faster than the opencl_ati5_cat132. No, of course it shouldn't, and I'm sure Eric can pin down the reason it does. Actually, I think it never settles with the current code. Note in the debug_version_select log messages for Raistmer's host that CUDA22, CUDA23, and CUDA32 get the same projected_flops of 3.808069e+10 from the plan_class_spec logic. The complete loop choosing the "best" version is based on that projection as modified by the random factor. The "setting projected flops based on host_app_version pfc:" doesn't happen until after the choice has been made. Under those circumstances, each of the CUDAxx plans has an equal chance to be chosen as "best" for a specific work request. Joe
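The ordering Joe describes can be sketched to show why it never settles. This is a toy model, not BOINC code (names are mine, and cuda32's measured rate here is a hypothetical value): because every CUDA version enters the comparison with the identical plan-class projected_flops, only the random factor decides, and the measured host_app_version pfc is applied too late to matter.

```python
import random

def buggy_best_version(versions, f=0.316):
    """Sketch of the broken ordering: the comparison uses the plan-class
    projected_flops (identical for every CUDA version), so the random
    factor alone decides; the measured per-version pfc is applied only
    after the winner is already chosen."""
    best, best_score = None, -1.0
    for v in versions:
        score = v["plan_class_flops"] * (1 + f * random.gauss(0, 1) / v["n"])
        if score > best_score:
            best, best_score = v, score
    best["projected_flops"] = best["pfc_flops"]  # too late to affect the choice
    return best

random.seed(0)
versions = [
    {"name": "cuda22", "plan_class_flops": 3.808069e10, "pfc_flops": 58.01e9, "n": 20},
    {"name": "cuda32", "plan_class_flops": 3.808069e10, "pfc_flops": 116.0e9, "n": 20},
]
wins = sum(buggy_best_version(versions)["name"] == "cuda22" for _ in range(10000))
print(wins / 10000)  # close to 0.5 despite cuda32 being ~2x faster
```

Each version keeps a roughly equal chance per work request, matching what Raistmer and Urs observe on their hosts.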
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0
So different APRs are just ignored until an app has already been chosen??? A clear bug, then.
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0
I'll stop asking for CUDA work for now, because some server-side changes are definitely required. I'll allow work fetch again when there's something new to test. [The CUDA app itself is quite well proven already; months have passed with it on beta...]
Joined: 15 Mar 05 · Posts: 1547 · Credit: 27,183,456 · RAC: 0
There has to be some path around the correct logic, a short circuit in the application choice, but I haven't found it yet. I'm hacking at it today.
Joined: 15 Mar 05 · Posts: 1547 · Credit: 27,183,456 · RAC: 0
I've put up a new scheduler that generates about 10x the debugging output. Could you start taking new work so I can see a failure? Or point out a host that got the wrong work after the time this message was posted?
Joined: 3 Jan 07 · Posts: 1451 · Credit: 3,272,268 · RAC: 0
I put a new scheduler that generates about 10x the debugging output. Could you start taking new work so I can see a failure? Or point out a host that got the wrong work after the time this message was posted? Fired up 63280. The first fetch on restart was cuda42; cuda50 would have been a (marginally) better choice. Subsequent fetches were cuda32 (a bad, but viable, choice), then cuda50, then back to cuda32 again.
Joined: 15 Mar 05 · Posts: 1547 · Credit: 27,183,456 · RAC: 0
I found the problem. There are apparently two different methods for computing speed... One is based on the predicted speed of the GPU, and it is what's used to determine which version is faster. When the random factor is added to that, you're most likely to get the version that has computed the fewest results so far. This is contrary to the behaviour David has said the scheduler should have, so I will fix it.
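For contrast with the broken ordering, here is a sketch of what the fix amounts to (again a toy model, not Eric's actual patch; cuda32's measured rate is a hypothetical value): once the comparison itself uses each version's measured per-host rate (the host_app_version pfc), the same random factor can no longer mask a real 2x speed difference.

```python
import random

def fixed_best_version(versions, f=0.316):
    """Sketch of the corrected ordering: versions are compared on their
    measured per-host rate (host_app_version pfc), perturbed by the same
    (1 + f*r/n) random factor."""
    best, best_score = None, -1.0
    for v in versions:
        score = v["pfc_flops"] * (1 + f * random.gauss(0, 1) / v["n"])
        if score > best_score:
            best, best_score = v, score
    return best

random.seed(0)
versions = [
    {"name": "cuda22", "pfc_flops": 58.01e9, "n": 20},   # measured: slow (from the log)
    {"name": "cuda32", "pfc_flops": 116.0e9, "n": 20},   # hypothetical: ~2x faster
]
wins = sum(fixed_best_version(versions)["name"] == "cuda32" for _ in range(10000))
print(wins / 10000)  # ~1.0: the slow version almost never wins
```

With a 2x measured gap and n = 20, the perturbation is far too small to flip the outcome, which is the "fewer and fewer tasks for the slow version" behaviour the thread was expecting all along.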
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.