Tests of new scheduler features.

Author	Message
Raistmer Volunteer tester Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 46171 - Posted: 1 Jun 2013, 21:33:51 UTC Perhaps VLARs should be disabled for GPUs again. There is a very negative attitude on the SETI main boards to VLARs on GPU, even on ATi GPUs, though NV GPUs are mentioned more often. It's worth distributing VLARs to a GPU only if the GPU is idle and the server can't offer other work: a kind of "backup work". Otherwise the GPU will sit idle or drift to another project. If it's possible to implement such logic, it's worth doing. If not, it may be worth disabling VLARs again. ID: 46171 ·
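
The "backup work" rule described in this post can be sketched as a small selection function. This is only an illustration under stated assumptions: `Task`, `pick_gpu_tasks`, and the `gpu_idle` flag are hypothetical names, not part of the BOINC scheduler, and the real server would need to know (or be told by the client) whether the GPU is idle.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    is_vlar: bool


def pick_gpu_tasks(queue, wanted, gpu_idle):
    """Return up to `wanted` tasks for a GPU request (hypothetical sketch).

    Non-VLAR work is always preferred. VLARs are handed out only when
    there is no other work to offer AND the GPU would otherwise sit idle.
    """
    normal = [t for t in queue if not t.is_vlar][:wanted]
    if normal or not gpu_idle:
        return normal
    # Nothing else to offer and the GPU is idle: fall back to VLARs.
    return [t for t in queue if t.is_vlar][:wanted]
```

With this rule a busy GPU never receives a VLAR, and an idle one receives VLARs only when the queue holds nothing else.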

Eric J Korpela Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0	Message 46173 - Posted: 1 Jun 2013, 22:36:31 UTC - in response to Message 46170. Have you heard there's odd Credit awards going on for Astropulse v6 now at the Main project, around 15 to 25 Credits per AP Wu: That's odd, since I haven't updated anything related to AP over there. ID: 46173 ·

jason_gee Volunteer tester Send message Joined: 11 Dec 08 Posts: 198 Credit: 658,573 RAC: 0	Message 46175 - Posted: 2 Jun 2013, 0:08:28 UTC - in response to Message 46168. Last modified: 2 Jun 2013, 0:39:30 UTC The right way to do this (and I indicated this to David a long time ago) is to use an estimate of the median rather than weighted averages, as medians are not strongly affected by outliers. I could change the current code make an estimate of the running median... Oooh, sounds much more robust :D Now on the client side, having experimented with stabilising estimates for work fetch & task scheduling, I used a tuned PID controller for dead-reckoning with feedback. That made estimates much more stable, tuned for very slight overshoot for rapid convergence (system usage change etc) without ringing (sufficiently damped still). If I switch from using (custom) per application DCF as the control (fudge factor), over to adaptive flops as suggested by Joe some time back, does the server receive the (application) flops value on each contact ? and, if so, could you possibly combine the more robust longer term median processing rate(s) with the flops using something like a Kalman filter ? My basis for thought there is why recalculate something the client already knows, if you don't have to. or alternatively if the calculations are in different time scales, combine (Kalman filter) them to get the best of both. To me anyway, stable & adaptive estimates proved to solve a lot of problems... On a relatively fast system I typically see stable estimate convergence track system usage or hardware change on the order of minutes, as opposed to APR's days to weeks. ID: 46175 ·
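
To illustrate why an estimate of the running median is less sensitive to outliers than a weighted average, here is a minimal sketch. It is not the project's server code: `ewma` and `running_median` are hypothetical helpers, and the fixed-step median estimator is just one simple way to approximate a running median.

```python
def ewma(samples, alpha=0.1):
    """Exponentially weighted average: every sample pulls the estimate
    in proportion to its distance, so one outlier moves it a long way."""
    est = samples[0]
    for x in samples[1:]:
        est += alpha * (x - est)
    return est


def running_median(samples, step=0.5):
    """Fixed-step median estimate: each sample moves the estimate by at
    most `step`, whatever its size, so outliers have bounded influence."""
    est = samples[0]
    for x in samples[1:]:
        if x > est:
            est += step
        elif x < est:
            est -= step
    return est


# Twenty ordinary results, one wild outlier, then five ordinary results.
samples = [100.0] * 20 + [10000.0] + [100.0] * 5
print(ewma(samples), running_median(samples))
```

In this example the weighted average is still far above 100 several samples after the outlier, while the median estimate has already returned to 100. The price is slower tracking of genuine changes, since the estimate moves only one step per sample.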

TRuEQ & TuVaLu Volunteer tester Send message Joined: 28 Jan 11 Posts: 619 Credit: 2,580,051 RAC: 0	Message 46178 - Posted: 2 Jun 2013, 8:19:35 UTC - in response to Message 46170. Have you heard there's odd Credit awards going on for Astropulse v6 now at the Main project, around 15 to 25 Credits per AP Wu: http://setiathome.berkeley.edu/forum_thread.php?id=71827&postid=1374823 Valid AstroPulse v6 tasks for computer 6910524 I grabbed some Stock OpenCL AP work, and got very low awarded Credit too: All AstroPulse v6 tasks for computer 5427475 Claggy I have a couple more with low credits on main. http://setiathome.berkeley.edu/workunit.php?wuid=1257047105 http://setiathome.berkeley.edu/workunit.php?wuid=1257039458 ID: 46178 ·

Claggy Volunteer tester Send message Joined: 29 May 06 Posts: 1037 Credit: 8,440,339 RAC: 0	Message 46179 - Posted: 2 Jun 2013, 8:46:53 UTC - in response to Message 46178. Have you heard there's odd Credit awards going on for Astropulse v6 now at the Main project, around 15 to 25 Credits per AP Wu: http://setiathome.berkeley.edu/forum_thread.php?id=71827&postid=1374823 Valid AstroPulse v6 tasks for computer 6910524 I grabbed some Stock OpenCL AP work, and got very low awarded Credit too: All AstroPulse v6 tasks for computer 5427475 Claggy I have a couple more with low credits on main. http://setiathome.berkeley.edu/workunit.php?wuid=1257047105 http://setiathome.berkeley.edu/workunit.php?wuid=1257039458 I wouldn't worry about that. We need to populate the project's app version 'Peak Flop Count Average' for all app versions. That has probably happened for GPU versions already; it'll be a few days before it's done for the CPU AP app. Also, none of those hosts had reached their 10 validations yet. Claggy ID: 46179 ·

TRuEQ & TuVaLu Volunteer tester Send message Joined: 28 Jan 11 Posts: 619 Credit: 2,580,051 RAC: 0	Message 46182 - Posted: 2 Jun 2013, 10:25:10 UTC - in response to Message 46179. Last modified: 2 Jun 2013, 10:25:26 UTC Have you heard there's odd Credit awards going on for Astropulse v6 now at the Main project, around 15 to 25 Credits per AP Wu: http://setiathome.berkeley.edu/forum_thread.php?id=71827&postid=1374823 Valid AstroPulse v6 tasks for computer 6910524 I grabbed some Stock OpenCL AP work, and got very low awarded Credit too: All AstroPulse v6 tasks for computer 5427475 Claggy I have a couple more with low credits on main. http://setiathome.berkeley.edu/workunit.php?wuid=1257047105 http://setiathome.berkeley.edu/workunit.php?wuid=1257039458 I wouldn't worry about that. We need to populate the project's app version 'Peak Flop Count Average' for all app versions. That has probably happened for GPU versions already; it'll be a few days before it's done for the CPU AP app. Also, none of those hosts had reached their 10 validations yet. Claggy Now I see, it's the wingman that needs 10 validations. ID: 46182 ·

Richard Haselgrove Volunteer tester Send message Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0	Message 46189 - Posted: 2 Jun 2013, 22:39:39 UTC YAY - cuda32 is speeding up - APR is 101, and cuda50 only 97. So of course, cuda32 rules. Doesn't it? ID: 46189 ·

jason_gee Volunteer tester Send message Joined: 11 Dec 08 Posts: 198 Credit: 658,573 RAC: 0	Message 46190 - Posted: 2 Jun 2013, 23:41:49 UTC - in response to Message 46189. YAY - cuda32 is speeding up - APR is 101, and cuda50 only 97. So of course, cuda32 rules. Doesn't it? Could do. Still doesn't make David any better at statistics than me, which is pretty bad. ID: 46190 ·

Mike Volunteer tester Send message Joined: 16 Jun 05 Posts: 2531 Credit: 1,074,556 RAC: 0	Message 46191 - Posted: 3 Jun 2013, 7:42:10 UTC Something is definitely wrong. Host stats on main: number of completed tasks 1, consecutive valid tasks 646, APR 7740. With each crime and every kindness we birth our future. ID: 46191 ·

Richard Haselgrove Volunteer tester Send message Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0	Message 46192 - Posted: 3 Jun 2013, 7:58:46 UTC - in response to Message 46191. Something is definitely wrong. Host stats on main: number of completed tasks 1, consecutive valid tasks 646, APR 7740. Link to host would help, please. ID: 46192 ·

jason_gee Volunteer tester Send message Joined: 11 Dec 08 Posts: 198 Credit: 658,573 RAC: 0	Message 46193 - Posted: 3 Jun 2013, 8:04:38 UTC Last modified: 3 Jun 2013, 8:23:50 UTC Here's an interesting one on main. http://setiathome.berkeley.edu/show_host_detail.php?hostid=6739873 Looks to me like if 'completed tasks' stays zero, i.e. it's spitting out only invalids, then max tasks per day never goes down ? ID: 46193 ·

Mike Volunteer tester Send message Joined: 16 Jun 05 Posts: 2531 Credit: 1,074,556 RAC: 0	Message 46195 - Posted: 3 Jun 2013, 8:51:01 UTC - in response to Message 46192. Something is definitely wrong. Host stats on main: number of completed tasks 1, consecutive valid tasks 646, APR 7740. Link to host would help, please. http://setiathome.berkeley.edu/results.php?hostid=5735690 With each crime and every kindness we birth our future. ID: 46195 ·

William Volunteer tester Send message Joined: 14 Feb 13 Posts: 606 Credit: 588,843 RAC: 0	Message 46196 - Posted: 3 Jun 2013, 8:52:52 UTC - in response to Message 46175. The right way to do this (and I indicated this to David a long time ago) is to use an estimate of the median rather than weighted averages, as medians are not strongly affected by outliers. I could change the current code make an estimate of the running median... Oooh, sounds much more robust :D Now on the client side, having experimented with stabilising estimates for work fetch & task scheduling, I used a tuned PID controller for dead-reckoning with feedback. That made estimates much more stable, tuned for very slight overshoot for rapid convergence (system usage change etc) without ringing (sufficiently damped still). If I switch from using (custom) per application DCF as the control (fudge factor), over to adaptive flops as suggested by Joe some time back, does the server receive the (application) flops value on each contact ? and, if so, could you possibly combine the more robust longer term median processing rate(s) with the flops using something like a Kalman filter ? My basis for thought there is why recalculate something the client already knows, if you don't have to. or alternatively if the calculations are in different time scales, combine (Kalman filter) them to get the best of both. To me anyway, stable & adaptive estimates proved to solve a lot of problems... On a relatively fast system I typically see stable estimate convergence track system usage or hardware change on the order of minutes, as opposed to APR's days to weeks. Since you mention it here - my idea was always that APR should be calculated with the same control circuit you did for aDCF. You don't feel like doing a spot of server code, perhaps? ;) A person who won't read has no advantage over one who can't read. (Mark Twain) ID: 46196 ·
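
For readers unfamiliar with the "control circuit" idea, the following is a generic discrete PID loop that nudges a processing-rate estimate toward observed rates. It is a textbook sketch under stated assumptions, not jason_gee's aDCF code: the class `PID`, the helper `track_rate`, and the gains 0.5 / 0.05 / 0.1 are all hypothetical choices made for illustration.

```python
class PID:
    """Textbook discrete PID controller (illustrative only)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt=1.0):
        # Accumulate the error for the integral term.
        self.integral += error * dt
        # No derivative on the very first sample (nothing to compare with).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def track_rate(observed_rates, initial, pid):
    """Feed each observed rate back as an error and correct the estimate."""
    est = initial
    for rate in observed_rates:
        est += pid.update(rate - est)
    return est


# Estimate starts at 50 while the host actually processes at 100.
final = track_rate([100.0] * 60, 50.0, PID(kp=0.5, ki=0.05, kd=0.1))
print(final)
```

With these example gains the estimate closes most of the gap within a handful of samples and settles close to 100. Real tuning would depend on how noisy the per-task rates are.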

William Volunteer tester Send message Joined: 14 Feb 13 Posts: 606 Credit: 588,843 RAC: 0	Message 46197 - Posted: 3 Jun 2013, 8:58:15 UTC - in response to Message 46190. YAY - cuda32 is speeding up - APR is 101, and cuda50 only 97. So of course, cuda32 rules. Doesn't it? Could do. Still doesn't make David any better at statistics than me, which is pretty bad. I still think David is using the completely wrong type of statistics for CreditNew. With all modelling (and statistics are a type of modelling) you need to know the assumptions and the limitations of the model. I maintain that the nature of the data distribution here is not one where the type of statistical analysis David does can be used. But that's only my mathematician's gut feeling :( A person who won't read has no advantage over one who can't read. (Mark Twain) ID: 46197 ·

jason_gee Volunteer tester Send message Joined: 11 Dec 08 Posts: 198 Credit: 658,573 RAC: 0	Message 46198 - Posted: 3 Jun 2013, 9:07:33 UTC - in response to Message 46196. Last modified: 3 Jun 2013, 9:20:16 UTC Since you mention it here - my idea was always that APR should be calculated with the same control circuit you did for aDCF. You don't feel like doing a spot of server code, perhaps? ;) Since, by control theory, cascading different controllers in different time domains ( i.e. one slow tracking, one rapid/fine ) tends to be better than one, a server side longer term figure is fine (if done correctly & independently). Ideally the final control 'output' (currently raw APR) would instead be a fusion weighted by trust. You might for example, trust a client generating good results more than one spitting out invalids or errors. These host examples might then allow responsive estimates & conservative/overdamped response respectively. Eric's move to median should help a lot. If after that, it proves too unresponsive when hosts up/downgrade, use machines heavily periodically while crunching, change number of tasks etc, I wouldn't mind taking a look. The basic concept is the same as navigation systems using a fusion of erratic/noisy GPS readings with dead-reckoning. Neither on their own is perfect, but combined is stable, responsive and more accurate. ID: 46198 ·
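
The "fusion weighted by trust" idea can be sketched as a single scalar Kalman-style update, where each estimate carries a variance expressing how little it is trusted. This is an assumption-laden illustration: `fuse` is a hypothetical helper, and how the server would assign variances (for example, a larger one for a client producing invalids) is not specified in the thread.

```python
def fuse(server_est, server_var, client_est, client_var):
    """Combine a slow server-side estimate with a fast client-side one.

    Lower variance means more trust. The gain is the fraction of the
    way the fused value moves from the server estimate toward the
    client estimate (the scalar Kalman update).
    """
    gain = server_var / (server_var + client_var)
    fused = server_est + gain * (client_est - server_est)
    fused_var = (1.0 - gain) * server_var
    return fused, fused_var


# Equally trusted sources meet in the middle.
print(fuse(100.0, 4.0, 120.0, 4.0))
# A client with a poor validation record (large variance) barely moves the result.
print(fuse(100.0, 1.0, 120.0, 9.0))
```

The fused variance is always smaller than either input variance, which is the formal version of "neither on their own is perfect, but combined is more accurate".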

Richard Haselgrove Volunteer tester Send message Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0	Message 46207 - Posted: 4 Jun 2013, 10:07:05 UTC Well, I got my wish with the VLARs for cuda32. By the time the next 30 have run through, that version should be dead and buried. ID: 46207 ·

Raistmer Volunteer tester Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 46220 - Posted: 4 Jun 2013, 21:21:50 UTC Last modified: 4 Jun 2013, 21:27:46 UTC Hm.... This host http://setiweb.ssl.berkeley.edu/beta/host_app_versions.php?hostid=39394 has this AP app table:

AstroPulse v6 6.06 windows_intelx86 (opencl_ati_100)
Number of tasks completed: 83
Max tasks per day: 163
Number of tasks today: 34
Consecutive valid tasks: 130
Average processing rate: 618.37064112399
Average turnaround time: 0.50 days

AstroPulse v6 6.06 windows_intelx86 (ati_opencl_100)
Number of tasks completed: 2
Max tasks per day: 45
Number of tasks today: 0
Consecutive valid tasks: 12
Average processing rate: 742.57870281709
Average turnaround time: 2.74 days

AstroPulse v6 6.06 windows_intelx86 (cal_ati)
Number of tasks completed: 7
Max tasks per day: 40
Number of tasks today: 0
Consecutive valid tasks: 7
Average processing rate: 56.793048610545
Average turnaround time: 1.12 days

Note that an app that is not the fastest receives all the work now. The fastest app (by current APR) can't even collect enough tasks to pass the 10 eligible tasks threshold! Is that normal? I think not. All compatible apps should get their 10 eligibles before the best one is imprinted in the server's mind, right? EDIT: also note that Mv7 GPU tasks were relatively fast and the host was configured with a big cache. So it had the opportunity to receive tasks for all apps just because of quota limits. Here, with AP, a task takes longer and the cache was reduced (intentionally), so we have a different situation to test. And it looks like the test was not passed OK :/ ID: 46220 ·
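
The behaviour Raistmer expects (every compatible app version gets its 10 eligible tasks before the fastest one is locked in) can be written down as a selection rule. This sketch is not the actual BOINC scheduler logic: `choose_app_version` and the `(completed, apr)` tuple layout are hypothetical, and the table's "Number of tasks completed" is used here as a stand-in for the eligible-task count.

```python
MIN_ELIGIBLE = 10  # threshold mentioned in the thread


def choose_app_version(versions):
    """versions maps plan class -> (tasks_completed, average_processing_rate).

    While any version is still below the threshold, feed the one with
    the fewest completed tasks. Only once all have enough samples is
    the version with the highest APR selected.
    """
    unproven = {k: v for k, v in versions.items() if v[0] < MIN_ELIGIBLE}
    if unproven:
        return min(unproven, key=lambda k: unproven[k][0])
    return max(versions, key=lambda k: versions[k][1])


# Figures taken from the host table above (APR rounded).
host = {
    "opencl_ati_100": (83, 618.37),
    "ati_opencl_100": (2, 742.58),
    "cal_ati": (7, 56.79),
}
print(choose_app_version(host))
```

Under this rule the host above would next receive work for `ati_opencl_100`, the version with only 2 completed tasks, rather than more work for `opencl_ati_100`.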

juan BFB Volunteer tester Send message Joined: 5 May 13 Posts: 2 Credit: 390,921 RAC: 0	Message 46227 - Posted: 5 Jun 2013, 10:51:49 UTC - in response to Message 46171. Last modified: 5 Jun 2013, 10:52:07 UTC Perhaps VLARs should be disabled for GPUs again. There is a very negative attitude on the SETI main boards to VLARs on GPU, even on ATi GPUs, though NV GPUs are mentioned more often. It's worth distributing VLARs to a GPU only if the GPU is idle and the server can't offer other work: a kind of "backup work". Otherwise the GPU will sit idle or drift to another project. If it's possible to implement such logic, it's worth doing. If not, it may be worth disabling VLARs again. +1 A simple <VlartoGPU>0|1</VlartoGPU> switch on the client side could allow us to choose whether or not we want to crunch VLARs on the GPUs, and solve a lot of problems. ID: 46227 ·

TRuEQ & TuVaLu Volunteer tester Send message Joined: 28 Jan 11 Posts: 619 Credit: 2,580,051 RAC: 0	Message 46229 - Posted: 5 Jun 2013, 14:33:11 UTC - in response to Message 46227. Perhaps VLARs should be disabled for GPUs again. There is a very negative attitude on the SETI main boards to VLARs on GPU, even on ATi GPUs, though NV GPUs are mentioned more often. It's worth distributing VLARs to a GPU only if the GPU is idle and the server can't offer other work: a kind of "backup work". Otherwise the GPU will sit idle or drift to another project. If it's possible to implement such logic, it's worth doing. If not, it may be worth disabling VLARs again. +1 A simple <VlartoGPU>0|1</VlartoGPU> switch on the client side could allow us to choose whether or not we want to crunch VLARs on the GPUs, and solve a lot of problems. Or maybe make VLARs more attractive: .vlar = 3 times credit. ID: 46229 ·

juan BFB Volunteer tester Send message Joined: 5 May 13 Posts: 2 Credit: 390,921 RAC: 0	Message 46231 - Posted: 5 Jun 2013, 16:12:48 UTC - in response to Message 46229. Last modified: 5 Jun 2013, 16:14:14 UTC Perhaps VLARs should be disabled for GPUs again. There is a very negative attitude on the SETI main boards to VLARs on GPU, even on ATi GPUs, though NV GPUs are mentioned more often. It's worth distributing VLARs to a GPU only if the GPU is idle and the server can't offer other work: a kind of "backup work". Otherwise the GPU will sit idle or drift to another project. If it's possible to implement such logic, it's worth doing. If not, it may be worth disabling VLARs again. +1 A simple <VlartoGPU>0|1</VlartoGPU> switch on the client side could allow us to choose whether or not we want to crunch VLARs on the GPUs, and solve a lot of problems. Or maybe make VLARs more attractive: .vlar = 3 times credit. I like that, but it is not just about credits. The main problem is the video lag that VLAR crunching produces on some non-dedicated crunching hosts, even with changes in the configuration file, especially when more than one VLAR is crunching on a multi-GPU host; that makes the host simply unusable for other simple tasks. Maybe limiting the number of VLARs allowed to run simultaneously on the host could fix that problem on faster GPUs, but I'm not sure it would work well on the slower models. ID: 46231 ·
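
The suggestion to cap how many VLARs run at once on a multi-GPU host could look roughly like this. Everything here is hypothetical: `select_running` and `max_vlars` are illustrative names, not an existing BOINC client option, and the sketch assumes a VLAR can be recognised by ".vlar" in the task name.

```python
def select_running(queue, gpu_count, max_vlars=1):
    """Pick tasks to run, one per GPU, with at most `max_vlars` VLARs.

    Tasks are taken in queue order. A VLAR that would exceed the cap
    is skipped in favour of the next non-VLAR task, if there is one.
    """
    running, vlars = [], 0
    for name in queue:
        if len(running) == gpu_count:
            break
        is_vlar = ".vlar" in name  # assumed naming convention
        if is_vlar and vlars >= max_vlars:
            continue
        running.append(name)
        vlars += is_vlar
    return running


print(select_running(["a.vlar_0", "b.vlar_1", "c_0", "d_1"], gpu_count=2))
```

One consequence of a hard cap is that a GPU can be left idle when only VLARs remain in the queue, which is the trade-off juan BFB hints at for slower cards.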


SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.