DCF when the GPUs are different speeds
Author | Message |
---|---|
red-ray Send message Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0 |
I have been trying to get a stable DCF on http://setiathome.berkeley.edu/show_host_detail.php?hostid=6379672 and have managed to improve things quite a lot by adding flops values to app_info.xml, but I expect I have got it as good as it's going to get. The problem is that I have 4 GPUs of different speeds, and when one of the slow GPUs finishes a task I typically get:

16/03/2012 15:20:28 | SETI@home | [dcf] DCF: 0.373319->1.174715, raw_ratio 1.174715, adj_ratio 3.146681

To get things to work vaguely sensibly I have used flops values such that the CPUs and fast GPUs typically have a DCF of 0.4, which means I can get the current 400/50 WU limits and I don't get timeouts on the slow GPUs. The actual GPU configuration is:

16/03/2012 12:32:59 | | NVIDIA GPU 0: GeForce GTX 460 (driver version 28562, CUDA version 4010, compute capability 2.1, 1024MB, 684 GFLOPS peak)

Given that BOINC reports this, it must know the relative speed of the GPUs. To my thinking, BOINC should clearly be taking the relative speed of the GPUs into account when calculating the DCF for a given WU. Further, I suspect it could even work out the speed of each GPU relative to the CPU and thereby totally remove the need for flops entries in app_info.xml. Were a future release of BOINC to do this, maybe some of the Luddites running old versions of BOINC would finally update!
red-ray Send message Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0 |
Will this ever be fixed? Currently I get a lot of the following, which triggers a load of high-priority running.

19/03/2012 13:08:51 | SETI@home | [dcf] DCF: 0.691461->2.026834, raw_ratio 2.026834, adj_ratio 2.931233
red-ray Send message Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0 |
Far too often it jumps way too high and takes far too long to recover.

20/03/2012 15:24:03 | SETI@home | [dcf] DCF: 0.591590->4.139177, raw_ratio 4.139177, adj_ratio 6.996701
John McLeod VII Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
> Far too often it jumps way too high and takes far too long to recover.

The DCF is designed to prevent far too much work from being downloaded to a host. It assumes that the estimates for each application from a project will be off in a similar manner. The fix is to have DCF be per application for CPU scheduling. This will not, however, work for work fetch, as work fetch is per project.

BOINC WIKI
red-ray Send message Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0 |
> The DCF is designed to prevent far too much work from being downloaded to a host. It assumes that the estimates for each application from a project will be off in a similar manner. The fix is to have DCF be per application for CPU scheduling. This will not, however, work for work fetch as the work fetch is per project.

I wonder, do you understand what I am asking for? How can a DCF per application address GPUs of different speeds? As I said initially, there needs to be per-device DCF adjustment. Further, the current code that allows the DCF to jump from 0.591590->4.139177 is inappropriate and needs fixing. It allows the DCF to instantly jump so high that the (adj_ratio < 0.1) branch applies when it should not, and it then takes forever and a day for the DCF to return to what it should be.

```cpp
void PROJECT::update_duration_correction_factor(ACTIVE_TASK* atp) {
    RESULT* rp = atp->result;
    double raw_ratio = atp->elapsed_time/rp->estimated_duration_uncorrected();
    double adj_ratio = atp->elapsed_time/rp->estimated_duration();
    double old_dcf = duration_correction_factor;

    // it's OK to overestimate completion time,
    // but bad to underestimate it.
    // So make it easy for the factor to increase,
    // but decrease it with caution
    //
    if (adj_ratio > 1.1) {
        duration_correction_factor = raw_ratio;
    } else {
        // in particular, don't give much weight to results
        // that completed a lot earlier than expected
        //
        if (adj_ratio < 0.1) {
            duration_correction_factor = duration_correction_factor*0.99 + 0.01*raw_ratio;
        } else {
            duration_correction_factor = duration_correction_factor*0.9 + 0.1*raw_ratio;
        }
    }

    // limit to [.01 .. 100]
    //
    if (duration_correction_factor > 100) duration_correction_factor = 100;
    if (duration_correction_factor < 0.01) duration_correction_factor = 0.01;

    if (log_flags.dcf_debug) {
        msg_printf(this, MSG_INFO,
            "[dcf] DCF: %f->%f, raw_ratio %f, adj_ratio %f",
            old_dcf, duration_correction_factor, raw_ratio, adj_ratio
        );
    }
}
```
John McLeod VII Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
> The DCF is designed to prevent far too much work from being downloaded to a host. It assumes that the estimates for each application from a project will be off in a similar manner. The fix is to have DCF be per application for CPU scheduling. This will not, however, work for work fetch as the work fetch is per project.

Actually, a DCF per device is not necessarily a requirement. After all, it is the difference between the actual and expected times, but the servers do not specify the time; they specify the FLoating Point OPeration count. So the speed of the processor enters the equation when calculating the original estimated time to compute. Then the actual time is divided by the original estimate to get a duration correction factor for that task.

I believe that BOINC only maintains one speed for all GPUs and one speed for all CPUs. It is this number that needs to be replicated for each GPU type, rather than the DCF.

BOINC WIKI
red-ray Send message Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0 |
> I believe that BOINC only maintains one speed for all GPUs and one speed for all CPUs. It is this number that needs to be replicated for each GPU type rather than the DCF.

Yes, that is what I meant by "there needs to be per device DCF adjustment". In my initial post I also said "clearly BOINC should be taking relative speed of the GPUs into account when calculating the DCF". When will a BOINC that does this, or resolves my issue via some other regime, be released?

You have not commented on the issues I have with the current DCF jumping way too high. When will that code be fixed or expunged?
red-ray Send message Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0 |
> Actually a DCF per device is not necessarily a requirement. After all, it is the difference between the actual and expected times, but the servers do not specify the time, they specify the FLoating Point OPerations count. So the speed of the processor is entered into the equation when calculating the original estimated time to compute. Then the actual time is divided by the original time to get a duration correction factor for that task.

Actually, I have never asked for "a DCF per device". To me it has always been obvious this would not be a good solution to the issue I have.
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
> When will the BOINC that does this be released?

As far as I know, never, since CreditNew will take over the function of TDCF and then it all happens on the server. The server will maintain host_app_version.et, the statistics (mean and variance) of job runtimes (normalized by wu.fpops_est) per host and application version.

Source: Job runtime estimates.
red-ray Send message Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0 |
> When will the BOINC that does this be released?

Thank you for the link, which I have just read. I can't see an explicit reference to how GPUs with different speeds are catered for, though. Have I missed it?

Which time will be displayed in the Remaining column for GPU tasks that are not running when a system has GPUs of different speeds? Which version of BOINC has this?
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
I answered before your edit, on the notion of "which BOINC will do a per-device DCF adjustment". And that's that no BOINC will do that, nor per application. As far as I understand from David, DCF is going away and isn't in use in CreditNew, and therefore not in use on projects that use CreditNew. Seti is one of the projects that uses CreditNew.
red-ray Send message Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0 |
> I answered before your edit, on the notion of "which BOINC will do a per device DCF adjustment". And that's that no BOINC will do that. Also not for per application. As far as I understand from David, DCF is going away and isn't in use in CreditNew and therefore not in use on projects that use CreditNew. Seti is one of the projects that uses CreditNew.

Once I gathered DCF was going away, I made the edit to make the request general. The real issue now is: will the new regime address GPUs of different speeds in the same system? Thus far I can't find any information that says it will.

Will the new regime address CPUs with Hyper-Threading Technology, where the CPU speed depends on whether one or two threads are active on each core?
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
> The real issue now is will the new regime address GPUs of different speeds being in the same system? Thus far I can't find any information that says it will.

These are things that you shouldn't ask in the Seti forums, as they're a BOINC thing. So best ask them on the BOINC development email list. This list requires registration.
red-ray Send message Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0 |
> The real issue now is will the new regime address GPUs of different speeds being in the same system? Thus far I can't find any information that says it will.

It would be better to use a PM rather than posting on this thread, but as you have, I have to reply here. I do not wish to join the BOINC development email list, as I suspect I would get a large number of emails. Given this, what other option do I have but to post the issue here?

I feel there should be a "Developer" thread, requiring approval before you are allowed to post, on which these types of concerns could be raised. I feel approval is needed to keep the signal-to-noise ratio high.
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
> I do not wish to join the BOINC development email list as I suspect I would get a large number of emails. Given this what other option do I have but to post the issue here?

You could of course unsubscribe from that list after you've had your question(s) answered, but if you really do not feel like joining it, you can always try to email David personally. Be nice and eloquent, though, and make sure to explain your problem in detail.

> I feel there should be a "Developer" thread that requires approval before you are allowed to post on which these types of concerns could be raised. I feel approval is needed to keep the Signal to Noise ratio high.

But then you'd need such a thread on every one of the 50+ projects, and someone going through those projects on a daily basis to gather information. It's easier to have forums for that, which we do... but even then, the developers will only check in there when they're pointed to such-and-such a thread and what's in it (by me, mostly). They're too busy with all other things BOINC, non-BOINC and personal life to also read and answer forums three times a day. The email lists arrive directly in their inboxes, which is why I point those out first. That is also where the other volunteer developers (such as John McLeod) will read what you have to say or ask, and answer if they know about the subject.
dads Send message Joined: 14 Jan 12 Posts: 4 Credit: 158,397 RAC: 0 |
This is what gets my goat: they send me 25 603 Enhanced non-CUDA tasks, and when those are done I'm left with 83 CUDA tasks. My quad core does nothing until I get most of the units done and they send me more work. I need more work for my CPU x 4.
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.