SETI@home v7 6.98 for NVIDIA CUDA 2.3, 3.2, and 4.2 released.

Author	Message
Raistmer Volunteer tester Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 44047 - Posted: 15 Oct 2012, 22:35:42 UTC Last modified: 15 Oct 2012, 22:36:22 UTC Well... if one app is running 2 instances at once (on the same device!) and another is running only 1 instance, what do we get? Without correct renormalization, elapsed time will tell us absolutely nothing...
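
Raistmer's renormalization point can be made concrete. The sketch below is a minimal illustration, assuming that a device shared by n concurrent tasks is split evenly among them, so each task's elapsed time is divided by n before apps are compared. The function name `effective_elapsed` and the timings are hypothetical; this is not how the BOINC server actually accounts for concurrency.

```python
def effective_elapsed(elapsed_s: float, instances_per_device: int) -> float:
    """Device time attributable to one task, assuming (hypothetically) that
    the device is shared evenly among `instances_per_device` tasks that run
    concurrently for the whole elapsed time."""
    if instances_per_device < 1:
        raise ValueError("instances_per_device must be at least 1")
    return elapsed_s / instances_per_device


# Hypothetical figures: app A runs 2 tasks at once on one GPU, app B runs 1.
raw_a, raw_b = 1200.0, 800.0
print(raw_a > raw_b)  # True: raw elapsed time makes A look slower
print(effective_elapsed(raw_a, 2) < effective_elapsed(raw_b, 1))  # True: normalized, A is faster
```

Without the division, app A is ranked slower even though it finishes more work per unit of device time.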

Richard Haselgrove Volunteer tester Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0	Message 44048 - Posted: 15 Oct 2012, 23:03:32 UTC - in response to Message 44046. "I went through the server logic and apparently the server always uses elapsed time to determine which app is fastest unless that information is not available, in which case it uses app version performance averaged across all hosts! If anyone else wants to go to the BOINC developers list to ask why elapsed time rather than host performance is used in estimate_flops() in sched_version.cpp, feel free. I'm not feeling up to fighting that battle right now." Nor am I. But somebody needs to do it. "The Moving Finger writes; and, having writ, / Moves on: nor all thy Piety nor Wit, / Shall lure it back to cancel half a Line, / Nor all thy Tears wash out a Word of it." (Omar Khayyám) Somehow, I feel I may have posted that quote as commentary on David's coding style before. It's about 30 months since CreditNew and the associated runtime estimation and scheduler changes were first deployed here at SETI Beta as - indeed - a Beta test. But so far as I can tell, he's never been back to evaluate whether theory translates well into practice.

Josef W. Segur Volunteer tester Joined: 14 Oct 05 Posts: 1137 Credit: 1,848,733 RAC: 0	Message 44049 - Posted: 16 Oct 2012, 4:09:42 UTC - in response to Message 44046. "I went through the server logic and apparently the server always uses elapsed time to determine which app is fastest unless that information is not available, in which case it uses app version performance averaged across all hosts! If anyone else wants to go to the BOINC developers list to ask why elapsed time rather than host performance is used in estimate_flops() in sched_version.cpp, feel free. I'm not feeling up to fighting that battle right now." I fear you've been misled by David's naming. The et average is not a time, it's a rate: seconds per fpop. That is, the inputs to the average are elapsed time divided by the rsc_fpops_est produced by the splitter. That et average is the best basis for choosing among app versions to do a specific task. It is, of course, inverted and scaled by 1e-9 to form the APR displayed to users. Note to Sten-Arne: I agree the ~5% difference between the CUDA23 and CUDA32 APRs on your host is probably significant, but only because the work delivery here has been a continuous stream of tasks with nearly identical AR. With the kind of variability seen at the main project, IMO far more than a 5% difference would be needed to make a sensible judgement this early. Joe
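
Joe's description of the et average can be sketched as follows: each sample is elapsed seconds divided by rsc_fpops_est, the APR is 1e-9 divided by the average, and the preferred app version is the one with the lowest average. The function names and numbers are hypothetical, and the plain arithmetic mean is an assumption standing in for the running average the real server code maintains; this is not BOINC's estimate_flops().

```python
from statistics import mean


def et_sample(elapsed_s: float, rsc_fpops_est: float) -> float:
    """One input to the et average: seconds per estimated fpop (a rate, not a time)."""
    return elapsed_s / rsc_fpops_est


def apr_gflops(et_samples: list[float]) -> float:
    """Invert the average seconds-per-fpop and scale by 1e-9 to get GFLOPS.
    A plain arithmetic mean stands in for the server's own running average."""
    return 1e-9 / mean(et_samples)


def fastest_version(et_by_version: dict[str, list[float]]) -> str:
    """Choose the app version with the lowest average seconds per fpop."""
    return min(et_by_version, key=lambda v: mean(et_by_version[v]))


# Hypothetical task: estimated at 3.0e13 fpops, finished in 300 s.
print(apr_gflops([et_sample(300.0, 3.0e13)]))  # roughly 100 GFLOPS
print(fastest_version({"cuda22": [2.0e-11], "cuda23": [1.00e-11], "cuda32": [1.05e-11]}))  # cuda23
```

Because each sample is already divided by the task's estimate, long and short tasks can be averaged together, which is why the rate, rather than raw elapsed time, is the sensible comparison basis.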

Richard Haselgrove Volunteer tester Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0	Message 44050 - Posted: 16 Oct 2012, 9:24:14 UTC - in response to Message 44049. "... elapsed time divided by the rsc_fpops_est produced by the splitter." Ah. And if I'm not mistaken, that rsc_fpops_est doesn't yet take account of the extra time consumed by autocorrelations, which, especially for the GPU apps, isn't linear with AR. That will introduce a distortion if, by chance, one version happens to get a block of tasks from a tape with different AR characteristics. It'll all come out in the wash in the end, of course, but it'll slow down the settling process.
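
The distortion Richard describes can be illustrated with a toy model. The sketch below assumes, for illustration only, that every task carries the same fixed amount of work that rsc_fpops_est leaves out, and that two app versions run at exactly the same true speed. All figures and the function name `observed_et` are hypothetical.

```python
def observed_et(rsc_fpops_est: float, extra_fpops: float, true_flops: float) -> float:
    """Seconds per *estimated* fpop when a task really needs
    rsc_fpops_est + extra_fpops operations; extra_fpops models work the
    estimate leaves out (here, hypothetically, the autocorrelations)."""
    elapsed_s = (rsc_fpops_est + extra_fpops) / true_flops
    return elapsed_s / rsc_fpops_est


speed = 1.0e11         # hypothetical: both app versions run at the same true rate
extra = 5.0e12         # hypothetical unmodelled cost, the same for every task
midrange_est = 4.0e13  # hypothetical estimate for a mid-range AR task
vhar_est = 1.0e13      # hypothetical, smaller estimate for a VHAR task

et_a = observed_et(midrange_est, extra, speed)  # version A got mid-range work
et_b = observed_et(vhar_est, extra, speed)      # version B got a block of VHARs
print(et_b / et_a)  # about 1.33: B looks a third slower purely from its AR mix
```

Since the unmodelled work is a larger fraction of a short task's estimate, a version that happens to receive a run of VHAR tasks looks slower until the averages settle.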

Fred J. Verster Volunteer tester Joined: 3 May 10 Posts: 88 Credit: 1,594,385 RAC: 0	Message 44053 - Posted: 16 Oct 2012, 12:44:35 UTC - in response to Message 44050. This host is doing CUDA 3.2 MB tasks. Which driver should be used for CUDA 4.2, as I noticed on BM?

Richard Haselgrove Volunteer tester Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0	Message 44054 - Posted: 16 Oct 2012, 13:49:27 UTC - in response to Message 44053. "This host is doing CUDA 3.2 MB tasks. Which driver should be used for CUDA 4.2, as I noticed on BM?" Minimum WHQL driver version (for a desktop GTX 470) is 301.42. Later drivers are available, but you probably don't want to test those at the same time as testing the application.

Grumpy Swede Volunteer tester Joined: 10 Mar 12 Posts: 1700 Credit: 13,216,373 RAC: 0	Message 44055 - Posted: 16 Oct 2012, 22:38:01 UTC - in response to Message 44049. "'I went through the server logic and apparently the server always uses elapsed time to determine which app is fastest unless that information is not available, in which case it uses app version performance averaged across all hosts! If anyone else wants to go to the BOINC developers list to ask why elapsed time rather than host performance is used in estimate_flops() in sched_version.cpp, feel free. I'm not feeling up to fighting that battle right now.' Note to Sten-Arne: I agree the ~5% difference between the CUDA23 and CUDA32 APRs on your host is probably significant, but only because the work delivery here has been a continuous stream of tasks with nearly identical AR. With the kind of variability seen at the main project, IMO far more than a 5% difference would be needed to make a sensible judgement this early. Joe" Well, as long as the system's logic recognizes that I should not get the slowest app (Cuda22), I really don't mind whether it sends me Cuda23 or Cuda32, since those two are almost equally fast.

Grumpy Swede Volunteer tester Joined: 10 Mar 12 Posts: 1700 Credit: 13,216,373 RAC: 0	Message 44056 - Posted: 16 Oct 2012, 22:42:51 UTC Last modified: 16 Oct 2012, 22:44:53 UTC Wonderful!!! First request after the outage, and I got a bunch of 6.98 Seti@home v7 ati_opencl_sah tasks for my ATI HD4850. It was just 2 days ago that I asked for V7 OpenCL tasks for my HD4850.

TRuEQ & TuVaLu Volunteer tester Joined: 28 Jan 11 Posts: 619 Credit: 2,580,051 RAC: 0	Message 44057 - Posted: 16 Oct 2012, 22:46:49 UTC - in response to Message 44054. Last modified: 16 Oct 2012, 22:47:00 UTC "'This host is doing CUDA 3.2 MB tasks. Which driver should be used for CUDA 4.2, as I noticed on BM?' Minimum WHQL driver version (for a desktop GTX 470) is 301.42. Later drivers are available, but you probably don't want to test those at the same time as testing the application." I use 296.10 for my GTS 250. Is there a faster driver for the GTS 250???

Eric J Korpela Volunteer moderator Project administrator Project developer Project scientist Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0	Message 44059 - Posted: 16 Oct 2012, 23:22:24 UTC - in response to Message 44057. I started a thread for the new ATI version. Please report problems there.

Josef W. Segur Volunteer tester Joined: 14 Oct 05 Posts: 1137 Credit: 1,848,733 RAC: 0	Message 44061 - Posted: 16 Oct 2012, 23:24:00 UTC - in response to Message 44050. "'... elapsed time divided by the rsc_fpops_est produced by the splitter.' Ah. And if I'm not mistaken, that rsc_fpops_est doesn't yet take account of the extra time consumed by autocorrelations, which, especially for the GPU apps, isn't linear with AR. That will introduce a distortion if, by chance, one version happens to get a block of tasks from a tape with different AR characteristics." Yes, there are 519336 Autocorr searches in any task which runs to completion, so there is no dependence on AR at all. The rsc_fpops_est adjustment should be a constant; the only issue is figuring out how large. For CPU processing the run time of VHAR tasks is increased by about 20%; for GPU processing it may be 100% or more. Probably even more significant, the existing rsc_fpops_est values are based on smoothed curves from average CPU performance several years ago, and even then an individual host might deviate by a 2:1 factor from the estimate for any particular AR. The fit is even worse for GPU processing, of course. Joe
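
Joe's constant adjustment, and why sizing it is the hard part, can be sketched with the two figures from his post (about 20% extra run time for VHAR tasks on CPU, 100% or more on GPU). The sketch assumes the existing estimate is accurate apart from the autocorrelations; the function names and task estimates are hypothetical placeholders, not splitter values.

```python
def autocorr_constant(vhar_est: float, runtime_increase: float) -> float:
    """Constant fpops to add to every task so that a VHAR task's estimate
    grows by the observed fractional runtime increase (0.20 means 20%).
    Assumes the existing estimate is accurate apart from the autocorrelations."""
    return runtime_increase * vhar_est


def adjusted_est(rsc_fpops_est: float, autocorr_fpops: float) -> float:
    """Add the same constant to every task, independent of AR."""
    return rsc_fpops_est + autocorr_fpops


vhar_est = 1.0e13      # hypothetical placeholder, not a real splitter value
midrange_est = 4.0e13  # hypothetical placeholder
cpu_c = autocorr_constant(vhar_est, 0.20)  # ~20% on CPU, per the post
gpu_c = autocorr_constant(vhar_est, 1.00)  # 100% or more on GPU, per the post
print(gpu_c / cpu_c)                                     # CPU and GPU targets differ by a factor of 5
print(adjusted_est(midrange_est, cpu_c) / midrange_est)  # a mid-range task grows only ~5%
```

A single constant serves both CPU and GPU hosts, so whatever value is chosen is necessarily a compromise between the two targets.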

Raistmer Volunteer tester Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 44062 - Posted: 16 Oct 2012, 23:35:42 UTC - in response to Message 44056. "Wonderful!!! First request after the outage, and I get a bunch of 6.98 Seti@home v7 ati_opencl_sah for my ATI HD4850. It was just 2 days ago since I asked for V7 OpenCL tasks for my HD4850." LoL, we worked really hard to make you happy ;D :D

Eric J Korpela Volunteer moderator Project administrator Project developer Project scientist Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0	Message 44074 - Posted: 17 Oct 2012, 1:19:39 UTC - in response to Message 44061. Getting that adjustment to the flops estimate into the splitter is on my agenda for this week.

Josef W. Segur Volunteer tester Joined: 14 Oct 05 Posts: 1137 Credit: 1,848,733 RAC: 0	Message 44078 - Posted: 17 Oct 2012, 5:02:25 UTC - in response to Message 44074. "Getting that adjustment to the flops estimate into the splitter is on my agenda for this week." While doing that, I suggest also doubling the estimate for AR <= beam width. Both the CUDA x41z and OpenCL builds handle VLAR work much more gracefully than the original 6.08 through 6.10 CUDA builds, but there is still a speed impact from limited parallelism at low ARs. The change would be preparation for a later test in which some VLARs are split and sent to all app_versions. The doubling approximates what's needed as a compromise so runtime estimates won't be terrible for either CPU or GPU. One factor that leads me to this suggestion is that the original observation plan for the Kepler field at GBT was to observe selected targets for about half the available time, then go on to scanning across the field. My guess, from what I could read between the lines, is that the targeted observations took longer than planned. That suggests that more than half of that data would produce VLAR tasks, and I think those doing GPU crunching would be dismayed if they weren't able to participate in processing it. In any case, it seems like a useful kind of Beta testing to do. Joe
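
The proposed VLAR rule is simple enough to state as code. The sketch below is a hypothetical illustration of "double the estimate for AR <= beam width"; the function name is invented, and the beam width used in the usage lines is a placeholder, not the real telescope value.

```python
def vlar_adjusted_est(rsc_fpops_est: float, angle_range: float,
                      beam_width: float, factor: float = 2.0) -> float:
    """Scale the estimate by `factor` (doubling by default, as proposed)
    for tasks whose angle range is at or below the beam width."""
    if angle_range <= beam_width:
        return rsc_fpops_est * factor
    return rsc_fpops_est


beam = 0.05  # hypothetical beam width in degrees; not the real telescope value
print(vlar_adjusted_est(3.0e13, 0.01, beam))  # VLAR task: estimate doubled
print(vlar_adjusted_est(3.0e13, 0.40, beam))  # mid-range task: unchanged
```

Keeping the factor as a parameter reflects that the doubling is described as a CPU/GPU compromise rather than a measured value.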

Raistmer Volunteer tester Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 44080 - Posted: 17 Oct 2012, 6:24:48 UTC Last modified: 17 Oct 2012, 6:25:14 UTC The CUDA app still doesn't print found signals to stderr. That's much more important than adding or removing the consumed GPU memory lines in stderr...

Eric J Korpela Volunteer moderator Project administrator Project developer Project scientist Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0	Message 44126 - Posted: 17 Oct 2012, 23:54:49 UTC - in response to Message 44080. Last modified: 17 Oct 2012, 23:54:59 UTC One workunit, one app version, three different answers... http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4130183

Raistmer Volunteer tester Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 44142 - Posted: 18 Oct 2012, 13:46:54 UTC - in response to Message 44126. "One workunit, one app version, three different answers... http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4130183" The fourth will be from NV too. Maybe it's worth doing an offline rerun with the CPU as reference.

Eric J Korpela Volunteer moderator Project administrator Project developer Project scientist Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0	Message 44144 - Posted: 18 Oct 2012, 15:50:24 UTC - in response to Message 44142. Yeah, I've got it running standalone on my Linux desktop.

Alex Storey Volunteer tester Joined: 10 Feb 12 Posts: 107 Credit: 305,151 RAC: 0	Message 44250 - Posted: 25 Oct 2012, 8:43:09 UTC This task crashed and burned when I manually suspended it.

Claggy Volunteer tester Joined: 29 May 06 Posts: 1037 Credit: 8,440,339 RAC: 0	Message 44251 - Posted: 25 Oct 2012, 9:01:14 UTC - in response to Message 44250. "This task crashed and burned when I manually suspended it." Known issue; it's been fixed in the forthcoming Cuda22 x41zb app. Claggy

©2025 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.