SETI@home v7 6.98 for NVIDIA CUDA 2.3, 3.2, and 4.2 released.
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0

Well... if one app is running 2 instances at once (on the same device!) and another is running only 1 instance, what do we get? Elapsed time without correct renormalization will lead to absolutely nothing...
Joined: 3 Jan 07 · Posts: 1451 · Credit: 3,272,268 · RAC: 0

> I went through the server logic and apparently the server always uses elapsed time to determine which app is fastest unless that information is not available, in which case it uses app version performance averaged across all hosts!

Nor am I. But somebody needs to do it.

> The Moving Finger writes; and, having writ,

Somehow, I feel I may have posted that quote as commentary on David's coding style before.

It's about 30 months since CreditNew and the associated runtime estimation and scheduler changes were first deployed here at SETI Beta as, indeed, a Beta test. But so far as I can tell, he's never been back to evaluate whether theory translates well into practice.
Joined: 14 Oct 05 · Posts: 1137 · Credit: 1,848,733 · RAC: 0

> I went through the server logic and apparently the server always uses elapsed time to determine which app is fastest unless that information is not available, in which case it uses app version performance averaged across all hosts!

I fear you've been misled by David's naming. The et average is not a time, it's a rate: seconds per fpop. That is, the inputs to the average are elapsed time divided by the rsc_fpops_est produced by the splitter. That et average is the best basis for choosing among app versions to do a specific task. It is of course inverted and scaled by 1e-9 to form the APR displayed to users.

Note to Sten-Arne: I agree the ~5% difference between the CUDA23 and CUDA32 APRs on your host is probably significant, but only because the work delivery here has been a continuous stream of tasks with nearly identical AR. With the kind of variability seen at the main project, IMO far more than a 5% difference would be needed to make a sensible judgement this early.

Joe
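Joe's point, that the stored "et" average is a rate in seconds per fpop, inverted and scaled by 1e-9 to give the displayed APR, can be sketched roughly as follows (function names and the sample numbers are illustrative, not actual BOINC identifiers):

```python
def et_sample(elapsed_seconds: float, rsc_fpops_est: float) -> float:
    """One input to the running average: seconds per estimated fpop (a rate, not a time)."""
    return elapsed_seconds / rsc_fpops_est

def apr_from_et(et_avg: float) -> float:
    """Displayed APR: the inverse of the seconds-per-fpop rate, scaled by 1e-9 (i.e. GFLOPS)."""
    return 1.0 / (et_avg * 1e9)

# Illustrative example: a task estimated at 2e13 fpops completed in 1000 seconds
rate = et_sample(1000.0, 2e13)  # 5e-11 s/fpop
print(apr_from_et(rate))        # about 20 GFLOPS
```

This also shows why comparing app versions by this average is sound even when tasks differ in size: dividing by rsc_fpops_est normalizes the elapsed time before averaging.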
Joined: 3 Jan 07 · Posts: 1451 · Credit: 3,272,268 · RAC: 0

> ... elapsed time divided by the rsc_fpops_est produced by the splitter.

Ah. And if I'm not mistaken, that rsc_fpops_est doesn't yet take account of the extra time consumed by autocorrelations, which, especially for the GPU apps, isn't linear with AR. That will introduce a distortion if, by chance, one version happens to get a block of tasks from a tape with different AR characteristics. It'll all come out in the wash in the end, of course, but it'll slow down the settling process.
Joined: 3 Jan 07 · Posts: 1451 · Credit: 3,272,268 · RAC: 0

> This

The minimum WHQL driver version (for a desktop GTX 470) is 301.42. Later drivers are available, but you probably don't want to test those at the same time as testing the application.
Joined: 10 Mar 12 · Posts: 1700 · Credit: 13,216,373 · RAC: 0

> I went through the server logic and apparently the server always uses elapsed time to determine which app is fastest unless that information is not available, in which case it uses app version performance averaged across all hosts!

Well, as long as the logic of the system recognizes that I should not get the slowest app (Cuda22), I really don't mind whether it sends me Cuda23 or Cuda32, since those two are almost equally fast.
Joined: 10 Mar 12 · Posts: 1700 · Credit: 13,216,373 · RAC: 0

Wonderful!!! First request after the outage, and I get a bunch of 6.98 SETI@home v7 ati_opencl_sah tasks for my ATI HD4850. It was just 2 days ago that I asked for v7 OpenCL tasks for my HD4850.
Joined: 28 Jan 11 · Posts: 619 · Credit: 2,580,051 · RAC: 0

> This

I use 296.10 for my GTS 250. Is there a faster driver for the GTS 250?
Joined: 14 Oct 05 · Posts: 1137 · Credit: 1,848,733 · RAC: 0

> ... elapsed time divided by the rsc_fpops_est produced by the splitter.

Yes, there are 519336 Autocorr searches in any task which runs to completion, so no dependence on AR at all. The rsc_fpops_est adjustment should be a constant; the only issue is figuring out how large. For CPU processing the run time of VHAR tasks is increased by about 20%; for GPU processing it may be 100% or more.

Probably even more significant, the existing rsc_fpops_est values are based on smoothed curves from average CPU performance several years ago, and even then an individual host might deviate by a 2:1 factor from the estimate for any particular AR. The fit is even worse for GPU processing, of course.

Joe
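Since the autocorrelation search count is the same for every completed task, the fix described here amounts to adding a constant fpop term to the splitter's estimate. A minimal sketch, in which the per-search fpop cost is a made-up placeholder that would have to be measured, not a real figure:

```python
AUTOCORR_SEARCHES = 519336          # fixed per completed task, independent of AR
AUTOCORR_FPOPS_PER_SEARCH = 1.0e7   # placeholder value, NOT a measured figure

def adjusted_fpops_est(base_est: float) -> float:
    """Raise the splitter's estimate by a constant term for the AR-independent autocorrelation work."""
    return base_est + AUTOCORR_SEARCHES * AUTOCORR_FPOPS_PER_SEARCH
```

Because the added term is constant, it matters proportionally more for short (VHAR) tasks than for long ones, which matches the observation that VHAR run times grow by the largest percentage.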
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0

> Wonderful!!!

LoL, we worked really hard to make you happy ;D :D
Joined: 15 Mar 05 · Posts: 1547 · Credit: 27,183,456 · RAC: 0

Getting that adjustment to the flops estimate into the splitter is on my agenda for this week.
Joined: 14 Oct 05 · Posts: 1137 · Credit: 1,848,733 · RAC: 0

> Getting that adjustment to the flops estimate into the splitter is on my agenda for this week.

While doing that, I suggest also doubling the estimate for AR <= beam width. Both the CUDA x41z and OpenCL builds handle VLAR work much more gracefully than the original 6.08 through 6.10 CUDA builds, but there is still a speed impact from limited parallelism at low ARs. The change would be preparation for a later test of splitting some VLARs and sending them to all app_versions. The doubling approximates what's needed as a compromise, so runtime estimates won't be terrible for either CPU or GPU.

One factor which leads me to this suggestion is that the original observation plan for the Kepler field at GBT was to observe selected targets for about half the available time, then go on to scanning across the field. My guess, from what I could read between the lines, is that the targeted observations took longer than planned. That suggests that more than half of that data would produce VLAR tasks, and I think those doing GPU crunching would be dismayed if they weren't able to participate in processing it. In any case, it seems like a useful kind of Beta testing to do.

Joe
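The suggested VLAR tweak is simple: whenever a task's angle range is at or below beam width, double the splitter's flops estimate. A sketch under that assumption (the beam-width threshold here is illustrative only, not the project's actual value):

```python
BEAM_WIDTH_AR = 0.05  # illustrative threshold, not the real beam width

def vlar_adjusted_est(rsc_fpops_est: float, ar: float) -> float:
    """Double the estimate for VLAR tasks (AR at or below beam width), else leave it unchanged."""
    if ar <= BEAM_WIDTH_AR:
        return 2.0 * rsc_fpops_est
    return rsc_fpops_est
```

A single doubling factor is a deliberate compromise: the true slowdown at low AR differs between CPU and GPU apps, but one shared estimate must serve all app_versions of a workunit.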
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0

The CUDA app still has no found-signal printing into stderr. That's a much more important thing than adding or removing consumed-GPU-memory lines in stderr...
Joined: 15 Mar 05 · Posts: 1547 · Credit: 27,183,456 · RAC: 0

One workunit, one app version, three different answers... http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4130183
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0

> One workunit, one app version, three different answers...

The fourth will be from NV too. Maybe worth doing an offline rerun with a CPU result as reference.
Joined: 15 Mar 05 · Posts: 1547 · Credit: 27,183,456 · RAC: 0

Yeah, I've got it running on my Linux desktop in standalone.
Joined: 10 Feb 12 · Posts: 107 · Credit: 305,151 · RAC: 0

This task crashed and burned when I manually suspended it.
Joined: 29 May 06 · Posts: 1037 · Credit: 8,440,339 · RAC: 0

> This task crashed and burned when I manually suspended it.

Known issue; it's been fixed in the forthcoming Cuda22 x41zb app.

Claggy
©2025 University of California
SETI@home and AstroPulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.