SETI@home v7 6.98 for NVIDIA CUDA 2.3, 3.2, and 4.2 released.

Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 44047 - Posted: 15 Oct 2012, 22:35:42 UTC
Last modified: 15 Oct 2012, 22:36:22 UTC

Well... if one app is running 2 instances at once (on the same device!) and another is running only 1 instance, what do we get?
Elapsed time without correct renormalization will tell us absolutely nothing...
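
A rough sketch of what such a renormalization could look like, assuming the only correction needed is dividing elapsed time by the number of tasks sharing the device. The function name and the timings are hypothetical, chosen only to show the effect:

```python
def seconds_per_task(elapsed_s: float, concurrent_instances: int) -> float:
    """Effective device time spent per task when several tasks share one GPU."""
    if concurrent_instances < 1:
        raise ValueError("concurrent_instances must be >= 1")
    return elapsed_s / concurrent_instances

# Hypothetical timings: app A runs 2 tasks at once, app B runs 1 at a time.
raw_a, raw_b = 1000.0, 600.0
app_a = seconds_per_task(raw_a, concurrent_instances=2)
app_b = seconds_per_task(raw_b, concurrent_instances=1)

# Raw elapsed time says B is faster; per-task throughput says A is faster.
raw_winner = "A" if raw_a < raw_b else "B"
renormalized_winner = "A" if app_a < app_b else "B"
```

With these made-up numbers the raw comparison and the renormalized comparison pick different apps, which is the problem being pointed out.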
ID: 44047 · Report as offensive
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 44048 - Posted: 15 Oct 2012, 23:03:32 UTC - in response to Message 44046.  

I went through the server logic and apparently the server always uses elapsed time to determine which app is fastest unless that information is not available, in which case it uses app version performance averaged across all hosts!

If anyone else wants to go to the BOINC developers list to ask why elapsed time rather than host performance is used in estimate_flops() in sched_version.cpp, feel free. I'm not feeling up to fighting that battle right now.

Nor am I. But somebody needs to do it.

The Moving Finger writes; and, having writ,
 Moves on: nor all thy Piety nor Wit,
Shall lure it back to cancel half a Line,
 Nor all thy Tears wash out a Word of it.

Omar Khayyám

Somehow, I feel I may have posted that quote as commentary on David's coding style before. It's about 30 months since CreditNew and the associated runtime estimation and scheduler changes were first deployed here at SETI Beta as - indeed - a Beta test. But so far as I can tell, he's never been back to evaluate whether theory translates well into practice.
ID: 44048 · Report as offensive
Josef W. Segur
Volunteer tester
Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 44049 - Posted: 16 Oct 2012, 4:09:42 UTC - in response to Message 44046.  

I went through the server logic and apparently the server always uses elapsed time to determine which app is fastest unless that information is not available, in which case it uses app version performance averaged across all hosts!

If anyone else wants to go to the BOINC developers list to ask why elapsed time rather than host performance is used in estimate_flops() in sched_version.cpp, feel free. I'm not feeling up to fighting that battle right now.

I fear you've been misled by David's naming. The et average is not a time, it's a rate; seconds per fpop. That is, inputs to the average are elapsed time divided by the rsc_fpops_est produced by the splitter. That et average is the best basis for choosing among app versions to do a specific task. It is of course inverted and scaled by 1e-9 to form the APR displayed to users.
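
To make the naming point concrete, here is a minimal Python sketch of the quantities described above. It assumes a plain mean over samples (the real scheduler's averaging may well differ), and the function names and task numbers are invented for illustration; only rsc_fpops_est and the 1e-9 scaling come from the description above.

```python
from statistics import mean

def et_sample(elapsed_s: float, rsc_fpops_est: float) -> float:
    """One input to the 'et' average: seconds per estimated fpop (a rate, not a time)."""
    return elapsed_s / rsc_fpops_est

def apr_gflops(et_avg: float) -> float:
    """APR as shown to users: the et average inverted and scaled by 1e-9."""
    return 1e-9 / et_avg

# Hypothetical (elapsed seconds, rsc_fpops_est) pairs for two app versions.
results = {
    "cuda23": [(1000.0, 2e13), (2000.0, 4e13)],
    "cuda32": [(950.0, 2e13), (1900.0, 4e13)],
}

et_avg = {ver: mean([et_sample(t, f) for t, f in tasks]) for ver, tasks in results.items()}
apr = {ver: apr_gflops(et) for ver, et in et_avg.items()}

# The version with the lowest seconds-per-fpop (highest APR) is the preferred one.
best = min(et_avg, key=et_avg.get)
```

Because each sample is divided by the task's own estimate, a long task and a short task from the same version contribute the same rate, so the comparison does not depend on raw elapsed time. The made-up numbers give an APR gap of about 5%, similar to the difference discussed in the note below.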

Note to Sten-Arne: I agree the ~5% difference between the CUDA23 and CUDA32 APRs on your host is probably significant, but only because the work delivery here has been a continuous stream of tasks with nearly identical AR. With the kind of variability seen at the main project, IMO far more than a 5% difference would be needed to make a sensible judgement this early.
                                                                  Joe
ID: 44049 · Report as offensive
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 44050 - Posted: 16 Oct 2012, 9:24:14 UTC - in response to Message 44049.  

... elapsed time divided by the rsc_fpops_est produced by the splitter.

Ah. And if I'm not mistaken, that rsc_fpops_est doesn't yet take account of the extra time consumed by autocorrelations - which, especially for the GPU apps, isn't linear with AR. That will introduce a distortion if, by chance, one version happens to get a block of tasks from a tape with different AR characteristics.

It'll all come out in the wash in the end, of course, but it'll slow down the settling process.
ID: 44050 · Report as offensive
Fred J. Verster
Volunteer tester
Joined: 3 May 10
Posts: 88
Credit: 1,594,385
RAC: 0
Netherlands
Message 44053 - Posted: 16 Oct 2012, 12:44:35 UTC - in response to Message 44050.  

This host is doing CUDA 3.2 MB tasks.

Which driver should be used for CUDA 4.2 as I noticed on BM?

ID: 44053 · Report as offensive
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 44054 - Posted: 16 Oct 2012, 13:49:27 UTC - in response to Message 44053.  

This host is doing CUDA 3.2 MB tasks.

Which driver should be used for CUDA 4.2 as I noticed on BM?

Minimum WHQL driver version (for a desktop GTX 470) is 301.42

Later drivers are available, but you probably don't want to test those at the same time as testing the application.
ID: 44054 · Report as offensive
Grumpy Swede
Volunteer tester
Joined: 10 Mar 12
Posts: 1700
Credit: 13,216,373
RAC: 0
Sweden
Message 44055 - Posted: 16 Oct 2012, 22:38:01 UTC - in response to Message 44049.  

I went through the server logic and apparently the server always uses elapsed time to determine which app is fastest unless that information is not available, in which case it uses app version performance averaged across all hosts!

If anyone else wants to go to the BOINC developers list to ask why elapsed time rather than host performance is used in estimate_flops() in sched_version.cpp, feel free. I'm not feeling up to fighting that battle right now.


Note to Sten-Arne: I agree the ~5% difference between the CUDA23 and CUDA32 APRs on your host is probably significant, but only because the work delivery here has been a continuous stream of tasks with nearly identical AR. With the kind of variability seen at the main project, IMO far more than a 5% difference would be needed to make a sensible judgement this early.
                                                                  Joe


Well, as long as the logic of the system recognizes that I should not get the slowest app (Cuda22), I really don't mind if it sends me Cuda23 or Cuda32, since those two are almost equally fast.
ID: 44055 · Report as offensive
Grumpy Swede
Volunteer tester
Joined: 10 Mar 12
Posts: 1700
Credit: 13,216,373
RAC: 0
Sweden
Message 44056 - Posted: 16 Oct 2012, 22:42:51 UTC
Last modified: 16 Oct 2012, 22:44:53 UTC

Wonderful!!!

First request after the outage, and I get a bunch of 6.98 Seti@home v7 ati_opencl_sah for my ATI HD4850. It was just 2 days ago that I asked for V7 OpenCL tasks for my HD4850.
ID: 44056 · Report as offensive
TRuEQ & TuVaLu
Volunteer tester
Joined: 28 Jan 11
Posts: 619
Credit: 2,580,051
RAC: 0
Sweden
Message 44057 - Posted: 16 Oct 2012, 22:46:49 UTC - in response to Message 44054.  
Last modified: 16 Oct 2012, 22:47:00 UTC

This host is doing CUDA 3.2 MB tasks.

Which driver should be used for CUDA 4.2 as I noticed on BM?

Minimum WHQL driver version (for a desktop GTX 470) is 301.42

Later drivers are available, but you probably don't want to test those at the same time as testing the application.


I use 296.10 for my GTS 250.
Is there a faster driver for GTS 250???
ID: 44057 · Report as offensive
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 44059 - Posted: 16 Oct 2012, 23:22:24 UTC - in response to Message 44057.  

I started a thread for the new ATI version. Please report problems there.
ID: 44059 · Report as offensive
Josef W. Segur
Volunteer tester
Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 44061 - Posted: 16 Oct 2012, 23:24:00 UTC - in response to Message 44050.  

... elapsed time divided by the rsc_fpops_est produced by the splitter.

Ah. And if I'm not mistaken, that rsc_fpops_est doesn't yet take account of the extra time consumed by autocorrelations - which, especially for the GPU apps, isn't linear with AR. That will introduce a distortion if, by chance, one version happens to get a block of tasks from a tape with different AR characteristics.

Yes, there are 519336 Autocorr searches in any task which runs to completion, so no dependence on AR at all. The rsc_fpops_est adjustment should be a constant, the only issue is figuring out how large. For CPU processing the run time of VHAR tasks is increased by about 20%, for GPU processing it may be 100% or more.
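
A toy model of that distortion, sketched in Python under stated assumptions: the run time is taken to be a fixed rate times (AR-dependent work + a constant autocorrelation cost), and every number below is hypothetical. The constant is set equal to the VHAR base work to mimic the "100% or more" GPU case.

```python
RATE = 5e-11            # hypothetical true speed of one app version, seconds per fpop
AUTOCORR_FPOPS = 1e13   # hypothetical constant autocorrelation cost, same for every AR

def elapsed(base_fpops: float) -> float:
    """Modeled run time: AR-dependent work plus the constant autocorrelation work."""
    return RATE * (base_fpops + AUTOCORR_FPOPS)

def et(base_fpops: float, include_autocorr: bool) -> float:
    """Seconds per estimated fpop, with or without the constant in the estimate."""
    est = base_fpops + (AUTOCORR_FPOPS if include_autocorr else 0.0)
    return elapsed(base_fpops) / est

vhar_base, mid_base = 1e13, 4e13   # hypothetical estimates for a VHAR and a mid-AR task

# Same app, same hardware: without the constant, VHAR work makes it look slower.
old_ratio = et(vhar_base, include_autocorr=False) / et(mid_base, include_autocorr=False)
# With the constant added to the estimate, the apparent rate no longer depends on AR.
new_ratio = et(vhar_base, include_autocorr=True) / et(mid_base, include_autocorr=True)
```

In this model an app version that happens to draw a block of VHAR tasks looks 1.6 times slower than the identical version on mid-AR tasks until the constant is included in the estimate.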

Probably even more significant, the existing rsc_fpops_est values are based on smoothed curves from average CPU performance several years ago, and even then an individual host might deviate by a 2:1 factor from the estimate for any particular AR. The fit is even worse for GPU processing, of course.
                                                                   Joe
ID: 44061 · Report as offensive
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 44062 - Posted: 16 Oct 2012, 23:35:42 UTC - in response to Message 44056.  

Wonderful!!!

First request after the outage, and I get a bunch of 6.98 Seti@home v7 ati_opencl_sah for my ATI HD4850. It was just 2 days ago that I asked for V7 OpenCL tasks for my HD4850.


LoL, we worked really hard to make you happy ;D :D
ID: 44062 · Report as offensive
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 44074 - Posted: 17 Oct 2012, 1:19:39 UTC - in response to Message 44061.  

Getting that adjustment to the flops estimate into the splitter is on my agenda for this week.
ID: 44074 · Report as offensive
Josef W. Segur
Volunteer tester
Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 44078 - Posted: 17 Oct 2012, 5:02:25 UTC - in response to Message 44074.  

Getting that adjustment to the flops estimate into the splitter is on my agenda for this week.

While doing that, I suggest also doubling the estimate for AR <= beam width. Both the CUDA x41z and OpenCL builds handle VLAR work much more gracefully than the original 6.08 thru 6.10 CUDA builds, but there is still a speed impact from limited parallelism at low ARs. The change would be preparation for a later test by splitting some VLARs and sending them to all app_versions. The doubling approximates what's needed as a compromise so runtime estimates won't be terrible for either CPU or GPU.
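
As a sketch only, the suggested doubling could be expressed like this in Python. The function name, the parameter names and the beam-width value are placeholders, not the splitter's actual code or the real beam width:

```python
def adjusted_fpops_est(base_est: float, ar: float, beam_width: float,
                       vlar_factor: float = 2.0) -> float:
    """Scale the flops estimate for VLAR tasks, i.e. those with AR <= beam width."""
    if ar <= beam_width:
        return base_est * vlar_factor
    return base_est

BEAM_WIDTH = 0.05  # placeholder value for illustration only

vlar_est = adjusted_fpops_est(1e13, ar=0.01, beam_width=BEAM_WIDTH)
normal_est = adjusted_fpops_est(1e13, ar=0.4, beam_width=BEAM_WIDTH)
```

A single factor for all app versions is the compromise described above: it will be too high for some hosts and too low for others, but it keeps the estimate from being badly wrong for either CPU or GPU.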

One factor which leads me to this suggestion is that the original observation plan for the Kepler field at GBT was to observe selected targets for about half the available time, then go on to scanning across the field. My guess from what I could read between the lines is that the targeted observations took longer than planned. That suggests that more than half that data would produce VLAR tasks, and I think that those doing GPU crunching would be dismayed if they weren't able to participate in processing it.

In any case, it seems like a useful kind of Beta testing to do.
                                                                  Joe
ID: 44078 · Report as offensive
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 44080 - Posted: 17 Oct 2012, 6:24:48 UTC
Last modified: 17 Oct 2012, 6:25:14 UTC

The CUDA app still doesn't print found signals to stderr.
That's much more important than adding or removing the consumed GPU memory lines in stderr...
ID: 44080 · Report as offensive
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 44126 - Posted: 17 Oct 2012, 23:54:49 UTC - in response to Message 44080.  
Last modified: 17 Oct 2012, 23:54:59 UTC

One workunit, one app version, three different answers...

http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4130183
ID: 44126 · Report as offensive
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 44142 - Posted: 18 Oct 2012, 13:46:54 UTC - in response to Message 44126.  

One workunit, one app version, three different answers...

http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4130183


The fourth will be from NV too. It may be worth doing an offline rerun with the CPU app as a reference.
ID: 44142 · Report as offensive
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 44144 - Posted: 18 Oct 2012, 15:50:24 UTC - in response to Message 44142.  

Yeah, I've got it running standalone on my Linux desktop.

ID: 44144 · Report as offensive
Alex Storey
Volunteer tester
Joined: 10 Feb 12
Posts: 107
Credit: 305,151
RAC: 0
Greece
Message 44250 - Posted: 25 Oct 2012, 8:43:09 UTC

This task crashed and burned when I manually suspended it.
ID: 44250 · Report as offensive
Claggy
Volunteer tester
Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 0
United Kingdom
Message 44251 - Posted: 25 Oct 2012, 9:01:14 UTC - in response to Message 44250.  

This task crashed and burned when I manually suspended it.

Known issue; it's been fixed in the forthcoming Cuda22 x41zb app.

Claggy
ID: 44251 · Report as offensive


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.