Message boards :
News :
Tests of new scheduler features.
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0
... that's one of the first to complete under '4 VLARs at once'. 5,286 seconds - almost an hour and a half - works out at an equivalent APR just below 35, compared to ~150 - 160 for the general 'non-VLAR' mix of work.

Yes, exactly - 2 cards (identical - NB factory overclock), 2 tasks per card, four tasks in total, each task takes ~90 minutes. So she would spit out one task every 22 minutes, give or take. CPU is an overclocked i7-3770K, hyperthreaded - running six threads of BOINC (non-SETI) tasks, with the balance of CPU power available to support the GPUs.

The tasks had a staggered start as the supply of non-VLAR tasks ran out. I remember the original CUDA apps ran with horrible lag at the beginning of each task, and again for a short segment towards the end (was it around the 75% mark? I've forgotten the details), but better at other stages. So another test would be to send off all four with a synchronised start, but I'm not sure I want to go there...
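The throughput figure is just the per-task runtime divided by the number of tasks in flight; a quick sanity check with the numbers from the post:

```python
# Quick check of the throughput figures quoted above (numbers from the post).
cards = 2
tasks_per_card = 2
concurrent = cards * tasks_per_card   # 4 VLAR tasks in flight at once
runtime_min = 5286 / 60               # ~88 minutes per VLAR task

completion_interval = runtime_min / concurrent
print(f"one task every {completion_interval:.0f} minutes")  # ~22 minutes
```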
Joined: 11 Dec 08 Posts: 198 Credit: 658,573 RAC: 0
> Yes, exactly - 2 cards (identical - NB factory overclock), 2 tasks per card, four tasks in total, each task takes ~90 minutes. So she would spit out one task every 22 minutes, give or take. CPU is an overclocked i7-3770K, hyperthreaded - running six threads of BOINC (non-SETI) tasks, with the balance of CPU power available to support the GPUs.

Yep, alright, matching up with my vague recollections. Yeah, taking a while to get a handle on the characteristics here, and the old Cuda apps were too long ago :). I've been running at defaults, but will jack up the settings now in the hope of a clear Cuda50 performance picture (still at single instance). Running modded Boinc here, I shouldn't run into problems from excessive runtime aborts. I hope the APR swing stays around, or preferably under, a factor of 5. If that looks OK I'll upgrade Boinc for a multiple-instance test run.
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
These lags corresponded to PulseFind processing at max length with an FFT size of 8. That is, max work per thread, with the GPU heavily underloaded (because there are too few independent threads for the GPU to process). Indeed, such a PulseFind config will occur at the very beginning of a task, and perhaps somewhere near the end.
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0
> These lags corresponded to PulseFind processing at max length with an FFT size of 8. That is, max work per thread, with the GPU heavily underloaded (because there are too few independent threads for the GPU to process). Indeed, such a PulseFind config will occur at the very beginning of a task, and perhaps somewhere near the end.

Would the relative positioning of those pulsefind kernels have been changed with the addition of Autocorrelations to the workload? Just asking, so I know when and where to look if I ever try that 'synchronised start' test.

Even with powerful GPUs, I didn't see much sign of underutilisation:

[screenshot: GPU usage traces] (direct link)

The GPU usage traces for the two cards - lines 5 and 6 - show relatively high, though variable, usage throughout the display - left-to-right is about 25 minutes, or over 25% of a WU's run time. Even with one task just recently started, and the other just over 80% done, I see GPU usage mostly in the high 80%s, only occasionally spiking down into the teen%s - and they really are narrow spikes. But very different from the steady 98% - 99% loading of the single GPUGrid task currently on the other GPU.
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
No change in relative positioning. The longest PulseFind occurs before the first autocorr.
Joined: 11 Dec 08 Posts: 198 Credit: 658,573 RAC: 0
> These lags corresponded to PulseFind processing at max length with an FFT size of 8. That is, max work per thread, with the GPU heavily underloaded (because there are too few independent threads for the GPU to process). Indeed, such a PulseFind config will occur at the very beginning of a task, and perhaps somewhere near the end.

For detail: while my autocorrelations are still using the VRAM- & bus-bound 4NFFT approach - a crude baseline implementation - loading with a single task won't fill larger units. It can 'look like' it does because of the way utilisation is measured (on the first SMX, by duration % in the sample period). This will [tend to] look like flat troughs where there are ACs in progress, and peaks elsewhere. For the remainder of processing, they fill the GPU fairly well, though there are hidden latencies involved. Optionally, jacking up the priority & pulsefind settings from defaults smooths these.

[Edit:] With aggressive settings like mine, you should begin to see hints of long pulsefind-related lag at the familiar points: [ & narrow dips disappear... ] [mbcuda]
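The measurement caveat Jason describes - utilisation reported as a duty-cycle percentage over a sample window - can be sketched as follows (hypothetical numbers, purely to illustrate why a busy-but-underfilled GPU can still read as highly utilised):

```python
# Duty-cycle utilisation: the fraction of the sample window during which
# any kernel was resident on the measured unit, regardless of how much of
# the GPU that kernel actually filled. Numbers here are hypothetical.
sample_window_ms = 1000.0
kernel_busy_ms = 870.0   # time some kernel (even a tiny one) was running

utilisation_pct = 100 * kernel_busy_ms / sample_window_ms
print(f"{utilisation_pct:.0f}% reported utilisation")
# A GPU running one small kernel for most of the window still reports a
# high figure, which is why single-task loading can 'look like' a full GPU.
```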
Joined: 14 Oct 05 Posts: 1137 Credit: 1,848,733 RAC: 0
OK, now we have VLARs - I've got about 80 of them. Let the fun begin.

There's some related information from host 5619046 at the main project. That dual 560 Ti system did a lot of setiathome_enhanced VLARs on GPU with x41zc, Cuda 4.2. The user posted several times about it in the "Please rise the limits... just a little..." thread March 13, starting with message 1345975. I looked through its task list at the time and judged that for that configuration VLARs took about 6 times as long as midrange ARs with similar estimates.

The host only has AP v6 in progress now, probably a reaction to the 100 limit. Its last remaining VLAR on GPU was validated earlier today, but here's the timing comparison for that VLAR and 4 midrange AR tasks:

| Task name | Run time | CPU time |
|---|---|---|
| 28my12ab.11019.110604.3.11.149.vlar_0 | 5,689.26 | 53.66 |
| 03mr13ad.23566.11110.6.11.140_0 | 869.53 | 47.56 |
| 03mr13ad.23566.11110.6.11.122_0 | 866.73 | 44.10 |
| 02mr13ad.4401.7850.16.11.39_0 | 897.47 | 48.22 |
| 02mr13ad.4401.7850.16.11.33_1 | 899.64 | 50.01 |

Joe
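The "about 6 times as long" figure can be checked directly from the runtimes in the table:

```python
# Runtimes in seconds, taken from the task list above.
vlar_runtime = 5689.26
midrange_runtimes = [869.53, 866.73, 897.47, 899.64]

mean_midrange = sum(midrange_runtimes) / len(midrange_runtimes)
ratio = vlar_runtime / mean_midrange
print(f"VLAR/midrange runtime ratio: {ratio:.1f}x")  # ~6.4x
```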
Joined: 11 Dec 08 Posts: 198 Credit: 658,573 RAC: 0
Hmmm, yeah, the 560 Ti (compute capability 2.1), like the one in the other PC here, probably sits either square on, or below, the performance level where significant problems with APR etc. might occur. It's a tough call. The target market for this card was the newly created 'midrange enthusiast' - or 'performance-price', if you like - bracket. This suggests to me that pushing VLARs to these could go badly, because of the target-market expectations, and because the geometry is more or less maxed out for the memory subsystem. Chances are there could be initial backlash if only Kepler-class cards received VLARs, at the more palatable 4x elapsed time.

That's all prior to adjustably multithreading the long pulsefinds in x42, à la the parallelised V13 experiment, after serialisation of the result reductions. I would describe Fermi-class utilisation as solid. That leaves not much room to move before hybridisation & higher-level algorithmic changes, while Keplers do still have some 'legs' yet, before major developments.

Jason
Joined: 14 Oct 05 Posts: 1137 Credit: 1,848,733 RAC: 0
> Hmmm, yeah, the 560 Ti (compute capability 2.1), like the one in the other PC here, probably sits either square on, or below, the performance level where significant problems with APR etc. might occur. It's a tough call.

Autocorrelation processing would somewhat reduce the ratio on that dual 560 Ti host, but I agree the mismatch between estimates and actual performance is a problem area. What might be best overall is an additional preference so those crunching with GPUs could opt in or out of doing VLARs. I'm very much in favor of giving users control of their own systems.

Joe
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
> I agree the mismatch between estimates and actual performance is a problem area. What might be best overall is an additional preference so those crunching with GPUs could opt in or out of doing VLARs. I'm very much in favor of giving users control of their own systems.

A "must have" option. Or we would see mass VLAR abortions on main. Even with such an option we will see some of them, just because some people are unaware that there are other ways. The CPU AstroPulse experience confirms this...
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0
Got my first Kepler-Kepler validation - WU 5374993

The wingmate is interesting - host 62827 - dual Titan. He has a number of EXIT_TIME_LIMIT_EXCEEDED for VLAR, but also a validation with cuda32 in not much longer than his cuda42 tasks. He seems to be getting very few cuda50, I note.

Edit - perhaps I should add that the VLARs on the system I'm watching arrived with an estimate of 18 minutes 24 seconds - so the rsc_fpops_bound time limit would be over three hours. None of the tasks has yet exceeded an hour and a half, so I'm well inside the safety zone, and should remain so unless I have a GPU downclock.
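The safety-zone arithmetic works out as follows. The factor of 10 between the estimate and the rsc_fpops_bound limit is an assumption on my part - it matches the "18 minutes 24 seconds estimate, over three hours limit" figures in the post, but the actual multiplier is set by the project:

```python
# Sketch of the runtime-limit check, assuming rsc_fpops_bound = 10x the
# estimate (an assumption; the real multiplier is project-configured).
estimate_s = 18 * 60 + 24        # 1,104 s estimated runtime
bound_factor = 10                # assumed rsc_fpops_bound / rsc_fpops_est
limit_s = estimate_s * bound_factor

actual_s = 90 * 60               # VLARs are finishing in ~90 minutes
print(limit_s / 3600)            # ~3.1 hours before EXIT_TIME_LIMIT_EXCEEDED
print(actual_s < limit_s)        # comfortably inside the safety zone
```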
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
Two more tasks wasted for nothing... and because it's Brook AP, it cost much more than an aborted MB task...

And back to the BOINC forced-abortions theme. Is there any chance that, after a project update, I will get increased estimates for already-downloaded tasks? Or is the only way to make it work to abort all already-downloaded CAL tasks manually and hope that the newly downloaded ones will have bigger estimates?
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
And another possible issue: SETI@home v7 7.00 windows_intelx86 (cuda23)

It's a GSO9600 host with an AMD Trinity APU, so now it does both NV and Ati tasks. And it looks like the APR for NV tasks is screwed. So far, all pre-Fermi GPUs I tested favored cuda23, not cuda32, tasks. But with the addition of the Ati GPU (it was disabled initially), the execution times have perhaps changed. And it looks like the initial bias in APR will never be healed. The host continues to receive only cuda32 tasks, with a higher APR that leaves no chance for cuda23. In short, the random part of the calculations is too small to be usable, just as I feared.
Joined: 29 May 06 Posts: 1037 Credit: 8,440,339 RAC: 0
> And another possible issue:

That is, unless that host does a bunch of VHAR tasks with the Cuda32 app - that'll have the effect of driving the APR down, and hopefully bring the Cuda23 app into play. On my GTX460 I know the Cuda42 app is fastest, with the Cuda5 and Cuda32 apps being only slightly slower (from my bench testing). As of 15 minutes ago all my Nvidia WUs were Cuda42 WUs (the Cuda42 APR must have been top three days ago when I received that work); now, when I unset NNT, I get a mixture of normal and VHAR Cuda5 WUs. I foresee that the preferred app version will switch every time we have a shortie storm at the Main project (as long as enough WUs are done).

| App version | Tasks completed | Max tasks per day | Tasks today | Consecutive valid | APR | Avg turnaround |
|---|---|---|---|---|---|---|
| SETI@home v7 7.00 windows_intelx86 (cuda32) | 323 | 187 | 0 | 154 | 161.44 | 2.48 days |
| SETI@home v7 7.00 windows_intelx86 (cuda42) | 494 | 535 | 0 | 502 | 184.47 | 2.86 days |
| SETI@home v7 7.00 windows_intelx86 (cuda50) | 428 | 375 | 37 | 342 | 190.79 | 2.19 days |

Claggy
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0
Cache is beginning to run a little low - the 'in progress' number on the task list includes some non-VLAR I'm holding suspended - so I allowed some work fetch. With the cuda50 APR now down to 133 under the influence of the VLARs, cuda42 becomes the popular choice...
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0
OK, VLAR test completed. cuda50 APR driven down to 121 - I think it went transiently even lower than that. Done a couple of cuda42 VLARs - they seemed a trifle quicker than the cuda50 norm, but too small a sample for that to be significant. I'll finish off the holdbacks, then return to normal running.
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0
I expect that once cuda42 picks up some VLARs its APR will drop accordingly and cuda50 will again be your main choice. Because of the better workunit mix on the main project, I expect this to happen more smoothly there than it does in beta, where there are big runs of only VLAR.

As far as cal_ati goes, I'll put back the

I'm also planning to add a "reset app version statistics for this host" button to the app versions detail page, for people who believe that they are getting the wrong app versions consistently.
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0
> I expect that once cuda42 picks up some VLARs its APR will drop accordingly and cuda50 will again be your main choice. Because of the better workunit mix on the main project, I expect this to happen more smoothly there than it does in beta, where there are big runs of only VLAR.

cuda50 APR is still dropping as wingmates trickle in - now 111. Getting closer to the 96 of Cuda32.... I agree that a similar consecutive run of cuda42 VLARs could theoretically drive that APR even lower, but that's just a race to the bottom. I'll be more interested to watch how quickly the random probes bring cuda50 back up to the top: if they probe too often, people will complain that slower, inefficient (for their rig) apps are too prominent in the mix; if they probe too rarely, a distorted APR will remain trapped under an inversion for too long.

It sounds like this might provide an answer for the question being asked at Main: will we still need to use anonymous platform once the stock applications are the same as the third-party apps? Anonymous platform will be one way (the only way?) to 'lock in' a host to use one version consistently.
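The probe-frequency tradeoff can be pictured as an epsilon-greedy choice between app versions. This is only an illustration of the tradeoff, not BOINC's actual scheduler code, and `probe_rate` is an invented parameter:

```python
import random

def pick_version(apr, probe_rate=0.1):
    """apr: dict mapping app version name -> average processing rate."""
    if random.random() < probe_rate:
        # An occasional random probe lets a version whose APR was
        # dragged down by a run of VLARs recover its true ranking.
        return random.choice(list(apr))
    # Otherwise exploit the version with the best current APR.
    return max(apr, key=apr.get)

aprs = {"cuda32": 96.0, "cuda42": 130.0, "cuda50": 111.0}
print(pick_version(aprs, probe_rate=0.0))  # cuda42 - highest APR wins
```

Probe too often and slower versions crowd the mix; probe too rarely and a depressed APR stays trapped under an inversion - exactly the tension described above.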
Joined: 14 Feb 13 Posts: 606 Credit: 588,843 RAC: 0
> It sounds like this might provide an answer for the question being asked at Main: will we still need to use anonymous platform once the stock applications are the same as the third-party apps? Anonymous platform will be one way (the only way?) to 'lock in' a host to use one version consistently.

Another scenario where you need Anonymous Platform is when you want to restrict a type of application to a type of device (e.g. AP on CPU only).

A person who won't read has no advantage over one who can't read. (Mark Twain)
Joined: 16 Jun 05 Posts: 2531 Credit: 1,074,556 RAC: 0
> It sounds like this might provide an answer for the question being asked at Main: will we still need to use anonymous platform once the stock applications are the same as the third-party apps? Anonymous platform will be one way (the only way?) to 'lock in' a host to use one version consistently.

I totally agree on that.

With each crime and every kindness we birth our future.
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.