Stop using anonymous platform in SETI@home beta.
Author | Message |
---|---|
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 |
Not much happening with Beta recently. Nothing seems to be done about the ridiculously wrongly calculated est times and the errors that follow. Eric has been on vacation, but is back in Berkeley now (unless he's emailing from his hotel room, in which case: stop it, Eric! :) ) |
Joined: 10 Mar 12 Posts: 1700 Credit: 13,216,373 RAC: 0 |
Not much happening with Beta recently. Nothing seems to be done about the ridiculously wrongly calculated est times and the errors that follow. Well, if nothing is done about this in the coming 2 years, I will stop running Beta. :-) |
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0 |
Yep, I'm back in Berkeley today and swamped with meetings and catch-up and emails and PMs. Hardly feels like I left. Anyway, here's the progress since I left. This is the average peak flops count (pfc_avg) for each of the 4 GPU apps. Think of it as the raw claim of work being done by each app. They are slowly heading towards where they want to be, which is between 2 and 7, but it's taking a long time. I'm working on code to speed it up. [Graph: pfc_avg for each GPU app] Next we have the scaled PFC, which is the work claim corrected to make each result about equal no matter which version it's being done on. [Graph: scaled PFC for each GPU app] So far, so good. Still don't know why the work estimates are so far off. I still don't know why my attempts to manually set pfc_avg and pfc_scale for the apps didn't work. A lot of the results were caused by versions with too few results being compared to versions with too few results. The way I'm going to avoid this on the main project is to release the app versions one at a time. So the plan for next week is to pick the version that will be used on the smallest number of machines and release it to the main project. Once it has equalized with other versions we'll release the next one, and so on. |
Joined: 16 Jan 07 Posts: 155 Credit: 194,400 RAC: 0 |
OpenCL ATI it is then, since they will need a 7.x.x version of BOINC and a supported AMD card. |
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0 |
Yep. Then I'll have to decide on whether to go with the other ATI or the NVIDIA. I might choose the ATI just because the NVIDIAs can be working on SETI@home results while they are waiting. |
Joined: 14 Oct 05 Posts: 1137 Credit: 1,848,733 RAC: 0 |
Not much happening with Beta recently. Nothing seems to be done about the ridiculously wrongly calculated est times and the errors that follow. LoL, something's been done with BOINC 7.0.33 which makes things worse for those running anonymous platform without <flops> in app_info.xml. Not pertinent here in this thread, but an indicator of how complex the problem is. Meanwhile, Beta testing is always interesting... Joe |
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0 |
I'm wondering if I put on a new validator that uses a credit average weighted by proximity to the app average estimate and doesn't just grant the credit of the "normal" result when "normal" and "approx" credit results are compared. I've convinced myself that was a significant part of the problems we experienced. Haven't checked it in because it may not be stable over the long term. If it's not, I'm hoping to catch it before hilarity ensues. |
Joined: 24 Aug 09 Posts: 79 Credit: 26,117 RAC: 0 |
Hey, if you had a Mac version for GPUs you would have the smallest batch to test with. But since you don't, oh well. |
Joined: 24 May 12 Posts: 38 Credit: 436,379 RAC: 0 |
Eric, any news about all those WUs that errored out and are now just sitting there contemplating the Universe? I mean, is there any plan to grant some credit? Oh, and sort of unrelated: does this seem fair to you? http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=3981521 I thought I had it rough until I saw the poor wingman with 160,655.55 seconds on CPU and about the same for GPU and just 779 credits... I would also mention this: http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4012520 In the words of the great wise man Elmer Fudd: there's somethin' awfully screwy goin' on around here... Cheers! Per aspera, ad astra! |
Joined: 14 Oct 05 Posts: 1137 Credit: 1,848,733 RAC: 0 |
Eric, any news about all those WUs that errored out and are now just sitting there contemplating the Universe? I mean is there any plan to grant some credit? The wingmate's task was done with the CPU app, run time in line with the host's other CPU tasks considering it was 89.5% blanked. The 779 credits make sense since the extra ~2 hours to generate blanking data is a small fraction of the run time for CPU tasks. That extra time to generate the blanking data makes a large difference on GPU tasks, though. I would also mention this: The BOINC client will abort any task which hasn't started before deadline, and under some circumstances BOINC will get more work than a host can do within deadline. That host may very well be doing multiple projects and have a small resource share here. Joe |
Joined: 16 Oct 09 Posts: 58 Credit: 662,990 RAC: 0 |
It looks like the estimates are still off. I haven't gotten new tasks since I still have enough for several days. But on some I haven't corrected the flops yet, and they are still estimated at around 13 min. I thought that if the server side got fixed, the estimates would be fixed with the next scheduler call. But the opencl_ati class also didn't have its 10 results yet when it happened, whatever it was. Eric, I get normal estimates if I just add two zeros to the flops. Doesn't that mean that the estimate is only 1/100 of what it should be? I mention this in case it might be helpful when looking for the bug. EDIT: I did mix up the plan classes. Christoph |
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0 |
The credits will eventually be fixed. Probably shortly before we start concentrating on the SAHv7 GPU versions. I still don't understand where the factor of 100 in work estimates is coming from. It's got to be somewhere in the host_app_version table. Time to add more debugging code. |
Joined: 10 Mar 12 Posts: 1700 Credit: 13,216,373 RAC: 0 |
The credits will eventually be fixed. Probably shortly before we start concentrating on the SAHv7 GPU versions. Number of tasks completed is still not working as it should for AstroPulse v6 6.04 windows_intelx86 (ati_opencl_100). I've completed, and got credit for, way more than 4 low-blanked tasks with my Q8200/HD4850. The same issue as for 6.03, although Number of tasks completed for 6.03 never came up above 0. For 6.04 it stopped at 4. Application details for host 57178 It does work for my two Nvidia hosts though. |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
My HD6950-based host now has an estimate of 2h17min for the ati_opencl_100 plan class, so I can continue to participate w/o computational errors caused just by wrong estimates. |
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0 |
Ok, so I've tried a little experiment. Maybe after an app version hits the 100 "normal" result threshold (which they've all done) it's easier to try manual changes. So I've bumped pfc_avg for ati_opencl_100 by 100x and dropped pfc_scale by 100x. Let me know if you see a difference in the work estimates and resource limit problems for new results received tomorrow. If this works we can all pretend I know what the hell I'm doing with the BOINC server code. |
Joined: 10 Mar 12 Posts: 1700 Credit: 13,216,373 RAC: 0 |
Ok, so I've tried a little experiment. Maybe after an app version hits the 100 "normal" result threshold (which they've all done) it's easier to try manual changes. So I've bumped pfc_avg for ati_opencl_100 by 100x and dropped pfc_scale by 100x. Let me know if you see a difference in the work estimates and resource limit problems for new results received tomorrow. The new 6.04 result I got immediately jumped to 3 days, from just 48 minutes. Edit: (the 48 minutes would have been 38 seconds if my Task duration correction factor hadn't been at 100 already) So now it overestimates, which is much better, since my Task duration correction factor will now be able to drop as I finish 6.04's. Also, of course, my 6.01 is wildly overestimated, which will also be able to adjust in time. So far, this looks promising. Edit2: This applies to my AstroPulse v6 6.04 windows_intelx86 (ati_opencl_100) tasks. The Nvidia computers were on their way to adjusting themselves automatically, even though they too are way off in est times (6.04's with a heightened TDF were pretty OK time-wise, but the heightened TDF made 6.01's overestimate wildly). The Task duration correction factor for those never reached 100 and was slowly dropping, so they would have been able to fix themselves, even though it might have taken a year or two :-) Gee, I find it exceedingly difficult to explain what I mean in English, the older I get. Maybe this is the first sign of Alzheimers Light, or some other age-related disease... LOL |
Joined: 18 May 06 Posts: 280 Credit: 26,477,429 RAC: 0 |
Now that the GPU apps seem to be working for Windows and Linux, how about OS X now? I have GPUs from both ATI and Nvidia in my Macs, and am ready to test. Dublin, California Team: SETI.USA |
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0 |
Good plan. I'll try to reboot my OSX vm this weekend and see what I can do. |
Joined: 16 Oct 09 Posts: 58 Credit: 662,990 RAC: 0 |
Ok, so I will edit my remaining tasks this evening. With the remaining cache, new tasks are still a week away. Christoph |
Joined: 1 May 07 Posts: 556 Credit: 6,470,846 RAC: 0 |
Just downloaded a batch of ati_opencl_100 WUs with an est time of 1:46:00; past the first .900 mark and counting up. I'm just off to the Wansbeck for a heart check-up. I'll let you know how they proceed later this afternoon. ATI card HD 4600, host 50814. Michael |
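
The manual change described in the thread (pfc_avg for ati_opencl_100 raised by 100x, pfc_scale lowered by 100x) can be read as leaving the scaled claim untouched, if the scaled PFC is simply the raw average multiplied by the version's scale. That product rule is an assumption based on how scaled PFC is described in the thread, not something taken from the BOINC server code, and the starting values below are hypothetical. A minimal Python sketch of the arithmetic:

```python
import math


def scaled_pfc(pfc_avg: float, pfc_scale: float) -> float:
    """Scaled claim, assumed here to be the raw average times the version's scale."""
    return pfc_avg * pfc_scale


# Hypothetical starting values for one app version (not real project data).
before_avg, before_scale = 0.05, 40.0

# The manual change from the thread: pfc_avg up 100x, pfc_scale down 100x.
after_avg, after_scale = before_avg * 100, before_scale / 100

before = scaled_pfc(before_avg, before_scale)
after = scaled_pfc(after_avg, after_scale)

# The raw average moves by a factor of 100; the scaled claim does not move.
print(f"raw pfc_avg: {before_avg} -> {after_avg}")
print(f"scaled PFC:  {before} -> {after}")
```

Under that assumption, anything driven by the raw pfc_avg would shift by the factor of 100 while anything driven by the scaled value would stay put.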
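
Several posts refer to setting `<flops>` by hand for anonymous-platform apps. The sketch below shows one way to add or update that element in app_info.xml with Python's standard library. It assumes each `<app_version>` element carries a `<plan_class>` child and an optional `<flops>` child. The trimmed sample document, the app name, and the flops value are hypothetical placeholders, not recommended settings.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed app_info.xml; a real file also lists the app's files.
SAMPLE = """<app_info>
  <app_version>
    <app_name>astropulse_v6</app_name>
    <version_num>604</version_num>
    <plan_class>ati_opencl_100</plan_class>
  </app_version>
</app_info>"""


def set_flops(xml_text: str, plan_class: str, flops: float) -> str:
    """Set <flops> on every <app_version> whose <plan_class> matches."""
    root = ET.fromstring(xml_text)
    for version in root.iter("app_version"):
        if version.findtext("plan_class") != plan_class:
            continue
        node = version.find("flops")
        if node is None:
            # No <flops> yet: append one to this app_version.
            node = ET.SubElement(version, "flops")
        node.text = f"{flops:.6e}"
    return ET.tostring(root, encoding="unicode")


# Hypothetical value; pick one that matches the actual speed of the device.
updated = set_flops(SAMPLE, "ati_opencl_100", 2.5e10)
print(updated)
```

Matching on plan class keeps the change scoped to one app version, which fits the thread's point that each version's estimates are tracked separately.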
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.