Stop using anonymous platform in SETI@home beta.

Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 43530 - Posted: 2 Aug 2012, 19:29:30 UTC - in response to Message 43529.  

Not much happening with Beta recently. Nothing seems to be done about the ridiculously wrongly calculated est times and the errors that follow.

Are we just treading water, and burning electricity with our computers, for no reason whatsoever?

Eric has been on vacation, but is back in Berkeley now.

(unless he's emailing from his hotel room, in which case - stop it, Eric! :) )
ID: 43530
Grumpy Swede
Volunteer tester
Joined: 10 Mar 12
Posts: 1700
Credit: 13,216,373
RAC: 0
Sweden
Message 43531 - Posted: 2 Aug 2012, 19:34:34 UTC - in response to Message 43530.  

Not much happening with Beta recently. Nothing seems to be done about the ridiculously wrongly calculated est times and the errors that follow.

Are we just treading water, and burning electricity with our computers, for no reason whatsoever?

Eric has been on vacation, but is back in Berkeley now.

(unless he's emailing from his hotel room, in which case - stop it, Eric! :) )


Well, if nothing is done about this in the coming 2 years, I will stop running Beta. :-)
ID: 43531
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 43532 - Posted: 2 Aug 2012, 22:11:19 UTC - in response to Message 43531.  

Yep, I'm back in Berkeley today and swamped with meetings and catch-up and emails and PMs. Hardly feels like I left.

Anyway here's the progress since I left.

This is average peak flops count (pfc_avg) for each of the 4 GPU apps... Think of it as being the raw claim of work being done by each app. They are slowly heading towards where they want to be which is between 2 and 7. But it's taking a long time. I'm working on code to speed it up.



Next we have the scaled PFC which is the work claim corrected to make each result about equal no matter which version it's being done on.




So far, so good. Still don't know why the work estimates are so far off. I still don't know why my attempts to manually set pfc_avg and pfc_scale for the apps didn't work.

A lot of the results were caused by versions with too few results being compared to versions with too few results. The way I'm going to avoid this on the main project is to release the app versions one at a time.

So the plan for next week is to pick the version that will be used on the smallest number of machines and release it to the main project. Once it has equalized with other versions we'll release the next one, and so on.....
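
One way to picture the scaling described in this post is the sketch below. It assumes, purely for illustration, that each version's scale factor is the smallest pfc_avg among the versions divided by that version's own pfc_avg, so that the scaled claim comes out equal for every version. The function name, the version names, and the numbers are hypothetical and are not taken from the BOINC server code or the project database.

```python
def pfc_scales(pfc_avg_by_version):
    """Return one scale factor per version so that pfc_avg * scale is equal for all.

    Assumption for this sketch: the version with the smallest pfc_avg is the reference.
    """
    reference = min(pfc_avg_by_version.values())
    return {name: reference / avg for name, avg in pfc_avg_by_version.items()}


# Hypothetical raw claims (pfc_avg) for three made-up app versions.
pfc_avg = {"app_a": 2.0, "app_b": 40.0, "app_c": 25.0}

scales = pfc_scales(pfc_avg)
# Scaled claim per version; with the scales above these all match the reference.
scaled = {name: pfc_avg[name] * scales[name] for name in pfc_avg}
```

With too few results behind a version's pfc_avg, the scale computed this way would be noisy, which fits the plan of releasing one version at a time and waiting for it to equalize.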
ID: 43532
arkayn
Volunteer tester
Joined: 16 Jan 07
Posts: 155
Credit: 194,400
RAC: 0
United States
Message 43533 - Posted: 2 Aug 2012, 22:51:19 UTC - in response to Message 43532.  


So the plan for next week is to pick the version that will be used on the smallest number of machines and release it to the main project. Once it has equalized with other versions we'll release the next one, and so on.....


OpenCL ATI it is then, since they will need a 7.x.x version of BOINC and a supported AMD card.
ID: 43533
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 43534 - Posted: 3 Aug 2012, 1:24:14 UTC - in response to Message 43533.  

Yep. Then I'll have to decide on whether to go with the other ATI or the NVIDIA. I might choose the ATI just because the NVIDIAs can be working on SETI@home results while they are waiting.
ID: 43534
Josef W. Segur
Volunteer tester
Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 43535 - Posted: 3 Aug 2012, 4:14:11 UTC - in response to Message 43529.  

Not much happening with Beta recently. Nothing seems to be done about the ridiculously wrongly calculated est times and the errors that follow.

Are we just treading water, and burning electricity with our computers, for no reason whatsoever?


LoL, something's been done with BOINC 7.0.33 which makes things worse for those running anonymous platform without <flops> in app_info.xml. Not pertinent here in this thread, but an indicator of how complex the problem is.

Meanwhile, Beta testing is always interesting...
                                                                   Joe
ID: 43535
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 43536 - Posted: 3 Aug 2012, 5:10:17 UTC - in response to Message 43535.  

I'm wondering if it defaults to 1 somewhere in the scheduler....

I put on a new validator that uses a credit average weighted by proximity to the app average estimate and doesn't just grant the "normal" result's credit when "normal" and "approx" credit results are compared. I've convinced myself that was a significant part of the problems we experienced.

Haven't checked it in because it may not be stable over the long term. If it's not, I'm hoping to catch it before hilarity ensues.
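
The idea of weighting by proximity can be sketched as follows. This is not the validator that was deployed; the weighting formula, the function name, and the sample numbers are assumptions chosen only to show how a claim far from the app average ends up with little influence on the granted credit.

```python
def proximity_weighted_credit(claims, app_average):
    """Average the claimed credits, giving more weight to claims near app_average.

    Hypothetical weighting: weight = 1 / (1 + relative distance from app_average).
    """
    if app_average <= 0:
        raise ValueError("app_average must be positive")
    if not claims:
        raise ValueError("need at least one claim")
    weights = [1.0 / (1.0 + abs(c - app_average) / app_average) for c in claims]
    return sum(w * c for w, c in zip(weights, claims)) / sum(weights)


# Made-up example: one claim near the app average, one wildly inflated.
claims = [800.0, 80000.0]
granted = proximity_weighted_credit(claims, app_average=780.0)
plain_mean = sum(claims) / len(claims)
```

In this example the granted credit lands far below the plain mean of the two claims, because the inflated claim carries a very small weight.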
ID: 43536
B-Man
Volunteer tester
Joined: 24 Aug 09
Posts: 79
Credit: 26,117
RAC: 0
United States
Message 43537 - Posted: 3 Aug 2012, 5:21:47 UTC - in response to Message 43533.  


So the plan for next week is to pick the version that will be used on the smallest number of machines and release it to the main project. Once it has equalized with other versions we'll release the next one, and so on.....


OpenCL ATI it is then, since they will need a 7.x.x version of BOINC and a supported AMD card.

Hey, if you had a Mac version for GPUs you would have the smallest batch to test with. But since you don't, oh well.
ID: 43537
cAnDYmanS@H-Beta
Volunteer tester
Joined: 24 May 12
Posts: 38
Credit: 436,379
RAC: 0
Romania
Message 43538 - Posted: 3 Aug 2012, 6:29:37 UTC
Last modified: 3 Aug 2012, 6:31:24 UTC

Eric, any news about all those WUs that errored out and are now just sitting there contemplating the Universe? I mean is there any plan to grant some credit?

Oh, and sort of unrelated: does this seem fair to you?
http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=3981521
I thought I had it rough until I saw the poor wingman with 160,655.55 seconds on CPU and about the same for GPU and just 779 credits...
I would also mention this:
http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4012520

In the words of the great wise man Elmer Fudd: there's somethin' awfully screwy goin' on around here...

Cheers!
Per aspera, ad astra!

ID: 43538
Josef W. Segur
Volunteer tester
Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 43540 - Posted: 3 Aug 2012, 15:53:30 UTC - in response to Message 43538.  

Eric, any news about all those WUs that errored out and are now just sitting there contemplating the Universe? I mean is there any plan to grant some credit?

Oh, and sort of unrelated: does this seem fair to you?
http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=3981521
I thought I had it rough until I saw the poor wingman with 160,655.55 seconds on CPU and about the same for GPU and just 779 credits...

The wingmate's task was done with the CPU app, run time in line with the host's other CPU tasks considering it was 89.5% blanked. The 779 credits make sense since the extra ~2 hours to generate blanking data is a small fraction of the run time for CPU tasks. That extra time to generate the blanking data makes a large difference on GPU tasks, though.

I would also mention this:
http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=4012520

In the words of the great wise man Elmer Fudd: there's somethin' awfully screwy goin' on around here...

Cheers!

The BOINC client will abort any task which hasn't started before deadline, and under some circumstances BOINC will get more work than a host can do within deadline. That host may very well be doing multiple projects and have a small resource share here.
                                                                  Joe
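
The abort rule described above can be sketched roughly like this. The Task class and its fields are hypothetical stand-ins, not the BOINC client's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    deadline: float  # report deadline, seconds since the epoch
    started: bool    # True once the task has begun running


def tasks_to_abort(tasks, now):
    """Return the tasks that have not started and whose deadline has already passed."""
    return [t for t in tasks if not t.started and now > t.deadline]


queue = [
    Task("wu_a", deadline=1000.0, started=False),  # never started, deadline passed
    Task("wu_b", deadline=1000.0, started=True),   # already running, left alone
    Task("wu_c", deadline=5000.0, started=False),  # still has time
]
aborted = tasks_to_abort(queue, now=2000.0)
```

A host with a small resource share here and a large cache is the kind of case where many queued tasks could fall into the first category.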
ID: 43540
Christoph
Volunteer tester
Joined: 16 Oct 09
Posts: 58
Credit: 662,990
RAC: 0
Germany
Message 43541 - Posted: 3 Aug 2012, 16:16:25 UTC
Last modified: 3 Aug 2012, 16:40:32 UTC

It looks like the estimates are still off. I didn't get new tasks since I still have enough for several days. But on some I haven't corrected the flops yet and they are still estimated around 13min.

I thought that once the server side got fixed, the estimates would be corrected with the next scheduler call.

Also, the opencl_ati class didn't have its 10 results yet when it happened, whatever it was.

Eric, I have normal estimations if I just add two zeros to the flops. Doesn't that mean that the estimation is only 1/100 of what it should be?
I mention this in case it might be helpful when looking for the bug.

EDIT: I did mix up the plan classes.
Christoph
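
The factor of 100 can be illustrated with a small sketch. It assumes the client's run-time estimate is the workunit's estimated floating point operations divided by the app version's flops value, multiplied by a duration correction factor; the function name and the numbers are made up for illustration. Under that assumption, any factor-of-100 change in the flops value changes the estimate by the same factor in the opposite direction.

```python
def estimated_runtime_seconds(rsc_fpops_est, flops, duration_correction_factor=1.0):
    """Estimate run time as (estimated operations / speed) * correction factor.

    Assumed formula for this sketch; not copied from the BOINC client source.
    """
    if flops <= 0:
        raise ValueError("flops must be positive")
    return rsc_fpops_est / flops * duration_correction_factor


fpops_est = 3.6e15  # hypothetical operation count for one task

slow_value = estimated_runtime_seconds(fpops_est, flops=1e12)  # smaller flops value
fast_value = estimated_runtime_seconds(fpops_est, flops=1e14)  # flops value 100x larger
ratio = slow_value / fast_value
```

So if the flops figure the client receives is 100 times too large, every estimate would be 100 times too short, whatever the underlying cause on the server side.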
ID: 43541
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 43542 - Posted: 3 Aug 2012, 20:24:48 UTC - in response to Message 43538.  

The credits will eventually be fixed. Probably shortly before we start concentrating on the SAHv7 GPU versions.

I still don't understand where the factor of 100 in work estimates is coming from. It's got to be somewhere in the host_app_version table. Time to add more debugging code.
ID: 43542
Grumpy Swede
Volunteer tester
Joined: 10 Mar 12
Posts: 1700
Credit: 13,216,373
RAC: 0
Sweden
Message 43543 - Posted: 3 Aug 2012, 20:53:24 UTC - in response to Message 43542.  
Last modified: 3 Aug 2012, 20:54:22 UTC

The credits will eventually be fixed. Probably shortly before we start concentrating on the SAHv7 GPU versions.

I still don't understand where the factor of 100 in work estimates is coming from. It's got to be somewhere in the host_app_version table. Time to add more debugging code.


Number of tasks completed is still not working as it should for AstroPulse v6 6.04 windows_intelx86 (ati_opencl_100)

I've completed, and got credit for, way more than 4 low-blanked tasks with my Q8200/HD4850. It's the same issue as for 6.03, although Number of tasks completed for 6.03 never came up above 0. For 6.04 it stopped at 4.

Application details for host 57178

It does work for my two Nvidia hosts though.
ID: 43543
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 43544 - Posted: 3 Aug 2012, 21:54:02 UTC

My HD6950-based host now has an estimate of 2h17min for the ati_opencl_100 plan class, so I can continue to participate without computation errors caused just by wrong estimates.
ID: 43544
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 43546 - Posted: 4 Aug 2012, 2:58:59 UTC - in response to Message 43543.  

Ok, so I've tried a little experiment. Maybe after an app version hits the 100 "normal" result threshold (which they've all done) it's easier to try manual changes. So I've bumped pfc_avg for ati_opencl_100 by 100x and dropped pfc_scale by 100x. Let me know if you see a difference in the work estimates and resource limit problems for new results received tomorrow.

If this works we can all pretend I know what the hell I'm doing with the BOINC server code.
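
Why this experiment could change the estimates without changing the scaled work claim can be seen in a minimal sketch. It assumes the scaled claim is simply pfc_avg multiplied by pfc_scale; the starting values are hypothetical. The product stays the same after the change, while anything that reads pfc_avg on its own sees a value 100 times larger.

```python
import math

# Hypothetical starting values for one app version.
pfc_avg, pfc_scale = 0.05, 40.0
scaled_before = pfc_avg * pfc_scale

# The manual change from the post: raise pfc_avg 100x, lower pfc_scale 100x.
new_pfc_avg = pfc_avg * 100
new_pfc_scale = pfc_scale / 100
scaled_after = new_pfc_avg * new_pfc_scale

# The scaled claim is unchanged (up to floating point rounding).
unchanged = math.isclose(scaled_before, scaled_after)
```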


ID: 43546
Grumpy Swede
Volunteer tester
Joined: 10 Mar 12
Posts: 1700
Credit: 13,216,373
RAC: 0
Sweden
Message 43547 - Posted: 4 Aug 2012, 3:37:13 UTC - in response to Message 43546.  
Last modified: 4 Aug 2012, 4:12:30 UTC

Ok, so I've tried a little experiment. Maybe after an app version hits the 100 "normal" result threshold (which they've all done) it's easier to try manual changes. So I've bumped pfc_avg for ati_opencl_100 by 100x and dropped pfc_scale by 100x. Let me know if you see a difference in the work estimates and resource limit problems for new results received tomorrow.

If this works we can all pretend I know what the hell I'm doing with the BOINC server code.



The new 6.04 result I got immediately jumped to 3 days, from just 48 minutes. Edit: (the 48 minutes would have been 38 seconds if my Task duration correction factor hadn't been at 100 already)

So, now it overestimates, which is much better since my Task duration correction factor will now be able to drop as I finish 6.04's. Also of course my 6.01 is wildly overestimated, which also will be able to adjust in time.

So far, this looks promising.

Edit2: This applies to my AstroPulse v6 6.04 windows_intelx86 (ati_opencl_100) tasks. The Nvidia computers were on their way to adjust themselves automatically, even though they too are way off in est times (6.04's with a heightened TDF were pretty OK time wise, but the heightened TDF made 6.01's overestimate wildly). The Task duration correction factor for those, never reached 100, and were slowly dropping, so they would have been able to fix themselves, even though it might have taken a year or two :-)

Gee, I find it exceedingly difficult to explain what I mean in English, the older I get. Maybe this is the first sign of Alzheimers Light, or some other age related disease...

LOL
ID: 43547
zombie67 [MM]
Volunteer tester
Joined: 18 May 06
Posts: 280
Credit: 26,477,429
RAC: 0
United States
Message 43548 - Posted: 4 Aug 2012, 3:45:47 UTC

Now that the GPU apps seem to be working for win and Linux, how about osx now? I have GPUs for both ATI and nvidia in my macs, and am ready to test.
Dublin, California
Team: SETI.USA

ID: 43548
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 43549 - Posted: 4 Aug 2012, 4:28:51 UTC - in response to Message 43548.  

Good plan. I'll try to reboot my OSX vm this weekend and see what I can do.
ID: 43549
Christoph
Volunteer tester
Joined: 16 Oct 09
Posts: 58
Credit: 662,990
RAC: 0
Germany
Message 43550 - Posted: 4 Aug 2012, 5:02:54 UTC

Ok, so I will edit my remaining tasks this evening. New tasks are still one week away with the remaining cache.
Christoph
ID: 43550
Father Ambrose
Volunteer tester
Joined: 1 May 07
Posts: 556
Credit: 6,470,846
RAC: 0
United Kingdom
Message 43551 - Posted: 4 Aug 2012, 8:31:50 UTC

Just downloaded a batch of ati_opencl_100 WUs with an est time of 1:46:00; past the first .900 mark and counting up. I'm just off to the Wansbeck for a heart check-up.

I'll let you know how they proceed later this afternoon.

ATI card HD 4600 host 50814

Michael
ID: 43551
 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.