Instant finishing of all WUs?

Message boards : Number crunching : Instant finishing of all WUs?

Cavalary

Joined: 15 Jul 99
Posts: 104
Credit: 7,507,548
RAC: 38
Romania
Message 1547682 - Posted: 26 Jul 2014, 0:10:53 UTC - in response to Message 1543014.  

Yeah, after never checking results here in all these 15 years, now I do it a few times per day. The first WU since the BOINC downgrade, the one that reported finished right after a reboot, was obviously all sorts of messed up if you just looked at the output here (it took 4 users to finally get a matched result, as the 3rd person returned something completely different from both mine and the 2nd person's). After that, all went fine till today, when I see two inconclusive ones.
Well, I have to ask: why is the first one inconclusive when the result itself is exactly the same? Just because of the difference in the flopcounter?
The second remains to be seen, I guess; the other user found an additional spike my computer didn't. But the first seems weird.
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1547719 - Posted: 26 Jul 2014, 1:50:54 UTC - in response to Message 1547682.  

...
Well, I have to ask: why is the first one inconclusive when the result itself is exactly the same? Just because of the difference in the flopcounter?
...

It's not the flopcounter, and the ~4% difference there is more or less expected when comparing GPU to CPU results.

The count of reported signals being the same is far from enough to call the results exactly the same. The validator works with the uploaded result files and checks a lot of details in those signal reports. In addition to the 7 spikes, the uploaded result files contain a best_spike, best_autocorr, best_pulse, and best_gaussian, which the validator also checks. Some minor difference probably caused one of those 11 signals not to match closely enough, so the validator called for a third opinion.

With the signal counts matching, it is highly likely that all 3 results will be granted credit; it would take 6 mismatched signals out of the 11 for a result to be called invalid and receive no credit. But what goes into the science database is a result file in which every signal was matched by a wingmate's result.
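The thresholds described above can be sketched in a few lines. This is a hypothetical illustration of the decision rule only, not the actual sah_validate code; the function name and labels are made up for clarity:

```python
# Hypothetical sketch: 11 signals per comparison (7 spikes + 4 "best"
# signals), and a result is only invalidated when a majority mismatch.

def classify(matched_signals: int, total_signals: int = 11) -> str:
    """Classify one pairwise comparison by how many signals matched."""
    mismatched = total_signals - matched_signals
    if mismatched >= 6:
        return "invalid"        # a majority of the 11 signals disagree
    if mismatched == 0:
        return "valid"          # every signal matched within tolerance
    return "inconclusive"       # minor disagreement: ask a third host
```

So a single slightly-off signal out of 11 lands in the "inconclusive" bucket and triggers a third opinion, rather than costing anyone credit.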
                                                                  Joe
Cavalary

Joined: 15 Jul 99
Posts: 104
Credit: 7,507,548
RAC: 38
Romania
Message 1547991 - Posted: 26 Jul 2014, 14:44:59 UTC - in response to Message 1547719.  

Ah. Better late than never, learning how this actually works, that is :)
But why isn't all data shown there then?

And you were right about the credit: all 3 users got credit in both cases. Mine was picked as canonical only in the 2nd, where the difference was visible; in the 1st it was the other one.
But is it normal to have errors like this? I mean, shouldn't 2 computers that run the same procedures on the same dataset obtain the same results if they run properly?
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1548111 - Posted: 26 Jul 2014, 18:37:06 UTC - in response to Message 1547991.  

Ah. Better late than never, learning how this actually works, that is :)
But why isn't all data shown there then?

Putting all of the details of each signal into the stderr would make it much larger, but the Lunatics OpenCL and CPU applications do include some of the details in rounded form. That's often enough to support a good guess about what caused an inconclusive validation.

And you were right about the credit: all 3 users got credit in both cases. Mine was picked as canonical only in the 2nd, where the difference was visible; in the 1st it was the other one.
But is it normal to have errors like this? I mean, shouldn't 2 computers that run the same procedures on the same dataset obtain the same results if they run properly?

Different implementations of the same algorithm generally don't match exactly, and the project wants to make it possible for any suitable hardware to be used. The sah_validate signal checks allow a mismatch of up to 1% between values where a relative difference is the appropriate measure, and a conceptually similar tolerance where absolute differences are appropriate (1 second in time, for instance).
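As a rough sketch of the two tolerance styles, with the 1% and 1-second values taken from the description above (the function names and exact rules are otherwise my assumptions, not the real validator source):

```python
# Illustrative tolerance checks: a 1% relative tolerance for
# magnitude-like values, a 1-second absolute tolerance for times.

def rel_close(a: float, b: float, tol: float = 0.01) -> bool:
    """Relative comparison, e.g. for signal power: within 1%."""
    return abs(a - b) <= tol * max(abs(a), abs(b))

def abs_close(a: float, b: float, tol: float = 1.0) -> bool:
    """Absolute comparison, e.g. for signal time: within 1 second."""
    return abs(a - b) <= tol

# A 0.9% difference in power passes; a 2% difference does not.
```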

However, the applications must make decisions which don't have any tolerance. In choosing which of 7 reported spikes to also characterize as "best", for instance, it is quite possible that a WU has two spikes so close in power that a tiny calculation difference will make one host choose differently from another.
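A toy illustration of that effect, using made-up numbers rather than real spike data: two sets of spike powers that agree to well within the 1% tolerance can still yield different "best" picks.

```python
# Two hosts report essentially the same spike powers, differing only by
# floating point noise far below the validator's 1% tolerance...
powers_host_a = [12.0000010, 12.0000000, 7.5]
powers_host_b = [12.0000000, 12.0000002, 7.5]

# ...yet the argmax-style "best spike" choice flips between them.
best_a = powers_host_a.index(max(powers_host_a))  # spike 0
best_b = powers_host_b.index(max(powers_host_b))  # spike 1
```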

For binary computer math there's always a tradeoff between speed and accuracy. Most of the heavy computation is done using single precision floating point math, and compilers are usually set to use fast math optimizations which don't strictly adhere to IEEE 754 standards. Even so, the value matches can usually be kept to within 0.01% (100 times better than the validator requires). At that level, there are relatively few inconclusive validations. The optimizations increase project productivity in spite of the occasional need to get a third opinion before assimilating the results.
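The root cause is ordinary IEEE 754 behavior rather than anything project-specific: floating point addition is not associative, so reordering a sum (which is exactly what different implementations and fast-math optimizations do) changes the low-order bits of the result.

```python
# Double precision here; the effect is the same in kind, and larger in
# magnitude, in the single precision math the applications mostly use.
a, b, c = 1e8, -1e8, 0.1

left = (a + b) + c   # cancellation happens first, 0.1 survives intact
right = a + (b + c)  # 0.1 is partly rounded away when added to -1e8

# left equals 0.1, but right differs in the low-order bits.
```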

Note that the stock SETI@home v7 (version 7.00) application chooses among several implementations of various functions at runtime, so even stock CPU to stock CPU validations sometimes are initially inconclusive.
                                                                  Joe
Cavalary

Joined: 15 Jul 99
Posts: 104
Credit: 7,507,548
RAC: 38
Romania
Message 1548365 - Posted: 27 Jul 2014, 9:32:24 UTC - in response to Message 1548111.  

Ah, right, didn't consider the floating point issue, despite bumping into oddities with non-integer values even in my odd attempts at programming.

Though a difference between 4 and 5 isn't a floating point issue, I'd say; the rest seems to explain that as well.

Thanks again.



 
©2024 University of California
 
SETI@home and AstroPulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.