Message boards :
Number crunching :
Monitoring inconclusive GBT validations and harvesting data for testing
Jeff Buck · Joined: 11 Feb 00 · Posts: 1441 · Credit: 148,764,870 · RAC: 0
Grabbing for curiosity's sake (since Cuda doesn't have a horse in the race). The 5th host in my 2nd example is also stock CPU, so it should be interesting to see which one (if any) it agrees with. [May lose power again tonight, due to storms and government mismanagement of our utilities, so the comparison could take a bit]

So, you're one of those 1.6 million (or whatever number I think I saw). Time for a herd of hamsters and a large generating wheel. ;^)
Kiska · Joined: 31 Mar 12 · Posts: 302 · Credit: 3,067,762 · RAC: 0
I've decided to grab both data files for archival purposes.
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121
Wrong marking. It's not an SSE3 CPU app; it's OpenCL NV too.

SETI apps news
We're not gonna fight them. We're gonna transcend them.
jason_gee · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
Grabbing for curiosity's sake (since Cuda doesn't have a horse in the race).

First task: both Cuda baseline and alpha zipa2 agree, strongly similar at Q=99.12%, with 8.00 Win32 CPU here, so this particular task isn't running into the one rare, long-standing possible triplet/pulse ordering issue. Spike count: 19.

If the spikes aren't striped into the other searches in those OpenCL builds, the spike shortfalls suggest (perhaps) fp:precise might not have been set on the MS compiler host code, so as to match Eric's 8.00. Alternatively, if the ordering is still not involved for other reasons, and fp:precise was engaged, then it suggests cumulative error in the chirp, FFT, power spectra or summations may be involved. Lastly, if the spikes were striped into the other searches, completely changing the ordering, then either a validator change or proper rigorous reduction to serial order would be required, for project efficiency.

Should I check the other one?

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
-= Vyper =- · Joined: 5 Sep 99 · Posts: 1652 · Credit: 1,065,191,981 · RAC: 2,537
Nice find Jeff! This pretty much nails what I've been trying to say for a while now: the validator seems to detect anomalies, and it doesn't bother about out-of-order reporting when dealing with WUs with a high number of detections. We'll see how this pans out, but I believe it pretty much nails it that sorting may be required, or a revamp of the validator code (or at least being nothing less than 100% sure of how it handles information when comparing against another result sent in). I'm not a coder any longer, only a think-tank :)

_________________________________________________________________________
Addicted to SETI crunching! Founder of GPU Users Group
-= Vyper =- · Joined: 5 Sep 99 · Posts: 1652 · Credit: 1,065,191,981 · RAC: 2,537
Should I check the other one?

Yes! Please do! I'm very curious about the outcome of this experiment, and if this issue gets sorted out we will probably see a lot of false positives vanish. As in your testing, you got strong Q, and when I received a message from Petri with his "banks" of test WUs, all of them were in the Q99+ range when he ran his application, yet they still seem to fall into the inconclusive swamp anyway.

_________________________________________________________________________
Addicted to SETI crunching! Founder of GPU Users Group
jason_gee · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
Nice find Jeff!

Yeah, that describes it IMO. The question/tradeoff becomes whether we need to worry so much about performance for these (very infrequent) overflow-type results that optimising their throughput becomes more important than the cost of validator updates and reissue, demands on the project and participants respectively. I'm open to debate; however, my opinion is that CPU serial dictates order, by the rules typically adopted when implementing parallel algorithms, in that they must always be reducible to serial form and produce the same result.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
jason_gee · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
Should I check the other one?

OK, will grab that in a bit, in between battening down various hatches for the storm.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
-= Vyper =- · Joined: 5 Sep 99 · Posts: 1652 · Credit: 1,065,191,981 · RAC: 2,537
I'm open to debate, however my opinion is that CPU serial dictates order by rules typically adopted implementing parallel algorithms, in that they must always be reducible to serial form and produce the same result.

Exactly what I believe also: all finds in all apps need to be uploaded in the same order of sequence so they don't fall into the inconclusive ballpark (as it seems?!). It's easy when a WU gets compared to the stock application (CPU) and validated in the end, but think of it when a WU is sent, for instance, to Android, Apple Darwin, SoG and Cuda hosts (which application is more right or wrong than the other is hard to tell if this isn't addressed) and none of them passes, because the result sent back is always different in some way, even if it perhaps deserves Q99+ for real (or does it really, and everyone just believes that the code works?!). Has anyone looked at the results in an Excel spreadsheet and tried to sort and compare them there (or some other human-viewable application)? :) Lol

_________________________________________________________________________
Addicted to SETI crunching! Founder of GPU Users Group
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121
fp:precise leads to inconclusive results vs stock. Better to forget about fp:precise completely. It's a non-portable, CPU-only feature.

SETI apps news
We're not gonna fight them. We're gonna transcend them.
-= Vyper =- · Joined: 5 Sep 99 · Posts: 1652 · Credit: 1,065,191,981 · RAC: 2,537
fp:precise leads to inconclusive results vs stock.

Hmm:

http://stackoverflow.com/questions/12514516/difference-between-fpstrict-and-fpprecise

I think strict should be used in all code produced, if I read the above correctly: "bitwise compatibility between different compilers and platforms".

https://en.wikipedia.org/wiki/IEEE_754-1985

I posted this for others to see why!

_________________________________________________________________________
Addicted to SETI crunching! Founder of GPU Users Group
jason_gee · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
Interesting, since the Cuda codebase required it for the close matches. Removing that leaves the two other possibilities I listed (apart from host breakages). If not ordering/striping related, it could be tricky to locate the source of cumulative error.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
-= Vyper =- · Joined: 5 Sep 99 · Posts: 1652 · Credit: 1,065,191,981 · RAC: 2,537
could be tricky to locate the source of cumulative error.

Indeed! And this needs to be addressed "NOW", not later on, because the variety of different apps, platforms, compilers, Cuda, OpenCL, Vulkan bla bla yada yada is increasing, and thus this problem increases exponentially. I think what we're seeing now was a "non-issue" in the past, before 2010, when the majority of computers were CPU based (serial, uninteresting output), but now that more and more people add their PS3s, Androids, AMD GPUs, Nvidia GPUs and so on, this "inconclusive era" seems to have got out of reach in every app produced! Not to mention the real black sheep, the Apple issue! This is only my way of seeing it: perhaps really old code that worked perfectly in a CPU-only world needs to be changed, and I'm not talking about the analyzing part that you guys are tweaking the hell out of; perhaps the server-side validator code needs to be changed, code perhaps written in 2006 when we had none of the new devices that pop up regularly. If that code part is "stupid" and doesn't do the sorting and juggling required, then you coders need to patch the outgoing results from the analysis so the validator gets them, because it is serial-code-minded instead of parallel-code-minded.

_________________________________________________________________________
Addicted to SETI crunching! Founder of GPU Users Group
jason_gee · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
fp:precise leads to inconclusive results vs stock.

strict would certainly be an option if variation among platforms beyond validator tolerance demanded it, though I haven't encountered Cuda inconclusives at a sufficient rate on any of the 3 platforms to warrant it (all built with different compilers). There are other ways to ramp up the precision, though in discussion with Eric during v8 development he indicated he felt that level wasn't necessary. For our purposes I'd suggest fp:precise and fp:strict will be equivalent. The difficulty with deciding to enforce strict IEEE-754 semantics for all is that not all devices/platforms/compilers actually have IEEE-754 available. In essence, if it comes down to precise vs strict with single floats (which I doubt this is), then double precision should be used if it's really necessary.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
jason_gee · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
could be tricky to locate the source of cumulative error.

Yeah, I'm definitely agreeing with the principles, and they need discussion. I agree it's going to get a lot worse if not reined in, either. [Edit] Second WU started, same bench comparison arrangement.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
-= Vyper =- · Joined: 5 Sep 99 · Posts: 1652 · Credit: 1,065,191,981 · RAC: 2,537
I don't really see the need for it. They only need to use IEEE strict and single precision (single precision, 32 bits, ±1.18×10^−38 to ±3.4×10^38, approx. 7 decimal digits); that should set every application, and even CPUs, so we would get the Q=100 mark. And if it still doesn't, then the validator portion of the code needs to be addressed. Perhaps something for you all to pursue with Eric, going down this route, if it isn't so extremely much slower than other FP modes. It seems like if S@H goes down this IEEE route, then I believe you coders would get a lot less headache in the future when optimising the analyzing part, and could focus more on development instead of chasing rounding bugs that slipped through. Compare it to writing some code in C++ versus pure machine code: which is the easier code to maintain when bugs arise? :) Certainly not the good old Classic F8 E4 code, lol...

_________________________________________________________________________
Addicted to SETI crunching! Founder of GPU Users Group
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14672 · Credit: 200,643,578 · RAC: 874
Nice find Jeff!

It depends on where the 'out of order' is happening. These are overflow tasks, in what I call the 'late onset' category. Let's say, for the sake of argument, that the data contains 50 reportable signals, but (as we know) the app applies a hard cut-off after it's found 30 of them. If two apps find the same subset of 30 out of the available 50, then I'm pretty sure the validator will pass the result, even if the reporting order is different - I had a walk through the code a few days ago. But if the app - by doing a parallel search - finds a different subset of 30 from 50, then the results are different, and no amount of tweaking the validator is going to make any difference.

There's actually a big difference between 'strongly similar' (required for immediate full validation, and acceptance of the science) and 'weakly similar' (which earns you a pat on the back and some credit, but nothing else).

271 // return true if at least half the signals (and at least one)
272 // from each result are roughly equal to a signal from the other result

'Strong' requires that every single one of the 30 signals matches. 'Weak' (the comment above) only requires an overlap of 15.
-= Vyper =- · Joined: 5 Sep 99 · Posts: 1652 · Credit: 1,065,191,981 · RAC: 2,537
Yup, that is so totally true! That's why I'm nagging that the result sent back should be unified (presentation-wise), so the stock CPU app gets a sorting routine incorporated in the future, and every other application as well, so we never see this again. If a WU is overflowed, it is of course crap. But why should you perhaps not get credit for 5600 seconds of CPU time, just because the result gets ironed out by other "juggling order" applications, when you could do the code right from the beginning? Incorporate a result-sorting routine in the main s@h code and let the others (tweakers) follow its lead. The only thing we would all get in the future is less headache when dealing with forthcoming optimisations and variations, which will only increase, not decrease :-/

_________________________________________________________________________
Addicted to SETI crunching! Founder of GPU Users Group
jason_gee · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0
Nah, I think Eric's right, in that near-100% Qs are good enough. The problem with the idea of achieving 100% across architectures is that while MS say strict guarantees compliance, this doesn't guarantee that other compilers, or hardware device manufacturers, implemented their chips in the bit-identical way suggested, or for that matter that MS even interpreted the standard correctly in the first place. That (possible but small variation within spec) is on top of the fact that the key Fourier transforms themselves encounter data-dependent variation of order O(log N), which amounts to ~64 ulps over a dataset. There's another source of cumulative error than compiler settings hiding in those OpenCL and Mac CPU builds.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14672 · Credit: 200,643,578 · RAC: 874
But to solve this by doing a "find all and sort them afterwards" would mean that every task would have to run to full term, and we'd lose the efficiency of quitting early after 10 seconds or so for the really noisy WUs. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.