Monitoring inconclusive GBT validations and harvesting data for testing
Author | Message |
---|---|
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
There is no connection between fp:strict, or whatever the precision switch may be, and reporting a subset of results on overflow. This was mentioned in this thread before. Increasing precision is not a solution for overflow tasks. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Yes, host code is indeed different. So I'll rephrase regarding fp:precise: it's not a universal solution for precision-related issues. And of course it's no solution at all for the different-ordering issue on overflows (as already stated in this thread). |
-= Vyper =- Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537 |
"There is no connection between fp:strict or whatever precision switch can be and reporting subset of results on overflow." Aren't you all using fp:precise (single)? I just wanted to ask what happens if the app is compiled and tested with fp:strict (single) instead. What is the speed penalty of going from precise (single) to strict (single)? "This was mentioned in this thread before. Increasing precision is not a solution for overflow tasks." I'm not talking about solving overflow tasks; that was not the purpose. (This was an off-topic question that popped into my mind.) What I had in mind was an overall platform standard that follows IEEE 754 regardless of CPU (x32, x64, ARM) or GPU. Calculated and fixed correctly, the outcome would be as close to Q100 as it possibly can be, resulting in less headbanging for all of you optimisers in the future. My reason for asking you to test in that direction is mainly so that you can all switch to code optimising instead of bug-hunting on various platforms until hell freezes over. It will only increase, as I say, not decrease. Until you know for sure that it won't work, I will continue to push for this unification, provided it isn't much slower than using precise. Once numbers have been presented here as a comparison, we will know for certain whether it is worth it. But if going fp:strict is, for example, 3% slower while Q increases to the Q99.99 - Q100 range, then if I were a project manager I would vouch for going that route now instead of banging heads for more months or years chasing annoying rounding bugs and result disparities. _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"There is no connection between fp:strict or whatever precision switch can be and reporting subset of results on overflow." No, all my builds use /fp:fast, for example, as has always been the case with AKv8 derivatives, AFAIK. Out of interest, I could provide you with builds for comparison. A recently found inefficiency in CPU pulse finding makes a rebuild of the CPU apps worthwhile. "The purpose in my mind was an overall platform standard that should follow IEEE754 regardless of cpu, x32 x64 arm, gpu." This rules out IEEE 754-incompatible devices without any real need to do so. Unjustified idealization here. Most bug hunting (except for our own bugs, of course) comes from non-complying runtimes. If a runtime doesn't comply with the standard, strictly following the standard will not help; it will just make debugging even more obscure. |
-= Vyper =- Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537 |
If the S@h people say that they want to use IEEE 754 in future releases to iron out differences, then science-wise it should be very welcome. It's easier to ban a platform/compiler that doesn't conform to those rules in the BOINC API if you developers find a combination that doesn't work properly. https://en.wikipedia.org/wiki/IEEE_floating_point#Basic_and_interchange_formats It's only a matter of which of binary32 or decimal32 serves best, from the simplest CPU application up to monster quadruple GPU/FPGA/ASIC cores in the future. If you find a card or driver that doesn't work, then it's up to the manufacturer to patch their shit so that it conforms 100% to the IEEE 754 standard. EDIT: All of the above is about getting code closer to the Q100 mark on whatever platform/combination possible, perhaps as a second step. But as we've noticed, what I'm describing now has nothing to do with the main topic of this thread, inconclusive validations. That is another thing, of course, which actually needs to be fixed on another level, because I'm sure that each and every one of those applications, if compared on all signals found (30+), would get Q99+, so they would most certainly validate. |
-= Vyper =- Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537 |
Now one of them validated them all! http://setiathome.berkeley.edu/workunit.php?wuid=2276193382 |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"If you all find a card or driver that doesn't work then it's up to the manufacturer to patch their shit so that they can conform to be working 100% to IEEE754 standard." Hm... looks like you don't read these forums frequently. Otherwise you would know how long that "patch their shit" list currently is, even without any precision compliance. Well, currently we have 2 platforms with real precision issues: OpenCL NV + modern versions of OS X; and OpenCL Intel + some devices and drivers (still not known exactly which). OS X is out of my scope, but I could do some experiments with iGPU builds regarding the /fp:* switches. I think experimental proof is better than plain discussion. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"Now one of them validated them all!" As it should be with the current validator in most cases. Nothing really interesting here. But to reduce the inefficiency of re-processing, the validator should be changed. As I said earlier, this topic is under discussion with Eric. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Here, on Cuda, fp:strict had minimal performance impact, but actually made the match to stock 8.00 worse. That's because x86 builds built with the GNU compiler use the x87 FPU, which uses 80-bit registers for intermediates. There are no such intermediate registers on the GPUs, nor in the SSE+ parts of CPUs, and the MS compiler blocks their use entirely in x64 builds (so that they can be reappropriated as general x64 registers, and for register renaming capability for Windows32 on Windows64). "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Actually, conforming to the IEEE 754 standard in rounding will not result in the Q100 mark either. The standard just describes how rounding is done; it can't prevent precision loss in a case like this, for example: A+B+C versus A+(B+C), where A is a big number and B and C are much smaller ones. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14672 Credit: 200,643,578 RAC: 874 |
"I'm not talking about solving overflow tasks, that was not the purpose. (This was an offtopic question that popped up in my mind)" Most of the conversation this morning has arisen from your message 1820401: "Nice find Jeff!" in reply to Jeff Buck's message 1820375: "It's a -9 overflow with 3 different apps coming up with 4 different results". As Jeff says, the point at issue there is very specifically overflow tasks. I haven't (recently) seen any obvious cases where the validator rejections have been attributable directly to precision issues. As Jason reminded us, there were some in the CUDA builds, but they were detected and corrected during the pre-release testing phase for SaH v8 (Breakthrough Listen and Guppi). As thread originator, I'm absolutely happy to discuss precision issues here too, but let's try to be rigorous, please, and address comments to the appropriate sub-category (overflow, precision, or whatever else). Is anyone currently seeing systemic validator rejections because of poor precision of the correct signal, rather than selection of the wrong signal to report? |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Now one of them validated them all!" The only important one is the canonical; at least the top 2 will be weakly similar. 8.00 apparently strongly matched it, as the results are similarly rolling in here now with Cuda. Q=99.38%. Closer than that is not within reasonable practical reach at this time ;) |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Yes, in some iGPUs as you know from beta. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14672 Credit: 200,643,578 RAC: 874 |
"Now one of them validated them all!" The new (Stock CPU) result must have been strongly similar to "canonical result 5182875831" - the opencl_ati_cat132. All of the others must have been weakly similar to one of those two - and 'weakly' can be very weak indeed (only half the signals matched). I'd actually not call that validated at all, but we're stuck with a binary choice in the status column. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Is anyone currently seeing systemic validator rejections because of poor precision of the correct signal, rather than selection of the wrong signal to report?" Just confirming that both of Jeff's matched (8.00) on the 2 Cuda flavours, Q over 99%. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"I'd actually not call that validated at all, but we're stuck with a binary choice in the status column." And this "feature" really hides issues, making build validation and debugging harder (though of course it allows those damned credits to be received...) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14672 Credit: 200,643,578 RAC: 874 |
"Is anyone currently seeing systemic validator rejections because of poor precision of the correct signal, rather than selection of the wrong signal to report?" Fair point. I've not (yet) done a side-by-side visual comparison of the signal summary reports for one of those, but it would be worth doing. Edit - and the newly-validated one provides an excellent case study. We have an iGPU (HD Graphics 530) with an enormous inconclusive count, and a canonical signal display from the ATI. I'll grab them, and compare after lunch. Edit2 - and both using comparable r3430 code. Even nicer. Edit3 - initial eyeball: the iGPU reported a triplet that the ATI didn't. Threshold issue, perhaps? |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"If you all find a card or driver that doesn't work then it's up to the manufacturer to patch their shit so that they can conform to be working 100% to IEEE754 standard." To elaborate a little more on this: http://setiathome.berkeley.edu/forum_thread.php?id=80247&postid=1820339 Recently, while testing new builds with Mike and his GPU, we discovered that the latest build stopped producing inconclusives when run in multiple instances... But then I looked inside stderr and found obviously wrong numbers in the profiling counters the app now prints. They work OK in single-instance mode, and OK in multiple-instance mode in the other configs I tested (where multiple instances were allowed before, too). So I think it's direct evidence that the driver's GPU context switching is simply bugged for that whole AMD GPU family on Windows! And here we are talking about rare borderline inconclusives from Q99 instead of Q100.... |
-= Vyper =- Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537 |
"I'd actually not call that validated at all, but we're stuck with a binary choice in the status column." You've got a point there, because it all comes down to human psychology. If something doesn't need to be fixed because it won't matter in the end (credits), it won't get much fixing. But if weakly similar ones didn't get a single credit, things would speed up dramatically to make it work; or, if it can't work, "ban" the computer/platform/GPU combo on the servers instead and don't send units to devices that can't compute them thoroughly. As simple as that, really. It would be a shame if Cuda/AMD GPU hardware ended up there, but in the end the same rules would then apply to everyone. |
-= Vyper =- Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537 |
I remember this from last year, when I noticed something with the iGPU on Intel. Posted: 16 Sep 2015, 14:12:34 UTC: I bought it solely to crunch on the iGPU and CPU at the same time, as the iGPU is powerful, but I sold it and bought a 6700K instead. http://ark.intel.com/sv/products/88040/Intel-Core-i7-5775C-Processor-6M-Cache-up-to-3_70-GHz This processor couldn't do Astropulse; it just paused the work and started the next unit. No one at Lunatics had an answer that solved this back then either. |
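
Two floating-point points made earlier in the thread can be sketched in a few lines: Raistmer's remark that A+B+C and A+(B+C) can differ even under strict IEEE 754 rounding, and jason_gee's remark that wider intermediate registers change the match. This is only an illustration, not SETI@home code. The helper names (`f32`, `add32`, `sum_left_to_right`, `sum_wide_then_round`) and the sample values are hypothetical, and Python's binary64 float is used as a stand-in for a wider intermediate such as the 80-bit x87 register.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def add32(x: float, y: float) -> float:
    """Single-precision style add: compute in binary64, then round to binary32."""
    return f32(x + y)


def sum_left_to_right(values):
    """Add values in order, rounding to binary32 after every addition."""
    total = f32(0.0)
    for v in values:
        total = add32(total, f32(v))
    return total


def sum_wide_then_round(values):
    """Accumulate in binary64 (assumed stand-in for a wider register), round once at the end."""
    total = 0.0
    for v in values:
        total += f32(v)
    return f32(total)


# A is large, B and C are much smaller (hypothetical sample values).
A, B, C = f32(1.0e8), f32(3.0), f32(3.0)

left_grouping = add32(add32(A, B), C)    # (A + B) + C
right_grouping = add32(A, add32(B, C))   # A + (B + C)
narrow = sum_left_to_right([A, B, C])    # every intermediate rounded to binary32
wide = sum_wide_then_round([A, B, C])    # wide intermediate, single final rounding

print(left_grouping, right_grouping, narrow, wide)
```

Near 1e8 the spacing between adjacent binary32 values is 8, so adding 3 twice is lost both times, while adding 6 once rounds up to the next representable value. Every step here is correctly rounded, yet the grouping and the width of the intermediates still change the result, which is why a strict precision switch alone cannot guarantee identical output across apps that order their sums differently.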