Monitoring inconclusive GBT validations and harvesting data for testing
Author | Message |
---|---|
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Wiki's Advanced Vector Extensions page says that for x86, FMA only became available with AVX2 - as your Intel blog reply already told us. But again, x86 is hardly what the iGPU uses. Hehe... https://software.intel.com/en-us/forums/opencl/topic/277001 Unfortunately the offline compiler only displays CPU asm and we do not currently expose the graphics ISA from this tool, even though the ISA is available as part of the linux graphics documentation (http://intellinuxgraphics.org/documentation.html). https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol02a-commandreference-instructions.pdf SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Multiply Add
Multiply Add for Macro As the Russian saying goes, the devil himself would break a leg in there :D And it seems there is no FMA per se, BTW. The definition of MAD doesn't discuss any precision considerations either. The pseudocode is simple: dst.chan[n] = src1.chan[n] * src2.chan[n] + src0.chan[n]; how rounding occurs is not specified. Bravo, Intel's manual writers, decades of development did not vanish... :P EDIT2: Well, my timeslice for iGPU is finished. I found no discussion of iGPU precision. If someone finds one, please give the reference. SETI apps news We're not gonna fight them. We're gonna transcend them. |
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
So try test iGPU build. 8 tasks cooking on beta using the 8.19 app as we speak. Will take a couple of hours for them to be done. All are Arecibo at the moment, no sign of guppies. Tasks BOINC blog |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
So try the test iGPU build. The beta app will produce inconclusives (but check this as a baseline). Then check the test app: https://cloud.mail.ru/public/2aUP/dborYAw9G SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
And for reference: https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-hsw-commandreference-instructions_0_0.pdf SETI apps news We're not gonna fight them. We're gonna transcend them. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
... With Cuda code (for comparison only), the instructions resolve to either MADs, FMAs or Adds and Muls depending on architecture, yielding different results, so there are some similarities in the situation. It isn't as sensitive though, because the CUFFT library is used instead of self-compiled OCLFFT. The way around this in Cuda is relatively simple. Using the example Answer_mul = float0 * float1; Answer_add = Answer_mul + float2; it becomes wired by hand in sensitive places as the corresponding intrinsics, which generate explicit instructions, or as inline PTX assembly, which the compiler cannot optimise or change. The OpenCL situation may be murkier, with its wider range of hardware. I would have expected Intel to have provided some math.h or similar in their SDK, with either intrinsic functions, an override switch for MADs, or vendor extensions with assembly... but it's not something I've looked at directly for the Intel case, due to not actively running such a GPU. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
And that oclFFT (which we share with Einstein, btw) uses mad() in its code generator. Well, I think it's time to look for Skylake results from the modded binary I posted. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Just for comparison with Intel: how MAD description sounds for HD6900: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/AMD_HD_6900_Series_Instruction_Set_Architecture.pdf Floating-Point Multiply-Add SETI apps news We're not gonna fight them. We're gonna transcend them. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14672 Credit: 200,643,578 RAC: 874 |
I would prefer to find original Intel's thread about this issue. Christian has replied It was a direct mail exchange with an Intel developer where I got the explanation from. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I would prefer to find original Intel's thread about this issue. So awaiting results from new build. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/mad.html There's another post in Einstein forums, Intel GPU brp app returns incorect results with beignet 1.2 drivers. In Beignet 1.2 FP_CONTRACT was switched to ON and the code generated for x*y+z was changed from MUL+ADD to MAD. (commit) If I'm reading the FP_CONTRACT documentation correctly, it seems that implementations are supposed to use fused instructions unless told otherwise. What it doesn't say is whether FMA or MAD should be used, but since there's a -cl-mad-enable compiler option I suppose FMA should be used. I can imagine Windows drivers have made a similar change earlier. |
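The two knobs mentioned in this post can be sketched as follows. This is an illustration only: the kernel `axpy` and the helper `allows_relaxed_mad` are hypothetical names, not code from oclFFT, Beignet or the SETI apps. `#pragma OPENCL FP_CONTRACT OFF` and the build options `-cl-mad-enable`, `-cl-unsafe-math-optimizations` and `-cl-fast-relaxed-math` are taken from the OpenCL specification; whether a given driver honours them the way the spec describes is exactly what is in question in this thread.

```c
#include <string.h>

/* Hypothetical kernel source: the pragma asks the OpenCL C compiler not to
   contract a * x[i] + y[i] into a single fused or MAD instruction. */
static const char *kernel_src =
    "#pragma OPENCL FP_CONTRACT OFF\n"
    "__kernel void axpy(__global float *y, __global const float *x, float a)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    y[i] = a * x[i] + y[i];\n"
    "}\n";

/* Build-option strings as they would be passed to clBuildProgram(). */
static const char *build_opts_strict  = "";
static const char *build_opts_relaxed = "-cl-mad-enable";

/* Returns 1 if the options permit a*b+c to become a reduced-accuracy mad.
   Per the spec, the two broader options imply -cl-mad-enable. */
static int allows_relaxed_mad(const char *opts)
{
    return strstr(opts, "-cl-mad-enable") != NULL
        || strstr(opts, "-cl-unsafe-math-optimizations") != NULL
        || strstr(opts, "-cl-fast-relaxed-math") != NULL;
}
```

Note that the pragma only governs contraction of ordinary expressions. An explicit mad(a,b,c) call in kernel code is not affected by it, so a code generator that emits mad() directly would need to be changed at the source level.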
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Not sure that a contracted expression means using mad/fma. It's a more general term than just substituting MAD for a*b+c. AMD docs directly state that such a replacement will not occur. But (as usual) it's worth a try. When I get feedback from the already provided build I could try to disable this pragma too. Also, it would be interesting to add printing of FMA status to CLinfo. What will we see for different devices/platforms?... EDIT: from the committed code it looks like they now map mad to hardware mad instead of emulating it via mul and add. Still, that doesn't imply silent replacement of a*b+c with mad(a,b,c), but it will change the behavior of a mad(a,b,c) call, and as I said earlier oclFFT uses mad heavily. The question to Intel is: why is their mad so imprecise versus the 2 other vendors?? (BTW, iGPU imprecision in native trigonometry was demonstrated by Einstein's team before, in oclFFT.) SETI apps news We're not gonna fight them. We're gonna transcend them. |
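The FMA status idea for CLinfo could look roughly like the sketch below. It is not the actual CLinfo code. The bit values mirror `cl_device_fp_config` in the Khronos `CL/cl.h` header and are redefined here only so the sketch builds without the OpenCL SDK; `has_fma` and `print_fp_config` are hypothetical helper names.

```c
#include <stdint.h>
#include <stdio.h>

/* Values as in CL/cl.h (cl_device_fp_config); skipped if the header is present. */
#ifndef CL_FP_FMA
#define CL_FP_DENORM           (1u << 0)
#define CL_FP_INF_NAN          (1u << 1)
#define CL_FP_ROUND_TO_NEAREST (1u << 2)
#define CL_FP_ROUND_TO_ZERO    (1u << 3)
#define CL_FP_ROUND_TO_INF     (1u << 4)
#define CL_FP_FMA              (1u << 5)
#endif

/* Returns 1 if the device reports IEEE fused multiply-add support. */
static int has_fma(uint64_t fp_config)
{
    return (fp_config & CL_FP_FMA) != 0;
}

/* Prints the flags most relevant to the precision discussion. */
static void print_fp_config(const char *device_name, uint64_t fp_config)
{
    printf("%s: FMA=%s round-to-nearest=%s denorms=%s\n",
           device_name,
           has_fma(fp_config) ? "yes" : "no",
           (fp_config & CL_FP_ROUND_TO_NEAREST) ? "yes" : "no",
           (fp_config & CL_FP_DENORM) ? "yes" : "no");
}
```

On a real host the bitfield would come from clGetDeviceInfo() with CL_DEVICE_SINGLE_FP_CONFIG. Keep in mind that this flag describes the fma() built-in only; it says nothing about how accurate the device's mad() is, which remains the open question for the iGPU.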
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
The question to Intel is: why their mad so imprecise versus 2 other vendors? It can be as simple as a float implemented as a double half-float hardware emulation sequence, or some other shortcut. Maybe they even use something like x87 80-bit intermediate registers underneath, or blocks of Pentium circuits with fdiv bugs (j/k). That aside, using FMAs etc. changes algorithms and error growth. So you'll see different codelets even in FFTW CPU sources to compensate. We don't completely escape problems on Cuda either, especially with pre-compute-capability-1.3 hardware having neither doubles nor IEEE 754 compliance, and fma not coming until much later. We escape a lot though, because of CUFFT hard-wired paths, and we use a fair whack of intrinsics already (more assembly gradually) "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
When I will get feedback from already provided build I could try to disable this pragma too. Sorry for the delay. Work intervened. Another set running using new app. Same hosts as before. I also snagged some guppies this time. BOINC blog |
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
When I will get feedback from already provided build I could try to disable this pragma too. Looking through the results, it would seem the v8.19 app supplied by beta validates most of the time. The r3525's are almost all inconclusive or invalid. BOINC blog |
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
EDIT: from committed code looks like they map mad to hardware mad now instead of emulating it via mul and add. I could be mistaken but I think that is really what it now does. LLVM uses fmuladd to let the code generator decide between using mul+add or fma. llvm.fmuladd |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
EDIT: from committed code looks like they map mad to hardware mad now instead of emulating it via mul and add. Thanks, perhaps you are right. That means a detailed definition of precision behavior is required for iGPU MAD/MAC/"macro MAD". I'll try to disable the corresponding macro in code. We'll see if it helps. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
New build: https://cloud.mail.ru/public/EbPU/q7ZKhRnYV More details on beta: https://setiweb.ssl.berkeley.edu/beta//forum_thread.php?id=2266&postid=59828 SETI apps news We're not gonna fight them. We're gonna transcend them. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
FWIW, I just had an overflow task running SoG r3528 get ganged up on by a pair of x41p_zi3j Petri Specials. It was really an extreme case where my host found 30 Pulses while the two Special hosts found 30 Triplets. The WU is 2295032503, although it's now too late to grab the file since I didn't spot the Inconclusive before the second Special host reported. |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
FWIW, I just had an overflow task running SoG r3528 get ganged up on by a pair of x41p_zi3j Petri Specials. It was really an extreme case where my host found 30 Pulses while the two Special hosts found 30 Triplets. The WU is 2295032503, although it's now too late to grab the file since I didn't spot the Inconclusive before the second Special host reported. Someone else could say this: I think it is a 'bad' packet having noisy data. This time it was reported as 'bad' by a different version of software looking into something else before looking into something different but still something 'broken'. EDIT: and each time it could still be something, although probably noise. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |