SSE2, SSE3, SSSE3, etc
Author | Message |
---|---|
Urs Echternacht Joined: 15 May 99 Posts: 692 Credit: 135,197,781 RAC: 211 |
That pushes the envelope back quite a long way. Don't run those Quads under Windows 95, guys! Smile, the joy of arguing. _\|/_ U r s |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"Of course if the application were to use an instruction that is not supported the app would likely crash immediately and you would know right away that you have chosen the wrong one for your platform." Ah, but that's the rub. I don't think it would crash. What follows is a gross oversimplification: for the sake of argument, let's assume for a moment that XMM0 stores a value that is the strength of a pulse. Let's give it the value 2. Tasks switch, and for whatever reason FXSAVE doesn't work. The new application puts a different value in XMM0; let's say 1,000,000. Tasks switch again, and FXRSTOR leaves the 1,000,000 in XMM0. I think the work unit would finish and report an incredibly strong spike. Again, a gross oversimplification. Programmers like to keep values in registers because storing them in RAM is very much slower, but it is vital that the processor state be saved when task switching, or things can get very, very bad. I don't know of any version of any OS that does not properly handle FXSAVE/FXRSTOR, but I don't know how well Windows NT has been tested on a Core2 processor. |
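Ned's scenario can be made concrete with a toy model. This is a sketch under loud assumptions: a single `double` stands in for XMM0, and the hypothetical `context_switch()` either copies state into a per-task save area (the job FXSAVE/FXRSTOR do on real hardware) or, to model the broken case, does nothing. No actual registers are touched.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one double stands in for the XMM0 register. */
typedef struct {
    double xmm0;
} cpu_state;

/* Per-task save area: the role the FXSAVE buffer plays on real hardware. */
typedef struct {
    cpu_state saved;
} task;

/* Hypothetical switch from `from` to `to`. When save_restore is false we
   model a broken OS that neither saves nor restores the register. */
static void context_switch(cpu_state *cpu, task *from, task *to, bool save_restore)
{
    assert(cpu && from && to);
    if (save_restore) {
        from->saved = *cpu;   /* like FXSAVE into the outgoing task's area */
        *cpu = to->saved;     /* like FXRSTOR from the incoming task's area */
    }
}

/* Task A holds a pulse strength of 2 in "XMM0"; task B then writes
   1,000,000 there; control returns to A. Returns the value A now sees. */
static double pulse_seen_by_a(bool save_restore)
{
    cpu_state cpu = { .xmm0 = 2.0 };               /* task A is running */
    task a = { .saved = { .xmm0 = 0.0 } };
    task b = { .saved = { .xmm0 = 0.0 } };

    context_switch(&cpu, &a, &b, save_restore);    /* A -> B */
    cpu.xmm0 = 1000000.0;                          /* B uses the register */
    context_switch(&cpu, &b, &a, save_restore);    /* B -> A */

    return cpu.xmm0;   /* no crash either way; only the value differs */
}
```

With saving enabled, `pulse_seen_by_a(true)` gives task A its 2 back; with it disabled, A silently reads 1,000,000 and carries on, which is the quiet wrong answer (rather than a crash) that Ned describes.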
Paul D Harris Joined: 1 Dec 99 Posts: 1122 Credit: 33,600,005 RAC: 0 |
Who on earth would be using 95, 98 or NT and running any form of BOINC anyway? |
Paul D Harris Joined: 1 Dec 99 Posts: 1122 Credit: 33,600,005 RAC: 0 |
"Who on earth would be using 95, 98 or NT and running any form of BOINC anyway?" According to BoincStats, there are quite a lot of 98 users (24.5K), 3.4K NT users, and no 95 users. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"Who on earth would be using 95, 98 or NT and running any form of BOINC anyway?" I have an NT 4.0 server that works perfectly, does a specific job just fine, and is in no desperate need of being upgraded. Should I stop running BOINC on it, or should I invest time and money to upgrade the OS when that won't change what it does, or make it run any faster or better? |
NewtonianRefractor Joined: 19 Sep 04 Posts: 495 Credit: 225,412 RAC: 0 |
|
archae86 Joined: 31 Aug 99 Posts: 909 Credit: 1,582,816 RAC: 0 |
|
gomeyer Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
"Of course if the application were to use an instruction that is not supported the app would likely crash immediately and you would know right away that you have chosen the wrong one for your platform." Try running an SSE3 app on a CPU that only supports up to SSE or SSE2. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
On a 32-bit-capable Windows OS, SSE through SSE4.1 all use the same 8 XMM registers (XMM0-XMM7), which are ALL saved in a context switch as long as that OS supports at least SSE. SSE3 and SSSE3 are only supplementary additions: they add a few instructions [that are generally really compound SSE2/SSE1 instructions] and no new registers whatsoever, so they require no special OS support for saving any registers beyond the standard SSE ones. SSE4.1 adds only a few application-specific instructions and, again, uses the same registers. This is by design, so that an OS aware of at least SSE will context-switch the registers properly through all SSE levels issued to date. These are bad ideas:
- Running BOINC with an app requiring SSE or later on pre-SSE hardware: this will generate invalid-instruction errors and crash the application.
- Using a pre-SSE version of Windows to run any SSE+ app.
- Trying to run a 64-bit app on a 32-bit OS: this will cause an error.
- Using any pre-MMX OS to run an MMX+ app: this will likely corrupt the FPU registers, as these are shared with MMX and must be handled specially.
I have written kernel-level OS context-switch code in the distant past. An SSE+ context switch does require saving all the SSE registers and, in particular, careful handling of the MMX and FPU registers (which is the tricky bit that can indeed cause computation errors, and is the source of serious, hard-to-find bugs). This is handled properly at the assembly level by issuing an EMMS instruction after MMX code (which empties the MMX state so that later FPU code finds the shared registers usable), by avoiding inline assembly, and by using a compiler that properly handles MMX. There are no inline assembly instructions in the AK port; context switches are handled entirely by the compiler and the SSE+-capable OS.
Validation is the responsibility of the validators. Any error introduced by the use of SSE, arising from platform or application variation between two reporting hosts, will generally fall within the hysteresis band of 'weakly similar', which triggers the issue of a third result and a 'Checked but no consensus' state. The project judges the likelihood of two results sharing the identical platform/app variation to be remote enough that it has reduced the quorum to two. Now, the stock application uses SSE2 code selected by CPU detection external to BOINC and the OS (Agner Fog's ASMLIB extensions), so it will run MMX, SSE, or SSE2 (and, I think, even some limited SSE3 in a chirp) according to what is available on the CPU, not what the OS or BOINC reports... If this is a problem of some sort that I'm not aware of, then the problem extends to stock as well. Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
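Jason's description of the validator's safety net can be sketched as a two-threshold comparison. Everything here is an assumption for illustration: the names, the tolerances, and the reduction of a result to one number (as I understand it, the real validator compares lists of detected signals). It only shows the shape of the logic: a small SSE-induced difference lands in the "weakly similar" band and costs an extra result rather than producing a wrong canonical one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical similarity classes for two hosts' results. */
typedef enum { STRONGLY_SIMILAR, WEAKLY_SIMILAR, DISSIMILAR } similarity;

/* Illustrative relative tolerances: assumptions, not project values. */
#define STRONG_TOL 1e-5
#define WEAK_TOL   1e-2

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Classify two scalar results by their relative difference. */
static similarity compare_results(double a, double b)
{
    assert(a == a && b == b);   /* NaN results should be rejected earlier */
    double scale = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);
    double rel = scale > 0.0 ? abs_d(a - b) / scale : 0.0;
    if (rel <= STRONG_TOL) return STRONGLY_SIMILAR;
    if (rel <= WEAK_TOL)   return WEAKLY_SIMILAR;
    return DISSIMILAR;
}

/* Quorum of two: anything short of a strong match asks for a third result. */
static bool needs_third_result(double a, double b)
{
    return compare_results(a, b) != STRONGLY_SIMILAR;
}
```

For example, `compare_results(100.0, 100.5)` falls in the weakly similar band under these made-up tolerances, so `needs_third_result` returns true and a third host gets the work.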
zoom3+1=4 Joined: 30 Nov 03 Posts: 65763 Credit: 55,293,173 RAC: 49 |
"On a 32 bit capable Windows OS. SSE through SSE4.1 all use the same 8 XMM registers (XMM0-XMM7), which are ALL saved in a context switch, as long as that OS supports at least SSE." Then this is looking more and more like a bunch of hooey (or a collection of hockey pucks) from somebody who may have skimmed the surface and jumped to a wrong conclusion, in which case I'll keep on crunching and ignore this. Just my 2 cents. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"Try running an SSE3 app on a CPU that only supports up to SSE or SSE2." This would be the opposite of what we've been discussing. If the optimized application checks the processor compatibility flags, it should just terminate. If it doesn't, and just tries the ops, it'll crash. Either way, I don't see the point. |
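Ned's first branch (check the flags, then terminate) can be sketched as a guard run at startup, before any optimized code path. The enum, the function name, and where `available` comes from are assumptions for illustration; in practice `available` would be filled in by CPUID-style detection.

```c
#include <assert.h>
#include <stdio.h>

/* Instruction-set levels in the order they were introduced. */
typedef enum { ISA_X87, ISA_MMX, ISA_SSE, ISA_SSE2, ISA_SSE3, ISA_SSSE3, ISA_SSE41 } isa_level;

/* Hypothetical startup guard for an optimized build compiled for
   `required`. Returns 0 if it is safe to continue, or 1 if the app should
   exit cleanly before any unsupported instruction is executed. */
static int startup_check(isa_level required, isa_level available)
{
    assert(required <= ISA_SSE41 && available <= ISA_SSE41);
    if (available < required) {
        fprintf(stderr, "this build needs a newer instruction set; exiting\n");
        return 1;
    }
    return 0;
}
```

A caller would do something like `if (startup_check(ISA_SSE3, detected) != 0) return EXIT_FAILURE;`, so the failure is a clean exit with a message rather than an illegal-instruction crash partway through a work unit.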
gomeyer Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
"Try running an SSE3 app on a CPU that only supports up to SSE or SSE2." Then just what point were you trying to make when you answered my original post? The original comment was an aside thrown in at the end of the original post and indeed had nothing to do with the main point. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Jason, the last time I wrote OS-level code, the 8086/8088 was pretty cool. As I read this, it should be almost exclusively an OS issue, as the OS is doing the switching, at least in a preemptive multitasking environment. It looks like the FXSAVE and FXRSTOR instructions want a nice big buffer, so there is room for future expansion. In reading AMD's documentation, however, it looks like those processors don't save or restore the exception flags (see here). That seems like a flaw, but I've been away from this level of programming for quite a while. Comments? -- Ned |
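The "nice big buffer" is a fixed 512-byte area that FXSAVE requires to be 16-byte aligned. The struct below sketches the legacy layout as I read the Intel SDM (the 32-bit form, with 32-bit FPU instruction and data pointers); the field names are my own. As I understand AMD's documentation, the fields at issue are likely the x87 `fop`/`fip`/`fdp` last-instruction pointers, which AMD processors save and restore only when an x87 exception is pending; MXCSR and the XMM registers are handled either way.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy FXSAVE/FXRSTOR area (32-bit layout). alignas(16) on the first
   member makes the whole struct 16-byte aligned, as FXSAVE requires. */
typedef struct {
    alignas(16) uint16_t fcw;  /* x87 control word             offset   0 */
    uint16_t fsw;              /* x87 status word                       2 */
    uint8_t  ftw;              /* abridged x87 tag word                 4 */
    uint8_t  reserved1;
    uint16_t fop;              /* last x87 opcode                       6 */
    uint32_t fip;              /* last x87 instruction pointer          8 */
    uint16_t fcs;              /* code segment selector                12 */
    uint16_t reserved2;
    uint32_t fdp;              /* last x87 data pointer                16 */
    uint16_t fds;              /* data segment selector                20 */
    uint16_t reserved3;
    uint32_t mxcsr;            /* SSE control/status register          24 */
    uint32_t mxcsr_mask;       /*                                      28 */
    uint8_t  st_mm[8][16];     /* ST0-ST7, aliased by MM0-MM7          32 */
    uint8_t  xmm[16][16];      /* XMM0-XMM15 (XMM0-7 in 32-bit mode)  160 */
    uint8_t  reserved4[96];    /* reserved / available to software    416 */
} fxsave_area;

static_assert(sizeof(fxsave_area) == 512, "FXSAVE area is 512 bytes");
static_assert(offsetof(fxsave_area, mxcsr) == 24, "MXCSR at byte 24");
static_assert(offsetof(fxsave_area, xmm) == 160, "XMM0 at byte 160");
```

Note that the MMX registers live inside the x87 slots (`st_mm`), which is why MMX and FPU state cannot be handled separately.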
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"Try running an SSE3 app on a CPU that only supports up to SSE or SSE2." Okay, I quoted the wrong bit. If the OS fails to save and restore the context correctly, I think a crash is unlikely; it is more likely that the result will be wrong. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
These are all legitimate concerns, but most of them actually stem from the days of MMX's introduction. There were difficulties preserving MMX state, because the MMX registers are shared with the FPU registers. That DID cause a lot of problems, and it still makes things tricky today, so MMX registers are often avoided in favour of SSE ones for that very reason. Context switches also used to be expensive, so special flags were set to limit saving to only the state that was actually required. Getting that setting wrong is generally catastrophic: rather than a latent, minor calculation error, it usually causes a severe crash or application error. As an aside: SSE exceptions are a weird and different beast, as they are generally handled transparently by hardware, resulting in 'assists', and 'denormal results' are handled by application-defined means, not by the OS. Any failure of the hardware to handle these would usually result in a computation error, or in results that don't pass validation. Joe found one of these in stock code, but we won't go into that. So, in a nutshell, whereas MMX and FPU exceptions need OS support, SSE exceptions usually do not; they are handled by the hardware and the application. What I can point out is that, given the nature of SSE+ vectorised code and the fact that context switches occur thousands of times per second, if register corruption were occurring you'd know about it basically instantly. Thorough application tests are performed before any release, and these AK apps are the most thoroughly tested of any so far (so I'm told). Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
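The claim that SSE exceptions are normally dealt with by hardware and the application, not the OS, comes down to the MXCSR control/status register. The sketch below only decodes its bit layout as given in the Intel SDM; the struct and function names are hypothetical. With the power-up value 0x1F80 every SSE exception is masked, so the hardware delivers a default result (possibly via a slow microcode assist) instead of trapping to the OS. Flush-to-zero and denormals-are-zero are bits an application may choose to set, for example with `_mm_setcsr`.

```c
#include <assert.h>
#include <stdint.h>

/* MXCSR bit positions (Intel SDM Vol. 1, MXCSR control/status register). */
#define MXCSR_FLAGS_MASK 0x003Fu    /* bits 0-5: IE DE ZE OE UE PE (sticky) */
#define MXCSR_DAZ        (1u << 6)  /* denormals-are-zero */
#define MXCSR_MASKS      0x1F80u    /* bits 7-12: IM DM ZM OM UM PM */
#define MXCSR_FTZ        (1u << 15) /* flush-to-zero */
#define MXCSR_DEFAULT    0x1F80u    /* power-up value: all exceptions masked */

typedef struct {
    int all_masked;   /* 1 if hardware returns default results, no traps */
    int ftz, daz;     /* denormal handling chosen by the application */
    unsigned flags;   /* exceptions that occurred since flags were cleared */
} mxcsr_info;

static mxcsr_info decode_mxcsr(uint32_t mxcsr)
{
    mxcsr_info info;
    info.all_masked = (mxcsr & MXCSR_MASKS) == MXCSR_MASKS;
    info.ftz = (mxcsr & MXCSR_FTZ) != 0;
    info.daz = (mxcsr & MXCSR_DAZ) != 0;
    info.flags = mxcsr & MXCSR_FLAGS_MASK;
    return info;
}

/* Value an application might write (e.g. via _mm_setcsr on x86) to trade
   exact denormal results for speed. */
static uint32_t with_ftz_daz(uint32_t mxcsr)
{
    return mxcsr | MXCSR_FTZ | MXCSR_DAZ;
}
```

Because these bits are per-thread register state saved in the FXSAVE area, the OS only has to preserve them; the policy itself belongs to the application.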
gomeyer Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
"Try running an SSE3 app on a CPU that only supports up to SSE or SSE2." Now that I agree with. |
Hofman's Atlantic Joined: 6 Jan 05 Posts: 32 Credit: 11,359,969 RAC: 0 |
Stupid newbie-type question: how does BOINC determine CPU features? |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Stupid newbie-type question: how does BOINC determine CPU features?" I believe, perhaps incorrectly, that BOINC uses OS calls, which is not the method recommended by Intel or AMD. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
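For comparison, the method Intel and AMD document is to execute CPUID with EAX=1 and test feature bits in the returned ECX and EDX. The sketch below only decodes those bits; it assumes the caller has already obtained ECX/EDX (for example via `__get_cpuid` from `<cpuid.h>` on GCC/Clang, or `__cpuid` on MSVC), and the `isa_level` ordering is my own. CPUID reports what the processor implements; whether the OS saves the extra register state is a separate question, which is the distinction argued over earlier in this thread.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ISA_NONE, ISA_SSE, ISA_SSE2, ISA_SSE3, ISA_SSSE3, ISA_SSE41 } isa_level;

/* CPUID leaf 1 feature bits (Intel SDM Vol. 2A, CPUID instruction). */
#define EDX_FXSR  (1u << 24)  /* FXSAVE/FXRSTOR available */
#define EDX_SSE   (1u << 25)
#define EDX_SSE2  (1u << 26)
#define ECX_SSE3  (1u << 0)
#define ECX_SSSE3 (1u << 9)
#define ECX_SSE41 (1u << 19)

/* Highest SSE level for which every lower level is also reported.
   SSE state can only be saved with FXSAVE, so FXSR is required too. */
static isa_level highest_sse_level(uint32_t ecx, uint32_t edx)
{
    if (!(edx & EDX_FXSR) || !(edx & EDX_SSE)) return ISA_NONE;
    if (!(edx & EDX_SSE2))  return ISA_SSE;
    if (!(ecx & ECX_SSE3))  return ISA_SSE2;
    if (!(ecx & ECX_SSSE3)) return ISA_SSE3;
    if (!(ecx & ECX_SSE41)) return ISA_SSSE3;
    return ISA_SSE41;
}
```

A Core2 of that era reporting SSE3 and SSSE3 but not SSE4.1 would decode to `ISA_SSSE3` here, and an app built for SSE3 could then safely be selected.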
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Most of what I'm doing these days is real-time, interrupt-driven, and network-intensive, but it doesn't even think about floating point; it's all integer math (mostly shifts and masks). I like the C3 CPU a lot. ... and I know you guys are doing thorough testing, and doing it right. My worry would be some other application using some advanced flags that aren't used in Outlook, Internet Explorer, or MS Office. Those could be very long odds. It's unlikely that most of the crunchers here would ever see that. Either way, I think it's an OS error, not an application error. Thanks for your comments -- Ned |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
...Oh, SSE+ does integer math too. Really fast. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
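As a small illustration of that reply, SSE2 has packed-integer instructions for exactly the shifts and masks Ned mentions. The sketch below (function name hypothetical) applies `(x >> shift) & mask` to four 32-bit values with one shift and one AND on an XMM register, falling back to a scalar loop when the compiler is not targeting SSE2. `__SSE2__` is the GCC/Clang predefined macro; MSVC does not define it, so there the scalar path would be taken.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* out[i] = (in[i] >> shift) & mask for i = 0..3 (logical shift). */
static void shift_and_mask4(const uint32_t in[4], uint32_t out[4], int shift, uint32_t mask)
{
    assert(shift >= 0 && shift < 32);
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)in);  /* 4 lanes at once */
    v = _mm_srl_epi32(v, _mm_cvtsi32_si128(shift));    /* PSRLD, count in XMM */
    v = _mm_and_si128(v, _mm_set1_epi32((int)mask));   /* PAND with the mask */
    _mm_storeu_si128((__m128i *)out, v);
#else
    for (int i = 0; i < 4; i++)
        out[i] = (in[i] >> shift) & mask;              /* scalar fallback */
#endif
}
```

Both paths produce identical results; the SSE2 path simply does the four lanes in one instruction each, which is where the speed comes from.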
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.