SSE2, SSE3, SSSE3, etc

Urs Echternacht
Volunteer tester
Avatar

Send message
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 755847 - Posted: 20 May 2008, 0:28:39 UTC - in response to Message 755832.  

That pushes the envelope back quite a long way. Don't run those Quads under Windows 95, guys!

Smile, the joy of arguing.
_\|/_
U r s
ID: 755847 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 755861 - Posted: 20 May 2008, 1:15:21 UTC - in response to Message 755831.  

Of course if the application were to use an instruction that is not supported the app would likely crash immediately and you would know right away that you have chosen the wrong one for your platform.

Ah, but that's the rub. I don't think it would crash.

What follows is a gross oversimplification:

For the sake of argument, let's assume for a moment that XMM0 stores a value that is the strength of a pulse. Let's give it the value of "2."

Tasks switch, and for whatever reason FXSAVE doesn't work.

The new application puts a different value in XMM0. Let's say, 1,000,000.

Tasks switch again, and FXRSTOR leaves the 1,000,000 in XMM0.

I think the work unit would finish, and report an incredibly strong spike.

Again, gross oversimplification.

Programmers like to keep values in registers because storing them in RAM is very much slower, but it is vital that the processor state be stored when task switching or things can get very, very bad.

I don't know of any version of any OS that does not properly handle FXSAVE/FXRSTOR -- but I don't know how well Windows NT has been tested on a Core2 processor.
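
The scenario above can be sketched as a toy model. This is only an illustration: `ToyCPU` and `run_scenario` are made-up names, a single Python attribute stands in for XMM0, and nothing here touches real hardware or real FXSAVE/FXRSTOR behaviour.

```python
# Toy model only: one shared "register" and two tasks taking turns.
# All names are hypothetical and exist purely for illustration.

class ToyCPU:
    def __init__(self):
        self.xmm0 = 0.0  # stands in for the real XMM0 register


def run_scenario(os_saves_state: bool) -> float:
    cpu = ToyCPU()

    # Task A keeps a pulse strength in the register.
    cpu.xmm0 = 2.0

    # Switch away from task A: a correct OS saves the register here.
    saved_a = cpu.xmm0 if os_saves_state else None

    # Task B runs and overwrites the same register.
    cpu.xmm0 = 1_000_000.0

    # Switch back to task A: a correct OS restores the saved value.
    if saved_a is not None:
        cpu.xmm0 = saved_a

    # Task A carries on and reports whatever it finds in the register.
    return cpu.xmm0


print(run_scenario(os_saves_state=True))
print(run_scenario(os_saves_state=False))
```

With the save step in place task A gets its 2.0 back; without it, task A silently reports task B's 1,000,000 and nothing crashes, which is the point being made.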
ID: 755861 · Report as offensive
Profile Paul D Harris
Volunteer tester

Send message
Joined: 1 Dec 99
Posts: 1122
Credit: 33,600,005
RAC: 0
United States
Message 755862 - Posted: 20 May 2008, 1:32:42 UTC

Who on earth would be using 95, 98 or NT and running any form of BOINC anyway?
ID: 755862 · Report as offensive
Profile Paul D Harris
Volunteer tester

Send message
Joined: 1 Dec 99
Posts: 1122
Credit: 33,600,005
RAC: 0
United States
Message 755863 - Posted: 20 May 2008, 1:39:57 UTC - in response to Message 755862.  
Last modified: 20 May 2008, 1:40:47 UTC

Who on earth would be using 95, 98 or NT and running any form of BOINC anyway?

According to BoincStats, there are quite a lot of 98 users (24.5K), 3.4K NT users, and no 95 users.
ID: 755863 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 755865 - Posted: 20 May 2008, 1:55:04 UTC - in response to Message 755862.  

Who on earth would be using 95, 98 or NT and running any form of BOINC anyway?

I have an NT 4.0 server that works perfectly, does a specific job just fine, and is in no desperate need of being upgraded.

Should I stop running BOINC on it, or should I invest time and money to upgrade the OS when that won't change what it does, or make it run any faster or better?
ID: 755865 · Report as offensive
NewtonianRefractor
Volunteer tester
Avatar

Send message
Joined: 19 Sep 04
Posts: 495
Credit: 225,412
RAC: 0
United States
Message 755867 - Posted: 20 May 2008, 2:03:25 UTC

So who here is an assembly programmer that can shed light on all of this?
ID: 755867 · Report as offensive
archae86

Send message
Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 755868 - Posted: 20 May 2008, 2:03:44 UTC - in response to Message 755862.  

Who on earth would be using 95, 98 or NT and running any form of BOINC anyway?
I have a 930 MHz Coppermine Dell host on Windows 98 SE which runs both Einstein and SETI.

Do you object?

ID: 755868 · Report as offensive
gomeyer
Volunteer tester

Send message
Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 755871 - Posted: 20 May 2008, 2:25:12 UTC - in response to Message 755861.  

Of course if the application were to use an instruction that is not supported the app would likely crash immediately and you would know right away that you have chosen the wrong one for your platform.

Ah, but that's the rub. I don't think it would crash.

What follows is a gross oversimplification:

For the sake of argument, let's assume for a moment that XMM0 stores a value that is the strength of a pulse. Let's give it the value of "2."

Tasks switch, and for whatever reason FXSAVE doesn't work.

The new application puts a different value in XMM0. Let's say, 1,000,000.

Tasks switch again, and FXRSTOR leaves the 1,000,000 in XMM0.

I think the work unit would finish, and report an incredibly strong spike.

Again, gross oversimplification.

Programmers like to keep values in registers because storing them in RAM is very much slower, but it is vital that the processor state be stored when task switching or things can get very, very bad.

I don't know of any version of any OS that does not properly handle FXSAVE/FXRSTOR -- but I don't know how well Windows NT has been tested on a Core2 processor.

Try running an SSE3 app on a CPU that only supports up to SSE or SSE2.
ID: 755871 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 755880 - Posted: 20 May 2008, 2:52:23 UTC
Last modified: 20 May 2008, 2:58:47 UTC

On a 32-bit capable Windows OS, SSE through SSE4.1 all use the same 8 XMM registers (XMM0-XMM7), which are ALL saved in a context switch, as long as that OS supports at least SSE.

SSE3 & SSSE3 are only supplementary additions, adding a few instructions [that are generally really compound SSE2/SSE1 instructions] and no new registers whatsoever, so they require no special OS support for saving any more registers than the standard SSE ones. SSE4.1 adds only a few application-targeted instructions and, again, uses the same registers. This is by design, so that an OS aware of at least the SSE level will context switch the registers properly through all SSE levels issued to date.

These are bad ideas:
- Running BOINC with an SSE+ only app on pre-SSE hardware: it will generate invalid instruction errors and crash the application.
- Using a pre-SSE version of Windows to run any SSE+ app.
- Trying to run a 64-bit app on a 32-bit OS: it will cause an error.
- Using any pre-MMX OS to run an MMX+ app: it will likely corrupt the FPU registers, as these are shared with MMX and must be handled specially.

I have written kernel level OS context switch code in the distant past. An SSE+ context switch does require saving all the SSE registers and, in particular, careful handling of the MMX & FPU registers (which is the tricky bit that can indeed cause computation errors, and is the source of serious, hard-to-find bugs). This is handled properly at the assembly level by issuing an EMMS instruction after MMX code (to clear the MMX state so the shared FPU registers are usable again), avoiding inline assembly, and using a compiler that properly handles MMX.

There are no inline assembly instructions in the AK port; context switches are handled by the compiler and SSE+ capable OSes in their entirety.

The responsibility of validation falls to the validators. Any error introduced by the use of SSE will come from platform or application variation between two reporting hosts, and will generally fall within the hysteresis band of 'weakly similar', initiating a third result issue and 'Checked but no consensus'. The likelihood of two results sharing the same identical platform/app result variation is judged by the project as remote enough to have reduced the quorum to two.

Now, the stock application uses SSE2 code detected by means external to BOINC/OS (Agner Fog's ASMLIB extensions), so it will run MMX, SSE, SSE2 (and I think even some limited SSE3 in a chirp) as available on the CPU, not the OS/BOINC... If this is a problem of some sort that I'm not aware of, then the problem extends to stock also.
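
As a rough illustration of detecting features from the CPU rather than from the OS, the sketch below decodes the feature bits that the CPUID instruction returns for leaf 1. It is not how ASMLIB or the stock application does it; `decode_features` is a hypothetical helper, and the ECX/EDX values are assumed to have been read elsewhere, since plain Python cannot execute CPUID itself.

```python
# Bit positions in the registers returned by CPUID leaf 1 (EAX=1).
EDX_BITS = {"MMX": 23, "FXSR": 24, "SSE": 25, "SSE2": 26}
ECX_BITS = {"SSE3": 0, "SSSE3": 9, "SSE4.1": 19}


def decode_features(ecx: int, edx: int) -> set:
    """Return the names of the features whose bit is set in ECX/EDX."""
    found = set()
    for name, bit in EDX_BITS.items():
        if (edx >> bit) & 1:
            found.add(name)
    for name, bit in ECX_BITS.items():
        if (ecx >> bit) & 1:
            found.add(name)
    return found


# Made-up register values for a CPU that has everything up to SSSE3.
example_edx = (1 << 23) | (1 << 24) | (1 << 25) | (1 << 26)
example_ecx = (1 << 0) | (1 << 9)
print(sorted(decode_features(example_ecx, example_edx)))
```

Note that these bits only say what the CPU can execute. Whether the OS saves the XMM registers on a context switch is a separate question that the CPU feature bits alone do not answer.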

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 755880 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Avatar

Send message
Joined: 30 Nov 03
Posts: 65763
Credit: 55,293,173
RAC: 49
United States
Message 755886 - Posted: 20 May 2008, 3:03:41 UTC - in response to Message 755880.  

On a 32-bit capable Windows OS, SSE through SSE4.1 all use the same 8 XMM registers (XMM0-XMM7), which are ALL saved in a context switch, as long as that OS supports at least SSE.

SSE3 & SSSE3 are only supplementary additions, adding a few instructions [that are generally really compound SSE2/SSE1 instructions] and no new registers whatsoever, so they require no special OS support for saving any more registers than the standard SSE ones. SSE4.1 adds only a few application-targeted instructions and, again, uses the same registers. This is by design, so that an OS aware of at least the SSE level will context switch the registers properly through all SSE levels issued to date.

These are bad ideas:
- Running BOINC with an SSE+ enabled app on pre-SSE hardware: it will generate invalid instruction errors and crash the application.
- Using a pre-SSE version of Windows to run any SSE+ app.
- Trying to run a 64-bit app on a 32-bit OS: it will cause an error.
- Using any pre-MMX OS to run an MMX+ app: it will corrupt the FPU registers.

I have written kernel level OS context switch code in the distant past. An SSE+ context switch does require saving all the SSE registers and, in particular, careful handling of the MMX & FPU registers (which is the tricky bit that can indeed cause computation errors, and is the source of serious, hard-to-find bugs). This is handled properly at the assembly level by issuing an EMMS instruction after MMX code (to clear the MMX state so the shared FPU registers are usable again), avoiding inline assembly, and using a compiler that properly handles MMX.

There are no inline assembly instructions in the AK port, context switches are handled by compiler and SSE+ capable OSes in entirety.

The responsibility of validation falls to the validators. Any error introduced by the use of SSE will come from platform or application variation between two reporting hosts, and will generally fall within the hysteresis band of 'weakly similar', initiating a third result issue and 'Checked but no consensus'. The likelihood of two results sharing the same identical platform/app result variation is judged by the project as remote enough to have reduced the quorum to two.

Now, the stock application uses SSE2 code detected by means external to BOINC/OS (Agner Fog's ASMLIB extensions), so it will run MMX, SSE, SSE2 (and I think even some limited SSE3 in a chirp) as available on the CPU, not the OS/BOINC... If this is a problem of some sort that I'm not aware of, then the problem extends to stock also.

Jason

Then this is looking more and more like a bunch of hooey (or a collection of hockey pucks) by somebody who may have skimmed the surface and possibly jumped to a wrong conclusion, in which case I'll keep on crunching and ignore this. Just my 2 cents.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 755886 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 755889 - Posted: 20 May 2008, 3:22:10 UTC - in response to Message 755871.  

Try running an SSE3 app on a CPU that only supports up to SSE or SSE2.

This would be the opposite of what we've been discussing.

If the optimized application checks the processor compatibility flags, it should just terminate.

If it doesn't, and just tries the ops, it'll crash.

Either way, I don't see the point.
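
The "check the flags, then terminate" option can be sketched as follows. This is a minimal illustration under assumptions: the build names in `BUILDS` and the helpers `pick_build` and `require` are hypothetical, and the feature set is assumed to come from some detection step that is not shown.

```python
# Hypothetical build names, listed from most to least demanding.
BUILDS = [
    ("app_ssse3", {"SSE", "SSE2", "SSE3", "SSSE3"}),
    ("app_sse3", {"SSE", "SSE2", "SSE3"}),
    ("app_sse2", {"SSE", "SSE2"}),
    ("app_generic", set()),
]


def pick_build(cpu_features):
    """Return the first build whose requirements the CPU fully meets."""
    for name, required in BUILDS:
        if required <= cpu_features:
            return name
    # Not reached in practice: app_generic requires nothing.
    raise RuntimeError("no suitable build")


def require(required, cpu_features):
    """Stop cleanly with a readable message instead of crashing later."""
    missing = required - cpu_features
    if missing:
        raise SystemExit("CPU lacks: " + ", ".join(sorted(missing)))


print(pick_build({"MMX", "SSE", "SSE2"}))
```

An application that skips a check like `require` and simply executes an unsupported instruction gets an invalid-instruction fault instead, which is the crash case described above.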
ID: 755889 · Report as offensive
gomeyer
Volunteer tester

Send message
Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 755892 - Posted: 20 May 2008, 3:27:57 UTC - in response to Message 755889.  
Last modified: 20 May 2008, 3:28:14 UTC

Try running an SSE3 app on a CPU that only supports up to SSE or SSE2.

This would be the opposite of what we've been discussing.

If the optimized application checks the processor compatibility flags, it should just terminate.

If it doesn't, and just tries the ops, it'll crash.

Either way, I don't see the point.

Then just what point were you trying to make when you answered my original post? The original comment was an aside thrown in at the end of the original and indeed had nothing to do with the main point.
ID: 755892 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 755893 - Posted: 20 May 2008, 3:33:18 UTC - in response to Message 755880.  


I have written kernel level OS context switch code in the distant past. An SSE+ context switch does require saving all the SSE registers and, in particular, careful handling of the MMX & FPU registers (which is the tricky bit that can indeed cause computation errors, and is the source of serious, hard-to-find bugs). This is handled properly at the assembly level by issuing an EMMS instruction after MMX code (to clear the MMX state so the shared FPU registers are usable again), avoiding inline assembly, and using a compiler that properly handles MMX.

There are no inline assembly instructions in the AK port, context switches are handled by compiler and SSE+ capable OSes in entirety.

Jason,

Last time I did OS level code, the 8086/8088 was pretty cool.

As I read this, it should be almost exclusively an OS issue, as the OS is doing the switching, at least in a preemptive multitasking environment.

It looks like the FXSAVE and FXRSTOR instructions want a nice big buffer, so there is room for future expansion.

In reading AMD's documentation, however, it looks like those processors don't set or restore the exception flags. here

Seems like a flaw, but I've been away from this level of programming for quite a while. Comments?

-- Ned

ID: 755893 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 755894 - Posted: 20 May 2008, 3:41:01 UTC - in response to Message 755892.  

Try running an SSE3 app on a CPU that only supports up to SSE or SSE2.

This would be the opposite of what we've been discussing.

If the optimized application checks the processor compatibility flags, it should just terminate.

If it doesn't, and just tries the ops, it'll crash.

Either way, I don't see the point.

Then just what point were you trying to make when you answered my original post? The original comment was an aside thrown in at the end of the original and indeed had nothing to do with the main point.

Okay, I quoted the wrong bit.

If the OS fails to save and restore the context correctly, I think a crash is unlikely -- it is more likely that the result will be wrong.
ID: 755894 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 755895 - Posted: 20 May 2008, 3:49:14 UTC - in response to Message 755893.  
Last modified: 20 May 2008, 3:51:22 UTC


I have written kernel level OS context switch code in the distant past. An SSE+ context switch does require saving all the SSE registers and, in particular, careful handling of the MMX & FPU registers (which is the tricky bit that can indeed cause computation errors, and is the source of serious, hard-to-find bugs). This is handled properly at the assembly level by issuing an EMMS instruction after MMX code (to clear the MMX state so the shared FPU registers are usable again), avoiding inline assembly, and using a compiler that properly handles MMX.

There are no inline assembly instructions in the AK port, context switches are handled by compiler and SSE+ capable OSes in entirety.

Jason,

Last time I did OS level code, the 8086/8088 was pretty cool.

As I read this, it should be almost exclusively an OS issue, as the OS is doing the switching, at least in a preemptive multitasking environment.

It looks like the FXSAVE and FXRSTOR instructions want a nice big buffer, so there is room for future expansion.

In reading AMD's documentation, however, it looks like those processors don't set or restore the exception flags. here

Seems like a flaw, but I've been away from this level of programming for quite a while. Comments?

-- Ned


These are all legitimate concerns, but most of them actually stem from the days of MMX's introduction. Difficulties were experienced in preserving the MMX state, as the MMX registers are shared with the FPU registers. That DID cause a lot of problems, and still makes things tricky today, so MMX registers are often avoided in favour of SSE ones for that very reason. Context switches also used to be expensive, so special flags were set limiting preservation to only the registers required. Setting those flags wrongly is generally catastrophic and won't merely exhibit a latent minor calculation error, but usually a severe crash or application error.

As an aside:
SSE exceptions are a weird and different beast, as they are generally handled transparently by hardware, resulting in 'Assists', and 'Denormal results' are handled specially by application-defined means, not the OS. Any failure of the hardware to handle these would usually result in a computation error, or results that don't pass validation. Joe found one of these in stock code, but we won't go into that. So in a nutshell, where MMX & FPU exceptions need OS support, SSE exceptions usually do not; they are handled by hardware & application.
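
For readers unfamiliar with denormals, here is a small sketch of what one is and what "flushing to zero" means. It uses Python floats, which are IEEE 754 doubles, whereas SSE code frequently works in single precision; `is_denormal` and `flush_to_zero` are hypothetical helpers that mimic in software the kind of policy an application can request from the hardware (for example through the FTZ/DAZ bits of the MXCSR register).

```python
import sys

# Smallest positive *normal* double; anything nonzero below it is denormal.
SMALLEST_NORMAL = sys.float_info.min


def is_denormal(x: float) -> bool:
    """True for nonzero values too small to be stored as a normal number."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL


def flush_to_zero(x: float) -> float:
    """Software stand-in for a flush-to-zero policy: denormals become 0.0."""
    return 0.0 if is_denormal(x) else x


tiny = SMALLEST_NORMAL / 4  # still representable, but only as a denormal
print(is_denormal(tiny), flush_to_zero(tiny))
```

Two hosts that treat such values differently can produce slightly different results from the same input, which is one way a platform variation of the kind discussed in this thread can arise.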

What I can point out is that the nature of the SSE+ vectorised code is such that, as context switches occur thousands of times per second, if register corruption were occurring, you'd know about it basically instantly. There are thorough application tests performed before any release, and these AK apps are the most thoroughly tested of any so far (so I'm told).

Jason
ID: 755895 · Report as offensive
gomeyer
Volunteer tester

Send message
Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 755897 - Posted: 20 May 2008, 3:51:03 UTC - in response to Message 755894.  

Try running an SSE3 app on a CPU that only supports up to SSE or SSE2.

This would be the opposite of what we've been discussing.

If the optimized application checks the processor compatibility flags, it should just terminate.

If it doesn't, and just tries the ops, it'll crash.

Either way, I don't see the point.

Then just what point were you trying to make when you answered my original post? The original comment was an aside thrown in at the end of the original and indeed had nothing to do with the main point.

Okay, I quoted the wrong bit.

If the OS fails to save and restore the context correctly, I think a crash is unlikely -- it is more likely that the result will be wrong.

Now that I agree with.
ID: 755897 · Report as offensive
Hofman's Atlantic
Volunteer tester

Send message
Joined: 6 Jan 05
Posts: 32
Credit: 11,359,969
RAC: 0
United States
Message 755898 - Posted: 20 May 2008, 3:55:28 UTC

Stupid newbie-type question: How does BOINC determine CPU features?
ID: 755898 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 755901 - Posted: 20 May 2008, 4:07:42 UTC - in response to Message 755898.  
Last modified: 20 May 2008, 4:08:27 UTC

Stupid newbie-type question: How does BOINC determine CPU features?

I believe, perhaps incorrectly, that Boinc uses OS Calls, which is not the method recommended by Intel or AMD.
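
To show what asking the OS can look like, here is a sketch that parses the feature list the Linux kernel publishes in /proc/cpuinfo. This is an assumption made for illustration, not a description of BOINC's actual code; `parse_cpuinfo_flags` and the `SAMPLE` text are hypothetical, and on Windows a different OS interface would be involved.

```python
def parse_cpuinfo_flags(cpuinfo_text: str) -> set:
    """Return the flag names from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Made-up excerpt in the style of /proc/cpuinfo on an x86 Linux host.
SAMPLE = "processor\t: 0\nflags\t\t: fpu mmx fxsr sse sse2 pni ssse3\n"

flags = parse_cpuinfo_flags(SAMPLE)
# Linux reports SSE3 under the name "pni".
print("SSE3 reported:", "pni" in flags)
print("SSE4.1 reported:", "sse4_1" in flags)
```

On a real Linux host the text would come from reading the file, for example `open("/proc/cpuinfo").read()`.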
ID: 755901 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 755902 - Posted: 20 May 2008, 4:12:22 UTC - in response to Message 755895.  


As an aside:
SSE exceptions are a weird and different beast, as they are generally handled transparently by hardware, resulting in 'Assists', and 'Denormal results' are handled specially by application-defined means, not the OS. Any failure of the hardware to handle these would usually result in a computation error, or results that don't pass validation. Joe found one of these in stock code, but we won't go into that. So in a nutshell, where MMX & FPU exceptions need OS support, SSE exceptions usually do not; they are handled by hardware & application.

What I can point out is that the nature of the SSE+ vectorised code is such that, as context switches occur thousands of times per second, if register corruption were occurring, you'd know about it basically instantly. There are thorough application tests performed before any release, and these AK apps are the most thoroughly tested of any so far (so I'm told).

Jason

Most of what I'm doing these days is real-time, interrupt-driven, and network intensive -- but it doesn't even think about floating point, it's all integer math (and mostly shifts and masks). I like the C3 CPU a lot.

... and I know you guys are doing thorough testing, and doing it right.

My worry would be some other application using some advanced flags that aren't used in Outlook or Internet Explorer or MS-Office. That could be very long odds. It's unlikely that most of the crunchers here would ever see that.

Either way, I think it's an OS error, not an application error.

Thanks for your comments -- Ned
ID: 755902 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 755903 - Posted: 20 May 2008, 4:17:44 UTC - in response to Message 755902.  

...
Most of what I'm doing these days is real-time, interrupt-driven, and network intensive -- but it doesn't even think about floating point, it's all integer math (and mostly shifts and masks). I like the C3 CPU a lot.
...
Oh, SSE+ does integer math too. Really fast.

ID: 755903 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.