SSE2, SSE3, SSSE3, etc

Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 755711 - Posted: 19 May 2008, 19:55:54 UTC

Can I draw the attention of NC readers / posters to this post by JM7 which seems to disagree violently with sentiments that have often been expressed here.

F.
ID: 755711 · Report as offensive
Profile dnolan
Joined: 30 Aug 01
Posts: 1228
Credit: 47,779,411
RAC: 32
United States
Message 755717 - Posted: 19 May 2008, 20:14:00 UTC

Very interesting. So the normal advice given out here, to use CPU-Z to figure out what you can use, is wrong?!

ID: 755717 · Report as offensive
archae86

Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 755728 - Posted: 19 May 2008, 20:45:56 UTC - in response to Message 755711.  

Can I draw the attention of NC readers / posters to this post by JM7 which seems to disagree violently with sentiments that have often been expressed here.

F.
Very interesting. I run two Conroe-generation hosts, under Windows XP SP2. Both the E6600 and the Q6600 report:

Processor features: fpu tsc pae nx sse sse2 mmx

I've been using the (relatively) new Windows port of Alex Kan's V8 code in its "SSSE3X" flavor on both.

As Windows XP SP2 and the Conroes are both pretty popular here, I suspect I have lots of company.

If, as JM7 says, we are hurting the science, we should surely downgrade to the SSE2 version, I infer. But it seems a bit odd that I'm not seeing validation errors from a system which has active enough non-BOINC use that I'd expect it does a lot of dangerous context switches.

JM7's comment quite specifically invokes the case of two processes on your system _both_ using a feature whose state is not saved by the OS. Perhaps I have been safe so far through having no SSSE3-using code on my host other than AK_v8_win_SSSE3x.exe, and through having run most of the time at settings ensuring there would steadily be one of those and three Einsteins. However, recently I've been running a far higher SETI share, and have seen one to three SETIs regularly, and four at least once.

On the other hand I _have_ had a few crashes, both of the system in general and more narrowly of the BOINC process. I've been assuming these are either bad luck or symptoms that my overclock was greater than the current workload would reliably support. (I actually backed down my Q6600 all the way from 3.006 GHz to stock 2.4, and have so far inched my way back up to 2.7).

One of my problems is that I'm not clear on which features BOINC can be expected to report, nor whether its terms will precisely match the code developer's designations. I'm quite sure, for example, that whatever the "Xeon optimization" is exactly in this series of apps, it does not, in fact, require that the processor be in any sense a Xeon in order to function properly.



ID: 755728 · Report as offensive
Profile SATAN
Joined: 27 Aug 06
Posts: 835
Credit: 2,129,006
RAC: 0
United Kingdom
Message 755745 - Posted: 19 May 2008, 21:21:30 UTC

When I had a Windows machine crunching I ran what CPU-Z told me. It was a T5500, and it ran full out for over a year without a problem.

If I recall correctly, at the time they were released Simon stated that you should run what the CPU is capable of, not what Windows is telling you.

My mother now has the laptop and has not had a single problem with it. In fact with the addition of XPSP3 the machine has seen up to a 20% increase in performance depending on what else she is doing.
ID: 755745 · Report as offensive
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 755760 - Posted: 19 May 2008, 21:29:02 UTC - in response to Message 755728.  
Last modified: 19 May 2008, 21:30:31 UTC

Can I draw the attention of NC readers / posters to this post by JM7 which seems to disagree violently with sentiments that have often been expressed here.

F.
Very interesting. I run two Conroe-generation hosts, under Windows XP SP2. Both the E6600 and the Q6600 report:

Processor features: fpu tsc pae nx sse sse2 mmx

I've been using the (relatively) new Windows port of Alex Kan's V8 code in its "SSSE3X" flavor on both.

As Windows XP SP2 and the Conroes are both pretty popular here, I suspect I have lots of company.

If, as JM7 says, we are hurting the science, we should surely downgrade to the SSE2 version, I infer. But it seems a bit odd that I'm not seeing validation errors from a system which has active enough non-BOINC use that I'd expect it does a lot of dangerous context switches.

JM7's comment quite specifically invokes the case of two processes on your system _both_ using a feature whose state is not saved by the OS. Perhaps I have been safe so far through having no SSSE3-using code on my host other than AK_v8_win_SSSE3x.exe, and through having run most of the time at settings ensuring there would steadily be one of those and three Einsteins. However, recently I've been running a far higher SETI share, and have seen one to three SETIs regularly, and four at least once.

On the other hand I _have_ had a few crashes, both of the system in general and more narrowly of the BOINC process. I've been assuming these are either bad luck or symptoms that my overclock was greater than the current workload would reliably support. (I actually backed down my Q6600 all the way from 3.006 GHz to stock 2.4, and have so far inched my way back up to 2.7).

One of my problems is that I'm not clear on which features BOINC can be expected to report, nor whether its terms will precisely match the code developer's designations. I'm quite sure, for example, that whatever the "Xeon optimization" is exactly in this series of apps, it does not, in fact, require that the processor be in any sense a Xeon in order to function properly.



Note that JM7 does not say anything about BOINC stopping or system crashes; he says the answer "will be wrong". I interpret that to mean that we should see occasional validation errors that we are unable to explain [edit] and I have never seen any validation errors that I couldn't explain, that I can recall[/edit]. I have always run based on CPU-Z. I wonder if this could be a difference between Opti's and Stock (i.e. the Opti's iron out whatever it is in the Stock App that would cause the "wrong answers")? Maybe one of the Lunatics crew might be able to shed some light here?

F.
ID: 755760 · Report as offensive
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 755768 - Posted: 19 May 2008, 21:39:58 UTC

I think JM7's argument here is partly wrong, because even the stock seti@home application (v5.27/v5.28) uses more CPU features, up to the CPU's abilities, than, for example, W2K SP4 reports as usable (BOINC shows only MMX and SSE there). And I very much doubt that Eric Korpela, who is responsible for the seti@home application, would release a science app that puts the 'Science' at risk.

_\|/_
U r s
ID: 755768 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 755770 - Posted: 19 May 2008, 21:44:35 UTC - in response to Message 755768.  

I think JM7's argument here is partly wrong, because even the stock seti@home application (v5.27/v5.28) uses more CPU features, up to the CPU's abilities, than, for example, W2K SP4 reports as usable (BOINC shows only MMX and SSE there). And I very much doubt that Eric Korpela, who is responsible for the seti@home application, would release a science app that puts the 'Science' at risk.

JM7 is not referring to the stock application, but the enhanced applications released by others.

Most of those use instruction sets that go beyond the original x86 instruction set, which is not a problem.

But, if there is some unique SSE3 register used in the SSE3 "V8" application, and the underlying OS does not properly save that register, then the register could change when the OS switches tasks.

That would introduce some randomness into the results.

... or a lot of randomness, depending.
ID: 755770 · Report as offensive
Profile Paul D Harris
Volunteer tester

Joined: 1 Dec 99
Posts: 1122
Credit: 33,600,005
RAC: 0
United States
Message 755774 - Posted: 19 May 2008, 21:48:48 UTC

According to Wikipedia's article on SSSE3 (http://en.wikipedia.org/wiki/SSSE3), it is an official extension.
ID: 755774 · Report as offensive
Profile Paul D Harris
Volunteer tester

Joined: 1 Dec 99
Posts: 1122
Credit: 33,600,005
RAC: 0
United States
Message 755776 - Posted: 19 May 2008, 21:50:28 UTC - in response to Message 755774.  
Last modified: 19 May 2008, 21:50:59 UTC

According to Wikipedia's article on SSSE3 (http://en.wikipedia.org/wiki/SSSE3), it is an official extension.

I made a correction to the link
ID: 755776 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 755796 - Posted: 19 May 2008, 22:24:16 UTC - in response to Message 755745.  

When I had a Windows machine crunching I ran what CPU-Z told me. It was a T5500, and it ran full out for over a year without a problem.

If I recall correctly, at the time they were released Simon stated that you should run what the CPU is capable of, not what Windows is telling you.

My mother now has the laptop and has not had a single problem with it. In fact with the addition of XPSP3 the machine has seen up to a 20% increase in performance depending on what else she is doing.

Ah, but what version of Windows?

SSE adds a new set of registers, XMM0 through XMM7. They're 128 bits wide.

Let's say that SETI is running, and then the OS switches tasks. There are values in XMM0 that represent intermediate calculations.

The OS (let's say NT 4.0 since that is a "safe" choice) does not preserve XMM0 because XMM0 didn't exist when it was developed.

The new task runs for a bit, and it uses SSE, and changes XMM0.

The task switch goes back to SETI, but the prior value in XMM0 is not restored, and SETI continues with the wrong value in the register.

If you run a fairly new version of Windows, that is SSE-aware, it's not a problem. If your machine is a dedicated cruncher and nothing else touches the XMM registers, it's not a problem.

JM7's warning isn't "stick to the lowest safe instruction set" but be aware that you might have trouble if you run an older OS, a newer CPU, and the best-match optimized SETI application.
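
For anyone who wants the failure mode spelled out, here is a toy Python model of the scenario just described. It is only a sketch under stated assumptions: ToyCPU, Task, switch and run_scenario are made-up names, the "registers" are plain dictionary entries, and no real kernel works this way internally. It shows one thing only: if the list of registers saved at a task switch leaves out XMM0, the first task resumes with whatever the second task left there.

```python
# Toy model of a task switch; not real kernel code. All names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    saved_state: dict = field(default_factory=dict)


class ToyCPU:
    def __init__(self):
        # One general-purpose register and one SSE register, for brevity.
        self.regs = {"eax": 0, "xmm0": 0}


def switch(cpu, old, new, saved_names):
    """Save the registers in `saved_names` for `old`, then restore `new`'s copy."""
    old.saved_state = {n: cpu.regs[n] for n in saved_names}
    for n, v in new.saved_state.items():
        cpu.regs[n] = v


def run_scenario(saved_names):
    cpu = ToyCPU()
    seti, other = Task("seti"), Task("other")
    cpu.regs["xmm0"] = 1234            # SETI's intermediate result
    switch(cpu, seti, other, saved_names)
    cpu.regs["xmm0"] = 9999            # the other task also uses SSE
    switch(cpu, other, seti, saved_names)
    return cpu.regs["xmm0"]            # what SETI sees when it resumes


old_os = run_scenario(["eax"])             # OS that does not know about XMM0
new_os = run_scenario(["eax", "xmm0"])     # SSE-aware OS
print(old_os, new_os)
```

With the XMM-unaware save list, SETI resumes with the other task's value; with the SSE-aware list it gets its own value back.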
ID: 755796 · Report as offensive
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 755798 - Posted: 19 May 2008, 22:28:49 UTC - in response to Message 755770.  

I think JM7's argument here is partly wrong, because even the stock seti@home application (v5.27/v5.28) uses more CPU features, up to the CPU's abilities, than, for example, W2K SP4 reports as usable (BOINC shows only MMX and SSE there). And I very much doubt that Eric Korpela, who is responsible for the seti@home application, would release a science app that puts the 'Science' at risk.

JM7 is not referring to the stock application, but the enhanced applications released by others.

Most of those use instruction sets that go beyond the original x86 instruction set, which is not a problem.

But, if there is some unique SSE3 register used in the SSE3 "V8" application, and the underlying OS does not properly save that register, then the register could change when the OS switches tasks.

That would introduce some randomness into the results.

... or a lot of randomness, depending.

But where I believe he is wrong is in his assertion that it will "harm the science". It could result in an occasional "Validate Error" and call on a third cruncher but... And I repeat, I have not got a single "Validate Error" in my logs for the last 8 months despite running Chicken SSSE3, Crunch3r SSSE3, AK SSSE3x and AK SSE4.1 under XP Home SP2 on my rig as it has progressed from an E6400 to a Q6600 to a Q9450. BOINC says that I should be running nothing higher than sse2!!

If a "Validate Error" less than once a year is the price that has to be paid, then I will stick with running the SSSE3x thank you very much.

F.
ID: 755798 · Report as offensive
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 755800 - Posted: 19 May 2008, 22:34:11 UTC - in response to Message 755796.  


<snip>
JM7's warning isn't "stick to the lowest safe instruction set" but be aware that you might have trouble if you run an older OS, a newer CPU, and the best-match optimized SETI application.

No. JM7 said categorically "When you are choosing an optimized application, use the ones that BOINC says the operating system supports, not what CPUZ says the chipset supports."

And he made a big thing of it with his bolding.

So I should be using the SSE2 Op App on my Q9450 under XP Home (since that is what BOINC tells me)??

F.
ID: 755800 · Report as offensive
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 755802 - Posted: 19 May 2008, 22:39:27 UTC

To widen JM7's line of argument, here is the part he left out:
http://en.wikipedia.org/wiki/Non-blocking_algorithm

The more or the less, blocking a task will do.
_\|/_
U r s
ID: 755802 · Report as offensive
NewtonianRefractor
Volunteer tester
Joined: 19 Sep 04
Posts: 495
Credit: 225,412
RAC: 0
United States
Message 755803 - Posted: 19 May 2008, 22:42:08 UTC
Last modified: 19 May 2008, 22:48:13 UTC

I only found this:
http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions#Registers

Because these 128-bit registers are additional program states that the operating system must preserve across task switches, they are disabled by default until the operating system explicitly enables them. This means that the OS must know how to use the FXSAVE and FXRSTOR instructions, which is the extended pair of instructions which can save all x87 and SSE register states all at once. This support was quickly added to all major IA-32 operating systems.
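
A rough Python sketch of the gating that the quoted Wikipedia passage describes. The function name sse_outcome and the result strings are made up for illustration, and the model is deliberately simplified. The bit positions are the documented ones: CPUID leaf 1 EDX bit 24 (FXSR) and bit 25 (SSE), and the OSFXSR flag at bit 9 of control register CR4, which the OS sets to say it saves state with FXSAVE/FXRSTOR.

```python
# Simplified model of how SSE is gated by the OS. Names are illustrative;
# only the bit positions come from the documented x86 layout.
CPUID_EDX_FXSR = 1 << 24   # CPU supports FXSAVE/FXRSTOR
CPUID_EDX_SSE = 1 << 25    # CPU supports SSE
CR4_OSFXSR = 1 << 9        # set by the OS once it uses FXSAVE/FXRSTOR


def sse_outcome(cpuid_edx: int, cr4: int) -> str:
    """Say what happens when a program issues an SSE instruction."""
    if not cpuid_edx & CPUID_EDX_SSE:
        return "invalid opcode: CPU has no SSE"
    if not cr4 & CR4_OSFXSR:
        return "invalid opcode: OS has not enabled SSE"
    return "executes"


cpu = CPUID_EDX_FXSR | CPUID_EDX_SSE
print(sse_outcome(cpu, 0))             # SSE-capable CPU, SSE-unaware OS
print(sse_outcome(cpu, CR4_OSFXSR))    # SSE-capable CPU, SSE-aware OS
```

If this model is right, an SSE-unaware OS would make an SSE application fault immediately, which is a different symptom from quietly producing wrong answers.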


*edit*
http://my.opera.com/reversing/blog/show.dml/430042.
Could that be helpful?
ID: 755803 · Report as offensive
archae86

Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 755809 - Posted: 19 May 2008, 22:56:41 UTC - in response to Message 755803.  

I only found this:
http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions#Registers

Because these 128-bit registers are additional program states that the operating system must preserve across task switches, they are disabled by default until the operating system explicitly enables them. This means that the OS must know how to use the FXSAVE and FXRSTOR instructions, which is the extended pair of instructions which can save all x87 and SSE register states all at once. This support was quickly added to all major IA-32 operating systems.
Fascinating. So the questions arise "1. does the operational definition of the scope of the FXSAVE/FXRSTOR pair reliably extend to whatever extensions are present in a particular x86 implementation? 2. does the OS reliably use this pair?"

It would seem (to this amateur in that part of the discipline) that if both answers are "yes", then, for the specific risk articulated here, whether the OS reports "support" for an extension level might not govern whether that extension's required state is reliably preserved.

Again, to an amateur's eye, the following references appear, at least indirectly, to support the multi-version safe hypothesis, though better eyes may find important caveats therein:

reference 1

reference 2

ID: 755809 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 755824 - Posted: 19 May 2008, 23:23:02 UTC

This is a fascinating discussion. I've been googling around since the question was posted, and the most significant feature seems to be the lack of any results from microsoft.com

I'd like to pose the question specifically in the context of standard, consumer-grade, 32-bit Windows XP - since that's what most of the world's SSE3 and SSSE3 processors will be run under.

That Wikipedia line, "This support was quickly added to all major IA-32 operating systems.", would seem to apply absolutely to the XP market: but that also implies that there might be some versions of XP (RTM, 'Release to Manufacturing'), which didn't include FXSAVE/FXRSTOR as a safety net. So, and I'm guessing here, was it added later? If so, when? An obvious place to look would be a service pack. Which one?

I'm heading towards provisional guidance of "Don't run an SSE3 or higher optimised app under Windows XP, without first applying at least SP2 and the latest hotfixes from Windows Update." That's very much a seat-of-the-pants guesstimate at this stage, but like Fred, I feel that the real-world experience of crunchers here suggests that the problem, while obviously a potential threat, is actually much less likely than John made it sound.
ID: 755824 · Report as offensive
NewtonianRefractor
Volunteer tester
Joined: 19 Sep 04
Posts: 495
Credit: 225,412
RAC: 0
United States
Message 755826 - Posted: 19 May 2008, 23:27:14 UTC - in response to Message 755824.  

This is a fascinating discussion. I've been googling around since the question was posted, and the most significant feature seems to be the lack of any results from microsoft.com

I'd like to pose the question specifically in the context of standard, consumer-grade, 32-bit Windows XP - since that's what most of the world's SSE3 and SSSE3 processors will be run under.

That Wikipedia line, "This support was quickly added to all major IA-32 operating systems.", would seem to apply absolutely to the XP market: but that also implies that there might be some versions of XP (RTM, 'Release to Manufacturing'), which didn't include FXSAVE/FXRSTOR as a safety net. So, and I'm guessing here, was it added later? If so, when? An obvious place to look would be a service pack. Which one?

I'm heading towards provisional guidance of "Don't run an SSE3 or higher optimised app under Windows XP, without first applying at least SP2 and the latest hotfixes from Windows Update." That's very much a seat-of-the-pants guesstimate at this stage, but like Fred, I feel that the real-world experience of crunchers here suggests that the problem, while obviously a potential threat, is actually much less likely than John made it sound.


I was wondering if some linux documentation might mention info about this. Since you are compiling most of the stuff though...
ID: 755826 · Report as offensive
gomeyer
Volunteer tester

Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 755831 - Posted: 19 May 2008, 23:39:46 UTC
Last modified: 19 May 2008, 23:56:42 UTC

Let me try to state this another way. I see two issues here:

- JM7's statement that "Using a processor extension that the Operating System does not support is a very bad idea." I agree with that as far as it goes . . .

and

- His apparent assumption that this implies that OSes reporting support only for SSE2 do not support SSE3. That is what the original poster to the thread was asking about. If that is what he was saying, then I would have to respectfully disagree.

As others here have already stated more or less, I THINK it was the SSE specification that added 8 new registers to the CPU which must be saved by the OS when task switching, but the SSE2 and SSE3 [edit] and SSSE3 [/edit] extensions added instructions only, not new registers, to the instruction set. So for purposes of task switching only, any OS reporting support for SSE or higher will support all three since those 8 registers will be saved by virtue of SSE.
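
To make that distinction concrete, here is a small Python sketch. The FEATURES table and the helpers decode and needs_os_save_support are hypothetical names made up for this post. The bit positions are those of CPUID leaf 1 as documented by Intel and AMD, and the "adds registers" column encodes the claim above (SSE brought in XMM0-XMM7 and MXCSR; the later levels listed reuse them). The register values at the bottom are made up for demonstration, not read from a real CPU.

```python
# Hypothetical helper for illustration; only the CPUID bit positions are real.
FEATURES = [
    # name, CPUID register, bit, adds registers the OS must save?
    ("mmx",    "edx", 23, False),  # aliases the existing x87 registers
    ("sse",    "edx", 25, True),   # introduces XMM0-XMM7 and MXCSR
    ("sse2",   "edx", 26, False),  # new instructions on the same XMM registers
    ("sse3",   "ecx", 0,  False),
    ("ssse3",  "ecx", 9,  False),
    ("sse4.1", "ecx", 19, False),
]


def decode(edx, ecx):
    """Return the names of the features whose CPUID bit is set."""
    regs = {"edx": edx, "ecx": ecx}
    return [name for name, reg, bit, _ in FEATURES if (regs[reg] >> bit) & 1]


def needs_os_save_support(names):
    """Of the given features, list the ones that added new register state."""
    adders = {name for name, _, _, adds in FEATURES if adds}
    return [n for n in names if n in adders]


# Made-up register values with the bits for MMX through SSSE3 set.
edx = (1 << 23) | (1 << 25) | (1 << 26)
ecx = (1 << 0) | (1 << 9)
present = decode(edx, ecx)
print(present)
print(needs_os_save_support(present))
```

On this reading, only the step to SSE asks anything new of the OS at task-switch time; the later levels in the table ride on the same saved registers.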

Of course if the application were to use an instruction that is not supported the app would likely crash immediately and you would know right away that you have chosen the wrong one for your platform.
ID: 755831 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 755832 - Posted: 19 May 2008, 23:45:50 UTC

Here's an interesting post from early 1998: Intel to field performance-enhancing instructions in Deschutes

It appears that the FXSAVE and FXRSTOR instructions first appeared in the 333-MHz, 0.25-micron implementation of Pentium II ... unveiled in January 1998.

And note the line "Indeed, it's expected the instructions will be exploited by Windows 98 and Windows NT 5.0, two upcoming offerings from Microsoft Corp."

That pushes the envelope back quite a long way. Don't run those Quads under Windows 95, guys!
ID: 755832 · Report as offensive
archae86

Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 755838 - Posted: 19 May 2008, 23:55:47 UTC - in response to Message 755832.  

Don't run those Quads under Windows 95, guys!
Roger that, sir.

ID: 755838 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.