Optimized windows clients - plz help listing cpu times

Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 141993 - Posted: 24 Jul 2005, 15:15:45 UTC

I'm sorry for hijacking your thread Speedy, I didn't realize it would take up so many posts. Sorry.

Thanks to all who have tried to help. I'll be waiting till I'm back in SC to try to fix this. Your help at that time will continue to be appreciated.

tony
ID: 141993

Speedy67 & Friends
Volunteer tester
Joined: 14 Jul 99
Posts: 335
Credit: 1,178,138
RAC: 0
Netherlands
Message 142053 - Posted: 24 Jul 2005, 16:04:49 UTC - in response to Message 141828.  


What I found out was that running a single SETI client ends up with the same CPU time both with HT enabled and disabled (give or take a couple of seconds).


That's interesting! I was always under the impression that maximum single-thread performance was only possible with HT disabled. I don't know whether a process that uses only one 'virtual' CPU can use the full L2 cache when the other 'virtual' CPU is inactive. Can anybody shed some light on this?

Greetings,
Speedy67


ID: 142053

Speedy67 & Friends
Volunteer tester
Joined: 14 Jul 99
Posts: 335
Credit: 1,178,138
RAC: 0
Netherlands
Message 142067 - Posted: 24 Jul 2005, 16:10:43 UTC

Hi all,

Thank you all for the response so far!

Could anyone send me the official 4.18 SETI client by e-mail (speedy67 at marisan.nl) so I can bench it? It might be a nice extra to see what the difference is between the optimized clients and the official 4.18.

Greetings,
Speedy67



ID: 142067

MiCrO
Joined: 5 Apr 00
Posts: 48
Credit: 43,924,114
RAC: 7
Germany
Message 142146 - Posted: 24 Jul 2005, 17:20:22 UTC - in response to Message 142053.  


What I found out was that running a single SETI client ends up with the same CPU time both with HT enabled and disabled (give or take a couple of seconds).


That's interesting! I was always under the impression that maximum single-thread performance was only possible with HT disabled. I don't know whether a process that uses only one 'virtual' CPU can use the full L2 cache when the other 'virtual' CPU is inactive. Can anybody shed some light on this?

Only programs capable of fully using the P4 (and compatible CPUs) run faster with HT disabled. For all other programs it should not make a big difference. Most will be a bit faster with HT enabled (at least if they use multiple threads).

Greez
MiCrO

ID: 142146

Jordi Valls
Joined: 10 Jun 99
Posts: 6
Credit: 1,599,185
RAC: 0
Message 142195 - Posted: 24 Jul 2005, 18:42:47 UTC

Processor:
AMD Athlon XP 2600+ Mobile(Barton core)
512 KB L2 Cache

Stock: 1995 MHz / 133 MHz FSB * 15 / 1.45 V
Overclocked: 2120 MHz / 185 MHz FSB * 11.5 / 1.5 V


System:
Barebone Shuttle SN41g2v3 (NForce2)

Memory:
1024 MB Value Kingston PC3200
185 MHz DDR
2-3-3-6

Results:
YAOSCW-K-r8.1: 8364
YAOSCW-K-r7: 8494

Note: Thread Master 90% to Seti.


Greetings,
AsDeCopes

ID: 142195

Speedy67 & Friends
Volunteer tester
Joined: 14 Jul 99
Posts: 335
Credit: 1,178,138
RAC: 0
Netherlands
Message 142214 - Posted: 24 Jul 2005, 19:31:09 UTC - in response to Message 141993.  

I'm sorry for hijacking your thread Speedy, I didn't realize it would take up so many posts. Sorry.


Don't be. :) Aren't we all here to inform each other? A little sidestep once in a while won't hurt anyone. :)

Greetings,
Speedy67



ID: 142214

Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 142228 - Posted: 24 Jul 2005, 20:00:44 UTC
Last modified: 24 Jul 2005, 20:03:21 UTC

CPU-Info:
Intel(R) Pentium(R) III CPU-S 1400MHz (Tualatin)
Family 6 Model B Stepping 4 Revision tB1
Core-Speed 1512MHz, FSB 144MHz
L1 32KB, L2 512KB,
Instructions: MMX, SSE

RAM: 256MB 2-2-2-7-9 SD-RAM, 144MHz

seti-p3........: 13228
YAOSCW-K-r7....: 12073
YAOSCW-K-r8.1..: 11751

I do have another one of these with an older stepping, and a dual PIII Coppermine, which I'll bench next.
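
For anyone who wants to compare the three clients at a glance, here is a minimal Python sketch that turns the CPU times listed above into "time saved" percentages. It assumes the figures are CPU seconds for the reference WU (the post does not state the unit), and the helper name `time_saved_percent` is made up for this illustration.

```python
# CPU times copied from the post above; the unit (seconds) is an assumption.
cpu_times = {
    "seti-p3": 13228,
    "YAOSCW-K-r7": 12073,
    "YAOSCW-K-r8.1": 11751,
}


def time_saved_percent(baseline, candidate):
    """Percentage of CPU time saved by `candidate` relative to `baseline`."""
    return (baseline - candidate) / baseline * 100.0


baseline = cpu_times["seti-p3"]
for name, cpu_time in cpu_times.items():
    saved = time_saved_percent(baseline, cpu_time)
    print(f"{name:15s} {cpu_time:6d}  saved vs seti-p3: {saved:5.1f}%")
```

The same helper works for any pair of clients, so it can be reused for the other machines reported in this thread.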

_\|/_
U r s
ID: 142228

Metod, S56RKO
Volunteer tester
Joined: 27 Sep 02
Posts: 309
Credit: 113,221,277
RAC: 9
Slovenia
Message 142234 - Posted: 24 Jul 2005, 20:08:50 UTC - in response to Message 142146.  
Last modified: 24 Jul 2005, 20:13:36 UTC


What I found out was that running a single SETI client ends up with the same CPU time both with HT enabled and disabled (give or take a couple of seconds).


That's interesting! I was always under the impression that maximum single-thread performance was only possible with HT disabled. I don't know whether a process that uses only one 'virtual' CPU can use the full L2 cache when the other 'virtual' CPU is inactive. Can anybody shed some light on this?

Only programs capable of fully using the P4 (and compatible CPUs) run faster with HT disabled. For all other programs it should not make a big difference. Most will be a bit faster with HT enabled (at least if they use multiple threads).


My impression was that a typical modern CPU (Intel, AMD or any other) has more than one integer unit and more than one FP unit. Then there's a tiny part of the processor that dispatches instructions to the appropriate units. Even a single process can be executing in several units at the same time. An example would be the execution of if-then-else code (both branches in parallel), with the decision about which results to keep made once the condition (the if part) becomes known (speculative execution, branch prediction or something similar).
Typically only a few of these units can be used at a time, and if a processor has more, they sit unused. But if you can make one processor look like two (HT), you can execute two processes at the same time and make better use of all these integer and FP units.

So, when HT is enabled and you run only one process, the process won't notice any lack of integer/FP units and will run at full speed. If you run two (or more) processes, any of them may suffer from a lack of free integer/FP units, but the processor as a whole will be better utilized.
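
A toy model may make the argument above easier to follow. The Python sketch below is only an illustration under made-up assumptions: a core with a fixed number of execution units per cycle, and threads that can each keep at most a certain number of units busy. The function `units_used` and the numbers are hypothetical and do not describe how any real P4 schedules work.

```python
def units_used(total_units, demands):
    """Hand out execution units round-robin until units or demand run out.

    total_units: units the core offers per cycle (toy assumption)
    demands:     how many units each thread could keep busy per cycle
    Returns the number of units granted to each thread.
    """
    granted = [0] * len(demands)
    free = total_units
    while free > 0 and any(g < d for g, d in zip(granted, demands)):
        for i, demand in enumerate(demands):
            if free > 0 and granted[i] < demand:
                granted[i] += 1
                free -= 1
    return granted


# One thread that can keep 3 of 4 units busy: it gets all it asks for, 1 unit idles.
print(units_used(4, [3]))     # [3]

# Two such threads on the same core: each gets less, but no unit idles.
print(units_used(4, [3, 3]))  # [2, 2]
```

In this model a lone thread runs at full speed whether or not the second logical CPU exists, while two threads each slow down but together use the whole core.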
Metod ...
ID: 142234

Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 142243 - Posted: 24 Jul 2005, 20:27:20 UTC - in response to Message 142234.  

So, when HT is enabled and you run only one process, the process won't notice any lack of integer/FP units and will run at full speed. If you run two (or more) processes, any of them may suffer from a lack of free integer/FP units, but the processor as a whole will be better utilized.

True: with HT off, you under-utilize the available resources.

With HT on, you use more of the resources, but the time to complete each result goes up ... yet the total throughput also rises ... so: more time to complete, more work done per unit time.
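
The trade-off can be put into numbers with a short Python sketch. The CPU times below are invented purely for illustration (they are not measurements from this thread), and `results_per_day` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86400


def results_per_day(seconds_per_result, concurrent):
    """Throughput when `concurrent` results are crunched side by side."""
    return concurrent * SECONDS_PER_DAY / seconds_per_result


# Hypothetical numbers: HT off runs one result in 10000 s,
# HT on runs two at once but each takes 16000 s.
ht_off = results_per_day(10000, concurrent=1)
ht_on = results_per_day(16000, concurrent=2)

print(f"HT off: {ht_off:.2f} results/day")
print(f"HT on : {ht_on:.2f} results/day")
print(f"throughput ratio: {ht_on / ht_off:.2f}x")
```

With these made-up figures each result takes 60% longer with HT on, yet the machine still returns 25% more results per day.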
ID: 142243

B.U.M.S.P.A.S.S.A.T
Joined: 10 Jun 02
Posts: 5
Credit: 669,560
RAC: 0
Germany
Message 142679 - Posted: 25 Jul 2005, 19:12:43 UTC - in response to Message 141411.  


I think for HT-enabled machines (or dual cores), it would be best to run 2 separate copies of the reference WU at the same time, thus mimicking what would most likely be used in real life, i.e., most people with HT machines are going to run 2 WUs at a time, not one.

I think this would give a more meaningful result in terms of processing times.

Ned


That is absolutely true, but on the other hand I don't think it will make any difference in regard to which client is the fastest for that CPU. I will add that extra info for my own Prescott 3.0 soon.

Greetings,
Speedy67


Well, you have to think of one other thing the BOINC developers seem not to have thought about. When my Northwood 3.0 with HT on crunches 2 units and I check the process affinity mask of each, I find that each process is allowed to run on both 'virtual' processors. It might be that this slows things down, because when a client switches CPUs during execution, the result will be a complete L2 cache refill. This is a real slowdown, at least on real multiprocessor boards. Maybe setting the affinity should be included in the BOINC software which starts the clients.
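
For readers unfamiliar with affinity masks: on Windows the process affinity mask is a bit field in which bit N set means the process may run on logical processor N. The Python sketch below only does the bit arithmetic; it does not call any operating system API, and the helper names are made up for this illustration.

```python
def mask_from_cpus(cpus):
    """Build an affinity bit mask from a list of logical CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def cpus_from_mask(mask):
    """List the logical CPU numbers whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# A mask of 3 (binary 11) means "may run on CPU 0 and CPU 1",
# which is the situation described above for each client.
print(cpus_from_mask(3))

# Pinning one client to each 'virtual' CPU would use masks 1 and 2.
print(mask_from_cpus([0]), mask_from_cpus([1]))
```

Whether pinning actually helps on an HT machine is a separate question, since both logical CPUs of one HT core share the same caches.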

st0ff
ID: 142679

Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 142711 - Posted: 25 Jul 2005, 20:13:23 UTC

Ouch! I never thought of that ...

Sounds like a case of a bug report crying out to be made ...
ID: 142711

Metod, S56RKO
Volunteer tester
Joined: 27 Sep 02
Posts: 309
Credit: 113,221,277
RAC: 9
Slovenia
Message 142952 - Posted: 26 Jul 2005, 6:26:43 UTC - in response to Message 142679.  

Well, you have to think of one other thing the BOINC developers seem not to have thought about. When my Northwood 3.0 with HT on crunches 2 units and I check the process affinity mask of each, I find that each process is allowed to run on both 'virtual' processors. It might be that this slows things down, because when a client switches CPUs during execution, the result will be a complete L2 cache refill. This is a real slowdown, at least on real multiprocessor boards. Maybe setting the affinity should be included in the BOINC software which starts the clients.


This is certainly true if you see SETI (or any BOINC project, FWIW) as the main task of your computer. I see it as a welcome task to keep the CPUs warm when the computer doesn't have anything more important to do. The developers see it the same way, and that's why all BOINC-related processes run at the lowest possible priority. I wouldn't like to see a low-priority process run with CPU affinity set, so that it doesn't get moved to other CPUs when some normal- or high-priority task starts to run (thus stealing something like 10% of CPU power from it).

In short: I'm quite happy SETI doesn't get started with CPU affinity set.

Metod ...
ID: 142952

Keck_Komputers
Volunteer tester
Joined: 4 Jul 99
Posts: 1575
Credit: 4,152,111
RAC: 1
United States
Message 143000 - Posted: 26 Jul 2005, 10:04:29 UTC

I personally would like to see BOINC set affinity too (for whenever I get a dual/HT machine).

However I do not think this will be a high priority item, ever. The main reason is: the good reliable reports I have seen only show a moderate improvement with affinity set on dual CPU machines, and on HT machines some of these reports show a decrease in performance.

The best chance of it getting in is if someone works out a way for the CPU scheduler to operate independently on multiple CPUs, and setting affinity happens as a by-product of that solution.

It also may not be possible with the CPU scheduler, especially if suspending to memory. For example: WU1 starts on CPU0 and is suspended to memory. Later WU2 starts on CPU0, then WU1 restarts. Now you have 2 WUs running on CPU0 and none running on CPU1.
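
The scenario above can be replayed with a small Python sketch. It assumes, purely for illustration, that a work unit stays pinned to the CPU it first started on and comes back there when resumed; `run_scenario` and the event format are hypothetical and have nothing to do with the real BOINC scheduler code.

```python
def run_scenario(events):
    """Replay start/suspend/resume events for work units pinned at first start.

    Each event is (action, work_unit, cpu); cpu is only used for "start".
    Returns a dict mapping CPU number to the work units running on it.
    """
    pinned = {}      # work unit -> CPU it was pinned to when it started
    running = set()  # work units currently running
    for action, wu, cpu in events:
        if action == "start":
            pinned[wu] = cpu
            running.add(wu)
        elif action == "suspend":
            running.discard(wu)
        elif action == "resume":
            running.add(wu)  # returns to its pinned CPU, whatever else runs there
    load = {}
    for wu in sorted(running):
        load.setdefault(pinned[wu], []).append(wu)
    return load


events = [
    ("start", "WU1", 0),       # WU1 starts on CPU0
    ("suspend", "WU1", None),  # WU1 is suspended to memory
    ("start", "WU2", 0),       # WU2 takes the free CPU0
    ("resume", "WU1", None),   # WU1 restarts on its pinned CPU0
]
load = run_scenario(events)
print(load)  # both work units end up on CPU 0, CPU 1 stays idle
```

Avoiding this would need the scheduler to re-pin a work unit on resume, which is the kind of per-CPU bookkeeping the post says is not there today.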
BOINC WIKI

BOINCing since 2002/12/8
ID: 143000

Don Erway
Volunteer tester
Joined: 18 May 99
Posts: 305
Credit: 471,946
RAC: 0
United States
Message 146356 - Posted: 3 Aug 2005, 4:25:16 UTC
Last modified: 3 Aug 2005, 5:13:16 UTC

I just got my new Athlon 64 3200, Venice core, to complete the reference WU in 6178 seconds, using the full-up p4-sse3 client!

I ran this at low priority while other CPU-intensive tasks were happening, but the CPU time measurement should be independent of load. Or is it not?
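
The difference between CPU time and elapsed (wall-clock) time can be seen with a few lines of Python. This is only a sketch of the general idea using the standard `time` module; it says nothing about how the SETI client itself measures its CPU time, and `measure` and `busy_then_idle` are names invented here.

```python
import time


def measure(work):
    """Run work() and return (wall_seconds, cpu_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def busy_then_idle():
    sum(i * i for i in range(200_000))  # CPU-bound part: counts as CPU time
    time.sleep(0.2)                     # waiting: wall clock advances, CPU time barely does


wall, cpu = measure(busy_then_idle)
print(f"wall: {wall:.3f} s  cpu: {cpu:.3f} s")
```

Time spent waiting for a turn on the CPU is not charged to the process, which is why CPU time is largely independent of load. It is probably not perfectly independent, though: other tasks can evict cached data, and on an HT machine a busy sibling thread competes for the same execution units.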

The result file has diffs from the reference result. Does this mean no go?

I've run the reference WU with the sse2-amd64 client. It took 6166 secs, so faster than the p4-sse3 version. The result file agrees with the one produced by the p4-sse3 client, and not with the reference result file. But I already know the sse2-amd64 client is returning valid results, because credit has already been granted.

What's the deal with the mismatch results?

I'm not going to send in times until I get another stick of memory, to run at dual DDR speed.

The sse2 BOINC CC works great as well.

Any chance of getting a non-Pentium-specific sse3 version? :)

Don



ID: 146356

W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19810
Credit: 40,757,560
RAC: 67
United Kingdom
Message 146375 - Posted: 3 Aug 2005, 5:38:16 UTC - in response to Message 143000.  

I personally would like to see BOINC set affinity too (for whenever I get a dual/HT machine).

However I do not think this will be a high priority item, ever. The main reason is: the good reliable reports I have seen only show a moderate improvement with affinity set on dual CPU machines, and on HT machines some of these reports show a decrease in performance.

The best chance of it getting in is if someone works out a way for the CPU scheduler to operate independently on multiple CPUs, and setting affinity happens as a by-product of that solution.

It also may not be possible with the CPU scheduler, especially if suspending to memory. For example: WU1 starts on CPU0 and is suspended to memory. Later WU2 starts on CPU0, then WU1 restarts. Now you have 2 WUs running on CPU0 and none running on CPU1.


I found that setting affinity was patchy, as it usually lost its setting when switching units. But I did discover that there was a performance boost if you can get it to run two different projects at the same time, i.e. one SETI and one Einstein: there could be up to a 15% improvement over running two units for one project.

The problem here is that without micromanagement it is almost impossible to get it to do this automatically.

Andy
ID: 146375

Speedy67 & Friends
Volunteer tester
Joined: 14 Jul 99
Posts: 335
Credit: 1,178,138
RAC: 0
Netherlands
Message 146538 - Posted: 3 Aug 2005, 16:43:22 UTC - in response to Message 146356.  
Last modified: 3 Aug 2005, 16:45:16 UTC


The result file has diffs from the reference result. Does this mean no go?

I've run the reference WU with the sse2-amd64 client. It took 6166 secs, so faster than the p4-sse3 version. The result file agrees with the one produced by the p4-sse3 client, and not with the reference result file. But I already know the sse2-amd64 client is returning valid results, because credit has already been granted.

What's the deal with the mismatch results?


The result files don't have to be 100% the same, but they do have to agree within certain limits. How this is calculated exactly, I don't know, but as you said, your results have already been credited, so no worries there.
More info on validation is in the BOINC Wiki, by Paul D. Buck.
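
To illustrate what "the same within certain limits" can mean, here is a minimal Python sketch that compares two lists of numbers with a relative tolerance. It is not the BOINC or SETI validator: the function `results_match`, the tolerance value, and the sample numbers are all assumptions made up for this example.

```python
import math


def results_match(reference, candidate, rel_tol=1e-5):
    """True if both lists have the same length and each pair agrees within rel_tol."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(ref, cand, rel_tol=rel_tol)
        for ref, cand in zip(reference, candidate)
    )


reference = [1.0, 2.5, 1234.5678]
optimized = [1.000001, 2.5, 1234.5679]  # tiny floating-point differences

print(results_match(reference, optimized))        # small differences are accepted
print(results_match(reference, [1.1, 2.5, 1234.5678]))  # a real difference is not
```

A plain file diff flags even the last-digit differences that different instruction sets (x87, SSE2, SSE3) can produce, so a mismatch against the reference file alone does not say the result is invalid.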

[edit: typo]

Greetings,
Speedy67



ID: 146538

Don Erway
Volunteer tester
Joined: 18 May 99
Posts: 305
Credit: 471,946
RAC: 0
United States
Message 146616 - Posted: 4 Aug 2005, 0:38:03 UTC - in response to Message 146356.  

I just got my new Athlon 64 3200, Venice core, to complete the reference WU in 6178 seconds, using the full-up p4-sse3 client!



I spoke too soon. I had the client versions swapped, and in fact the p4-sse3 version will NOT run on the Venice core, even though CPU-Z says it does have SSE3.

So a non-P4-specific sse3 client might be worth creating/trying... But it sounds like there is not much difference between sse2 and sse3, on the Linux clients anyway.

ID: 146616

Ned Slider
Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 146660 - Posted: 4 Aug 2005, 2:26:28 UTC - in response to Message 146616.  
Last modified: 4 Aug 2005, 2:26:47 UTC

I just got my new Athlon 64 3200, Venice core, to complete the reference WU in 6178 seconds, using the full-up p4-sse3 client!



I spoke too soon. I had the client versions swapped, and in fact the p4-sse3 version will NOT run on the Venice core, even though CPU-Z says it does have SSE3.

So a non-P4-specific sse3 client might be worth creating/trying... But it sounds like there is not much difference between sse2 and sse3, on the Linux clients anyway.


Don,

I built optimized Linux clients for AMD64 with both SSE2 and SSE3, and there was no advantage in using SSE3. In fact, IIRC the SSE3-enabled client was very slightly slower.

My AMD64 (SSE2) client for Linux is available on my site :)

Ned
*** My Guide to Compiling Optimised BOINC and SETI Clients ***
*** Download Optimised BOINC and SETI Clients for Linux Here ***
ID: 146660

Don Erway
Volunteer tester
Joined: 18 May 99
Posts: 305
Credit: 471,946
RAC: 0
United States
Message 146771 - Posted: 4 Aug 2005, 7:29:09 UTC - in response to Message 146660.  
Last modified: 4 Aug 2005, 7:31:16 UTC



I built optimized Linux clients for AMD64 with both SSE2 and SSE3, and there was no advantage in using SSE3. In fact, IIRC the SSE3-enabled client was very slightly slower.

My AMD64 (SSE2) client for Linux is available on my site :)

Ned


Hi Ned.

Yeah, I know. It was your reports that led me to say that it was unlikely to do any good...

I guess cranking the CPU up to 2.45 GHz or so ought to do it, and it is so easy to do with the 939 chips, even with the stock AMD retail box HSF!

Prime95 is happy as a clam.

But hey, has anyone tried memtest86 on an Athlon 64 machine? Mine won't even start to run.

Final results shortly...

Don



ID: 146771

meckano
Joined: 1 Jul 03
Posts: 130
Credit: 48,466
RAC: 0
Canada
Message 146780 - Posted: 4 Aug 2005, 8:43:03 UTC - in response to Message 146771.  

a reference unit? finally.
good go!
-----------------------
SNAFU'ed? Turn the Page! :D
ID: 146780
