Version 3 of faster SETI cruncher for Linux

Gary Zhang
Joined: 19 Apr 04
Posts: 26
Credit: 32,583
RAC: 0
Taiwan
Message 183492 - Posted: 29 Oct 2005, 5:06:02 UTC - in response to Message 183478.  

my-naparst-r3-* is built with
icc-9.0.021, ipp-5.0.043, gcc-3.4.1,
and tested on
Intel Celeron M processor 1.40GHz, 1MB cache, 438.84MB memory, 486.30MB swap.

benchmarks:

my-naparst-r3-no-prec
<wu_cpu_time>4600.744603</wu_cpu_time>
my-naparst-r3-prec
<wu_cpu_time>4902.263765</wu_cpu_time>
naparst-r3
<wu_cpu_time>4901.236921</wu_cpu_time>

my-naparst-r3-no-prec
<wu_cpu_time>4603.051252</wu_cpu_time>
my-naparst-r3-prec
<wu_cpu_time>4901.652858</wu_cpu_time>
naparst-r3
<wu_cpu_time>4901.964811</wu_cpu_time>

./rescmp/rescmp result_unit.sah my-naparst-r3-no-prec/result.sah
Result: these are weakly similar.
./rescmp/rescmp result_unit.sah my-naparst-r3-prec/result.sah
Result: these are strongly similar.
./rescmp/rescmp result_unit.sah naparst-r3/result.sah
Result: these are strongly similar.
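
To put a number on the difference between the builds, here is a minimal sketch that averages the two runs per build and reports the CPU time saved relative to the stock naparst-r3 binary. The timings are copied from the post; the dictionary layout and the function name `percent_faster` are just illustrative choices. It comes out at roughly 6% for the no-prec build.

```python
from statistics import mean

# wu_cpu_time values in seconds, copied from the two benchmark runs above.
cpu_times = {
    "my-naparst-r3-no-prec": [4600.744603, 4603.051252],
    "my-naparst-r3-prec": [4902.263765, 4901.652858],
    "naparst-r3": [4901.236921, 4901.964811],
}


def percent_faster(build, baseline="naparst-r3"):
    """Percentage of CPU time saved by `build` relative to `baseline`."""
    base = mean(cpu_times[baseline])
    return 100.0 * (base - mean(cpu_times[build])) / base


for name in cpu_times:
    print(f"{name}: {percent_faster(name):+.2f}% vs naparst-r3")
```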
ID: 183492
Harold Naparst
Volunteer tester
Joined: 11 May 05
Posts: 236
Credit: 91,803
RAC: 0
Sweden
Message 183550 - Posted: 29 Oct 2005, 12:14:30 UTC - in response to Message 183492.  



my-naparst-r3-no-prec
<wu_cpu_time>4600.744603</wu_cpu_time>
my-naparst-r3-prec
<wu_cpu_time>4902.263765</wu_cpu_time>
naparst-r3
<wu_cpu_time>4901.236921</wu_cpu_time>


./rescmp/rescmp result_unit.sah my-naparst-r3-no-prec/result.sah
Result: these are weakly similar.
./rescmp/rescmp result_unit.sah my-naparst-r3-prec/result.sah
Result: these are strongly similar.
./rescmp/rescmp result_unit.sah naparst-r3/result.sah
Result: these are strongly similar.


Gary,

So the conclusion is that the -no-prec-div flag causes a big speedup but
weak similarity, as Tetsuji has claimed?


Harold Naparst
ID: 183550
Gary Zhang
Joined: 19 Apr 04
Posts: 26
Credit: 32,583
RAC: 0
Taiwan
Message 183554 - Posted: 29 Oct 2005, 12:24:06 UTC - in response to Message 183550.  

Gary,

So the conclusion is that the -no-prec-div flag causes a big speedup but
weak similarity, as Tetsuji has claimed?


I added "-no-prec-div -no-prec-sqrt" for my-naparst-r3-no-prec, so maybe both of them cause the weak similarity.
ID: 183554
Harold Naparst
Volunteer tester
Joined: 11 May 05
Posts: 236
Credit: 91,803
RAC: 0
Sweden
Message 183555 - Posted: 29 Oct 2005, 12:24:35 UTC - in response to Message 182953.  


@Harold

Just a note

analyzeFuncs.cpp: In function 'int v_ChirpData(float*, float*, float, int, double)':
analyzeFuncs.cpp:938: error: 'align' was not declared in this scope
analyzeFuncs.cpp:938: error: '__declspec' was not declared in this scope
analyzeFuncs.cpp:938: error: expected `;' before 'float'


ACK!! I missed this message. Please send me this kind of stuff by email,
too, in case I don't see it. Anyway, I've fixed it in the tagged source version.
So if you already have a working directory based on
svn://hnaparst.homelinux.com/seti_boinc/tags/naparst-r3.0, just

cd seti_boinc
svn update

I had been testing some ICC-specific code, and I forgot to comment
out the variable declaration before committing the code to the repository.
And I didn't run a test-compile on GCC. My fault.
Harold Naparst
ID: 183555
Harold Naparst
Volunteer tester
Joined: 11 May 05
Posts: 236
Credit: 91,803
RAC: 0
Sweden
Message 183558 - Posted: 29 Oct 2005, 12:27:02 UTC - in response to Message 183554.  


I added "-no-prec-div -no-prec-sqrt" for my-naparst-r3-no-prec, so maybe both of them cause the weak similarity.


My guess is that the -no-prec-div has a larger effect than -no-prec-sqrt.
But I'm usually wrong, as we all know. But still, that's a pretty large
performance effect for the prec flags. Good work finding that.
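
For anyone wondering how a division flag can change results at all: as far as I understand it, relaxed-precision division lets the compiler use cheaper sequences such as multiplying by a reciprocal, which rounds twice instead of once. Whether that is exactly what icc did in this build is an assumption. The sketch below only shows the general effect in plain double precision; the function names are made up for illustration.

```python
def divide_exact(x, y):
    """Ordinary IEEE division: a single correctly rounded operation."""
    return x / y


def divide_via_reciprocal(x, y):
    """Multiply by a rounded reciprocal: two roundings, so the last bit can differ."""
    return x * (1.0 / y)


# Compare n / n against n * (1 / n) for small integers.
mismatches = [
    n for n in range(1, 1000)
    if divide_exact(float(n), float(n)) != divide_via_reciprocal(float(n), float(n))
]

print(f"{len(mismatches)} of 999 values differ in the last bit")
print("first few:", mismatches[:5])
```

Tiny last-bit differences like these can add up over a whole work unit, which would fit with rescmp reporting only weak similarity.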
Harold Naparst
ID: 183558
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 183586 - Posted: 29 Oct 2005, 13:45:53 UTC - in response to Message 183166.  
Last modified: 29 Oct 2005, 13:49:36 UTC


What would you think would be an appropriate maximum amount of memory
for the caches to use? Although Hans might disagree, (and I might be wrong)
my explorations seem to show that it is important to have more than 8MB cache. The chances of finding cached data are much higher if the cache is larger. Would you think that a cache of 160 MB is OK? Or is that still too large?


I would say the key objective for memory reduction would be to allow the client to run on a system with 256MB ram without swapping, so a maximum of about 160MB (a figure already mentioned) seems about right, leaving 96MB for the system, although I guess it would be nice if this could be reduced further.

Using your ratio of hits/misses analogy above, I guess you need to do some analysis to find the sweet spot.

Ned


At some point, Boinc developers specified that an application should use less than 64MB of RAM. I personally disagree, plus there is a Boinc bug that allows applications to use unlimited memory.

Anyway, I would like to have two versions of the client -- the "boinc-compliant" low memory client for the 64MB RAM, and the "fastest reasonable" client using 256MB RAM. Why 256MB? I personally do not have computers with less than 512MB RAM. Memory is so cheap these days, especially older PC133 SDRAM and DDR266 SDRAM. Using half of the RAM for Seti is OK with me.
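
The "find the sweet spot" analysis mentioned in the quote can be sketched with a toy cache simulation. This is not the client's cache: the LRU policy, the access pattern and the capacities (counted in entries, not MB) are all assumptions chosen only to show how hit rate typically grows with cache size and then flattens out.

```python
import random
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Replay `trace` through an LRU cache holding `capacity` entries."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace)


# Hypothetical access pattern: 200 distinct keys, low-numbered keys favoured.
rng = random.Random(0)
trace = [min(rng.randrange(200), rng.randrange(200)) for _ in range(20000)]

for capacity in (8, 32, 64, 160):
    print(f"capacity {capacity:3d}: hit rate {lru_hit_rate(trace, capacity):.3f}")
```

With real access traces from the client, the same loop would show where extra cache stops paying for itself.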



ID: 183586
Harold Naparst
Volunteer tester
Joined: 11 May 05
Posts: 236
Credit: 91,803
RAC: 0
Sweden
Message 183587 - Posted: 29 Oct 2005, 13:53:07 UTC - in response to Message 183586.  


What would you think would be an appropriate maximum amount of memory
for the caches to use? Although Hans might disagree, (and I might be wrong)
my explorations seem to show that it is important to have more than 8MB cache. The chances of finding cached data are much higher if the cache is larger. Would you think that a cache of 160 MB is OK? Or is that still too large?


I would say the key objective for memory reduction would be to allow the client to run on a system with 256MB ram without swapping, so a maximum of about 160MB (a figure already mentioned) seems about right, leaving 96MB for the system, although I guess it would be nice if this could be reduced further.

Using your ratio of hits/misses analogy above, I guess you need to do some analysis to find the sweet spot.

Ned


At some point, Boinc developers specified that an application should use less than 64MB of RAM. I personally disagree, plus there is a Boinc bug that allows applications to use unlimited memory.

Anyway, I would like to have two versions of the client -- the "boinc-compliant" low memory client for the 64MB RAM, and the "fastest reasonable" client using 256MB RAM. Why 256MB? I personally do not have computers with less than 512MB RAM. Memory is so cheap these days, especially older PC133 SDRAM and DDR266 SDRAM. Using half of the RAM for Seti is OK with me.



If Hans is correct, which I suspect he is, we'll be able to produce
a client that uses only 8MB cache and has no performance penalty.
Hopefully soon.


Harold Naparst
ID: 183587
Ned Slider
Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 183589 - Posted: 29 Oct 2005, 14:02:51 UTC - in response to Message 183587.  


If Hans is correct, which I suspect he is, we'll be able to produce
a client that uses only 8MB cache and has no performance penalty.
Hopefully soon.



Well that would indeed be brilliant!

Personally, my feeling is that a little more is tolerable, but there comes a point where it's no longer ideal.

Say the core client uses approx 22MB, an 8MB cache takes it to 30MB. A 32MB cache (54MB total) would still be fine and a 64MB cache (86MB total) is probably heading towards the upper limit that would be acceptable to everyone.

Obviously it's about finding a compromise between performance and memory usage, and if that can indeed be achieved with an 8MB cache then Hans and yourself definitely deserve a pint and a night off!
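
The totals are just the base footprint plus the cache size. A minimal sketch, assuming the roughly 22MB base figure holds regardless of cache size (the constant and function names are illustrative):

```python
BASE_MB = 22  # approximate footprint of the client without cache, as estimated above


def total_footprint_mb(cache_mb, base_mb=BASE_MB):
    """Estimated resident memory for a given cache size, in MB."""
    return base_mb + cache_mb


for cache_mb in (8, 32, 64):
    print(f"{cache_mb}MB cache -> about {total_footprint_mb(cache_mb)}MB total")
```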

Ned
*** My Guide to Compiling Optimised BOINC and SETI Clients ***
*** Download Optimised BOINC and SETI Clients for Linux Here ***
ID: 183589
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20326
Credit: 7,508,002
RAC: 20
United Kingdom
Message 183680 - Posted: 29 Oct 2005, 17:45:14 UTC - in response to Message 183213.  
Last modified: 29 Oct 2005, 17:46:35 UTC

... Do I guess right that I'd better trim that down to just:

.boinc/projects/setiathome.berkeley.edu/app_info.xml
<app_info>
    <app>
        <name>setiathome</name>
    </app>
    <file_info>
        <name>setiathome-4.07.athlon-xp-static-pc-linux-gnu</name>
        <executable/>
    </file_info>
</app_info>

Well that little lot didn't work!

This does though:
<app_info>
    <app>
        <name>setiathome</name>
    </app>
    <file_info>
        <name>setiathome-4.07.athlon-xp-static-pc-linux-gnu</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>setiathome</app_name>
        <version_num>470</version_num>
        <file_ref>
            <file_name>setiathome-4.07.athlon-xp-static-pc-linux-gnu</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>
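
The only difference between the two files is the <app_version> block that ties the executable to the application. Here is a small sketch that checks an app_info.xml for that: it lists any <file_info> name that no <app_version>/<file_ref> points at. It is based only on the two examples above, not on the full BOINC format, and the function name is made up.

```python
import xml.etree.ElementTree as ET

EXE = "setiathome-4.07.athlon-xp-static-pc-linux-gnu"

# The trimmed file that did not work.
TRIMMED = f"""<app_info>
    <app><name>setiathome</name></app>
    <file_info><name>{EXE}</name><executable/></file_info>
</app_info>"""

# The file that did work: same as above plus an <app_version> block.
WORKING = f"""<app_info>
    <app><name>setiathome</name></app>
    <file_info><name>{EXE}</name><executable/></file_info>
    <app_version>
        <app_name>setiathome</app_name>
        <version_num>470</version_num>
        <file_ref><file_name>{EXE}</file_name><main_program/></file_ref>
    </app_version>
</app_info>"""


def unreferenced_files(app_info_xml):
    """Return <file_info> names that no <app_version>/<file_ref> refers to."""
    root = ET.fromstring(app_info_xml)
    declared = {fi.findtext("name") for fi in root.findall("file_info")}
    referenced = {
        ref.findtext("file_name") for ref in root.findall("app_version/file_ref")
    }
    return sorted(declared - referenced)


print("trimmed:", unreferenced_files(TRIMMED))
print("working:", unreferenced_files(WORKING))
```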

Now crunching with 121MBytes after a few minutes and the memory increase is slowing.

Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 183680
Crunch3r
Volunteer tester
Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
RAC: 0
Germany
Message 183831 - Posted: 30 Oct 2005, 2:44:13 UTC - in response to Message 183680.  
Last modified: 30 Oct 2005, 2:47:37 UTC

Hello,

I've updated all setiathome Linux clients for i686,P3,Athlon( K7) and Athlon XP.
And all setiathome clients for EV5, EV6 and EV67 running Linux.
All are based on Harold's 3.0 source.


URL: I WANT VERY FAST LINUX CLIENTS :-)

Join BOINC United now!
ID: 183831
Harold Naparst
Volunteer tester
Joined: 11 May 05
Posts: 236
Credit: 91,803
RAC: 0
Sweden
Message 183839 - Posted: 30 Oct 2005, 3:07:42 UTC - in response to Message 183831.  
Last modified: 30 Oct 2005, 3:08:13 UTC

Hello,

I've updated all setiathome Linux clients for i686,P3,Athlon( K7) and Athlon XP.
And all setiathome clients for EV5, EV6 and EV67 running Linux.
All are based on Harold's 3.0 source.


URL: I WANT VERY FAST LINUX CLIENTS :-)


You're going to kill me. I've released another version-- R3.1.
Get it at http://naparst.name
This one uses only 16MB cache. It could have been only 8MB, but I
couldn't bear the thought of not using any of my multi-cache code, so
I left the cache size at two for old times' sake.

I've finally realized that Hans was right all along about the chirp
rate. It needs to be a double precision variable. Now everything is good.
I don't know why I didn't listen the first time.

So the program is just as fast, but it uses practically no memory.
Onward and upward.
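
For readers wondering what single versus double precision does to a rate value, here is a generic illustration. It is not taken from the client source: the chirp rate value and the quadratic phase model (phase = 0.5 * rate * t^2) are assumptions picked for the example. Rounding a value to single precision introduces a small error, and anything that multiplies that value by t^2 magnifies it as t grows.

```python
import struct


def as_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


chirp_rate = 0.1  # hypothetical value, not a real SETI chirp rate
rounding_error = abs(as_float32(chirp_rate) - chirp_rate)


def phase_error(t):
    """Phase error at time t under the assumed model phase = 0.5 * rate * t**2."""
    return 0.5 * rounding_error * t * t


print(f"float32 rounding error in the rate: {rounding_error:.3e}")
for t in (1.0, 10.0, 100.0):
    print(f"t = {t:6.1f}: phase error {phase_error(t):.3e}")
```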
Harold Naparst
ID: 183839
Crunch3r
Volunteer tester
Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
RAC: 0
Germany
Message 183841 - Posted: 30 Oct 2005, 3:19:45 UTC - in response to Message 183839.  
Last modified: 30 Oct 2005, 3:21:42 UTC

Hello,

I've updated all setiathome Linux clients for i686,P3,Athlon( K7) and Athlon XP.
And all setiathome clients for EV5, EV6 and EV67 running Linux.
All are based on Harold's 3.0 source.


URL: I WANT VERY FAST LINUX CLIENTS :-)


You're going to kill me. I've released another version-- R3.1.
Get it at http://naparst.name
This one uses only 16MB cache. It could have been only 8MB, but I
couldn't bear the thought of not using any of my multi-cache code, so
I left the cache size at two for old times' sake.

I've finally realized that Hans was right all along about the chirp
rate. It needs to be a double precision variable. Now everything is good.
I don't know why I didn't listen the first time.

So the program is just as fast, but it uses practically no memory.
Onward and upward.


If I were in the US somewhere near you, I would throw my hands around your neck and start pushing GRRRRR :-)

It took me 6 hours to compile ...
I'll do it again :-)

It's 4:20 am here in Germany...



Join BOINC United now!
ID: 183841
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 183851 - Posted: 30 Oct 2005, 3:57:10 UTC - in response to Message 183839.  


You're going to kill me. I've released another version-- R3.1.
Get it at http://naparst.name
This one uses only 16MB cache. It could have been only 8MB, but I
couldn't bear the thought of not using any of my multi-cache code, so
I left the cache size at two for old times' sake.

Harold, what am I gonna do with those 2GB of RAM my servers got?

Just kidding.

GREAT JOB!!!


ID: 183851
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 183885 - Posted: 30 Oct 2005, 4:51:49 UTC - in response to Message 183841.  
Last modified: 30 Oct 2005, 4:52:22 UTC


It took me 6 hours to compile ...
I'll do it again :-)

It's 4:20 am here in Germany...



Hi Crunch3r!

That's what daylight saving time is for :o)

Regards Hans
ID: 183885
Mark Day
Joined: 19 Aug 02
Posts: 81
Credit: 502,830
RAC: 0
United States
Message 183911 - Posted: 30 Oct 2005, 5:55:49 UTC

@Harold
Nice work on the 3.1 clients: 40MB RSS near the end of a WU.

@Crunch3r
I picked up the K7+3dnow client, which works fine, but in the stderr.txt
it puts non-UTF-8 characters (°°°°°°°° etc.), which mess with some stats-gathering
programs like boincstat that look at client_state.xml. Any chance of making these
just **** like the other headers?
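
In case it helps to reproduce the problem, here is a minimal sketch. It assumes the ° characters are written as single Latin-1 bytes (0xB0), which is a guess about how the client prints its banner; the function names are illustrative. A lone 0xB0 byte is not valid UTF-8, so a strict parser will choke on it, and replacing every non-ASCII byte with * gives output that any tool can read.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def asciify(data: bytes, filler: bytes = b"*") -> bytes:
    """Replace every non-ASCII byte with the first byte of `filler`."""
    return bytes(b if b < 0x80 else filler[0] for b in data)


# Assumed banner: degree signs emitted as raw Latin-1 bytes.
banner = b"\xb0" * 4 + b" header " + b"\xb0" * 4

print("valid UTF-8 before:", is_valid_utf8(banner))
print("cleaned:", asciify(banner))
print("valid UTF-8 after:", is_valid_utf8(asciify(banner)))
```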

M
ID: 183911
Ned Slider
Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 183946 - Posted: 30 Oct 2005, 10:10:32 UTC

Nice work Harold :)

I haven't even tested 3.0 properly for you yet! - do you still want me to?

I'll definitely try the 3.1 source as I still have a little old laptop struggling on with 256MB RAM.

Ned


*** My Guide to Compiling Optimised BOINC and SETI Clients ***
*** Download Optimised BOINC and SETI Clients for Linux Here ***
ID: 183946
Ned Slider
Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 183955 - Posted: 30 Oct 2005, 10:54:19 UTC

Harold,

V3.1 compiles cleanly with GCC-4.0.1 using FFTW3 :)


I've built both dynamic and static versions for Athlon XP and am benchtesting the dynamic version now for speed against V3.0 (should have the results in an hour)

Regards,

Ned

*** My Guide to Compiling Optimised BOINC and SETI Clients ***
*** Download Optimised BOINC and SETI Clients for Linux Here ***
ID: 183955
Martin A. Boegelund
Volunteer tester
Joined: 4 Jul 00
Posts: 292
Credit: 387,485
RAC: 1
Denmark
Message 183960 - Posted: 30 Oct 2005, 11:17:47 UTC - in response to Message 183839.  


You're going to kill me. I've released another version-- R3.1.
Get it at http://naparst.name
This one uses only 16MB cache. It could have been only 8MB, but I
couldn't bear the thought of not using any of my multi-cache code, so
I left the cache size at two for old times' sake.

I've finally realized that Hans was right all along about the chirp
rate. It needs to be a double precision variable. Now everything is good.
I don't know why I didn't listen the first time.

So the program is just as fast, but it uses practically no memory.
Onward and upward.


Man!
I just started using 3.0...
Looking forward to using 3.1. I hope it will cut my crunching times, since the big cache clients are making my system use lots of swap.

KBoincspy is reporting a high estimated crunching time, 3+ hrs, it used to be around 2 hrs 15-30 mins with clients 2.6 - 2.8. I'll tell you more when I finish my current work units.

"Are you suggesting coconuts migrate?"

ID: 183960
Ned Slider
Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 183974 - Posted: 30 Oct 2005, 12:18:11 UTC

Here's some benchmark times on an Athlon XP for v3.0 versus V3.1:

V3.0 (large cache)

Real: 62:42
User: 62:31
Sys: 0:05


V3.1 (small cache)

Real: 61:30
User: 61:20
Sys: 0:05


Excellent result!

The small cache reduces the total memory used from approx 360MB down to 44MB and actually gives a very small increase in performance (probably within experimental error though). These benchmarks were on dynamic builds on a system with 512MB RAM (200MHz FSB, 5-2-2-2).
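
The figures above work out as follows; a minimal sketch using only the "Real" times and the approximate memory numbers from this post (the helper names are illustrative). The time gain is a little under 2%, while the memory drop is close to 88%.

```python
def to_seconds(mm_ss):
    """Convert a 'minutes:seconds' string such as '62:42' to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_reduction(before, after):
    """Reduction from `before` to `after`, as a percentage of `before`."""
    return 100.0 * (before - after) / before


time_gain = percent_reduction(to_seconds("62:42"), to_seconds("61:30"))
memory_gain = percent_reduction(360, 44)  # approximate MB figures from the post

print(f"wall-clock time: {time_gain:.1f}% shorter")
print(f"memory use:      {memory_gain:.1f}% lower")
```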

So Hans and Harold have succeeded in their brief to produce a client which uses less memory but retains the full speed improvement - truly amazing!

Well done,

Ned

*** My Guide to Compiling Optimised BOINC and SETI Clients ***
*** Download Optimised BOINC and SETI Clients for Linux Here ***
ID: 183974
Harold Naparst
Volunteer tester
Joined: 11 May 05
Posts: 236
Credit: 91,803
RAC: 0
Sweden
Message 183979 - Posted: 30 Oct 2005, 12:34:55 UTC - in response to Message 183974.  


So Hans and Harold have succeeded in their brief to produce a client which uses less memory but retains the full speed improvement - truly amazing!


More discomfort.
As for the source code, there are really four ideas that lead to
performance:

1) Using IPP's FFT routine, due to Tetsuji.
2) Caching trig calculations, due to Hans Dorn.
3) Caching results of invert_lcgf, due to Hans Dorn.
4) Using double precision for the chirp rate, due to Hans Dorn.

My name is nowhere to be found in this list. Although Alex Kan
and I made some contributions at one point, those contributions
don't contribute to the speedup at the present moment.

Anyway, glory to the aliens, not to us.


Harold Naparst
ID: 183979


 