Building a 32 thread Xeon system doesn't need to cost a lot
George 254 Send message Joined: 25 Jul 99 Posts: 155 Credit: 16,507,264 RAC: 19 |
Thanks for the prompt response Cruncher. Could you point me to a spec/machine/unit on eBay where I could have a closer look? |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Very tempted to get a tower unit instead of a stack of laptops. With a normal AR task running ~3.5 hours on an E5-2670 you would be looking at closer to 192-224 tasks a day for a 16c/32t system. For reference, my i5-4670 running 4 CPU tasks can complete ~96 normal AR tasks a day. With the high cost of motherboards for these Xeon CPUs, $350-400, it might be cheaper to build a pair of i5 systems. The 5th & 6th gen i5s might pump out even more tasks vs the 4th gen. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
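For anyone wanting to reproduce these back-of-envelope figures, here is a minimal Python sketch of the simple linear model implied above (tasks/day ≈ (24 / hours per task) × concurrent tasks). The model is an assumption, not project-official math — it ignores the per-task slowdown when every thread is loaded, which is part of why the quoted range is given as 192-224 rather than a single number:

```python
# Back-of-envelope tasks/day model (an assumption, not project-official math):
# tasks_per_day = (24 hours / hours per task) * concurrent CPU tasks.
def tasks_per_day(hours_per_task: float, concurrent_tasks: int) -> int:
    return int((24 / hours_per_task) * concurrent_tasks)

# Dual E5-2670 (16c/32t), ~3.5 h per normal AR task:
print(tasks_per_day(3.5, 32))  # 219, inside the 192-224 range quoted above

# An i5-4670 running 4 CPU tasks at ~96 tasks/day implies ~1 h per task:
print(tasks_per_day(1.0, 4))   # 96
```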
Sidewinder Send message Joined: 15 Nov 09 Posts: 100 Credit: 79,432,465 RAC: 0 |
There are a few other things to consider though. Xeon/server pros:
- Dual socket motherboards usually cost only slightly more than single socket
- More cores/threads
- Can save space (case reduction, etc.); moot point if you use an open air configuration
- May use less power overall compared to desktop CPUs
- Server mobos are occasionally forward compatible with CPUs (e.g. E5-2670 and E5-2670v2 support)
- Server mobos often offer BMC/IPMI
- Low power versions are usually faster than lower-end/power desktop CPUs
- Server mobos often have more PCIe slots (usually in the form of x8 or x4) that can be used for more GPUs
- More commonly found in rackmount form factors in case you want to use a rack

Cons:
- Cheaper ones don't process single WUs as fast (compare my E5-2670s to 2600K; same arch family), but more WUs overall get processed
- Often require more unusual and/or more expensive CPU coolers due to mobo layout (passive heatsinks are not very good unless in a rackmount form factor)
- Rackmount forms are usually much louder (due to small fans for cooling) |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
With a normal AR task running ~3.5 hours on an E5-2670 you would be looking at closer to 192-224 tasks a day for a 16c/32t system.
You are right... on my X79-based machine with an E5-2670 (NOT v2), it seems like about 2-2.5 hours per Arecibo MB task, so ~300-360/day. Not as low as your estimate, but not as high as mine...

EDIT: Oh, and I should mention, I just bought an ASRock EP2C602 dual socket 2011 server board (but with lots of PCIe x16 slots) for $300 at Newegg. And as someone upthread mentioned, server DDR3 1333 RAM (reg. ECC) is available in 4GB modules for < $10 each on eBay. 8 of those provides plenty of RAM for a dual socket cruncher. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
One other thing to consider is that in a few years the $300-400 motherboards for these CPUs will likely be had for <$100 once the demand for them drops off. I would file that under pros. I was speccing out the parts to update my VMware server with a pair of E5-2670s, then decided to see how the crunching performance of the Xeon CPUs compared to desktop CPUs. I knew, from a dual Xeon E5645 server I had at work, that a few much faster cores can best the performance of many much slower ones when running BOINC projects. |
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Hal, curious about that last statement, because I am in the process of putting together a 48 thread machine; should be firing it up tomorrow. I wanted to try getting a V2 processor, as well as as high a core count as I could, without breaking the bank, so I found 2 of these from Japan (per Intel, they are spec'ed specifically for the Asian market, probably to hit a price point is my guess, and are OEM only, which is the reason I couldn't find anything about them on Intel's website). They arrived over the weekend and I started assembling them on my Supermicro board:

Brand: Intel
Processor Model: Xeon E5-2692v2 (CM8063501452600)
Clock Speed: 2.2 GHz
Turbo Speed: 3.0 GHz
CPU Socket Type: Socket 2011 / LGA2011
Multi-Core Technology: 12-Core
TDP: 115 W
L1 cache: 32 KB
L2 cache: 256 KB
L3 cache: 30 MB
Bus Speed: 5 GT/s DMI
Manufacturing Process: 22 nm
64-bit Computing: Yes

(sorry for it being all smashed together, it was formatted perfectly in the window, but when posted it lost all the spacing for the columns)

Seems to me that the biggest compromise I had to make with these, to be able to actually afford them, was the lower clock speed (but on the positive side, 115 vs 130/150 W for the faster versions, so that is a small bonus). So your statement that (many) more slower cores is worse than a few much faster ones worries me a little. I already have the bits and pieces, so I am going to assemble it, but I hope that I didn't spend my money in a less than optimal way. Your thoughts on what I might expect to see? I will have the ability to add 4 PCI-E video cards to the mix as well, thinking that once the 10x0 series gets a little more fleshed out, this might make an interesting crunching rig, that is, if/when the RAC issues with the GUPPIs get worked out, because for now it sure seems to be going in the wrong direction for Nvidia GPU crunching of them... |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Brand Intel Processor Model Xeon E5-2692v2 (CM8063501452600) Clock Speed 2.2 GHz Turbo Speed 3.0 Ghz CPU Socket Type Socket 2011 / LGA2011 Multi-Core Technology 12-Core TDP 115 W L1 cache 32 KB L2 cache 256 KB L3 cache 30 MB Bus Speed 5 GT/s DMI Manufacturing Process 22 nm 64-bit Computing Yes (sorry for it being all smashed together, it was formatted perfectly in the window, but when posted it lost all the spacing for the columns) Use 'pre' tags. |
George 254 Send message Joined: 25 Jul 99 Posts: 155 Credit: 16,507,264 RAC: 19 |
Thank you Richard |
Jimbocous Send message Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349 |
Thank you Richard +1 |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Hal, curious about that last statement, because I am in the process of putting together a 48 thread machine; should be firing it up tomorrow. I wanted to try getting a V2 processor, as well as as high a core count as I could, without breaking the bank, so I found 2 of these from Japan (per Intel, they are spec'ed specifically for the Asian market, probably to hit a price point is my guess, and are OEM only, which is the reason I couldn't find anything about them on Intel's website). They arrived over the weekend and I started assembling them on my Supermicro board:

It is a bit annoying how Intel's OEM parts don't get a public spec list. Early generation Intel Macs used custom parts as well, and getting the specs for them initially was really hard. The specs for the E5-2692v2 fall in line logically below the E5-2697v2 & E5-2695v2. Sometimes their numbering scheme is actually helpful.

If I were to guesstimate the run times for the E5-2692v2:
1) I would consider the speed improvements of Ivy Bridge over Sandy Bridge.
2) I would extrapolate the increase in time that tasks take when there are more instances running.
- I imagine there might be a point where memory saturation occurs and running more tasks beyond that point is less efficient.

I would guesstimate that normal AR Arecibo tasks will be somewhere in the 3.75-4.25 hr range when running 48 at once. Probably ~250-300 tasks a day.

When I was picking out parts, a pair of E5-2670 CPUs, MB, & coolers came to $625. These CPUs have a TDP of 115 W each, or 230 W combined. A pair of i5-6500 CPUs, MBs, & coolers came to $640. These CPUs have a TDP of 65 W each, or 130 W combined. The cost for RAM is pretty much the same if configured with an equal amount, such as 16 GB per i5 or 32 GB for the Xeon. As I already have enough HDDs and PSUs around for either configuration, I didn't factor in those pieces.
Based on the times I have seen from hosts with i7-6000 CPUs, I would guess that the i5-6500 would be on par with or slightly faster than my i5-4670 CPUs. |
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Thanks for the heads up on the 'pre' idea, I will remember that one! And if I am looking at ~300 tasks completed per day, I thought I would try to get a comparison between that and my current highest concurrent (24) threaded rig, but looking at the details of it, I didn't see the # of tasks processed per day in the list. I was hoping to see that info, broken down into CPU and GPU; is there a spot where it is listed that I am not seeing?

I also have my other recent rig, Computer 794928, which has an i7-5930K CPU, a current gen (I believe) i7 proc that is fairly fast (though not the fastest available, because even with all my toys, there _is_ a limit to the budgets for these things), and has been OC'ed about 10% or so, nothing spectacular, but just a little bump over the long term, with almost no add'l heat. This one would be the comparison to the 48 core in my world, for your observation, and I will be monitoring it closely to see how they compare. I'll probably put a Kill-A-Watt onto both of them once I get the 48 core loaded with video cards, though that will probably be a little ways off, depending on when Nvidia fleshes out their product line a bit more on the 10x0 series...

You mentioned that there might be a point where memory saturation occurs and running more tasks beyond that point is less efficient. I presume that you aren't referring to a situation where there isn't enough RAM; it's that the bandwidth of the internal buses between the CPU and RAM is just clogging up trying to pass too much through too small a pipe. Would that be correct, or is it that the actual RAM quantity isn't enough? Thanks again for all your thoughts, they are appreciated. |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Thanks for the heads up on the 'pre' idea, I will remember that one! And if I am looking at ~300 tasks completed per day, I thought I would try to get a comparison between that and my current highest concurrent (24) threaded rig, but looking at the details of it, I didn't see the # of tasks processed per day in the list. I was hoping to see that info, broken down into CPU and GPU; is there a spot where it is listed that I am not seeing?

This is something you'll have to count yourself. The easiest way is to look through your stdoutdae.txt in the data directory and look for scheduler contacts, specifically "reporting [x] completed tasks", and just tally those up for a whole calendar day to get your rough average of how many tasks are completed per day.

You mentioned that there might be a point where memory saturation occurs and running more tasks beyond that point is less efficient. I presume that you aren't referring to a situation where there isn't enough RAM; it's that the bandwidth of the internal buses between the CPU and RAM is just clogging up trying to pass too much through too small a pipe. Would that be correct, or is it that the actual RAM quantity isn't enough?

RAM quantity is not particularly important. If you pull up Task Manager to see the memory usage, that gives you an idea of how much you would need. For example, my single-core machine is showing ~33 MB for current usage, but shows ~40 MB for a peak. So if you're looking to run 32 threads, just multiply those figures by 32 and add a gig or so for the OS and everything else, and that would be your absolute bare-bones minimum that you could possibly get away with (but I would say you'd probably want at least 8 GB, preferably 16 or more). Quantity isn't really the problem, though. As you alluded to, the real problem is going to be I/O contention/saturation. The memory controller is going to get absolutely hammered with 32 threads running.

It wouldn't be so bad if it were a single task with 32 threads, but it's going to be pretty rough having 32 single-threaded tasks running simultaneously. I would be willing to bet that once you get that system up and running, if you looked at Task Manager in more detail, the 'system' process is likely to have a fair bit of CPU usage all by itself, and over on the CPU graph, if you have "show kernel times" enabled, there is probably going to be a fair amount of that shown on the graph.

One of the things I would do is start off with 16 threads running, monitor the 'system' and 'kernel times' usage, and make note of the general trend for average duration to complete a few batches of tasks. Then up it to 18, 20, 22, 24, etc., see how the task durations change, and watch how much overhead starts to appear. I would venture to guess that you would probably want to leave a core or two "idle", mostly to handle that overhead.

But these are just my opinions, as I have no actual experience with that many cores/threads. I just know that the more cores you add, the less efficient they become. This is most easily noticed in multi-threaded benchmarks, such as Cinebench: after running the single-threaded test, you run the multi-threaded test, adding one more core each time. On my previous 2P Opteron 2222 rig (2x dual-cores), having all four threads running gave it 3.77x over a single thread. A quarter of a core was lost because of inefficiency and contention and overhead. I would imagine that loss would become more significant when you get 32 threads running.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
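The tallying suggestion above can be scripted. A rough sketch, assuming the client logs lines of the form "Reporting N completed tasks" in stdoutdae.txt — the exact wording varies between BOINC client versions, so the regex may need adjusting, and the sample lines below are made up for illustration:

```python
import re

# Sum the task counts from "Reporting N completed tasks" log lines.
REPORT_RE = re.compile(r"Reporting (\d+) completed tasks", re.IGNORECASE)

def tasks_reported(log_lines):
    """Total tasks reported across an iterable of log lines."""
    return sum(int(m.group(1))
               for line in log_lines
               for m in REPORT_RE.finditer(line))

# Illustrative (made-up) excerpt from one calendar day of stdoutdae.txt:
sample = [
    "05-Jun-2016 00:14:02 [SETI@home] Reporting 12 completed tasks",
    "05-Jun-2016 08:02:11 [SETI@home] Reporting 9 completed tasks",
    "05-Jun-2016 16:40:33 [SETI@home] Scheduler request completed",
]
print(tasks_reported(sample))  # 21
```

In practice you would read the real file and keep only one calendar day's lines, e.g. `tasks_reported(line for line in open("stdoutdae.txt") if line.startswith("05-Jun-2016"))`.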
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Thanks for the heads up on the 'pre' idea, I will remember that one! And if I am looking at ~300 tasks completed per day, I thought I would try to get a comparison between that and my current highest concurrent (24) threaded rig, but looking at the details of it, I didn't see the # of tasks processed per day in the list. I was hoping to see that info, broken down into CPU and GPU; is there a spot where it is listed that I am not seeing?

You can look at the Number of tasks today in the application details for the host. However, you will have to know what kind of tasks you have completed since midnight. If the host has been doing mostly shorties the number will be much higher than expected, & if there were a lot of VLARs the number will be lower than expected. This is why I calculate tasks/day by finding the time a normal AR task takes. Then it is some simple math: (24/task time)*cores. If the run time was ~4 hours on a 48 c/t host: (24/4)*48 = 288 tasks a day. If the run time was ~50 minutes on a 4 c/t host: (24/0.83)*4 = 112-116 tasks a day. I normally drop the decimal in the tasks/day before I multiply it by the number of cores. Like with 24/0.83 = 28.91: it would be either 28 or 29 tasks per core a day.

It looks like your i7-5930K does a normal AR CPU task in about 1 hr 45 min. If that is the time while running 12 CPU tasks, then 156-168 CPU tasks a day could be expected. Also, it looks like the i7-5930K is based on the E5-1650v3. You would really only want to compare the power consumption of your 24c/48t & i7-5930K systems if they were similarly loaded with GPUs; otherwise you will also be measuring the GPU power differences.

As Cosmic_Ocean mentioned, & you presumed correctly, I was referring to memory bandwidth saturation. It has been my experience that the term saturation is used when referring to bandwidth, and usage, or used, when referring to the amount, like memory or disk usage. It could be said that a disk is saturated with data when filled, but it sounds weird to me.
YMMV. I mentioned that I believe memory saturation could be an issue, as SETI@home tends to be very heavy on memory I/O, and the number of cores/threads in a CPU is increasing faster than the memory bandwidth. Being able to use the fastest memory you can is helpful. A 20% bump in memory speed is much easier than a 20% CPU overclock, as you can get memory that is designed to run at the higher speed. SETI@home run times seem to be reduced by about 75% of the memory speed increase, so a 20% bump in memory speed is about a 15% reduction in run time, at least from my observations of 1333MHz & 1600MHz DDR3 memory. Many years ago, when 100MHz & 133MHz memory were the choices, the use of 133MHz memory was a significant gain for the systems that could use it. |
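The two rules of thumb in this post can be written down as a short sketch. These encode HAL9000's stated heuristics (the drop-the-decimal tasks/day convention and the "~75% of the memory speed bump" observation), not measured constants:

```python
import math

# Tasks/day with the "drop the decimal before multiplying" convention
# described above: floor(24 / hours_per_task) * cores.
def tasks_per_day(hours_per_task: float, cores: int) -> int:
    per_core = math.floor(24 / hours_per_task)
    return per_core * cores

# Observed heuristic: run time drops by about 75% of the memory speed bump,
# e.g. a 20% faster memory clock gives roughly a 15% shorter run time.
def runtime_reduction(mem_speed_increase: float) -> float:
    return 0.75 * mem_speed_increase

print(tasks_per_day(4, 48))               # 288, matching the example above
print(round(runtime_reduction(0.20), 2))  # 0.15
```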
fractal Send message Joined: 5 Mar 16 Posts: 5 Credit: 1,000,547 RAC: 0 |
But what does seem to be a bargain right now is the X5670 Six Core 2.93 GHz/12M/6.40GT 95 watt LGA1366, which can be had for under $100 used. A pair of these could make the Z600 a pretty heavy hitter ... I suppose one of these days I'll get a couple and give it a shot...

My wife hates you ;) I thought to myself, "I know, I'll put it in the rack in the garage," when I picked up an SE316M1 with a pair of X5670s and a bunch of RAM for 400 USD delivered. Who will notice?

I notice. That thing is freaking loud. And the X5670s are cooking along at 80-90°C. But it is quick. Friends don't let friends buy 1U servers with 200 watts of processors, even if they are going to put them in the garage.

Then I spotted an X8DTE-F with a pair of low power processors for a hundred and fifty bucks. Now that's what I am talking about. I spotted some Foxconn heat sinks for 12 bucks each, and we are cooking with gas. Temperatures are in the 30s. Except I didn't have a power supply with two 8-pin EPS12V plugs. I don't mind using a splitter when running a pair of 40 watt processors, but there's no way I'll feed a pair of 95 watt processors with a single EPS12V cable and a splitter. So along comes FedEx with a new power supply. You never know when you are going to need another power supply.

I was going to do a swap and put the L5630s in the 1U in the garage and the 5670s in the X8DTE when I noticed a pair of 5675s going for the price of a pair of 5670s. So: 150 for the motherboard with some RAM, 200 for the processors, 50 for the power supply (after rebate) and 25 for the heat sinks.
That's under your $500 budget and gives:

Cpu speed from cpuinfo 2133.00Mhz
True Frequency (without accounting Turbo) 2133 MHz

Socket [0] - [physical cores=6, logical cores=12, max online cores ever=6]
  CPU Multiplier 16x || Bus clock frequency (BCLK) 133.31 MHz
  TURBO ENABLED on 6 Cores, Hyper Threading ON
  Max Frequency without considering Turbo 2266.31 MHz (133.31 x [17])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4/5/6 cores is 26x/26x/25x/25x/25x/25x
  Real Current Frequency 3287.81 MHz (Max of below)
  Core [core-id] : Actual Freq (Mult.)  C0%  Halt(C1)%  C3%  C6%  Temp
  Core 1 [0]:  3287.81 (24.66x)  99.9  0  0  0  55
  Core 2 [1]:  3287.81 (24.66x)  99.9  0  0  0  54
  Core 3 [2]:  3287.81 (24.66x)  99.9  0  0  0  54
  Core 4 [3]:  3287.81 (24.66x)  99.9  0  0  0  54
  Core 5 [4]:  3287.81 (24.66x)  99.9  0  0  0  57
  Core 6 [5]:  3287.80 (24.66x)  99.9  0  0  0  53

Socket [1] - [physical cores=6, logical cores=12, max online cores ever=6]
  CPU Multiplier 16x || Bus clock frequency (BCLK) 133.31 MHz
  TURBO ENABLED on 6 Cores, Hyper Threading ON
  Max Frequency without considering Turbo 2266.31 MHz (133.31 x [17])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4/5/6 cores is 26x/26x/25x/25x/25x/25x
  Real Current Frequency 3289.81 MHz (Max of below)
  Core [core-id] : Actual Freq (Mult.)  C0%  Halt(C1)%  C3%  C6%  Temp
  Core 1 [6]:  3289.49 (24.68x)  98.4  0  0  0  61
  Core 2 [7]:  3289.81 (24.68x)  99.9  0  0  0  54
  Core 3 [8]:  3289.81 (24.68x)  99.9  0  0  0  54
  Core 4 [9]:  3289.81 (24.68x)  99.9  0  0  0  47
  Core 5 [10]: 3289.80 (24.68x)  99.9  0  0  0  55
  Core 6 [11]: 3289.80 (24.68x)  99.9  0  0  0  59

Now I need to find a case for it... Like I said, my wife hates you ... but I don't :) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13752 Credit: 208,696,464 RAC: 304 |
I notice. That thing is freaking loud. And the X5670s are cooking along at 80-90°C.
What is the ambient temperature there? From memory, servers are generally designed with 22°C ambient temperatures in mind, and if the temperature is around that level, then even with an extended 100% CPU load on all cores, running in a case in a rack, the CPUs shouldn't get that hot.

Does the exhaust air feel extremely hot? At those CPU temperatures it should. If the CPU is hot, the heatsink should be hot. If the heatsink is hot, the exhaust air should be hot. If not, it indicates that the heatsinks aren't fitted correctly.
Grant
Darwin NT |
fractal Send message Joined: 5 Mar 16 Posts: 5 Credit: 1,000,547 RAC: 0 |
I notice. That thing is freaking loud. And the X5670s are cooking along at 80-90°C.
It is 13°C now and temperatures have dropped to 70-80°C. It can get as hot as 25°C in the afternoon. It will get warmer as summer hits.

From memory, servers are generally designed with 22°C ambient temperatures in mind, and if the temperature is around that level then even with an extended 100% CPU load on all cores & running in a case in a rack the CPUs shouldn't get that hot.
The air coming out the back is toasty warm. The top of the case is warm. Taking the top cover off reduces temperatures a little. Removing the air baffle causes the fans to ramp up as the temperatures rise. I was going to replace the 1U heat sinks with full size LGA1366 heat sinks, except the mounting holes are different. I may try remounting them. Either way, it has an annoying screech from all those itty bitty fans. |
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
I was going to replace the 1U heat sinks with full size LGA1366 heat sinks, except the mounting holes are different. I may try remounting them. Either way, it has an annoying screech from all those itty bitty fans.
I have an HP Z800 motherboard that I am working on trying to get going using mostly non-HP parts, and I have a couple of Zalman 1366 coolers, so I am running into the same issue. The fix I read about for mine not fitting properly was to use motherboard standoffs; the screw pitch is the same as a normal case screw, but the complication I've found is that they need to be longer than your typical MB standoff, as the threads in the holes are recessed below the surface of the MB. The backplate on these isn't removable — I thought of that — it is somehow mated to the CPU socket. Thought of everything, didn't you, HP? ;-) Well, there are a couple of thoughts for you. I'm still searching for those standoffs; hopefully I can find them in the US, otherwise there is always Alibaba... Good luck! |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
You can remove the backplates with a Torx screwdriver (I forget what size) from the front. You will then need standard desktop 1366 backplates that don't cover the mb hsf mounting holes. I did it and it worked very well with old i7-980x hsfs. |
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
EXCELLENT! I will give that a shot. When I looked at it, it looked sort of like a type of rivet or pressed stamping (which I thought was quite weird for something as delicate as a motherboard, but it is HP, so who knows?), as the holes didn't appear symmetrical, at least it seemed that way when I looked at it, so they must be pretty small Torx bits. Any idea as to the size they take? I'd love to get that thing torn down and assembled tonight. I'll have to start digging for my Torx set, though from what I remember I may have to get some smaller bits; I think the smallest I have is a T10, and I'd guess that this would be something between a T4 and T6? Thanks again for the great news. |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Al, I use something called the Platinum Tools Precision Screwdriver Set-33 on all my graphics cards. You should be able to get them at a computer store or order them online: http://www.bhphotovideo.com/bnh/controller/home?O=&sku=906369&gclid=Cj0KEQjw-Mm6BRDTpaLgj6K04KsBEiQA5f20E02sBVbg7izx5C71flTBEDEzOVhBxG-8ItrkPNJfmuIaAqUG8P8HAQ&is=REG&ap=y&m=Y&c3api=1876%2C92051677562%2C&Q=&A=details https://www.computercablestore.com/precision-screwdriver-set-33-pc Good luck |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.