New SETI Rig
Author | Message |
---|---|
I3APR Send message Joined: 23 Apr 16 Posts: 99 Credit: 70,717,488 RAC: 0 |
Ok, take two. I was told 8 CPUs (with HT) are not enough to drive 5/6 GPUs, so I've got 48. The new setup: - 2 x Intel Xeon E5-2670 v3 - ASRock EP2C612 WS dual Xeon - 32 GB Kingston RAM - 2 x Noctua NH-U9DX i4 cooler - 1 x Samsung SSD 750 Evo, 250 GB - Corsair HX1200i PSU (will be replaced with a Corsair AX1500i) So, as Al knows, I was struggling with the fact that the CPUs I've got are ES (engineering samples), so the first thing to do was to check their compatibility with the other components. I'll spare you the unboxing, but I want to point out that I received a semi-defective cooler from Amazon: since it should not harm the system, I decided to use it anyway. So here are some pictures, taken while I was building it: The mobo, with the Xeons, ready for some thermal paste and the coolers: First CPU + cooler ready; RAM also installed: Et voilà: after connecting the SSD, time to power it up. And BAM!! 48 CPUs!! Here's CPU-Z: Now, I will keep the system up and running for the next 24 hours to check stability, then I will try to add the GPUs. A. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
If you haven't already done it, I'd suggest running the Lunatics installer and installing the Lunatics CPU AVX application. It will give a big boost over the stock AVX application. Grant Darwin NT |
spitfire_mk_2 Send message Joined: 14 Apr 00 Posts: 563 Credit: 27,306,885 RAC: 0 |
Love the 100 MHz bus speed. Brought back some Socket 7 Pentium memories. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
"Love the 100 MHz bus speed. Brought back some Socket 7 Pentium memories." With quad memory channels & 2 QPI links, it's got a bit more bandwidth than those older systems. Grant Darwin NT |
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Yaay, it's running! Gratz, I was pulling for you that it would work. Now you need to put a load onto it and see if you get any errors, and then go to town. :-) You might still want to contact Amazon about the damaged cooler, if they would advance ship you a replacement, you could still run it till the new one arrived, and then swap it out and ship the defective one back? Just a thought, I know it would bug me every time I looked at it, but that's just me. |
I3APR Send message Joined: 23 Apr 16 Posts: 99 Credit: 70,717,488 RAC: 0 |
Yes Al, thank you, it is running, but nothing like the other ASRock mobo, which was running steady and rock solid with 5 GPUs after a couple of weeks... :-( With this system I have had BSODs without a GPU, but especially WITH a 1080 GPU hooked up via a USB riser. Getting Windows 10 to recognize it was a real nightmare, with system freezes and the need to update... I estimate that I had to reboot the system at least 30 times before getting it to work; lucky for me, the mobo has onboard VGA, otherwise it would have been a much worse problem: in order to make it work, I had to remove the GPU, update to Win 10 ver 1511, then plug it in, and only then was I able to install the Nvidia drivers. But of course, after 20 minutes I ran into a BSOD, so I gave up and tried something different: - removed the GPU (the system ran fine for 5 hours, then BSODed) - updated the mobo BIOS to the latest available - disabled HT (to have physical cores reserved for the GPU) - disabled Intel Turbo Boost Technology, to keep the CPUs from running above nominal speed - set PCIe link width from Auto to the lowest value possible (x4), although the risers are x1 - set PCIe link speed from "Auto" to "Gen 1" About this last setting, I found this: Summary of PCI Express Interface Parameters: Base Clock Speed: PCIe 3.0 = 8.0GHz, PCIe 2.0 = 5.0GHz, PCIe 1.1 = 2.5GHz Data Rate: PCIe 3.0 = 1000MB/s, PCIe 2.0 = 500MB/s, PCIe 1.1 = 250MB/s Total Bandwidth: (x16 link): PCIe 3.0 = 32GB/s, PCIe 2.0 = 16GB/s, PCIe 1.1 = 8GB/s Data Transfer Rate: PCIe 3.0 = 8.0GT/s, PCIe 2.0= 5.0GT/s, PCIe 1.1 = 2.5GT/s So I thought that lowering the base clock speed could help prevent BSODs. I tell ya: it is really frustrating, and the impression is that of a very unstable system, and with only one GPU!! What next? Well... let the system run, of course, and see if it gets any more stable, but since I have genuine licenses of Windows 8.1 Ent and Windows Server 2012 R2, I might give them a try. A. |
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Ruh Roh, Raggy.. I am just making a guess here, but I have got a gut suspicion that the early release CPU is maybe biting you in the backside. This one is as you mentioned one of the earliest ones, probably has some things disabled or not there, other things might not be quite up to speed with what the system board is expecting. I was afraid that this might happen, of course I don't know for certain, but with all the struggles you have described, my guess is that it is the CPU's. I have an idea for you, and it won't cost much money. Go and find a pair of known good, low end Xeon CPU's that are compatible with this board, maybe V1 or V2, they should be fairly easy to find for a great price, if they have low core counts, because everyone wants high core counts for perf reasons. At this point, you don't care about cores, you just want to confirm that either it is something else in your system or the CPU's fault for all the grief you are experiencing. More fun, I know, but at least if you have a stable running (albeit less than optimal core count) machine, you can go back to the seller of the CPUs and get your money back, and look for hopefully a release version of them, though my guess is the price will be significantly higher, sadly. Well, best of luck with this one, try not to pull too much of your hair out, and as always, keep us up to speed on your "adventure"! ;-) |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
The kitties luv new rig pics. They drool over them, as their farm is very old. New boards are very cool. Give me a couple of years until some bills are paid off, and I might be able to join ya. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
The_Matrix Send message Joined: 17 Nov 03 Posts: 414 Credit: 5,827,850 RAC: 0 |
First of all, congrats on such an insane rig. Second, limit your PCIe bus speed to "Gen2", i.e. PCIe 2.0. Your GPU will run much more stably than before. |
The_Matrix Send message Joined: 17 Nov 03 Posts: 414 Credit: 5,827,850 RAC: 0 |
You're always switching the OS; what's the problem? Why don't you upgrade to WX Pro? |
I3APR Send message Joined: 23 Apr 16 Posts: 99 Credit: 70,717,488 RAC: 0 |
@The Matrix Thanks for the hint about Gen2, I will do it. However, I did not understand what you were trying to say about WX Pro: I was running Windows 10 Enterprise on the PC. @msattler A friendly rub to all the kitties there.. ;-) I'm sure you will come up with a sublime rig when the time comes... @Grant Thank you, I was already using Lunatics. @Al Yes... I'm beginning to think those Xeons are too early a pre-production issue. I will write to the eBay seller and ask to exchange them for more recent ones or for a refund. Anyway, a couple of hours after my last message, the PC BSODed, so all the changes I made were no good, it seems.. I then decided to install Windows Server 2012 R2 Standard, and this is what I'm using right now as I write. Let's see whether the instability was caused by bad drivers/a spoiled kernel or not. I will install a plain BOINC and monitor the situation, but I'm not expecting anything good tonight. Thanks for your sympathy ;-) A. |
The_Matrix Send message Joined: 17 Nov 03 Posts: 414 Credit: 5,827,850 RAC: 0 |
Yeah, you could install Windows 10 Professional too; it runs great with dual 6-core CPUs. It should also run with 48 threads without issue. Why not? |
I3APR Send message Joined: 23 Apr 16 Posts: 99 Credit: 70,717,488 RAC: 0 |
Ok, I have some updates. 1) My first rig, the old one, died today. The OS did, and it was impossible to recover/repair, so I decided to reinstall the OS, and replaced the Radeon 7800 GPU with the Gainward GTX 1080. I was able to transfer the BOINC structure to the new installation, and it is working pretty well, with 4 GPU tasks/2 physical cores. 2) I installed Windows Server 2012 R2 on the dual-Xeon rig and it has been working flawlessly for 20 hours, with HT enabled and no GPUs. I'm beginning to think something was wrong in the way the drivers were installed for the Windows 10 installation, and the BSODs were related to that. NOW... the rig is still under test, and the fact that it is working rather well should be comforting, but... then I decided to try to install a GTX 780 Ti and, no surprise, there's no way the Nvidia drivers can be installed!! (Not that I know of..) By default, the GPU is seen by Win2012R2 as "Microsoft Basic Display Adapter" with an exclamation mark. I downloaded the latest drivers for Windows 10 64-bit, which was the closest OS I could think of, and ran the installation program. When checking the system compatibility during the installation process, it fails with an error like "This NVIDIA graphics driver is not compatible with this version of Windows". Probably Nvidia is pushing the "Quadro" line for the server market, and has no plans to make consumer GPUs work with such a server OS. I did some homework and looked for a solution... I even tried this approach: http://blog.ittoby.com/2013/02/installing-nvidia-consumer-drivers-on.html and got an oem18.inf, which I tried to install in every way I know, with no success. It seems then that I'm stuck on a dead-end road: the Windows 10 Enterprise installation sees the GPU and installed the drivers, but it is highly unstable, with lots of BSODs. On the other hand, the Windows Server 2012 R2 installation on the same hardware seems solid, but I have had no success in getting the GPU recognized by the OS (no drivers).
So I'm officially asking: does anybody know how to install Nvidia drivers for GTX 600/700/1000 series under Windows Server 2012 R2, pretty please!!?? Thank you, thank you in advance .... A. |
The_Matrix Send message Joined: 17 Nov 03 Posts: 414 Credit: 5,827,850 RAC: 0 |
I think there will be NO solution for that... the only option: get Windows 7 Professional 64-bit and upgrade to Windows 10 Pro 64-bit; only 5 days are left for the free upgrade. If it's only a cruncher/gamer system, you can forgo the security and data safety. My thoughts on this... |
The_Matrix Send message Joined: 17 Nov 03 Posts: 414 Credit: 5,827,850 RAC: 0 |
OK, it can be done that way too. You're running the 1080 on your i7 board, and the rest on your 48-core monster. :D Happy crunching. |
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
I guess at this point I'd suggest just swapping out the CPU's and getting something that should work, either a later rev of the same pre-release CPU, or a release version if you can find it at a reasonable price. Otherwise, whenever you have new releases of whatever software, be it a driver or a new version of BOINC, or an update to the OS with a service pack or something, you'll never know for sure if something weird happens if it is because of the CPU, or the new software, or whatever. I know it sucks that you'd have to wait for the new one to arrive, or spending more money if you end up getting the release version of the same proc, but in the end it may mean less pulling out of hair? Just my .02 |
I3APR Send message Joined: 23 Apr 16 Posts: 99 Credit: 70,717,488 RAC: 0 |
Al, Matrix and everybody, I really appreciate your support!! I'd like to comment on my progress... for some it may be boring, but maybe someone else can use my experience to avoid the same mistakes I made!! Ok, first of all, I must say that Windows Server 2012 R2 has been working rock solid for days, with HT & Turbo Boost enabled (thus pushing the Xeons from 2.0 to 2.4 GHz), no glitches. One of the things I could do was reinstall Windows 10 Enterprise, or maybe Windows 8.1, but from a stability POV, I prefer Win 2012. Now, I was even able to make the GPUs work!!! The earlier failure was partially my fault, since I asked a Wintel systems engineer colleague of mine to corroborate my assumption that the closest OS to W2012 was W10, and he agreed, but he (and I) were wrong: even graphically speaking, Windows Server 2012 R2 is very close to Win 8.1, so I downloaded the Win 8.1 drivers from Nvidia and the installation under Win2012 was successful!! So, the first card, the GTX 780 Ti, in slot PCIE 1 was recognized by the OS and used by BOINC. Then I decided to add a GTX 660 Ti, and the most logical (not technical tho) solution was to put it in slot 2. I was wrong: the 660 Ti card was not detected. A quick look at the mobo manual showed that, for SLI configurations with Nvidia GPUs, it suggests using slots 1 & 3 (I am not using SLI, but I believed it was a hint about which slots to use progressively), so I tried slot 3. Again I was wrong... at boot the system hung. I tried slot 5 then and... it worked. I must try to understand the logic behind the choice of PCIe slot to use, otherwise I'll end up using a "trial and error" approach.. As usual, if someone has any suggestion, please feel free to advise (the suggestion to use "Gen 2" for PCIe link speed was good, since "Auto" prevents the GPUs from being detected). I'll keep you in the loop. P.S. 
Al, just to be sure that I've tried everything, I'll re-install Windows 10 with these BIOS settings, and will also install Windows 8.1, but at this point I believe something went wrong during the driver installation at the beginning... |
J. Mileski Send message Joined: 9 Jun 02 Posts: 632 Credit: 172,116,532 RAC: 572 |
"Al, Matrix and everybody, I really appreciate your support!!" I just looked at the manual. First, the PCIe slots are numbered from bottom to top, with the white ones wired to CPU 2 and the blue ones to CPU 1; to me that suggests two separate PCIe channels. I'm guessing that the two different generations of cards are not working together on this motherboard. From your description, you ended up with one card in a white slot and one in a blue. So I believe that if you want to add another card in the future, and it is another 780 Ti, you should place it in a slot of the same color as your first card. |
The_Matrix Send message Joined: 17 Nov 03 Posts: 414 Credit: 5,827,850 RAC: 0 |
Are your chipset drivers installed yet? Intel C612 chipset drivers.... https://downloadcenter.intel.com/product/81756/Intel-C612-Chipset |
I3APR Send message Joined: 23 Apr 16 Posts: 99 Credit: 70,717,488 RAC: 0 |
"First, the PCIe slots are numbered from bottom to top, with the white ones wired to CPU 2 and the blue ones to CPU 1; to me that suggests two separate PCIe channels. I'm guessing that the two different generations of cards are not working together on this motherboard. From your description, you ended up with one card in a white slot and one in a blue. So I believe that if you want to add another card in the future, and it is another 780 Ti, you should place it in a slot of the same color as your first card." Hi J, thanks for the hint, and thanks for taking the time to read the manual of a mobo you don't own, I really appreciate that!! Well, I've been fighting all evening with this mobo and its 7 (read: seven) PCIe slots... The wiring of the PCIe slots to the CPUs is part of the equation, but what you can't see in the manual is that in the BIOS you can also set the link speed and the width, so the variables become many!! So after a couple of hours of a "trial-and-error" approach, rather frustrating I have to say, I read the manual again and decided to act as if I had to install a homogeneous 4-way SLI, except that I wanted to install 1x 780 Ti and 3x 660 Ti, NOT in SLI. I followed the instructions, setting the link width to x16 for slots 1, 3, 5, 7, set the generation to Gen2 for stability, connected an extra power lead from the PSU to the mobo for voltage/current stability, and rebooted: Surprise! All four GPUs...
;-) Started BOINC, and this is the first part of the log:
7/26/2016 9:42:32 PM | | Starting BOINC client version 7.6.22 for windows_x86_64
7/26/2016 9:42:32 PM | | log flags: file_xfer, sched_ops, task
7/26/2016 9:42:32 PM | | Libraries: libcurl/7.45.0 OpenSSL/1.0.2d zlib/1.2.8
7/26/2016 9:42:32 PM | | Data directory: C:\ProgramData\BOINC
7/26/2016 9:42:32 PM | | Running under account Andrea
7/26/2016 9:42:37 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 780 Ti (driver version 368.81, CUDA version 8.0, compute capability 3.5, 3072MB, 2879MB available, 6022 GFLOPS peak)
7/26/2016 9:42:37 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 660 Ti (driver version 368.81, CUDA version 8.0, compute capability 3.0, 2048MB, 1953MB available, 2810 GFLOPS peak)
7/26/2016 9:42:37 PM | | CUDA: NVIDIA GPU 2: GeForce GTX 660 Ti (driver version 368.81, CUDA version 8.0, compute capability 3.0, 2048MB, 1953MB available, 2810 GFLOPS peak)
7/26/2016 9:42:37 PM | | CUDA: NVIDIA GPU 3: GeForce GTX 660 Ti (driver version 368.81, CUDA version 8.0, compute capability 3.0, 2048MB, 1953MB available, 2810 GFLOPS peak)
7/26/2016 9:42:37 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 780 Ti (driver version 368.81, device version OpenCL 1.2 CUDA, 3072MB, 2879MB available, 6022 GFLOPS peak)
7/26/2016 9:42:37 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 660 Ti (driver version 368.81, device version OpenCL 1.2 CUDA, 2048MB, 1953MB available, 2810 GFLOPS peak)
7/26/2016 9:42:37 PM | | OpenCL: NVIDIA GPU 2: GeForce GTX 660 Ti (driver version 368.81, device version OpenCL 1.2 CUDA, 2048MB, 1953MB available, 2810 GFLOPS peak)
7/26/2016 9:42:37 PM | | OpenCL: NVIDIA GPU 3: GeForce GTX 660 Ti (driver version 368.81, device version OpenCL 1.2 CUDA, 2048MB, 1953MB available, 2810 GFLOPS peak)
7/26/2016 9:42:37 PM | SETI@home | Found app_info.xml; using anonymous platform
7/26/2016 9:42:37 PM | | Host name: Win2012ST
7/26/2016 9:42:37 PM | | Processor: 48 GenuineIntel Genuine Intel(R) CPU @ 2.20GHz [Family 6 Model 63 Stepping 1]
7/26/2016 9:42:37 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrand syscall nx lm avx avx2 vmx smx tm2 dca pbe fsgsbase bmi1 hle smep bmi2
7/26/2016 9:42:37 PM | | OS: Microsoft Windows Server 2012 R2: Standard x64 Edition, (06.03.9600.00)
7/26/2016 9:42:37 PM | | Memory: 31.88 GB physical, 63.88 GB virtual
7/26/2016 9:42:37 PM | | Disk: 232.54 GB total, 176.93 GB free
7/26/2016 9:42:37 PM | | Local time is UTC +2 hours
7/26/2016 9:42:37 PM | SETI@home | Found app_config.xml
7/26/2016 9:42:37 PM | | Config: use all coprocessors
And this is the power consumption (@220V) read by the Corsair Link software connected via USB to the PSU: I tell ya, not bad for a dual Xeon V3 with 48 cores @ 100% and 4 GPUs... Now, the next step would be to try to fill the remaining 3 PCIe slots, but... unfortunately I have almost run out of connections on the mighty Corsair AX1500i PSU: only two PCIe power connections left, so I'll have to wait until the GTX 1070 arrives to test it "full spread". I'm not really familiar yet with the settings in app_config.xml for a 48-core (24 physical) system with four Nvidia GPUs, but I'm opening a new thread for this. Thank you, however... A. |
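Editor's sketch: the CUDA lines in the BOINC startup log from the last post can be summarized with a few lines of Python. This is only an illustration, not a BOINC tool. The regular expression and the function name `parse_cuda_gpus` are hypothetical and fitted to the line format shown in this thread; other BOINC versions may print the lines differently.

```python
import re

# CUDA detection lines copied from the startup log in this thread.
LOG_LINES = [
    "7/26/2016 9:42:37 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 780 Ti (driver version 368.81, CUDA version 8.0, compute capability 3.5, 3072MB, 2879MB available, 6022 GFLOPS peak)",
    "7/26/2016 9:42:37 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 660 Ti (driver version 368.81, CUDA version 8.0, compute capability 3.0, 2048MB, 1953MB available, 2810 GFLOPS peak)",
    "7/26/2016 9:42:37 PM | | CUDA: NVIDIA GPU 2: GeForce GTX 660 Ti (driver version 368.81, CUDA version 8.0, compute capability 3.0, 2048MB, 1953MB available, 2810 GFLOPS peak)",
    "7/26/2016 9:42:37 PM | | CUDA: NVIDIA GPU 3: GeForce GTX 660 Ti (driver version 368.81, CUDA version 8.0, compute capability 3.0, 2048MB, 1953MB available, 2810 GFLOPS peak)",
]

# Hypothetical pattern: assumes exactly the field order seen in the log above.
GPU_RE = re.compile(
    r"CUDA: NVIDIA GPU (?P<index>\d+): (?P<name>.+?) \(.*?"
    r"(?P<mem_mb>\d+)MB, (?P<free_mb>\d+)MB available, "
    r"(?P<gflops>\d+) GFLOPS peak\)"
)


def parse_cuda_gpus(lines):
    """Return one dict per CUDA GPU line; lines that do not match are skipped."""
    gpus = []
    for line in lines:
        match = GPU_RE.search(line)
        if match:
            gpus.append({
                "index": int(match.group("index")),
                "name": match.group("name"),
                "mem_mb": int(match.group("mem_mb")),
                "gflops": int(match.group("gflops")),
            })
    return gpus


gpus = parse_cuda_gpus(LOG_LINES)
total_gflops = sum(gpu["gflops"] for gpu in gpus)
for gpu in gpus:
    print(f"GPU {gpu['index']}: {gpu['name']}, {gpu['mem_mb']} MB, {gpu['gflops']} GFLOPS peak")
print(f"{len(gpus)} GPUs, {total_gflops} GFLOPS peak in total")
```

The total is simply the sum of the "GFLOPS peak" values BOINC reports for each card; it is a theoretical figure, not measured throughput.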
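Editor's sketch: the driver mix-up discussed in this thread (Windows 10 drivers refused, Windows 8.1 drivers accepted on Windows Server 2012 R2) lines up with the kernel version BOINC prints in the log, "(06.03.9600.00)". Windows Server 2012 R2 and Windows 8.1 share kernel version 6.3, while Windows 10 is 10.0. The lookup table and the function names below are illustrative assumptions, not part of any Nvidia or Microsoft tool.

```python
# Kernel (major, minor) -> (client release, server release) on the same kernel line.
NT_FAMILIES = {
    (6, 1): ("Windows 7", "Windows Server 2008 R2"),
    (6, 2): ("Windows 8", "Windows Server 2012"),
    (6, 3): ("Windows 8.1", "Windows Server 2012 R2"),
    (10, 0): ("Windows 10", "Windows Server 2016"),
}


def parse_boinc_os_version(text):
    """Turn a BOINC-style version such as '06.03.9600.00' into (6, 3, 9600)."""
    major, minor, build = (int(part) for part in text.split(".")[:3])
    return major, minor, build


def client_sibling(version_text):
    """Name the desktop Windows release that shares the kernel version."""
    major, minor, _build = parse_boinc_os_version(version_text)
    return NT_FAMILIES[(major, minor)][0]


# Version string taken from the "OS:" line of the log in this thread.
print(client_sibling("06.03.9600.00"))
```

This only explains why the Windows 8.1 package was the natural one to try; whether a given consumer driver actually installs on a server OS still depends on the driver's own INF files.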
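Editor's sketch: the PCIe summary quoted earlier in the thread can be reproduced from the transfer rate and the line encoding. PCIe 1.x and 2.0 use 8b/10b encoding and PCIe 3.0 uses 128b/130b, so the per-lane, per-direction figures come out at 250 MB/s, 500 MB/s and roughly 985 MB/s (the quoted 1000 MB/s is rounded). The quoted x16 "total bandwidth" values of 8/16/32 GB/s count both directions. The function names are hypothetical.

```python
# PCIe generation -> (transfer rate in GT/s, payload bits, encoded bits).
PCIE_GENERATIONS = {
    "1.1": (2.5, 8, 10),     # 8b/10b encoding
    "2.0": (5.0, 8, 10),     # 8b/10b encoding
    "3.0": (8.0, 128, 130),  # 128b/130b encoding
}


def lane_bandwidth_mb_s(generation):
    """Usable bandwidth of one lane in one direction, in MB/s (1 MB = 10**6 bytes)."""
    rate_gt_s, payload_bits, encoded_bits = PCIE_GENERATIONS[generation]
    payload_gbit_s = rate_gt_s * payload_bits / encoded_bits
    return payload_gbit_s * 1000 / 8


def link_bandwidth_gb_s(generation, lanes):
    """Usable bandwidth of a link with the given lane count, one direction, in GB/s."""
    return lane_bandwidth_mb_s(generation) * lanes / 1000


for generation in PCIE_GENERATIONS:
    x1 = lane_bandwidth_mb_s(generation)
    x16 = link_bandwidth_gb_s(generation, 16)
    print(f"PCIe {generation}: x1 = {x1:.0f} MB/s, x16 = {x16:.2f} GB/s per direction")
```

For the x1 USB risers mentioned in the thread, forcing Gen2 instead of Gen3 roughly halves the per-lane ceiling (about 500 MB/s instead of about 985 MB/s); whether that matters for SETI@home GPU tasks is not something the thread measures.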
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.