PC suddenly shuts down

Message boards : Number crunching : PC suddenly shuts down
Profile Wiggo
Joined: 24 Jan 00
Posts: 34811
Credit: 261,360,520
RAC: 489
Australia
Message 1611214 - Posted: 9 Dec 2014, 7:48:13 UTC - in response to Message 1611212.  
Last modified: 9 Dec 2014, 7:48:50 UTC

I always enjoyed Dan's thermal paste review.

It's a shame that Dan only does the occasional blog, column or letter posts these days. :-(

Cheers.
ID: 1611214 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1611242 - Posted: 9 Dec 2014, 9:52:34 UTC

That was an interesting review. I was a bit disappointed to find Arctic Silver 5 so far down on the results at the end, but as Wiggo said, I've always read (and I think even the instructions/data sheet/MSDS say it, too) that it needs several heat cycles before it reaches optimum efficiency, i.e. full-load stressing for a few hours, shutting it down to cool for a few hours, and repeating 2-5 more times.

However, I do agree with the ending remarks: unless you are doing some extreme OCing, the difference seems to somewhat consistently be 1-3C between the main contenders (excluding toothpaste...lol), so for all practical purposes, they're all pretty good at what they're intended to do. It just comes down to opinions about price and ease of use, and as opinions go: there are more opinions than there are people giving them, so to each their own.

However, as I said earlier on (and in another thread not too long ago), thermal compounds dry out after 1-4 years and become noticeably less efficient at their job. Even Arctic Silver 5--despite the claims that it "never dries out"--WILL after 7 years. My XP3200+ machine got AS-5'ed back in 2006 and I was having problems with it hitting the 60C temp alarm sitting in BIOS. Tried taking the heatsink off and the CPU came out with it. Gently pried and twisted and finally got it to pop free. Cleaned it all up, applied fresh AS-5 and put it back together and it idled in the mid 40's in BIOS again. ~15C drop in temp in the same ambient temp and airflow and loading conditions, just from putting fresh goop on there.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1611242 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1611868 - Posted: 10 Dec 2014, 20:14:44 UTC

Damn I'm getting crazy, last night it shut down again. Could it be the GTX970 even though it's only 2 months old? Doing SETI or Milkyway the temp is 60-63c, but I ran out of work and started a Bitcoin task, here the temp was 69c and after 15 mins the PC shut down. Today I limited the temp to 60c with Afterburner and after 4-5 hours it shut down again... Right now I'm trying to only do 1 task instead of the 3 I normally do.

I still haven't done RAM check but they are only 2-3 weeks old.
ID: 1611868 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1611876 - Posted: 10 Dec 2014, 20:29:55 UTC - in response to Message 1611868.  
Last modified: 10 Dec 2014, 20:30:35 UTC

John,

What version of lunatics are you running?

You said you are running SIV?

What amount of memory are each of the work units using?

The normal operating range for the new Maxwell 900s is 90C so you should be good.
ID: 1611876 · Report as offensive
Dena Wiltsie
Volunteer tester

Joined: 19 Apr 01
Posts: 1628
Credit: 24,230,968
RAC: 26
United States
Message 1611892 - Posted: 10 Dec 2014, 21:11:58 UTC - in response to Message 1611868.  

Damn I'm getting crazy, last night it shut down again. Could it be the GTX970 even though it's only 2 months old? Doing SETI or Milkyway the temp is 60-63c, but I ran out of work and started a Bitcoin task, here the temp was 69c and after 15 mins the PC shut down. Today I limited the temp to 60c with Afterburner and after 4-5 hours it shut down again... Right now I'm trying to only do 1 task instead of the 3 I normally do.

I still haven't done RAM check but they are only 2-3 weeks old.

While new hardware is most likely not your problem, you should never assume that it's not. Infant mortality happens to hardware only a few weeks old, and there can be bad production lots as well. One time we had a batch of ICs with a failure rate so high that every time we found a board with chips of that date code, we replaced the chips even if the board wasn't failing.
ID: 1611892 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34811
Credit: 261,360,520
RAC: 489
Australia
Message 1611903 - Posted: 10 Dec 2014, 21:43:49 UTC - in response to Message 1611868.  

Damn I'm getting crazy, last night it shut down again. Could it be the GTX970 even though it's only 2 months old? Doing SETI or Milkyway the temp is 60-63c, but I ran out of work and started a Bitcoin task, here the temp was 69c and after 15 mins the PC shut down. Today I limited the temp to 60c with Afterburner and after 4-5 hours it shut down again... Right now I'm trying to only do 1 task instead of the 3 I normally do.

I still haven't done RAM check but they are only 2-3 weeks old.

I've had faulty new RAM here plenty of times over the years, though I'm starting to wonder about your PSU.

What make and model is it and how old?

Cheers.
ID: 1611903 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1611919 - Posted: 10 Dec 2014, 22:07:25 UTC - in response to Message 1611903.  

I've had a couple of instances so far on one cruncher where for no apparent reason, the GTX 970's just shut down to 0%. The CPU continues to process and the GPU tasks will start back up with no errors after the reboot to get the cards going again. Have a Corsair AX-1200 feeding them so don't think the PSU is the problem. I have SIV regulate the card temperature setpoint to 65 degrees so don't think the cards are overheating. Haven't had this problem on the other identical cruncher. Never saw this problem on this machine with the old 670's. Got me stumped so far.

Cheers, Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1611919 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65755
Credit: 55,293,173
RAC: 49
United States
Message 1612016 - Posted: 11 Dec 2014, 1:46:51 UTC

How old is the motherboard and do you have a 4pin or 6pin plug for the pcie slots on the motherboard?

Maybe the motherboard is in need of new heatsink compound and/or the cpu is.

If you do have one of those power plugs and you aren't using them, put a cable from the psu on it.

I had no such plug on My current motherboard, so I got a card from EVGA that goes in a pcie slot to provide some clean 12v power.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1612016 · Report as offensive
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1612054 - Posted: 11 Dec 2014, 4:36:18 UTC - in response to Message 1611919.  

I've had a couple of instances so far on one cruncher where for no apparent reason, the GTX 970's just shut down to 0%. The CPU continues to process and the GPU tasks will start back up with no errors after the reboot to get the cards going again. Have a Corsair AX-1200 feeding them so don't think the PSU is the problem. I have SIV regulate the card temperature setpoint to 65 degrees so don't think the cards are overheating. Haven't had this problem on the other identical cruncher. Never saw this problem on this machine with the old 670's. Got me stumped so far.

Cheers, Keith


What happens if you exit BOINC Manager (killing the application processes) and restart BOINC Manager?
ID: 1612054 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1612544 - Posted: 12 Dec 2014, 1:47:58 UTC - in response to Message 1612054.  
Last modified: 12 Dec 2014, 1:56:29 UTC

I think I have had the cards restart maybe one time just with a close and restart of the BOINC Manager. Mainly I have needed a restart of the BOINC client, accomplished with a computer reboot. I haven't attempted to just kill the client process and restart it from within Windows.

This isn't an issue of the video driver bugging out and restarting either, because I never get the standard video driver error message and the desktop responds normally to window manipulation. The in-process GPU tasks start right back up with no errors, I assume reading back in from their checkpoints. Also, any time the video driver craps out, you can kiss the in-process GPU tasks a cold goodbye with a computation error status.

As it stands now, I have seen it maybe 3 or 4 times in the last couple of weeks since the new 970's went in. It first happened when I was crunching Milkyway and Einstein while I was waiting for SETI to come back. Now that I have been crunching solely SETI since the project came back (because I generated a large project debt for SETI by running only MW and Einstein), I have seen it a couple of times on SETI tasks.

I have poked around on the system trying to figure out what is happening but so far, no insights. I have tried to make single-action changes to the system to see if I can catch it out and not be confused by multiple simultaneous changes made to fix the problem. It is now a wait-and-see situation. I'll post back to the thread if I get some kind of 'Aha!' moment.

Cheers, Keith

P.S. OK, now this is funny. It just happened during this post. Manager tabs respond to selection but the Tasks status page times are frozen. GPU utilization went to 0% and the fans wound down. And now .... the tasks are back running again at 99% and fans are ramping back up to get back to their 65 degree temp setpoints. The GPU's went bye-bye for about a minute and a half while I was typing this postscript. Didn't do anything with the Manager or Client.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1612544 · Report as offensive
WB8ILI

Joined: 27 May 03
Posts: 11
Credit: 12,942,299
RAC: 0
United States
Message 1612783 - Posted: 12 Dec 2014, 12:50:09 UTC

JohnDK -

I had EXACTLY the same issue a few weeks ago - the computer works OK as long as not using the GPU but shuts off when using the GPU. It would shut down within 10 seconds of starting a GPU task so it was easy to figure out.

My solution was suggested already in this thread - the power supply. A new, slightly more powerful supply resolved the problem.

My original power supply specifications were sufficient for the graphics card. I have no way of telling if it shut down because of voltage, current, or heat or whether any of those were actually out-of-range. Maybe it was one of the sensors.
ID: 1612783 · Report as offensive
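A rough way to sanity-check whether a supply that is "sufficient on paper" really has margin is to add up the main loads and compare them against a derated PSU rating. The sketch below is only an illustration of that arithmetic, not a diagnosis. The function name `psu_headroom`, the 80% derating rule of thumb, and the CPU and "rest of system" wattages are all assumptions; 145 W is NVIDIA's reference board power for the GTX 970, and factory-overclocked cards can draw more.

```python
def psu_headroom(psu_watts, loads, derate=0.8):
    """Return spare watts after derating the PSU label rating.

    derate=0.8 is an assumed rule of thumb (plan to use at most ~80% of the
    label rating, more conservatively for an older unit), not a measured value.
    """
    usable = psu_watts * derate
    return usable - sum(loads.values())


# Hypothetical load estimates in watts; only the GPU figure is a published
# reference number, the others are placeholders to be replaced with real data.
example_loads = {
    "gpu_gtx970_reference": 145,
    "cpu_assumed": 95,
    "board_drives_fans_assumed": 60,
}

spare = psu_headroom(650, example_loads)
print(f"Estimated headroom: {spare:.0f} W")
```

A positive result only says the label rating is adequate. It cannot detect an ageing or faulty unit, which is the case WB8ILI describes, so swapping in a known-good supply remains the more reliable test.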
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1612846 - Posted: 12 Dec 2014, 16:41:58 UTC - in response to Message 1612783.  

John, the symptoms are not EXACTLY as you describe. I run 6 tasks at a time on the two cards, so the tasks are at various percentages of completion at the time of the freeze. The freeze hasn't been observed to coincide with the start of any task. Also, I run SIV64 all the time on the desktop in a window right above the BOINC Manager, so it is always available to report the status of the machine. The temps, voltages, fan speeds, and GPU and CPU utilization are always visible. Everything except the GPUs reports normal. The temps are well within spec, and the voltages are well within spec. I looked at the 12V bus specifically when this problem occurred: everything normal. The CPUs are crunching 4-5 tasks at the time the GPUs freeze. The CPU utilization and load graphs continue as normal. The GPU load graphs and utilization drop towards zero, and the fan speeds start to wind down. The GPU core voltages remain normal. This lasts for about a minute or less, just long enough for me to notice the fan noise decrease. Then everything just picks back up normally as if nothing happened. No BOINC or computation errors. Still looking for answers.

Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1612846 · Report as offensive
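One way to pin down dropouts like the ones Keith describes is to log GPU utilization continuously and look for gaps afterwards, rather than relying on noticing the fan noise. The sketch below is a minimal illustration, not anything posted in the thread. It assumes a log produced by something like `nvidia-smi --query-gpu=timestamp,index,utilization.gpu --format=csv,noheader,nounits -l 5`; whether a given GeForce card and driver report these fields is itself an assumption, and the function name `find_dropouts` and the 10% threshold are hypothetical.

```python
import csv
import io
from datetime import datetime

# Timestamp layout assumed for nvidia-smi CSV output, e.g. "2014/12/12 16:40:05.000".
TIME_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def find_dropouts(log_text, busy_threshold=10):
    """Return (gpu_index, start, end) spans where utilization stayed below the threshold.

    Each log line is expected to look like: "timestamp, gpu_index, utilization_percent".
    Lines that cannot be parsed (for example "[Not Supported]") are skipped.
    """
    open_spans = {}  # gpu index -> [first idle sample, latest idle sample]
    dropouts = []
    for row in csv.reader(io.StringIO(log_text), skipinitialspace=True):
        if len(row) < 3:
            continue
        try:
            stamp = datetime.strptime(row[0].strip(), TIME_FORMAT)
            gpu = int(row[1])
            util = int(row[2])
        except ValueError:
            continue
        if util < busy_threshold:
            # Start a new idle span or extend the one already open for this GPU.
            open_spans.setdefault(gpu, [stamp, stamp])[1] = stamp
        elif gpu in open_spans:
            # The GPU is busy again, so close the idle span.
            start, end = open_spans.pop(gpu)
            dropouts.append((gpu, start, end))
    # Spans still open at the end of the log are reported as well.
    for gpu, (start, end) in open_spans.items():
        dropouts.append((gpu, start, end))
    return dropouts
```

Comparing the reported spans against the BOINC event log and the Windows event log for the same timestamps may show whether a dropout lines up with a task start, a driver event, or nothing at all.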
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1612854 - Posted: 12 Dec 2014, 17:04:02 UTC

I've also seen GPU tasks stop running; when I disable and re-enable the GPU in BOINC they continue running. Is it a GTX970-specific problem?

I've done a RAM test with a diagnostic tool included in the BIOS boot options menu; it took about 30 mins and found no errors.

I could smell something burning/melting or whatever again last night, and 5 mins later the PC shut down. GPU temp was 55/56c. The smell isn't something new btw, it's been there before over time, but it went away and nothing happened.

The PSU is a 3-year-old Cooler Master GX 650W. The GTX970 is from Oct 7.

Which PSU should I try getting?
ID: 1612854 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1612865 - Posted: 12 Dec 2014, 17:34:48 UTC - in response to Message 1612016.  

How old is the motherboard and do you have a 4pin or 6pin plug for the pcie slots on the motherboard?

Maybe the motherboard is in need of new heatsink compound and/or the cpu is.

If you do have one of those power plugs and you aren't using them, put a cable from the psu on it.

I had no such plug on My current motherboard, so I got a card from EVGA that goes in a pcie slot to provide some clean 12v power.

PC is a Dell XPS 8100 and is 5 years old next year. There is a 4-pin plug on the motherboard and I have it connected to the PSU, but I have no idea if it is for the pcie slot; the manual calls it powerplug (pwr2).

Have installed a new heatsink, that didn't help.
ID: 1612865 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1612869 - Posted: 12 Dec 2014, 17:40:45 UTC - in response to Message 1612865.  
Last modified: 12 Dec 2014, 17:41:47 UTC

John,

what version of Lunatics are you using and how much memory (SIV records this) are the work units using when it happens?

Zalster
ID: 1612869 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1612870 - Posted: 12 Dec 2014, 17:40:58 UTC - in response to Message 1612783.  

JohnDK -

I had EXACTLY the same issue a few weeks ago - the computer works OK as long as not using the GPU but shuts off when using the GPU. It would shut down within 10 seconds of starting a GPU task so it was easy to figure out.

My solution was suggested already in this thread - the power supply. A new, slightly more powerful supply resolved the problem.

My original power supply specifications were sufficient for the graphics card. I have no way of telling if it shut down because of voltage, current, or heat or whether any of those were actually out-of-range. Maybe it was one of the sensors.

My PC runs for hours before it shuts down, yesterday 9 hours or so.
ID: 1612870 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1612874 - Posted: 12 Dec 2014, 17:46:39 UTC - in response to Message 1612869.  

John,

what version of Lunatics are you using and how much memory (SIV records this) are the work units using when it happens?

Zalster

I'm using the latest version. I don't know exactly how much memory it's using, but according to the GPU Observer and All CPU Meter gadgets there's plenty of RAM to spare. But I have no idea how much it uses when it happens; there's no warning, it just shuts down.
ID: 1612874 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1612880 - Posted: 12 Dec 2014, 17:55:45 UTC - in response to Message 1612874.  
Last modified: 12 Dec 2014, 17:57:50 UTC

I'm not talking about the RAM of your computer but the RAM of the GPU.

You can change the setting in SIV to show how much GPU memory each work unit uses.

Early on, there were some problems with APs using upward of 1 GB of the GPU RAM.

This led to freezes and restarts of the APs, and to them erroring out after some time.

Juan called them "Huggers" cause they were hogging all the memory of the GPU.

That particular problem was addressed in the follow up version where they didn't go over 1 GB but they come pretty close.

Depending on how many work units you have running, if you had 1 of those, you would run out of memory pretty quickly.

That is why I asked which version you are using.

Zalster

Edit...

I seem to remember my computers shutting down at different times when those were around.

Hasn't happened in a while since they put in changes and I monitor usage.
ID: 1612880 · Report as offensive
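Zalster's point about work units exhausting GPU memory comes down to simple arithmetic: the per-task footprint times the number of concurrent tasks has to fit within the card's memory, with something left over for the desktop and driver. A minimal sketch of that check follows. The function name `max_concurrent_tasks` and the 300 MB reserve are assumptions for illustration; the real per-task figure should come from whatever SIV or a similar monitor reports.

```python
def max_concurrent_tasks(card_mb, per_task_mb, reserve_mb=300):
    """Return how many tasks of a given GPU memory footprint fit on the card.

    reserve_mb is an assumed allowance for the desktop and driver, not a measured value.
    """
    if per_task_mb <= 0:
        raise ValueError("per_task_mb must be positive")
    usable_mb = card_mb - reserve_mb
    return max(usable_mb // per_task_mb, 0)


# A 4 GB card with tasks near 1 GB each leaves room for 3 at a time;
# at roughly 1.5 GB each, only 2 fit.
print(max_concurrent_tasks(4096, 1024))
print(max_concurrent_tasks(4096, 1536))
```

Running out of GPU memory would normally be expected to show up as task errors or stalls rather than a hard power-off, so this check helps rule memory in or out but does not by itself explain a sudden shutdown.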
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1612886 - Posted: 12 Dec 2014, 18:11:19 UTC

GPU observer reports GPU RAM and running 3 tasks it's using maybe 1gb out of the 4gb.

I did have those APs using up to 1.5gb, but that didn't make the PC shut down. That problem went away with r2399, so that's not the issue.
ID: 1612886 · Report as offensive
Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1612895 - Posted: 12 Dec 2014, 18:34:00 UTC - in response to Message 1611206.  

Hi Wiggo,

There is a small problem with AS5: if you get the larger syringe of the stuff, it deteriorates over time. If you haven't used it for a year or so it's easy to forget it's supposed to be a dark greyish colour; after time it turns white and is about as much use as 3in1 oil:-(

Good point about the bedding-in times. It's also good to apply only a thin layer to the CPU, or it won't ever bed in and just becomes a barrier to good heat conductivity.

Regards,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1612895 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.