A lot of mysterious MB CPU BM error messages lately
Author | Message |
---|---|
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Maybe Jason or Raistmer can explain the meaning of this message I am getting on some MB CPU tasks in BOINC Manager? "Postponed: Impossible Autocorr power, retrying from checkpoint" I have been seeing it frequently on one computer for about a week now. The task eventually goes back into the queue and processes normally with a good result. But why does the message come up in the first place? I searched the forum for the error string but it didn't hit on anything. Apologies if this has been explained already. Keith Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
It's a sanity check mechanism. Autocorrelation processing cannot produce certain outputs unless something has clobbered the data. So when the app sees an impossible value it does a temporary exit, and when BOINC restarts the app it is working with a fresh read of the data from the WU. There are also sanity checks on other signal types which can do the same thing; we consider it desirable to avoid sending known bad values back to the project. How the data got clobbered is unknown, of course. Possibilities are memory going bad, some other program reaching outside its own memory area, etc. Joe |
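For readers who want to picture the mechanism Joe describes, here is a minimal Python sketch. It is not the actual SETI@home code: the names `TemporaryExit`, `autocorr_power_is_possible`, `process_chunk` and `run_task` are hypothetical, the "power" is a toy sum of squares, and the rule that a valid power must be finite and non-negative is an assumption made for illustration.

```python
import math


class TemporaryExit(Exception):
    """Hypothetical stand-in for the app asking to be restarted later."""


def autocorr_power_is_possible(power: float) -> bool:
    # Assumption for this sketch: a valid power is finite and non-negative.
    return math.isfinite(power) and power >= 0.0


def process_chunk(samples):
    # Toy "power": sum of squares. A NaN or infinity in the input poisons it.
    power = sum(x * x for x in samples)
    if not autocorr_power_is_possible(power):
        raise TemporaryExit("Impossible Autocorr power, retrying from checkpoint")
    return power


def run_task(load_from_checkpoint, max_restarts=3):
    """Re-read the data and retry whenever the sanity check trips."""
    for restarts in range(max_restarts + 1):
        samples = load_from_checkpoint()  # fresh read of the data each time
        try:
            return process_chunk(samples), restarts
        except TemporaryExit:
            continue  # the in-memory copy was bad; discard it and start over
    raise RuntimeError("data still bad after restarts")


# First read is clobbered (contains NaN), second read is clean.
reads = iter([[1.0, float("nan"), 2.0], [1.0, 2.0, 2.0]])
power, restarts = run_task(lambda: next(reads))
print(power, restarts)
```

The point of the design is that a bad value is never reported: the bad in-memory copy is thrown away and the work is redone from data that is re-read from disk.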
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Also, if those tasks ultimately finished and validated, it means your host definitely had a damaged version of the data for those tasks at some point in time, and the restart healed it. Time to check the host for overheating/stability. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Thanks for the explanation, guys. Not sure why the bad data read happens, though. The system has been completely stable for years now. It gets cat de-haired regularly, every 2-3 months, and I have never seen a really dirty interior. I have an H-105 AIO cooling the CPU, and the core temps normally run about 40 degrees C, with the socket/mainboard temps running < 55 degrees C. I'm fairly certain that is well within the specs for the chip and mainboard. I do wonder if the error can be attributed to running MilkyWay and Einstein on the GPUs along with SETI? I only run SETI on the CPU, however. The system normally has high utilization: about 85-95% on the CPU and 99% on the GPUs. I will have to monitor the problem, I guess, and see if it worsens. At least your error checking and recovery seems to be working quite well. Thanks for your hard work in making the apps. Cheers, Keith Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
As a side input: it's a pretty big misconception that if a system appears to be running then data cannot be damaged. This is one of the biggest differences between 'enterprise grade' gear (ECC RAM etc., typically more expensive) and consumer gear. Data corruption can originate from radiation from particles within the semiconductor packaging materials, and from space, so I'm sure clearing dust bunnies cannot completely eliminate potential data corruption [in the best of machines]. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
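To illustrate how little corruption it takes to produce an "impossible" value, here is a small Python sketch that flips a single bit in the IEEE 754 single-precision encoding of a number. The helper name `flip_bit` is made up for this example, and the bit flip is simulated in software; it only shows the effect on a stored value, not the physical cause.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return value with one bit (0-31) of its float32 encoding flipped."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


print(flip_bit(1.0, 0))   # lowest mantissa bit: a tiny change
print(flip_bit(1.0, 31))  # sign bit: 1.0 becomes -1.0
print(flip_bit(1.0, 30))  # top exponent bit: 1.0 becomes infinity
```

A flip in a low mantissa bit would go unnoticed, while a flip in the exponent or sign can turn an ordinary sample into a value that no correct calculation could produce, which is the kind of result a sanity check can catch.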
Rasputin42 Joined: 25 Jul 08 Posts: 412 Credit: 5,834,661 RAC: 0 |
"Data corruption can originate from radiation from particles within the semiconductor packaging materials" Are you saying computer chips are radioactive? |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Data corruption can originate from radiation from particles within the semiconductor packaging materials" Absolutely. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
"Data corruption can originate from radiation from particles within the semiconductor packaging materials" So is almost every other material on planet Earth. But it's all a matter of degree. |
Rasputin42 Joined: 25 Jul 08 Posts: 412 Credit: 5,834,661 RAC: 0 |
Would that not be very counter-productive, if that corrupts data? |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Data corruption can originate from radiation from particles within the semiconductor packaging materials" Well, stop rotating so rapidly. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Would that not be very counter-productive, if that corrupts data?" Which post? "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Rasputin42 Joined: 25 Jul 08 Posts: 412 Credit: 5,834,661 RAC: 0 |
The one about chip packaging being radioactive. I very much doubt that it is more radioactive than anything around us, except maybe smoke detectors. I stand corrected, I just looked it up. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I do astrophotography, and you would be amazed at how many cosmic rays hit a little 24 mm x 36 mm detector in just 10 minutes. There is natural radiation all around you. The only time I ever fogged my radiation badge was on a 13-hour flight over the pole from LA to London. Pretty impressive compared to never fogging a badge in a year of sitting outside a high-energy medical linear accelerator. Keith Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |