Most files must be aborted

Questions and Answers : Unix/Linux : Most files must be aborted

Lee Wilkerson

Joined: 6 Jul 99
Posts: 7
Credit: 891,075
RAC: 0
United States
Message 1899084 - Posted: 4 Nov 2017, 14:08:44 UTC

Can someone tell me if this situation is normal or can be corrected? I have a PC running Linux Mint (Intel Q6600 quad-core CPU @ 2.40GHz, Linux 4.4.0-89-generic) which processes four files at a time and does not use GPU RAM.

After four days or so of processing, the system aborts the file with the message "Timed out - no response". I have begun manually aborting files that take close to one day to reach 98% or so, because those always hang. Rebooting the PC sometimes fixes the issue, but usually not. About 60% of the files have to be terminated, so a lot of processing time is wasted.

I cannot find any hardware problems, or Linux or BOINC/Seti configuration issues. Resetting the project did not help.

Thanks.

Lee Wilkerson
ID: 1899084
Jord
Volunteer tester

Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1899120 - Posted: 4 Nov 2017, 16:22:58 UTC - in response to Message 1899084.  

From your list of erroneous tasks, I see that the task marked "Timed out - no response" had passed its deadline. Tasks come with a deadline, a time limit before which they have to be calculated and reported. The one task that didn't meet it simply timed out. This task may not even have been on your system; it may have been a so-called lost task (lost in transit between the server and your computer).
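To make the deadline logic concrete, here is a minimal Python sketch of the outcomes described above. It is a simplified model, not BOINC's actual server code: the `Task` class and `task_status()` function are hypothetical names, and the real server tracks more states than these. What it shows is that a task whose result never arrives, including one lost before it ever reached the computer, ends up as "Timed out - no response" once the deadline passes.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Task:
    name: str
    deadline: datetime
    # When the server received the result; None means nothing has arrived yet
    # (including a "lost" task that never reached the client at all).
    reported_at: Optional[datetime] = None


def task_status(task: Task, now: datetime) -> str:
    """Simplified, hypothetical model of how a task's deadline plays out."""
    if task.reported_at is not None:
        return "reported in time" if task.reported_at <= task.deadline else "reported late"
    if now <= task.deadline:
        return "in progress"
    # Deadline passed with no result from the host.
    return "Timed out - no response"


if __name__ == "__main__":
    lost = Task("lost_task_0", deadline=datetime(2017, 11, 21, 10, 24, 15))
    print(task_status(lost, now=datetime(2017, 11, 22)))  # Timed out - no response
```

All datetimes here are naive and assumed to be in the same time zone, which keeps the comparison simple.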

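Then, among all the aborted tasks, there's one with "exceeded elapsed time limit 431161.36 (3826907.57G/8.88G)". All tasks come with an upper bound on the amount of work they should need, measured in floating point operations (the 3826907.57G before the slash, in billions of operations). BOINC divides that bound by its estimate of how fast the application runs on your computer, in floating point operations per second (the 8.88G after the slash), to get the elapsed time limit in seconds (the 431161.36, about five days). While a task runs, BOINC constantly tracks how long it has been running. When the task runs longer than that limit, it is auto-aborted with the above error.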
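The arithmetic behind that message can be checked directly. This Python sketch assumes the message format quoted in this thread; `check_message()` and `elapsed_time_limit()` are illustrative helpers, not part of BOINC. It pulls the three numbers out of the message and recomputes the limit as bound divided by speed.

```python
import re
from typing import Tuple

# Pattern for the client's abort message, based on the messages quoted in this thread.
LIMIT_RE = re.compile(
    r"exceeded elapsed time limit ([\d.]+) \(([\d.]+)G/([\d.]+)G\)"
)


def elapsed_time_limit(fpops_bound_g: float, flops_g: float) -> float:
    """Time limit in seconds: operation bound / speed (both in billions)."""
    return fpops_bound_g / flops_g


def check_message(msg: str) -> Tuple[float, float]:
    """Return (limit printed by BOINC, limit recomputed from bound/speed)."""
    m = LIMIT_RE.search(msg)
    if m is None:
        raise ValueError("not an 'exceeded elapsed time limit' message")
    printed, bound_g, flops_g = (float(x) for x in m.groups())
    return printed, elapsed_time_limit(bound_g, flops_g)


if __name__ == "__main__":
    printed, recomputed = check_message(
        "exceeded elapsed time limit 431161.36 (3826907.57G/8.88G)"
    )
    print(f"printed {printed:.0f} s, recomputed {recomputed:.0f} s, "
          f"about {printed / 86400:.1f} days")
```

For the message above, 3826907.57 / 8.88 gives about 430,958 seconds against the printed 431,161: close, but not identical. The printed speed is rounded to two decimals, and the speed estimate may also have changed since the limit was set (an assumption), so small differences are to be expected. Either way, the limit works out to roughly five days of run time.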

One possible cause is that the system was busy with other work while running the task.
The benchmarks could be too low.
It could've been a bad task for your computer.
It may indicate hardware trouble, but until we see many more tasks doing the same thing, there's no telling.

The fact that one task had this error doesn't mean all tasks will.
You shouldn't be aborting tasks just because they can run for longer than a day. My Android phone regularly spends 270,000 to 300,000 seconds on a task; I just leave things well alone and they finish fine.

Sorry to say this, but you say you must terminate the files. Why must you? Is someone threatening you with great bodily harm if you don't? Why can't you leave them well alone?
Why not just let BOINC try to finish them? Because you keep aborting them, there's hardly any evidence to go by as to whether something is wrong with your system or it's a bad batch of tasks. Just leave them be. If they're on your system, BOINC will try to run all tasks before their deadline.
ID: 1899120
Lee Wilkerson

Joined: 6 Jul 99
Posts: 7
Credit: 891,075
RAC: 0
United States
Message 1899182 - Posted: 4 Nov 2017, 22:12:59 UTC - in response to Message 1899120.  

Thanks for the reply. I don't actually have to abort the tasks - no, there are no threats. I can let them sit at 100.000% until the deadline comes and the system aborts them, but in the meantime, good potential calculation time is lost. This PC is doing virtually nothing else.

I don't like to abort files, but I have never seen one go to 100.000% and sit there for several days and then finish properly. I have no problem with files that take 4, 9, 20 or more days to complete. It seems like either this system has a hardware problem, or the time-to-calculate needs to be reset on the server.

Lee
ID: 1899182
Lee Wilkerson

Joined: 6 Jul 99
Posts: 7
Credit: 891,075
RAC: 0
United States
Message 1900021 - Posted: 9 Nov 2017, 14:57:45 UTC

This is why I manually abort files. The date and time on the processing system are correct. Files either process correctly in less than ten hours (elapsed), or they take 1-2 days to reach 100.000% and then, after a few more days, the system aborts them, apparently for running too long. It has nothing to do with deadlines:

Mon 06 Nov 2017 03:08:08 AM EST | SETI@home | Aborting task 20fe07af.27239.13569.12.39.33_0: exceeded elapsed time limit 152520.65 (1414546.15G/9.35G)
Note: Deadline was Tue 21 Nov 2017 10:24:15 AM EST

Mon 06 Nov 2017 04:56:45 PM EST | SETI@home | Aborting task 09mr07ab.13652.1299.7.34.23_0: exceeded elapsed time limit 151733.96 (1414546.15G/9.36G)
Note: Deadline was Tue 21 Nov 2017 10:24:15 AM EST

Tue 07 Nov 2017 09:09:24 PM EST | SETI@home | Aborting task 09mr07ab.13652.1299.7.34.17_1: exceeded elapsed time limit 151273.31 (1414546.15G/9.36G)
Note: Deadline was Tue 21 Nov 2017 10:24:15 AM EST

Wed 08 Nov 2017 10:54:29 AM EST | SETI@home | Aborting task 20fe07af.27239.13569.12.39.54_1: exceeded elapsed time limit 151061.69 (1414546.15G/9.40G)
Note: Deadline was Tue 21 Nov 2017 10:24:15 AM EST

Thu 09 Nov 2017 08:31:09 AM EST | SETI@home | Aborting task 02ja07aa.24089.281010.4.31.103.vlar_0: exceeded elapsed time limit 396248.37 (3677746.15G/9.40G)
Note: Deadline was Sun 24 Dec 2017 04:14:15 AM EST
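These log lines can be checked the same way. The Python sketch below is a rough helper written for this post's format only: the "Note: Deadline was" lines look like annotations added to the pasted log rather than BOINC output, and `summarize()` is a hypothetical name. It pairs each abort with the deadline note that follows it and reports the time limit and how much time was left before the deadline. The time zone abbreviation is dropped before parsing, which is only safe here because every stamp is EST.

```python
import re
from datetime import datetime
from typing import List, Tuple

STAMP = r"\w{3} \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} [AP]M"
FMT = "%a %d %b %Y %I:%M:%S %p"  # time zone abbreviation is stripped before parsing

ABORT_RE = re.compile(
    rf"^(?P<when>{STAMP}) \w+ \| [^|]+ \| Aborting task (?P<task>\S+): "
    r"exceeded elapsed time limit (?P<limit>[\d.]+)"
)
DEADLINE_RE = re.compile(rf"^Note: Deadline was (?P<when>{STAMP}) \w+")


def summarize(log_text: str) -> List[Tuple[str, float, float]]:
    """Return (task, time limit in days, days left before deadline) per abort."""
    results = []
    pending = None
    for line in log_text.splitlines():
        line = line.strip()
        abort = ABORT_RE.match(line)
        if abort:
            aborted_at = datetime.strptime(abort["when"], FMT)
            pending = (abort["task"], float(abort["limit"]), aborted_at)
            continue
        note = DEADLINE_RE.match(line)
        if note and pending:
            task, limit_s, aborted_at = pending
            deadline = datetime.strptime(note["when"], FMT)
            # All stamps are EST, so naive subtraction is fine here.
            days_left = (deadline - aborted_at).total_seconds() / 86400
            results.append((task, limit_s / 86400, days_left))
            pending = None
    return results


if __name__ == "__main__":
    sample = """\
Mon 06 Nov 2017 03:08:08 AM EST | SETI@home | Aborting task 20fe07af.27239.13569.12.39.33_0: exceeded elapsed time limit 152520.65 (1414546.15G/9.35G)
Note: Deadline was Tue 21 Nov 2017 10:24:15 AM EST
"""
    for task, limit_days, days_left in summarize(sample):
        print(f"{task}: limit {limit_days:.2f} days, {days_left:.1f} days before deadline")
```

Run over the five entries above, each task was aborted with roughly 13 to 45 days still to go before its deadline, after hitting a limit of about 1.75 days for the regular tasks and about 4.6 days for the VLAR task. That fits the point that these aborts come from the elapsed time limit, not the deadline.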

Before I shut down this PC forever, is there something that can be done to fix the problem?
ID: 1900021
Jord
Volunteer tester

Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1900028 - Posted: 9 Nov 2017, 15:52:17 UTC - in response to Message 1900021.  

I see you're using BOINC 7.6.31. Is it possible for you to upgrade to a newer version, or downgrade to an earlier one, to make sure the BOINC version isn't the cause here? It sounds as if boinc_finish() isn't being called in time at the end of the task.
ID: 1900028
Lee Wilkerson

Joined: 6 Jul 99
Posts: 7
Credit: 891,075
RAC: 0
United States
Message 1900043 - Posted: 9 Nov 2017, 16:59:45 UTC - in response to Message 1900028.  

Thanks, I’ll definitely look into that.

Lee
ID: 1900043
Lee Wilkerson

Joined: 6 Jul 99
Posts: 7
Credit: 891,075
RAC: 0
United States
Message 1900524 - Posted: 11 Nov 2017, 14:31:48 UTC - in response to Message 1900028.  

I installed a different OS on the PC, Ubuntu 16.04 LTS. So far processing seems much improved.

Lee
ID: 1900524

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.