Help. SETI@home - 2005-01-29 18:19:06 - Result 28mr04aa.6348.4946.523558.214_1 exited with zero status but no 'finished' file

Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 75301 - Posted: 30 Jan 2005, 1:03:24 UTC

For many months my P3 500 Win 98 machine has been giving me these errors.

I have tried:
Reinstalling the OS.
Uninstalling and reinstalling the client.
Uninstalling BOINC, manually deleting its files, and reinstalling it.

Nothing seems to work. It does complete WUs, but with ridiculously long run times.

For example:

A SETI WU is currently at 94% and has an accumulated time of 33:34:12 with 1:51:32 remaining.
An Einstein WU is currently at 7% and has taken 4:39:52 with 59:33:58 remaining.
A Predictor WU is currently at 37% and has taken 7:46:14 with 13:16:04 remaining.

This system has 256 MB of RAM.

It is running BOINC 4.19, but this problem has been present through many versions of BOINC.

The following is a copy of the current STDOUT:
2005-01-29 12:47:51 [---] Starting BOINC client version 4.19 for windows_intelx86
2005-01-29 12:47:52 [LHC@home] Project prefs: using your defaults
2005-01-29 12:47:52 [ProteinPredictorAtHome] Project prefs: using your defaults
2005-01-29 12:47:52 [SETI@home] Project prefs: using your defaults
2005-01-29 12:47:52 [Pirates@Home] Project prefs: using your defaults
2005-01-29 12:47:52 [Einstein@Home] Project prefs: using your defaults
2005-01-29 12:47:52 [LHC@home] Host ID is 20100
2005-01-29 12:47:52 [ProteinPredictorAtHome] Host ID is 43057
2005-01-29 12:47:52 [SETI@home] Host ID is 465385
2005-01-29 12:47:52 [Pirates@Home] Host ID is 7068
2005-01-29 12:47:52 [Einstein@Home] Host ID is 5066
2005-01-29 12:47:52 [---] General prefs: from SETI@home (last modified 2005-01-22 10:42:40)
2005-01-29 12:47:52 [---] General prefs: using your defaults
2005-01-29 12:47:54 [SETI@home] Deferring computation for result 28mr04aa.6348.4946.523558.214_1
2005-01-29 12:47:54 [Einstein@Home] Resuming computation for result H1_0063.9__0064.0_0.1_T15_Test02_1 using einstein version 4.72
2005-01-29 12:47:55 [ProteinPredictorAtHome] Deferring computation for result t0201E_1_8763_1
2005-01-29 13:04:59 [Einstein@Home] Result H1_0063.9__0064.0_0.1_T15_Test02_1 exited with zero status but no 'finished' file
2005-01-29 13:04:59 [Einstein@Home] If this happens repeatedly you may need to reset the project.
2005-01-29 13:05:00 [Einstein@Home] Restarting result H1_0063.9__0064.0_0.1_T15_Test02_1 using einstein version 4.72
2005-01-29 13:12:42 [Einstein@Home] Result H1_0063.9__0064.0_0.1_T15_Test02_1 exited with zero status but no 'finished' file
2005-01-29 13:12:42 [Einstein@Home] If this happens repeatedly you may need to reset the project.
2005-01-29 13:12:42 [Einstein@Home] Restarting result H1_0063.9__0064.0_0.1_T15_Test02_1 using einstein version 4.72
2005-01-29 13:16:46 [Einstein@Home] Result H1_0063.9__0064.0_0.1_T15_Test02_1 exited with zero status but no 'finished' file
2005-01-29 13:16:46 [Einstein@Home] If this happens repeatedly you may need to reset the project.
2005-01-29 13:16:46 [Einstein@Home] Restarting result H1_0063.9__0064.0_0.1_T15_Test02_1 using einstein version 4.72
2005-01-29 13:46:47 [SETI@home] Restarting result 28mr04aa.6348.4946.523558.214_1 using setiathome version 4.08
2005-01-29 13:46:47 [Einstein@Home] Pausing result H1_0063.9__0064.0_0.1_T15_Test02_1 (left in memory)
2005-01-29 14:16:47 [SETI@home] Pausing result 28mr04aa.6348.4946.523558.214_1 (left in memory)
2005-01-29 14:16:47 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-29 14:46:47 [Einstein@Home] Resuming result H1_0063.9__0064.0_0.1_T15_Test02_1 using einstein version 4.72
2005-01-29 14:46:47 [ProteinPredictorAtHome] Pausing result t0201E_1_8763_1 (left in memory)
2005-01-29 15:16:47 [SETI@home] Resuming result 28mr04aa.6348.4946.523558.214_1 using setiathome version 4.08
2005-01-29 15:16:47 [Einstein@Home] Pausing result H1_0063.9__0064.0_0.1_T15_Test02_1 (left in memory)
2005-01-29 15:46:47 [SETI@home] Pausing result 28mr04aa.6348.4946.523558.214_1 (left in memory)
2005-01-29 15:46:47 [ProteinPredictorAtHome] Resuming result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-29 16:16:47 [Einstein@Home] Resuming result H1_0063.9__0064.0_0.1_T15_Test02_1 using einstein version 4.72
2005-01-29 16:16:47 [ProteinPredictorAtHome] Pausing result t0201E_1_8763_1 (left in memory)
2005-01-29 16:46:49 [SETI@home] Resuming result 28mr04aa.6348.4946.523558.214_1 using setiathome version 4.08
2005-01-29 16:46:49 [Einstein@Home] Pausing result H1_0063.9__0064.0_0.1_T15_Test02_1 (left in memory)
2005-01-29 17:15:53 [---] Insufficient work; requesting more
2005-01-29 17:16:01 [---] Insufficient work; requesting more
2005-01-29 17:16:01 [LHC@home] Requesting 22181 seconds of work
2005-01-29 17:16:01 [LHC@home] Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
2005-01-29 17:16:06 [LHC@home] Scheduler RPC to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
2005-01-29 17:16:07 [LHC@home] Message from server: No work available
2005-01-29 17:16:40 [ProteinPredictorAtHome] Sending request to scheduler: http://predictor1.scripps.edu/predictor_cgi/cgi
2005-01-29 17:16:44 [ProteinPredictorAtHome] Scheduler RPC to http://predictor1.scripps.edu/predictor_cgi/cgi succeeded
2005-01-29 17:16:45 [ProteinPredictorAtHome] Project prefs: using your defaults
2005-01-29 17:16:49 [SETI@home] Pausing result 28mr04aa.6348.4946.523558.214_1 (left in memory)
2005-01-29 17:16:49 [ProteinPredictorAtHome] Resuming result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-29 17:18:08 [SETI@home] Sending request to scheduler: http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
2005-01-29 17:18:13 [SETI@home] Scheduler RPC to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded
2005-01-29 17:18:14 [SETI@home] Project prefs: using your defaults
2005-01-29 17:46:49 [Einstein@Home] Resuming result H1_0063.9__0064.0_0.1_T15_Test02_1 using einstein version 4.72
2005-01-29 17:46:49 [ProteinPredictorAtHome] Pausing result t0201E_1_8763_1 (left in memory)
2005-01-29 18:16:08 [---] Insufficient work; requesting more
2005-01-29 18:16:08 [LHC@home] Requesting 22185 seconds of work
2005-01-29 18:16:08 [LHC@home] Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
2005-01-29 18:17:07 [SETI@home] Resuming result 28mr04aa.6348.4946.523558.214_1 using setiathome version 4.08
2005-01-29 18:17:07 [Einstein@Home] Pausing result H1_0063.9__0064.0_0.1_T15_Test02_1 (left in memory)
2005-01-29 18:17:08 [SETI@home] Result 28mr04aa.6348.4946.523558.214_1 exited with zero status but no 'finished' file
2005-01-29 18:17:08 [SETI@home] If this happens repeatedly you may need to reset the project.
2005-01-29 18:17:08 [Einstein@Home] Result H1_0063.9__0064.0_0.1_T15_Test02_1 exited with zero status but no 'finished' file
2005-01-29 18:17:08 [Einstein@Home] If this happens repeatedly you may need to reset the project.
2005-01-29 18:17:08 [ProteinPredictorAtHome] Result t0201E_1_8763_1 exited with zero status but no 'finished' file
2005-01-29 18:17:08 [ProteinPredictorAtHome] If this happens repeatedly you may need to reset the project.
2005-01-29 18:17:08 [SETI@home] Restarting result 28mr04aa.6348.4946.523558.214_1 using setiathome version 4.08
2005-01-29 18:18:08 [---] Insufficient work; requesting more
2005-01-29 18:18:08 [LHC@home] Requesting 22185 seconds of work
2005-01-29 18:18:08 [LHC@home] Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
2005-01-29 18:19:06 [SETI@home] Result 28mr04aa.6348.4946.523558.214_1 exited with zero status but no 'finished' file
2005-01-29 18:19:06 [SETI@home] If this happens repeatedly you may need to reset the project.
2005-01-29 18:19:06 [SETI@home] Restarting result 28mr04aa.6348.4946.523558.214_1 using setiathome version 4.08
2005-01-29 18:20:07 [---] Insufficient work; requesting more
2005-01-29 18:20:07 [LHC@home] Requesting 22185 seconds of work
2005-01-29 18:20:07 [LHC@home] Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
2005-01-29 18:21:05 [SETI@home] Result 28mr04aa.6348.4946.523558.214_1 exited with zero status but no 'finished' file
2005-01-29 18:21:05 [SETI@home] If this happens repeatedly you may need to reset the project.
2005-01-29 18:21:05 [SETI@home] Restarting result 28mr04aa.6348.4946.523558.214_1 using setiathome version 4.08
2005-01-29 18:34:55 [---] Insufficient work; requesting more
2005-01-29 18:34:55 [LHC@home] Requesting 22186 seconds of work
2005-01-29 18:34:55 [LHC@home] Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
2005-01-29 18:35:54 [SETI@home] Result 28mr04aa.6348.4946.523558.214_1 exited with zero status but no 'finished' file
2005-01-29 18:35:54 [SETI@home] If this happens repeatedly you may need to reset the project.
2005-01-29 18:35:55 [SETI@home] Restarting result 28mr04aa.6348.4946.523558.214_1 using setiathome version 4.08
2005-01-29 19:05:55 [SETI@home] Pausing result 28mr04aa.6348.4946.523558.214_1 (left in memory)
2005-01-29 19:05:55 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-29 19:17:46 [SETI@home] Result 28mr04aa.6348.4946.523558.214_1 exited with zero status but no 'finished' file
2005-01-29 19:17:58 [SETI@home] If this happens repeatedly you may need to reset the project.
2005-01-29 19:34:43 [---] Insufficient work; requesting more
2005-01-29 19:34:43 [LHC@home] Requesting 22190 seconds of work
2005-01-29 19:34:43 [LHC@home] Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
2005-01-29 19:35:42 [ProteinPredictorAtHome] Result t0201E_1_8763_1 exited with zero status but no 'finished' file
2005-01-29 19:35:42 [ProteinPredictorAtHome] If this happens repeatedly you may need to reset the project.
2005-01-29 19:35:42 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-29 19:37:44 [ProteinPredictorAtHome] Result t0201E_1_8763_1 exited with zero status but no 'finished' file
2005-01-29 19:37:44 [ProteinPredictorAtHome] If this happens repeatedly you may need to reset the project.
2005-01-29 19:37:45 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-29 19:38:37 [ProteinPredictorAtHome] Result t0201E_1_8763_1 exited with zero status but no 'finished' file
2005-01-29 19:38:37 [ProteinPredictorAtHome] If this happens repeatedly you may need to reset the project.
2005-01-29 19:38:37 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-29 19:40:00 [ProteinPredictorAtHome] Result t0201E_1_8763_1 exited with zero status but no 'finished' file
2005-01-29 19:40:00 [ProteinPredictorAtHome] If this happens repeatedly you may need to reset the project.
2005-01-29 19:40:02 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-29 19:41:03 [---] Insufficient work; requesting more
2005-01-29 19:41:03 [ProteinPredictorAtHome] Requesting 4262 seconds of work
2005-01-29 19:41:05 [ProteinPredictorAtHome] Sending request to scheduler: http://predictor1.scripps.edu/predictor_cgi/cgi
2005-01-29 19:41:45 [ProteinPredictorAtHome] Result t0201E_1_8763_1 exited with zero status but no 'finished' file
2005-01-29 19:41:45 [ProteinPredictorAtHome] If this happens repeatedly you may need to reset the project.
2005-01-29 19:41:46 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-29 19:41:56 [ProteinPredictorAtHome] Scheduler RPC to http://predictor1.scripps.edu/predictor_cgi/cgi succeeded
2005-01-29 19:41:58 [ProteinPredictorAtHome] Started download of t0201E_1_15881.ini
2005-01-29 19:41:58 [ProteinPredictorAtHome] Started download of t0201E_1_15881.inp
2005-01-29 19:42:00 [ProteinPredictorAtHome] Finished download of t0201E_1_15881.ini
2005-01-29 19:42:00 [ProteinPredictorAtHome] Throughput 876 bytes/sec
2005-01-29 19:42:00 [ProteinPredictorAtHome] Finished download of t0201E_1_15881.inp
2005-01-29 19:42:00 [ProteinPredictorAtHome] Throughput 76 bytes/sec
2005-01-29 19:42:00 [ProteinPredictorAtHome] Started download of t0201E_1_15881.seq
2005-01-29 19:42:00 [ProteinPredictorAtHome] Started download of t0201E_1_15881.res
2005-01-29 19:42:01 [---] Insufficient work; requesting more
2005-01-29 19:42:01 [LHC@home] Requesting 22190 seconds of work
2005-01-29 19:42:01 [LHC@home] Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
2005-01-29 19:42:08 [ProteinPredictorAtHome] Finished download of t0201E_1_15881.seq
2005-01-29 19:42:08 [ProteinPredictorAtHome] Throughput 265 bytes/sec
2005-01-29 19:42:08 [ProteinPredictorAtHome] Finished download of t0201E_1_15881.res
2005-01-29 19:42:08 [ProteinPredictorAtHome] Throughput 0 bytes/sec
2005-01-29 19:42:23 [LHC@home] Scheduler RPC to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
2005-01-29 19:42:23 [LHC@home] Message from server: No work available
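
To see at a glance which results keep restarting, a log like the one above can be tallied with a short script. This is only a sketch: the function name `count_unfinished_exits` and the sample lines are made up for illustration, and the pattern assumes the exact message wording shown in this log.

```python
import re
from collections import Counter

# Matches client log lines of the form seen in this thread, e.g.
# 2005-01-29 13:04:59 [Einstein@Home] Result <name> exited with zero status but no 'finished' file
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} "
    r"\[(?P<project>[^\]]+)\] "
    r"Result (?P<result>\S+) exited with zero status but no 'finished' file"
)


def count_unfinished_exits(lines):
    """Count 'no finished file' exits per (project, result) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("project"), match.group("result"))] += 1
    return counts


if __name__ == "__main__":
    # Illustrative sample; in practice, read the lines from a saved copy of the log.
    sample = [
        "2005-01-29 13:04:59 [Einstein@Home] Result H1_0063.9__0064.0_0.1_T15_Test02_1 "
        "exited with zero status but no 'finished' file",
        "2005-01-29 13:04:59 [Einstein@Home] If this happens repeatedly you may need to reset the project.",
        "2005-01-29 18:19:06 [SETI@home] Result 28mr04aa.6348.4946.523558.214_1 "
        "exited with zero status but no 'finished' file",
    ]
    for (project, result), n in count_unfinished_exits(sample).most_common():
        print(f"{project}: {result} restarted {n} time(s)")
```

Grouping by result as well as by project helps separate one stuck WU from a project-wide problem.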

Any ideas? I have seen this machine complete a SETI WU in 12 hours in the past.
Thanks ahead of time.

tony


ID: 75301 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 75304 - Posted: 30 Jan 2005, 1:18:33 UTC

Additionally,

I put in a new hard drive when I reinstalled the OS. The problem has been seen both before and after the hard drive replacement.

I am aware that the SETI and PPAH applications have a bug that lets the accumulated time keep increasing while the WU is paused. That is part of it, but a minor part. The Einstein app behaves properly when paused.

I really think it is a resource issue, but I don't know how to check that on Win98.

I've run CheckIt diagnostics and it can't find any memory or other problems. Neither Norton WinDoctor nor other software has found any hardware problems.

ID: 75304 · Report as offensive
wrzwaldo
Joined: 16 Jul 00
Posts: 113
Credit: 1,073,284
RAC: 0
United States
Message 75337 - Posted: 30 Jan 2005, 4:06:31 UTC
Last modified: 30 Jan 2005, 4:16:03 UTC

Do a keyword search for "no finished file". If I remember correctly one of the threads will have an explanation.

Paul's documentation covers this as well. Clicky Thing
ID: 75337 · Report as offensive
Walt Gribben
Volunteer tester

Joined: 16 May 99
Posts: 353
Credit: 304,016
RAC: 0
United States
Message 75340 - Posted: 30 Jan 2005, 4:29:34 UTC
Last modified: 30 Jan 2005, 4:31:51 UTC

I'd say your system is overloaded: too much work and not enough memory and/or CPU to do it in. That log shows you're running LHC@home, ProteinPredictorAtHome, SETI@home, Pirates@Home and Einstein@Home. That's a lot for a little 256M system. What is your "Leave applications in memory while preempted" preference set to? If it is set to yes, you're probably taking up a lot of memory with all the suspended apps.

When you get that "exited with zero status but no 'finished' file" message, check the stderr.txt file in the slots/0 directory. It probably says something like "No heartbeat from core client for xx sec - exiting".

I looked at the results from that Win98 machine of yours. It's not just the "no heartbeat ... exiting" message that most of the WUs get; one of them didn't even start. Its error message says "CreateProcess() failed - Not enough storage is available to process this command. (0x8)".
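
Checking every slot by hand gets tedious, so here is a rough sketch of a script that looks through each slots/*/stderr.txt for the heartbeat message. The install path in the example is a guess, the function names are made up, and the search string assumes the message reads something like the one quoted above.

```python
from pathlib import Path


def heartbeat_lines(text, needle="no heartbeat"):
    """Return the lines of `text` that mention a missed heartbeat (case-insensitive)."""
    return [line for line in text.splitlines() if needle in line.lower()]


def scan_slots(boinc_dir):
    """Map each slots/*/stderr.txt under boinc_dir to its matching lines."""
    hits = {}
    for stderr_file in sorted(Path(boinc_dir).glob("slots/*/stderr.txt")):
        matches = heartbeat_lines(stderr_file.read_text(errors="replace"))
        if matches:
            hits[str(stderr_file)] = matches
    return hits


if __name__ == "__main__":
    # Hypothetical install location; change it to wherever BOINC lives on your machine.
    for path, lines in scan_slots(r"C:\Program Files\BOINC").items():
        print(f"{path}: {len(lines)} heartbeat message(s)")
        for line in lines:
            print("   ", line)
```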
ID: 75340 · Report as offensive
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 75344 - Posted: 30 Jan 2005, 4:53:58 UTC - in response to Message 75340.  


> When you get that "exited with zero status but no 'finished' file" message,
> check the stderr.txt file in the slots/0 directory. It probably says something
> like "No heartbeat from core client for xx sec - exiting".
>

Hi Walt!

What's up with these "no heartbeat" messages? I'm getting these under Linux,
on PCs with multiple real CPUs.

Regards Hans
ID: 75344 · Report as offensive
Walt Gribben
Volunteer tester

Joined: 16 May 99
Posts: 353
Credit: 304,016
RAC: 0
United States
Message 75346 - Posted: 30 Jan 2005, 5:04:48 UTC
Last modified: 30 Jan 2005, 5:15:34 UTC

To answer the question about resource tools, look in System Tools. Click Start->Programs->Accessories->System Tools. There's Resource Meter, System Information and System Monitor.

Resource Meter watches GDI, SYSTEM and USER resources, and shows how much of each is in use. Green is good; red means you'll probably have to reboot soon.

System Information is useful to see what's running and which programs are started at boot time. If you want to change them, click Tools->System Configuration Utility, and in that switch to the Startup tab. Uncheck any programs you don't want started at boot time.

A third-party program that's useful, and easier to use than System Monitor, is TaskInfo. It shows how much memory each program uses, which sysmon doesn't. It's $35, but has a one-month free trial.

ID: 75346 · Report as offensive
Walt Gribben
Volunteer tester

Joined: 16 May 99
Posts: 353
Credit: 304,016
RAC: 0
United States
Message 75360 - Posted: 30 Jan 2005, 6:33:25 UTC - in response to Message 75344.  
Last modified: 30 Jan 2005, 6:34:18 UTC

>
> > When you get that "exited with zero status but no 'finished' file" message,
> > check the stderr.txt file in the slots/0 directory. It probably says something
> > like "No heartbeat from core client for xx sec - exiting".
> >
>
> Hi Walt!
>
> What's up with these "no heartbeat" messages. I'm getting these under linux,
> on PCs with multiple real CPUs.
>
> Regards Hans
>

Hi Hans, it means that BOINC stopped communicating with the science app.


BOINC sends out a heartbeat message so the science programs know it's still alive and kicking. If the messages stop, that is supposed to mean BOINC isn't running anymore (crashed?), and the science programs are supposed to exit as well. That happens after they go 30 seconds without a heartbeat message. So they print the "no heartbeat" error and exit, using an exit code of zero to indicate that there isn't any error, at least not with the WU.

BOINC sees that the science app exited (zero status) but wasn't finished with the WU (no finished file), so it restarts the WU, picking up from where it left off, or at least from the last checkpoint.

There might be a problem with communications between BOINC and the science app, but it's not all that serious. Some time is lost in restarting the WU, but it's not like it has to start from the beginning each time. After I saw the WUs were completing in around the same time whether or not they got "no heartbeat" messages, I stopped looking into it.
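
For anyone who prefers to see the logic spelled out, here is a toy model of the two decisions described above. It is not BOINC's actual code: the function names and the returned strings are invented, and the 30-second timeout is simply the figure mentioned in this post.

```python
HEARTBEAT_TIMEOUT = 30.0  # seconds without a heartbeat before the science app gives up


def app_should_exit(now, last_heartbeat, timeout=HEARTBEAT_TIMEOUT):
    """Science-app side: exit (with status 0) once the heartbeat has been silent too long."""
    return (now - last_heartbeat) > timeout


def client_action(exit_status, finished_file_exists):
    """Client side: decide what to do after a science app exits."""
    if exit_status == 0 and finished_file_exists:
        return "report result"
    if exit_status == 0:
        # Zero status but no 'finished' file: the case logged in this thread.
        return "restart from last checkpoint"
    # Assumption: any non-zero status is treated as a real error with the WU.
    return "handle error"


if __name__ == "__main__":
    print(app_should_exit(now=100.0, last_heartbeat=65.0))       # 35 s of silence
    print(client_action(exit_status=0, finished_file_exists=False))
```

The point of the zero exit status is that the WU itself is not blamed; only the lost time between the last checkpoint and the exit is wasted.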

Walt

ID: 75360 · Report as offensive
Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 75393 - Posted: 30 Jan 2005, 11:24:36 UTC

We have a new accumulation of issues. Before project restarts we did not have issues with Predictor@Home. However, my stock suite is now to run:

CPDN
Einstein@Home
Predictor@Home
SETI@Home

Right now I have 14 WU in "red" status (on BOINC View) meaning I have blown past the deadline. Along with that I have been seeing a lot of failures with the Predictor@Home Science Application. Einstein@Home is also having some issues with the deadline (4 of the 14 are Einstein@Home, the rest Predictor@Home).

It is not clear to me, or as far as I know to anyone else, why we are seeing these issues. I suspect that this is just more of the interaction between the Science Applications that is creating the problems. One of the simplest explanations is that the prediction of "time to complete" is wrong for the two projects, coupled with short deadlines.
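
To get a feel for how far a reported "time to complete" can drift, one crude check is to extrapolate linearly from the CPU time so far and the fraction done. The sketch below does that. It is not how BOINC computes its estimate (that is not described in this thread), and the function names are made up for illustration. Fed the SETI figures from the first post (94% done after 33:34:12), it gives roughly 2:08:34 remaining, against the 1:51:32 the client reported.

```python
def hms_to_seconds(hms):
    """Convert 'H:MM:SS' to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total):
    """Convert seconds to 'H:MM:SS', rounding to the nearest second."""
    total = int(round(total))
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


def linear_remaining(elapsed_seconds, fraction_done):
    """Naive estimate: assume the rest of the WU runs at the average rate so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_seconds * (1 - fraction_done) / fraction_done


if __name__ == "__main__":
    # Figures taken from the first post in this thread.
    elapsed = hms_to_seconds("33:34:12")
    print(seconds_to_hms(linear_remaining(elapsed, 0.94)))
```

Keep in mind that on the machine in question the accumulated time may include paused time, so any estimate built on it will be inflated as well.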

Note also that to combat this problem I have lowered my queue size from 4 days to 2. This is troubling, because I did not see this problem with the 4 day queue before Predictor@Home went off-line for the update to v 4.x BOINC.

Anyway, I am sure that this is a known problem; it is probably just not clear exactly what is going on and, from there, what the best fix is. We do know that there are issues with the BOINC Manager's ability to properly schedule the next work unit to run based on factors like deadlines. I have seen what appears to be BOINC Manager allocating time to a SETI@Home WU that it had just started instead of to one that was already in progress.

ID: 75393 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 75394 - Posted: 30 Jan 2005, 11:54:44 UTC - in response to Message 75301.  

> a seti wu is currently at 94% and has an accumulated time of 33:34:12 with
> 1:51:32 remaining
> an Einstein WU is currently at 7% has taken 4:39.52 with 59:33:58 remaining.
> an Predictor WU is currently at 37% has taken 7:46:14 with 13:16:04
> remaining.
>
OK, I followed ONE of Walt's suggestions, meaning I changed ONE thing: the "leave in memory" option under General Prefs, which is now set to "NO". Overnight, the SETI WU was sent in, and the new one reports 00:29:58 CPU time, 14.30% done, 02:59:35 remaining. The Predictor unit says 12:08:34 CPU time, 100.00% done, -- remaining, Paused. The Einstein unit says 07:10:44 CPU time, 0.00% done (I think this is an update problem on my side), 27:50:04 remaining.

Here's the STDOUT from after that change:

2005-01-29 21:55:06 [SETI@home] Sending request to scheduler: http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
2005-01-29 21:55:14 [SETI@home] Scheduler RPC to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded
2005-01-29 21:55:14 [SETI@home] General preferences have been updated
2005-01-29 21:55:14 [---] General prefs: from SETI@home (last modified 2005-01-29 21:54:17)
2005-01-29 21:55:14 [---] General prefs: using your defaults
2005-01-29 21:56:17 [---] Running CPU benchmarks
2005-01-29 21:56:17 [---] Suspending computation and network activity - running CPU benchmarks
2005-01-29 21:57:18 [---] Benchmark results:
2005-01-29 21:57:18 [---] Number of CPUs: 1
2005-01-29 21:57:18 [---] 399 double precision MIPS (Whetstone) per CPU
2005-01-29 21:57:18 [---] 988 integer MIPS (Dhrystone) per CPU
2005-01-29 21:57:18 [---] Finished CPU benchmarks
2005-01-29 21:57:19 [---] Resuming computation and network activity
2005-01-29 22:27:19 [SETI@home] Pausing result 28mr04aa.6348.4946.523558.214_1 (removed from memory)
2005-01-29 22:27:19 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-29 22:42:36 [---] Insufficient work; requesting more
2005-01-29 22:42:36 [LHC@home] Requesting 22202 seconds of work
2005-01-29 22:42:36 [LHC@home] Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
2005-01-29 22:42:42 [LHC@home] Scheduler RPC to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
2005-01-29 22:42:42 [LHC@home] Message from server: No work available
2005-01-29 22:57:22 [Einstein@Home] Restarting result H1_0063.9__0064.0_0.1_T15_Test02_1 using einstein version 4.72
2005-01-29 22:57:22 [ProteinPredictorAtHome] Pausing result t0201E_1_8763_1 (removed from memory)
2005-01-29 23:27:22 [SETI@home] Restarting result 28mr04aa.6348.4946.523558.214_1 using setiathome version 4.08
2005-01-29 23:27:22 [Einstein@Home] Pausing result H1_0063.9__0064.0_0.1_T15_Test02_1 (removed from memory)
2005-01-29 23:42:43 [---] Insufficient work; requesting more
2005-01-29 23:42:43 [LHC@home] Requesting 22206 seconds of work
2005-01-29 23:42:44 [LHC@home] Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
2005-01-29 23:45:46 [---] Insufficient work; requesting more
2005-01-29 23:45:46 [LHC@home] Requesting 22206 seconds of work
2005-01-29 23:45:46 [LHC@home] Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
2005-01-29 23:48:55 [SETI@home] Computation for result 28mr04aa.6348.4946.523558.214 finished
2005-01-29 23:48:55 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-29 23:48:56 [SETI@home] Started upload of 28mr04aa.6348.4946.523558.214_1_0
2005-01-29 23:49:45 [SETI@home] Temporarily failed upload of 28mr04aa.6348.4946.523558.214_1_0
2005-01-29 23:49:45 [SETI@home] Backing off 1 minutes and 0 seconds on transfer of file 28mr04aa.6348.4946.523558.214_1_0
2005-01-29 23:50:45 [SETI@home] Started upload of 28mr04aa.6348.4946.523558.214_1_0
2005-01-29 23:51:33 [SETI@home] Temporarily failed upload of 28mr04aa.6348.4946.523558.214_1_0
2005-01-29 23:51:33 [SETI@home] Backing off 1 minutes and 0 seconds on transfer of file 28mr04aa.6348.4946.523558.214_1_0
2005-01-29 23:52:34 [SETI@home] Started upload of 28mr04aa.6348.4946.523558.214_1_0
2005-01-29 23:52:44 [SETI@home] Finished upload of 28mr04aa.6348.4946.523558.214_1_0
2005-01-29 23:52:44 [SETI@home] Throughput 3492 bytes/sec
2005-01-29 23:53:02 [---] Insufficient work; requesting more
2005-01-29 23:53:02 [LHC@home] Requesting 22207 seconds of work
2005-01-29 23:53:02 [LHC@home] Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
2005-01-29 23:53:06 [LHC@home] Scheduler RPC to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
2005-01-29 23:53:06 [LHC@home] Message from server: No work available
2005-01-29 23:53:35 [SETI@home] Sending request to scheduler: http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
2005-01-29 23:53:40 [SETI@home] Scheduler RPC to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded
2005-01-30 00:18:57 [Einstein@Home] Restarting result H1_0063.9__0064.0_0.1_T15_Test02_1 using einstein version 4.72
2005-01-30 00:18:57 [ProteinPredictorAtHome] Pausing result t0201E_1_8763_1 (removed from memory)
2005-01-30 00:48:58 [Einstein@Home] Pausing result H1_0063.9__0064.0_0.1_T15_Test02_1 (removed from memory)
2005-01-30 00:48:58 [SETI@home] Starting result 28mr04aa.6348.7154.928404.210_0 using setiathome version 4.08
2005-01-30 00:53:07 [---] Insufficient work; requesting more
2005-01-30 00:53:07 [LHC@home] Requesting 22211 seconds of work
2005-01-30 00:53:08 [LHC@home] Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
2005-01-30 00:58:54 [---] Insufficient work; requesting more
2005-01-30 00:58:54 [LHC@home] Requesting 22211 seconds of work
2005-01-30 00:58:55 [LHC@home] Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
2005-01-30 01:19:01 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-30 01:19:01 [SETI@home] Pausing result 28mr04aa.6348.7154.928404.210_0 (removed from memory)
2005-01-30 01:23:08 [ProteinPredictorAtHome] Result t0201E_1_8763_1 exited with zero status but no 'finished' file
2005-01-30 01:23:08 [ProteinPredictorAtHome] If this happens repeatedly you may need to reset the project.
2005-01-30 01:23:09 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-30 01:44:44 [---] Insufficient work; requesting more
2005-01-30 01:44:44 [LHC@home] Requesting 22214 seconds of work
2005-01-30 01:44:44 [LHC@home] Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
2005-01-30 01:45:43 [ProteinPredictorAtHome] Result t0201E_1_8763_1 exited with zero status but no 'finished' file
2005-01-30 01:45:43 [ProteinPredictorAtHome] If this happens repeatedly you may need to reset the project.
2005-01-30 01:45:44 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-30 01:47:41 [ProteinPredictorAtHome] Result t0201E_1_8763_1 exited with zero status but no 'finished' file
2005-01-30 01:47:41 [ProteinPredictorAtHome] If this happens repeatedly you may need to reset the project.
2005-01-30 01:47:41 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-30 01:49:39 [ProteinPredictorAtHome] Result t0201E_1_8763_1 exited with zero status but no 'finished' file
2005-01-30 01:49:39 [ProteinPredictorAtHome] If this happens repeatedly you may need to reset the project.
2005-01-30 01:49:39 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-30 01:51:37 [ProteinPredictorAtHome] Result t0201E_1_8763_1 exited with zero status but no 'finished' file
2005-01-30 01:51:37 [ProteinPredictorAtHome] If this happens repeatedly you may need to reset the project.
2005-01-30 01:51:37 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-30 01:53:38 [ProteinPredictorAtHome] Result t0201E_1_8763_1 exited with zero status but no 'finished' file
2005-01-30 01:53:38 [ProteinPredictorAtHome] If this happens repeatedly you may need to reset the project.
2005-01-30 01:53:38 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-30 01:56:20 [ProteinPredictorAtHome] Result t0201E_1_8763_1 exited with zero status but no 'finished' file
2005-01-30 01:56:20 [ProteinPredictorAtHome] If this happens repeatedly you may need to reset the project.
2005-01-30 01:56:20 [ProteinPredictorAtHome] Restarting result t0201E_1_8763_1 using mfoldB125 version 4.22
2005-01-30 02:56:20 [Einstein@Home] Restarting result H1_0063.9__0064.0_0.1_T15_Test02_1 using einstein version 4.72
2005-01-30 02:56:20 [ProteinPredictorAtHome] Pausing result t0201E_1_8763_1 (removed from memory)
2005-01-30 04:38:09 [---] Insufficient work; requesting more
2005-01-30 06:17:24 [---] Insufficient work; requesting more

Now this looks better; however, I'm still seeing the message about the Predictor unit. I have a dial-up connection. The STDERR file, which covers the period of both this and the previous STDOUT posting, shows no errors other than when it can't get a connection (not signed on), and looks like this:

2005-01-29 12:47:55 [LHC@home] Deferring communication with project for 8 hours, 39 minutes, and 10 seconds
2005-01-29 12:47:55 [Pirates@Home] Deferring communication with project for 12 hours, 34 minutes, and 11 seconds
2005-01-29 13:47:55 [LHC@home] Deferring communication with project for 7 hours, 39 minutes, and 10 seconds
2005-01-29 13:47:55 [Pirates@Home] Deferring communication with project for 11 hours, 34 minutes, and 11 seconds
2005-01-29 14:47:55 [LHC@home] Deferring communication with project for 6 hours, 39 minutes, and 10 seconds
2005-01-29 14:47:55 [Pirates@Home] Deferring communication with project for 10 hours, 34 minutes, and 11 seconds
2005-01-29 15:47:55 [LHC@home] Deferring communication with project for 5 hours, 39 minutes, and 10 seconds
2005-01-29 15:47:55 [Pirates@Home] Deferring communication with project for 9 hours, 34 minutes, and 11 seconds
2005-01-29 16:47:55 [LHC@home] Deferring communication with project for 4 hours, 39 minutes, and 10 seconds
2005-01-29 16:47:55 [Pirates@Home] Deferring communication with project for 8 hours, 34 minutes, and 11 seconds
2005-01-29 17:16:07 [LHC@home] No work from project
2005-01-29 17:16:07 [LHC@home] Deferring communication with project for 1 hours, 0 minutes, and 0 seconds
2005-01-29 17:47:55 [Pirates@Home] Deferring communication with project for 7 hours, 34 minutes, and 11 seconds
2005-01-29 18:17:07 [---] Can't resolve hostname lhcathome-sched1.cern.ch (host name not found)
2005-01-29 18:17:07 [LHC@home] scheduler init_op_project to http://lhcathome-sched1.cern.ch/scheduler/cgi failed, error -113
2005-01-29 18:17:07 [LHC@home] init_op_project failed, error -113
2005-01-29 18:17:08 [LHC@home] Deferring communication with project for 59 seconds
2005-01-29 18:19:06 [---] Can't resolve hostname lhcathome-sched1.cern.ch (host name not found)
2005-01-29 18:19:06 [LHC@home] scheduler init_op_project to http://lhcathome-sched1.cern.ch/scheduler/cgi failed, error -113
2005-01-29 18:19:06 [LHC@home] init_op_project failed, error -113
2005-01-29 18:19:06 [LHC@home] Deferring communication with project for 1 minutes and 0 seconds
2005-01-29 18:21:05 [---] Can't resolve hostname lhcathome-sched1.cern.ch (host name not found)
2005-01-29 18:21:05 [LHC@home] scheduler init_op_project to http://lhcathome-sched1.cern.ch/scheduler/cgi failed, error -113
2005-01-29 18:21:05 [LHC@home] init_op_project failed, error -113
2005-01-29 18:21:05 [LHC@home] Deferring communication with project for 13 minutes and 49 seconds
2005-01-29 18:35:54 [---] Can't resolve hostname lhcathome-sched1.cern.ch (host name not found)
2005-01-29 18:35:54 [LHC@home] scheduler init_op_project to http://lhcathome-sched1.cern.ch/scheduler/cgi failed, error -113
2005-01-29 18:35:54 [LHC@home] init_op_project failed, error -113
2005-01-29 18:35:54 [LHC@home] Deferring communication with project for 58 minutes and 48 seconds
2005-01-29 18:47:55 [Pirates@Home] Deferring communication with project for 6 hours, 34 minutes, and 11 seconds
2005-01-29 19:35:42 [---] Can't resolve hostname lhcathome-sched1.cern.ch (host name not found)
2005-01-29 19:35:42 [LHC@home] scheduler init_op_project to http://lhcathome-sched1.cern.ch/scheduler/cgi failed, error -113
2005-01-29 19:35:42 [LHC@home] init_op_project failed, error -113
2005-01-29 19:35:42 [LHC@home] Deferring communication with project for 1 minutes and 0 seconds
2005-01-29 19:37:44 [---] Can't resolve hostname lhcathome.cern.ch (host name not found)
2005-01-29 19:37:44 [LHC@home] Couldn't read master page for LHC@home: error -113
2005-01-29 19:37:44 [LHC@home] Master file fetch failed
2005-01-29 19:37:44 [LHC@home] Deferring communication with project for 1 minutes and 0 seconds
2005-01-29 19:40:00 [---] Can't resolve hostname lhcathome.cern.ch (host name not found)
2005-01-29 19:40:00 [LHC@home] Couldn't read master page for LHC@home: error -113
2005-01-29 19:40:00 [LHC@home] Master file fetch failed
2005-01-29 19:40:00 [LHC@home] Deferring communication with project for 1 minutes and 0 seconds
2005-01-29 19:42:23 [LHC@home] No work from project
2005-01-29 19:42:23 [LHC@home] Deferring communication with project for 1 hours, 0 minutes, and 0 seconds
2005-01-29 19:47:55 [Pirates@Home] Deferring communication with project for 5 hours, 34 minutes, and 11 seconds
2005-01-29 20:42:29 [LHC@home] No work from project
2005-01-29 20:42:29 [LHC@home] Deferring communication with project for 1 hours, 0 minutes, and 0 seconds
2005-01-29 20:47:55 [Pirates@Home] Deferring communication with project for 4 hours, 34 minutes, and 11 seconds
2005-01-29 21:42:35 [LHC@home] No work from project
2005-01-29 21:42:35 [LHC@home] Deferring communication with project for 1 hours, 0 minutes, and 0 seconds
2005-01-29 21:47:55 [Pirates@Home] Deferring communication with project for 3 hours, 34 minutes, and 11 seconds
2005-01-29 22:42:42 [LHC@home] No work from project
2005-01-29 22:42:42 [LHC@home] Deferring communication with project for 1 hours, 0 minutes, and 0 seconds
2005-01-29 22:47:55 [Pirates@Home] Deferring communication with project for 2 hours, 34 minutes, and 11 seconds
2005-01-29 23:43:37 [LHC@home] Scheduler RPC to http://lhcathome-sched1.cern.ch/scheduler/cgi failed
2005-01-29 23:43:37 [LHC@home] No schedulers responded
2005-01-29 23:43:37 [LHC@home] Deferring communication with project for 2 minutes and 8 seconds
2005-01-29 23:46:35 [LHC@home] Scheduler RPC to http://lhcathome-sched1.cern.ch/scheduler/cgi failed
2005-01-29 23:46:35 [LHC@home] No schedulers responded
2005-01-29 23:46:35 [LHC@home] Deferring communication with project for 6 minutes and 26 seconds
2005-01-29 23:47:55 [Pirates@Home] Deferring communication with project for 1 hours, 34 minutes, and 11 seconds
2005-01-29 23:53:06 [LHC@home] No work from project
2005-01-29 23:53:06 [LHC@home] Deferring communication with project for 1 hours, 0 minutes, and 0 seconds
2005-01-30 00:47:55 [Pirates@Home] Deferring communication with project for 34 minutes and 11 seconds
2005-01-30 00:54:02 [LHC@home] Scheduler RPC to http://lhcathome-sched1.cern.ch/scheduler/cgi failed
2005-01-30 00:54:02 [LHC@home] No schedulers responded
2005-01-30 00:54:02 [LHC@home] Deferring communication with project for 4 minutes and 51 seconds
2005-01-30 00:59:43 [LHC@home] Scheduler RPC to http://lhcathome-sched1.cern.ch/scheduler/cgi failed
2005-01-30 00:59:43 [LHC@home] No schedulers responded
2005-01-30 00:59:43 [LHC@home] Deferring communication with project for 45 minutes and 0 seconds
2005-01-30 01:23:08 [---] Can't resolve hostname pirates.vassar.edu (host name not found)
2005-01-30 01:23:08 [Pirates@Home] Couldn't read master page for Pirates@Home: error -113
2005-01-30 01:23:08 [Pirates@Home] Master file fetch failed
2005-01-30 01:23:08 [Pirates@Home] Deferring communication with project for 3 days, 8 hours, 8 minutes, and 20 seconds
2005-01-30 01:45:43 [---] Can't resolve hostname lhcathome-sched1.cern.ch (host name not found)
2005-01-30 01:45:43 [LHC@home] scheduler init_op_project to http://lhcathome-sched1.cern.ch/scheduler/cgi failed, error -113
2005-01-30 01:45:43 [LHC@home] init_op_project failed, error -113
2005-01-30 01:45:43 [LHC@home] Deferring communication with project for 1 minutes and 0 seconds
2005-01-30 01:47:41 [---] Can't resolve hostname lhcathome.cern.ch (host name not found)
2005-01-30 01:47:41 [LHC@home] Couldn't read master page for LHC@home: error -113
2005-01-30 01:47:41 [LHC@home] Master file fetch failed
2005-01-30 01:47:41 [LHC@home] Deferring communication with project for 1 minutes and 0 seconds
2005-01-30 01:49:39 [---] Can't resolve hostname lhcathome.cern.ch (host name not found)
2005-01-30 01:49:39 [LHC@home] Couldn't read master page for LHC@home: error -113
2005-01-30 01:49:39 [LHC@home] Master file fetch failed
2005-01-30 01:49:39 [LHC@home] Deferring communication with project for 1 minutes and 0 seconds
2005-01-30 01:51:37 [---] Can't resolve hostname lhcathome.cern.ch (host name not found)
2005-01-30 01:51:37 [LHC@home] Couldn't read master page for LHC@home: error -113
2005-01-30 01:51:37 [LHC@home] Master file fetch failed
2005-01-30 01:51:37 [LHC@home] Deferring communication with project for 1 minutes and 0 seconds
2005-01-30 01:53:38 [---] Can't resolve hostname lhcathome.cern.ch (host name not found)
2005-01-30 01:53:38 [LHC@home] Couldn't read master page for LHC@home: error -113
2005-01-30 01:53:38 [LHC@home] Master file fetch failed
2005-01-30 01:53:38 [LHC@home] Deferring communication with project for 1 minutes and 43 seconds
2005-01-30 01:56:20 [---] Can't resolve hostname lhcathome.cern.ch (host name not found)
2005-01-30 01:56:20 [LHC@home] Couldn't read master page for LHC@home: error -113
2005-01-30 01:56:20 [LHC@home] Master file fetch failed
2005-01-30 01:56:20 [LHC@home] Deferring communication with project for 5 minutes and 25 seconds
2005-01-30 02:02:43 [---] Can't resolve hostname lhcathome.cern.ch (host name not found)
2005-01-30 02:02:43 [LHC@home] Couldn't read master page for LHC@home: error -113
2005-01-30 02:02:43 [LHC@home] Master file fetch failed
2005-01-30 02:02:43 [LHC@home] Deferring communication with project for 3 minutes and 17 seconds
2005-01-30 02:06:59 [---] Can't resolve hostname lhcathome.cern.ch (host name not found)
2005-01-30 02:06:59 [LHC@home] Couldn't read master page for LHC@home: error -113
2005-01-30 02:06:59 [LHC@home] Master file fetch failed
2005-01-30 02:06:59 [LHC@home] Deferring communication with project for 27 minutes and 39 seconds
2005-01-30 02:23:08 [Pirates@Home] Deferring communication with project for 3 days, 7 hours, 8 minutes, and 20 seconds
2005-01-30 02:35:37 [---] Can't resolve hostname lhcathome.cern.ch (host name not found)
2005-01-30 02:35:37 [LHC@home] Couldn't read master page for LHC@home: error -113
2005-01-30 02:35:37 [LHC@home] Master file fetch failed
2005-01-30 02:35:37 [LHC@home] Deferring communication with project for 2 hours, 2 minutes, and 31 seconds
2005-01-30 03:31:53 [Pirates@Home] Deferring communication with project for 3 days, 5 hours, 59 minutes, and 35 seconds
2005-01-30 03:35:54 [LHC@home] Deferring communication with project for 1 hours, 2 minutes, and 14 seconds
2005-01-30 04:31:53 [Pirates@Home] Deferring communication with project for 3 days, 4 hours, 59 minutes, and 35 seconds
2005-01-30 04:35:54 [LHC@home] Deferring communication with project for 2 minutes and 14 seconds
2005-01-30 04:39:08 [---] Can't resolve hostname lhcathome.cern.ch (host name not found)
2005-01-30 04:39:08 [LHC@home] Couldn't read master page for LHC@home: error -113
2005-01-30 04:39:08 [LHC@home] Master file fetch failed
2005-01-30 04:39:08 [LHC@home] Deferring communication with project for 1 hours, 38 minutes, and 15 seconds
2005-01-30 05:31:53 [Pirates@Home] Deferring communication with project for 3 days, 3 hours, 59 minutes, and 35 seconds
2005-01-30 05:39:08 [LHC@home] Deferring communication with project for 38 minutes and 15 seconds
2005-01-30 06:18:24 [---] Can't resolve hostname lhcathome.cern.ch (host name not found)
2005-01-30 06:18:24 [LHC@home] Couldn't read master page for LHC@home: error -113
2005-01-30 06:18:24 [LHC@home] Master file fetch failed
2005-01-30 06:18:24 [LHC@home] Deferring communication with project for 10 hours, 47 minutes, and 45 seconds
2005-01-30 06:31:53 [Pirates@Home] Deferring communication with project for 3 days, 2 hours, 59 minutes, and 35 seconds
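
A log this long is easier to read once it is tallied. The following is a minimal sketch, not part of BOINC, that counts error codes per project and unresolved hostnames. The line format is taken from the log above; the function name `summarize` and the regular expressions are illustrative assumptions.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log above:
# 2005-01-29 18:17:07 [LHC@home] init_op_project failed, error -113
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[([^\]]+)\] (.*)$")
ERROR_RE = re.compile(r"error (-\d+)")
DNS_RE = re.compile(r"Can't resolve hostname (\S+)")


def summarize(log_text):
    """Count error codes per project and unresolved hostnames (hypothetical helper)."""
    errors = Counter()      # (project, error code) -> count
    unresolved = Counter()  # hostname -> count
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a timestamped log line
        _, project, message = m.groups()
        dns = DNS_RE.search(message)
        if dns:
            unresolved[dns.group(1)] += 1
        err = ERROR_RE.search(message)
        if err:
            errors[(project, int(err.group(1)))] += 1
    return errors, unresolved


sample = """\
2005-01-29 18:17:07 [---] Can't resolve hostname lhcathome-sched1.cern.ch (host name not found)
2005-01-29 18:17:07 [LHC@home] scheduler init_op_project to http://lhcathome-sched1.cern.ch/scheduler/cgi failed, error -113
2005-01-29 18:17:07 [LHC@home] init_op_project failed, error -113
2005-01-29 18:17:08 [LHC@home] Deferring communication with project for 59 seconds
"""

errors, unresolved = summarize(sample)
print(errors)
print(unresolved)
```

In the log above, every "error -113" line directly follows a failed hostname lookup, which fits a connection that is simply offline rather than a fault in the client.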

I believe I'll let these existing WU's get finished and sent in prior to making any other changes, to see if things improve.

Thanks

tony

ID: 75394 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 75395 - Posted: 30 Jan 2005, 12:01:53 UTC - in response to Message 75340.  
Last modified: 30 Jan 2005, 12:06:36 UTC

> When you get that "exited with zero status but no 'finished' file" message,
> check the stderr.txt file in the slots/0 directory. It probably says something
> like "No heartbeat from core client for xx sec - exiting".
>
Hey, I didn't even know about the "Slots" directory. You're right. What are the Slots?

Here's what it says:
No heartbeat from core client for 34.000000 sec - exiting
Resuming computation at 1142/64800/77400
No heartbeat from core client for 31.030001 sec - exiting
Resuming computation at 1470/82080/89730
No heartbeat from core client for 30.970001 sec - exiting
Resuming computation at 1470/82080/89730
No heartbeat from core client for 30.980000 sec - exiting
Resuming computation at 1470/82080/89730
No heartbeat from core client for 30.980000 sec - exiting
Resuming computation at 1470/82080/89730
No heartbeat from core client for 30.539999 sec - exiting
Resuming computation at 1913/101970/102870
No heartbeat from core client for 30.980000 sec - exiting
Resuming computation at 1913/101970/104130
No heartbeat from core client for 30.150002 sec - exiting
Resuming computation at 1913/101970/104130
No heartbeat from core client for 31.030001 sec - exiting
Resuming computation at 1913/101970/104130
No heartbeat from core client for 30.969999 sec - exiting
Resuming computation at 2248/113490/121320
No heartbeat from core client for 30.980000 sec - exiting
Resuming computation at 2720/130410/133290
No heartbeat from core client for 30.980000 sec - exiting
Resuming computation at 2720/130410/133290
No heartbeat from core client for 30.970001 sec - exiting
Resuming computation at 2720/130410/139050
No heartbeat from core client for 30.050001 sec - exiting
Resuming computation at 2720/130410/139050
No heartbeat from core client for 30.100000 sec - exiting
Resuming computation at 2720/130410/139050
No heartbeat from core client for 30.379999 sec - exiting
Resuming computation at 4484/194040/196470
No heartbeat from core client for 30.039999 sec - exiting
Resuming computation at 4722/204390/215640
Resuming computation at 5121/225720/226980
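
To see how often this happens, the stderr.txt output can be tallied. This is a rough sketch under stated assumptions: the two message formats are copied from the output above, the function name `summarize_restarts` is hypothetical, and the three numbers after "Resuming computation at" are treated as an opaque position string because their meaning is not explained here.

```python
import re

HEARTBEAT_RE = re.compile(r"No heartbeat from core client for ([\d.]+) sec - exiting")
RESUME_RE = re.compile(r"Resuming computation at (\S+)")


def summarize_restarts(stderr_text):
    """Return (heartbeat gaps in seconds, count of resumes from an unchanged position)."""
    gaps = []
    repeated_resumes = 0
    previous_position = None
    for line in stderr_text.splitlines():
        hb = HEARTBEAT_RE.search(line)
        if hb:
            gaps.append(float(hb.group(1)))
            continue
        rs = RESUME_RE.search(line)
        if rs:
            # Same position as the last resume: nothing new was saved in between.
            if rs.group(1) == previous_position:
                repeated_resumes += 1
            previous_position = rs.group(1)
    return gaps, repeated_resumes


sample = """\
No heartbeat from core client for 34.000000 sec - exiting
Resuming computation at 1142/64800/77400
No heartbeat from core client for 31.030001 sec - exiting
Resuming computation at 1470/82080/89730
No heartbeat from core client for 30.970001 sec - exiting
Resuming computation at 1470/82080/89730
"""

gaps, repeated_resumes = summarize_restarts(sample)
print(len(gaps), min(gaps), repeated_resumes)
```

Every gap in the output above is just over 30 seconds, which suggests (but does not prove) that the science app gives up after roughly 30 seconds without a heartbeat. A resume from the same position as the previous one suggests the work done between the two exits was lost.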

Also, the system as a whole seems to be responding faster.

Thanks Walt


ID: 75395 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 75399 - Posted: 30 Jan 2005, 12:14:55 UTC

OK, It's been a few minutes and the Einstein WU numbers haven't changed. It still says: 07:10:44 cpu time, 0.00% done, 27:50:04 RUNNING. This is strange. It was the only app working properly (excluding long run times), now it's locked up.
ID: 75399 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 75418 - Posted: 30 Jan 2005, 14:54:00 UTC

I shut down and restarted the computer. Now the Einstein readings are:


7:16:43 (and counting), 8.76% done, 75:46:22 remaining (and dropping fast, now down to 75:34:19).
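
As a rough check on those numbers, the remaining time is consistent with plain linear extrapolation from the elapsed time and the fraction done. The sketch below assumes that model; it is not taken from BOINC's source, and the helper names are illustrative.

```python
def estimate_remaining(elapsed_s, fraction_done):
    """Linear extrapolation: assume the rest runs at the average rate so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


def to_seconds(hms):
    """Convert 'H:MM:SS' to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds):
    """Convert seconds to 'H:MM:SS', rounding to the nearest second."""
    total = round(seconds)
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


elapsed = to_seconds("7:16:43")                 # values from the post above
remaining = estimate_remaining(elapsed, 0.0876)
print(to_hms(remaining))
```

With 7:16:43 elapsed and 8.76% done this gives roughly 75 hours 48 minutes, within a couple of minutes of the 75:46:22 shown; the difference is about what rounding the percentage to two decimals would produce.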
ID: 75418 · Report as offensive
7822531

Joined: 3 Apr 99
Posts: 820
Credit: 692
RAC: 0
Message 76225 - Posted: 3 Feb 2005, 7:41:35 UTC
Last modified: 3 Feb 2005, 7:44:25 UTC

It seems that I've also been hit by the "Result exited with zero status but no 'finished' file" error. Host info is here - PowerBook3,5 G4@867MHz 768MB OS X.2.8.

Yes, it was running just dandy under Jaguar for about five hours...

I'm rechecking permissions and all that junk, but I've already ruled that out as the culprit... Would a host changing its IP address affect BOINC? .o0(My guess is that it doesn't)
ID: 76225 · Report as offensive
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 76337 - Posted: 4 Feb 2005, 3:45:22 UTC - in response to Message 75360.  


>
> Hi Hans, it means that BOINC stopped communicating with the science app.
>
>

(Snip)

>
> There might be a problem with BOINC/science app communications, but it's not all
> that serious. Some time is lost in restarting the WU, but its not like it has
> to start from the beginning each time. After I saw the WU's were completing
> in around the same time whether or not they got "no heartbeat" messages, I
> stopped looking into it.
>
> Walt
>

Yes, at the moment it's only a minor nuisance.
But things get really messy if you run multiple projects and a lot of switching between applications occurs.

This will get urgent when all the classic users join in and you'll have to attach to other projects, too.

Regards Hans

ID: 76337 · Report as offensive
Walt Gribben
Volunteer tester

Joined: 16 May 99
Posts: 353
Credit: 304,016
RAC: 0
United States
Message 76390 - Posted: 4 Feb 2005, 7:17:49 UTC - in response to Message 76337.  


> Yes, at the moment it's only a minor nuisance.
> But things get really messy if you run multiple projects and a lot of
> switching between applications occurs.
>
> This will get urgent when all the classic users join in and you'll have to
> attach to other projects, too.
>
> Regards Hans

Switching projects will likely make the problem worse, if people overload their machines. At least on Windows.

I've found one situation that will always cause a "no finished file" error, at least on my systems.

Every time my DSL line drops, DNS requests lock up the application until the request times out. It's not a BOINC problem, it's TCP/IP - it happens with every application. The timeout period is 40 seconds, which is 10 seconds longer than the SETI app's heartbeat timeout. I probably have too many DNS servers set. I could "fix" it by running my own caching DNS server, but it doesn't happen that often.
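
The timing argument can be sketched as a small simulation. Everything here is an assumption made for illustration: the one-second heartbeat interval, the function names, and the watchdog logic are not taken from BOINC's source. Only the 40-second stall and the roughly 30-second limit come from this thread.

```python
def heartbeats_with_stall(duration_s, stall_start_s, stall_length_s, interval_s=1.0):
    """Heartbeat timestamps every interval_s seconds, except while the client is blocked."""
    times = []
    t = 0.0
    while t <= duration_s:
        if not (stall_start_s <= t < stall_start_s + stall_length_s):
            times.append(t)
        t += interval_s
    return times


def first_heartbeat_loss(heartbeat_times, timeout_s=30.0):
    """Return the time at which the app's watchdog would fire, or None if it never does."""
    for earlier, later in zip(heartbeat_times, heartbeat_times[1:]):
        if later - earlier > timeout_s:
            return earlier + timeout_s
    return None


# Client blocked for 40 s (a DNS timeout) starting at t = 50 s.
beats = heartbeats_with_stall(duration_s=120, stall_start_s=50, stall_length_s=40)
print(first_heartbeat_loss(beats, timeout_s=30.0))  # watchdog fires during the stall
print(first_heartbeat_loss(beats, timeout_s=60.0))  # a longer limit rides out the stall
```

Under these assumptions a 40-second stall always trips a 30-second limit, while a limit longer than the stall does not.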


ID: 76390 · Report as offensive
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 76391 - Posted: 4 Feb 2005, 7:29:32 UTC - in response to Message 76390.  

>
> Every time my DSL line drops, DNS requests lock up the application until the
> request times out. It's not a BOINC problem, it's TCP/IP - happens with every
> application. Timeout period is 40 seconds, which is 10 seconds longer than the
> seti app. Set too many DNS servers I guess. I could "fix" it by running my
> own caching DNS server but it doesn't happen that often.
>

Thanks for the info. I'll try upping the timeout to 60 seconds, and see if that helps.

Regards Hans

ID: 76391 · Report as offensive

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.