Preempted/Paused Projects keep running (rather slowly creeping)

Profile FalconFly
Joined: 5 Oct 99
Posts: 394
Credit: 18,053,892
RAC: 0
Germany
Message 32238 - Posted: 3 Oct 2004, 0:16:07 UTC
Last modified: 3 Oct 2004, 10:11:15 UTC

I first noticed this on one Win98 machine, but by now 3 Linux machines have started to exhibit exactly the same Problem as well:

With several Projects attached, the primary (Active, Status "running") Project works normally, but one Preempted/Paused Project keeps taking 50% CPU time at the same time (Single CPU System).

The result is reduced performance for the currently active Project (20-30% lower), while the "Paused" Project creeps along at maybe 10% of nominal performance, despite taking 50% of the CPU time according to the CPU Time figures shown for it.

I have no clue yet as to why this happens, but I could swear I did not see this last week (or at any time before that).

So far, it is either the CPDN Client (observed on Win98) or the SETI Client (on Linux) that keeps creeping along in "Paused" mode.

I'm monitoring my entire BOINC Network (8 Systems as of now) via BOINCview V0.82d and GUI_RPC.
I've changed the Update period from 20 Seconds to 60 Seconds (just to be safe), but it made no difference.

I've disabled "Leave Applications in Memory", but so far that does not seem to make any difference either. Technically, BOINC no longer seems to reliably shut down/preempt Clients after timeslicing, so they stay 'alive' despite being shown as "Paused".

This odd bug naturally bogs down overall System performance and significantly increases the CPU time needed to complete any given Work.

So far, I haven't observed the LHC Client do this (all but one System have 3 Projects attached), but since it seems to get worse every day (I'm catching more Systems exhibiting this Bug), it may be only a question of time.

Performance would be bogged down completely if a Single CPU had to work primarily on the Active Project while dragging along 2 other "Paused" Projects at the same time :p

As odd as this is, so far only 4 out of 8 Systems are affected, even though nothing actually changed on any of them; most haven't even been restarted for several weeks.

---- edit ----

System No. 5 has just been caught exhibiting the same Bug.

This time it's a Linux Dual AthlonMP System, and while 2 LHC Clients are running, (strangely) only one of the 2 Paused SETI Clients is creeping along. The other Paused SETI Client behaves normally.
From the Host Messages, I can confirm that both Paused SETI Applications were (supposedly) properly removed from Memory by BOINC. Seems that actually applied to only one of them, the other is still taking 50% CPU time and is being dragged along :p

This is getting more and more odd every day :(

To make things worse, the Problem keeps popping up on a System, then disappearing again for a while.
I can't reproduce it by forcing any settings, nor can I reliably say whether it is caused by the Projects timeslicing in a certain sequence.

All I can say so far is that disabling the "Leave Applications in Memory" setting did not solve the Problem.
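
One way to confirm on the Linux machines that a "Paused" client really is still burning CPU, independent of what BOINC reports, is to sample the process's accumulated CPU time twice. The following is only a minimal sketch, assuming a Linux /proc filesystem; the helper names `cpu_seconds` and `cpu_share` are made up for illustration, and the PID has to be looked up by hand (e.g. with ps).

```python
import os
import time


def cpu_seconds(pid):
    """Return user + system CPU seconds consumed so far by `pid` (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is wrapped in parentheses and may contain spaces,
    # so only split the part that follows the last ')'.
    fields = data[data.rindex(")") + 2:].split()
    # utime and stime are fields 14 and 15 of /proc/<pid>/stat,
    # i.e. indexes 11 and 12 once pid and comm have been cut off.
    utime, stime = int(fields[11]), int(fields[12])
    return (utime + stime) / os.sysconf("SC_CLK_TCK")


def cpu_share(pid, interval=5.0):
    """Fraction of one CPU that `pid` used during `interval` seconds."""
    before = cpu_seconds(pid)
    time.sleep(interval)
    return (cpu_seconds(pid) - before) / interval
```

Calling `cpu_share(pid)` with the PID of the supposedly Paused science application should give a value close to 0 if preemption worked, and something near 0.5 in the situation described above.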
___________________________________________
Scientific Network : 36200 MHz «» 8204 MB «» 815.0 GB
Profile FalconFly
Message 39238 - Posted: 22 Oct 2004, 21:08:57 UTC

One more addition :

I have found that momentarily switching the BOINC Client's Run Mode to "Suspend", then back to normal (Always Run), solves the Problem.

I haven't had enough time yet to see whether that fixes the Problem permanently, however, so it may or may not re-appear on the affected Systems.

The last observation was with the latest BOINC V4.13 and Linux SETI 4.02: LHC was supposed to be running but actually got only 1-2% CPU time, whereas SETI got almost all of the CPU time despite being Paused.
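
The Suspend/Always Run toggle described above could be scripted instead of clicked. This is only a sketch under a loud assumption: it uses the `boinccmd --set_run_mode` option found in later BOINC releases, which may not exist in this form for V4.13, and `boinccmd` may need to be started from the BOINC data directory so it can authenticate. The function name `toggle_run_mode` is hypothetical. By default it only returns the commands it would run.

```python
import subprocess
import time


def toggle_run_mode(pause=10.0, dry_run=True, boinccmd="boinccmd"):
    """Suspend computation, wait `pause` seconds, then switch back to Always Run.

    Assumes a BOINC client that ships the boinccmd tool; with dry_run=True
    nothing is executed and the planned commands are just returned.
    """
    commands = [
        [boinccmd, "--set_run_mode", "never"],   # same as Run Mode "Suspend"
        [boinccmd, "--set_run_mode", "always"],  # back to "Always Run"
    ]
    if not dry_run:
        subprocess.run(commands[0], check=True)
        time.sleep(pause)  # give the science applications time to stop
        subprocess.run(commands[1], check=True)
    return commands
```

Running `toggle_run_mode(dry_run=False)` on an affected host would perform the same two steps as the manual workaround; whether that clears the creeping process any more reliably than doing it by hand is untested.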
Profile FalconFly
Message 39429 - Posted: 23 Oct 2004, 15:23:02 UTC - in response to Message 39238.  

Update :

I found I forgot to set the Access Rights of the freshly copied BOINC Binaries.

Owner/Group : nobody / nobody
(due to the Files being copied from a Windows System via Samba SMB)
Chmod : standard restrictions

I changed it into root:root and chmodd'ed the BOINC Binary to 777.
Let's see if that helps :p


Profile Trane Francks

Joined: 18 Jun 99
Posts: 221
Credit: 122,319
RAC: 0
Japan
Message 40506 - Posted: 27 Oct 2004, 5:15:24 UTC - in response to Message 39429.  

> I changed it into root:root and chmodd'ed the BOINC Binary to 777.
> Let's see if that helps :p

I doubt that'll make any difference. Moreover, I believe this problem also affects the "maximum memory usage exceeded" messages/WU errors.


©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.