Fiber channel woes, Chicken App, etc. (May 21 2007)

Message boards : Technical News : Fiber channel woes, Chicken App, etc. (May 21 2007)

Xaak
Joined: 22 May 99
Posts: 32
Credit: 22,636,357
RAC: 0
United States
Message 573578 - Posted: 22 May 2007, 6:52:49 UTC - in response to Message 573524.  


...
XAAK -- Judging by your involvement in so many other projects, you know better than most that there are other projects to compile credit with. We each have the right to silently and non-apologetically reapportion our BOINC clients to those projects whose 'management styles' and 'scientific goals' better suit us. It's that simple.
...



I've already moved all of my resources to other projects and have also chosen not to make a donation this year. I tend not to do things silently, though; silence is often taken as apathy.
XaaK


ID: 573578
Dave Rave
Joined: 18 Mar 00
Posts: 23
Credit: 3,083,330
RAC: 0
Australia
Message 573585 - Posted: 22 May 2007, 7:09:34 UTC

I'm having to do the app-info rename to get things going.

Last night I got some work units, gone already,

but I also get the error

Reason: no work from project

yet the status page showed 165k results ready to send as of an hour ago.
Aaah, there it goes.
When it finally gets the right answer and re-downloads the files (again, sigh), then it gets work units.
Strange errors coming through.
ID: 573585
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19377
Credit: 40,757,560
RAC: 67
United Kingdom
Message 573589 - Posted: 22 May 2007, 7:19:22 UTC - in response to Message 573585.  

I'm having to do the app-info rename to get things going.

Last night I got some work units, gone already,

but I also get the error

Reason: no work from project

yet the status page showed 165k results ready to send as of an hour ago.
Aaah, there it goes.
When it finally gets the right answer and re-downloads the files (again, sigh), then it gets work units.
Strange errors coming through.

I've been getting the "no work" message too, along with internal server errors, for the last three hours. This is my early-morning check.

Andy
ID: 573589
Morris
Volunteer tester
Joined: 11 Sep 01
Posts: 57
Credit: 9,077,302
RAC: 29
Italy
Message 573594 - Posted: 22 May 2007, 7:56:32 UTC


I've been getting the "no work" message too, along with internal server errors, for the last three hours. This is my early-morning check.

Andy


Same here.
I tried the optimized (Chicken) app, with and without app_info.xml, installed as a service or not, with BOINC versions 5.8.16 and/or 5.4.9, but none of it helped. All I got from Berkeley is

Http: Internal server error

or

no work from project


M.
ID: 573594
QuietIce
Joined: 21 Jul 06
Posts: 5
Credit: 24,098,658
RAC: 0
United States
Message 573595 - Posted: 22 May 2007, 7:57:11 UTC
Last modified: 22 May 2007, 7:57:36 UTC

Just an FYI ...

Neither the normal apps nor the KWSN apps are able to get work at this time. One gets the HTTP error; the other shows no error at all, it simply says "No work from project"...
ID: 573595
Compukatt
Joined: 5 Oct 99
Posts: 26
Credit: 27,325,826
RAC: 13
New Zealand
Message 573597 - Posted: 22 May 2007, 8:02:24 UTC - in response to Message 573574.  



One of the largest problems with testing BOINC releases is a matter of scale: some problems that just don't show up in a project of a few thousand become much more apparent in a project of a few hundred thousand. The other problem is that BOINC releases are not designed to be incremental. An upgrade that fixes one bug often introduces new ones elsewhere in the code. They are also not designed to be reversible. Database changes don't often go away quietly. At any rate, a code rollback wasn't going to work because it would negate the round-robin DNS scheme for our feeders and schedulers, and we'd be back where we were on Friday, with most connection attempts failing. David checked in the final fix for that problem tonight, but I'm not going to change the server without getting in a few hours of sleep. My alarm clock is set for 5.5 hours from now. When I finish this message, I'm going to bed.

And with Matt gone, SETI's operations staff is essentially me and Jeff. Jeff has a real job, which means he doesn't work 24 hours a day. Lynn would also kill him if he tried. I'm a scientist, so I'm expected to work until I drop. After I drop I work in a reclining position. But I've got a proposal due on campus on Thursday, so I can't spend all my working hours watching the server logs. (I do, and have had two windows open on the feeder logs which I have been glancing at. Right now each system is handling about 10 results a second.)

Regarding censorship here: please remember that most of the moderators are not university employees, and they are human. Complain to the moderators list (setimods at ssl.berkeley.edu) or to me (korpela at ssl.berkeley.edu; warning: very aggressive spam filter) with a link to the posts in question and an explanation of what was meant. Under normal circumstances, moderation decisions can be overturned, or agreement can be reached about permissible language. Often the problem is quoting too much of a post that was deleted for a reason, or that was withdrawn by the original poster with a request that quotes also be deleted.

Good night. 5h15 before the alarm goes off.

--

Eric


Sure, sometimes it takes a while to upload and download WUs, but they all get there eventually.

Delays and outages are a small price for me to pay for being able to take part and contribute to the biggest computing task ever undertaken. I am truly amazed at the speed with which the SETI Team (Eric, Jeff, Matt et al) can progress from the onset of a major disaster (multiple hardware and software issues) to making code and hardware changes to get 1.5 million computers (or maybe more) working in harmony again. I doubt that there are many teams that could rectify problems of this magnitude in weeks, let alone days. My hat's off to you guys!

My hat is also off to the volunteers on these boards who have provided ideas and tested different configurations so that the majority of us can resume 'normal' crunching: idefix, Chicken, geek and many others I can't name. "Thanks!"

I'm with SETI for the long run, whether there are WUs or not. Whether there are hardware and software issues, or not. Whether my machines are crunching, or not.
Overall, the downtime is extremely small, and I get a lot of satisfaction from seeing the amount of science my machines are contributing.

end rave lol
Bill

Bill
Auckland, NZ
ID: 573597
Brian Silvers
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 573598 - Posted: 22 May 2007, 8:04:00 UTC - in response to Message 573589.  


I've been getting the "no work" message too, along with internal server errors, for the last three hours. This is my early-morning check.

Andy


Same sluggish performance now as when the fiber channel went out... We'll have to wait about 6-7 hours before someone comes in...
ID: 573598
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13851
Credit: 208,696,464
RAC: 304
Australia
Message 573607 - Posted: 22 May 2007, 8:57:47 UTC - in response to Message 573598.  

Same sluggish performance now as when the fiber channel went out...

When the FC carked it, things weren't sluggish; they stopped dead.

Grant
Darwin NT
ID: 573607
Brian Silvers
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 573612 - Posted: 22 May 2007, 9:18:13 UTC - in response to Message 573607.  

Same sluggish performance now as when the fiber channel went out...

When the FC carked it, things weren't sluggish; they stopped dead.


OK, I must be recalling the period when it was about to go out. The same scenario played out on my AMD host as on my Intel host yesterday, i.e. the stock application and one WU took a very long time to download (multiple retries). This was just before I posted a message about "letting it do its thing".

ID: 573612
Teorias
Joined: 21 Jul 99
Posts: 5
Credit: 23,941,372
RAC: 0
Portugal
Message 573622 - Posted: 22 May 2007, 10:06:13 UTC

Since the database server went down, I've been unable to receive work.

22-05-2007 10:58 TEO|SETI@home|Sending scheduler request: Requested by user
22-05-2007 10:58 TEO|SETI@home|Requesting 864000 seconds of new work
22-05-2007 10:59 TEO|SETI@home|Scheduler RPC succeeded [server version 509]
22-05-2007 10:59 TEO|SETI@home|Deferring communication for 11 sec
22-05-2007 10:59 TEO|SETI@home|Reason: requested by project
22-05-2007 10:59 TEO|SETI@home|Deferring communication for 48 min 56 sec
22-05-2007 10:59 TEO|SETI@home|Reason: no work from project
22-05-2007 11:01 TEO|SETI@home|Sending scheduler request: Requested by user
22-05-2007 11:01 TEO|SETI@home|Requesting 864000 seconds of new work
22-05-2007 11:01 TEO|SETI@home|Scheduler request failed: HTTP internal server error
22-05-2007 11:01 TEO|SETI@home|Deferring communication for 1 min 0 sec
22-05-2007 11:01 TEO|SETI@home|Reason: scheduler request failed

It has been like this for 5 or 6 days. I already tried reinstalling BOINC, but the problem persists. It's the same on all 5 clients.

Best Regards.
ID: 573622
CElliott
Volunteer tester
Joined: 19 Jul 99
Posts: 178
Credit: 79,285,961
RAC: 0
United States
Message 573625 - Posted: 22 May 2007, 10:19:04 UTC - in response to Message 573180.  

Eric:


I read your entry; I won't quote it here. I crunch for the Beta project. I run BOINC 5.8.16 and use an app_info.xml that specifies both the KWSN SETI app and AstroPulse 4.14. I have not changed anything in weeks. Suddenly I cannot upload any completed results or get any new work. I have had to shut down one computer because I cannot get any new work. I have tried all the suggestions in your post for getting new work and returning results. I have not tried renaming app_info.xml and restarting because, in my experience, that causes BOINC to invalidate everything. With all due respect, the problem is on your end.


ID: 573625
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 573626 - Posted: 22 May 2007, 10:27:16 UTC - in response to Message 573625.  
Last modified: 22 May 2007, 10:31:38 UTC

Eric:


I read your entry; I won't quote it here. I crunch for the Beta project. I run BOINC 5.8.16 and use an app_info.xml that specifies both the KWSN SETI app and AstroPulse 4.14. I have not changed anything in weeks. Suddenly I cannot upload any completed results or get any new work. I have had to shut down one computer because I cannot get any new work. I have tried all the suggestions in your post for getting new work and returning results. I have not tried renaming app_info.xml and restarting because, in my experience, that causes BOINC to invalidate everything. With all due respect, the problem is on your end.


Read Eric's second post in this thread (573574). He knows: the BOINC developers have supplied a fix, and he's going to install it in the morning, but he's getting some sleep first.

And I know I've said this before, but you shouldn't be using Chicken apps in Beta; that's not what's being tested. You could help test the stock application program (that phase is mostly completed), or you could generate some real-world performance data to help fine-tune the FLOPs multiplier for multibeam credit (still ongoing), but you shouldn't pollute the database with irrelevant results that have to be weeded out again before the test results can be analysed.
ID: 573626
Teorias
Joined: 21 Jul 99
Posts: 5
Credit: 23,941,372
RAC: 0
Portugal
Message 573628 - Posted: 22 May 2007, 10:33:20 UTC - in response to Message 573626.  



And I know I've said this before, but you shouldn't be using Chicken apps in Beta; that's not what's being tested. You could help test the stock application program (that phase is mostly completed), or you could generate some real-world performance data to help fine-tune the FLOPs multiplier for credit (still ongoing), but you shouldn't pollute the database with irrelevant results that have to be weeded out again before the test results can be analysed.


From what I've read lately in the forums, that is no longer the problem, since most people have reverted to the standard apps and still get no work.
ID: 573628
Kirsten
Volunteer tester
Joined: 7 Jul 00
Posts: 190
Credit: 566,047
RAC: 0
Denmark
Message 573634 - Posted: 22 May 2007, 10:55:06 UTC
Last modified: 22 May 2007, 11:05:50 UTC

Previously downloaded tasks (while the opt. app was disabled) would allegedly use stock app version 5.15 (with the opt. app enabled) for crunching.

Now, all of a sudden, my task window says they will be crunched with version 5.17, and I get this message:

22-05-2007 12:41:54|SETI@home|Restarting task 04mr05ab.17213.29712.503418.3.251_1 using setiathome_enhanced version 517

Maybe this is the platform fix we have been waiting for.

Edit: As a test, I'm trying to download work *with* the Chicken application enabled. So far I have not received any error message, but no more work either, because of this:

22-05-2007 13:02:12||Access to reference site succeeded - project servers may be temporarily down.

But that is not an error message.

Kind regards
Kirsten

ID: 573634
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 573639 - Posted: 22 May 2007, 11:05:35 UTC - in response to Message 573634.  

Previously downloaded tasks (while the opt. app was disabled) would allegedly use stock app version 5.15 (with the opt. app enabled) for crunching.

Now, all of a sudden, my task window says they will be crunched with version 5.17, and I get this message:

22-05-2007 12:41:54|SETI@home|Restarting task 04mr05ab.17213.29712.503418.3.251_1 using setiathome_enhanced version 517

Maybe this is the platform fix we have been waiting for.

Edit: As a test, I'm trying to download work *with* the Chicken application enabled. So far I have not received any error message.

Afraid not. I just tested with canary 1883631 and got

SETI@home 22/05/2007 11:30:02 Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed: error 500

and a ghost. (Ignore the time in the log; the clock's half an hour slow on that Win 98SE machine.)
ID: 573639
Kirsten
Volunteer tester
Joined: 7 Jul 00
Posts: 190
Credit: 566,047
RAC: 0
Denmark
Message 573645 - Posted: 22 May 2007, 11:14:41 UTC - in response to Message 573639.  
Last modified: 22 May 2007, 11:32:30 UTC



Edit: As a test, I'm trying to download work *with* the Chicken application enabled. So far I have not received any error message.

Afraid not. I just tested with canary 1883631 and got

SETI@home 22/05/2007 11:30:02 Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed: error 500

and a ghost. (Ignore the time in the log; the clock's half an hour slow on that Win 98SE machine.)


I haven't had any serious error message yet, and no ghost either, just a minor error IMO:

22-05-2007 12:59:00|SETI@home|Requesting 677746 seconds of new work
22-05-2007 13:02:11||Project communication failed: attempting access to reference site
22-05-2007 13:02:11|SETI@home|Scheduler request failed: server returned nothing (no headers, no data)
22-05-2007 13:02:11|SETI@home|Deferring communication for 1 hr 35 min 4 sec
22-05-2007 13:02:11|SETI@home|Reason: scheduler request failed
22-05-2007 13:02:12||Access to reference site succeeded - project servers may be temporarily down.

Edit: The error message came 20 minutes later:

2007-05-22 13:24:38 [SETI@home] Scheduler request failed: HTTP internal server error

but no ghost.


Kind regards
Kirsten

ID: 573645
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19377
Credit: 40,757,560
RAC: 67
United Kingdom
Message 573719 - Posted: 22 May 2007, 13:19:27 UTC - in response to Message 573589.  
Last modified: 22 May 2007, 13:21:18 UTC

I'm having to do the app-info rename to get things going.

Last night I got some work units, gone already,

but I also get the error

Reason: no work from project

yet the status page showed 165k results ready to send as of an hour ago.
Aaah, there it goes.
When it finally gets the right answer and re-downloads the files (again, sigh), then it gets work units.
Strange errors coming through.

I've been getting the "no work" message too, along with internal server errors, for the last three hours. This is my early-morning check.

Andy

Just checked from work during a late lunch (I can't crunch here), and found that my two computers have received 3 (11:51 UTC) and 4 (11:46 UTC) new units.

Moral: Just hang in there and be patient. I don't believe the Berkeley staff would have gone into the office before 04:40 Pacific time.

Andy
[edit] Have confirmed with my son at home that they are not ghosts. [/edit]
ID: 573719
Wasabi Peanut
Joined: 14 Jul 99
Posts: 62
Credit: 32,646,911
RAC: 0
Switzerland
Message 573726 - Posted: 22 May 2007, 13:27:26 UTC
Last modified: 22 May 2007, 13:39:22 UTC

Hi all!

I'd like to share my experience since thumper came back, but first a little background: I'm running BOINC CLI 5.4.9 on a variety of Intel- and PPC-based Macs, all of which are running Alex Kan's workers (so there's an app_info.xml in the mix on all machines).

Once thumper came back, my boxes received a total of maybe 100 WUs over the first three days, and then nothing at all. Beginning on Sunday, I started renaming the app_info.xml files on all boxes. On every box, WUs started to pour in shortly after I stopped BOINC, renamed the file and started BOINC again.

During the course of the outage, I've seen just about all the error messages mentioned by previous posters. Since the app_info.xml fix, downloads appear to work reliably, but uploads are still hit or miss.

Thanks to all the hard-working folks at SETI for tackling the issues at hand one by one! There will be smooth sailing again soon, I'm convinced.

HTH,

Ron
ID: 573726
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 573736 - Posted: 22 May 2007, 13:39:09 UTC - in response to Message 573639.  

Previously downloaded tasks (while the opt. app was disabled) would allegedly use stock app version 5.15 (with the opt. app enabled) for crunching.

Now, all of a sudden, my task window says they will be crunched with version 5.17, and I get this message:

22-05-2007 12:41:54|SETI@home|Restarting task 04mr05ab.17213.29712.503418.3.251_1 using setiathome_enhanced version 517

Maybe this is the platform fix we have been waiting for.

Edit: As a test, I'm trying to download work *with* the Chicken application enabled. So far I have not received any error message.

Afraid not. I just tested with canary 1883631 and got

SETI@home 22/05/2007 11:30:02 Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed: error 500

and a ghost. (Ignore the time in the log; the clock's half an hour slow on that Win 98SE machine.)


Why not reset the clock on the Win98 machine? Just right-click the time in the taskbar and choose "Adjust Date/Time". When that machine next needs a reboot, go into setup and permanently adjust the time (or get a utility that will do the adjustment for you, like Atomic Clock Sync from worldtimeserver.com). The clocks in most PCs (particularly overclocked ones) aren't very accurate.

Hello, from Albany, CA!...
ID: 573736
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 573747 - Posted: 22 May 2007, 13:49:30 UTC - in response to Message 573736.  

Why not reset the clock on the Win98 machine? Just right-click the time in the taskbar and choose "Adjust Date/Time". When that machine next needs a reboot, go into setup and permanently adjust the time (or get a utility that will do the adjustment for you, like Atomic Clock Sync from worldtimeserver.com). The clocks in most PCs (particularly overclocked ones) aren't very accurate.

Don't worry, I do that periodically, but it drifts. It isn't an important machine; I only keep it running because it does a few tasks better than the others, and it crunches, of course (hence the 'canary' function).

If I really need to know the time, I look at one of the servers, which are configured for SNTP sync from a proper tier-2 public time server (much more reliable than the mickey-mouse M$ ones, which tend to be overloaded). They in turn keep their local domains in line through group policy.
ID: 573747
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.