Fiber channel woes, Chicken App, etc. (May 21 2007)

Xaak Send message Joined: 22 May 99 Posts: 32 Credit: 22,636,357 RAC: 0	Message 573578 - Posted: 22 May 2007, 6:52:49 UTC - in response to Message 573524. ... XAAK -- Judging by your involvement in so many other projects, you know more than most that there are other projects to compile credit with. We each have the right to silently and non-apologetically reapportion our BOINC clients to those projects whose 'management styles' and 'scientific goals' better suit us. It's that simple. ... I've already moved all of my resources to other projects, and have chosen not to make a donation this year either. Though I tend not to do things silently. Silence is often taken as apathy. XaaK ID: 573578 ·

Dave Rave Send message Joined: 18 Mar 00 Posts: 23 Credit: 3,083,330 RAC: 0	Message 573585 - Posted: 22 May 2007, 7:09:34 UTC I'm having to do the app-info rename to get things going, and last night I got some work units gone already, but I also get the error "Reason: no work from project", yet the status page shows 165k results to send out as of an hour ago. Aaah, there it goes. When it finally gets the right answer and re-downloads the files (again, sigh), then it gets workunits. Strange errors coming through. ID: 573585 ·

W-K 666 Volunteer tester Send message Joined: 18 May 99 Posts: 19377 Credit: 40,757,560 RAC: 67	Message 573589 - Posted: 22 May 2007, 7:19:22 UTC - in response to Message 573585. I'm having to do the app-info rename to get things going, and last night I got some work units gone already, but I also get the error "Reason: no work from project", yet the status page shows 165k results to send out as of an hour ago. Aaah, there it goes. When it finally gets the right answer and re-downloads the files (again, sigh), then it gets workunits. Strange errors coming through. I'm getting the no work msg also, along with int server err's, for the last three hours. This is my early morning check. Andy ID: 573589 ·

Morris Volunteer tester Send message Joined: 11 Sep 01 Posts: 57 Credit: 9,077,302 RAC: 29	Message 573594 - Posted: 22 May 2007, 7:56:32 UTC I'm getting the no work msg also, along with int server err's, for the last three hours. This is my early morning check. Andy Same here.. tried with optimized (chicken) app, with or without app_info.xml, installing as a service or not, with boinc version 5.8.16 and/or 5.4.9 but that is of no help, all that i got from berkeley is Http: Internal server error or no work from project M. ID: 573594 ·

QuietIce Send message Joined: 21 Jul 06 Posts: 5 Credit: 24,098,658 RAC: 0	Message 573595 - Posted: 22 May 2007, 7:57:11 UTC Last modified: 22 May 2007, 7:57:36 UTC Just an FYI ... Neither the normal apps nor the KWSN apps are able to get work at this time. One has the HTTP error, the other shows no error at all it simply says, "No work from project" ... ID: 573595 ·

Compukatt Send message Joined: 5 Oct 99 Posts: 26 Credit: 27,325,826 RAC: 13	Message 573597 - Posted: 22 May 2007, 8:02:24 UTC - in response to Message 573574. One of the largest problems with testing BOINC releases is a matter of scale: some problems that just don't show up in a project of a few thousand become much more apparent in a project of a few hundred thousand. The other problem is that BOINC releases are not designed to be incremental. An upgrade that fixes one bug often includes new ones elsewhere in the code. They are also not designed to be reversible. Database changes don't often go away quietly. At any rate, a code rollback wasn't going to work because it would negate the round-robin DNS scheme for our feeders and schedulers and we'd be back where we were on Friday, with most connection attempts failing. David checked in the final fix for that problem tonight, but I'm not going to change the server without getting in a few hours of sleep. My alarm clock is set for 5.5 hours from now. When I finish this message, I'm going to bed. And with Matt gone, SETI's operations staff is essentially me and Jeff. Jeff has a real job, which means he doesn't work 24 hours a day. Lynn would also kill him if he tried. I'm a scientist, so I'm expected to work until I drop. After I drop I work in a reclining position. But I've got a proposal due on campus on Thursday, so I can't spend all my working hours watching the server logs. (I do, and have had two windows open on the feeder logs which I have been glancing at. Right now each system is handling about 10 results a second.) Regarding censorship here. Please remember that most of the moderators are not university employees and they are human. Complain to the moderators list (setimods at ssl.berkeley.edu) or to me (korpela at ssl.berkeley.edu, warning: very aggressive spam filter) with a link to the posts in question and an explanation of what was meant.
Under normal circumstances, moderation decisions can be overturned, or agreement can be reached about permissible language. Often times the problem can be including too much of a post which was deleted for a reason or withdrawn by the original poster with a request that quotes also be deleted. Good night. 5h15 before the alarm goes off. -- Eric Sure, sometimes it takes a while to upload and download wus but they all get there eventually. Delays and outages are a small price for me to pay for being able to take part and contribute to the biggest computing task ever undertaken. I am truly amazed at the speed with which the SETI Team (Eric, Jeff, Matt et al) can progress from the onset of a major disaster (multiple hardware and software issues) to making code and hardware changes to get 1.5 million computers (or maybe more) working in harmony again. I doubt that there are many teams that could rectify problems of this magnitude in weeks, let alone days. My hat's off to you guys! My hat is also off to the volunteers on these boards who have provided ideas and tested differing configuration varieties in order to allow the majority of us to resume 'normal' crunching: idefix, Chicken, geek and many others I can't name, "Thanks!" I'm with SETI for the long run, whether there are wus or not. Whether there are hardware and software issues, or not. Whether my machines are crunching, or not. Overall the downtime is extremely small and I gain a lot of satisfaction seeing the amount of science my machines are contributing to. end rave lol Bill Bill Auckland, NZ ID: 573597 ·

Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0	Message 573598 - Posted: 22 May 2007, 8:04:00 UTC - in response to Message 573589. I'm getting the no work msg also, along with int server err's, for the last three hours. This is my early morning check. Andy Same sluggish performance now as when the fiber channel went out... Will have to wait for about 6-7 hours before someone comes in... ID: 573598 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13851 Credit: 208,696,464 RAC: 304	Message 573607 - Posted: 22 May 2007, 8:57:47 UTC - in response to Message 573598. Same sluggish performance now as when the fiber channel went out... When the FC carked it, things weren't sluggish- they stopped dead. Grant Darwin NT ID: 573607 ·

Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0	Message 573612 - Posted: 22 May 2007, 9:18:13 UTC - in response to Message 573607. Same sluggish performance now as when the fiber channel went out... When the FC carked it, things weren't sluggish- they stopped dead. OK, I must be recalling when it was about to go out. The same scenario played out on my AMD host as did my Intel host yesterday, i.e. the stock application download and 1 WU took a very long time to download (multiple retries). This was just before I posted a message about "letting it do it's thing". ID: 573612 ·

Teorias Send message Joined: 21 Jul 99 Posts: 5 Credit: 23,941,372 RAC: 0	Message 573622 - Posted: 22 May 2007, 10:06:13 UTC Since the database server went dead I've been unable to receive work.
22-05-2007 10:58 TEO|SETI@home|Sending scheduler request: Requested by user
22-05-2007 10:58 TEO|SETI@home|Requesting 864000 seconds of new work
22-05-2007 10:59 TEO|SETI@home|Scheduler RPC succeeded [server version 509]
22-05-2007 10:59 TEO|SETI@home|Deferring communication for 11 sec
22-05-2007 10:59 TEO|SETI@home|Reason: requested by project
22-05-2007 10:59 TEO|SETI@home|Deferring communication for 48 min 56 sec
22-05-2007 10:59 TEO|SETI@home|Reason: no work from project
22-05-2007 11:01 TEO|SETI@home|Sending scheduler request: Requested by user
22-05-2007 11:01 TEO|SETI@home|Requesting 864000 seconds of new work
22-05-2007 11:01 TEO|SETI@home|Scheduler request failed: HTTP internal server error
22-05-2007 11:01 TEO|SETI@home|Deferring communication for 1 min 0 sec
22-05-2007 11:01 TEO|SETI@home|Reason: scheduler request failed
And it has been like this for 5 or 6 days. I already tried re-installing BOINC but the problem persists. It's the same for 5 clients. Best Regards. ID: 573622 ·
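The client messages quoted above appear to follow a simple `stamp|project|message` layout, with each deferral explained by a "Reason:" line. A minimal Python sketch, assuming that layout holds, can tally the reasons across a saved log to show which failure dominates. The function names `parse_line` and `count_reasons` are hypothetical helpers and not part of BOINC, and the sample lines are copied from the post above. The parser also tolerates backslash-escaped pipes in case the log was copied out of a table.

```python
from collections import Counter


def parse_line(line):
    """Split one 'stamp|project|message' line into its three fields.

    Returns None when the line does not contain two pipe separators.
    """
    # Undo backslash-escaped pipes, then split on the first two pipes only,
    # so any later text stays inside the message field.
    parts = line.replace("\\|", "|").split("|", 2)
    if len(parts) != 3:
        return None
    stamp, project, message = (part.strip() for part in parts)
    return stamp, project, message


def count_reasons(lines):
    """Count how often each 'Reason: ...' message occurs."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        message = parsed[2]
        if message.startswith("Reason:"):
            counts[message[len("Reason:"):].strip()] += 1
    return counts


# Sample lines taken from the log quoted in the post above.
SAMPLE = [
    "22-05-2007 10:59 TEO|SETI@home|Scheduler RPC succeeded [server version 509]",
    "22-05-2007 10:59 TEO|SETI@home|Reason: requested by project",
    "22-05-2007 10:59 TEO|SETI@home|Reason: no work from project",
    "22-05-2007 11:01 TEO|SETI@home|Scheduler request failed: HTTP internal server error",
    "22-05-2007 11:01 TEO|SETI@home|Reason: scheduler request failed",
]

if __name__ == "__main__":
    for reason, n in count_reasons(SAMPLE).most_common():
        print(f"{n:3d}  {reason}")
```

Run over a longer log, the tally would separate "no work from project" responses (the scheduler answered but handed out nothing) from "scheduler request failed" entries (the request itself did not complete), which are the two symptoms posters in this thread keep reporting.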

CElliott Volunteer tester Send message Joined: 19 Jul 99 Posts: 178 Credit: 79,285,961 RAC: 0	Message 573625 - Posted: 22 May 2007, 10:19:04 UTC - in response to Message 573180. Eric: I read your entry; I won't quote it here. I crunch for the Beta project. I run Boinc 5.8.16 and use app_info.xml that specifies both the KWSN Seti app and AstroPulse 4.14. I have not changed anything in weeks. Suddenly I cannot upload any completed results or get any new work. I have had to shut down one computer because I cannot get any new work. I have tried all the suggestions in your post for getting new work and returning results. I have not tried renaming app_info.xml and restarting because in my experience that causes Boinc to invalidate everything. With all due respect, the problem is on your end. ID: 573625 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874	Message 573626 - Posted: 22 May 2007, 10:27:16 UTC - in response to Message 573625. Last modified: 22 May 2007, 10:31:38 UTC Eric: I read your entry; I won't quote it here. I crunch for the Beta project. I run Boinc 5.8.16 and use app_info.xml that specifies both the KWSN Seti app and AstroPulse 4.14. I have not changed anything in weeks. Suddenly I cannot upload any completed results or get any new work. I have had to shut down one computer because I cannot get any new work. I have tried all the suggestions in your post for getting new work and returning results. I have not tried renaming app_info.xml and restarting because in my experience that causes Boinc to invalidate everything. With all due respect, the problem is on your end. Read Eric's second post in this thread: 573574. He knows: the BOINC developers have supplied a fix: he's going to install it in the morning: but he's getting some sleep first. And I know I've said this before, but you *shouldn't* be using Chicken apps in Beta: that's not what's being tested. You could help test the stock application program (that phase is mostly completed): you could generate some real world performance data to help fine-tune the FLOPs multiplier for multibeam credit (still ongoing): but you shouldn't pollute the database with irrelevant results that have to be weeded out again before the test results can be analysed. ID: 573626 ·

Teorias Send message Joined: 21 Jul 99 Posts: 5 Credit: 23,941,372 RAC: 0	Message 573628 - Posted: 22 May 2007, 10:33:20 UTC - in response to Message 573626. And I know I've said this before, but you *shouldn't* be using Chicken apps in Beta: that's not what's being tested. You could help test the stock application program (that phase is mostly completed): you could generate some real world performance data to help fine-tune the FLOPs multiplier for credit (still ongoing): but you shouldn't pollute the database with irrelevant results that have to be weeded out again before the test results can be analysed. From what I've read lately in the forums, that is no longer the problem, since most people have reverted to the standard apps and still get no work. ID: 573628 ·

Kirsten Volunteer tester Send message Joined: 7 Jul 00 Posts: 190 Credit: 566,047 RAC: 0	Message 573634 - Posted: 22 May 2007, 10:55:06 UTC Last modified: 22 May 2007, 11:05:50 UTC Previously downloaded tasks (while the opt. app was disabled) would allegedly use stock app version 5.15 (with the opt. app enabled) for crunching. Now all of a sudden my task window informs me they will be crunched with version 5.17 and I get this message:
22-05-2007 12:41:54|SETI@home|Restarting task 04mr05ab.17213.29712.503418.3.251_1 using setiathome_enhanced version 517
Maybe this is the platform fix we have been waiting for. Edit: As a test I am trying to download work with the Chicken application enabled. So far I have not received any error message, but no more work either, due to this:
22-05-2007 13:02:12||Access to reference site succeeded - project servers may be temporarily down
but that is not an error message. Kind regards Kirsten ID: 573634 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874	Message 573639 - Posted: 22 May 2007, 11:05:35 UTC - in response to Message 573634. Previously downloaded tasks (while the opt. app was disabled) would allegedly use stock app version 5.15 (with the opt. app enabled) for crunching. Now all of a sudden my task window informs me they will be crunched with version 5.17 and I get this message:
22-05-2007 12:41:54|SETI@home|Restarting task 04mr05ab.17213.29712.503418.3.251_1 using setiathome_enhanced version 517
Maybe this is the platform fix we have been waiting for. Edit: As a test I am trying to download work with the Chicken application enabled. So far I have not received any error message. Afraid not. Just tested with canary 1883631: got
SETI@home 22/05/2007 11:30:02 Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed: error 500
and a ghost. (ignore time in log - clock's half-an-hour slow on that Win 98SE machine) ID: 573639 ·

Kirsten Volunteer tester Send message Joined: 7 Jul 00 Posts: 190 Credit: 566,047 RAC: 0	Message 573645 - Posted: 22 May 2007, 11:14:41 UTC - in response to Message 573639. Last modified: 22 May 2007, 11:32:30 UTC Edit: As a test I am trying to download work with the Chicken application enabled. So far I have not received any error message. Afraid not. Just tested with canary 1883631: got
SETI@home 22/05/2007 11:30:02 Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed: error 500
and a ghost. (ignore time in log - clock's half-an-hour slow on that Win 98SE machine) I have not got any serious error msg yet and no ghost either, just a minor error IMO:
22-05-2007 12:59:00|SETI@home|Requesting 677746 seconds of new work
22-05-2007 13:02:11||Project communication failed: attempting access to reference site
22-05-2007 13:02:11|SETI@home|Scheduler request failed: server returned nothing (no headers, no data)
22-05-2007 13:02:11|SETI@home|Deferring communication for 1 hr 35 min 4 sec
22-05-2007 13:02:11|SETI@home|Reason: scheduler request failed
22-05-2007 13:02:12||Access to reference site succeeded - project servers may be temporarily down.
Edit: The error msg came now, 20 minutes later:
2007-05-22 13:24:38 [SETI@home] Scheduler request failed: HTTP internal server error
but no ghost. Kind regards Kirsten ID: 573645 ·

W-K 666 Volunteer tester Send message Joined: 18 May 99 Posts: 19377 Credit: 40,757,560 RAC: 67	Message 573719 - Posted: 22 May 2007, 13:19:27 UTC - in response to Message 573589. Last modified: 22 May 2007, 13:21:18 UTC I'm having to do the app-info rename to get things going, and last night I got some work units gone already, but I also get the error "Reason: no work from project", yet the status page shows 165k results to send out as of an hour ago. Aaah, there it goes. When it finally gets the right answer and re-downloads the files (again, sigh), then it gets workunits. Strange errors coming through. I'm getting the no work msg also, along with int server err's, for the last three hours. This is my early morning check. Andy Just checking from work during a late lunch, where I cannot crunch, but I have found that my two computers have got 3 (11:51 UTC) and 4 (11:46 UTC) new units. Moral: Just hang in there and be patient. I don't believe the Berkeley staff would have gone in to the office before 04:40 Pacific time. Andy [edit] Have confirmed with my son at home that they are not ghosts. [/edit] ID: 573719 ·

Wasabi Peanut Send message Joined: 14 Jul 99 Posts: 62 Credit: 32,646,911 RAC: 0	Message 573726 - Posted: 22 May 2007, 13:27:26 UTC Last modified: 22 May 2007, 13:39:22 UTC Hi all! I'd like to share my experience since thumper came back, but first a little background: I'm running BOINC CLI 5.4.9 on a variety of Intel- and PPC-based Macs, all of which are running Alex Kan's workers (so there's an app_info.xml in the mix on all machines). Once thumper came back, my boxes received a total of maybe 100 WUs over the first three days, and then nothing at all. Beginning on Sunday, I started renaming the app_info.xml files on all boxes. On every box, WUs have started to pour in shortly after stopping BOINC, renaming the file and then starting BOINC again. During the course of the outage, I've seen just about all the error messages mentioned by previous posters. Since the app_info.xml fix, downloads appear to work reliably, but uploads are still hit or miss. Thanks to all the hard-working folks at SETI for tackling the issues at hand one by one! There will be smooth sailing again soon, I'm convinced. HTH, Ron ID: 573726 ·
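The workaround described in the post above is three steps: stop BOINC, rename app_info.xml in the project directory, then start BOINC again. The rename step can be sketched in Python as below. This is only an illustration under stated assumptions: the function names and the `.disabled` suffix are hypothetical, the project directory has to be supplied by the user, and stopping and restarting BOINC is left to whatever method each machine normally uses. Note that another poster in this thread reports that renaming the file can cause BOINC to invalidate everything in progress, so this is not risk-free.

```python
from pathlib import Path


def disable_app_info(project_dir, suffix=".disabled"):
    """Rename app_info.xml out of the way; BOINC must already be stopped.

    Returns the new path, or None if there was no app_info.xml to rename.
    The '.disabled' suffix is an arbitrary choice for this sketch.
    """
    src = Path(project_dir) / "app_info.xml"
    dst = src.with_name(src.name + suffix)
    if not src.exists():
        return None
    if dst.exists():
        # Refuse to clobber an earlier backup.
        raise FileExistsError(f"{dst} already exists; not overwriting it")
    src.rename(dst)
    return dst


def restore_app_info(project_dir, suffix=".disabled"):
    """Put a previously renamed app_info.xml back in place.

    Returns the restored path, or None if no renamed copy was found.
    """
    dst = Path(project_dir) / "app_info.xml"
    src = dst.with_name(dst.name + suffix)
    if not src.exists():
        return None
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; not overwriting it")
    src.rename(dst)
    return dst
```

Keeping the renamed copy rather than deleting it means the optimized application setup can be restored later with `restore_app_info` once the server-side fix is in place.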

KWSN THE Holy Hand Grenade! Volunteer tester Send message Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0	Message 573736 - Posted: 22 May 2007, 13:39:09 UTC - in response to Message 573639. Previously downloaded tasks (while the opt. app was disabled) would allegedly use stock app version 5.15 (with the opt. app enabled) for crunching. Now all of a sudden my task window informs me they will be crunched with version 5.17 and I get this message:
22-05-2007 12:41:54|SETI@home|Restarting task 04mr05ab.17213.29712.503418.3.251_1 using setiathome_enhanced version 517
Maybe this is the platform fix we have been waiting for. Edit: As a test I am trying to download work with the Chicken application enabled. So far I have not received any error message. Afraid not. Just tested with canary 1883631: got
SETI@home 22/05/2007 11:30:02 Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed: error 500
and a ghost. (ignore time in log - clock's half-an-hour slow on that Win 98SE machine) Why not re-set the clock in the Win98 machine? Just right click on the time in the taskbar and choose "adjust Time/Date". When that machine needs a re-boot, go into setup and permanently adjust the time. (...or get a utility that will do the adjust for you, like Atomic Clock Sync from worldtimeserver.com) The clocks in most PCs (particularly overclocked ones) aren't particularly accurate. Hello, from Albany, CA!... ID: 573736 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874	Message 573747 - Posted: 22 May 2007, 13:49:30 UTC - in response to Message 573736. Why not re-set the clock in the Win98 machine? Just right click on the time in the taskbar and choose "adjust Time/Date". when that machine needs a re-boot, go into setup and permanently adjust the time. (...or get a utility that will do the adjust for you, like Atomic Clock Sync from worldtimeserver.com) The clocks in most PC's (particularly overclocked ones) aren't particularly accurate. Don't worry, I do that periodically, but it drifts - it isn't an important machine, I only keep it running because it does a few tasks better than the others, and it crunches, of course. (Hence the 'canary' function). If I really need to know the time, I look at one of the servers which is configured for SNTP sync from a proper tier-2 public time server (much more reliable than the mickey-mouse M$ ones, which tend to be overloaded). They in turn keep their local domains in line through group policy. ID: 573747 ·
