Eric's biannual post #6: You can tuna fish, but you can't tune a TCP

[SETI.USA] OneChicken
Joined: 3 Apr 04
Posts: 70
Credit: 906,887
RAC: 0
United States
Message 570506 - Posted: 18 May 2007, 17:57:07 UTC - in response to Message 570468.  


I think I've tracked down this problem. There seems to be something related to the outage that has corrupted the file "sched_request_setiathome.berkeley.edu.xml"
in your BOINC directories. Restarting BOINC or deleting the file and restarting BOINC should fix that problem.


Eric: Any chance this can be fixed on Berkeley's end? I have some remote machines that I cannot get to.



Proud member of SETI.USA
ID: 570506
Labbie
Joined: 19 Jun 06
Posts: 4083
Credit: 5,930,102
RAC: 0
United States
Message 570507 - Posted: 18 May 2007, 17:57:16 UTC - in response to Message 570500.  

For me, the trick of renaming the app_info file has accomplished something.

I did get 3 WUs on one machine, but on the Results for Computer page, it also gave me 5 additional Ghosts. Note that the timestamps for the Ghosts are weird, while the good DLs are consistent with the time they downloaded.

Good DLs in Red

535624975 129448185 18 May 2007 16:56:38 UTC 5 Jun 2007 9:12:03 UTC In Progress Unknown New --- --- ---
535218668 128846780 17 May 2007 17:43:31 UTC 22 May 2007 1:53:31 UTC In Progress Unknown New --- --- ---
535051627 129476175 18 May 2007 16:56:38 UTC 31 May 2007 19:41:22 UTC In Progress Unknown New --- --- ---
535051564 129476162 18 May 2007 16:56:38 UTC 31 May 2007 19:41:22 UTC In Progress Unknown New --- --- ---

534945332 129442634 18 May 2007 11:47:27 UTC 22 May 2007 19:57:27 UTC In Progress Unknown New --- --- ---
534759659 129388850 18 May 2007 1:11:32 UTC 11 Jun 2007 23:44:31 UTC In Progress Unknown New --- --- ---
534758973 129388628 18 May 2007 1:10:06 UTC 6 Jun 2007 19:09:32 UTC In Progress Unknown New --- --- ---
534758708 129388540 18 May 2007 1:09:01 UTC 6 Jun 2007 19:08:27 UTC In Progress Unknown New --- --- ---

[EDIT]In fact, one of the Ghosts got a timestamp in the future, if I'm calculating GMT correctly from MDT[/EDIT]

The WU with the 17:43:31 UTC timestamp is dated yesterday (17 May 2007), so I don't think we've encountered ghostly time-travelling ETs just yet. Shame, really.


Yep, you are right, I misread the date.


Calm Chaos Forum...Join Calm Chaos Now
ID: 570507
Dominik S.
Joined: 4 Jun 03
Posts: 15
Credit: 4,346,294
RAC: 0
Poland
Message 570523 - Posted: 18 May 2007, 18:10:26 UTC - in response to Message 570468.  



I think I've tracked down this problem. There seems to be something related to the outage that has corrupted the file "sched_request_setiathome.berkeley.edu.xml"
in your BOINC directories. Restarting BOINC or deleting the file and restarting BOINC should fix that problem.

Eric

No, it's not. I deleted the file "sched_request_setiathome.berkeley.edu.xml" and restarted BOINC, and now I have new ghost WUs.
ID: 570523
Clyde C. Phillips, III
Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 570536 - Posted: 18 May 2007, 18:24:49 UTC

I don't know whether it's ghosts or what but I haven't been able to get a single Seti unit for either of my computers for at least a couple days:

5/18/2007 8:07:07 AM||Project communication failed: attempting access to reference site
5/18/2007 8:07:09 AM||Access to reference site succeeded - project servers may be temporarily down.
5/18/2007 8:07:11 AM|SETI@home|Scheduler request failed: couldn't connect to server
5/18/2007 8:07:11 AM|SETI@home|Deferring scheduler requests for 1 minutes and 32 seconds
5/18/2007 8:08:46 AM|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
5/18/2007 8:08:46 AM|SETI@home|Reason: To fetch work
5/18/2007 8:08:46 AM|SETI@home|Requesting 345600 seconds of new work
5/18/2007 8:09:07 AM||Project communication failed: attempting access to reference site
5/18/2007 8:09:09 AM||Access to reference site succeeded - project servers may be temporarily down.
5/18/2007 8:09:11 AM|SETI@home|Scheduler request failed: couldn't connect to server
5/18/2007 8:09:11 AM|SETI@home|Deferring scheduler requests for 48 minutes and 26 seconds
5/18/2007 8:57:42 AM|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
5/18/2007 8:57:42 AM|SETI@home|Reason: To fetch work
5/18/2007 8:57:42 AM|SETI@home|Requesting 345600 seconds of new work
5/18/2007 8:57:43 AM||Project communication failed: attempting access to reference site
5/18/2007 8:57:44 AM||Access to reference site succeeded - project servers may be temporarily down.
5/18/2007 8:57:47 AM|SETI@home|Scheduler request failed: server returned nothing (no headers, no data)
5/18/2007 8:57:47 AM|SETI@home|Deferring scheduler requests for 2 hours, 2 minutes and 53 seconds
5/18/2007 11:00:42 AM|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
5/18/2007 11:00:42 AM|SETI@home|Reason: To fetch work
5/18/2007 11:00:42 AM|SETI@home|Requesting 345600 seconds of new work
5/18/2007 11:00:46 AM||Project communication failed: attempting access to reference site
5/18/2007 11:00:47 AM||Access to reference site succeeded - project servers may be temporarily down.
5/18/2007 11:00:47 AM|SETI@home|Scheduler request failed: server returned nothing (no headers, no data)
5/18/2007 11:00:47 AM|SETI@home|Deferring scheduler requests for 3 hours, 49 minutes and 19 seconds
5/18/2007 2:21:33 PM||Rescheduling CPU: application exited
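
The log above shows the client backing off for longer and longer between scheduler requests. A minimal Python sketch for pulling those deferral intervals out of such a log follows. It assumes the line format seen in this post (pipe-separated fields, with the message "Deferring scheduler requests for ..."); the function names `parse_deferral` and `deferrals` are made up for illustration and are not part of BOINC.

```python
import re
from datetime import timedelta

# Assumed message format, copied from the log lines in this post.
DEFER_RE = re.compile(r"Deferring scheduler requests for (.+)$")
# Matches pieces such as "2 hours", "1 minutes", "53 seconds".
PART_RE = re.compile(r"(\d+)\s+(hour|minute|second)s?")


def parse_deferral(line):
    """Return the deferral interval in a log line, or None if there is none."""
    match = DEFER_RE.search(line.strip())
    if match is None:
        return None
    parts = {unit: int(value) for value, unit in PART_RE.findall(match.group(1))}
    return timedelta(
        hours=parts.get("hour", 0),
        minutes=parts.get("minute", 0),
        seconds=parts.get("second", 0),
    )


def deferrals(lines):
    """Collect every deferral interval found in an iterable of log lines."""
    return [d for d in map(parse_deferral, lines) if d is not None]


# Four deferral lines taken verbatim from the log above.
sample = [
    "5/18/2007 8:07:11 AM|SETI@home|Deferring scheduler requests for 1 minutes and 32 seconds",
    "5/18/2007 8:09:11 AM|SETI@home|Deferring scheduler requests for 48 minutes and 26 seconds",
    "5/18/2007 8:57:47 AM|SETI@home|Deferring scheduler requests for 2 hours, 2 minutes and 53 seconds",
    "5/18/2007 11:00:47 AM|SETI@home|Deferring scheduler requests for 3 hours, 49 minutes and 19 seconds",
]

found = deferrals(sample)
total = sum(found, timedelta())
print("deferrals:", [str(d) for d in found])
print("total time spent deferred:", total)
```

Each interval is longer than the one before it, which is consistent with the client backing off while the servers are overloaded.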

ID: 570536
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 570539 - Posted: 18 May 2007, 18:31:35 UTC - in response to Message 570523.  



I think I've tracked down this problem. There seems to be something related to the outage that has corrupted the file "sched_request_setiathome.berkeley.edu.xml"
in your BOINC directories. Restarting BOINC or deleting the file and restarting BOINC should fix that problem.

Eric

No, it's not. I deleted the file "sched_request_setiathome.berkeley.edu.xml" and restarted BOINC, and now I have new ghost WUs.

You have to read the full original post. Eric was tracking down an earlier, simpler problem relating to "Incomplete request received".

The ghost WU seems to relate to "HTTP internal server error" and the use of optimised apps. Try that workaround - it's been posted enough times already.

On the other hand, if you're getting ghost WUs without an app_info.xml file and an optimised app, that would be useful to know - please post again.
ID: 570539
Dominik S.
Joined: 4 Jun 03
Posts: 15
Credit: 4,346,294
RAC: 0
Poland
Message 570543 - Posted: 18 May 2007, 18:43:55 UTC - in response to Message 570539.  



I think I've tracked down this problem. There seems to be something related to the outage that has corrupted the file "sched_request_setiathome.berkeley.edu.xml"
in your BOINC directories. Restarting BOINC or deleting the file and restarting BOINC should fix that problem.

Eric

No, it's not. I deleted the file "sched_request_setiathome.berkeley.edu.xml" and restarted BOINC, and now I have new ghost WUs.

You have to read the full original post. Eric was tracking down an earlier, simpler problem relating to "Incomplete request received".

The ghost WU seems to relate to "HTTP internal server error" and the use of optimised apps. Try that workaround - it's been posted enough times already.

On the other hand, if you're getting ghost WUs without an app_info.xml file and an optimised app, that would be useful to know - please post again.

Sorry, it's my fault,
I'm getting ghost WUs with app_info.xml of course, but the trick with renaming it works.
Really sorry for misunderstanding the reply.
ID: 570543
Y & J
Volunteer tester
Joined: 14 Nov 01
Posts: 15
Credit: 215,639
RAC: 0
United States
Message 570551 - Posted: 18 May 2007, 19:00:11 UTC - in response to Message 570539.  
Last modified: 18 May 2007, 19:01:03 UTC

Thanks Richard
Fixed up both units.

No, it's not. I deleted the file "sched_request_setiathome.berkeley.edu.xml" and restarted BOINC, and now I have new ghost WUs.
You have to read the full original post. Eric was tracking down an earlier, simpler problem relating to "Incomplete request received".


SETI@home classic workunits = 5,906 with CPU time of 60,377 hours
ID: 570551
gomeyer
Volunteer tester
Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 570574 - Posted: 18 May 2007, 19:43:38 UTC - in response to Message 570477.  
Last modified: 18 May 2007, 19:44:26 UTC

That worked for me also. The question I now have is, once we go back to using app_info.xml will that break communications again?


The answer is yes, restoring app_info.xml and restarting BOINC does indeed break it again. I guess that last step should be skipped unless you're sure you have enough work to last a while, then stop new work requests to prevent ghosts.
ID: 570574
Rndmacts
Joined: 18 Aug 99
Posts: 4
Credit: 122,806
RAC: 0
Canada
Message 570578 - Posted: 18 May 2007, 19:53:14 UTC

I have been getting the same problems as everyone else, and I had closed and restarted Boinc several times with no relief. Finally I rebooted the computer, and Boinc started, sent all finished work units, and downloaded new units. I didn't try the app_info.xml fix; I just rebooted, and everything seems fine now.
Can a whisper be heard across the universe?
ID: 570578
crazyrabbit1
Joined: 17 Sep 06
Posts: 35
Credit: 2,282,319
RAC: 0
Germany
Message 570581 - Posted: 18 May 2007, 19:56:24 UTC - in response to Message 570468.  


SETI@home 17/05/2007 18:18:18 Message from server: Incomplete request received.


I think I've tracked down this problem. There seems to be something related to the outage that has corrupted the file "sched_request_setiathome.berkeley.edu.xml"
in your BOINC directories. Restarting BOINC or deleting the file and restarting BOINC should fix that problem.


SETI@home 17/05/2007 19:01:39 Scheduler request failed: HTTP internal server error
SETI@home 17/05/2007 19:06:49 Scheduler request failed: server returned nothing (no headers, no data)


This usually means the scheduler request timed out because we are still overwhelmed.

Eric


@Eric
On my side it does not seem to work. I deleted the file and restarted BOINC, and I just get the message "no headers, no data" returned. Also, I get new ghost units after switching to the optimised app again.

All the same, I would like to thank you and the whole team for the hard work to get things back to normal.
ID: 570581
Dominik S.
Joined: 4 Jun 03
Posts: 15
Credit: 4,346,294
RAC: 0
Poland
Message 570593 - Posted: 18 May 2007, 20:11:13 UTC - in response to Message 570581.  


SETI@home 17/05/2007 18:18:18 Message from server: Incomplete request received.


I think I've tracked down this problem. There seems to be something related to the outage that has corrupted the file "sched_request_setiathome.berkeley.edu.xml"
in your BOINC directories. Restarting BOINC or deleting the file and restarting BOINC should fix that problem.


SETI@home 17/05/2007 19:01:39 Scheduler request failed: HTTP internal server error
SETI@home 17/05/2007 19:06:49 Scheduler request failed: server returned nothing (no headers, no data)


This usually means the scheduler request timed out because we are still overwhelmed.

Eric


@Eric
On my side it does not seem to work. I deleted the file and restarted BOINC, and I just get the message "no headers, no data" returned. Also, I get new ghost units after switching to the optimised app again.

All the same, I would like to thank you and the whole team for the hard work to get things back to normal.

The problem with ghost units is a different one. It's probably associated with using the anonymous platform (you are using an optimised app and have app_info.xml).
ID: 570593
crazyrabbit1
Joined: 17 Sep 06
Posts: 35
Credit: 2,282,319
RAC: 0
Germany
Message 570608 - Posted: 18 May 2007, 20:36:19 UTC - in response to Message 570593.  


SETI@home 17/05/2007 18:18:18 Message from server: Incomplete request received.


I think I've tracked down this problem. There seems to be something related to the outage that has corrupted the file "sched_request_setiathome.berkeley.edu.xml"
in your BOINC directories. Restarting BOINC or deleting the file and restarting BOINC should fix that problem.


SETI@home 17/05/2007 19:01:39 Scheduler request failed: HTTP internal server error
SETI@home 17/05/2007 19:06:49 Scheduler request failed: server returned nothing (no headers, no data)


This usually means the scheduler request timed out because we are still overwhelmed.

Eric


@Eric
On my side it does not seem to work. I deleted the file and restarted BOINC, and I just get the message "no headers, no data" returned. Also, I get new ghost units after switching to the optimised app again.

All the same, I would like to thank you and the whole team for the hard work to get things back to normal.

The problem with ghost units is a different one. It's probably associated with using the anonymous platform (you are using an optimised app and have app_info.xml).


I see no difference between the two problems. I get ghosts with the app from Lunatics and I get the message "no header no data" from the server. If I use the original app, I get work and no error messages. I think I will wait until things get better.
ID: 570608
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 570612 - Posted: 18 May 2007, 20:46:56 UTC - in response to Message 570506.  


Eric: Any chance this can be fixed on Berkeley's end? I have some remote machines that I cannot get to.


I haven't come up with a way yet. But I'm still thinking....

Eric
@SETIEric@qoto.org (Mastodon)

ID: 570612
Teratoma [SETI.USA]
Joined: 30 Mar 00
Posts: 16
Credit: 2,200,914
RAC: 0
United States
Message 570631 - Posted: 18 May 2007, 21:35:09 UTC

All of these deleting files ideas are great. Restarting Boinc, good advice. I've done it about 6 or 7 times today.

The problem is that if you can't reach the project none of these fixes work.

I get a lot of "Scheduler request failed: server returned nothing (no headers, no data)"

And some "Scheduler request failed: HTTP internal server error"

But neither is consistent. Sometimes I can upload and sometimes I can report. I cannot get new work no matter what I do.

Now I get "Scheduler request failed: failed sending data to the peer"

So, if I can't reach the project 9 out of 10 attempts, and I cannot get work on that 1 attempt, what am I going to do? I suppose that when I do (not if) run out of work, I can detach or uninstall Boinc and start over. However, with each detach or uninstall, the probability of me returning to this project keeps reducing.

I know everyone is working hard, but...it shouldn't be this difficult for us to participate. People will leave and some may never return.

..
ID: 570631
Crunch3r
Volunteer tester
Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
RAC: 0
Germany
Message 570637 - Posted: 18 May 2007, 21:40:41 UTC - in response to Message 570612.  
Last modified: 18 May 2007, 21:41:40 UTC


Eric: Any chance this can be fixed on Berkeley's end? I have some remote machines that I cannot get to.


I haven't come up with a way yet. But I'm still thinking....

Eric


While we're talking about remote machines. :)

I have the same problem too. 3 of my machines are not accessible atm (not by VPN or anything else).

Is it possible to initiate a reset on those machines from the user account page?
(like a reset sent from the project?)

And if so, could this be implemented?

Join BOINC United now!
ID: 570637
zombie67 [MM]
Volunteer tester
Joined: 22 Apr 04
Posts: 758
Credit: 27,771,894
RAC: 0
United States
Message 570639 - Posted: 18 May 2007, 21:43:58 UTC - in response to Message 570608.  

I see no difference between the two problems. I get ghosts with the app from Lunatics and I get the message "no header no data" from the server. If I use the original app, I get work and no error messages. I think I will wait until things get better.


Issue #1: SETI@home 17/05/2007 18:18:18 Message from server: Incomplete request received.

This error is caused by a corrupt "sched_request_setiathome.berkeley.edu.xml" file. It is fixed by quitting and restarting BOINC, or by quitting BOINC, deleting the file, and restarting BOINC.


Issue #2: Cannot download new work & ghost results created. This can be fixed by renaming app_info.xml to something else. More detailed instructions here:

http://setiathome.berkeley.edu/forum_thread.php?id=39531&nowrap=true#570170


Issue #3: Other misc. error messages when trying to connect to S@H servers. These are caused by heavily loaded servers. Ignore them; they will fix themselves over time as everyone catches up.
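
For anyone who would rather script the workarounds for Issues #1 and #2, here is a minimal Python sketch. It is only an illustration under stated assumptions: BOINC must already be fully stopped, the caller supplies the directories (this thread does not give exact paths), and the function names and the `.disabled` suffix are invented for this example. Only the two file names come from the posts above.

```python
from pathlib import Path

# File names as quoted in this thread.
SCHED_REQUEST = "sched_request_setiathome.berkeley.edu.xml"
APP_INFO = "app_info.xml"


def remove_sched_request(boinc_dir):
    """Issue #1: delete the possibly corrupt scheduler request file.

    Returns True if a file was deleted, False if none was found.
    Assumes BOINC is not running while this is called.
    """
    target = Path(boinc_dir) / SCHED_REQUEST
    if target.is_file():
        target.unlink()
        return True
    return False


def rename_app_info(app_dir, suffix=".disabled"):
    """Issue #2: rename app_info.xml so the client no longer picks it up.

    Returns the new path, or None if app_info.xml was not present.
    The suffix is a hypothetical choice; any other name would do.
    """
    source = Path(app_dir) / APP_INFO
    if not source.is_file():
        return None
    target = source.with_name(APP_INFO + suffix)
    if target.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"{target} already exists")
    source.rename(target)
    return target
```

Renaming rather than deleting keeps the optimised-app setup recoverable. As noted earlier in the thread, restoring app_info.xml while the scheduler is still misbehaving brings the problem back.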

Dublin, California
Team: SETI.USA
ID: 570639
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 570640 - Posted: 18 May 2007, 21:45:07 UTC - in response to Message 570631.  
Last modified: 18 May 2007, 21:46:49 UTC

All of these deleting files ideas are great. Restarting Boinc, good advice. I've done it about 6 or 7 times today.

The problem is that if you can't reach the project none of these fixes work.

I get a lot of "Scheduler request failed: server returned nothing (no headers, no data)"

And some "Scheduler request failed: HTTP internal server error"

But neither is consistent. Sometimes I can upload and sometimes I can report. I cannot get new work no matter what I do.

Now I get "Scheduler request failed: failed sending data to the peer"

So, if I can't reach the project 9 out of 10 attempts, and I cannot get work on that 1 attempt, what am I going to do? I suppose that when I do (not if) run out of work, I can detach or uninstall Boinc and start over. However, with each detach or uninstall, the probability of me returning to this project keeps reducing.

I know everyone is working hard, but...it shouldn't be this difficult for us to participate. People will leave and some may never return.

"Scheduler request failed: server returned nothing (no headers, no data)" - congestion
"Scheduler request failed: failed sending data to the peer" - congestion
"Scheduler request failed: HTTP internal server error" - you are running an optimised app, and the scheduler is broken.

[probably - your computers are hidden, which makes helpful troubleshooting next to impossible. But your signature banner tends to imply an optimiser]

Look at Number Crunching, and the 'Ghosts' thread - your solution is there.
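
The message-to-cause mapping in this post can be written as a small lookup table, which may help when scanning a long client log. This is a sketch only: the table reflects the diagnosis given in this post, not an official BOINC error reference, and `likely_cause` is a hypothetical helper name.

```python
# Mapping as described in the post above; not an official BOINC table.
LIKELY_CAUSES = {
    "server returned nothing (no headers, no data)": "congestion",
    "failed sending data to the peer": "congestion",
    "HTTP internal server error": "optimised app in use and the scheduler is broken",
}

PREFIX = "Scheduler request failed: "


def likely_cause(line):
    """Return the suggested cause for a 'Scheduler request failed' log line.

    Returns None when the line is not a scheduler failure at all, and
    "unknown" when the failure reason is not covered by the table.
    """
    start = line.find(PREFIX)
    if start == -1:
        return None
    reason = line[start + len(PREFIX):].strip()
    return LIKELY_CAUSES.get(reason, "unknown")


example = (
    "5/18/2007 8:57:47 AM|SETI@home|Scheduler request failed: "
    "server returned nothing (no headers, no data)"
)
print(likely_cause(example))
```

Reasons not covered by the post, such as "couldn't connect to server", are reported as "unknown" rather than guessed at.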
ID: 570640
OzzFan Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 570641 - Posted: 18 May 2007, 21:45:33 UTC

Thanks Eric! I've fixed all my systems using optimized apps with your suggestion. I'm back up and crunching on all machines now. No errors and no failed connections.
ID: 570641
Blurf
Volunteer tester
Joined: 2 Sep 06
Posts: 8964
Credit: 12,678,685
RAC: 0
United States
Message 570689 - Posted: 18 May 2007, 22:57:18 UTC

Eric-

With all your hard work lately (and Matt's before his vacation) and with all due respect, maybe you need to call in some outside help as this simply isn't getting resolved???

Has this issue outgrown your skills, so that the cavalry needs to be called in??

How can we get you more IMMEDIATE assistance (besides $ and hardware)?


ID: 570689
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 570704 - Posted: 18 May 2007, 23:23:25 UTC - in response to Message 570640.  

All of these deleting files ideas are great. Restarting Boinc, good advice. I've done it about 6 or 7 times today.

The problem is that if you can't reach the project none of these fixes work.

I get a lot of "Scheduler request failed: server returned nothing (no headers, no data)"

And some "Scheduler request failed: HTTP internal server error"

But neither is consistent. Sometimes I can upload and sometimes I can report. I cannot get new work no matter what I do.

Now I get "Scheduler request failed: failed sending data to the peer"

So, if I can't reach the project 9 out of 10 attempts, and I cannot get work on that 1 attempt, what am I going to do? I suppose that when I do (not if) run out of work, I can detach or uninstall Boinc and start over. However, with each detach or uninstall, the probability of me returning to this project keeps reducing.

I know everyone is working hard, but...it shouldn't be this difficult for us to participate. People will leave and some may never return.

"Scheduler request failed: server returned nothing (no headers, no data)" - congestion
"Scheduler request failed: failed sending data to the peer" - congestion
"Scheduler request failed: HTTP internal server error" - you are running an optimised app, and the scheduler is broken.

[probably - your computers are hidden, which makes helpful troubleshooting next to impossible. But your signature banner tends to imply an optimiser]

Look at Number Crunching, and the 'Ghosts' thread - your solution is there.



I have tried all the fixes suggested in the "Ghosts" thread. I have tried the scheduler file deletion to no avail. I am still getting the "Scheduler request failed: server returned nothing (no headers, no data)" message and also the "Scheduler request failed: HTTP internal server error" message. I am running the only app we have for our operating system. It is not optimized.

I have started and stopped BOINC multiple times and requested more work and updated the client with no sign of any new work. The online status of my two workstations shows 125 WUs "In Progress" with no actual WU present on either of my workstations.

How do I fix this? What steps need to be taken so I can continue working for SETI?


SETI is the only project that we have a client application for. So doing work for other projects is impossible. I have been out of work now for two weeks.


Thanks in advance, Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 570704