Eric's biannual post #6: You can tuna fish, but you can't tune a TCP
Author | Message |
---|---|
gomeyer Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
Hi Eric, Thanks for the information. I just want to confirm something...
Are downloads indeed going out at all now? I've received none for about two days on any of my eight workstations. Also, while uploads are going through, we are unable to report any of them. Please note the unusual verbiage (Message from server: Incomplete request received.) in the following exchange. Judging by the message threads, this seems to be common to many users.
5/17/2007 12:48:15 PM|SETI@home|Sending scheduler request: To report completed tasks
5/17/2007 12:48:15 PM|SETI@home|Requesting 4159 seconds of new work, and reporting 25 completed tasks
5/17/2007 12:48:30 PM|SETI@home|Scheduler RPC succeeded [server version 509]
5/17/2007 12:48:30 PM|SETI@home|Message from server: Incomplete request received.
5/17/2007 12:48:30 PM|SETI@home|Deferring communication for 11 sec
5/17/2007 12:48:30 PM|SETI@home|Reason: requested by project
5/17/2007 12:48:30 PM|SETI@home|Deferring communication for 5 min 35 sec
5/17/2007 12:48:30 PM|SETI@home|Reason: no work from project
I'm not trying to bust your chops; I KNOW how hard you are all rowing trying to fix this monster. I'm just looking for information and to be sure you are aware how many problems remain. Best regards, and thanks for the hard work.
EDIT: The upload/download server also APPEARS to be bouncing, although this may be related to a problem noted by Matt in an earlier post: ...Bruno is dropping lots of packets right now, resulting in all kinds of upload/download snags and showing up as "disabled" on the server status page... /EDIT |
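For anyone who wants to sift through a long message log rather than eyeball it, here is a minimal Python sketch that splits lines of the `timestamp|project|message` shape shown in the excerpt above. The field layout and the US-style timestamp format are inferred from that excerpt only; `LogEntry` and `parse_line` are hypothetical names, not part of BOINC.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    when: datetime
    project: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one 'timestamp|project|message' line; return None if it does not fit."""
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        return None
    stamp, project, message = parts
    try:
        # Assumed format, taken from the excerpt: 5/17/2007 12:48:15 PM
        when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    except ValueError:
        return None
    return LogEntry(when, project, message)


SAMPLE = """\
5/17/2007 12:48:15 PM|SETI@home|Sending scheduler request: To report completed tasks
5/17/2007 12:48:15 PM|SETI@home|Requesting 4159 seconds of new work, and reporting 25 completed tasks
5/17/2007 12:48:30 PM|SETI@home|Scheduler RPC succeeded [server version 509]
5/17/2007 12:48:30 PM|SETI@home|Message from server: Incomplete request received.
5/17/2007 12:48:30 PM|SETI@home|Deferring communication for 11 sec
5/17/2007 12:48:30 PM|SETI@home|Reason: requested by project
5/17/2007 12:48:30 PM|SETI@home|Deferring communication for 5 min 35 sec
5/17/2007 12:48:30 PM|SETI@home|Reason: no work from project
"""

entries = [e for e in map(parse_line, SAMPLE.splitlines()) if e is not None]
# Keep only what the server itself said back to the client.
server_messages = [
    e.message for e in entries if e.message.startswith("Message from server:")
]
print(server_messages)
```

Lines that do not match the assumed shape are skipped rather than raising, since logs pasted into a forum post often carry stray text.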
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"Please note unusual verbiage (Message from Server: Incomplete request received.) in the following exchange. This seems to be common to many per the message threads." As you say, many people have reported this - yet others say things are working normally. I have two older hosts with older clients (v5.3.12.tx36) which have been trying to report at intervals all day. Sometimes they failed to contact the scheduler at all; sometimes they got the 'Incomplete request received' response. Since Eric mentioned that the scheduler has been moved, I thought I'd give them a prod. I just did an 'Exit BOINC - start BOINC' sequence, and both have now reported. Coincidence or.....? |
gomeyer Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
"Please note unusual verbiage (Message from Server: Incomplete request received.) in the following exchange. This seems to be common to many per the message threads." That's a good thought: when in doubt, reboot or restart. I just tried it on two of my closest hosts but they did not report. This is all most likely just the luck of the draw plus an artifact of incredible network traffic, but I thought Eric might be able to offer a different perspective. Update: two of my more remote hosts have now reported. We're making progress, it seems. |
Clyde C. Phillips, III Joined: 2 Aug 00 Posts: 1851 Credit: 5,955,047 RAC: 0 |
I see that my caches were emptied overnight of all finished results; a few were awarded credit but most are "pending", more than two days' worth. Besides, I see no new work and have only five hours before two of my four cores run out of Einstein work. I guess I'll have to fetch about a day's worth more of Einstein; otherwise my two machines will be absolutely idle. Congratulations and thanks to the team (and anyone else helping) for doing its best in coping with all these problems and solving some of them. |
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
@Eric, several of us have tested and confirmed an issue where we contact the scheduler, get an HTTP Internal Server Error, and our hosts are assigned a result that never reaches us. I would guess that this is causing other people to get "No work from project" messages, as the system thinks that everything has been sent out? If you want to read the discussion on it, please see the thread in Number Crunching called Ghost WU issue (and some talk about deadlines) from that post I pointed you at and on upwards. The prior discussion was me musing about deadline extensions and is not nearly as important as addressing this. Thanks... Brian |
zombie67 [MM] Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0 |
"Only penguin is on download duty, but that may change if downloads start becoming a problem." That would be now. Nothing is downloading, even though the account information on the server says that it has. Dublin, California Team: SETI.USA |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"Only penguin is on download duty, but that may change if downloads start becoming a problem." Do you mean you have tasks shown on the 'Transfers' tab in BOINC Manager, trying but failing to download? If so, then we need an additional download server. Or do you mean you see tasks in the 'Results for computer' on this web site, but nothing in BOINC Manager? We call those "Ghost WUs" - different problem, different solution. |
zombie67 [MM] Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0 |
"Do you mean you have tasks shown on the 'Transfers' tab in BOINC Manager, trying but failing to download? If so, then we need an additional download server." Nope.
"Or do you mean you see tasks in the 'Results for computer' on this web site, but nothing in BOINC Manager? We call those "Ghost WUs" - different problem, different solution." Yep. Every time BOINC tries to connect, it generates another ghost WU for my account. FWIW, there are two different error messages:
Thu May 17 13:06:29 2007|SETI@home|Requesting 4104198 seconds of new work
Thu May 17 13:06:44 2007|SETI@home|Scheduler request failed: HTTP internal server error
Thu May 17 13:06:44 2007|SETI@home|Deferring communication for 13 min 41 sec
Thu May 17 13:06:44 2007|SETI@home|Reason: scheduler request failed
Thu May 17 13:20:26 2007|SETI@home|Sending scheduler request: Requested by user
Thu May 17 13:20:26 2007|SETI@home|Requesting 4106271 seconds of new work
Thu May 17 13:20:51 2007|SETI@home|Scheduler RPC succeeded [server version 509]
Thu May 17 13:20:51 2007|SETI@home|Deferring communication for 11 sec
Thu May 17 13:20:51 2007|SETI@home|Reason: requested by project
Thu May 17 13:20:51 2007|SETI@home|Deferring communication for 9 min 52 sec
Thu May 17 13:20:51 2007|SETI@home|Reason: no work from project
Dublin, California Team: SETI.USA |
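The two exchanges in the log above end differently: one fails outright, the other succeeds but brings no work. A rough Python sketch of telling them apart, and of putting the "seconds of new work" figures into days, might look like this. The outcome labels and the names `classify` and `requested_days` are made up for illustration and are not BOINC terminology; the message strings are copied from the log as posted.

```python
import re
from typing import List, Optional

REQUEST_RE = re.compile(r"Requesting (\d+) seconds of new work")
SECONDS_PER_DAY = 86400


def requested_days(messages: List[str]) -> Optional[float]:
    """Return the requested work in days, or None if no request line is present."""
    for msg in messages:
        match = REQUEST_RE.search(msg)
        if match:
            return int(match.group(1)) / SECONDS_PER_DAY
    return None


def classify(messages: List[str]) -> str:
    """Give one scheduler exchange a rough, illustrative label."""
    for msg in messages:
        if msg.startswith("Scheduler request failed:"):
            return "failed: " + msg.split(":", 1)[1].strip()
    if "Reason: no work from project" in messages:
        return "succeeded, no work sent"
    if any(msg.startswith("Scheduler RPC succeeded") for msg in messages):
        return "succeeded"
    return "unknown"


FIRST = [
    "Requesting 4104198 seconds of new work",
    "Scheduler request failed: HTTP internal server error",
    "Deferring communication for 13 min 41 sec",
    "Reason: scheduler request failed",
]
SECOND = [
    "Sending scheduler request: Requested by user",
    "Requesting 4106271 seconds of new work",
    "Scheduler RPC succeeded [server version 509]",
    "Deferring communication for 11 sec",
    "Reason: requested by project",
    "Deferring communication for 9 min 52 sec",
    "Reason: no work from project",
]

for exchange in (FIRST, SECOND):
    days = requested_days(exchange)
    print(classify(exchange), "| requested about", round(days, 1), "days of work")
```

Both requests here work out to roughly 47 to 48 days of work. The sketch only labels what the client logged; it says nothing about whether a ghost WU was created on the server side.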
archae86 Joined: 31 Aug 99 Posts: 909 Credit: 1,582,816 RAC: 0 |
... A simple exit boincmgr, start boincmgr gave first trial success on two of my three repeat offending machines. The third was not healed by that, but was healed by a full power off reboot. Thanks Richard. |
gomeyer Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
Just one more weird happening to report and I'll be quiet. One of my hosts contacted the scheduler to request work and report 36 results. It got the dreaded:
5/17/2007 6:07:11 PM Scheduler request failed: HTTP internal server error
When I checked that host under My Computers on the web site, I saw that all 36 were indeed reported at that time, even though they still show up in BOINC as "Ready to Report". The system also claims to have sent two ghost work units at the same time, which never arrived. Anyway, things are still very broken. Just my 2c. |
KenKLRC Joined: 12 Jul 06 Posts: 27 Credit: 7,791,658 RAC: 0 |
Eric, I still have this "Stand By" box offline... If you think more H/W is the answer, I can overnight it. You have my number. |
Kirsten Joined: 7 Jul 00 Posts: 190 Credit: 566,047 RAC: 0 |
|
picantecomputing Joined: 1 Jan 07 Posts: 4 Credit: 104,652 RAC: 0 |
"I got many WU's, too. Unfortunately they were all ghosts. The upload queue is gone, though." Same thing here. Tons of ghosts showing up in my results, but nothing has downloaded for days. The last attempt to request work gave me this:
5/17/2007 8:15:51 PM|SETI@home|Sending scheduler request: Requested by user
5/17/2007 8:15:51 PM|SETI@home|Requesting 442 seconds of new work
5/17/2007 8:16:01 PM||Project communication failed: attempting access to reference site
5/17/2007 8:16:01 PM|SETI@home|Scheduler request failed: server returned nothing (no headers, no data)
5/17/2007 8:16:01 PM|SETI@home|Deferring communication for 1 min 0 sec
5/17/2007 8:16:01 PM|SETI@home|Reason: scheduler request failed
5/17/2007 8:16:02 PM||Access to reference site succeeded - project servers may be temporarily down. |
Eric Korpela Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
It's still pretty hit and miss (as you can see). Hopefully it's getting toward more hit than miss at this point. Ptolemy lost connectivity to the upload directories on bruno for a while. I just fixed that, so our upload rate should double. This graph is still your best bet for checking your chances of getting through. The higher the graph is, the better. But we should be hovering around 22 Mbps rather than 15. We're still operating on a single scheduler due to compile problems. G'nite. I'll catch up on where we are in the morning. Eric @SETIEric@qoto.org (Mastodon) |
Jim Franklin Joined: 3 Apr 99 Posts: 108 Credit: 10,843,395 RAC: 39 |
Eric, just to add to what others have said: none of my ten workstations are able to upload or download. They return the same messages, "Scheduler request failed: server returned nothing (no headers, no data)", followed by "Message from server: Incomplete request received". It would appear from my stats that some of the machines managed some form of upload in the previous 36 hours, although the quantity uploaded is dwarfed by what is sitting in the upload queues, and the rate of completing units is outstripping the exchanges that are taking place. Currently I have about 150 completed units in my queues and about 2 days of total crunch time left, and that is with moving units from one machine to another to ensure they stay crunching! Zeus, my main workstation, is now running Einstein again as its 8 cores ran out of SETI units yesterday. It has about 60 units in its upload queue and has not connected in about 10 days; currently it is trying to receive some 3,815,477 seconds of new work, but gets dropped constantly. If it were not for the sheer cost of transport, I would send you Zeus as an upload server. It is a DELL quad 3.2GHz HT Xeon machine running with 4GB of registered DDRII at the moment and Ubuntu 4.07. I have just sourced some DDRII (unbuffered or registered) to put into it (10GB). However, a check on the shipping cost from London to you came to more than £1000, unless it went by ship, in which case it would not arrive with you for 8 to 9 weeks! Hopefully all the glitches will be sorted soon and we will be back to as normal as things ever are. Jim |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
@ Eric, A lot of people are reporting incomplete scheduler requests (with various error messages) and the creation of a lot of 'ghost' results - potentially storing up another batch of problems in the future when the 'ghost' deadlines expire. At least some of this seems to relate to the anonymous platform mechanism. The following recipe has worked for me on three machines, and has been independently verified by another user:
1. Rename app_info.xml so it won't be recognised
2. Restart BOINC (service)
3. Update SETI - may not get through first time, but keep trying
4. Restore app_info.xml to original name
5. Wait until all transfers have finished
6. Restart BOINC (service)
I don't know why this works - one user has speculated about app_info processing overhead, I'm wondering about the BOINC v5.10 <platform> tag - but it seems consistent and reproducible, so it may help to narrow down the debug search. |
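The rename and restore steps of the recipe above are the easiest to get wrong by hand (forgetting to put the file back, for instance). Below is a small Python sketch of just those two steps, written as a context manager. It assumes nothing about BOINC beyond the file name app_info.xml mentioned in the recipe: the helper name `app_info_hidden` and the `.disabled` suffix are invented here, and restarting BOINC, updating the project, and waiting for transfers are left as manual steps. The demo runs against a temporary directory, not a real BOINC project directory.

```python
import contextlib
import tempfile
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def app_info_hidden(project_dir: Path, suffix: str = ".disabled") -> Iterator[Path]:
    """Rename app_info.xml out of the way, and restore it when the block exits."""
    original = project_dir / "app_info.xml"
    hidden = original.with_name(original.name + suffix)  # hypothetical parking name
    original.rename(hidden)  # step 1: rename so it won't be recognised
    try:
        yield hidden
    finally:
        hidden.rename(original)  # step 4: restore the original name


if __name__ == "__main__":
    # Stand-in directory for the demo; a real run would point at the SETI
    # project directory inside the BOINC data directory.
    with tempfile.TemporaryDirectory() as tmp:
        project_dir = Path(tmp)
        (project_dir / "app_info.xml").write_text("<app_info/>")

        with app_info_hidden(project_dir) as hidden:
            # Steps 2 and 3 are done by hand at this point:
            # restart BOINC, then update SETI until it gets through.
            print("hidden as:", hidden.name)

        # Steps 5 and 6 (wait for transfers, restart BOINC) are also manual.
        print("restored:", (project_dir / "app_info.xml").exists())
```

Restoring in a `finally` clause means the file goes back under its original name even if something raises inside the block, which is the main reason to wrap the two renames together.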
Lycanthrope Joined: 27 May 05 Posts: 31 Credit: 1,338,589 RAC: 0 |
"I don't know why this works - one user has speculated about app_info processing overhead, I'm wondering about the BOINC v5.10 <platform> tag - but it seems consistent and reproducible, so it may help to narrow down the debug search." I've tried this workaround on my MacIntel, G5 and XP laptop, and it has worked in each case! Added to that, I have solved my caching issue, so I've loaded a nine queue of WU's. Now I can switch back to the optimized app. Thanks for the interim solution! |
Eric Korpela Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
Could you give me an IP address for this machine? I'd like to scan the logs to see what's going on on this side. Eric @SETIEric@qoto.org (Mastodon) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Here are some examples from host 2901600 - Windows Vista, BOINC v5.8.16, IP 81.156.16.160 [edit - IP may have been 86.141.28.126 - router has lost DSL since then, been re-assigned - but I think that was before these events]
SETI@home 17/05/2007 18:18:18 Message from server: Incomplete request received.
[BOINC restarted - no 'incomplete request' since then]
SETI@home 17/05/2007 19:01:39 Scheduler request failed: HTTP internal server error
SETI@home 17/05/2007 19:06:49 Scheduler request failed: server returned nothing (no headers, no data)
Times have been adjusted to UTC - should be pretty exact. Hope that gives you something to look for while you're waiting for Jim to post. |
Mat Joined: 20 Oct 01 Posts: 1 Credit: 8,718,618 RAC: 0 |
"@ Eric, A lot of people are reporting incomplete scheduler requests (with various error messages) and the creation of a lot of 'ghost' results - potentially storing up another batch of problems in the future when the 'ghost' deadlines expire. At least some of this seems to relate to the anonymous platform mechanism. The following recipe has worked for me on three machines, and independently verified by another user:
1. Rename app_info.xml so it won't be recognised
2. Restart BOINC (service)
3. Update SETI - may not get through first time, but keep trying
4. Restore app_info.xml to original name
5. Wait until all transfers have finished
6. Restart BOINC (service)
I don't know why this works - one user has speculated about app_info processing overhead, I'm wondering about the BOINC v5.10 <platform> tag - but it seems consistent and reproducible, so it may help to narrow down the debug search."
That solution seems to work for me as well. I use the anonymous platform mechanism in conjunction with BOINC 5.8.16 and have had severe difficulties downloading any new WUs since the 16th, around 16:00 UTC. Now new WUs are pouring in. The above-mentioned workaround works. Thanks, Richard |