Oopsie (Apr 11 2007)

Message boards : Technical News : Oopsie (Apr 11 2007)
Iztok s52d (and friends)

Joined: 12 Jan 01
Posts: 136
Credit: 393,469,375
RAC: 116
Slovenia
Message 545085 - Posted: 12 Apr 2007, 19:47:35 UTC - in response to Message 545068.  

Found the problem. My bad. Fixing now. Should clear up within minutes. Explanation later.

- Matt


Hi!
5 minutes ago:

2007-04-12 21:42:55 [SETI@home] Temporarily failed download of 22ap04ab.6123.28785.636084.3.165: error 404
2007-04-12 21:42:55 [SETI@home] Giving up on download of 22ap04ab.6123.28785.636084.3.165: file was not found on s
2007-04-12 21:42:55 [SETI@home] Giving up on download of 22ap04ab.6123.28785.636084.3.165: file was not found on s
2007-04-12 21:42:55 [SETI@home] Checksum or signature error for 22ap04ab.6123.28785.636084.3.165
2007-04-12 21:42:55 [SETI@home] Checksum or signature error for 22ap04ab.6123.28785.636084.3.165
2007-04-12 21:42:56 [SETI@home] Unrecoverable error for result 22ap04ab.6123.28785.636084.3.165_1 (WU download err
<file_xfer_error>
<file_name>22ap04ab.6123.28785.636084.3.165</file_name>
<error_code>-163</error_code>
<error_message>file was not found on server</error_message>
</file_xfer_error>
)

73 Iztok
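The `<file_xfer_error>` element that the client appends to the "Unrecoverable error" line is ordinary XML. If you are sifting through many of these across several hosts, a small script can pull out the file name, code and message. This is a minimal sketch that assumes the block is well-formed exactly as logged above. `parse_xfer_error` is a hypothetical helper, not part of BOINC.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class XferError(NamedTuple):
    file_name: str
    error_code: int
    error_message: str


def parse_xfer_error(xml_text: str) -> XferError:
    """Extract the fields of a <file_xfer_error> block as it appears in the client log."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "file_xfer_error":
        raise ValueError(f"unexpected root element: {root.tag}")
    return XferError(
        file_name=root.findtext("file_name", default="").strip(),
        error_code=int(root.findtext("error_code", default="0")),
        error_message=root.findtext("error_message", default="").strip(),
    )


# The block from the log above, copied verbatim.
SAMPLE = """
<file_xfer_error>
<file_name>22ap04ab.6123.28785.636084.3.165</file_name>
<error_code>-163</error_code>
<error_message>file was not found on server</error_message>
</file_xfer_error>
"""

err = parse_xfer_error(SAMPLE)
print(err.file_name, err.error_code, err.error_message)
# prints: 22ap04ab.6123.28785.636084.3.165 -163 file was not found on server
```

Grouping the parsed results by `error_code` would show at a glance whether every failure on a host is the same "file was not found on server" case.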
ID: 545085
rpas
Volunteer tester

Joined: 29 May 04
Posts: 11
Credit: 12,629,589
RAC: 0
United States
Message 545095 - Posted: 12 Apr 2007, 20:02:33 UTC - in response to Message 545085.  

I have had the same problem since the 11th. I am retired but have 8 hosts running, and I have BoincView running on one of the PCs sitting in front of me most of the day. I didn't have a long queue of WUs, so I have been doing a project reset to get new WUs as needed.

I also ran SETI@home Classic and moved to BOINC when it came about. At the time I did not have time to fiddle with the problem on my end, so I stopped doing Seti until the last automated 'heya, we miss ya, come on back' message from Seti.

It might be a good idea to make sure this problem is not what new returnees see; i.e., it seems sensible to me to be sure this is not happening before sending out the emails.

I know this 2 cents is not much of a donation, but hey.
ID: 545095
rpas
Volunteer tester

Joined: 29 May 04
Posts: 11
Credit: 12,629,589
RAC: 0
United States
Message 545109 - Posted: 12 Apr 2007, 20:14:59 UTC - in response to Message 545095.  

What I meant in the second paragraph is that the same sort of problem happened to me a few years back, and after a couple of days with no WUs I just stopped running BOINC until I got an email about 6 or 7 months ago. So, just saying: when no work units come in, folks might not stay around too long if they are newly returned to the project.

ID: 545109
David S
Volunteer tester

Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 545116 - Posted: 12 Apr 2007, 20:30:25 UTC - in response to Message 545059.  
Last modified: 12 Apr 2007, 20:31:39 UTC

2007-04-12 19:40:07 [SETI@home] Checksum or signature error for 17dc04ab.28771.833.817310.3.52
2007-04-12 19:40:07 [SETI@home] Checksum or signature error for 17dc04ab.28771.833.817310.3.52


I don't read this forum very often (no time), so forgive me if this has been dealt with elsewhere...

A few weeks ago, I got rid of the old internet security package on one of my computers (named Steam) and suddenly found that *everything* ran much better (the old security suite, Panda Titanium, was hogging so much processor time for itself that a single Seti unit couldn't get done before its deadline). In particular, I started getting through more than one Einstein unit per day instead of taking several days for each, so I decided to let the client start working on Seti again.

I have yet to actually do a Seti unit on that computer. Every time it tries to download one, it gets the type of error quoted above. The first day, it happened 47 times in a row. The next day, 23 times. In a few days, the server had whittled my client's daily quota down to 1. I tried stopping and restarting BOINC, and I have restarted the computer after a Windows update, both with no effect on the Seti problem.

I actually feel relief seeing that others are also getting checksum errors, but those others are still getting some WUs correctly, so what's my problem and what can I do to fix it (if anything)?

David

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 545116
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 545120 - Posted: 12 Apr 2007, 20:37:09 UTC - in response to Message 545109.  

...same sort of problem happened to me a few years back, and after a couple of days with no WUs I just stopped running BOINC until I got an email about 6 or 7 months ago. So, just saying: when no work units come in, folks might not stay around too long if they are newly returned to the project.

Understood. As it happens we haven't sent out any reminder e-mails yet. I'll wait until the problem fully clears up (in a couple hours).

There's nothing we can do about the general behavior of BOINC, which is that it will be down from time to time, or even fail. In most cases it will eventually pick itself back up on its own; other times it needs to be kicked by the users to get going again. The internet community in general is still quite novice about how networking software should work and how to troubleshoot it, which is fair, but not really our fault. Still, this is our bread and butter, so we'll do our best to coach people in the meantime and make sure the process is as easy and seamless as possible.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 545120
Arthur L. Smith

Joined: 17 Apr 02
Posts: 28
Credit: 244,050,922
RAC: 9
United States
Message 545140 - Posted: 12 Apr 2007, 21:22:55 UTC - in response to Message 545068.  

Found the problem. My bad. Fixing now. Should clear up within minutes. Explanation later.

- Matt


Thanks for your help, Matt.
ID: 545140
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester

Joined: 9 Jul 00
Posts: 51469
Credit: 1,018,363,574
RAC: 1,004
United States
Message 545346 - Posted: 13 Apr 2007, 6:12:32 UTC - in response to Message 545049.  

> ...totally hosed the replica database...

Is this also the cause of all the "file not found" WU download failures? They started yesterday too.


The only remedy to this is to abort the transfer -and- the work unit and ask the Manager to get another work unit :(
I'm sure the guys in charge will fix it :)


Are you so sure that aborting all of those WUs is the correct thing to do at this point?

Matt, could you please clarify the cause and whether or not these 'file not found' downloads being held hostage should be aborted?

I've got at least a couple hundred of them at this point.


If you don't clear out the hung work units then your computer will eventually run out of work and just sit there. The machine doesn't seem to want to get any "other" new work until those are cleared out.


Perhaps, but I am still wondering if they will complete the transfer when the database problems are rectified. Perhaps the 'file not found' will not occur and the download will complete.

Matt...Wazzzup?


msattler

I actually have a QX6700 with 2GB of RAM and Windows XP like some of your machines. When I came in this morning I was out of work. I used to have a list of WUs that spanned about 6 screens. That all changed yesterday. Now when I request work it will only give me about 15 to 20 units and 2 or 3 will hang. Then it won't give me any more until those hung units are cleared. So to populate my WU list I have been clicking update and getting about 20 units, clearing out the 2 or 3 hung units, then clicking update again which gives me another 20, clearing out the 2 or 3 hung, and so on. If I do this about 15 times I will have a few screens of WUs again. It's tedious to say the least. I thought it was just my machine but all of my machines are doing it. I even reinstalled from scratch this morning on a new workstation and am getting the same issues with that one.

BTW, I notice your Quad core gets about 5500 RAC, mine only gets to about 3000. Can you tell me how you are getting such performance out of your Quads? :-)

Thanks


I took a quick look at a couple of your quaddies.......
One appears to be running the stock client instead of the 2.2b core 2 Chicken app. That will slow it down quite a bit.
The other appears to be running the Chicken app, but at stock speed, 2.66 GHz. Mine is heavily OC'd and has been running between 3.4 GHz and 3.8 GHz depending on how well it is cooperating.

And it appears that whatever Matt found and fixed has corrected the 'file not found' downloading problem. The couple of hundred that were hung up when I left for work earlier today have all since downloaded, except for 2 orphans, which I have now aborted.
So I was correct in my earlier thoughts that aborting a couple of hundred WUs was not quite the right thing to do straight off.
Thanks for the repair, Matt. I'll be curious what the cause was when you have time to post an explanation.

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 545346
Justin M. Morford
Volunteer tester

Joined: 23 Sep 99
Posts: 36
Credit: 2,539,975
RAC: 0
United States
Message 545353 - Posted: 13 Apr 2007, 6:56:30 UTC - in response to Message 545044.  

...and it's a little bit of a pain to keep visiting all of these workstations...


Not sure if you're aware of it, but if all or most of your machines are on the same network, you can access them all from BOINC Manager running on just one of them. On the "Advanced" menu, choose "Select Computer..." and enter the computer name (I think an IP address will also work) and the password for that computer. The password can be found (and edited) in a file named "gui_rpc_auth.cfg" in the main BOINC directory. I can't remember whether this file is put there during install or whether it's just a plain text file you have to create on your own.

Justin
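Behind the "Select Computer..." dialog, the Manager talks to the client's GUI RPC port. A hedged sketch of that handshake follows, assuming the protocol works as follows: XML requests and replies, each terminated by a 0x03 byte; an `<auth1/>` request that returns a `<nonce>`; and an `<auth2>` reply carrying the MD5 of the nonce followed by the gui_rpc_auth.cfg password. All function names are hypothetical. Depending on the client version, the remote client may also need to be configured to accept connections from other hosts (for example via a remote_hosts.cfg file).

```python
import hashlib
import re
import socket
from typing import Optional

GUI_RPC_PORT = 31416  # default GUI RPC port (older clients used 1043, per this thread)
END_OF_MESSAGE = b"\x03"  # assumed terminator after every request and reply


def frame_request(body: str) -> bytes:
    """Wrap an RPC body in the request envelope, followed by the terminator byte."""
    text = "<boinc_gui_rpc_request>\n" + body + "\n</boinc_gui_rpc_request>\n"
    return text.encode("utf-8") + END_OF_MESSAGE


def nonce_hash(nonce: str, password: str) -> str:
    """MD5 hex digest of the nonce followed by the gui_rpc_auth.cfg password."""
    return hashlib.md5((nonce + password).encode("utf-8")).hexdigest()


def read_reply(sock: socket.socket) -> str:
    """Read until the terminator byte arrives; return the text before it."""
    buf = b""
    while END_OF_MESSAGE not in buf:
        data = sock.recv(4096)
        if not data:
            raise ConnectionError("connection closed before the reply was complete")
        buf += data
    return buf.split(END_OF_MESSAGE, 1)[0].decode("utf-8", errors="replace")


def extract_nonce(reply: str) -> Optional[str]:
    """Pull the <nonce> value out of an <auth1/> reply, if there is one."""
    match = re.search(r"<nonce>(.*?)</nonce>", reply, re.DOTALL)
    return match.group(1) if match else None


def authenticate(host: str, password: str, port: int = GUI_RPC_PORT,
                 timeout: float = 10.0) -> bool:
    """Return True if the remote client replies <authorized/> to our password hash."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(frame_request("<auth1/>"))
        nonce = extract_nonce(read_reply(sock))
        if nonce is None:
            return False
        body = "<auth2>\n<nonce_hash>" + nonce_hash(nonce, password) + "</nonce_hash>\n</auth2>"
        sock.sendall(frame_request(body))
        return "<authorized/>" in read_reply(sock)
```

Usage would be something like `authenticate("my-cruncher.example.org", "password-from-gui_rpc_auth.cfg")`, where the host name is a placeholder. Only a hash of the password crosses the wire in this scheme, but the rest of the session is plain text.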
ID: 545353
Arthur L. Smith

Joined: 17 Apr 02
Posts: 28
Credit: 244,050,922
RAC: 9
United States
Message 545436 - Posted: 13 Apr 2007, 12:47:28 UTC - in response to Message 545346.  

I took a quick look at a couple of your quaddies.......
One appears to be running the stock client instead of the 2.2b core 2 Chicken app. That will slow it down quite a bit.
The other appears to be running the Chicken app, but at stock speed, 2.66 GHz. Mine is heavily OC'd and has been running between 3.4 GHz and 3.8 GHz depending on how well it is cooperating.



Thanks. I figured it had something to do with OCing.
ID: 545436
Chris Luth
Volunteer tester

Joined: 24 Dec 99
Posts: 21
Credit: 59,135
RAC: 0
United States
Message 546806 - Posted: 15 Apr 2007, 20:26:33 UTC - in response to Message 545353.  

...and it's a little bit of a pain to keep visiting all of these workstations...


Not sure if you're aware of it, but if all or most of your machines are on the same network, you can access them all from BOINC Manager running on just one of them. On the "Advanced" menu, choose "Select Computer..." and enter the computer name (I think an IP address will also work) and the password for that computer. The password can be found (and edited) in a file named "gui_rpc_auth.cfg" in the main BOINC directory. I can't remember whether this file is put there during install or whether it's just a plain text file you have to create on your own.

Justin

They don't even need to be on the same network (well, unless you consider the Internet one big network): I have four machines running, two of which are in geographically different areas (one is in a different state and belongs to some family of mine). As long as the machine is publicly accessible, the proper ports (1043 and 31416) are forwarded through any NAT routers, and the gui_rpc_auth.cfg password is set, the manager can connect to a BOINC instance anywhere in the world.
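To check from outside whether the forwarding actually works, one option is simply to try opening a TCP connection to each port. This sketch assumes only that the two ports named above are the ones to test. `check_ports` is a hypothetical helper, and a successful connection only shows that something is listening and reachable, not that the gui_rpc_auth.cfg password will be accepted.

```python
import socket
from typing import Dict, Iterable

# The two ports named in this post: 31416 and the older 1043.
BOINC_GUI_RPC_PORTS = (31416, 1043)


def check_ports(host: str, ports: Iterable[int] = BOINC_GUI_RPC_PORTS,
                timeout: float = 3.0) -> Dict[int, bool]:
    """Map each port to True if a TCP connection to host:port succeeded."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            # Connect and immediately close; we only care that the handshake completes.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, timed out, unreachable, or name lookup failed
            results[port] = False
    return results
```

Run it from a machine outside the remote network, for example `check_ports("my-cruncher.example.org")` with a placeholder host name. A `False` for every port usually points at the router's forwarding or a firewall rather than at BOINC itself.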
ID: 546806
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 547290 - Posted: 16 Apr 2007, 19:01:07 UTC
Last modified: 16 Apr 2007, 19:02:26 UTC

The only caveat is that it would probably be better to do that through a VPN setup rather than just exposing the BOINC service ports to the Internet at large.

I'm not saying the BOINC team doesn't make reasonable efforts at ensuring security, but that is a needless risk to take for very little benefit overall.

Alinator
ID: 547290
Justin M. Morford
Volunteer tester

Joined: 23 Sep 99
Posts: 36
Credit: 2,539,975
RAC: 0
United States
Message 548483 - Posted: 18 Apr 2007, 13:02:43 UTC - in response to Message 546806.  

They don't even need to be on the same network (well, unless you consider the Internet one big network): I have four machines running, two of which are in geographically different areas (one is in a different state and belongs to some family of mine). As long as the machine is publicly accessible, the proper ports (1043 and 31416) are forwarded through any NAT routers, and the gui_rpc_auth.cfg password is set, the manager can connect to a BOINC instance anywhere in the world.


I figured that was the case, I just wasn't sure which ports needed to be forwarded.

Justin
ID: 548483
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.