Oopsie (Apr 11 2007)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
So as it turns out the donation screwup I briefly mentioned in yesterday's thread totally hosed the replica database. Lame but true. So we're recovering that now, or trying to. We're operating without the replica for the time being. In the future we'll set up the replica so that updates to its data are impossible except from the slave update/insert thread. Anyway, this also explains why various statistics on the web site weren't updating. I mostly spent the day working on a revised PHP script with Dave that will send "reminder" e-mails to lapsed users, or those who failed to send in any work whatsoever. This actually required a new database table and me discovering "group by ... having ..." syntax to make more eloquent and efficient MySQL queries. Hopefully these e-mails will help get some of our user base back on track. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
tenebra Joined: 3 Jun 99 Posts: 35 Credit: 368,467 RAC: 0 |
Hello, Those reminder e-mails help... Changing PCs and other stuff (unfortunately very ugly) that happened to me made me forget about SETI. I still have something like 600 units from the classical SETI. However, a reminder e-mail that I got last year made me reattach to SETI and now I am also contributing to other projects I am interested in. Would like to be able to run a few more but have limited resources for now. If you are from Greece, please visit the following: Science Fiction and Fantasy Forum The World of Science |
Paul Hayslett Joined: 3 Aug 00 Posts: 15 Credit: 14,207,862 RAC: 0 |
> ...totally hosed the replica database... Is this also the cause of all the "file not found" WU download failures? They started yesterday too. |
champ Joined: 12 Mar 03 Posts: 3642 Credit: 1,489,147 RAC: 0 |
> ...totally hosed the replica database... ....must be..... |
Arthur L. Smith Joined: 17 Apr 02 Posts: 28 Credit: 244,050,922 RAC: 9 |
We are constantly having WUs hang on download. On every client. This all started yesterday. It seems like 2 or 3 WUs out of every 20 it tries to download sit there in the transfer queue and never go anywhere??? |
elendil Joined: 7 May 02 Posts: 28 Credit: 1,908,698 RAC: 0 |
> ...totally hosed the replica database... The only remedy to this is to abort the transfer -and- the work unit and ask the Manager to get another work unit :( I'm sure the guys in charge will fix it :) -=[ Not all who wander are lost ]=- |
Arthur L. Smith Joined: 17 Apr 02 Posts: 28 Credit: 244,050,922 RAC: 9 |
> ...totally hosed the replica database... We have been using that as a workaround but we have about 30 machines running this and it's a little bit of a pain to keep visiting all of these workstations to clear out the stuck WUs. Hopefully they will fix this issue sometime soon. Thanks for your response. |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
> ...totally hosed the replica database... Are you so sure that aborting all of those WUs is the correct thing to do at this point? Matt....could you please clarify the cause and whether or not these 'file not found' downloads being held hostage should be aborted or not? I've got at least a couple hundred of them at this point. "Time is simply the mechanism that keeps everything from happening all at once." |
Arthur L. Smith Joined: 17 Apr 02 Posts: 28 Credit: 244,050,922 RAC: 9 |
> ...totally hosed the replica database... If you don't clear out the hung work units then your computer will eventually run out of work and just sit there. The machine doesn't seem to want to get any "other" new work until those are cleared out. |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
> ...totally hosed the replica database... Perhaps, but I am still wondering if they will complete the transfer when the database problems are rectified. Perhaps the 'file not found' will not occur and the download will complete. Matt...Wazzzup? "Time is simply the mechanism that keeps everything from happening all at once." |
Arthur L. Smith Joined: 17 Apr 02 Posts: 28 Credit: 244,050,922 RAC: 9 |
> ...totally hosed the replica database... msattler I actually have a QX6700 with 2GB of RAM and Windows XP like some of your machines. When I came in this morning I was out of work. I used to have a list of WUs that spanned about 6 screens. That all changed yesterday. Now when I request work it will only give me about 15 to 20 units and 2 or 3 will hang. Then it won't give me any more until those hung units are cleared. So to populate my WU list I have been clicking update and getting about 20 units, clearing out the 2 or 3 hung units, then clicking update again which gives me another 20, clearing out the 2 or 3 hung, and so on. If I do this about 15 times I will have a few screens of WUs again. It's tedious to say the least. I thought it was just my machine but all of my machines are doing it. I even reinstalled from scratch this morning on a new workstation and am getting the same issues with that one. BTW, I notice your Quad core gets about 5500 RAC, mine only gets to about 3000. Can you tell me how you are getting such performance out of your Quads? :-) Thanks |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
I'm not exactly sure what the "file not found" errors are all about, nor what users should be doing to fix them. This is apparently not a big problem, as we're getting our usual complement of results/workunits received/sent. I'm also fairly certain this has nothing to do with the replica database problems, as none of the BOINC production processes touch the replica (only non-critical stats and certain web page queries). - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Arthur L. Smith Joined: 17 Apr 02 Posts: 28 Credit: 244,050,922 RAC: 9 |
> I'm not exactly sure what the "file not found" errors are all about, nor what users should be doing to fix them. This is apparently not a big problem, as we're getting our usual complement of results/workunits received/sent. I'm also fairly certain this has nothing to do with the replica database problems, as none of the BOINC production processes touch the replica (only non-critical stats and certain web page queries). Matt, I think you all will start noticing that your usual results/workunits received/sent will decline when people start running out of work units. And unless you clear out the hung units from your client you will eventually run out of work. The only reason I noticed so quickly is because I have a slew of quad cores that eat up work units like crazy so it only takes me a day or two to go through my whole list. The people running slower machines may take a few days to notice. Like I said, I installed a fresh copy of the software on a new machine this morning and I am having the same issues with that one. And if multiple SETI users are having this issue, doesn't that indicate that something may be wrong? Thanks. |
Kinguni Joined: 15 Feb 00 Posts: 239 Credit: 9,043,007 RAC: 0 |
> This is apparently not a big problem, as we're getting our usual complement of results/workunits received/sent. It is a big problem since those who are experiencing it, and I've had it on all 4 of my local computers, will not get new WU's until the stalled download clears, and if they are only attached to SETI@Home their computers may go idle. If the problem is unrelated, however, it doesn't belong in this thread. Join Team Starfire BOINC Chat |
Alan Ng Joined: 27 May 04 Posts: 17 Credit: 14,166,402 RAC: 0 |
Anecdotal evidence to support Arthur L. Smith's hypothesis: all three of my SETI-crunching machines, in geographically different locations and networks, with different processors, have the same issues. In a few hours, all of my SETI machines will be doing no work for the SETI project because they're all unable to download workunits. Each machine is about to run out, or has already run out, of successfully-downloaded WUs. All machines have been reporting the following type of msg for some WUs since yesterday, and are not downloading new WU: Thu Apr 12 13:49:40 2007|SETI@home|[file_xfer] Temporarily failed download of 18se04aa.24984.27954.117328.3.8: file not found p.s. my BOINC manager version (all machines) is 5.8.17, all Mac OS 10.4. Are you a musician? Join the Musicians teams on SETI, Einstein, CPDN, or Predictor at: http://einstein.phys.uwm.edu/forum_thread.php?id=5053 |
Iztok s52d (and friends) Joined: 12 Jan 01 Posts: 136 Credit: 393,469,375 RAC: 116 |
> Anecdotal evidence to support Arthur L. Smith's hypothesis: all three of my SETI-crunching machines, in geographically different locations and networks, with different processors, have the same issues. In a few hours, all of my SETI machines will be doing no work for the SETI project because they're all unable to download workunits. Each machine is about to run out, or has already run out, of successfully-downloaded WUs. All machines have been reporting the following type of msg for some WUs since yesterday, and are not downloading new WU: 90 minutes ago: 2007-04-12 19:40:04 [SETI@home] Started download of file 17dc04ab.28771.833.817310.3.52 2007-04-12 19:40:07 [SETI@home] Incomplete read of less than 5KB for 17dc04ab.28771.833.817310.3.52 - truncating 2007-04-12 19:40:07 [SETI@home] Temporarily failed download of 17dc04ab.28771.833.817310.3.52: HTTP file not found 2007-04-12 19:40:07 [SETI@home] Giving up on download of 17dc04ab.28771.833.817310.3.52: file was not found on serv 2007-04-12 19:40:07 [SETI@home] Giving up on download of 17dc04ab.28771.833.817310.3.52: file was not found on serv 2007-04-12 19:40:07 [SETI@home] Checksum or signature error for 17dc04ab.28771.833.817310.3.52 2007-04-12 19:40:07 [SETI@home] Checksum or signature error for 17dc04ab.28771.833.817310.3.52 2007-04-12 19:40:08 [SETI@home] Unrecoverable error for result 17dc04ab.28771.833.817310.3.52_3 (WU download error: <file_xfer_error> <file_name>17dc04ab.28771.833.817310.3.52</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message> </file_xfer_error> ) 73 Iztok |
Arthur L. Smith Joined: 17 Apr 02 Posts: 28 Credit: 244,050,922 RAC: 9 |
> Anecdotal evidence to support Arthur L. Smith's hypothesis: all three of my SETI-crunching machines, in geographically different locations and networks, with different processors, have the same issues. In a few hours, all of my SETI machines will be doing no work for the SETI project because they're all unable to download workunits. Each machine is about to run out, or has already run out, of successfully-downloaded WUs. All machines have been reporting the following type of msg for some WUs since yesterday, and are not downloading new WU: Thanks, Alan. I think that confirms it. 5 different users, obviously with different machines, all on different networks, and some probably using different client versions, all having the same issue??? I'm convinced it's not just me. :-) |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Found the problem. My bad. Fixing now. Should clear up within minutes. Explanation later. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
> Found the problem. My bad. Fixing now. Should clear up within minutes. Explanation later. Looks like it's fixed here. |
Alan Ng Joined: 27 May 04 Posts: 17 Credit: 14,166,402 RAC: 0 |
> Looks like it's fixed here. Same here! I used BOINC Manager, Transfers tab, to "Retry" the stuck download, and now (unlike previously) the download succeeds, and shortly thereafter this machine finally began requesting and getting new WUs. Thanks, Matt. Are you a musician? Join the Musicians teams on SETI, Einstein, CPDN, or Predictor at: http://einstein.phys.uwm.edu/forum_thread.php?id=5053 |
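
For anyone curious about the "group by ... having ..." syntax Matt mentions in the opening post, here is a minimal sketch of how such a query might pick out users due a reminder e-mail. Everything specific in it is an assumption for illustration: the `user` and `result` tables, their columns, and the cutoff value are hypothetical and are not the actual BOINC/SETI@home schema, and Python's built-in `sqlite3` stands in for MySQL so the example runs on its own.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT the real BOINC database layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user   (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE result (id INTEGER PRIMARY KEY, userid INTEGER, received_time INTEGER);
""")
conn.executemany("INSERT INTO user VALUES (?, ?)", [
    (1, "active@example.com"),   # returned work recently
    (2, "lapsed@example.com"),   # returned work, but long ago
    (3, "never@example.com"),    # never returned any work
])
conn.executemany("INSERT INTO result (userid, received_time) VALUES (?, ?)", [
    (1, 1176000000), (1, 1176300000), (2, 1100000000),
])

CUTOFF = 1170000000  # arbitrary "recently active" threshold (Unix time), an assumption


def users_to_remind(conn, cutoff):
    """Return (email, result_count) for users with no work or only old work."""
    return conn.execute("""
        SELECT u.email, COUNT(r.id) AS n_results
        FROM user AS u
        LEFT JOIN result AS r ON r.userid = u.id
        GROUP BY u.id, u.email
        -- HAVING filters on the per-user aggregates, which WHERE cannot do
        HAVING COUNT(r.id) = 0 OR MAX(r.received_time) < ?
        ORDER BY u.email
    """, (cutoff,)).fetchall()


reminders = users_to_remind(conn, CUTOFF)
for email, n_results in reminders:
    print(email, n_results)
```

The point of `HAVING` is that the filter runs after grouping, so one query can express both "never sent in any work" (a count of zero) and "lapsed" (most recent result older than the cutoff) without a separate pass per user.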