Oopsie (Apr 11 2007)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 544537 - Posted: 11 Apr 2007, 23:17:17 UTC

So as it turns out the donation screwup I briefly mentioned in yesterday's thread totally hosed the replica database. Lame but true. So we're recovering that now, or trying to. We're operating without the replica for the time being. In the future we'll set up the replica so that updates to its data are impossible except from the slave update/insert thread. Anyway, this also explains why various statistics on the web site weren't updating.

I mostly spent the day working on a revised PHP script with Dave that will send "reminder" e-mails to lapsed users, or those who failed to send in any work whatsoever. This actually required a new database table and me discovering "group by ... having ..." syntax to make more elegant and efficient MySQL queries. Hopefully these e-mails will help get some of our user base back on track.
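
For readers who haven't met the syntax, here is a minimal sketch of a "group by ... having ..." query that picks out lapsed users. It is not the actual reminder script or the BOINC schema: the `users` and `results` tables, their columns, and the cutoff date are all hypothetical, and it uses Python's built-in sqlite3 module in place of PHP and MySQL so that it runs standalone.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with hypothetical tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE results (
            id INTEGER PRIMARY KEY,
            userid INTEGER,
            received TEXT  -- ISO date, so text comparison orders correctly
        );
        INSERT INTO users VALUES
            (1, 'active@example.com'),
            (2, 'lapsed@example.com'),
            (3, 'never@example.com');
        INSERT INTO results (userid, received) VALUES
            (1, '2007-04-10'),
            (1, '2006-01-05'),
            (2, '2006-02-01');
        """
    )
    return conn


def find_lapsed_users(conn, cutoff):
    """Return (id, email, result_count) for users with no results since cutoff.

    GROUP BY collapses each user's results into one row; HAVING then filters
    on the aggregates (count and latest date), which WHERE cannot see.
    """
    query = """
        SELECT u.id, u.email, COUNT(r.id) AS n_results
        FROM users AS u
        LEFT JOIN results AS r ON r.userid = u.id
        GROUP BY u.id, u.email
        HAVING COUNT(r.id) = 0 OR MAX(r.received) < ?
        ORDER BY u.id
    """
    return conn.execute(query, (cutoff,)).fetchall()


if __name__ == "__main__":
    for row in find_lapsed_users(make_demo_db(), "2007-01-01"):
        print(row)
```

With the demo rows, the query should return the user whose latest result predates the cutoff and the user who never returned a result, while skipping the one who reported recently.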

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

tenebra
Volunteer tester
Joined: 3 Jun 99
Posts: 35
Credit: 368,467
RAC: 0
Greece
Message 544888 - Posted: 12 Apr 2007, 12:03:14 UTC

Hello,

Those reminder e-mails help... Changing PCs and other stuff (unfortunately very ugly) that happened to me, made me forget about SETI. I still have something like 600 units from the classical SETI. However, a reminder e-mail that I got last year made me reattach to SETI and now I am also contributing to other projects I am interested in. Would like to be able to run a few more but have limited resources for now.


If you are from Greece, please visit the following:
Science Fiction and Fantasy Forum
The World of Science

Paul Hayslett
Project Donor
Joined: 3 Aug 00
Posts: 15
Credit: 14,207,862
RAC: 0
United States
Message 544915 - Posted: 12 Apr 2007, 12:41:52 UTC

> ...totally hosed the replica database...

Is this also the cause of all the "file not found" WU download failures? They started yesterday too.

champ
Volunteer tester
Joined: 12 Mar 03
Posts: 3642
Credit: 1,489,147
RAC: 0
Germany
Message 544956 - Posted: 12 Apr 2007, 14:00:18 UTC - in response to Message 544915.  

>> ...totally hosed the replica database...
>
> Is this also the cause of all the "file not found" WU download failures? They started yesterday too.



....must be.....

Arthur L. Smith
Joined: 17 Apr 02
Posts: 28
Credit: 244,050,922
RAC: 9
United States
Message 545029 - Posted: 12 Apr 2007, 17:25:40 UTC

We are constantly having WUs hang on download. On every client. This all started yesterday. It seems like 2 or 3 WUs out of every 20 it tries to download sit there in the transfer queue and never go anywhere???

elendil
Joined: 7 May 02
Posts: 28
Credit: 1,908,698
RAC: 0
Netherlands
Message 545034 - Posted: 12 Apr 2007, 17:34:40 UTC - in response to Message 544915.  

>> ...totally hosed the replica database...
>
> Is this also the cause of all the "file not found" WU download failures? They started yesterday too.


The only remedy to this is to abort the transfer -and- the work unit and ask the Manager to get another work unit :(
I'm sure the guys in charge will fix it :)
-=[ Not all who wander are lost ]=-

Arthur L. Smith
Joined: 17 Apr 02
Posts: 28
Credit: 244,050,922
RAC: 9
United States
Message 545044 - Posted: 12 Apr 2007, 18:10:14 UTC - in response to Message 545034.  

>>> ...totally hosed the replica database...
>>
>> Is this also the cause of all the "file not found" WU download failures? They started yesterday too.
>
> The only remedy to this is to abort the transfer -and- the work unit and ask the Manager to get another work unit :(
> I'm sure the guys in charge will fix it :)


We have been using that as a workaround but we have about 30 machines running this and it's a little bit of a pain to keep visiting all of these workstations to clear out the stuck WUs. Hopefully they will fix this issue sometime soon.

Thanks for your response.

kittyman
Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 545045 - Posted: 12 Apr 2007, 18:11:36 UTC - in response to Message 545034.  
Last modified: 12 Apr 2007, 18:12:39 UTC

>>> ...totally hosed the replica database...
>>
>> Is this also the cause of all the "file not found" WU download failures? They started yesterday too.
>
> The only remedy to this is to abort the transfer -and- the work unit and ask the Manager to get another work unit :(
> I'm sure the guys in charge will fix it :)


Are you so sure that aborting all of those WUs is the correct thing to do at this point?

Matt....could you please clarify the cause and whether or not these 'file not found' downloads being held hostage should be aborted or not?

I've got at least a couple hundred of them at this point.
"Freedom is just Chaos, with better lighting." Alan Dean Foster


Arthur L. Smith
Joined: 17 Apr 02
Posts: 28
Credit: 244,050,922
RAC: 9
United States
Message 545046 - Posted: 12 Apr 2007, 18:14:30 UTC - in response to Message 545045.  

>>>> ...totally hosed the replica database...
>>>
>>> Is this also the cause of all the "file not found" WU download failures? They started yesterday too.
>>
>> The only remedy to this is to abort the transfer -and- the work unit and ask the Manager to get another work unit :(
>> I'm sure the guys in charge will fix it :)
>
> Are you so sure that aborting all of those WUs is the correct thing to do at this point?
>
> Matt....could you please clarify the cause and whether or not these 'file not found' downloads being held hostage should be aborted or not?
>
> I've got at least a couple hundred of them at this point.


If you don't clear out the hung work units then your computer will eventually run out of work and just sit there. The machine doesn't seem to want to get any "other" new work until those are cleared out.

kittyman
Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 545047 - Posted: 12 Apr 2007, 18:17:14 UTC - in response to Message 545046.  

>>>>> ...totally hosed the replica database...
>>>>
>>>> Is this also the cause of all the "file not found" WU download failures? They started yesterday too.
>>>
>>> The only remedy to this is to abort the transfer -and- the work unit and ask the Manager to get another work unit :(
>>> I'm sure the guys in charge will fix it :)
>>
>> Are you so sure that aborting all of those WUs is the correct thing to do at this point?
>>
>> Matt....could you please clarify the cause and whether or not these 'file not found' downloads being held hostage should be aborted or not?
>>
>> I've got at least a couple hundred of them at this point.
>
> If you don't clear out the hung work units then your computer will eventually run out of work and just sit there. The machine doesn't seem to want to get any "other" new work until those are cleared out.


Perhaps, but I am still wondering if they will complete the transfer when the database problems are rectified. Perhaps the 'file not found' will not occur and the download will complete.

Matt...Wazzzup?


Arthur L. Smith
Joined: 17 Apr 02
Posts: 28
Credit: 244,050,922
RAC: 9
United States
Message 545049 - Posted: 12 Apr 2007, 18:30:09 UTC - in response to Message 545047.  

>>>>>> ...totally hosed the replica database...
>>>>>
>>>>> Is this also the cause of all the "file not found" WU download failures? They started yesterday too.
>>>>
>>>> The only remedy to this is to abort the transfer -and- the work unit and ask the Manager to get another work unit :(
>>>> I'm sure the guys in charge will fix it :)
>>>
>>> Are you so sure that aborting all of those WUs is the correct thing to do at this point?
>>>
>>> Matt....could you please clarify the cause and whether or not these 'file not found' downloads being held hostage should be aborted or not?
>>>
>>> I've got at least a couple hundred of them at this point.
>>
>> If you don't clear out the hung work units then your computer will eventually run out of work and just sit there. The machine doesn't seem to want to get any "other" new work until those are cleared out.
>
> Perhaps, but I am still wondering if they will complete the transfer when the database problems are rectified. Perhaps the 'file not found' will not occur and the download will complete.
>
> Matt...Wazzzup?


msattler

I actually have a QX6700 with 2GB of RAM and Windows XP like some of your machines. When I came in this morning I was out of work. I used to have a list of WUs that spanned about 6 screens. That all changed yesterday. Now when I request work it will only give me about 15 to 20 units and 2 or 3 will hang. Then it won't give me any more until those hung units are cleared. So to populate my WU list I have been clicking update and getting about 20 units, clearing out the 2 or 3 hung units, then clicking update again which gives me another 20, clearing out the 2 or 3 hung, and so on. If I do this about 15 times I will have a few screens of WUs again. It's tedious to say the least. I thought it was just my machine but all of my machines are doing it. I even reinstalled from scratch this morning on a new workstation and am getting the same issues with that one.

BTW, I notice your Quad core gets about 5500 RAC, mine only gets to about 3000. Can you tell me how you are getting such performance out of your Quads? :-)

Thanks


Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 545050 - Posted: 12 Apr 2007, 18:34:16 UTC

I'm not exactly sure what the "file not found" errors are all about, nor what users should be doing to fix them. This is apparently not a big problem, as we're getting our usual complement of results/workunits received/sent. I'm also fairly certain this has nothing to do with the replica database problems, as none of the BOINC production processes touch the replica (only non-critical stats and certain web page queries).

- Matt

Arthur L. Smith
Joined: 17 Apr 02
Posts: 28
Credit: 244,050,922
RAC: 9
United States
Message 545054 - Posted: 12 Apr 2007, 18:49:45 UTC - in response to Message 545050.  

> I'm not exactly sure what the "file not found" errors are all about, nor what users should be doing to fix them. This is apparently not a big problem, as we're getting our usual complement of results/workunits received/sent. I'm also fairly certain this has nothing to do with the replica database problems, as none of the BOINC production processes touch the replica (only non-critical stats and certain web page queries).
>
> - Matt



Matt,

I think you all will start noticing that your usual results/workunits received/sent will decline when people start running out of work units. And unless you clear out the hung units from your client you will eventually run out of work. The only reason I noticed so quickly is because I have a slew of quad cores that eat up work units like crazy, so it only takes me a day or two to go through my whole list. The people running slower machines may take a few days to notice. Like I said, I installed a fresh copy of the software on a new machine this morning and I am having the same issues with that one. And if multiple SETI users are having this issue, doesn't that indicate that something may be wrong?

Thanks.

Kinguni
Volunteer tester
Joined: 15 Feb 00
Posts: 239
Credit: 9,043,007
RAC: 0
Canada
Message 545055 - Posted: 12 Apr 2007, 18:53:05 UTC - in response to Message 545050.  

> This is apparently not a big problem, as we're getting our usual complement of results/workunits received/sent.

It is a big problem since those who are experiencing it, and I've had it on all 4 of my local computers, will not get new WU's until the stalled download clears and if they are only attached to SETI@Home their computers may go idle.

If the problem is unrelated however, it doesn't belong in this thread.
Join Team Starfire
BOINC Chat


Alan Ng
Volunteer tester
Joined: 27 May 04
Posts: 17
Credit: 14,166,402
RAC: 0
United States
Message 545056 - Posted: 12 Apr 2007, 18:56:36 UTC - in response to Message 545054.  
Last modified: 12 Apr 2007, 19:16:50 UTC

Anecdotal evidence to support Arthur L. Smith's hypothesis: all three of my SETI-crunching machines, in geographically different locations and networks, with different processors, have the same issues. In a few hours, all of my SETI machines will be doing no work for the SETI project because they're all unable to download workunits. Each machine is about to run out, or has already run out, of successfully-downloaded WUs. All machines have been reporting the following type of msg for some WUs since yesterday, and are not downloading new WU:

Thu Apr 12 13:49:40 2007|SETI@home|[file_xfer] Temporarily failed download of 18se04aa.24984.27954.117328.3.8: file not found

p.s. my BOINC manager version (all machines) is 5.8.17, all Mac OS 10.4.
Are you a musician? Join the Musicians teams on SETI, Einstein, CPDN, or Predictor at:
http://einstein.phys.uwm.edu/forum_thread.php?id=5053

Iztok s52d (and friends)
Joined: 12 Jan 01
Posts: 136
Credit: 393,469,375
RAC: 116
Slovenia
Message 545059 - Posted: 12 Apr 2007, 19:09:16 UTC - in response to Message 545056.  

> Anecdotal evidence to support Arthur L. Smith's hypothesis: all three of my SETI-crunching machines, in geographically different locations and networks, with different processors, have the same issues. In a few hours, all of my SETI machines will be doing no work for the SETI project because they're all unable to download workunits. Each machine is about to run out, or has already run out, of successfully-downloaded WUs. All machines have been reporting the following type of msg for some WUs since yesterday, and are not downloading new WU:
>
> Thu Apr 12 13:49:40 2007|SETI@home|[file_xfer] Temporarily failed download of 18se04aa.24984.27954.117328.3.8: file not found



90 minutes ago:

2007-04-12 19:40:04 [SETI@home] Started download of file 17dc04ab.28771.833.817310.3.52
2007-04-12 19:40:07 [SETI@home] Incomplete read of less than 5KB for 17dc04ab.28771.833.817310.3.52 - truncating
2007-04-12 19:40:07 [SETI@home] Temporarily failed download of 17dc04ab.28771.833.817310.3.52: HTTP file not found
2007-04-12 19:40:07 [SETI@home] Giving up on download of 17dc04ab.28771.833.817310.3.52: file was not found on serv
2007-04-12 19:40:07 [SETI@home] Giving up on download of 17dc04ab.28771.833.817310.3.52: file was not found on serv
2007-04-12 19:40:07 [SETI@home] Checksum or signature error for 17dc04ab.28771.833.817310.3.52
2007-04-12 19:40:07 [SETI@home] Checksum or signature error for 17dc04ab.28771.833.817310.3.52
2007-04-12 19:40:08 [SETI@home] Unrecoverable error for result 17dc04ab.28771.833.817310.3.52_3 (WU download error:
<file_xfer_error>
<file_name>17dc04ab.28771.833.817310.3.52</file_name>
<error_code>-163</error_code>
<error_message>file was not found on server</error_message>
</file_xfer_error>
)
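
A rough way to see which files are affected is to scan the client messages for the failure lines. The sketch below is an illustration only: the regular expression is guessed from the log lines quoted in this thread rather than taken from any BOINC documentation, and the function name `failed_downloads` is made up.

```python
import re
from collections import Counter

# Pattern inferred from the messages quoted in this thread (an assumption,
# not an official BOINC log format).
DOWNLOAD_FAILURE = re.compile(
    r"(?P<kind>Temporarily failed|Giving up on) download of "
    r"(?P<file>\S+): (?P<reason>.+)"
)


def failed_downloads(lines):
    """Count 'not found' download-failure messages per file name."""
    counts = Counter()
    for line in lines:
        match = DOWNLOAD_FAILURE.search(line)
        if match and "not found" in match.group("reason"):
            counts[match.group("file")] += 1
    return counts


# Sample lines copied from the posts in this thread.
SAMPLE = [
    "Thu Apr 12 13:49:40 2007|SETI@home|[file_xfer] Temporarily failed "
    "download of 18se04aa.24984.27954.117328.3.8: file not found",
    "2007-04-12 19:40:04 [SETI@home] Started download of file "
    "17dc04ab.28771.833.817310.3.52",
    "2007-04-12 19:40:07 [SETI@home] Temporarily failed download of "
    "17dc04ab.28771.833.817310.3.52: HTTP file not found",
    "2007-04-12 19:40:07 [SETI@home] Giving up on download of "
    "17dc04ab.28771.833.817310.3.52: file was not found on serv",
]

if __name__ == "__main__":
    for name, count in failed_downloads(SAMPLE).items():
        print(name, count)
```

Both message layouts seen above (the pipe-separated one and the bracketed one) are handled because the pattern only looks at the text from "download of" onward.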

73 Iztok

Arthur L. Smith
Joined: 17 Apr 02
Posts: 28
Credit: 244,050,922
RAC: 9
United States
Message 545061 - Posted: 12 Apr 2007, 19:12:02 UTC - in response to Message 545056.  

> Anecdotal evidence to support Arthur L. Smith's hypothesis: all three of my SETI-crunching machines, in geographically different locations and networks, with different processors, have the same issues. In a few hours, all of my SETI machines will be doing no work for the SETI project because they're all unable to download workunits. Each machine is about to run out, or has already run out, of successfully-downloaded WUs. All machines have been reporting the following type of msg for some WUs since yesterday, and are not downloading new WU:
>
> Thu Apr 12 13:49:40 2007|SETI@home|[file_xfer] Temporarily failed download of 18se04aa.24984.27954.117328.3.8: file not found



Thanks, Alan.

I think that confirms it. 5 different users, obviously with different machines, all on different networks, and some probably using different client versions all having the same issue??? I'm convinced it's not just me. :-)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 545068 - Posted: 12 Apr 2007, 19:18:19 UTC

Found the problem. My bad. Fixing now. Should clear up within minutes. Explanation later.

- Matt

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 545076 - Posted: 12 Apr 2007, 19:34:11 UTC - in response to Message 545068.  

> Found the problem. My bad. Fixing now. Should clear up within minutes. Explanation later.
>
> - Matt

Looks like it's fixed here.

Alan Ng
Volunteer tester
Joined: 27 May 04
Posts: 17
Credit: 14,166,402
RAC: 0
United States
Message 545080 - Posted: 12 Apr 2007, 19:36:19 UTC - in response to Message 545076.  
Last modified: 12 Apr 2007, 20:05:18 UTC

> Looks like it's fixed here.


Same here! I used Boinc Manager, Transfers tab, to "Retry" the stuck download, and now (unlike previously), the download succeeds, and shortly thereafter, this machine finally began requesting and getting new WUs.

Thanks, Matt.


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.