Cleaning up old (dead?) results?
Author | Message |
---|---|
Pooh Bear 27 Joined: 14 Jul 03 Posts: 3224 Credit: 4,603,826 RAC: 0 |
It's been raised a few times, but no one seems to answer. I have a few "Pending", "Errored" and "Finished" old results ranging from July to December. There are 3 pending; one says too many results and is errored, but still says pending. One has enough results, and the one from July looks like it just isn't able to go any further (it has become a stale unit). I know the deleter has been working hard, but it seems to be missing some old things hanging onto accounts. One of the items is on a machine that is no longer going to crunch units, and it would be best to remove it from the listing. So, can these old results be looked into, please? My movie https://vimeo.com/manage/videos/502242 |
Keck_Komputers Joined: 4 Jul 99 Posts: 1575 Credit: 4,152,111 RAC: 1 |
Work from July and August will most likely be in limbo forever. There was a major database crash then and the usable backup was week(s) old. The others will hopefully be reissued or, in the case of the one with too many results, be deleted. BOINC WIKI BOINCing since 2002/12/8 |
rsisto Joined: 30 Jul 03 Posts: 135 Credit: 729,936 RAC: 0 |
I think there are a lot of these types of units that have not been deleted; for example, look at http://setiweb.ssl.berkeley.edu/results.php?hostid=418941. All its results should have been deleted. |
Ingleside Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 |
The db_purger has AFAIK not run since the BOINC database moved to the new server, meaning many results that should have been purged are still showing up.
For some reason it also looks like db_purger is no longer removing wu that have errored out, even when running. Or it's possible one or more of the results is incorrectly marked as pending, so some fix must be made before db_purger will remove these wu.
Lastly, there are at least 2 types of "stuck" wu; a fix-script must be run on the db to try to restart these, but most of them will probably error out. But until things stabilize, it's not a good idea to try restarting them, and an even worse idea until db_purger has started again. |
mikey Joined: 17 Dec 99 Posts: 4215 Credit: 3,474,603 RAC: 0 |
> The db_purger has AFAIK not run since the BOINC database moved to the new server, meaning many results that should have been purged are still showing up.
> For some reason it also looks like db_purger is no longer removing wu that have errored out, even when running. Or it's possible one or more of the results is incorrectly marked as pending, so some fix must be made before db_purger will remove these wu.
> Lastly, there are at least 2 types of "stuck" wu; a fix-script must be run on the db to try to restart these, but most of them will probably error out. But until things stabilize, it's not a good idea to try restarting them, and an even worse idea until db_purger has started again.
Wouldn't it be easy just to run a small script to send the workunit numbers to a small file and then manually enter them into the resender? If you go back to, say, July in the search you should catch most of them. You could even go back to actual live startup; it is, after all, only a computer program running, not an actual person doing the looking. |
Divide Overflow Joined: 3 Apr 99 Posts: 365 Credit: 131,684 RAC: 0 |
Any plans for running the db_purger / deleter again? The results pages are really beginning to back up! The database could probably shrink down quite a bit unless there's a reason to keep these files around longer. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
> Any plans for running the db_purger / deleter again? The results pages are really beginning to back up! The database could probably shrink down quite a bit unless there's a reason to keep these files around longer.
Like "giving those who were impacted by the recent outages a chance to return work and get credit?" |
Benher Joined: 25 Jul 99 Posts: 517 Credit: 465,152 RAC: 0 |
> Like "giving those who were impacted by the recent outages a chance to return work and get credit?"
In the case where there are 3 returned results...and credit has been granted...there is no reason to keep all of
A. the "canonical" result (which all others are validated against)
B. the 2 other validated results on the fileserver.
The B results should be deleted one week after credit is granted (so their users can look at them if they like).
If there is a 4th result still unreturned, the canonical A result should be kept around until the deadline, or until the 4th result is returned and checked for validation.
Once the deadline is reached, any unreturned results are...dead...of course. |
mikey Joined: 17 Dec 99 Posts: 4215 Credit: 3,474,603 RAC: 0 |
> Any plans for running the db_purger / deleter again? The results pages are really beginning to back up! The database could probably shrink down quite a bit unless there's a reason to keep these files around longer.
Would be a good idea to run it to get rid of some of these results too: http://setiweb.ssl.berkeley.edu/workunit.php?wuid=527415 That unit has been hanging around since July LAST YEAR!!!!! |
Divide Overflow Joined: 3 Apr 99 Posts: 365 Credit: 131,684 RAC: 0 |
Ned: No... Like removing WUs that have had all of the results returned month(s) ago. As in one of your own here. I certainly don't have any wish to slam the door on those who were impacted by the recent downtime, but there shouldn't be any reason for keeping these old, fully reported WUs on the books any longer. Unless the purge & delete process is wildly indiscriminate, shouldn't things start returning to "normal" around here?
On a related note: How long should the project administrators wait for results to be returned after a disruption in service? The servers have been back up for a week. That should be plenty of time for everyone to return results that were held up due to the downtime. Shouldn't it? (Sincere speculation here.) |
Ingleside Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 |
> In the case where there are 3 returned results...and credit has been granted...there is no reason to keep all of
> A. the "canonical" result (which all others are validated against)
> B. the 2 other validated results on the fileserver.
> The B results should be deleted one week after credit is granted (so their users can look at them if they like).
B-results can be deleted from the fileserver immediately after they are validated and credited; there's no reason to wait a week for this. But AFAIK the file_deleter waits until the wu is "done" before deleting anything... this should take at most a fortnight, but normally only a couple of days while waiting on the last result.
> If there is a 4th result still unreturned, the canonical A result should be kept around until the deadline, or until the 4th result is returned and checked for validation.
> Once the deadline is reached, any unreturned results are...dead...of course.
The way through the system is roughly like this:
1. Validate the wu, get the canonical result and assign credit to all results passing validation.
2. Assimilator copies the "canonical result" to the science database.
3. Wait until all results are either reported and checked by the validator, or past deadline.
4. File_deleter removes all wu and result files for this wu from the upload/download directories.
5. One week after #4, db_purger archives and removes info for the wu/results from the BOINC database.
Steps 1-4 are running, and are hopefully not backlogged, so they keep the upload/download disks reasonably empty. #5, the db_purger, on the other hand has AFAIK not run since the new db-server was installed. Since the planned full load is roughly 4x today's results/day, it's not unreasonable that they're currently testing the new server to see if there are any problems with it when the database is much bigger than needed for the moment. If any problems pop up due to the increased size of the db, it's much better to catch these now, and not after "classic" is killed off...
There can also be other reasons they're not currently running the db_purger, but except for some extra pages of results and some users waiting on the opportunity to delete a computer they've stopped using, it doesn't seem there's any problem with this now. |
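A minimal sketch of the five steps listed above, written as a small decision function. The class, field names and step labels are hypothetical illustrations and not BOINC's actual code or database schema; only the ordering of the steps and the one-week purge delay come from the post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Delay between file deletion (step 4) and database purge (step 5), per the post.
PURGE_DELAY = timedelta(weeks=1)


@dataclass
class WorkUnit:
    """Hypothetical, simplified view of one workunit's server-side state."""
    validated: bool = False                      # step 1 done: canonical result chosen, credit assigned
    assimilated: bool = False                    # step 2 done: canonical result copied to science database
    outstanding_results: int = 0                 # results neither reported nor past deadline (step 3)
    files_deleted_at: Optional[datetime] = None  # when step 4 ran, if it has


def next_step(wu: WorkUnit, now: datetime) -> str:
    """Return the label of the next stage this workunit is waiting on."""
    if not wu.validated:
        return "validate"
    if not wu.assimilated:
        return "assimilate"
    if wu.outstanding_results > 0:
        return "wait_for_results"
    if wu.files_deleted_at is None:
        return "file_delete"
    if now - wu.files_deleted_at >= PURGE_DELAY:
        return "db_purge"
    return "wait_for_purge"
```

Under this reading, a workunit whose files were deleted long ago but which is still listed would simply be sitting at "db_purge" until the purger runs again.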
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
> Ned: No... Like removing WUs that have had all of the results returned month(s) ago. As in one of your own here (http://setiweb.ssl.berkeley.edu/workunit.php?wuid=875082).
I suspect that db_purger removes every result that is eligible to be purged: old results like the one you noted, and new ones that have three, but not four, results. So, the developers can either create a new db_purger that purges really old results only, or they can wait a week or two to give everyone plenty of time to report, and then start running it. In the meantime, the developer time can be used for something more generally useful. |
mikey Joined: 17 Dec 99 Posts: 4215 Credit: 3,474,603 RAC: 0 |
> On a related note: How long should the project administrators wait for results to be returned after a disruption in service? The servers have been back up for a week. That should be plenty of time for everyone to return results that were held up due to the downtime. Shouldn't it? (Sincere speculation here.)
The current max time for a cache is 10 days, so anything done before 10 days is premature. After that, anything done before the 2 week deadline would also be premature, just because of the database issues. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
> > On a related note: How long should the project administrators wait for results to be returned after a disruption in service? The servers have been back up for a week. That should be plenty of time for everyone to return results that were held up due to the downtime. Shouldn't it? (Sincere speculation here.)
> The current max time for a cache is 10 days, so anything done before 10 days is premature. After that, anything done before the 2 week deadline would also be premature, just because of the database issues.
I think you mean overdue (late) not premature (early).
... but that isn't exactly true.
A work unit goes out to four machines, and they have two weeks to return results. If three machines return work, fine, all is good. If not, it goes out to more machines, and they have two weeks. No problem. If we still don't have a quorum (three machines with results that match reasonably well), it goes out again, and there is another two weeks. So, now we're at six weeks, and hopefully we've got a quorum. Add a couple of weeks for just plain rough going, and we're at two months.
Now, maybe someone with the project would comment, but I think they'd rather wait a couple of weeks just to give everything a little more time to settle. |
mikey Joined: 17 Dec 99 Posts: 4215 Credit: 3,474,603 RAC: 0 |
> > The current max time for a cache is 10 days, so anything done before 10 days is premature. After that, anything done before the 2 week deadline would also be premature, just because of the database issues.
> I think you mean overdue (late) not premature (early).
> ... but that isn't exactly true.
No, we were talking about a database purge of units not returned, and I was explaining that purging after only 1 week of Berkeley downtime was "premature". |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
> > > The current max time for a cache is 10 days, so anything done before 10 days is premature. After that, anything done before the 2 week deadline would also be premature, just because of the database issues.
> > I think you mean overdue (late) not premature (early).
> > ... but that isn't exactly true.
> No, we were talking about a database purge of units not returned, and I was explaining that purging after only 1 week of Berkeley downtime was "premature".
Ah, sorry. Still, a work unit can be pending for 6 to 8 weeks, easily. |
mikey Joined: 17 Dec 99 Posts: 4215 Credit: 3,474,603 RAC: 0 |
> > No, we were talking about a database purge of units not returned, and I was explaining that purging after only 1 week of Berkeley downtime was "premature".
> Ah, sorry. Still, a work unit can be pending for 6 to 8 weeks, easily.
Actually a unit can be re-issued up to a max of 15 times...that means that, counting the initial time and allowing 2 weeks for each re-issue, it could be 32 weeks before a unit is deemed uncrunchable. AND that assumes that Berkeley is prompt about the re-issue timing. |
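The week counts quoted in the last few posts follow from one piece of arithmetic: every issue round, the initial one included, may run to its two-week deadline. A small sketch, assuming exactly that and nothing more; the function name is made up for illustration, and the re-issue limit is a parameter because the thread does not settle what the real limit is.

```python
DEADLINE_WEEKS = 2  # two-week return deadline quoted in the thread


def worst_case_weeks(reissues: int, deadline_weeks: int = DEADLINE_WEEKS) -> int:
    """Weeks elapsed if the initial issue and every re-issue each run to the deadline."""
    if reissues < 0:
        raise ValueError("reissues must be non-negative")
    return (1 + reissues) * deadline_weeks


print(worst_case_weeks(2))   # initial issue plus two re-issues: 6 weeks
print(worst_case_weeks(15))  # initial issue plus 15 re-issues: 32 weeks
```

Adding "a couple of weeks for rough going" to the six-week case gives the two-month figure mentioned earlier. Delays between a deadline passing and the server actually re-issuing the work are not modelled here.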
Ingleside Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 |
> Actually a unit can be re-issued up to a max of 15 times...that means that, counting the initial time and allowing 2 weeks for each re-issue, it could be 32 weeks before a unit is deemed uncrunchable. AND that assumes that Berkeley is prompt about the re-issue timing.
If the seti-wu-limits haven't changed again since December, the wu will error out before 15 results... |
Bill & Patsy Joined: 6 Apr 01 Posts: 141 Credit: 508,875 RAC: 0 |
Sometime, several months ago (I haven't been able to find the post or I would have quoted it), someone on the Berkeley staff explained that they had changed the delete protocol in response to complaints that results were disappearing too fast. The protocol then, and I presume it is still in effect today, was to wait two weeks (as I recall) after the last posting activity on a WU _after_ it became eligible for purging. Thus, for example, if a quorum of three was achieved on the very first day and the fourth result was returned on the very last day, the fourth result would be "activity" that would reset the timer and the delay would be extended another two weeks. If it took two months to achieve a quorum, the timer wouldn't even start until the quorum was eventually achieved. That would seem to address all the concerns that I think I've seen mentioned in this thread. Now back to crunching... --Bill |
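The timer behaviour described in this post can be sketched as follows. This is only one reading of a protocol recalled from memory, not the project's actual purge logic; the function and parameter names are hypothetical, and the delay is a parameter because the thread gives both one week and two weeks for it.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def purge_time(
    eligible_at: Optional[datetime],
    activity: Iterable[datetime],
    delay: timedelta = timedelta(weeks=2),
) -> Optional[datetime]:
    """Earliest time a WU could be purged under the recalled protocol.

    eligible_at: when the WU became eligible for purging (None if no quorum yet).
    activity:    timestamps of result postings on the WU.
    delay:       waiting period after the last relevant activity.
    """
    if eligible_at is None:
        return None  # the timer has not started
    # Only postings after eligibility reset the timer.
    later = [t for t in activity if t > eligible_at]
    return max([eligible_at, *later]) + delay
```

For the example in the post, a quorum on 1 January and a fourth result on 14 January would put the earliest purge at 28 January with a two-week delay, or 21 January with a one-week delay.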
Ingleside Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 |
This was changed to one week, as already mentioned earlier in this thread, but db_purge isn't running constantly. |