Too late to validate?


Gatekeeper
Joined: 14 Jul 04
Posts: 757
Credit: 144,177,526
RAC: 121,280
United States
Message 1240992 - Posted: 4 Jun 2012, 2:59:48 UTC
Last modified: 4 Jun 2012, 3:00:37 UTC

Here is my list of "invalids" for my main cruncher: http://setiathome.berkeley.edu/results.php?hostid=6371091&offset=0&show_names=0&state=4&appid=

Note they all show as "completed too late to validate", but had a less than 2 day turnaround. What gets me is that all of them were sent to me as the third system, even though the first two rigs had completed and returned their results. Anybody have a thought as to what's going on here?
____________

rob smith
Volunteer moderator
Joined: 7 Mar 03
Posts: 5572
Credit: 22,629,645
RAC: 43,723
United Kingdom
Message 1241025 - Posted: 4 Jun 2012, 6:58:35 UTC

This happens periodically. Most often after a big outage we see clusters of WU coming out with impossibly short deadlines.

In this case they were initially sent out just before the outage to two other users, who processed them during the outage. Just after the outage, before those two users had reported, the server decided to send them out again, but with impossibly short deadlines.

I suspect we are going to see lots of these in the next few days :-(
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Gatekeeper
Joined: 14 Jul 04
Posts: 757
Credit: 144,177,526
RAC: 121,280
United States
Message 1241026 - Posted: 4 Jun 2012, 7:05:48 UTC - in response to Message 1241025.

This happens periodically. Most often after a big outage we see clusters of WU coming out with impossibly short deadlines.

In this case they were initially sent out just before the outage to two other users, who processed them during the outage. Just after the outage, before those two users had reported, the server decided to send them out again, but with impossibly short deadlines.

I suspect we are going to see lots of these in the next few days :-(


Rob,

I understand what you're saying, as it happens with VLARs being re-sent on a GPU work request. But these were sent to me as user #3 AFTER users 1 & 2 had reported them. And I don't think they were sent to me with short deadlines; I returned them completed in 2 days (so I don't know what the project deadline for them was). I'm guessing that they somehow got scheduled for resend even though they had been properly reported, due to some timing issue between the validators and the scheduler, but I was looking for someone to confirm my theory.
____________

Link
Joined: 18 Sep 03
Posts: 711
Credit: 1,220,607
RAC: 1,128
Germany
Message 1241028 - Posted: 4 Jun 2012, 7:13:13 UTC - in response to Message 1240992.

Note they all show as "completed too late to validate", but had a less than 2 day turnaround.

Well, they were returned about 5 minutes after the deadline and after both your wingmen returned their results, so technically that is correct.


What gets me is that all of them were sent to me as the third system, even though the first two rigs had completed and returned their results.

No, they were always sent to you before the 2nd result was returned. So that was right as well.

The real question is, why did those WUs have such short deadlines? Server bug?
____________
.

Link
Joined: 18 Sep 03
Posts: 711
Credit: 1,220,607
RAC: 1,128
Germany
Message 1241030 - Posted: 4 Jun 2012, 7:16:08 UTC - in response to Message 1241026.

And I don't think they were sent to me with short deadlines; I returned them completed in 2 days (so I don't know what the project deadline for them was).

You can see the deadline in the task details, for example:
Name: 12mr10aa.19911.24347.3.10.48_2
Workunit: 999039288
Created: 1 Jun 2012, 5:16:22 UTC
Sent: 1 Jun 2012, 8:05:14 UTC
Received: 3 Jun 2012, 17:29:43 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 6371091
Report deadline: 3 Jun 2012, 17:24:30 UTC
Run time: 744.77
CPU time: 107.31
Validate state: Task was reported too late to validate
Credit: 0.00
Application version: SETI@home Enhanced, Anonymous platform (NVIDIA GPU)
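
For anyone who wants to check the arithmetic, here is a small Python sketch that works out the intervals from the three timestamps in the task details above. The variable names are made up for illustration; only the dates and times come from the task page.

```python
from datetime import datetime, timezone

# Timestamps copied from the task details above (all UTC).
sent = datetime(2012, 6, 1, 8, 5, 14, tzinfo=timezone.utc)
received = datetime(2012, 6, 3, 17, 29, 43, tzinfo=timezone.utc)
deadline = datetime(2012, 6, 3, 17, 24, 30, tzinfo=timezone.utc)

turnaround = received - sent     # how long the host held the task
allowed = deadline - sent        # how long the recorded deadline allowed
lateness = received - deadline   # positive means reported after the deadline

print("turnaround:", turnaround)
print("allowed:   ", allowed)
print("late by:   ", lateness)
```

With these values the task comes out 5 minutes 13 seconds past the recorded deadline, after a turnaround of a little under two and a half days.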

____________
.

Gatekeeper
Joined: 14 Jul 04
Posts: 757
Credit: 144,177,526
RAC: 121,280
United States
Message 1241033 - Posted: 4 Jun 2012, 7:32:22 UTC - in response to Message 1241030.
Last modified: 4 Jun 2012, 7:43:38 UTC

And I don't think they were sent to me with short deadlines; I returned them completed in 2 days (so I don't know what the project deadline for them was).

You can see the deadline in the task details, for example:
Name: 12mr10aa.19911.24347.3.10.48_2
Workunit: 999039288
Created: 1 Jun 2012, 5:16:22 UTC
Sent: 1 Jun 2012, 8:05:14 UTC
Received: 3 Jun 2012, 17:29:43 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 6371091
Report deadline: 3 Jun 2012, 17:24:30 UTC
Run time: 744.77
CPU time: 107.31
Validate state: Task was reported too late to validate
Credit: 0.00
Application version: SETI@home Enhanced, Anonymous platform (NVIDIA GPU)


OK, it's late here, and I'm tired, and these will all probably be deleted by tomorrow morning my time, but there are a lot of timing issues about these I don't understand. It's not the credits; it's my not understanding how these came about in the first place. Hopefully, we won't, as Rob suggested, be seeing a lot of these.

EDIT: For example, it's curious that the project "deadline" is the same in all 8 workunits, and is almost exactly 5 minutes before I returned the workunits. It's as if when I returned them, S@H said "oops, we don't need these, let's change the deadline and make them invalid".
____________

Lionel
Joined: 25 Mar 00
Posts: 464
Credit: 152,563,123
RAC: 120,872
Australia
Message 1241035 - Posted: 4 Jun 2012, 7:45:56 UTC - in response to Message 1241025.

This happens periodically. Most often after a big outage we see clusters of WU coming out with impossibly short deadlines.

In this case they were initially sent out just before the outage to two other users, who processed them during the outage. Just after the outage, before those two users had reported, the server decided to send them out again, but with impossibly short deadlines.

I suspect we are going to see lots of these in the next few days :-(


yep, had 250+ of these the other day ... good to see that others are getting the good news as well ...

____________

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 3969
Credit: 937,596
RAC: 154
United States
Message 1241086 - Posted: 4 Jun 2012, 14:47:11 UTC - in response to Message 1241033.

And I don't think they were sent to me with short deadlines; I returned them completed in 2 days (so I don't know what the project deadline for them was).

You can see the deadline in the task details, for example:
Name: 12mr10aa.19911.24347.3.10.48_2
Workunit: 999039288
Created: 1 Jun 2012, 5:16:22 UTC
Sent: 1 Jun 2012, 8:05:14 UTC
Received: 3 Jun 2012, 17:29:43 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 6371091
Report deadline: 3 Jun 2012, 17:24:30 UTC
Run time: 744.77
CPU time: 107.31
Validate state: Task was reported too late to validate
Credit: 0.00
Application version: SETI@home Enhanced, Anonymous platform (NVIDIA GPU)


OK, it's late here, and I'm tired, and these will all probably be deleted by tomorrow morning my time, but there are a lot of timing issues about these I don't understand. It's not the credits; it's my not understanding how these came about in the first place. Hopefully, we won't, as Rob suggested, be seeing a lot of these.

EDIT: For example, it's curious that the project "deadline" is the same in all 8 workunits, and is almost exactly 5 minutes before I returned the workunits. It's as if when I returned them, S@H said "oops, we don't need these, let's change the deadline and make them invalid".

Yes, it's another side effect of the server mod to only accept 64 at a time. At 17:24:30 UTC your host reported more than 64, so the excess became subject to the resend lost tasks logic. When that found some WUs already had a canonical result it expired them immediately (set the deadline to 'now') rather than resending them. Then on the next attempt to report them at 17:29:43 UTC they were seen as too late.
Joe
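
The sequence Joe describes can be sketched as a toy model in Python. This is not BOINC server code: the `Task` class, `handle_request`, `simulate`, the 72-task batch and the integer clock are all assumptions made up for illustration. Only the 64-report limit, the "expire by setting the deadline to now" step and the two report times come from the post above.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_REPORTS_PER_REQUEST = 64  # the per-request limit mentioned in the thread


@dataclass
class Task:
    name: str
    deadline: int                # seconds on an arbitrary clock
    wu_has_canonical: bool       # workunit already has a canonical result
    state: str = "in_progress"   # in_progress | reported | too_late


def handle_request(tasks: Dict[str, Task], reported: List[str], now: int) -> None:
    """Hypothetical model of one scheduler request; not actual BOINC code."""
    accepted = reported[:MAX_REPORTS_PER_REQUEST]
    excess = reported[MAX_REPORTS_PER_REQUEST:]

    for name in accepted:
        task = tasks[name]
        if task.state == "in_progress":
            task.state = "reported" if now <= task.deadline else "too_late"

    # Reports beyond the limit are not processed, so those tasks look "lost".
    for name in excess:
        task = tasks[name]
        if task.state == "in_progress" and task.wu_has_canonical:
            task.deadline = now  # expired immediately instead of being resent
        # Otherwise the task would be resent to the host (not modelled here).


def simulate() -> Dict[str, Task]:
    far_future = 10**9
    tasks = {f"t{i}": Task(f"t{i}", far_future, wu_has_canonical=True)
             for i in range(72)}
    names = list(tasks)
    handle_request(tasks, names, now=0)        # first report, 17:24:30 UTC
    pending = [n for n in names if tasks[n].state == "in_progress"]
    handle_request(tasks, pending, now=313)    # retry at 17:29:43 UTC
    return tasks
```

Running `simulate()` leaves 64 tasks reported and the 8 excess tasks marked too late, each with a deadline equal to the time of the first request, which is the pattern seen in the task details.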

Link
Joined: 18 Sep 03
Posts: 711
Credit: 1,220,607
RAC: 1,128
Germany
Message 1241155 - Posted: 4 Jun 2012, 16:52:02 UTC - in response to Message 1241086.

You mean basically every user is now forced to set the limit to max 64 tasks per report in his cc_config.xml, otherwise there's a risk of losing credits? Not that I'm going to run into such issues anytime soon, just curious...

However, that still does not explain why some of the _0 and _1 results for these WUs had such short deadlines.
____________
.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 34711
Credit: 354,904,884
RAC: 377,523
United States
Message 1241167 - Posted: 4 Jun 2012, 17:03:21 UTC - in response to Message 1241086.


Yes, it's another side effect of the server mod to only accept 64 at a time. At 17:24:30 UTC your host reported more than 64, so the excess became subject to the resend lost tasks logic. When that found some WUs already had a canonical result it expired them immediately (set the deadline to 'now') rather than resending them. Then on the next attempt to report them at 17:29:43 UTC they were seen as too late.
Joe

Do we have any word if this wonderful kluge may be rescinded or fixed during tomorrow's outage?
____________
******

"Ask not, what your kitty can do for you. Ask what you can do for your kitty."

As it is kitten, so shall it be done.



Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 3969
Credit: 937,596
RAC: 154
United States
Message 1241246 - Posted: 4 Jun 2012, 18:58:51 UTC - in response to Message 1241155.

You mean basically every user is now forced to set the limit to max 64 tasks per report in his cc_config.xml, otherwise there's a risk of losing credits? Not that I'm going to run into such issues anytime soon, just curious...

Yes, probably any host with RAC of 5000 or above ought to be using that safety measure.

However, that still does not explain why some of the _0 and _1 results for these WUs had such short deadlines.

I assume it was the same 64 limit causing the tasks to be resent, but didn't look at those details while the WUs were unpurged.
Joe

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 2762
Credit: 8,386,512
RAC: 24,418
Sweden
Message 1241257 - Posted: 4 Jun 2012, 19:25:13 UTC - in response to Message 1241155.
Last modified: 4 Jun 2012, 19:27:32 UTC

You mean basically every user is now forced to set the limit to max 64 tasks per report in his cc_config.xml, otherwise there's a risk of losing credits? Not that I'm going to run into such issues anytime soon, just curious...


But since my versions of BOINC do not support setting the limits in cc_config, and the likelihood of me upgrading any of my clients is close to zero, my hope is that they fix the issue at the source.

However, since I run main and Beta at 50% each, the risk that a couple of days' outage would put me over 64 tasks to report on one project from any computer is really very low.
____________
/The grumpy old Swede.

"I'm so old, that 98% of all trees in the forest, are younger than I am"

BilBg
Joined: 27 May 07
Posts: 2089
Credit: 3,270,545
RAC: 8,195
Bulgaria
Message 1241260 - Posted: 4 Jun 2012, 19:29:42 UTC - in response to Message 1241257.


I think if you set NNT (No New Tasks) until everything is reported, the server bug won't send false resends to you?


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3297
Credit: 78,469,887
RAC: 59,468
United States
Message 1241266 - Posted: 4 Jun 2012, 19:39:22 UTC - in response to Message 1241260.


I think if you set NNT (No New Tasks) until everything is reported, the server bug won't send false resends to you?


Resends are still sent even when NNT is set.

In related news I have had <max_tasks_reported>100</max_tasks_reported> set on my faster machines for some time. That seems to keep them happy when a large number of tasks build up.
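
For anyone who has not used that option, here is a minimal Python sketch that writes out a cc_config.xml fragment containing it. It assumes the tag belongs in the `<options>` section of cc_config.xml and that the client is new enough to honour it; the helper name `build_cc_config` and the default of 64 (the limit discussed earlier in the thread) are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_tasks_reported: int = 64) -> str:
    """Return a minimal cc_config.xml body (illustrative helper, not a BOINC tool)."""
    root = ET.Element("cc_config")
    # Assumption: <max_tasks_reported> is read from the <options> section.
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_tasks_reported").text = str(max_tasks_reported)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config())
```

If you already have a cc_config.xml, merge the tag into your existing `<options>` block rather than overwriting the file, and have the client re-read its config files afterwards.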
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

BilBg
Joined: 27 May 07
Posts: 2089
Credit: 3,270,545
RAC: 8,195
Bulgaria
Message 1241271 - Posted: 4 Jun 2012, 19:48:28 UTC - in response to Message 1241266.


I think if you set NNT (No New Tasks) until everything is reported, the server bug won't send false resends to you?

Resends are still sent even when NNT is set.

Don't you have to actually ask for work like in:
02-Jun-2012 21:58:25 [SETI@home] Reporting 1 completed tasks, requesting new tasks for CPU and GPU
... for the Resends logic to kick in?


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 3507
Credit: 26,359,603
RAC: 16,998
United Kingdom
Message 1241273 - Posted: 4 Jun 2012, 19:49:28 UTC - in response to Message 1241266.
Last modified: 4 Jun 2012, 19:50:15 UTC


I think if you set NNT (No New Tasks) until everything is reported, the server bug won't send false resends to you?


Resends are still sent even when NNT is set.

No they aren't, not at this project anyway. At Einstein and other projects with older schedulers, yes, resends are sent with NNT set.

Changeset [21823]

• scheduler: don't resend work if client isn't requesting work


Claggy
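
The behaviour named in that changeset title can be sketched roughly as follows. This is an illustration in Python, not the actual scheduler code: the function names are invented, and treating a "lost" task as one the server believes is on the host but the host did not list is an assumption.

```python
from typing import Iterable, List, Set


def find_lost_tasks(server_in_progress: Set[str],
                    client_listed: Iterable[str]) -> List[str]:
    """Tasks the server thinks the host has, but the host did not mention."""
    return sorted(server_in_progress - set(client_listed))


def resend_candidates(server_in_progress: Set[str],
                      client_listed: Iterable[str],
                      requesting_work: bool) -> List[str]:
    """Skip the resend step entirely when the client asks for no work (e.g. NNT)."""
    if not requesting_work:
        return []
    return find_lost_tasks(server_in_progress, client_listed)
```

Under this sketch, a host with NNT set sends requests with no work asked for, so nothing is resent regardless of how many tasks look lost.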

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3297
Credit: 78,469,887
RAC: 59,468
United States
Message 1241281 - Posted: 4 Jun 2012, 19:58:50 UTC - in response to Message 1241273.


I think if you set NNT (No New Tasks) until everything is reported, the server bug won't send false resends to you?


Resends are still sent even when NNT is set.

No they aren't, not at this project anyway. At Einstein and other projects with older schedulers, yes, resends are sent with NNT set.

Changeset [21823]

• scheduler: don't resend work if client isn't requesting work


Claggy

Ah OK. I must have seen that with an older client version I was running then. By the date of that changeset, it looks like 6.10.58 and newer should have that change. I was probably using 6.10.45 or something when that occurred.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 3507
Credit: 26,359,603
RAC: 16,998
United Kingdom
Message 1241291 - Posted: 4 Jun 2012, 20:09:00 UTC - in response to Message 1241281.
Last modified: 4 Jun 2012, 20:27:37 UTC


I think if you set NNT (No New Tasks) until everything is reported, the server bug won't send false resends to you?


Resends are still sent even when NNT is set.

No they aren't, not at this project anyway. At Einstein and other projects with older schedulers, yes, resends are sent with NNT set.

Changeset [21823]

• scheduler: don't resend work if client isn't requesting work


Claggy

Ah OK. I must have seen that with an older client version I was running then. By the date of that changeset, it looks like 6.10.58 and newer should have that change. I was probably using 6.10.45 or something when that occurred.

That's the scheduler on the server, i.e. on synergy, not the scheduler in the client.
Older versions of BOINC (pre 6.10.x) used to still ask for work even if the preferences were set to not use a resource (it was a server-side preference then).
BOINC 6.10.x and later use different preferences (I think they were combined on the website later) that stop the client from even asking for work.

Claggy

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3297
Credit: 78,469,887
RAC: 59,468
United States
Message 1241376 - Posted: 4 Jun 2012, 22:33:18 UTC - in response to Message 1241291.


I think if you set NNT (No New Tasks) until everything is reported, the server bug won't send false resends to you?


Resends are still sent even when NNT is set.

No they aren't, not at this project anyway. At Einstein and other projects with older schedulers, yes, resends are sent with NNT set.

Changeset [21823]

• scheduler: don't resend work if client isn't requesting work


Claggy

Ah OK. I must have seen that with an older client version I was running then. By the date of that changeset, it looks like 6.10.58 and newer should have that change. I was probably using 6.10.45 or something when that occurred.

That's the scheduler on the server, i.e. on synergy, not the scheduler in the client.
Older versions of BOINC (pre 6.10.x) used to still ask for work even if the preferences were set to not use a resource (it was a server-side preference then).
BOINC 6.10.x and later use different preferences (I think they were combined on the website later) that stop the client from even asking for work.

Claggy

It seems like it was only a few months ago when I had set NNT on a machine and then proceeded to get numerous resends. Perhaps it was just much longer ago than it seems.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Lionel
Joined: 25 Mar 00
Posts: 464
Credit: 152,563,123
RAC: 120,872
Australia
Message 1241624 - Posted: 5 Jun 2012, 9:43:38 UTC - in response to Message 1241257.

You mean basically every user is now forced to set the limit to max 64 tasks per report in his cc_config.xml, otherwise there's a risk of losing credits? Not that I'm going to run into such issues anytime soon, just curious...


But since my versions of BOINC do not support setting the limits in cc_config, and the likelihood of me upgrading any of my clients is close to zero, my hope is that they fix the issue at the source.

my thoughts exactly Sten ...

____________


Copyright © 2013 University of California