Panic Mode On (94) Server Problems?

Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1623780 - Posted: 5 Jan 2015, 18:06:38 UTC - in response to Message 1623773.  

I don't know about you, but this is the most APs my computers have seen in over 2 months.

One has 118, the other 112.

Finally had to turn the heater down as all the GPUs are now revved up!!
Lionel
Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1623950 - Posted: 5 Jan 2015, 21:19:02 UTC - in response to Message 1623780.  
Last modified: 5 Jan 2015, 21:19:20 UTC

Yes, I get that message too, even though I have only 4 CPU APs in the queue when the norm is 100. It raises an interesting question: how can you be at a limit when you are 96% below the maximum value?
JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1623995 - Posted: 5 Jan 2015, 22:03:14 UTC

Almost back to pre-Bruno-crash RAC.........Keep 'em coming!

"Sour Grapes make a bitter Whine." <(0)>
Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 1624019 - Posted: 5 Jan 2015, 22:31:48 UTC

The AP splitters have started running on Beta again.
Lionel
Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1624135 - Posted: 6 Jan 2015, 2:01:01 UTC - in response to Message 1623958.  
Last modified: 6 Jan 2015, 2:12:31 UTC

@Sten-Arne: I'm not worried about it at all.

The only thing that annoys me is the 4+ hours I have to sit in front of the puters clicking the update button to get to 100 CPU tasks. After getting to 100 CPU tasks, the caches glide down to circa 5 WUs again.

It's just poor software.
Dena Wiltsie
Volunteer tester
Joined: 19 Apr 01
Posts: 1628
Credit: 24,230,968
RAC: 26
United States
Message 1624141 - Posted: 6 Jan 2015, 2:31:24 UTC - in response to Message 1624135.  

@Sten-Arne: I'm not worried about it at all.

The only thing that annoys me is the 4+ hours I have to sit in front of the puters clicking the update button to get to 100 CPU tasks. After getting to 100 CPU tasks, the caches glide down to circa 5 WUs again.

It's just poor software.

I am running the current release, and after it picked up the first few AP units it filled my queue to 100 work units each and has maintained that level. I gave up and decided to maintain a 2-day queue, which works out to 100 work units each.
Lionel
Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1624164 - Posted: 6 Jan 2015, 3:46:15 UTC - in response to Message 1624141.  

I don't have a problem maintaining work in the cache for GPU WUs; only CPU WUs are an issue, as the cache does not really get re-stocked as they are completed.

[side issue] I would like to see the limits raised. I know there are people out there with slower machines, but 100 MB CPU WUs is about 1 day's worth of work. 100 AP CPU WUs is about 3.5 days' worth of work (if you can get 100 WUs for your CPU, that is). 200 MB GPU WUs is about 8 hours of work (based on 10-minute completion times; I appreciate shorties take less than 6 minutes), and 200 AP GPU WUs is about 40 hours of work (based on circa 50-minute completion times).

The other thing that I would like to see is the 8-week deadline put on a slow glide path down to, say, 2 weeks. It was set at 8 weeks some 14+ years ago, when processors were much, much slower. I think it's time for a review here.

Probably got an ice cream's chance in hell of any of the above.
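
The arithmetic above is easy to check. A minimal sketch in Python: the per-WU runtimes and cache limits come from the post, but the four concurrent GPU tasks are an assumption (a multi-GPU rig) needed to make the 8-hour figure come out:

# Rough cache-duration arithmetic for the limits discussed above.
# Per-WU runtimes and cache limits are from the post; the concurrency
# figure is an assumed multi-GPU setup, not a measurement.
def cache_hours(wu_limit, minutes_per_wu, concurrent_tasks=1):
    """Hours of work represented by a full cache of wu_limit tasks."""
    return wu_limit * minutes_per_wu / 60.0 / concurrent_tasks

print(cache_hours(200, 10, 4))   # MB GPU: ~8.3 hours
print(cache_hours(200, 50, 4))   # AP GPU: ~41.7 hours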
OTS
Volunteer tester
Joined: 6 Jan 08
Posts: 369
Credit: 20,533,537
RAC: 0
United States
Message 1624181 - Posted: 6 Jan 2015, 5:07:34 UTC - in response to Message 1624164.  

It looks like there might be less AP work being issued again. If I am reading the SSP correctly, there are once again only three splitters working on a single file. For a short time my CPU cache was holding steady at the maximum of 100, but now the total in progress is slowly going downhill.
JanniCash
Joined: 17 Nov 03
Posts: 57
Credit: 1,276,920
RAC: 0
United States
Message 1624207 - Posted: 6 Jan 2015, 7:31:27 UTC - in response to Message 1624164.  

[side issue] I would like to see the limits raised. I know there are people out there with slower machines, but 100 MB CPU WUs is about 1 day's worth of work.

Not sure I understand your problem 100%, so ignore from here if my comment doesn't compute.

If all you want is to download more WUs in advance, to bridge a server outage for example, why not create multiple virtual machines with a total number of vCPUs equal to your physical cores (or threads)? With VT-x (or similar), virtualization has almost zero performance loss these days, and each VM would appear as a separate BOINC client maintaining its own cache.
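
A minimal sketch of scripting that setup against VirtualBox's command-line interface; the VM names, count, and sizes are illustrative assumptions, and installing an OS plus a BOINC client inside each VM is left out:

# Hypothetical helper: register several small VirtualBox VMs, each meant
# to run its own BOINC client (and therefore keep its own cache).
# Names and sizes are illustrative only; see the VBoxManage docs.
import subprocess

def make_boinc_vms(count, cpus_per_vm=1, mem_mb=1024):
    for i in range(count):
        name = "boinc%d" % i  # hypothetical naming scheme
        subprocess.check_call(["VBoxManage", "createvm",
                               "--name", name, "--register"])
        subprocess.check_call(["VBoxManage", "modifyvm", name,
                               "--cpus", str(cpus_per_vm),
                               "--memory", str(mem_mb)])

# make_boinc_vms(4)  # e.g. four single-vCPU VMs on a quad-core host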
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1624395 - Posted: 6 Jan 2015, 14:37:48 UTC - in response to Message 1623995.  

Almost back to pre-Bruno-crash RAC.........Keep 'em coming!

I'm not. Last summer I was around 15K. I've only recovered about half of my drop.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1624398 - Posted: 6 Jan 2015, 14:41:00 UTC

I'm surprised no one has posted PANIC!!! about the MB RTS being so low this morning.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1624434 - Posted: 6 Jan 2015, 15:44:33 UTC - in response to Message 1622882.  

Here's my run-through of my job_log:

The first one's Unix timestamp is Aug 08, 2012:
1344411405.758627 ue 40443.513279 ct 44801.180000 fe 100366914056685.200000 nm ap_06my12ab_B6_P1_00397_20120728_06219.wu_0
1348411651.444180 ue 41462.286580 ct 41549.820000 fe 99312357715860.406000 nm ap_28jn12ab_B1_P1_00327_20120913_22283.wu_1
1353416891 ue 41154.808134 ct 44612.900000 fe 100748165223275 nm ap_27au12ab_B4_P1_00033_20121109_21887.wu_2 et 44755.875742
1354547709 ue 43096.733593 ct 41306.480000 fe 103747720196172 nm ap_27au12ab_B3_P0_00383_20121109_08471.wu_3 et 41408.402161
1354823446 ue 42485.311151 ct 36223.400000 fe 103782242333548 nm ap_01se12ab_B2_P1_00034_20121106_10784.wu_3 et 36360.880091
1365786085 ue 42041.303007 ct 43234.320000 fe 101221168606141 nm ap_30jn12ad_B0_P0_00323_20130331_14381.wu_2 et 43327.372242
1367166402 ue 41867.606117 ct 44128.500000 fe 102913816340793 nm ap_29ja13aa_B4_P0_00011_20130418_18429.wu_0 et 44210.891695
1367705446 ue 42201.410116 ct 44351.470000 fe 103734332406452 nm ap_26fe13ae_B6_P0_00170_20130424_29841.wu_0 et 44437.584740
1371698309 ue 43973.768278 ct 35040.840000 fe 107073863886764 nm ap_24no12aa_B3_P0_00177_20130610_00904.wu_0 et 35083.251799
1371782925 ue 43491.213005 ct 37562.650000 fe 105898866616610 nm ap_03ja12ai_B5_P0_00025_20130611_18331.wu_0 et 37606.444761

Last one's timestamp is June 21, 2013.

If you (or anybody else) feel you have *comprehensive* job logs with tasks from all AP tapes, I could set up a parallel database and compare them with my MB records.

With help from Cosmic_Ocean, I've been able to assemble a data distribution history for AP with records of 3,051 tapes split. But that's still a long way short of the 5,544 I know have been split for MB (as at 31 Dec); the difference is far more likely to be due to our incomplete job logs than to a vast stash of unprocessed tapes. So, if anyone would actually like to know how we're getting along, I'd need help with logs from some of you assiduous AP-hounds out there, please.
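
For anyone offering logs, a job_log in the format quoted above reduces easily to a set of tape names. A minimal Python sketch; the field layout is inferred from the sample lines, and the tape-name pattern (e.g. 06my12ab out of ap_06my12ab_B6_...) is an assumption:

# Minimal sketch: pull the set of AP tape names out of a BOINC job_log.
# Each line carries key/value fields; the workunit name follows "nm".
# The ap_<tape>_B... pattern is inferred from the samples above.
import re
import sys

TAPE_RE = re.compile(r"\bnm\s+ap_([0-9a-z]+)_B")

def tapes_in_log(path):
    tapes = set()
    with open(path) as log:
        for line in log:
            match = TAPE_RE.search(line)
            if match:
                tapes.add(match.group(1))
    return tapes

if __name__ == "__main__":
    for tape in sorted(tapes_in_log(sys.argv[1])):
        print(tape)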
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1624441 - Posted: 6 Jan 2015, 16:02:10 UTC - in response to Message 1624398.  

I'm surprised no one has posted PANIC!!! about the MB RTS being so low this morning.

The tapes which have been added to the splitter queue today and over the weekend have all been split for MB before, and it looks as if re-splitting for MB has been inhibited this time - the tapes are skimmed through very quickly without any new work appearing, as was happening for AP last week.
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1624448 - Posted: 6 Jan 2015, 16:18:22 UTC - in response to Message 1624441.  
Last modified: 6 Jan 2015, 16:34:43 UTC

Yup, was my computer. Sorry. Forget what I said. Bad driver.....
betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1624556 - Posted: 6 Jan 2015, 23:11:19 UTC - in response to Message 1624508.  

One might wonder why they still haven't started the AP assimilators. There are as of now 477,099 AP workunits waiting for assimilation.

It's been several Tuesday outages since the DB was fixed, but still no assimilation of AP WUs. Maybe the DB isn't working as it should yet, and until it does, they will not allow the WUs to be assimilated.

I was thinking this was part of a stress test for the new database.
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1624664 - Posted: 7 Jan 2015, 1:25:17 UTC - in response to Message 1624556.  

One might wonder why they still haven't started the AP assimilators. There are as of now 477,099 AP workunits waiting for assimilation.

It's been several Tuesday outages since the DB was fixed, but still no assimilation of AP WUs. Maybe the DB isn't working as it should yet, and until it does, they will not allow the WUs to be assimilated.

I was thinking this was part of a stress test for the new database.

Until the assimilators start, no data is going into the AP science DB.
sah_assimilator/ap_assimilator: takes scientific data from validated results and puts it in the SETI@home (or Astropulse) database for later analysis.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
Dena Wiltsie
Volunteer tester
Joined: 19 Apr 01
Posts: 1628
Credit: 24,230,968
RAC: 26
United States
Message 1624726 - Posted: 7 Jan 2015, 4:49:23 UTC - in response to Message 1624164.  

I don't have a problem maintaining work in the cache for GPU WUs; only CPU WUs are an issue, as the cache does not really get re-stocked as they are completed.

[side issue] I would like to see the limits raised. I know there are people out there with slower machines, but 100 MB CPU WUs is about 1 day's worth of work. 100 AP CPU WUs is about 3.5 days' worth of work (if you can get 100 WUs for your CPU, that is). 200 MB GPU WUs is about 8 hours of work (based on 10-minute completion times; I appreciate shorties take less than 6 minutes), and 200 AP GPU WUs is about 40 hours of work (based on circa 50-minute completion times).

The other thing that I would like to see is the 8-week deadline put on a slow glide path down to, say, 2 weeks. It was set at 8 weeks some 14+ years ago, when processors were much, much slower. I think it's time for a review here.

Probably got an ice cream's chance in hell of any of the above.

After the outage today I burned off a fair amount of work, and one work request returned 34 MB work units. I am still a bit low on AP work, but I suspect I will get more after some of the other people get theirs.
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1624758 - Posted: 7 Jan 2015, 6:37:33 UTC

I had an interesting occurrence on my slow Sempron machine.

It was crunching away on an AP, and at that time it had not had 11 "completed" APs yet, so the estimates were still astronomical (pun sort-of intended), like ~230 hours. Once it got to about 75% complete, the remaining time was low enough for the 2.5+1-day cache to ask for more work. By this time the 11th "completed task" had happened, so the estimates that came with new tasks were much more realistic. Sort of.

New tasks were coming in with an estimate of 31 hours, when it normally takes ~46 for a <10%-blanked AP. Then it got a few more APs, and then the one that was running finished. I thought that, since the duration of that one was more than 10% off the estimate of the others, the others would change their estimates to match what it actually took. But I was wrong: the estimates for all the ones in the cache at that time dropped to 18 hours... and then work fetch happened and got more.

I realized this and set it to NNT, then counted how many tasks that machine had and multiplied by 48 hours: I shouldn't run past the deadline on any of them. But then a thought occurred: there's a newfangled error these days that bails on a WU when it runs for more than 2x the estimated duration. So, to avoid basically losing everything after crunching most of the way through them first, it was time for some client_state editing.

I suspended network comms in BOINC, ran 'net stop boinc', made a copy of the data directory, then opened up client_state and added a 0 to the left of the decimal in the flops_est fields. Save, 'net start boinc', and check: now they all showed 188 hours, and of course they were running in high-priority mode. I didn't want crazy things to happen, so I did 'net stop boinc' again, opened client_state back up, and decided to cut the values down by approximately a third. The high-order digits were 28, so I changed them to 10. Save, 'net start boinc', and now they're 68 hours.
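
That hand edit is easy to script. A minimal sketch, assuming (as the post does) that the estimates live in flops_est elements of client_state.xml; in some client versions the element is rsc_fpops_est, so adjust FIELD to match your file, and only run this while the client is stopped:

# Hypothetical helper mirroring the hand edit above: scale every
# flops-estimate element in client_state.xml by a constant factor.
# FIELD is an assumption; check your own client_state.xml first.
import re
import shutil

FIELD = "flops_est"       # may be "rsc_fpops_est" in your client version
FACTOR = 10.0 / 28.0      # e.g. turn leading digits "28" into roughly "10"

def scale_estimates(path="client_state.xml"):
    shutil.copyfile(path, path + ".bak")        # safety copy first
    pattern = re.compile(r"<{0}>([-+0-9.eE]+)</{0}>".format(FIELD))
    with open(path) as f:
        text = f.read()
    def repl(match):
        scaled = float(match.group(1)) * FACTOR
        return "<{0}>{1:.6e}</{0}>".format(FIELD, scaled)
    with open(path, "w") as f:
        f.write(pattern.sub(repl, text))

# scale_estimates()  # run only while the BOINC client is stopped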

Close enough. It should be able to smooth itself out from here; of course, I'll keep an eye on it. I have already set that machine back to an MB+AP venue (I had it on AP-only just to try to get at least the 11th "completed task" done), so hopefully it doesn't gobble up too many more of those APs from the rest of you. I know some of you just cringe at the fact that your GPU does them in 30 minutes and then the wingmate ends up taking 47 hours with the Lunatics CPU app.

That little machine has been pretty good to me over the years. It's been crunching for just over 7 years and is edging closer to 1M credits.

I just wanted to share that weird scenario though. I'm pretty sure if I had set it to NNT and only gotten one AP assigned at a time, the whole debacle would have been avoided.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
Lionel
Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1624777 - Posted: 7 Jan 2015, 7:05:50 UTC - in response to Message 1624758.  

Still not getting any AP CPU WUs (and I have 1 box with none at the moment)...
JohnDK Crowdfunding Project Donor, Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1624847 - Posted: 7 Jan 2015, 12:41:16 UTC

Seems we're back to problems getting work. Before Tuesday's outage I had a max GPU cache for the first time in about 2 months; now I'm down to half on the GPUs and it continues to go down.