INFORMATION PLEASE, PRETTY PLEASE and PLEASE AGAIN
Author | Message |
---|---|
Nick Cole Joined: 27 May 99 Posts: 97 Credit: 3,806 RAC: 0 |
Why can't the websites (inc Classic) be kept up to date with information about what is happening? Such a simple thing to avoid annoying people and generating so many questions that appear to go unnoticed by the project team. Such as a time when something is going down, so that we can ensure that we have downloaded cacheable work, or uploaded that completed. What is going to happen over the weekend, starting soon at Californian time? We are the proverbial mushrooms...kept in the dark and fed on s**t. |
JAF Joined: 9 Aug 00 Posts: 289 Credit: 168,721 RAC: 0 |
> Why can't the websites (inc Classic) be kept up to date with information about > what is happening? > > Such a simple thing to avoid annoying people and generating so many questions > that appear to go unnoticed by the project team. > > Such as a time when something is going down, so that we can ensure that we > have downloaded cacheable work, or uploaded that completed. What is going to > happen over the weekend, starting soon at Californian time? > And why couldn't the community (message boards) be hosted somewhere off the project so when there's power or construction outages or software/hardware maintenance, the people would have a place to get project information and still participate in the message forums? Seems "all the eggs are in one basket" as the saying goes. |
Thierry Van Driessche Joined: 20 Aug 02 Posts: 3083 Credit: 150,096 RAC: 0 |
As for a few months now, we have a quite good News part on the home page: News March 2, 2005 See Technical News for power outage updates. February 28, 2005 Around 18:00 UTC we had another unexpected lab-wide power outage. The cause of these random failures is being investigated by campus, but all our systems/databases survived without any corruption. We will be down as we work to further protect ourselves. More info and updates in Technical News. February 25, 2005 The database has been restored and all services are back up. The data server is very busy right now. There will be upload and download problems until the load normalizes. More info in Technical News. And there is also the Technical News. Some months ago, there was only now and then something to read. IMHO, I believe the people in Berkeley are doing their best to keep us informed as good as possible. Read the News at the homepage Have a look at the Technical News Look if your question is not answered at the Boinc Wiki Best greetings, Thierry |
Divide Overflow Joined: 3 Apr 99 Posts: 365 Credit: 131,684 RAC: 0 |
From the Technical News page: March 4, 2005 - 19:00 UTC The project is currently up but may go down (and back up) without announcement as we try to get the UPS's to talk to our servers |
Nick Cole Joined: 27 May 99 Posts: 97 Credit: 3,806 RAC: 0 |
Yes guys, I know where the news pages are. The trouble is the lack of detail. Times are particularly important. Why do the UPS testing during the only time when 5 million+ users are trying to get in? A commercial operation (apart from having them installed from day one) would do this out of user hours anyway. I know from experience that it is difficult to keep expectant users informed, but we are only looking at maybe a few minutes of typing with some reliable(ish) estimates. I don't think any of the team are sitting on their backsides doing nothing. I am sure that tempers at Berkeley are somewhat frayed as well, but there are lots of us out here looking for a little more than a few maybes. A little bit of info goes a long way towards diverting all the user anger. |
Thierry Van Driessche Joined: 20 Aug 02 Posts: 3083 Credit: 150,096 RAC: 0 |
> Why do the UPS testing during the only time when 5 million+ users are trying > to get in? A commercial operation (apart from having them installed from day > one) would do this out of user hours anyway. "Out of user hours" ?? This project is running by users all over the world. Meaning, the servers are contacted 24/7/365. What would then be the most appropriate time to fix it ? Read the News at the homepage Have a look at the Technical News Look if your question is not answered at the Boinc Wiki Best greetings, Thierry |
Toby Joined: 26 Oct 00 Posts: 1005 Credit: 6,366,949 RAC: 0 |
> Yes guys, I know where the news pages are. The trouble is the lack of detail. > Times are particularly important. The fact is, they probably don't know exactly when they are going to shut things down next. Hence the "we may be up and down for the next several hours". I think this is more than enough detail. Over the past week since the first power outage, there has been a LOT of communication on the technical news page. The flow of information has been much better than it was during the first few months of the project. They have spent a couple of minutes every day keeping us up to date. I for one appreciate this a lot. > Why do the UPS testing during the only time when 5 million+ users are trying > to get in? A commercial operation (apart from having them installed from day > one) would do this out of user hours anyway. Uh... there are users from all over the world doing seti@home. "Out of user hours" do not exist. If you look at their bandwidth graphs when the project is stable, they remain pretty much constant 24 hours a day. So if there is no difference in the number of clients trying to connect at any given time, then why come in to work at 3 AM to do something you could do at 3 PM? > I know from experience that it is difficult to keep expectant users informed, > but we are only looking at maybe a few minutes of typing with some > reliable(ish) estimates. I don't think any of the team are sitting on their > backsides doing nothing. I am sure that tempers at Berkeley are somewhat > frayed as well, but there are lots of us out here looking for a little more > than a few maybes. A little bit of info goes a long way towards diverting all > the user anger. Sign up for another BOINC project to keep your computer busy when seti goes down. This is why BOINC was created. I haven't been idle since some time in October I think. Then it really doesn't matter when seti is up/down. A member of The Knights Who Say NI! 
For rankings, history graphs and more, check out: My BOINC stats site |
D.J. Schweitz Joined: 29 Oct 02 Posts: 157 Credit: 871,078 RAC: 0 |
Nick, if you don't consider this detailed information, then perhaps you should consider applying for a job at Berkeley as either the "news editor" or the head electrician on campus. March 4, 2005 - 19:00 UTC The project is currently up but may go down (and back up) without announcement as we try to get the UPS's to talk to our servers March 3, 2005 - 23:30 UTC The UPS communication cables arrived and we spent a fair amount of time trying to get the UPSes to work. No dice. We tried everything (even going so far as to beep out the cables to make sure the pinouts were correct). Since it was wasting too much time we bailed and restarted the project for now. We'll likely shut it down for the evening again in a few hours. March 3, 2005 - 17:30 UTC The project is currently up. If the UPS communication cables arrive today we will have an outage to test the graceful shutdown procedures. If that goes well, we will bring the project back up and keep it up. March 2, 2005 - 19:00 UTC The building power is still untrustworthy. A diagnostic power outage is going to be scheduled for some time next week. To clarify our current situation, all of our servers are in fact on UPSs and we suffered no database damage from the power outage this past Monday. What we do not have in place yet is a graceful shutdown system should the power fail and we are not here. We have installed the software on the servers that will enable them to recognize when they are on battery backup. We are waiting on the special communication cables that are necessary to connect the UPSs to the servers. They had to be special ordered and we expect them tomorrow. While we have been down these last 2 days, we have been doing various maintenance tasks. Currently we are running a database backup. Once that is done, we plan to bring the project up for half a work day or so today. We will shut it down again at 01:00 UTC. |
Hans Dorn Joined: 3 Apr 99 Posts: 2262 Credit: 26,448,570 RAC: 0 |
OK, another whine from me :o) Could we pretty pretty please keep the project up until we had a chance to fill our caches? Regards Hans |
Nick Cole Joined: 27 May 99 Posts: 97 Credit: 3,806 RAC: 0 |
> > Why do the UPS testing during the only time when 5 million+ users are > trying > > to get in? A commercial operation (apart from having them installed from > day > > one) would do this out of user hours anyway. > > "Out of user hours" ?? > > This project is running by users all over the world. Meaning, the servers are > contacted 24/7/365. What would then be the most appropriate time to fix it ? > > Yes the term may be confusing, but intended to illustrate that it is a 24hr operation, BUT, if they are going to limit user connectivity, and have some downtime, then that downtime can be used to undertake the upgrades, installation etc. That is what would happen commercially. So make better use of the period when the system is manned during the day, by allowing users to upload/update caches etc, especially as it takes several hours to stabilise after any outages. That allows users to continue working, and while the system is down or subject to the fragile power time then install the software (which isn't difficult). I am sure that the team are working hard but so are the millions of users of BOINC and classic, who are mostly waiting with no info. Vague statements, no formal announcements, a few words or sentences, no thought about what will happen for the next 2 days, no date for when professional electricians are going to come in. I find it extremely odd that a fault like this is not getting looked at for several days. I haven't been anywhere that such a laid back approach is seen on something as important. And it still doesn't take more than a few minutes to put something informative on a website. Many people who participate in this are IT and technical professionals already, so they are aware of the issues, but importantly aware of what needs to be done, how to do it and when. |
ponbiki Joined: 9 Feb 04 Posts: 114 Credit: 115,897 RAC: 0 |
> Yes the term may be confusing, but intended to illustrate that it is a 24hr > operation, BUT, if they are going to limit user connectivity, and have some > downtime, then that downtime can be used to undertake the upgrades, > installation etc. That is what would happen commercially. So make better use > of the period when the system is manned during the day, by allowing users to > upload/update caches etc, especially as it takes several hours to stabilise > after any outages. That allows users to continue working, and while the > system is down or subject to the fragile power time then install the software > (which isn't difficult). You're looking at this from a viewpoint of a commercial operation, which is nice but wouldn't apply to something that is run in a scientific setting. They have to work not only on this project but also do research that is vital to obtaining government grants, private grants, sponsorships, not to mention teach and also conduct other research as well. They have no luxury to spend all their time on this problem. Plus, their budget doesn't go that far to paying their people to stay past certain hours to work on this project, especially since their time is Pacific Standard Time. What would be an "optimal time" to do it? Hawaii is 2 hours behind the West Coast, New York is 3 hours ahead, who gets to decide? They take it down when they are there to work out any bugs and that happens to be during business hours. Better they take it down when they're there than to have it crash when they're not. > I am sure that the team are working hard but so are the millions of users of > BOINC and classic, who are mostly waiting with no info. Vague statements, no > formal announcements, a few words or sentences, no thought about what will > happen for the next 2 days, no date for when professional electricians are > going to come in. I find it extremely odd that a fault like this is not > getting looked at for several days. 
I haven't been anywhere that such a laid > back approach is seen on something as important. Quoting our Secretary of Defense, "you go to work with the project you have, not the project you want or the project you'd like". (Yeah, that still sounds stupid regardless of context.) Maybe if they get more money, they can stay longer and work full 40-60 hour weeks, but given their resources and time spent on troubleshooting, they've done a hell of a job. |
KRMurphy Joined: 10 Nov 04 Posts: 4 Credit: 164,054 RAC: 0 |
> OK, another whine from me :o) > > Could we pretty pretty please keep the project up until we had a chance to > fill our caches? > Jeez, you crybabies whine if they're trying to make sure the project doesn't go down again, and will wail like banshees if it does go down again and you lose a half hour of your precious "crunching" time or a point doesn't get credit within a half hour. I'm sure glad I'm not one of the developers who have to listen to this junk. Commercial organizations are one thing, and collaborative projects are quite another. |
Toby Joined: 26 Oct 00 Posts: 1005 Credit: 6,366,949 RAC: 0 |
> So make better use of the period when the system is manned during > the day, by allowing users to upload/update caches etc, especially > as it takes several hours to stabilise after any outages. Considering the 24/7 load the best use of their time while they are at work is to do maintenance and then just let things run and let the users fill their caches while they are at home sleeping. > I am sure that the team are working hard but so are the millions of users of > BOINC and classic, who are mostly waiting with no info. As I said... sign up with another BOINC project! > Vague statements, no > formal announcements, a few words or sentences, no thought about what will > happen for the next 2 days, no date for when professional electricians are > going to come in. I find it extremely odd that a fault like this is not > getting looked at for several days. I haven't been anywhere that such a laid > back approach is seen on something as important. Read the technical news page again! "A diagnostic power outage is going to be scheduled for some time next week." Right there is your "professional electricians" looking at things. I'm sure there were others looking at other times but there is no reason for us to know about each and every time someone pulled a voltmeter out. They gave us detailed information on what had happened to the servers and said that "power is still untrustworthy" and that they were leaving the servers off because of it. That is really all we need to know. I'm not sure what else you are looking for. Something like this perhaps? "Joe The electrician came in at 10:04 this morning. At 10:06 he opened the circuit breaker panel. At 10:14 he flipped breaker #3 off. At 10:16 he picked his nose. At 10:20 he scratched his posterior..." The way I see it, we have all the information we need to have. If you are out of work, sign up with another BOINC project! A member of The Knights Who Say NI! 
For rankings, history graphs and more, check out: My BOINC stats site |
FloridaBear Joined: 28 Mar 02 Posts: 117 Credit: 6,480,773 RAC: 0 |
> > OK, another whine from me :o) > > > > Could we pretty pretty please keep the project up until we had a chance > to > > fill our caches? > > > > Jeez, you crybabies whine if they're trying to make sure the project doesn't > go down again, and will wail like banshees if it does go down again and you > lose a half hour of your precious "crunching" time or a point doesn't get > credit within a half hour. > > I'm sure glad I'm not one of the developers who have to listen to this junk. > > Commercial organizations are one thing, and collaborative projects are quite > another. > Well, yes and no. I don't think I've D/L'ed more than a couple SETI WU's in a week. I've now signed up for Einstein to keep the CPUs busy, but Berkeley is going to lose participants due to lack of work. The servers seem to come up during the day, but I sure can't get any WU's to upload or download before they're back down again at night. Just my two cents. |
mikey Joined: 17 Dec 99 Posts: 4215 Credit: 3,474,603 RAC: 0 |
> > > Why do the UPS testing during the only time when 5 million+ users > are > > trying > > > to get in? A commercial operation (apart from having them installed > from > > day > > > one) would do this out of user hours anyway. > > > > "Out of user hours" ?? > > > > This project is running by users all over the world. Meaning, the servers > are > > contacted 24/7/365. What would then be the most appropriate time to fix > it ? > > > > > Yes the term may be confusing, but intended to illustrate that it is a 24hr > operation, BUT, if they are going to limit user connectivity, and have some > downtime, then that downtime can be used to undertake the upgrades, > installation etc. That is what would happen commercially. So make better use > of the period when the system is manned during the day, by allowing users to > upload/update caches etc, especially as it takes several hours to stabilise > after any outages. That allows users to continue working, and while the > system is down or subject to the fragile power time then install the software > (which isn't difficult). > > I am sure that the team are working hard but so are the millions of users of > BOINC and classic, who are mostly waiting with no info. Vague statements, no > formal announcements, a few words or sentences, no thought about what will > happen for the next 2 days, no date for when professional electricians are > going to come in. I find it extremely odd that a fault like this is not > getting looked at for several days. I haven't been anywhere that such a laid > back approach is seen on something as important. > > And it still doesn't take more than a few minutes to put something informative > on a website. Many people who participate in this are IT and technical > professionals already, so they are aware of the issues, but importantly aware > of what needs to be done, how to do it and when. > If it bothers you sooo much why not move on to another project, Boinc/Einstein is up and running just fine. 
If you can't live with the MUCH BETTER THAN IN THE PAST information coming from Berkeley, then you may need to look at why you are participating in the project in the first place. This IS a scientific experiment and NOT a commercially run operation generating ANY kind of money. In fact this whole project COSTS money! It is a drain on resources and is stretching to the breaking point all the resources Berkeley can get. A commercial project would have shut down permanently LONG AGO!!! I guess you don't remember the days with Classic when it would go down and come back up and then a week later a message would come out saying they were down and are now back up. The info coming from Berkeley has IMPROVED IMMENSELY since early times! Can it get better? Sure. But then if I won the lottery, so would my personal financial situation! They are working as fast as they can and will fix the system as soon as they can! They have said REPEATEDLY to crunch other projects when they are down, guess you didn't listen. |
shady Joined: 2 Feb 03 Posts: 40 Credit: 2,640,527 RAC: 0 |
The info on what is happening is much better than it used to be and the team should be congratulated on that. How do you put an accurate time frame on how long it will take to solve a problem? Do you update the website every 5 minutes with every last little detail, or do you devote your time to trying to solve the actual problem instead? Personally I think that the project staff have taken the sensible option of only running the servers whilst they are in attendance, until they sort out the UPS communication problems. As others have said, it makes perfect sense to try and sort out these UPS problems during their normal business hours, as this is a worldwide 24/7 project which does not have slack times of the day anyway. The combined effect of this is that the servers are currently down more than they are up, but what would be a sensible alternative? Sure it's annoying when the project is down, but there are other BOINC projects that are up and running with work to crunch. I don't think some people have grasped the concept yet that if the seti team stick to their plans, and the amount of computing power available to them continues to outstrip their needs, then it's likely that there will be times when you simply won't be able to get seti work to crunch. This was why BOINC was created, so that this spare computing power could be used for other projects, so why not get used to the concept of crunching for various projects now? That way, when one project is down or out of work, your machines are still crunching. Shady |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
> And why couldn't the community (message boards) be hosted somewhere off the > project so when there's power or construction outages or software/hardware > maintenance, the people would have a place to get project information and > still participate in the message forums? Seems "all the eggs are in one basket" > as the saying goes. ... because the community is tied to the database -- people who post are actually people who are signed up for the project. That's why we don't have Nigerian scams and spammers posting constantly. So, to move the forums off-site means replicating the database off-site, or simply having a site outage at Berkeley also bring down the off-site forum (because it depends on the Berkeley databases). Seriously, I've just been through the kinds of problems they're having at Berkeley right now, and while they're frustrating now, they're also pretty unusual. ... and they've had an exceptionally bad run of luck lately. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
> Personally I think that the project staff have taken the sensible option of > only running the servers whilst they are in attendance, until they sort out > the UPS communication problems. As others have said, it makes perfect sense to > try and sort out these UPS problems during their normal business hours, as this > is a worldwide 24/7 project which does not have slack times of the day > anyway. The combined effect of this is that the servers are currently down > more than they are up, but what would be a sensible alternative? Don't forget that there are SETI/BOINC users out there who don't even know the main site is down. BOINC will crunch what it has, and report when it can. ... and there are others that are crunching two or three projects, for them the outages are even less important. |
Nick Cole Joined: 27 May 99 Posts: 97 Credit: 3,806 RAC: 0 |
> > > > Why do the UPS testing during the only time when 5 million+ > users > > are > > > trying > > > > to get in? A commercial operation (apart from having them > installed > > from > > > day > > > > one) would do this out of user hours anyway. > > > > > > "Out of user hours" ?? > > > > > > This project is running by users all over the world. Meaning, the > servers > > are > > > contacted 24/7/365. What would then be the most appropriate time to > fix > > it ? > > > > > > > > Yes the term may be confusing, but intended to illustrate that it is a > 24hr > > operation, BUT, if they are going to limit user connectivity, and have > some > > downtime, then that downtime can be used to undertake the upgrades, > > installation etc. That is what would happen commercially. So make > better use > > of the period when the system is manned during the day, by allowing users > to > > upload/update caches etc, especially as it takes several hours to > stabilise > > after any outages. That allows users to continue working, and while the > > system is down or subject to the fragile power time then install the > software > > (which isn't difficult). > > > > I am sure that the team are working hard but so are the millions of users > of > > BOINC and classic, who are mostly waiting with no info. Vague > statements, no > > formal announcements, a few words or sentences, no thought about what > will > > happen for the next 2 days, no date for when professional electricians > are > > going to come in. I find it extremely odd that a fault like this is not > > getting looked at for several days. I haven't been anywhere that such a > laid > > back approach is seen on something as important. > > > > And it still doesn't take more than a few minutes to put something > informative > > on a website. Many people who participate in this are IT and technical > > professionals already, so they are aware of the issues, but importantly > aware > > of what needs to be done, how to do it and when. 
> > > If it bothers you sooo much why not move on to another project, Boinc/Einstein > is up and running just fine. If you can't live with the MUCH BETTER THAN IN > THE PAST information coming from Berkeley, then you may need to look at why > you are participating in the project in the first place. This IS a scientific > experiment and NOT a commercially run operation generating ANY kind of money. > In fact this whole project COSTS money! It is a drain on resources and is > stretching to the breaking point all the resources Berkeley can get. > A commercial project would have shut down permanently LONG AGO!!! > I guess you don't remember the days with Classic when it would go down and > come back up and then a week later a message would come out saying they were > down and are now back up. > The info coming from Berkeley has IMPROVED IMMENSELY since early times! Can it > get better? Sure. But then if I won the lottery, so would my personal > financial situation! They are working as fast as they can and will fix the > system as soon as they can! They have said REPEATEDLY to crunch other projects > when they are down, guess you didn't listen. > > Actually I do listen, but I also watch and observe. As an IT professional responsible for running very complex and major systems, I find the lack of information, coupled with the failure to install shutdown software on the UPSs (and the UPSs themselves?), somewhat concerning. If there is user connectivity up-time then work on the system should not be carried out at the same time, especially when the majority of the time everything is switched off because it is unattended. The result is that for the first 4 or 5 hours in the day systems are overloaded with the amount of connection attempts and transfers, all taking place in that brief period instead of spread out over 24 hours. 
As for times then an up-front notice stating when it will start and when it goes down allows for people to adjust what they are doing without having to navigate to an obscure page. That doesn't stop maintenance but allows the millions of users or participants to continue working, as the project team is not actually just the people at Berkeley it includes all of us!!!! Something that occasionally I think some people forget. In addition running mirror sites at other locations from which workunits can be downloaded and uploaded to makes a lot of strategic sense. The collation can be carried out at Berkeley as at present and it merely needs a data feed which can even be tape or disk, to a different site to maintain continuity which can be done during 'working hours' without having to worry about maintaining the user connectivity processes 24hr per day from a site that would appear to be unsuitable, though it has been pretty good for the last 5 years. Unfortunately it has now built up a loyal following in excess of 400,000 continuous and regular users so a bit of thought for what they are doing helps. The amount of time wasted by us in trying to get and maintain things working is enormous. I have spent 3 hours today getting cached units down and up loaded not least tweaking the settings to ensure that the impact is minimised. Multiply that by the number of users completing more than 1 unit every few days and the time spent globally is HUGE. I don't envy the team struggling with the consequences, but a few more words than the terse couple of sentences does recognise the impact on us. Being critical is not intended as any slur, but a spur for better things. Speaking from experience I have been on the operations end of similar problems and user communication is the key to keeping them off your back. 
I do not run BOINC, having tried it (and got told that 5 GB of disk space was not enough so it refused to download anything) and am waiting for a version that can replicate processing functionality as for classic. I do not wish to use processing power for other projects, having devoted time since May 1999 on this. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
> Why do the UPS testing during the only time when 5 million+ users are trying > to get in? A commercial operation (apart from having them installed from day > one) would do this out of user hours anyway. We're looking at this from a U.S. West Coast point-of-view. The user base is all over the place -- Australia, Europe, Asia -- in every time zone. That means inconveniencing __somebody__ no matter what time you do the work. You're also assuming that an outage stops everyone and every thing. It doesn't. BOINC is like E-Mail. Take your mail server down for an hour, bring it back up, and mail still comes through -- just delayed a bit. |
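The e-mail comparison in the last post describes store-and-forward delivery: work finished during an outage is held on the client and reported once the server answers again. The Python sketch below illustrates that idea only. The `DeferredUploader` class and the `send` callback are hypothetical names invented for this example and are not taken from the BOINC client.

```python
from collections import deque


class DeferredUploader:
    """Hypothetical client-side queue (not BOINC code).

    Finished results are held locally and delivered in order
    whenever the server is reachable again.
    """

    def __init__(self, send):
        # send(result) -> bool: True if the server accepted the result.
        self.send = send
        self.pending = deque()

    def report(self, result):
        """Queue a finished result and try to deliver everything pending."""
        self.pending.append(result)
        return self.flush()

    def flush(self):
        """Deliver queued results in order; stop at the first failure."""
        delivered = []
        while self.pending:
            if not self.send(self.pending[0]):
                break  # server unreachable: keep the rest for a later attempt
            delivered.append(self.pending.popleft())
        return delivered
```

With a `send` callback that returns `False` while the server is down, each call to `report` simply leaves the result in the queue. The first `flush` after the server returns delivers the backlog in the original order, which is the "delayed a bit" behaviour described above.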
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.