Message boards : Science (non-SETI) : "Parallelizing" problems.
Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0
I was reading a paper on distributed computing on Folding@Home's web site, which discusses the problems of using distributed computing for problems that are not as "Embarrassingly Parallel" as SETI, in which the "calculation requires that the processors frequently communicate information and is thus poorly suited for worldwide distributed computing". The article mentions that "an algorithm has been developed that helps address the problems of both parallelization and communication by allowing loosely connected multiple processors to be used". In broad strokes, can someone describe this algorithm?

I ask because it occurs to me that even for calculations that require frequent communication between processors, there must be some small amount of independent calculation that can be done without communication between the processors. A number of these small independent calculations could be packaged together into a Work Unit for the client computers, covering the small steps of hundreds or thousands of complete calculations. The Work Unit would be processed on a number of processors and then returned to the project's servers, just as SETI's are. The difference is that the calculation is not finished when it is returned; it is just a small step of the complete calculation. At the servers, the related calculations would then "communicate" whatever information needs to be shared at that point, and then be packaged into new Work Units for the next step of each complete calculation.

However, it may be that this would introduce so much network traffic that the whole process would bog down, so that even with millions of PCs working on the tasks, it would take years to complete any usable computation. Also, the "coordination" task of gathering together the related sub-calculations might require a supercomputer just to get it done in a time frame that could keep the client computers supplied with Work Units.

I would be very surprised if this has not been thought of before, and I would be interested in hearing what has been discovered about this idea.
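To make the step-and-exchange idea concrete, here is a toy sketch (Python; every name and all the arithmetic are invented for illustration, and this is not BOINC, SETI@home, or Folding@Home code):

```python
# Toy model of the idea above: many related calculations each advance
# one independent step at a time, then "communicate" through the
# server before the next batch of Work Units is cut.

def independent_step(x):
    # The small piece a volunteer machine could compute alone
    # (stand-in arithmetic for one step of one calculation).
    return x * 0.99 + 0.01

def server_exchange(values):
    # The "communication" phase, done centrally at the server: here
    # each calculation is nudged toward the global mean (a stand-in
    # for whatever state really needs sharing between steps).
    mean = sum(values) / len(values)
    return [0.9 * v + 0.1 * mean for v in values]

def run(values, steps):
    for _ in range(steps):
        # 1. Cut one small independent step per Work Unit and farm
        #    the Work Units out (here: just a list comprehension).
        values = [independent_step(v) for v in values]
        # 2. Results return to the server, which exchanges shared
        #    state before packaging the next round of Work Units.
        values = server_exchange(values)
    return values

print(run([0.0, 0.5, 1.0], steps=100))
```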
Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60
That link isn't working for me.

In a rich man's house there is no place to spit but his face. Diogenes Of Sinope
Joined: 4 May 08 Posts: 417 Credit: 6,440,287 RAC: 0
> That link isn't working for me.

The link is http://]http://fah-web.stanford.edu/papers/SPScience2000.pdf. Removing the first http://] gives a working link: http://fah-web.stanford.edu/papers/SPScience2000.pdf

EDIT: Looks like an old article to me.
Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60
Agreed; the article cites nothing after the year 2000.

In a rich man's house there is no place to spit but his face. Diogenes Of Sinope
Joined: 27 Mar 05 Posts: 347 Credit: 1,681,694 RAC: 0
From the first post:

> The article mentions "an algorithm has been developed that helps address the problems of both parallelization and communication by allowing loosely connected multiple processors to be used."

The actual algorithm used is not shown. From the wording of the article, it sounds like a straight statistical construct, which implies it has all the problems inherent in statistically based algorithmic code structures. For a computational problem that changes its "base nature" during the execution cycle of the data, a statistical model will at best produce a poor result and at worst miss the objective entirely.

The type of computational problem one associates with variable data requires a different approach: a rule-based or neural-net-based system, to achieve a timely result. If the problem is one of pattern recognition from visual or audio data, then neural nets would be my choice for filtering the data. If the problem is more complex, then a rule-based system (possibly combined with a neural net) would be better. Your digital camera uses a rule set to determine the best focus, aperture, timing, subjective weighting, compensation, balance, field depth, etc., using a very small bank of rules and some fuzzy logic; a toy example is sketched below.

In these types of problems, what matters most is the ability to auto-tune the network and achieve optimal packet-traversal performance. Using AI as the auto-tuning tool delivers higher performance and appears to have enough flexibility to start tackling the harder "General AI" problems.
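To make the camera example concrete, here is a toy fuzzy-rule sketch (Python; the memberships, rules, and shutter values are all invented for illustration, not taken from any real camera firmware):

```python
# Two fuzzy rules picking a shutter speed from a normalized light
# reading in [0, 1].  All numbers are invented for illustration.

def shutter_time(light):
    # Fuzzy memberships: how "dark" and how "bright" the reading is.
    dark, bright = 1.0 - light, light
    # Rule 1: if dark, use a slow shutter (1/30 s).
    # Rule 2: if bright, use a fast shutter (1/1000 s).
    # Defuzzify with a weighted average of the rule outputs.
    return (dark * (1 / 30) + bright * (1 / 1000)) / (dark + bright)

for light in (0.1, 0.5, 0.9):
    print(f"light {light}: shutter {shutter_time(light):.5f} s")
```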
Robbie Goodwin Joined: 17 Sep 03 Posts: 3 Credit: 43,856 RAC: 0
I apologize if I'm going off topic or missing something basic, but on a much more mundane, everyday level: can anyone tell me why BOINC seems to want my machines to process tasks in parallel? My Progress pane usually shows several tasks partially complete. Doesn't that suggest that, for pretty much the same time and effort, the project is getting fewer answers than it would if they were processed serially?

Best wishes
Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0
> can anyone tell me why BOINC seems to want my machines to process tasks in parallel?

Parallel processing is normal for multicore processors such as your PowerMac. However, I think you may be talking about the fact that at times a Work Unit will be partially completed and then put aside for a while as another Work Unit is processed. This is probably because your BOINC client downloaded a new Work Unit with an earlier deadline than the one being worked on at the time. BOINC has decided to work on the new one with the short deadline first, and it will go back to the other one later.
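Roughly speaking, that choice works like earliest-deadline-first scheduling. A toy sketch (Python; the task names and dates are invented, and the real BOINC client weighs much more than this):

```python
# Earliest-deadline-first in miniature: run whichever unfinished
# task has the nearest deadline.  Example data is invented.
from datetime import date

tasks = [
    ("half-finished older unit", date(2009, 7, 21)),
    ("freshly downloaded unit",  date(2009, 7, 20)),
]

for name, deadline in sorted(tasks, key=lambda t: t[1]):
    print(f"run: {name} (deadline {deadline})")
```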
Robbie Goodwin Joined: 17 Sep 03 Posts: 3 Credit: 43,856 RAC: 0
Thanks Steven. I won't be surprised if that's completely true, and I see there are levels on which it makes sense; the way BOINC handles my machines doesn't seem to be one of them.

Parallel processing may be normal for multicore processors, including my PowerMac, but doesn't that rationale presuppose that BOINC Tasks are already packaged in such a way that no one Task can be broken into chunks to be handed off to different cores? In that case, I should never see more than two Tasks running simultaneously, no?

Short deadlines might come into play if the deadlines were in any way realistically near. At the moment this Mac is running two BOINC Tasks almost head-to-head: one has a deadline of 20 07, the other of 21 07. We could happily abandon both and start again many times over in those three weeks, unless deadlines at this level are irrelevant, or there is preposterous redundancy in the system, or BOINC has no realistic method of figuring average resource availability; all of which seem to spell inefficiency.

Are you seriously saying that BOINC Tasks are already packaged in such a way that none can be broken down among the cores?

Best,
Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0
There is a configuration option to tell BOINC how many cores there are; mostly, people with CUDA use it to reduce the number of non-CUDA tasks BOINC will run, in order to keep one core free to feed the GPU. In Windows, you will find the recorded value in C:\Documents and Settings\All Users\Application Data\BOINC\client_state.xml:

```xml
<host_info>
  <p_ncpus>4</p_ncpus>
```

Mine has 4 because I have 4 cores; yours should have 2 on the PowerMac. It is therefore possible to see more tasks running than there are cores, if your configuration says there are 3 cores instead of 2. (A few lines of Python that read this value back out are sketched at the end of this post.)

> Short deadlines might come into play if the deadlines were in any way realistically near...

Yes, the tasks cannot be broken down; they arrive already broken down. If you are not running the computer 24/7, or BOINC is not running 24/7, or you have other projects besides SETI that BOINC is giving some of your time to, then the 20th of July could be a rather short deadline.

You also need to look closely at the status column to be sure that all of the tasks you think are running actually are. They may show something like "Running", "Running, high priority", or "Waiting to run"; the first two are actually running, while the third is not.

Are there any PowerMac users reading this who could provide some info more specific to PowerMacs?
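If hunting through that file by eye is a pain, a few lines of Python will pull the value out (the path is the Windows one quoted above; on a Mac the BOINC data folder lives elsewhere, so adjust the path):

```python
# Read the CPU count BOINC recorded in client_state.xml.
# Path is the Windows location quoted above; adjust for your system.
import xml.etree.ElementTree as ET

path = (r"C:\Documents and Settings\All Users"
        r"\Application Data\BOINC\client_state.xml")
state = ET.parse(path)
ncpus = state.getroot().find("host_info/p_ncpus")
print("BOINC thinks this host has", ncpus.text, "CPUs")
```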
Robbie Goodwin Joined: 17 Sep 03 Posts: 3 Credit: 43,856 RAC: 0
Again thanks, and again, those are lovely theories. I haven't seen any configuration option dealing with cores, or anything like you suggest dealing even with processors. And don't you think that C:\Documents and Settings\All Users\Application Data\BOINC\client_state.xml <host_info> <p_ncpus>4</p_ncpus> is beyond all but about 1% of users? Granted, BOINC-SETI users are not Joe Average, but I still suggest that's beyond many, if not most, users, and anyone relying on it ought not to. Do you seriously think UCB isn't capable of interrogating systems directly, rather than relying on manual help from users?

That a task which takes hours to complete can't be broken down beggars belief, and we'd better skip that. Even if it were true, it would be true only in inefficient environments, which comes back to where I started. Not running the computer or BOINC 24/7, or having other projects besides SETI, makes three weeks a short deadline only in inefficient environments, which comes back to where I started.

The question of PowerMac-specific info is so much of a red herring, it just completely spoiled the whole kettle of fish! Still, thanks again.
Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0
What is it that you want to know?
Joined: 27 Mar 05 Posts: 347 Credit: 1,681,694 RAC: 0
I think some of the confusion relates to the intrinsic nature of the problem: the actual task is running multiple passes over the same single data set with differing sets of mathematical functions/equations.

I see your point of view and have "been there, tried that" with quite similar tasks on a 700-CPU virtual-machine cluster. There is a point where the network comms for interprocess communication grow almost out of control, but that's a tuning issue, and it also depends on the nature of each job. You never get 100% efficiency when you split tasks up; you tend to get about 85 to 90% (if the jobs are "very independent"), as you have to account for the processing/management overhead involved, and also for collating the results from each core into a final result. A rough way to see why is sketched below.

The system would need to be completely rewritten from its present "serial processing" functionality (for the client machines) into true parallel processing, with automatic core-count detection defining how many copies of the same single data set get distributed across the computer, and how many of the total math functions are carried out on each core, so as to split the job into 2 or 4 or 8 or CUDA-128 or PS3-8 or ?? pieces. It would involve a rewrite from the ground up. And you still have the serial nature of the SETI servers themselves and the job-splitting task from the main feed.
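To put a rough number on that 85 to 90%: Amdahl's law (my framing here, not anything from the SETI code) caps the speedup you get from the parallel fraction p of a job on n cores:

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the
# fraction of the job that parallelizes and n is the core count.
# A few percent of serial/management overhead is enough to land
# in the 85-90% efficiency range mentioned above.

def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

n = 4                              # e.g. a quad-core client
for p in (0.94, 0.95, 0.96):       # 4-6% serial/overhead
    s = speedup(p, n)
    print(f"p={p}: {s:.2f}x speedup, {s / n:.0%} efficiency")
```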
Joined: 24 Mar 08 Posts: 2333 Credit: 3,428,296 RAC: 0
Indeed, the CUDA code is an example of massively parallel processing: a single code stream working on multiple data streams simultaneously. But then, that is what the GPU hardware was designed to do from the ground up. Even so, the GPU works on only one Work Unit at a time.

When we are looking at multicore processors, such as the Q6600 with 4 cores, the simplest thing to do is to work on 4 Work Units in parallel, one per core. This way, the same code and the same data structures can be used on single-, dual-, quad-, or any number of cores, without any changes.

Doing a single Work Unit in parallel on all 4 cores of the Q6600 would not improve throughput compared to doing 4 completely independent Work Units, one per core. Sure, you may get one Work Unit done in about 25% of the time, but in terms of throughput that is the same as doing 400% as many Work Units in 100% of the time; the arithmetic is sketched below.

When you think about it, the whole project is massively parallel, with thousands of independent processors all over the world working on small parts of the whole, one Work Unit at a time. How much more parallel do you think it needs to be?
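Here is that arithmetic in numbers, idealized (no splitting overhead; the timings are invented):

```python
# Latency vs. throughput on an ideal 4-core machine: splitting one
# Work Unit across all cores finishes it sooner, but completes no
# more Work Units per hour than running one per core.
cores = 4
wu_hours = 4.0                      # hours per Work Unit on one core

split_latency = wu_hours / cores    # 1 WU across 4 cores: 1 hour
split_rate = 1.0 / split_latency    # 1 WU finished per hour
per_core_rate = cores / wu_hours    # 4 WUs in 4 hours: 1 WU per hour

print(split_rate, per_core_rate)    # 1.0 1.0 -> identical throughput
```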