Message boards :
Number crunching :
Update on Linux 64 -Nividia-V8-MB ?????
Author | Message |
---|---|
William Send message Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
Estimates don't matter. That affects cache, not the crunch time allocated to a project. A person who won't read has no advantage over one who can't read. (Mark Twain) |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
that affects cache, not the crunch time allocated to a project. Not the crunch time: the server's and client's estimates of your crunch time. Don't confuse reality with a crappy simulation (much as it might seem appropriate sometimes). "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Guys, if you've quite finished, I think all this goes back to... Guys, don't know what are you talking about...when there's no job for my Quadro 1700 on Ubuntu 14.04LTS x64: We're interpreting those %ages as a resource share question, yes? Can't be adding up to 110%. Resource share, in the long term and without external constraint (like no application being available on the main project yet, hence no tasks), is maintained by client work fetch decisions - overworked/underworked projects fetch fewer/more workunits respectively. The white paper on this subject - ClientSchedOctTen - talks in terms of REC: The recent estimated credit REC(P) of a project P is maintained by the client... For typical hardware, a project with GPU applications will accrue REC far faster than a project with CPU applications only (because of the higher peak_flops). That would (and does) depress the work fetch priority greatly, skewing the time allocation between projects. Projects with CPU applications only divide their CPU time (in the long-term average) in accordance with resource share; projects with GPU applications tend to do no CPU work at all, because the REC contribution from even a short period of GPU crunching swamps the CPU projects. That's from the white paper. I suspect the actual implementation (which would require reference back to the notes from the code-walk which took place three years ago) differs in at least two respects: 1) The white paper talks in terms of a single 'scheduling priority': in practice, there are two priority values maintained - one to decide which project to request work from next, and the other to decide which cached task to run next. 2) The white paper talks in terms of 'peak_flops' only. Unless there's some push-back from the server in terms of APR, we know that the client only knows 'benchmark' speeds for CPUs (which underestimate throughput), and 'marketing' speeds for GPUs (which overestimate throughput).
That would tend to suggest that the client would form an even more skewed assessment of the Resource Share achieved in practice between CPU and GPU projects. |
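Richard's REC description can be sketched numerically. A minimal, illustrative Python model; the peak_flops figures, the 10-day half-life, and the priority = -REC/share rule are assumptions drawn from the white paper's description, not from the actual client code:

```python
# Illustrative model of REC (recent estimated credit) accrual, per the
# ClientSchedOctTen white paper description. All numbers are invented.

def rec_step(rec, peak_flops, dt_days, half_life_days=10.0):
    # Exponentially decay the old REC and blend in credit accrued
    # at a rate proportional to peak_flops over dt_days.
    decay = 0.5 ** (dt_days / half_life_days)
    return rec * decay + peak_flops * dt_days * (1.0 - decay)

cpu_peak, gpu_peak = 5e9, 500e9      # assumed: GPU peak_flops 100x the CPU's
rec_cpu = rec_gpu = 0.0
for _ in range(30):                  # thirty one-day steps of crunching
    rec_cpu = rec_step(rec_cpu, cpu_peak, 1.0)
    rec_gpu = rec_step(rec_gpu, gpu_peak, 1.0)

# Work-fetch priority is roughly -REC/resource_share: with equal shares,
# the GPU project's priority ends up ~100x lower, the skew described above.
priority_cpu = -rec_cpu / 100.0
priority_gpu = -rec_gpu / 100.0
```

With equal resource shares, the GPU project's REC (and hence its priority penalty) scales directly with the peak_flops ratio, which is the swamping effect the white paper describes.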
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Thanks Richard. Yes, chaos is a harsh mistress. Here is a video on the subject Julia Saori tweeted me: https://youtu.be/fUsePzlOmxw "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
William Send message Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
Peak flops are from benchmarks, IIRC, yes. As I said, if you have both GPU and CPU it gets messy. But still, all the numbers for scheduling are exclusively client-generated, with no input from the server. And I suppose he meant he assigned resource shares of 10 and 100 respectively. However, if a project doesn't have the right kind of app for a system, it's quite irrelevant how big the share is... that would only kick in if, at some point, an app became available. If you want to see what's going on under the hood, you need to enable work_fetch_debug and cpu_sched_debug. A person who won't read has no advantage over one who can't read. (Mark Twain) |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
...all the numbers for scheduling are exclusively client generated, with no input from server. Exactly: the client-side projected_flops, which is the key estimate component of CreditNew, a closed-loop control system that estimates how long the tasks should take. It is read server-side in estimate_duration(), which is then scaled by the available fraction. [Edit2:] Correction: projected_flops is updated server-side as data accumulates. [Edit:] My issues with the mechanism don't directly connect to resource share, or to what is done with the resulting throughput and elapsed estimate(s), but directly to noticeable classic control-systems-engineering instabilities in those estimates that propagate through the mechanism. Before host+app convergence: slow convergence. Small change after host+app convergence: overshoot and/or ringing, or nothing, then a sudden jolt ---> sensitivity to initial conditions. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
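The estimate_duration() step Jason describes reduces to one division plus availability scaling. A hedged sketch; the function signature, field names and numbers here are simplified stand-ins, not the real BOINC code:

```python
# Sketch of the server-side duration estimate described above: the task's
# estimated flops divided by projected_flops, then stretched by the
# fraction of time the host is actually on and running BOINC.

def estimate_duration(rsc_fpops_est, projected_flops, on_frac, active_frac):
    compute_seconds = rsc_fpops_est / projected_flops
    # A host that is only on and active part of the time takes
    # proportionally longer in wall-clock terms.
    return compute_seconds / (on_frac * active_frac)

# Assumed example: a 40 Tflop task, 100 Gflops projected, 80% active:
# 400 s of compute stretched to 500 s of wall clock.
secs = estimate_duration(40e12, 100e9, on_frac=1.0, active_frac=0.8)
```

This also shows why an unconverged projected_flops matters: every error in it propagates linearly into every deadline and cache calculation downstream.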
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
All of which is fascinating, in a slow-motion-car-crash sort of way, but doesn't really address KLiK's question about not getting Linux 64 -Nividia-V8-MB tasks (because of the app not being ready yet), or how many he will run when it is ready. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
True. Reposting my prior posts, which I believe address that: [Edit: made link clickable] If you're looking for or expecting stock deployment on main, there are some problems to solve first, as outlined in the previous post. The probing Cuda build went to beta last week, and we determined more Cuda versions are needed to cover older Linux distribution kernels, Cuda 4.2 looking like the weapon of choice to cover the majority on that platform. Since then, some slow progress, though more beta packages are available when Eric's ready for them (and I will upload them to my site later). "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
William Send message Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
Yes, we know all that in its glory: estimates, credit, APR, the whole rigmarole. That's not my point. The question was 'how does resource share affect how BOINC distributes available resources across projects?' And that, in steady-state operation, is a simple % time allocation, with CPU and GPU time weighted (factored) according to peak_flops from the CPU benchmark and manufacturer peak_flops for the GPU. That's how it's done, and the whole messy, inadequate CreditNew loop doesn't come into it at that point. The system has its very own shortcomings, but credit and estimates are not part of it, and you got that wrong in your initial answer to KLiK. Estimates do indeed come into play when rr_sim finds HP/EDF is needed. But only to schedule priority. A person who won't read has no advantage over one who can't read. (Mark Twain) |
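William's steady-state point fits in a few lines. A minimal sketch of the % time allocation; the project names and the 10/100 split are illustrative, echoing the shares KLiK is assumed to have set:

```python
# Sketch of the steady-state split: resource share is a plain percentage
# of (peak_flops-weighted) crunch time across projects with work available.

def time_allocation(shares):
    total = sum(shares.values())
    return {project: share / total for project, share in shares.items()}

# Assumed shares of 10 and 100: beta gets 10/110 (~9.1%) of the weighted
# time, main the remaining ~90.9%. No credit or estimate enters into it.
alloc = time_allocation({"beta": 10.0, "main": 100.0})
```

As William notes, a project with no suitable app simply drops out of the denominator until one appears.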
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
At some point I'd like to look into the code to find out whether Resource Share - which is a function of REC - is thereby a function of peak flops (as the white paper says), or of some guesstimated fraction of peak flops, or of something derived from the real world via APR. But that requires a codewalk, and is properly for another thread. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Estimates do indeed come into play when rr_sim finds HP/EDF is needed. But only to schedule priority. And also to inhibit additional work fetch, which is a problem if the project+app on board has unconverged, overinflated estimates (likely on a new host+app). It's possible to give a stuck anonymous-platform project+app a kick with a realistic <flops> entry, but given KLiK (I imagine) is running beta as stock, he'd likely have to just keep processing beta work and hope the times converge; then resource share and work fetch should return to normal. (Assuming he installed the Linux app under anonymous platform, which I don't think he had from what I could tell, and he posted about stock delays accordingly.) "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
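For anonymous-platform users, the <flops> kick mentioned here goes in app_info.xml. A hedged fragment: the app name, version, plan class and the 1.0e11 (100 Gflops) figure are placeholders for illustration; a sensible value comes from the host's own observed throughput, not from this example.

```xml
<!-- Hypothetical app_info.xml fragment: names, version number and the
     <flops> figure below are placeholders for illustration only. -->
<app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
    <plan_class>cuda60</plan_class>
    <avg_ncpus>0.05</avg_ncpus>
    <flops>1.0e11</flops>   <!-- realistic value: the host's observed speed -->
    <coproc>
        <type>NVIDIA</type>
        <count>1</count>
    </coproc>
    <file_ref>
        <file_name>setiathome_x41zi_x86_64-pc-linux-gnu_cuda60</file_name>
        <main_program/>
    </file_ref>
</app_version>
```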
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
KliK (I imagine) is running beta as stock He was when I checked yesterday. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
At some point I'd like to look into the code to find out whether Resource Share - which is a function of REC - is thereby a function of peak flops (as the white paper says), or of some guesstimated fraction of peak flops, or of something derived from the real world via APR. Will have to scan the most recent changes at leisure (since I resynced my Git clone earlier from a 6-month-old clone), but if unchanged since then it should still be avp->flops, which (from memory) was initialised to a fraction of peak flops, or read from an app_info <flops> entry. That'll need checking, though, since avp->flops is accessed or manipulated in about 19 of the .cpp files in client, including rr_sim and work fetch. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
William Send message Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
At some point I'd like to look into the code to find out whether Resource Share - which is a function of REC - is thereby a function of peak flops (as the white paper says), or of some guesstimated fraction of peak flops, or of something derived from the real world via APR. IIRC REC draws off manufacturer GPU peak_flops and benchmark CPU peak_flops. We are talking global flops for scheduling, not app flops. A person who won't read has no advantage over one who can't read. (Mark Twain) |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
It would seem odd to me to use an imaginary figure when a real one is on hand, but whatever. I've traced the REC updates, and they point to a new mystical number I haven't found the origin of yet, called 'relative_speed'. [Edit:] A bit later: looks like relative_speed is initialised to coprocs.coprocs[i].count * 0.2 * coprocs.coprocs[i].peak_flops / cpu_flops, giving a unitless ratio of estimated GPU speed to host single-CPU-core speed (likely Whetstone). Then later, for REC accounting, it is turned into a total peak estimate by multiplying by host p_fpops (Whetstone). So the same as CreditNew's initial [new project app] estimate, only 4x faster. Probably won't matter for work fetch itself single-project, as it should over-request due to optimism, though I can easily see those over-requests clogging up project switching if fulfilled. Typically single-instance GPU app efficiency is in the realm of 4-10% (10% being the compute efficiency of the extremely highly optimised CUFFT library functions at the sizes we use, for example). I do not know why one might want to use a 5% efficiency estimate on one hand, but a 20% efficiency estimate on the other. That's not in the comments. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
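The relative_speed initialisation traced above is easy to sanity-check. A sketch; names mirror the description in the post rather than the actual client structs, and the device numbers are invented:

```python
# Sanity check of the relative_speed initialisation quoted above:
# count * 0.2 * gpu_peak_flops / cpu_flops.

def relative_speed(gpu_count, gpu_peak_flops, cpu_flops):
    # Unitless ratio of estimated GPU speed to one CPU core,
    # with the fixed 20% efficiency factor under discussion.
    return gpu_count * 0.2 * gpu_peak_flops / cpu_flops

# One assumed 1 Tflop GPU against an assumed 5 Gflops Whetstone core:
# 0.2 * 200 = 40 "CPU cores' worth". At a 5% efficiency assumption
# (CreditNew's initial estimate) it would be 10 -- the 4x gap noted above.
rs = relative_speed(1, 1.0e12, 5.0e9)
```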
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Device peak flops 9638 (GFLOPS). APR 930-1000 (varying). What is the real performance? To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Device peak flops 9638. For your 'special' application ~10% of theoretical peak is about right (so about 1 TFLOPS, [the averaging used is pretty volatile though]). With other improvements you mentioned to me, plus some of my own special sauce, I believe we will approach/pass 20% in the future. That's with some new algorithms, though. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
fractal Send message Joined: 5 Mar 16 Posts: 5 Credit: 1,000,547 RAC: 0 |
Hi! Thanks for the config. It works for me as well on a pair of 750 Tis using driver 355.11 and Ubuntu 12.04. I found that a single work unit per card kept GPU utilization around 80%. Two units per card kept GPU utilization around 90%, and three units per card kept the cards at 98/99% without increasing run times to 3x the single-unit time. One gotcha was the need to "chmod +x setiathome_x41zi_x86_64-pc-linux-gnu_cuda60" after "p7zip -d setiathome_x41zi_x86_64-pc-linux-gnu_cuda60.7z" so BOINC would run it. |
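For reference, running two or three tasks per card as described above is commonly configured with an app_config.xml in the project's directory (on reasonably recent BOINC clients, followed by a client restart or a "read config files" from the Manager). A hedged example: the app name below is a placeholder and must match the project's actual app name.

```xml
<!-- Hypothetical app_config.xml; the app name is a placeholder. -->
<app_config>
    <app>
        <name>setiathome_v8</name>
        <gpu_versions>
            <gpu_usage>0.33</gpu_usage>  <!-- three tasks share one GPU -->
            <cpu_usage>0.2</cpu_usage>   <!-- CPU fraction per GPU task -->
        </gpu_versions>
    </app>
</app_config>
```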
KLiK Send message Joined: 31 Mar 14 Posts: 1304 Credit: 22,994,597 RAC: 60 |
All of which is fascinating, in a slow-motion-car-crash sort of way, but doesn't really address KLiK's question about not getting Linux 64 -Nividia-V8-MB tasks (because of the app not being ready yet), or how many he will run when it is ready. well, we (those of us running Linux also) have to run BETAs to develop an app which works in v8... ;) but, don't know why the BOINC didn't pick any of v7 AP?! :/ non-profit org. Play4Life in Zagreb, Croatia, EU |
William Send message Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
All of which is fascinating, in a slow-motion-car-crash sort of way, but doesn't really address KLiK's question about not getting Linux 64 -Nividia-V8-MB tasks (because of the app not being ready yet), or how many he will run when it is ready. host ID? If there is a suitable app, have you checked your preferences that you allow AP? A person who won't read has no advantage over one who can't read. (Mark Twain) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.