Message boards : News : Tests of new scheduler features.
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0
AstroPulse v6 6.06 windows_intelx86 (opencl_ati_100): the server didn't fix itself. The faster app still gets no tasks and can't complete its 10 eligibles.
Joined: 15 Mar 05 · Posts: 1547 · Credit: 27,183,456 · RAC: 0
I see what the problem is. Because of the OpenCL driver version issues on BOINC 6, the ati_opencl_100 app has been deprecated. It will only get sent if your machine can't run the opencl_ati_100 app. (And since they are really the same app, their average processing rate is really the same.)
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0
I see what the problem is. ...

Thanks for the explanation. Sure, they are the same apps; I raised this question only within the bounds of this thread - to test the new scheduler features per se. And another question regarding the same host:

Brook+: AstroPulse v6 6.06 windows_intelx86 (cal_ati)
Number of tasks completed: 8
Max tasks per day: 41
Number of tasks today: 0
Consecutive valid tasks: 8
Average processing rate: 57.077599264502
Average turnaround time: 1.15 days

OpenCL: AstroPulse v6 6.06 windows_intelx86 (opencl_ati_100)
Number of tasks completed: 124
Max tasks per day: 223
Number of tasks today: 62
Consecutive valid tasks: 190
Average processing rate: 580.39346933633
Average turnaround time: 0.45 days

The possible issue I see here: Brook+ did not get its 10 eligibles. Its APR is low, so choosing OpenCL for distribution is the right decision - but should that be locked in before 10 eligible results for all apps?
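The selection rule being questioned can be sketched as follows - a minimal illustration, not the actual BOINC scheduler code. The function and field names are hypothetical; only the APR and consecutive-valid figures come from the stats above:

```python
# Hypothetical sketch of app-version selection: until a version has enough
# consecutive valid results, its APR is not considered reliable, so the
# scheduler arguably should keep sending it work rather than locking in
# the faster version early.

MIN_ELIGIBLE = 10  # results needed before APR is trusted

def pick_version(versions):
    """versions: list of dicts with 'name', 'consecutive_valid', 'apr'."""
    untrusted = [v for v in versions if v["consecutive_valid"] < MIN_ELIGIBLE]
    if untrusted:
        # At least one version still needs calibration results.
        return min(untrusted, key=lambda v: v["consecutive_valid"])
    # All versions calibrated: pick the highest average processing rate.
    return max(versions, key=lambda v: v["apr"])

versions = [
    {"name": "cal_ati", "consecutive_valid": 8, "apr": 57.08},
    {"name": "opencl_ati_100", "consecutive_valid": 190, "apr": 580.39},
]
print(pick_version(versions)["name"])  # cal_ati - it still lacks its 10 eligibles
```

Under this rule the Brook+ version would keep receiving tasks until its 10 eligibles are in, which is the behaviour the post is asking about.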
Joined: 3 Jan 07 · Posts: 1451 · Credit: 3,272,268 · RAC: 0
Slightly off-topic for this thread, but it is news, and I can't open a separate thread for it - feel free to move. This one - provided it works, and after testing - looks useful, and particularly so for stock apps (which get to users who don't have the slightest interest in tweaking and micro-managing):

http://boinc.berkeley.edu/trac/changeset/b98bc309cceccf95b9fac578c47cbea06a8b8150/boinc-v2
http://boinc.berkeley.edu/trac/changeset/dc43f0b3752229692650e1507de54d26c2c9eb2a/boinc-v2

API: fix bug involving suspend and critical sections

(I suspect we hit a git commit limit at that point.)

There has been a spate of reports of applications not suspending when users wish to use their computers for something else - it seems to affect mostly OpenCL applications, at both Einstein and SETI. Bearing in mind Joe's warning about using David's code before the bugs have been chased out, could we have at least a look at this, please?
Joined: 11 Dec 08 · Posts: 198 · Credit: 658,573 · RAC: 0
... Cross-posting my response from CA
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0
As I understand it, these are API changes, so an app rebuild is required for them to take effect. I'd better wait until "Einstein" confirms that the problem is solved. For now I use a Sleep loop for suspend, without exit logic, so the fixed behavior on Linux should be the same as my current implementation on Windows. I don't think I'll rebuild in any haste.
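The "Sleep loop for suspend without exit logic" can be sketched roughly as follows - a hypothetical Python illustration of the idea only, not Raistmer's actual Windows implementation (a real BOINC app would read the suspend state from the BOINC API rather than a local flag):

```python
import threading
import time

# Simulated "client asked us to suspend" flag; in a real BOINC app this
# state would come from the BOINC API, not from a local Event.
suspend_requested = threading.Event()

def process_chunk(i):
    pass  # placeholder for one unit of real work

def worker(total_chunks):
    """Process chunks, idling in a sleep loop while suspended (no exit)."""
    done = 0
    for i in range(total_chunks):
        while suspend_requested.is_set():
            time.sleep(0.01)  # do nothing until the client resumes us
        process_chunk(i)
        done += 1
    return done

print(worker(3))  # 3
```

The design choice the post describes: instead of exiting on suspend and restarting from a checkpoint, the worker simply idles, so resume is immediate and no state is lost - which is why the API's suspend/critical-section fix would not change its behaviour.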
Joined: 18 Jan 06 · Posts: 1038 · Credit: 18,734,730 · RAC: 0
As I understand it's API changes so app rebuild required for them to work.

As I had tested successfully on Linux when you implemented yours on Windows. If anyone wants to try out how Raistmer's Sleep loop works on Linux, one could use the AstroPulse 6.07 OpenCL apps for AMD/ATI or NVIDIA GPUs currently being tested here at Beta, which already contain the fix (r1844 if you want to look it up in stderr). _\|/_ U r s
Joined: 18 Aug 05 · Posts: 2423 · Credit: 15,878,738 · RAC: 0
SETI v6: SETI@home Enhanced (anonymous platform, NVIDIA GPU)
Number of tasks completed: 32977
Max tasks per day: 684
Number of tasks today: 1
Consecutive valid tasks: 584
Average processing rate: 246.36690806125
Average turnaround time: 0.77 days

SETI v7: SETI@home v7 (anonymous platform, NVIDIA GPU)
Number of tasks completed: 539
Max tasks per day: 705
Number of tasks today: 17
Consecutive valid tasks: 605
Average processing rate: 159.63219097129
Average turnaround time: 1.01 days

Still very far from convergence. Since the same app is used, I would expect more similar APRs if v7 task credit granting were OK. Obviously it's still not OK.
Joined: 14 Feb 13 · Posts: 606 · Credit: 588,843 · RAC: 0
SETI v6: ...

and I would have thought that with a larger rsc_fpops_est to account for longer runtimes, APR would automatically be smaller. That said, APR is really the ratio of estimated operations to actual runtime. IOW runtimes increased more than estimated. With CreditNew, if apps appear less efficient, that would certainly account for less credit. Ouch.

Eric, can you perhaps apply another 30% increase of rsc_fpops_est across all ARs? I think that's the one screw you can turn to change credit awarded. We certainly know that when other projects use insanely high rsc_fpops_est values, the tasks do get awarded a LOT of credit. Inversely, if rsc_fpops_est is small it gives little credit. So if rsc_fpops_est wasn't increased as much as runtimes increased...

A person who won't read has no advantage over one who can't read. (Mark Twain)
Joined: 11 Dec 08 · Posts: 198 · Credit: 658,573 · RAC: 0
SETI v6: ...

An unfortunate fact of figuring the APR as a function of operations (computation) only is that, in the simplest serial interpretation, it ignores the communication (memory access) complexity component of serial runtime. Since multibeam is multipass (4 passes after chirp in v6, 5 passes for v7 after adding autocorrelation), then assuming ideal spatio-temporal data locality, the communication component is increased 25%. In the current GPU autocorrelation implementation this is a 4N FFT dataset, so reasonably an ideal estimate would be somewhere around double runtime where memory bound (until caches exceed the dataset size x 4, or a much more memory-efficient AC implementation is devised), or between a 50-100% elapsed increase when memory bound (25-50% APR reduction).
Joined: 14 Feb 13 · Posts: 606 · Credit: 588,843 · RAC: 0
An unfortunate fact of figuring the APR as a function of operations (computation) only ...

I don't think I understood a single word. Bottom line please? My reasoning for the 30% increase is the feeling that tasks are paying 70 but should be paying more like 100 to keep in the region of what v6 was paying - that's, err, 50% more. You win.
Joined: 11 Dec 08 · Posts: 198 · Credit: 658,573 · RAC: 0
Bottom line please?

All mine really says is that it isn't an 'average processing rate' at all, but more like some estimate of operations divided by elapsed time, which doesn't account for much of the actual work done.
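That reading of APR - estimated operations divided by elapsed time - can be illustrated with a toy calculation. The numbers here are hypothetical, not taken from any host above:

```python
# APR is effectively rsc_fpops_est / elapsed_time, scaled to GFLOPS; it
# reflects only the *estimated* operations, not the work actually done.
def apr_gflops(rsc_fpops_est, elapsed_seconds):
    return rsc_fpops_est / elapsed_seconds / 1e9

# If runtime grows more than rsc_fpops_est was raised, APR falls,
# even though the app is doing strictly more work:
est_v6, runtime_v6 = 20e12, 80.0   # hypothetical v6 task
est_v7, runtime_v7 = 22e12, 140.0  # +10% estimate, +75% runtime
print(apr_gflops(est_v6, runtime_v6))  # 250.0
print(apr_gflops(est_v7, runtime_v7))  # ~157
```

Which is exactly the pattern in the v6/v7 stats quoted earlier: a higher real workload showing up as a *lower* "processing rate".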
Joined: 14 Feb 13 · Posts: 606 · Credit: 588,843 · RAC: 0
Bottom line please?

One number to rule them all, one number to find them, one number to bring them all and in the darkness bind them.
Joined: 14 Feb 13 · Posts: 606 · Credit: 588,843 · RAC: 0
Bottom line please?

In CreditNew, credit and runtime estimates are interlinked. Let me see if I still have a beta (stock) task - calculations under anon work a bit differently. I'm taking this down to the very trivial level and working back up.

<flops>10877539382.948120</flops> for the CPU, which is APR x 1e9; the task shows
<rsc_fpops_est>183887307748840.000000</rsc_fpops_est>
rsc_fpops_est / flops = 16905.23, which, given that APR is reasonable and above 10 valids, is a good estimate.

In the other direction, the server calculates APR from what it knows as rsc_fpops_est for the WU and the runtime it receives. So far so good. Crucially, though, it also makes use of that figure to have a shot at credit awarded - and that's where it all goes haywire. I really wanted to avoid having to walk that code :(

We know from other projects that high rsc_fpops_est leading to high APR also leads to high credit. I think we are currently seeing the inverse on SETI main - APR for v7 is lower than for v6 and credit is also (a lot) lower. I _think_ that if rsc_fpops_est were increased, thereby increasing APR, credit should rise as well.

I can't find the figure for how much runtime AC adds locally (fixed across all ARs), but I currently see an APR of 28 for v6 and 18 for v7 - that's anon with identical apps, just the extra time for AC. The other host has 135 and 80.

People are really sensitive about this 'getting paid less'. They feel that longer-running tasks should earn them more credit. At the very least v7 tasks should get similar credit to v6 - not less. So, keeping in mind that rsc_fpops_est is the single screw we can (sorry, Eric can) turn, I'm led to believe that (thanks Jason) doubling it, thereby doubling APR, _should_ lead to credit that is maybe 10-20% above that of v6. I'm just guesstimating. The question is really whether you want to go down that route, Eric. It's a bit radical, but it should work.

I mean, with a bit of looking around, also a figure that should lead to about equal RAC - but personally I think to heck with the numbers; most other projects 'pay' better anyway... I might add that I look at this from the GPU perspective; values for CPU look different. OTOH nobody is going to complain if they get more credit!
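The arithmetic in the post can be checked directly - this just reproduces the division quoted above, with the values copied from the post:

```python
# The client's runtime estimate is the task's rsc_fpops_est divided by
# the app version's measured speed in flops (values from the post above).
flops = 10877539382.948120          # <flops> for the CPU app
rsc_fpops_est = 183887307748840.0   # <rsc_fpops_est> for the task
estimate_seconds = rsc_fpops_est / flops
print(round(estimate_seconds, 2))   # 16905.23 - matches the figure quoted
```

So ~16905 seconds, i.e. roughly 4.7 hours of estimated CPU runtime for that workunit.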
Joined: 10 Feb 12 · Posts: 107 · Credit: 305,151 · RAC: 0
William, when you say 'all ARs' are you talking about GPUs only? If not, there may be a small hole in your theory. APRs for CPUs, AFAICT, are very healthy; only GPU APRs are suffering. IOW, I know you are saying "let them eat cake" and that Eric actually has "cake" to give us, but you need to find a way of doing it that does not mess with CPU APRs. Then there's always a chance I've horribly misunderstood something :)

Edit: unless of course you guys have made the v7 CPU apps at least 100% faster than v6. But for the moment I'm assuming you made them 10-20% faster and that their APR is showing correctly.
Joined: 3 Jan 07 · Posts: 1451 · Credit: 3,272,268 · RAC: 0
I think I agree with William that the basic underlying flaw with CreditNew is that it is unduly sensitive to the values chosen for rsc_fpops_est. In the old, old days we had DCF to fine-tune any side-effects from inaccurate rsc_fpops_est: we took a lot of care with the relative values at different ARs, but the *absolute* value didn't matter much. As a result, when optimisations were transferred into the stock apps, the initial runtime estimates were too high - as a rule of thumb, the old stock CPU apps ran at a DCF of ~0.2, which suggests rsc_fpops_est was roughly 5x too high, and we all - David included - normalised to that. (Different considerations apply to Astropulse, where Josh aimed for a stock DCF of 0.4 with his very first - far from optimal - release.)

Turning to the new v7 release: yes, we've added a compensatory nudge to rsc_fpops_est to account for the extra work of autocorrs - but unless I've missed it, we didn't adjust for the extra optimisation in the stock CPU apps (some via Joe's contributions to the core code, but also from the switch to libFFTW v3.3). That might mean that rsc_fpops_est is now even further from the 'neutral' value (equivalent to DCF=1.0) that I suspect CreditNew (silently) depends on.

Note that I've only mentioned the CPU apps. There was an interesting little side-comment buried in the "Let's argue about RAC" thread on the Main board, immediately before my post about the drop not being deliberate:

I guess that is why no other gpu enabled project (that I know of) uses credit new.

If that's true (and I think I might ask him privately what led to that observation - data would be useful here), then I suspect that either CreditNew has never been deployed in earnest in a predominantly GPU environment due to administrative nervousness, or it has been tested and abandoned because of flaws.

This project - the Main project, that is - has become much closer to a predominantly GPU project in the three years since CreditNew was first rolled out in rather a panic. I'm coming more and more to the view that CreditNew has never been properly reviewed and assessed since launch, and contains serious flaws for situations other than simple CPU-only projects.

I was a participant in the AQUA multi-threaded project, and watched it blow up - spectacularly - when a test app was deployed with very bad rsc_fpops_est values. Credit grew exponentially into the millions and (IIRC) hundreds of millions. But very soon afterwards AQUA withdrew from using BOINC entirely, and the problem was never fully explored and explained. Over the weekend, Milkyway - after six months of nagging by me - have finally got their multi-threaded application properly deployed, and they're using silly numbers for rsc_fpops_est too (I've just completed a task with a FOUR YEAR estimated runtime). I'm not sure if they use the full-blown CreditNew package, but I'll keep an eye on credit and report back.
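The DCF point can be made concrete with a toy calculation - hypothetical numbers, sketching how the old client scaled its raw estimate by the duration correction factor, so that only the *relative* rsc_fpops_est values mattered:

```python
# Old-style estimate: the client multiplies its raw estimate by the
# host's duration correction factor (DCF), which converges toward the
# ratio of actual to estimated runtime. Numbers below are invented.
def estimated_runtime(rsc_fpops_est, flops, dcf):
    return dcf * rsc_fpops_est / flops

raw = estimated_runtime(50e12, 10e9, dcf=1.0)        # 5000 s raw estimate
corrected = estimated_runtime(50e12, 10e9, dcf=0.2)  # ~1000 s after DCF
print(raw, corrected)  # a stable DCF of 0.2 implies the estimate was ~5x too high
```

With DCF gone, an rsc_fpops_est that is 5x too high has nothing correcting it, which is why CreditNew's sensitivity to the absolute value matters.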
Joined: 10 Feb 12 · Posts: 107 · Credit: 305,151 · RAC: 0
...but unless I've missed it, we didn't adjust for the extra optimisation in the stock CPU apps (some via Joe's contributions to the core code, but also from the switch to libFFTW v3.3). That might mean that rsc_fpops_est is now even further from the 'neutral' value (equivalent to DCF=1.0) that I suspect CreditNew (silently) depends on.

This is pretty much what I was trying to say above. Forgot to mention, though, that I was also assuming CreditNew was silently using the CPU as its benchmark. To rephrase my comment above: it appears (shockingly) that CreditNew is actually working correctly as far as the CPU apps are concerned.
Joined: 14 Feb 13 · Posts: 606 · Credit: 588,843 · RAC: 0
William, when you say 'all ARs' are you talking about GPUs only? If not there may be a small hole in your theory.

I am merely proposing a very radical band-aid for a completely screwed algorithm. Trying to find a good picture for it: let's say you build a tall house. The statics have been utterly miscalculated and the builders used bad materials - foundation too soft, beams uneven, that sort of thing. As a result your tall house leans so much to one side that it's in danger of collapse, and indeed people on the upper floors have already left. People near the bottom don't notice much. Now the proper thing to do would be to fire the statics guy, redo the calculations and get a proper building crew to demolish and rebuild. But until then, find a BIG lever and push the top of the house back into a vertical position.

Medium to long term, the only solution I see is something I've been putting off for over two years: walk the bloody CreditNew code, find a good statistics book, get the backing of two projects with a lot of leverage, and recode. The short-term solution is a simple fix that should lead to higher credit.

Or to speak in medical terms: the patient has terminal cancer and is in horrible pain. He really needs surgery. But what he needs NOW is a large dose of morphine to take the pain away :D
Joined: 3 Jan 07 · Posts: 1451 · Credit: 3,272,268 · RAC: 0
One small brick to add to the foundations. CreditNew (and much of BOINC's operation) depends on measuring and recording time. We have two timings available to us - elapsed time and CPU time. As far as credit is concerned, for CPU apps they're pretty comparable, and to a first approximation either could be used. For GPU apps, CPU time is useless, and - in the absence of recorded GPU time - we have to use elapsed time instead. For multithreaded apps, the elapsed time is useless, and CPU time is king - sorry, queen. I do hope you find that properly catered for in the code.
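The post's classification can be summarised as a small sketch - a hypothetical helper, not BOINC code, and the 4-thread example numbers are invented:

```python
# Which measured time is a meaningful basis for credit depends on the
# kind of application, per the reasoning in the post above.
def credit_time_basis(app_type, elapsed, cpu_time):
    if app_type == "cpu":
        return elapsed      # elapsed ~ cpu_time; either would do
    if app_type == "gpu":
        return elapsed      # cpu_time is near zero and useless
    if app_type == "multithread":
        return cpu_time     # elapsed understates the total work done
    raise ValueError(f"unknown app type: {app_type}")

# A 4-thread task: 100 s of wall clock, but 390 s of CPU time across cores.
print(credit_time_basis("multithread", elapsed=100.0, cpu_time=390.0))  # 390.0
```

If a credit system uses elapsed time across the board, the multithreaded case is the one that blows up, which is consistent with the AQUA and Milkyway anecdotes above.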
Joined: 10 Feb 12 · Posts: 107 · Credit: 305,151 · RAC: 0
Ah, OK, we are on the same page. A "Band-Aid" solution is exactly what I was trying to say! (I just thought you hadn't realized it, so I was trying to point out that it may be "nice" but not "correct".) But since you already knew that, I can shut up now and let you grown-ups do your wizardry and get to the bottom of this thing :)
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.