Message boards :
Number crunching :
near vlar? (AR 0.01) super slow on NVIDIA gtx 780
Message board moderation
Author | Message |
---|---|
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
I've had a few MB tasks that run about 5000 seconds when running 3 at a time. The normal run time for a multibeam task is 450-1100 seconds when running 3 at a time. The slow ones are 'near' vlars I guess? To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Go to your account page, then click on "show your computers". Click on the tasks for the computer in question and find the task you are talking about. On the far left, under "Task click for details", click the corresponding number and it will show you the name of the task and the stderr report. Look at the name at the top of the page: at the very end it may or may not say VLAR. Example: 28no12ad.32157.136740.438086664206.12.235.vlar_0 I checked some of yours, and the ones running over 5000s tend to be VLARs. I suspect you already knew how to find this (but I've put the directions here for any new person who doesn't already know how to find them and wants to check their own results). Zalster |
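Editor's note: the naming convention Zalster describes can be checked programmatically. A minimal sketch, assuming task names follow the pattern shown above (a ".vlar" tag just before the trailing "_N" result suffix); the function name `is_vlar` is my own, not project code:

```python
def is_vlar(task_name: str) -> bool:
    """Return True if a SETI@home task name carries the .vlar tag."""
    # Strip the trailing result suffix like "_0" or "_1", then check the tag.
    base, _, _ = task_name.rpartition("_")
    return base.endswith(".vlar")

print(is_vlar("28no12ad.32157.136740.438086664206.12.235.vlar_0"))  # True
print(is_vlar("28no12ad.32157.136740.438086664206.12.235_1"))       # False
```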
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Thank you, Zalster. I knew that, but had had no time to check that they were actual vlars. And a real vlar (when processed) slows down the other tasks running on the same GPU at the same time. Now I'd like to know how to make a shortened version of a vlar that I captured, so that I can profile it under nvprof. If I run it as it is, it generates too much profiling data. Joe? Raistmer? Anyone? How to make a shortened wu? And then there remains a question: why did I get some vlars for my NV.. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
How to make a shortened wu? I believe Joe makes them primarily by adjusting the chirp limits in the WU file or some such. There are some VLARs amongst the shortened test tasks in the Lunatics downloads, and some full-length ones. What I'd expect you'd find is that the complex/messy pulsefind kernels, particularly for the short FFTs (long pulse PoTs with many periods), are rubbish. That's something that needs to be completely re-engineered for x42, along with the requisite updating to use CUDA streams properly etc. for solving the driver latency issues. For other portions I'm moving to kernel-fusion techniques to exploit the caches, though I haven't worked out how to fuse pulsefinding in yet. At minimum, for locality purposes, the 3-, 4- and 5-period kernels should probably be fused and broken up into blocks such that the sums are as local as possible (i.e. runs on the powers of two that are nearby in all three base periods). If you are able to confirm or reject the finding that, with pulses, the vertical strided accesses are at the core of the issues (mostly at the shortest FFT lengths), then I have a lot of the design work complete to undo that, along with a couple of years of testing various access methods, maximising bandwidth, reducing latencies and automatic scaling. I estimate that the pulsefinding portions should be capable of being about 2-10x faster, depending on architecture and task parameters (not including gains from kernel fusion, streaming chirps etc.). If you want to help accelerate that direction, that would be most excellent, especially since I've been tied up with non-SETI stuff for some time. If you reach different findings, or solve the issues in a simpler way, that would be even better. And then there remains a question why did I get some vlars for my NV.. Good question. It *could* be collateral damage from when I requested that the Cuda 3.2 stock application be stopped from issue to Kepler/Maxwell class cards. I'll drop Eric an email.
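Editor's note: the period folding at the heart of pulsefinding can be sketched in a few lines. This is an illustration only, not the actual x41/x42 kernel code; `fold_pot` and the integer-period simplification are assumptions. It shows why the data for one trial period is touched with a fixed stride, which on a GPU becomes the strided-access pattern discussed above:

```python
import numpy as np

def fold_pot(pot: np.ndarray, period: int) -> np.ndarray:
    """Co-add a 1-D power-over-time array at an integer trial period.

    Element i of the result sums pot[i], pot[i+period], pot[i+2*period], ...
    -- i.e. a strided walk through memory with stride = period.
    """
    n = (len(pot) // period) * period          # trim to a whole number of folds
    return pot[:n].reshape(-1, period).sum(axis=0)

rng = np.random.default_rng(0)
pot = rng.random(1024)
folded = fold_pot(pot, 3)                      # the "3 period" case
print(folded.shape)                            # (3,)
```

In the real application the trial periods are not integers and many FFT lengths are searched, but the memory-access shape is the same: short FFTs mean long PoTs and many strided passes, which is where the bandwidth goes.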
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13744 Credit: 208,696,464 RAC: 304 |
And then there remains a question why did I get some vlars for my NV.. If that request was implemented yesterday, then that was probably what did it. Yesterday was when VLARs started coming through to all GPUs. Grant Darwin NT |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
And then there remains a question why did I get some vlars for my NV.. Some correspondence since suggests that some unrelated BOINC server scheduler updates were applied, and apparently some of multibeam's customisations were blitzed in the process. So it's unrelated to the Cuda 3.2 issue; it turns out to be coincidence. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
Apparently some of multibeam's customisations were blitzed in the process. So unrelated to the Cuda 3.2 issue, it turns out to be coincidence. Will this be fixed soon? I have noticed that, although I am chugging along, the watt usage of my GPUs is way down (and GPU temps down 5-10C) and my RAC is rapidly decreasing as well. And I also noticed at least one WU where the time elapsed was increasing, but so was the time remaining. Coincidence? Or a reflection of the problem? |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Apparently some of multibeam's customisations were blitzed in the process. So unrelated to the Cuda 3.2 issue, it turns out to be coincidence. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13744 Credit: 208,696,464 RAC: 304 |
And I also noticed at least one WU where the time elapsed was increasing, but so was the time remaining. Coincidence? Or a reflection of the problem? Reflection of the problem. Actual run time is 3 times the estimated run time. End result: after a while, remaining time will start increasing once the manager realises that the estimated time isn't even remotely close to how long it will actually take. Eventually it gets to the point where time remaining starts to count down again, instead of up. Grant Darwin NT |
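Editor's note: a toy model of why "time remaining" can rise while "time elapsed" rises. This is an assumption about the behaviour, not BOINC's exact code: suppose the client blends the static server estimate with a progress-based estimate as fraction done grows, then multiplies by the work left:

```python
STATIC_ESTIMATE = 1000.0   # server's estimated run time, seconds (assumed)
TRUE_RUNTIME    = 3000.0   # what the VLAR actually takes (3x, as above)

def remaining(fraction_done: float) -> float:
    """Blend static and progress-based estimates, scaled by work left."""
    progress_based = TRUE_RUNTIME   # equivalent to elapsed / fraction_done
    blended = (1 - fraction_done) * STATIC_ESTIMATE + fraction_done * progress_based
    return blended * (1 - fraction_done)

for f in (0.05, 0.15, 0.25, 0.5, 0.9):
    print(f"{f:.0%} done -> {remaining(f):.0f}s remaining")
```

With these numbers the reported remaining time climbs until about a quarter done, then counts down again: exactly the rise-then-fall Grant describes.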
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
And I also noticed at least one WU where the time elapsed was increasing, but so was the time remaining. Coincidence? Or a reflection of the problem? The inability of the boinc time estimation to cope well with different types of work is one of a number of (increasingly studied and documented) issues with the scheduling mechanism. I've been working with Albert/Einstein quietly (though currently on hiatus) to devise comprehensive fixes for that. These issues affect both task scheduling via estimation (primary importance, includes work fetch) and credit mechanisms (considered secondary, but a reflection on the underlying malaise) [Edit:] just managed to dig out some of the initial findings from last year: https://wiki.atlas.aei.uni-hannover.de/foswiki/bin/view/EinsteinAtHome/BOINC/EvaluationOfCreditNew Introduction "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
JakeTheDog Send message Joined: 3 Nov 13 Posts: 153 Credit: 2,585,912 RAC: 0 |
What's a vlar? I did a quick forum search and might have missed a description. I think this is the first time I've noticed them in my task results, and the first time I've seen MB tasks take this long on my GPU, and the estimated timer keep going up. And first time I've seen GPU tasks create a barely noticeable lag in my PC, like switching tabs on my browser or scrolling up or down in any application. |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
What's a vlar? I did a quick forum search and might have missed a description. I think this is the first time I've noticed them in my task results, and the first time I've seen MB tasks take this long on my GPU, and the estimated timer keep going up. And first time I've seen GPU tasks create a barely noticeable lag in my PC, like switching tabs on my browser or scrolling up or down in any application. Very Low Angle Range....signals recorded when the telescope at Arecibo is scanning very slowly or changing direction. They process very poorly on nVidia GPUs and bog them down badly. That's why you see the lag. Normally, the server does not send them out to GPUs, but a server code update earlier this week broke the patch that prevented that. Eric is working to fix the situation. Or re-fix it, as it were. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Thank you Jason, I'll check out how the Lunatics shortened wu's are made. That is just what I need: the maximum-size wu that nvprof can handle. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
And Eric has the fix in now. GPU cards should not be getting any more VLARs. Meow. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
And Eric has the fix in now. GPU cards should not be getting any more VLARs. There are others who have expressed that, but ATI users are in the minority, and since it would take a code rewrite from DA above, I think, to add the additional selection, it has never been done. Since nVidia users far outnumber ATI users, the project does what it has to do to avoid crippling a large percentage of its user base. I think you would have to petition DA in another forum to ask for a change. Meow. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
And Eric has the fix in now. GPU cards should not be getting any more VLARs. When I started testing r_177 years ago, VLARs were sent to ATI GPUs. So I'm certain it wouldn't be too hard to implement. I think it's a question of priority. I'm fine with everything. I'll crunch whatever my GPU obtains. With each crime and every kindness we birth our future. |
JakeTheDog Send message Joined: 3 Nov 13 Posts: 153 Credit: 2,585,912 RAC: 0 |
There are FOUR cats! -JLP |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
I'd imagine that there's a cost/benefit analysis in play here - where cost is measured in staff time and project outcomes, rather than traditional dollars. Sending VLARs to NVidia GPUs has a cost in project outcomes - less work is processed. That was addressed with the code that prevents VLARs being sent to GPUs, and has no further cost in staff time (except resolving problems like yesterday's glitch). Making the distribution to different GPUs more sophisticated - either by the project enabling ATI cards en masse for VLARs, or by enabling user selection - has a significant staff time cost, both to set it up, and possibly in ongoing maintenance (would they enable sending VLARs to intel_gpus? Probably yes. And to the next coprocessor that comes along?). And I don't currently see any benefit to the project in spending that staff time. The crossover might come if we ever reached a point where the proportion of VLARs in the telescope recordings grew so high, and the proportion of users allowing their CPUs to be used for SETI grew so low, that VLARs regularly backed up in the feeder and delayed work distribution. I think we've already seen that happen for a few hours at a time: it's rare, but it's worth keeping an eye on. |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
There are FOUR cats! -JLP I have 3, lost one a year ago. I have many more, who I see at the shelters from time to time... One day, another shall be mine. Kitty, kitty, don't go crazy on me.....one day fine I shall set you free. Just me being me,.... "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
The fix must have gone in just about midnight Berkeley time, as I was up on the East Coast busily aborting vlars that had been sent to my (Nvidia) GPUs (about 50 across my 2 crunchers) and noticed that the wattage of both machines jumped back up to normal (~650W) from the ~450W they had been running at. I lingered a bit, but no more vlars came to my GPUs, thankfully. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.