near vlar? (AR 0.01) super slow on NVIDIA gtx 780

Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1641224 - Posted: 13 Feb 2015, 14:17:29 UTC

I've had a few MB tasks that run about 5000 seconds when running 3 at a time; the normal run time for a multibeam task under the same conditions is 450-1100 seconds.

The slow ones are 'near' VLARs, I guess?
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1641224
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1641238 - Posted: 13 Feb 2015, 14:34:40 UTC - in response to Message 1641224.  
Last modified: 13 Feb 2015, 14:35:36 UTC

Go to your account page, click 'Show computers', then click on the tasks for the computer in question and find the task you are talking about.

In the far-left column, under 'Task click for details', click the corresponding number and it will show you the name of the task and the stderr report.

Look at the top of the page: next to 'Name' is the task name, and at the very end it may or may not say VLAR.

Example: 28no12ad.32157.136740.438086664206.12.235.vlar_0

I checked some of yours, and the ones running over 5000 s tend to be VLARs.

I suspect you already knew how to find this, but I've put the directions here for any new person who doesn't already know how to find them and wants to check their own results.
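
For anyone who'd rather script the check than click through: the tag can be spotted straight from the task name. This is just a toy C++ snippet of my own, not part of any SETI@home application, and the second name in it is made up purely for contrast.

    // Toy helper (not from any SETI@home app): flag tasks whose name carries the ".vlar" tag.
    #include <iostream>
    #include <string>
    #include <vector>

    static bool looksLikeVlar(const std::string& taskName)
    {
        // The tag sits just before the trailing "_<result number>".
        return taskName.find(".vlar") != std::string::npos;
    }

    int main()
    {
        std::vector<std::string> names = {
            "28no12ad.32157.136740.438086664206.12.235.vlar_0",
            "03mr10ab.12345.6789.5.10.101_1"   // hypothetical non-VLAR name, for contrast
        };
        for (const auto& n : names)
            std::cout << n << (looksLikeVlar(n) ? "   <-- VLAR" : "") << "\n";
        return 0;
    }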

Zalster
ID: 1641238
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1641324 - Posted: 13 Feb 2015, 17:17:18 UTC - in response to Message 1641238.  
Last modified: 13 Feb 2015, 17:18:56 UTC

Thank you, Zalster.

I knew that, but had not had time to check that they were actual VLARs. And a real VLAR (when processed) slows down the other tasks running on the same GPU at the same time.

Now I'd like to know how to make a shortened version of a VLAR that I captured, so that I can profile it under nvprof. If I run it as it is, it generates too much profiling data.
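
(One way to keep the nvprof output manageable in the meantime, without shortening the WU at all, is to record only a bracketed region. The sketch below uses the standard CUDA profiler API; processOneChirp is just a made-up stand-in for whichever call site is of interest, not anything in the real multibeam source.)

    // Sketch only: limit nvprof to a region of interest.
    // Run as:   nvprof --profile-from-start off ./app <args>
    #include <cuda_profiler_api.h>
    #include <cuda_runtime.h>

    void processOneChirp()            // hypothetical stand-in for the real call site
    {
        cudaProfilerStart();          // nvprof starts collecting here...
        // ... launch the pulsefind kernels for this chirp / FFT length ...
        cudaDeviceSynchronize();      // make sure the launched work is captured
        cudaProfilerStop();           // ...and stops here, keeping the trace small.
    }

    int main()
    {
        processOneChirp();
        return 0;
    }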

Joe? Raistmer? Anyone?

How to make a shortened wu?

And then there remains the question of why I got some VLARs for my NV..
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1641324
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1641344 - Posted: 13 Feb 2015, 17:55:54 UTC - in response to Message 1641324.  
Last modified: 13 Feb 2015, 18:00:18 UTC

How to make a shortened wu?

I believe Joe makes them primarily by adjusting the chirp limits in the WU file, or some such. There are some VLARs among the shortened test tasks in the Lunatics downloads, and some full-length ones.

What I'd expect you'd find is that the complex/messy pulsefind kernels, particularly for the short FFTs (long pulsepots with many periods), are rubbish. That is something that needs to be completely re-engineered for x42, along with the requisite updating to use CUDA streams properly, etc., to solve the driver latency issues. For other portions I'm moving to kernel fusion techniques to exploit the caches, though I haven't worked out how to fuse pulsefinding in yet.
At minimum, for locality purposes, the 3-, 4- and 5-period kernels should probably be fused and broken up into blocks such that the sums are as local as possible (i.e. runs on the powers of two that are nearby in all three base periods).
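
Purely to illustrate what kernel fusion buys (these are trivial toy kernels, nothing like the real pulsefind code): fusing two passes into one launch keeps the intermediate value in registers instead of sending it on a round trip through global memory.

    // Toy example of kernel fusion -- not the multibeam pulsefind kernels.
    __global__ void scaleKernel(const float* in, float* tmp, int n, float s)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) tmp[i] = in[i] * s;                 // pass 1 writes the intermediate to global memory
    }

    __global__ void addKernel(const float* tmp, float* out, int n, float b)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = tmp[i] + b;                // pass 2 reads it back from DRAM
    }

    // Fused version: one launch, the intermediate value never leaves a register.
    __global__ void fusedKernel(const float* in, float* out, int n, float s, float b)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float v = in[i] * s;                       // no round trip through global memory
            out[i] = v + b;
        }
    }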

If you are able to confirm or reject the finding that, with pulses, the vertical strided accesses are at the core of the issues (mostly at the shortest FFT lengths), then I have a lot of the design work complete to undo that, along with a couple of years of testing various access methods, maximising bandwidth, reducing latencies and automatic scaling. I estimate that the pulsefinding portions should be capable of being about 2-10x faster, depending on architecture and task parameters (not including gains from kernel fusion, streaming chirps, etc.).
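
To make the 'vertical strided access' point concrete, here is a toy pair of kernels, again nothing like the actual pulsefind code. Think of each row of a row-major array as one long power-over-time series: with one thread per row, neighbouring threads load addresses a whole row apart on every iteration, whereas the same sums over a transposed copy give fully coalesced loads.

    // Toy illustration of the access-pattern issue only.
    // BAD: one thread walks one row of a row-major matrix; within a warp the
    // simultaneous loads are 'width' floats apart, so little of each memory
    // transaction is useful.
    __global__ void rowSumStrided(const float* data, float* rowSums, int width, int height)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < height) {
            float s = 0.0f;
            for (int col = 0; col < width; ++col)
                s += data[row * width + col];
            rowSums[row] = s;
        }
    }

    // BETTER: the same sums from a transposed (column-major) copy; now the
    // threads of a warp load consecutive addresses on each iteration (coalesced).
    __global__ void rowSumCoalesced(const float* dataT, float* rowSums, int width, int height)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < height) {
            float s = 0.0f;
            for (int col = 0; col < width; ++col)
                s += dataT[col * height + row];   // neighbouring threads -> neighbouring addresses
            rowSums[row] = s;
        }
    }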

If you want to help accelerate that direction, that would be most excellent, especially since I've been tied up with non-SETI stuff for some time. If you reach different findings or solve the issues in a simpler way, that would be even better.

And then there remains the question of why I got some VLARs for my NV..


Good question. It *could* be collateral damage when I requested that the Cuda 3.2 stock application be stopped from issue to Kepler/Maxwell class cards. I'll drop Eric an email.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1641344
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1641474 - Posted: 13 Feb 2015, 22:27:22 UTC - in response to Message 1641344.  

And then there remains the question of why I got some VLARs for my NV..


Good question. It *could* be collateral damage when I requested that the Cuda 3.2 stock application be stopped from issue to Kepler/Maxwell class cards. I'll drop Eric an email.

If that request was implemented yesterday, then that was probably what did it. Yesterday was when VLARs started coming through to all GPUs.
Grant
Darwin NT
ID: 1641474
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1641626 - Posted: 14 Feb 2015, 1:45:32 UTC - in response to Message 1641474.  

And then there remains the question of why I got some VLARs for my NV..


Good question. It *could* be collateral damage when I requested that the Cuda 3.2 stock application be stopped from issue to Kepler/Maxwell class cards. I'll drop Eric an email.

If that request was implemented yesterday, then that was probably what did it. Yesterday was when VLARs started coming through to all GPUs.


Some correspondence since has suggested that some unrelated BOINC server scheduler updates were applied. Apparently some of multibeam's customisations were blitzed in the process. So unrelated to the Cuda 3.2 issue, it turns out to be coincidence.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1641626
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1641667 - Posted: 14 Feb 2015, 3:07:52 UTC - in response to Message 1641626.  
Last modified: 14 Feb 2015, 3:10:11 UTC

Apparently some of multibeam's customisations were blitzed in the process. So unrelated to the Cuda 3.2 issue, it turns out to be coincidence.


Will this be fixed soon? I have noticed that, although I am chugging along, the watt usage of my GPUs is way down (and GPU temps down 5-10C) and my RAC is rapidly decreasing as well.

And I also noticed at least one WU where the time elapsed was increasing, but so was the time remaining. Coincidence? Or a reflection of the problem?
ID: 1641667
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1641671 - Posted: 14 Feb 2015, 3:25:57 UTC - in response to Message 1641667.  
Last modified: 14 Feb 2015, 3:32:00 UTC

Apparently some of multibeam's customisations were blitzed in the process. So unrelated to the Cuda 3.2 issue, it turns out to be coincidence.


Will this be fixed soon? I have noticed that, although I am chugging along, the watt usage of my GPUs is way down (and GPU temps down 5-10C) and my RAC is rapidly decreasing as well.

And I also noticed at least one WU where the time elapsed was increasing, but so was the time remaining. Coincidence? Or a reflection of the problem?

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1641671
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1641672 - Posted: 14 Feb 2015, 3:30:53 UTC - in response to Message 1641667.  

And I also noticed at least one WU where the time elapsed was increasing, but so was the time remaining. Coincidence? Or a reflection of the problem?

Reflection of the problem.
Actual run time is about 3 times the estimated run time. End result: after a while, remaining time will start increasing once the manager realises that the estimated time isn't even remotely close to how long the task will actually take. Eventually it gets to the point where the time remaining starts to count down again, instead of up.
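
As a toy model of that effect (my own construction, not BOINC's actual estimator): blend the original estimate with a progress-based figure, weighting the progress-based one more heavily as the task advances. For a task that really needs three times its estimate, the displayed remaining time dips, climbs for a while once the progress-based term takes over, and only then counts down.

    // Toy model only -- not BOINC's real code or formula.
    #include <cstdio>
    #include <algorithm>

    int main()
    {
        const double estimate    = 1000.0;          // original server-side estimate (s)
        const double trueRuntime = 3.0 * estimate;  // what the task actually needs (s)

        for (double elapsed = 300.0; elapsed <= trueRuntime; elapsed += 300.0) {
            double f = elapsed / trueRuntime;                          // fraction done reported by the app
            double progressBased = elapsed * (1.0 - f) / f;            // remaining implied by progress alone
            double staticBased   = std::max(estimate - elapsed, 0.0);  // remaining implied by the original estimate
            double shown = f * progressBased + (1.0 - f) * staticBased; // trust progress more as f grows
            std::printf("elapsed %5.0f s   shown remaining %5.0f s\n", elapsed, shown);
        }
        return 0;
    }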
Grant
Darwin NT
ID: 1641672
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1641673 - Posted: 14 Feb 2015, 3:32:15 UTC
Last modified: 14 Feb 2015, 3:41:23 UTC

And I also noticed at least one WU where the time elapsed was increasing, but so was the time remaining. Coincidence? Or a reflection of the problem?


The inability of the BOINC time estimation to cope well with different types of work is one of a number of (increasingly studied and documented) issues with the scheduling mechanism. I've been working with Albert/Einstein quietly (though currently on hiatus) to devise comprehensive fixes for that. These issues affect both task scheduling via estimation (of primary importance, and including work fetch) and credit mechanisms (considered secondary, but a reflection of the underlying malaise).

[Edit:] just managed to dig out some of the initial findings from last year:
https://wiki.atlas.aei.uni-hannover.de/foswiki/bin/view/EinsteinAtHome/BOINC/EvaluationOfCreditNew

Introduction

Superficially, BOINC credit-related issues are often deemed of little scientific consequence by users and project staff alike. However, as a measure of work, the interrelationship of task estimates with important backend and client functionality is critical. Correct work estimation is central to scheduling of tasks to be processed at all levels, and therefore forms the backbone of making BOINC a valuable scientific instrument.
Here are some notes describing the key identified deficiencies, noting overall that there is a temporal mismatch between the demands of user experience, the expectations hardwired into the BOINC client software, and the needs of BOINC server software.
These are primarily identified as basic control-systems issues to be addressed, with 'Open questions' intended to identify key related aspects (such as reliability and maintenance demands), so that project-developer and end-user experience may be improved.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1641673
Profile JakeTheDog
Joined: 3 Nov 13
Posts: 153
Credit: 2,585,912
RAC: 0
United States
Message 1641716 - Posted: 14 Feb 2015, 6:59:41 UTC

What's a vlar? I did a quick forum search and might have missed a description. I think this is the first time I've noticed them in my task results, and the first time I've seen MB tasks take this long on my GPU, and the estimated timer keep going up. And first time I've seen GPU tasks create a barely noticeable lag in my PC, like switching tabs on my browser or scrolling up or down in any application.
ID: 1641716
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1641720 - Posted: 14 Feb 2015, 7:06:58 UTC - in response to Message 1641716.  
Last modified: 14 Feb 2015, 7:24:00 UTC

What's a vlar? I did a quick forum search and might have missed a description. I think this is the first time I've noticed them in my task results, and the first time I've seen MB tasks take this long on my GPU, and the estimated timer keep going up. And first time I've seen GPU tasks create a barely noticeable lag in my PC, like switching tabs on my browser or scrolling up or down in any application.

Very Low Angle Range....signals recorded when the telescope at Arecibo is scanning very slowly or changing direction. They process very poorly on nVidia GPUs and bog them down badly. That's why you see the lag.
Normally, the server does not send them out to GPUs, but a server code update earlier this week broke the patch that prevented that. Eric is working to fix the situation. Or re-fix it, as it were.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1641720
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1641731 - Posted: 14 Feb 2015, 7:44:46 UTC - in response to Message 1641344.  

Thank you, Jason.

I'll check out how the Lunatics shortened WUs are done. That is just what I need: to test with the maximum-size WU that nvprof can use.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1641731
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1641745 - Posted: 14 Feb 2015, 8:09:47 UTC

And Eric has the fix in now. GPU cards should not be getting any more VLARs.

Meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1641745
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1641796 - Posted: 14 Feb 2015, 9:59:01 UTC - in response to Message 1641792.  

And Eric has the fix in now. GPU cards should not be getting any more VLARs.

Meow.

Crap. I like them on my HD7870. I wish they could allow vlar's going to ATI cards, but not to Cuda cards.

There are others that have expressed that, but ATI users are in the minority, and since it would take a code rewrite from DA above, I think, to add the additional selection, it has never been done. Since the nVidia users far outweigh ATI users, the project does what it has to do to avoid crippling a large percent of their user base.

I think you would have to petition DA in another forum to ask for a change.

Meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1641796
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34257
Credit: 79,922,639
RAC: 80
Germany
Message 1641820 - Posted: 14 Feb 2015, 11:18:41 UTC - in response to Message 1641796.  
Last modified: 14 Feb 2015, 11:20:44 UTC

And Eric has the fix in now. GPU cards should not be getting any more VLARs.

Meow.

Crap. I like them on my HD7870. I wish they could allow vlar's going to ATI cards, but not to Cuda cards.

There are others that have expressed that, but ATI users are in the minority, and since it would take a code rewrite from DA above, I think, to add the additional selection, it has never been done. Since the nVidia users far outweigh ATI users, the project does what it has to do to avoid crippling a large percent of their user base.

I think you would have to petition DA in another forum to ask for a change.

Meow.


When I started testing r_177 years ago, VLARs were sent to ATI GPUs.
So I'm certain it wouldn't be too hard to implement.
I think it's a question of priority.
I'm fine with everything.
I'll crunch what my GPU obtains.


With each crime and every kindness we birth our future.
ID: 1641820
Profile JakeTheDog
Joined: 3 Nov 13
Posts: 153
Credit: 2,585,912
RAC: 0
United States
Message 1641821 - Posted: 14 Feb 2015, 11:28:48 UTC

There are FOUR cats! -JLP
ID: 1641821
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1641824 - Posted: 14 Feb 2015, 11:41:30 UTC - in response to Message 1641820.  

I'd imagine that there's a cost/benefit analysis in play here - where cost is measured in staff time and project outcomes, rather than traditional dollars.

Sending VLARs to NVidia GPUs has a cost in project outcomes - less work is processed. That was addressed with the code that prevents VLARs being sent to GPUs, and has no further cost in staff time (except resolving problems like yesterday's glitch).

Making the distribution to different GPUs more sophisticated - either by the project enabling ATI cards en masse for VLARs, or by enabling user selection - has a significant staff time cost, both to set it up, and possibly in ongoing maintenance (would they enable sending VLARs to intel_gpus? Probably yes. And to the next coprocessor that comes along?). And I don't currently see any benefit to the project in spending that staff time.

The crossover might come if we ever reached a point where the proportion of VLARs in the telescope recordings grew so high, and the proportion of users allowing their CPUs to be used for SETI grew so low, that VLARs regularly backed up in the feeder and delayed work distribution. I think we've already seen that happen for a few hours at a time: it's rare, but it's worth keeping an eye on.
ID: 1641824
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1641825 - Posted: 14 Feb 2015, 11:42:25 UTC - in response to Message 1641821.  

There are FOUR cats! -JLP

I have 3, lost one a year ago.
I have many more, who I see at the shelters from time to time...
One day, another shall be mine.

Kitty, kitty, don't go crazy on me.....one day fine I shall set you free.
Just me being me,....
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1641825
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1641843 - Posted: 14 Feb 2015, 14:25:28 UTC

The fix must have gone in just about midnight Berkeley time, as I was up on the East Coast busily aborting VLARs that were sent to my (Nvidia) GPUs (about 50 across my 2 crunchers) and noticed that the wattage of both machines jumped back up to normal (~650 W) from the ~450 W they had been running at. I lingered a bit, but no more VLARs came to my GPUs, thankfully.
ID: 1641843