OpenCL MB v8.12 issues thread attempt 2
Author | Message |
---|---|
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
If you see lags that prevent normal PC operation, driver restarts, or invalids, please post your config and other circumstances here. Credit issues go elsewhere. This is the app's technical support thread, so stay on topic. There is enough room elsewhere to express anything else. Please don't make app support work harder than it should be. |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
It looks like a VLAR task will take up to 4 hours on a GT 840M. I am using the cmdline attributes -sbs 1024 -period_iterations_num 300 -spike_fft_thresh 3072 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32; however, GPU usage is sporadic. Also, with this command line CPU usage seems to have dropped to nearly 0%. This might be an indication not to issue VLAR tasks to low-end and mid-range cards. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
It looks like a VLAR task will take up to 4 hours on a GT 840M. No completed tasks so far? Also, what prevented you from using stock settings to start with? |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
It looks like a VLAR task will take up to 4 hours on a GT 840M. Doesn't surprise me this doesn't work. Why not ask before using those params? Try -sbs 128 -period_iterations_num 300 -spike_fft_thresh 1024 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 256 -oclfft_tune_bn 16 -oclfft_tune_cw 16 With each crime and every kindness we birth our future. |
S@NL Etienne Dokkum Send message Joined: 11 Jun 99 Posts: 212 Credit: 43,822,095 RAC: 0 |
Why not ask before using those params? What parameters would you guys suggest for a GTX970, and where do I put -use_sleep? Thanks in advance. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
AFAICS you are using an anonymous platform and the CUDA app. This thread is about the OpenCL app. |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
Using those cmdline attributes results in a GPU usage/load of nearly 0%. While mine is a bit weird, it resulted in more load. I initially used this line, which I copied from somewhere: -sbs 512 -period_iterations_num 80 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32 But usage did not exceed 50%. With the following line, usage was more consistent and did not dip below 40%, while the memory controller was stuck at 100% load: -sbs 1024 -period_iterations_num 300 -spike_fft_thresh 3072 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32 |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
Stock settings resulted in a GPU load of 0%, although the core clock speed remained at 1124 MHz, and memory controller load was also 0%. The task was just stuck at 0% progress, so I experimented. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Stock settings resulted in a GPU load of 0%, although the core clock speed remained at 1124 MHz, and memory controller load was also 0%. Please check the system event log for warnings about video driver restarts. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Using those cmdline attributes results in a GPU usage/load of nearly 0%. While mine is a bit weird, it resulted in more load.
The set of params used for the last part of your first completed task is:
For low-performance GPU path use_sleep enabled with 5ms per iteration
Used GPU device parameters are:
Number of compute units: 3
Single buffer allocation size: 512MB
Total device global memory: 2048MB
max WG size: 1024
local mem type: Real
FERMI path used: yes
LotOfMem path: no
LowPerformanceGPU path: yes
period_iterations_num=300
I implemented a sleep of 5 ms per invocation to reduce any possible screen lags on entry-level devices. I think that's the reason for the low GPU usage you see. Your GPU has 2 GB of memory, so, since you have already started tweaking, you could try running 2 tasks simultaneously. You could also try reducing the sleep period (provide the -use_sleep option) if that introduces acceptable or no lags.
For entry-level GPUs the app was indeed artificially slowed to quite a high degree, following the "better safe than sorry" rule. It should not cause lags in the default config; that was the priority. Users who want to participate actively can unlock and tune its speed for an acceptable balance between speed and lags. Your GPU will be a good example of this, I hope.
EDIT: Also, add the -no_defaults_scaling param to your tuning line; this will disable the low-performance path adjustments and return full control over the app's behavior to the operator. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Why not ask before using those params? For an optimal tuning line, experiment and refer to the ReadMe. -use_sleep goes along with the other command-line options: into the mb_cmdline*.txt file, for example, or into the corresponding tag in app_config.xml (refer to the app_config.xml docs on the BOINC site). |
Phobyx Send message Joined: 15 Jan 16 Posts: 12 Credit: 36,234,378 RAC: 25 |
I posted in the other thread, but to make sure, I repeat it here. OpenCL 8.12 (both sag and sah) ignores the cpu-usage settings in app_config.xml and always runs at 100% on its CPU core. It also produces massive OS latency (keyboard/mouse lag) about every one or two seconds for me as soon as another CPU process is being worked on (not necessarily 8.12, but ANY other; it doesn't matter whether the second process is GPU or CPU crunching). Having 4 cores, giving 400% CPU power total, I get annoying lags starting from ~120%. This relates to any 8.12 WU (and I haven't been getting any others for a day or two now). The system is Win 7 with a standard BOINC installation(!), no special settings except an added app_config.xml, and a GTX660 with current default drivers. There are no other settings besides gpu_usage and cpu_usage in the XML (gpu_usage 1.0 or 0.5, it doesn't matter). (Note: yes, use_sleep does fix the CPU usage issue and mitigates the latency issue, but on a standard installation users out there are not expected to fiddle around with manual settings. They're used to the CUDA apps using about 0.2 CPU.) Once I noticed that the GPU drivers had crashed. All screens went black, and when they returned, the OS had lost the Nvidia card until reboot. Unfortunately I had no chance to investigate, and this may or may not be related. |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
About app_config ... <cpu_usage>0.4</cpu_usage> does NOT mean the task will only use 0.4% of the CPU; it means to reserve 0.4 cores for the GPU task. So for 0.4 you would have to run 3 GPU tasks to shut down a core. Now back to the technical aspect of this thread... Why can't use_sleep be programmed in as a default for the apps, leaving it to users who want to tinker to turn it off? The new apps are CPU hogs, so why not try to alleviate that at the cost of some runtime? They would end up being much more user friendly. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
The app doesn't know those settings. They are for the BOINC scheduler, not for the app.
Last time I looked, your host was hidden. Unhide it and post a link to it.
Until I see your particular GPU, it is hard to say anything. |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
Thanks for the 2 switches; GPU load is now between 30% and 99%. So I guess I am forced to run 2 instances on the GPU. EDIT: So what is the optimal app_info I should use? Could you help with that as well? The last time I used app_info was back in the version 6 days, and I have since forgotten how. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Because there is always some balance between performance and usability. As you can see from a few earlier posts, on some hosts low GPU usage is the issue rather than lags. Currently 2 different levels of performance are chosen. Perhaps there should be more of them, or other params should be tweaked, or to a different degree. Sleep is enabled by default for the low-performance path, so low-end cards get it enabled. With more feedback on beta, or some compatible hardware at my disposal, that balance could have been established better before moving to main. Unfortunately, beta feedback, especially on the last builds with usability tunings, was very limited. |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
I am sorry for not helping you find the tuning. I just don't have the time to commit to this sort of thing (aka preparing for final exams). |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Thanks for the 2 switches; GPU load is now between 30% and 99%. So I guess I am forced to run 2 instances on the GPU. This is a support thread, mostly about stock release issues, so I assume users want to solve their issue without going to anonymous platform. That is, no app_info. You can supply all the needed configuration via the app_config.xml file. Refer to the documentation on the BOINC site for the syntax: http://boinc.berkeley.edu/wiki/Client_configuration In particular, the tuning line for the app should reside between the <cmdline></cmdline> tags. In general, yes, as a rule of thumb I would recommend always running 2 app instances instead of 1 on all but CC1.x NV GPUs. But a much better GPU load (than the almost zero you saw initially) can be reached even with a single task per GPU. Regarding the best possible tuning line: I don't know the best one for your particular device; experimentation is required. In your case -no_defaults_scaling -sbs 256 -period_iterations_num 100 can be tried. If lags are too high, try adding -use_sleep and/or increasing the PulseFind iterations number from 100 to the 300 you used before. Other options can be taken as is from the best-practice advice in the app's ReadMe, located in the project folder. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I am sorry for not helping you find the tuning. I just don't have the time to commit to this sort of thing (aka preparing for final exams). Yep, we have a run of exams about this time too. Perhaps it's a worldwide issue :) |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
I am sorry for not helping you find the tuning. I just don't have the time to commit to this sort of thing (aka preparing for final exams). Urgh, I would much prefer to spend my time learning C or C++ rather than studying. |
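
For readers who want to try the configuration advice from this thread, here is a minimal Python sketch that writes out an app_config.xml with the tuning line between `<cmdline>` tags and two tasks per GPU, and that reproduces the "3 GPU tasks at cpu_usage 0.4 to free a core" arithmetic. The app name `setiathome_v8` and the plan class `opencl_nvidia_SoG` are assumptions, not taken from the thread, so check your own client files for the real values. The element layout follows my reading of the BOINC client configuration docs and should be verified there, and the helper names are made up for illustration.

```python
import math
import xml.etree.ElementTree as ET
from fractions import Fraction

# ASSUMPTIONS: neither name appears in the thread; look up the real ones
# in your BOINC data directory before using the generated file.
APP_NAME = "setiathome_v8"
PLAN_CLASS = "opencl_nvidia_SoG"

# Starting-point tuning line suggested in the thread for the GT 840M.
TUNING_LINE = "-no_defaults_scaling -sbs 256 -period_iterations_num 100"


def gpu_tasks_to_reserve_core(cpu_usage: str) -> int:
    """Smallest number of GPU tasks whose cpu_usage adds up to a full core.

    This is only the rule of thumb from the thread; the real BOINC
    scheduler may behave differently in detail.
    """
    # Fraction avoids floating-point surprises with values like 0.4.
    return math.ceil(1 / Fraction(cpu_usage))


def build_app_config(tasks_per_gpu: int, cpu_usage: str, cmdline: str) -> str:
    """Return app_config.xml content as a string (hypothetical helper)."""
    root = ET.Element("app_config")

    # <app> section: how many tasks share one GPU and the CPU reservation.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = APP_NAME
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(1 / tasks_per_gpu)
    ET.SubElement(gpu_versions, "cpu_usage").text = cpu_usage

    # <app_version> section: the tuning line goes between <cmdline> tags.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = APP_NAME
    ET.SubElement(version, "plan_class").text = PLAN_CLASS
    ET.SubElement(version, "cmdline").text = cmdline

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(gpu_tasks_to_reserve_core("0.4"))
    print(build_app_config(2, "0.4", TUNING_LINE))
```

If lags turn out to be too high with this line, the thread's advice is to append -use_sleep to `TUNING_LINE` and/or raise -period_iterations_num from 100 back to 300 before regenerating the file.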