OpenCL MB v8.12 issues thread attempt 2

Message boards : Number crunching : OpenCL MB v8.12 issues thread attempt 2

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1788424 - Posted: 18 May 2016, 12:52:37 UTC

If you see lags that prevent normal PC operation, driver restarts, or invalid results, please post your config and other circumstances here.
Credit issues go elsewhere. This is the app's technical support thread, so please stay on topic.
There is plenty of room elsewhere to discuss anything else. Please don't make app support work harder than it needs to be.
ID: 1788424

Kiska
Volunteer tester
Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1788427 - Posted: 18 May 2016, 12:58:14 UTC - in response to Message 1788424.  

It looks like a VLAR task will take up to 4 hours on a GT 840M.
I am using these command-line options:
-sbs 1024 -period_iterations_num 300 -spike_fft_thresh 3072 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32

However, GPU usage is sporadic. Also, with this command line, CPU usage seems to have dropped to nearly 0%.
This might be an indication that VLAR tasks should not be issued to low-end and mid-range cards.
ID: 1788427

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1788453 - Posted: 18 May 2016, 15:44:16 UTC - in response to Message 1788427.  
Last modified: 18 May 2016, 15:49:13 UTC

> It looks like a VLAR task will take up to 4 hours on a GT 840M.
> I am using these command-line options:
> -sbs 1024 -period_iterations_num 300 -spike_fft_thresh 3072 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32
>
> However, GPU usage is sporadic. Also, with this command line, CPU usage seems to have dropped to nearly 0%.
> This might be an indication that VLAR tasks should not be issued to low-end and mid-range cards.

No completed tasks so far?
Also, what prevented you from starting with the stock settings?
ID: 1788453

Mike
Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1788459 - Posted: 18 May 2016, 16:14:58 UTC - in response to Message 1788427.  
Last modified: 18 May 2016, 16:16:18 UTC

> It looks like a VLAR task will take up to 4 hours on a GT 840M.
> I am using these command-line options:
> -sbs 1024 -period_iterations_num 300 -spike_fft_thresh 3072 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32
>
> However, GPU usage is sporadic. Also, with this command line, CPU usage seems to have dropped to nearly 0%.
> This might be an indication that VLAR tasks should not be issued to low-end and mid-range cards.


It doesn't surprise me that this doesn't work.

Why not ask before using those params?

Try:

-sbs 128 -period_iterations_num 300 -spike_fft_thresh 1024 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 256 -oclfft_tune_bn 16 -oclfft_tune_cw 16


With each crime and every kindness we birth our future.
ID: 1788459

S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 212
Credit: 43,822,095
RAC: 0
Netherlands
Message 1788477 - Posted: 18 May 2016, 16:51:34 UTC - in response to Message 1788459.  
Last modified: 18 May 2016, 16:52:04 UTC

> Why not ask before using those params?


What parameters would you suggest for a GTX 970? Also, where do I put -use_sleep?

Thanks in advance.
ID: 1788477

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1788479 - Posted: 18 May 2016, 16:56:13 UTC - in response to Message 1788477.  
Last modified: 18 May 2016, 16:57:36 UTC


> What parameters would you suggest for a GTX 970? Also, where do I put -use_sleep?

AFAICS you are using the anonymous platform and a CUDA app. This thread is about the OpenCL app.
ID: 1788479

Kiska
Volunteer tester
Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1788597 - Posted: 19 May 2016, 1:32:11 UTC - in response to Message 1788459.  
Last modified: 19 May 2016, 1:32:47 UTC

Using those command-line options results in a GPU load of nearly 0%. Mine are a bit weird, but they resulted in more load.
Initially I used the following, which I copied from somewhere:
-sbs 512 -period_iterations_num 80 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32

But usage did not exceed 50%.
With the following, usage was more consistent and did not dip below 40%, while the memory controller was stuck at 100% load:
-sbs 1024 -period_iterations_num 300 -spike_fft_thresh 3072 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32
ID: 1788597

Kiska
Volunteer tester
Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1788599 - Posted: 19 May 2016, 1:35:13 UTC - in response to Message 1788453.  

Stock settings resulted in 0% GPU load and 0% memory controller load, although the core clock remained at 1124 MHz.
The task was just stuck at 0% progress, so I experimented.
ID: 1788599

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1788654 - Posted: 19 May 2016, 7:48:22 UTC - in response to Message 1788599.  

> Stock settings resulted in 0% GPU load and 0% memory controller load, although the core clock remained at 1124 MHz.
> The task was just stuck at 0% progress, so I experimented.

Please check the system event log for any warnings about video driver restarts.
ID: 1788654

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1788657 - Posted: 19 May 2016, 8:36:45 UTC - in response to Message 1788597.  
Last modified: 19 May 2016, 9:02:16 UTC

> Using those command-line options results in a GPU load of nearly 0%. Mine are a bit weird, but they resulted in more load.
> Initially I used the following, which I copied from somewhere:
> -sbs 512 -period_iterations_num 80 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32
>
> But usage did not exceed 50%.
> With the following, usage was more consistent and did not dip below 40%, while the memory controller was stuck at 100% load:
> -sbs 1024 -period_iterations_num 300 -spike_fft_thresh 3072 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32


The set of params used for the last part of your first completed task is:
For low-performance GPU path use_sleep enabled with 5ms per iteration
Used GPU device parameters are:
	Number of compute units: 3
	Single buffer allocation size: 512MB
	Total device global memory: 2048MB
	max WG size: 1024
	local mem type: Real
	FERMI path used: yes
	LotOfMem path: no
	LowPerformanceGPU path: yes
period_iterations_num=300

I implemented a 5 ms sleep per invocation to reduce possible screen lags on entry-level devices.
I think that's the reason for the low GPU usage you see.
Your GPU has 2 GB of memory, so since you've already started tweaking, you could try running 2 tasks simultaneously. You could also try reducing the sleep period (by providing the -use_sleep option) if that introduces acceptable lags or none at all.

For entry-level GPUs the app was indeed artificially slowed to quite a high degree, following the "better safe than sorry" rule. It should not cause lags in the default config; that was the priority.
Users who want to actively participate can unlock it and tune its speed for an acceptable balance between speed and lags.
Your GPU will be a good example of this, I hope.

EDIT: Also, add this param to your tuning line: -no_defaults_scaling
This will disable the low-performance path adjustments and return full control over the app's behavior to the operator.
ID: 1788657

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1788662 - Posted: 19 May 2016, 9:06:14 UTC - in response to Message 1788477.  

> > Why not ask before using those params?
>
> What parameters would you suggest for a GTX 970? Also, where do I put -use_sleep?
>
> Thanks in advance.

For the optimal tuning line, experiment and refer to the ReadMe.
-use_sleep goes along with the other command-line options, into an mb_cmdline*.txt file, for example.
Or in the corresponding tag in app_config.xml; refer to the app_config.xml docs on the BOINC site.
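For the app_config.xml route, a minimal sketch of where -use_sleep would go is below. It follows BOINC's documented `<app_version>`/`<cmdline>` syntax; the app_name `setiathome_v8` and plan_class `opencl_nvidia_sah` are assumptions and should be checked against the entries in client_state.xml for your host. The file goes in the SETI@home project folder, and BOINC picks it up on restart or via Options > Read config files.

```xml
<app_config>
  <!-- Hypothetical example: app_name and plan_class are assumed;
       they must match the values in your client_state.xml -->
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>opencl_nvidia_sah</plan_class>
    <!-- All tuning options for the app go here, space-separated -->
    <cmdline>-use_sleep</cmdline>
  </app_version>
</app_config>
```

Other options from the ReadMe would simply be appended inside the same `<cmdline>` element.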
ID: 1788662

Phobyx
Joined: 15 Jan 16
Posts: 12
Credit: 36,234,378
RAC: 25
Germany
Message 1788667 - Posted: 19 May 2016, 10:59:02 UTC
Last modified: 19 May 2016, 11:29:02 UTC

I posted this in the other thread, but I'll repeat it here to be sure.

OpenCL 8.12 (both sag and sah) ignores the cpu_usage setting in app_config.xml and always runs at 100% on its CPU core.
It also produces massive OS latency (keyboard/mouse lag) every one or two seconds for me as soon as another CPU process is running (not necessarily 8.12, but ANY other; it doesn't matter whether the second process is GPU or CPU crunching).
With 4 cores, i.e. 400% CPU total, I get annoying lags starting from ~120%.

This applies to every 8.12 WU (and I haven't received any others for a day or two now).

The system is Win 7 with a standard BOINC installation(!), no special settings except an added app_config.xml, and a GTX 660 with current default drivers.
There are no other settings besides gpu_usage and cpu_usage in the XML (gpu_usage 1.0 or 0.5, it doesn't matter).

(Note: Yes, use_sleep does fix the CPU usage issue and mitigates the latency issue, but on a standard installation users out there are not expected to fiddle around with manual settings. They're used to the CUDA apps using about 0.2 CPU.)

Once I noticed that the GPU driver crashed: all screens went black, and when they came back, the OS had lost the Nvidia card until reboot. Unfortunately I had no chance to investigate, and this may or may not be related.
ID: 1788667

Brent Norman
Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1788673 - Posted: 19 May 2016, 11:43:08 UTC - in response to Message 1788667.  
Last modified: 19 May 2016, 11:44:12 UTC

About app_config: <cpu_usage>0.4</cpu_usage> does NOT mean the app will only use 0.4 of a CPU core; it means BOINC reserves 0.4 of a core for each GPU task. So with 0.4 you would have to run 3 GPU tasks (3 × 0.4 = 1.2) before BOINC shuts down a CPU core.
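A sketch of the app-level `<gpu_versions>` setting being discussed, using BOINC's documented app_config.xml syntax; the app name `setiathome_v8` is an assumption to verify in client_state.xml. The comments spell out the arithmetic above: as far as I understand BOINC's scheduler, cpu_usage only changes how many CPU tasks BOINC starts, not how much CPU the GPU app actually consumes.

```xml
<app_config>
  <app>
    <!-- Assumed app name; check client_state.xml -->
    <name>setiathome_v8</name>
    <gpu_versions>
      <!-- 1.0 = one task per GPU, 0.5 = two tasks share one GPU -->
      <gpu_usage>1.0</gpu_usage>
      <!-- CPU budgeted per GPU task, used for BOINC scheduling only.
           3 GPU tasks x 0.4 = 1.2 CPUs, so BOINC starts one fewer
           CPU task; 2 x 0.4 = 0.8 frees no core. The app itself may
           still keep a full core busy. -->
      <cpu_usage>0.4</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

Setting cpu_usage to 1.0 would free one core per GPU task outright, at the cost of one fewer CPU task per running GPU task.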

Now back to the technical aspect of this thread: why can't use_sleep be built in as the default for the apps, leaving it to users who want to tinker to turn it off? The new apps are CPU hogs, so why not try to alleviate that at the cost of some runtime?

The apps would end up being much more user-friendly.
ID: 1788673

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1788679 - Posted: 19 May 2016, 12:46:20 UTC - in response to Message 1788667.  


> OpenCL 8.12 (both sag and sah) ignores the cpu_usage setting in app_config.xml and always runs at 100% on its CPU core.

The app doesn't know about those settings. They are for the BOINC scheduler, not for the app.

> It also produces massive OS latency (keyboard/mouse lag) every one or two seconds for me as soon as another CPU process is running (not necessarily 8.12, but ANY other; it doesn't matter whether the second process is GPU or CPU crunching).
> With 4 cores, i.e. 400% CPU total, I get annoying lags starting from ~120%.

Last time I looked, your host was hidden. Please unhide it and post a link to it.

> (Note: Yes, use_sleep does fix the CPU usage issue and mitigates the latency issue, but on a standard installation users out there are not expected to fiddle around with manual settings.)

Until I see your particular GPU, it's hard to say anything.
ID: 1788679

Kiska
Volunteer tester
Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1788680 - Posted: 19 May 2016, 12:51:15 UTC - in response to Message 1788657.  
Last modified: 19 May 2016, 12:54:15 UTC

Thanks for the 2 switches; GPU load is now between 30% and 99%. So I guess I'm forced to run 2 instances on the GPU.

EDIT: So what is the optimal app_info I should use?
Any help? The last time I used app_info was back in the version 6 days, and I've since forgotten how.
ID: 1788680

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1788681 - Posted: 19 May 2016, 12:58:03 UTC - in response to Message 1788673.  


> Now back to the technical aspect of this thread: why can't use_sleep be built in as the default for the apps, leaving it to users who want to tinker to turn it off? The new apps are CPU hogs, so why not try to alleviate that at the cost of some runtime?
>
> The apps would end up being much more user-friendly.

Because there is always a balance to strike between performance and usability.
As you can see from a few earlier posts, on some hosts low GPU usage is the issue rather than lags.
Currently 2 different performance levels are used. Perhaps there should be more of them, or other params should be tweaked, or tweaked to a different degree. Sleep is enabled by default for the low-performance path, so low-end cards get it enabled.

With more feedback on beta, or some compatible hardware at my disposal, that balance could have been established better before moving to main. Unfortunately, beta feedback, especially on the latest builds with usability tunings, was very limited.
ID: 1788681

Kiska
Volunteer tester
Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1788682 - Posted: 19 May 2016, 13:03:55 UTC - in response to Message 1788681.  

I'm sorry for not helping you find the tuning. I just don't have the time to commit to this sort of thing (aka preparing for final exams).
ID: 1788682

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1788683 - Posted: 19 May 2016, 13:08:43 UTC - in response to Message 1788680.  

> Thanks for the 2 switches; GPU load is now between 30% and 99%. So I guess I'm forced to run 2 instances on the GPU.
>
> EDIT: So what is the optimal app_info I should use?
> Any help? The last time I used app_info was back in the version 6 days, and I've since forgotten how.


This is a support thread mostly for stock release issues, so I assume users want to solve issues without going to the anonymous platform. That is, no app_info.

You can supply all needed configuration via the app_config.xml file.
Refer to the documentation on the BOINC site for the syntax:
http://boinc.berkeley.edu/wiki/Client_configuration
In particular, the tuning line for the app should go between <cmdline></cmdline> tags.

In general, yes: as a rule of thumb I would recommend running 2 app instances instead of 1 on all but CC1.x NV GPUs. But much better GPU load (than the almost-zero load you saw initially) can be reached even with a single task per GPU.

Regarding the best possible tuning line: I don't know the best one for your particular device; experimentation is required.
In your case, -no_defaults_scaling -sbs 256 -period_iterations_num 100 can be tried. If lags are too high, try adding -use_sleep and/or increasing the PulseFind iterations number from 100 back to the 300 you used before. Other options can be taken as-is from the best-practices advice in the app's ReadMe, located in the project folder.
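Putting the two suggestions above together (2 instances per GPU plus the starting tuning line) in app_config.xml might look like the sketch below. It is an untested starting point, not a recommended setting; `<ngpus>0.5</ngpus>` is BOINC's documented way to let two tasks share one GPU, while the app_name and plan_class values are assumptions to verify against client_state.xml on the GT 840M host.

```xml
<app_config>
  <app_version>
    <!-- Assumed names; verify in client_state.xml -->
    <app_name>setiathome_v8</app_name>
    <plan_class>opencl_nvidia_sah</plan_class>
    <!-- 0.5 GPU per task: BOINC runs two tasks on one GPU -->
    <ngpus>0.5</ngpus>
    <!-- Starting tuning line from this post. If lags are too high,
         append -use_sleep or raise -period_iterations_num to 300 -->
    <cmdline>-no_defaults_scaling -sbs 256 -period_iterations_num 100</cmdline>
  </app_version>
</app_config>
```

Changing only one value at a time and comparing run times over a few tasks makes it easier to tell which option actually helped.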
ID: 1788683

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1788685 - Posted: 19 May 2016, 13:10:08 UTC - in response to Message 1788682.  

> I'm sorry for not helping you find the tuning. I just don't have the time to commit to this sort of thing (aka preparing for final exams).

Yep, we have a round of exams about this time too. Perhaps it's a worldwide issue :)
ID: 1788685

Kiska
Volunteer tester
Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1788686 - Posted: 19 May 2016, 13:12:17 UTC - in response to Message 1788685.  

> > I'm sorry for not helping you find the tuning. I just don't have the time to commit to this sort of thing (aka preparing for final exams).
>
> Yep, we have a round of exams about this time too. Perhaps it's a worldwide issue :)

Urgh, I would much prefer to spend my time learning C or C++ than studying.
ID: 1788686


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.