Message boards :
Number crunching :
Intel® iGPU AP bench test run (e.g. @ J1900)
Author | Message |
---|---|
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
http://www.pcworld.com/article/240016/idf_day_1_recap_ivy_bridge_and_the_x79_factor_in_photos.html The designers also added an L3 cache to the GPU itself. In addition to performance improvements, the cache also helps power efficiency, since anything located in the cache means the CPU ring bus doesn’t need to be fired up. |
Dirk Sadowski Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
If I add an NV GPU card to the J1900 PC and set -cpu_lock for both the Intel iGPU AP app and the NV GPU AP app, does this mean both apps are pinned to CPU core #0? Or does each GPU app get its own CPU core, #0 & #1? Thanks. |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Code for -cpu_lock is probably the same in all Raistmer apps. So any app will see (at start) which cores are already in use by other running apps (if they also use -cpu_lock) and choose the first unused core (if an unused core exists). - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Code for -cpu_lock is probably the same in all Raistmer apps. Yep, they should do so, but this was never really checked with several different types of GPU installed. It's quite easy to test: open Task Manager while both apps have running instances, then look under the affinity menu item to see which CPUs are checked for each process. But even if both GPU apps happen to be pinned to the same core, that can still be better than letting them float freely between cores. Also, -cpu_lock is known to have a great impact on the ATi app, so it is worth checking directly what its impact is on the iGPU one. From the readme: -cpu_lock : Enables CPUlock feature. Results in CPUs number limitation for particular app instance. Also attempt to bind different instances to different CPU cores will be made. So, maybe -instances_per_device 2 will be needed in this case too. Check that. |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
So, maybe -instances_per_device 2 will be needed in this case too. Check that. @Dirk To clarify: this doesn't mean you have to run 2 instances on one GPU. You may even use -instances_per_device 4 (just to 'reserve' a counter for a maximum of 4 running apps). Since BOINC starts the apps, it will still start only 2 apps (one per GPU) unless you change this in app_info.xml / app_config.xml. |
ivan Send message Joined: 5 Mar 01 Posts: 783 Credit: 348,560,338 RAC: 223 |
I've noticed lately that some GPU tasks on my Win10 J1900 are stalling at some point (e.g. the first one I noted was at a suspicious 66.677%) while the CPU total keeps marching on. If I restart BOINC then the Progress counter drops back a little but the CPU total drops back a lot (presumably to the time the Progress stalled) and again the Progress ticks up until it hits the previous stall point and stops there as CPU keeps building. Has anyone else seen that? Or know a cure? I'm just running bare Lunatics with no tweaks. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14669 Credit: 200,643,578 RAC: 874 |
I've noticed lately that some GPU tasks on my Win10 J1900 are stalling at some point (e.g. the first one I noted was at a suspicious 66.677%) while the CPU total keeps marching on. If I restart BOINC then the Progress counter drops back a little but the CPU total drops back a lot (presumably to the time the Progress stalled) and again the Progress ticks up until it hits the previous stall point and stops there as CPU keeps building. I've noticed that on two of my XP machines with GTX 750 Ti cards - specifically, the two I upgraded to driver 347.88 to be able to run cuda65 tasks for GPUGrid (where it can be more of a problem, with tasks estimated at up to 22 hours, but return requested within 24 hours). It didn't seem to be a problem when running cuda60 with, IIRC, driver 335.28 |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
I've noticed lately that some GPU tasks on my Win10 J1900 are stalling at some point (e.g. the first one I noted was at a suspicious 66.677%) while the CPU total keeps marching on. If I restart BOINC then the Progress counter drops back a little but the CPU total drops back a lot (presumably to the time the Progress stalled) and again the Progress ticks up until it hits the previous stall point and stops there as CPU keeps building. I'm not seeing that on my J1900 with the same driver release 4061. It could be OS related, but maybe it is MB related? The ASRock board I have had a BIOS update that listed: 1. Improve integrated graphics compatibility. 2. Improve add-on card compatibility. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[ |
Dirk Sadowski Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
@ BilBg, Raistmer Currently I don't have a GPU card connected to my mobo. I'm in contact with the manufacturer to ask whether it's possible (there is just a PCIe 2.0 x1 slot). @ ivan ASRock Q1900DC-ITX mobo with J1900 CPU. IIRC, bought in the middle of last year. No BIOS update. Win8.1 x64. Intel driver v10.18.10.3408. SETI & AstroPulse (CPU & iGPU) apps run without problems (Lunatics Installer v0.43a x64). BOINC Client v7.4.42 with BoincTasks Manager v1.67. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
In view of the latest findings in the other thread, I would suggest checking the parameter area (unroll, ffa_block) around your current best config under full load. The loaded state makes a non-negligible difference. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
In view of last findings in other thread I would suggest to check parameters (unroll, ffa_block) area around your current best config under full load. The iGPU slows by ~19% with all 4 CPU cores loaded in the bench test I did using the default iGPU config. http://hal6000.com/seti/test/apbench_test_celeron_j1900.htm |
Dirk Sadowski Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Does this mean you also did not use -hp (high priority) for the Intel iGPU OpenCL app? If it is used, I guess the calculation time will decrease. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Does this mean you also not used -hp for priority high for the Intel iGPU OpenCL app? True. For the bench test -hp was not used. For normal running -hp is not being used either. It is late for me now, 2:17 AM. So tomorrow, or rather today after I have slept, I can run more tests, similar to your test but with the CPU loaded. I will start with -hp only, to have a baseline comparison with my other data. Then I will try the "best" config you found while running the iGPU solo. Each test config will take me just under 2 hours to complete. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
In the benchmark, the app's process uses the same priority as under BOINC. That is, CPU apps run at IDLE and GPU apps at BELOW_NORMAL. BOINC's own influence is excluded (it runs at NORMAL and hence competes with the GPU app for CPU). So I would not expect a big difference. Worth checking though. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
I have completed the benchmarks on my J1900 iGPU with just the -hp switch while the CPU cores were active. I found little change from not using it: 2-3 seconds plus or minus. However, in the case of 2 CPU cores + iGPU, performance was lower for both CPU and iGPU, with run times ~3% longer for the CPU and ~2% longer for the iGPU. Next I'm running -hp -unroll 5 -ffa_block 1472 -ffa_block_fetch 368. After that I'm going to run -unroll 5 -ffa_block 1472 -ffa_block_fetch 368 to see if -hp makes any difference with tuned settings. |
Dirk Sadowski Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
I saw something strange on my J1900 board. What I know from the past with WinXP x86: Intel Core2 Extreme QX6700 (Intel's first 4-core CPU, with two 2-core chips inside): I suspend all tasks in BOINC and let just one task run. Task Manager says 25% CPU usage - OK. Intel Core2 Duo E7600 (2-core CPU): I suspend all tasks in BOINC and let just one task run. Task Manager says 50% CPU usage - OK. Now the strange part ... with the Intel Celeron J1900 (4-core CPU (2x L2 cache, so maybe 'like' the QX6700?)) and Win8.1 x64: I suspend all tasks in BOINC and let just one task run. Task Manager says ~30% CPU usage (30% of the CPU) - strange. With 2 tasks running, 2x ~30% usage (60% of the CPU) - strange? With 3 tasks running, 3x ~30% usage (90% of the CPU) - strange! ;-) The stock AP 7.09 (r2742) iGPU app uses 0-3% CPU (up & down). So in the end up to ~93% CPU usage with 3x CPU + 1x iGPU tasks (on a J1900 CPU). Is this normal or strange? I tested live with and without the -use_sleep setting (r2742). Both times with: -v 0 -unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 32 8 256 -cpu_lock -instances_per_device 1 Both times with 4x AP CPU tasks running simultaneously. Without -use_sleep: Run time 16 hours 55 min 28 sec (60,928 sec) CPU time 34 min 2 sec (2,042 sec) single pulses: 4 repetitive pulses: 1 percent blanked: 18.39 With -use_sleep: Run time 18 hours 15 min 34 sec (65,734 sec) CPU time 26 min 8 sec (1,568 sec) single pulses: 6 repetitive pulses: 0 percent blanked: 15.48 I don't know if the results and the percent blanked values are similar enough to draw a conclusion. With -use_sleep (compared to without -use_sleep): Calculation time: +4,806 sec (1 hour 20 min 6 sec) CPU time usage: -474 sec (7 min 54 sec) Before we start to compare what this means for the best settings for max whole-PC RAC (longer iGPU calculation times and less CPU time usage, which then means 'faster' CPU task calculations) ... What does the CPU time usage mean (in OS/BOINC phrasing)? 
From the above-mentioned example with -use_sleep: 1,568 sec of the whole CPU, or (x4) 6,272 sec on one CPU core? OK, enough for now. Thanks. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Because of the methods Windows Task Manager uses to sample CPU usage, the load doesn't always show on exactly the correct core, but sometimes as a split between two, in varying amounts. The sampling rate of Task Manager can also be changed, which may show a different total usage depending on whether the 'update speed' is set to low, normal or high. Comparing with Process Explorer at different sampling speeds may give you a clearer indication, depending on the situation and update speed. I mention this because I found with some unrelated test pieces last week that I can draw really cool patterns in Task Manager, and also in eVGA Precision, by making code with timers that 'beat' with the tool's sample rate. So if you see weird numbers, always check with different tools at different sampling speeds, because they can be purely measurement 'artefacts' and not the pure data you were looking for. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
1,568 sec of the whole CPU, or (x4) 6,272 sec on one CPU-Core? 1568 seconds of CPU time would be the sum over all the threads in that process. There will be one main worker thread and a few little ones that don't do much (but can be scattered across other cores at the same time). So without information on all the threads in the process, the CPU time can be only a little useful, or in total much more than the elapsed time, which makes it harder to use. Elapsed time would be near enough to wall clock (with some accuracy limitations). |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Now the strange ... with Intel Celeron J1900 (4-Core-CPU (2x L2-Cache, so maybe 'like' QX6700?)) and Win8.1 x64: I see the same with my: GenuineIntel I thought at first it was Hyper Threading, but no, it says I don't have that. With 3 tasks running, I have seen each one go as high as 30-34% CPU, while one core does not do much. |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
I think there was a change in the Windows 8 Task Manager that adjusts/accounts for the MHz changes of the CPU (I don't remember where I read this at the time). I also don't know/remember which MHz is used: the current, the max/turbo, or the base. But this 'adjustment' makes it show something other than what the user expects. E.g.: http://superuser.com/questions/495699/windows-8-task-manager-shows-49-cpu-process-explorer-shows-100 So I also suggest using Process Explorer: https://technet.microsoft.com/en-us/sysinternals/bb896653.aspx |
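
BilBg's description of -cpu_lock earlier in the thread (each app looks at which cores other -cpu_lock apps already hold and takes the first unused one) can be sketched roughly as below. This is only an illustration of that description, not the actual code in Raistmer's apps: the function name `first_unused_core`, the set-of-cores bookkeeping and the `None` result when every core is taken are all hypothetical.

```python
def first_unused_core(cores_in_use, core_count):
    """Return the lowest core index nobody has claimed yet, or None.

    Hypothetical helper: the real apps share this bookkeeping between
    processes in some way that is not shown in the thread.
    """
    for core in range(core_count):
        if core not in cores_in_use:
            return core
    return None  # assumption: no free core left, caller must decide what to do


# Simulate Dirk's scenario on a 4-core J1900: the iGPU AP app starts first,
# then the NV GPU AP app, both with -cpu_lock.
claimed = set()
assignments = {}
for app in ("iGPU AP app", "NV GPU AP app"):
    core = first_unused_core(claimed, core_count=4)
    assignments[app] = core
    if core is not None:
        claimed.add(core)

print(assignments)
```

If the apps really behave as BilBg describes, the two GPU apps would end up on cores #0 and #1. Raistmer's Task Manager affinity check remains the way to confirm what actually happens.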
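
Dirk's -use_sleep comparison can be re-checked with a few lines of arithmetic. The sketch below uses only the run times and CPU times he posted. The percentages and the "share of one core" figure are derived values, and the last one assumes jason_gee's reading that CPU time is the sum over the process's threads (so core-seconds, with no x4 factor).

```python
# Figures copied from Dirk Sadowski's two iGPU AP runs (r2742), as (h, m, s).
RUNS = {
    "without": {"run_time": (16, 55, 28), "cpu_time": (0, 34, 2)},
    "with": {"run_time": (18, 15, 34), "cpu_time": (0, 26, 8)},
}


def to_seconds(hms):
    hours, minutes, seconds = hms
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds):
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return hours, minutes, seconds


without = {key: to_seconds(value) for key, value in RUNS["without"].items()}
with_sleep = {key: to_seconds(value) for key, value in RUNS["with"].items()}

# Extra wall-clock time and saved CPU time when -use_sleep is on.
run_delta = with_sleep["run_time"] - without["run_time"]
cpu_delta = without["cpu_time"] - with_sleep["cpu_time"]
run_pct = 100 * run_delta / without["run_time"]
cpu_pct = 100 * cpu_delta / without["cpu_time"]

# CPU time as a share of elapsed time, i.e. average load on one core.
core_share_pct = 100 * with_sleep["cpu_time"] / with_sleep["run_time"]

print("run time: +%d s %s, +%.1f%%" % (run_delta, to_hms(run_delta), run_pct))
print("CPU time: -%d s %s, -%.1f%%" % (cpu_delta, to_hms(cpu_delta), cpu_pct))
print("CPU time is %.1f%% of elapsed time" % core_share_pct)
```

Note that a single pair of tasks with different percent blanked values is a thin basis for a conclusion, as Dirk himself points out.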
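
jason_gee's point about timers that 'beat' with a tool's sample rate can be illustrated with a small simulation. This is a toy model, not how Task Manager is implemented: it assumes a load that is busy for the first 500 ms of every 1000 ms period, and a monitor that reports the average busy fraction over each sampling interval. All interval values are made up for the illustration.

```python
def busy_ms(t_ms, period_ms, busy_per_period_ms):
    """Total busy time in [0, t_ms) for a load busy at the start of each period."""
    full_periods, rest = divmod(t_ms, period_ms)
    return full_periods * busy_per_period_ms + min(rest, busy_per_period_ms)


def sampled_usage(sample_ms, period_ms=1000, busy_per_period_ms=500, samples=6):
    """Percent usage a monitor would report for consecutive sampling intervals."""
    readings = []
    for i in range(samples):
        start, end = i * sample_ms, (i + 1) * sample_ms
        busy = (busy_ms(end, period_ms, busy_per_period_ms)
                - busy_ms(start, period_ms, busy_per_period_ms))
        readings.append(round(100 * busy / sample_ms, 1))
    return readings


# The same 50% load looks very different depending on the sampling interval.
for interval in (1000, 500, 1500):
    print(interval, "ms:", sampled_usage(interval))
```

The load is identical in all three cases; only the measurement interval changes. That is why checking with a second tool or a different update speed helps separate real behaviour from measurement artefacts.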
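
BilBg's suggestion that Task Manager adjusts for CPU MHz can at least be tested against Dirk's ~30% per task. The sketch below is a hypothesis check only. It assumes the displayed value is the time-based percentage multiplied by current clock over base clock (which is exactly the detail BilBg says he does not remember), and it assumes the J1900 runs at its 2.42 GHz burst clock against a 2.0 GHz base clock while the tasks run. The function names are hypothetical.

```python
BASE_MHZ = 2000   # assumption: J1900 nominal (base) clock
BURST_MHZ = 2420  # assumption: J1900 burst clock while under load


def time_based_percent(busy_cores, total_cores):
    """Classic reading: share of available core time that is busy."""
    return 100.0 * busy_cores / total_cores


def frequency_scaled_percent(busy_cores, total_cores, current_mhz, base_mhz):
    """Assumed 'adjusted' reading: time-based value scaled by current/base clock."""
    scaled = time_based_percent(busy_cores, total_cores) * current_mhz / base_mhz
    return min(100.0, scaled)  # a displayed percentage cannot exceed 100


for tasks in (1, 2, 3):
    classic = time_based_percent(tasks, 4)
    scaled = frequency_scaled_percent(tasks, 4, BURST_MHZ, BASE_MHZ)
    print("%d task(s): time-based %.2f%%, frequency-scaled %.2f%%"
          % (tasks, classic, scaled))
```

Under these assumptions one task would show about 30% rather than 25%, which is in line with what Dirk and Brent report. That agreement is suggestive, not proof: comparing against Process Explorer, as suggested above, is the more reliable check.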
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.