SETI@home v8 beta to begin on Tuesday
Author | Message |
---|---|
Joined: 6 Mar 09 Posts: 8 Credit: 72,401 RAC: 0 |
How do I stop it detecting my GPU as low performance? (SoG r3401)
Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 5.890797
Low-performance GPU detected, default period_iterations_num set to 500
For low-performance GPU path use_sleep enabled with 5ms per iteration
Used GPU device parameters are:
Number of compute units: 2
Single buffer allocation size: 128MB
Total device global memory: 1024MB
max WG size: 1024
local mem type: Real
FERMI path used: yes
LotOfMem path: no
LowPerformanceGPU path: yes
period_iterations_num=500 |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
"How do I stop it detecting my GPU as low performance? (SoG r3401)" Have you read the ReadMe? News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 |
Times were logged automatically as I went along, so the interval between 17:22:52 (before the initial checkpoint files state.sah and boinc_task_state.xml had been written) and 17:30:08 is pretty much the whole run. As the linked result states, BOINC recorded the elapsed time as "Run time 7 min 33 sec", so if my intervals had exactly matched the start point and end point, I might have squeezed one extra sample in, but no more. state.sah and boinc_task_state.xml are only updated when the task checkpoints, so I had already set a checkpoint interval of 15 seconds before capturing that log. You can see that <prog> (extracted from the master checkpoint file) is updating at every sample interval, which confirms that the reduced checkpoint interval was indeed active. |
Joined: 7 Jun 09 Posts: 285 Credit: 2,822,466 RAC: 0 |
For around 6 months (IIRC since Catalyst 15.7.1), whenever I started the PC, all 4 GPU apps (1 WU/GPU) were pinned to Core#0. The calculation time was very high because Core#0 was fully loaded. I had to use -no_cpu_lock so that the 4 GPU apps (1 WU/GPU) weren't pinned to Core#0. I have used this option up to now (Crimson 16.3 Hotfix (Beta)). Do you think it will now work as it should with Crimson 15.12 (Core#0, #1, #2 and #3, each with 1 pinned GPU app)? I will test '-instances_per_device 1 -hp' (so that the cpu_lock works properly?). |
Joined: 7 Jun 09 Posts: 285 Credit: 2,822,466 RAC: 0 |
The 'default' settings (with automatic cpu_lock) now work as they should with Crimson 15.12: Core#0, #1, #2 and #3, each with 1 pinned GPU app. Is it still OK to use -instances_per_device N (the automatic cpu_lock also works without it)? I like to read it in the stderr_txt. ;-) I use '-instances_per_device 1 -hp' and 1 CPU core reserved for each GPU app (app_config.xml). |
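For readers who have not written an app_config.xml before, the setup described in this post (one CPU core reserved per GPU task, plus the command-line options) could be generated roughly as follows. This is only a sketch: the `app_version` layout is BOINC's standard app_config.xml format, but the app name `setiathome_v8` is an assumption, the plan class is one of those named elsewhere in this thread, and whether a particular build picks up options from `<cmdline>` should be checked against its ReadMe.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, cmdline, ncpus=1.0, ngpus=1.0):
    """Build an <app_config> tree with a single <app_version> entry."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # Reserve this many CPU cores and GPUs for each running task.
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    ET.SubElement(version, "ngpus").text = str(ngpus)
    # Options handed to the science app on its command line.
    ET.SubElement(version, "cmdline").text = cmdline
    return root


# Assumed app name; plan class and options are taken from this thread.
config = build_app_config(
    app_name="setiathome_v8",
    plan_class="opencl_ati5_SoG_nocal",
    cmdline="-instances_per_device 1 -hp",
)
xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```

The printed text would be saved as app_config.xml in the project's folder under the BOINC data directory, and the client told to re-read its config files.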
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
"...updating at every sample interval, which confirms that the reduced checkpoint interval was indeed active." Then show the same log but with the normal checkpoint interval of 1 min. How will prog differ? News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Not actually required (with 1 instance it's the default). If they get pinned to a single core, report it with a screenshot. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
yes News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 |
"...updating at every sample interval, which confirms that the reduced checkpoint interval was indeed active."
Here's one from earlier in the development of my batch file - before I'd added name extraction, and before I'd reduced checkpointing to 15 seconds.
WU true angle range is : 9.964068
time <prog> <fraction_done>
15:51:28
15:51:43
15:51:58
15:52:13
15:52:28 0.01403484
15:52:43 0.01403484 0.000023
15:52:58 0.01403484 0.000023
15:53:13 0.01403484 0.000023
15:53:28 0.01403484 0.000023
15:53:43 0.03066970 0.000023
15:53:58 0.03066970 0.000023
15:54:13 0.03066970 0.000023
15:54:28 0.03066970 0.000023
15:54:43 0.04720692 0.042726
15:54:58 0.04720692 0.042726
15:55:13 0.04720692 0.042726
15:55:28 0.04720692 0.042726
15:55:43 0.04720692 0.042726
15:55:58 0.06383516 0.042726
15:56:13 0.06383516 0.042726
15:56:28 0.06383516 0.042726
15:56:43 0.06383516 0.042726
15:56:58 0.08046837 0.042726
15:57:13 0.08046837 0.042726
15:57:28 0.08046837 0.042726
15:57:43 0.08046837 0.042726
15:57:58 0.09699277 0.146938
15:58:13 0.09699277 0.146938
15:58:28 0.09699277 0.146938
15:58:43 0.09699277 0.146938
Initial values are missing for the minute before the first checkpoint (files not yet created), and <prog> only updates every 60 seconds. |
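The batch file itself is not shown in the thread. A rough Python equivalent of the sampling it describes might look like the sketch below: every interval it reads `<prog>` from state.sah and `<fraction_done>` from boinc_task_state.xml in the task's slot folder and prints one line per sample. The file and tag names come from the posts above; the regex-based parsing, the function names and the slot path are assumptions for illustration.

```python
import re
import time
from pathlib import Path

PROG_RE = re.compile(r"<prog>\s*([0-9.eE+-]+)\s*</prog>")
FRAC_RE = re.compile(r"<fraction_done>\s*([0-9.eE+-]+)\s*</fraction_done>")


def read_tag(path, pattern):
    """Return the first matching value as text, or '' if the file or tag is missing."""
    try:
        text = Path(path).read_text(errors="replace")
    except OSError:
        return ""  # file not created yet (before the first checkpoint)
    match = pattern.search(text)
    return match.group(1) if match else ""


def sample_line(slot_dir):
    """Format one log line: time, <prog>, <fraction_done>."""
    stamp = time.strftime("%H:%M:%S")
    prog = read_tag(Path(slot_dir) / "state.sah", PROG_RE)
    frac = read_tag(Path(slot_dir) / "boinc_task_state.xml", FRAC_RE)
    return f"{stamp} {prog} {frac}".rstrip()


def log_progress(slot_dir, interval=15, samples=40):
    """Print one sample every `interval` seconds."""
    for _ in range(samples):
        print(sample_line(slot_dir))
        time.sleep(interval)


# Example call (hypothetical slot path):
# log_progress(r"C:\ProgramData\BOINC\slots\0", interval=15)
```

Because both files are only rewritten at checkpoints, the values can change no faster than the checkpoint interval, whatever the sampling rate.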
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
There is no discrepancy like 17:30:08 0.11434709 0.622830. I would like to see the whole log from 0 to ~100%. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 7 Jun 09 Posts: 285 Credit: 2,822,466 RAC: 0 |
This was the 1st test: http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=76831&offset=40&show_names=0&state=0&appid=
A lot of tasks ended with EXIT_TIME_LIMIT_EXCEEDED, marked as 'Error while computing' (also marked as 'Aborted'; does BOINC do this automatically?). (After the 2nd test, I guess the errors happened because of AMD driver restarts, after which the GPUs can't work properly.)
2nd test: http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=78440
Until now I had never seen AMD driver restarts.
A driver restart happened with (I saw it because I was in front of the screen): 8.09 SETI@home v8 (opencl_ati5_SoG_nocal). No progress, so it would have finished with EXIT_TIME_LIMIT_EXCEEDED (the process was still in Task Manager after BOINC exit). The SoG app uses 8% CPU, which is 1 core.
With this task: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=23324856 I saw that 3 GPUs were not running (the FuryX VGA card has an indicator with a few LEDs). No progress: 3x 8.09 SETI@home v8 (opencl_ati_nocal); they would have finished with EXIT_TIME_LIMIT_EXCEEDED. Progress: 1x 8.09 SETI@home v8 (opencl_ati5_nocal).
Another driver restart (test with -no_cpu_lock; again I saw it because I was in front of the screen) happened with: 3x 8.09 SETI@home v8 (opencl_atiapu_SoG) and 1x 8.09 SETI@home v8 (opencl_ati5_nocal). The SoG app uses 12% CPU, which is 1 1/2 cores.
I have no idea why the r3401 apps don't run smoothly on my PC. The r3330 apps run very well. |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Try enabling the -v 8 option (nothing else, defaults!) and copy stderr.txt from the slot folder before it is deleted (between the driver restart and the task abortion you should have some hours to act). Then send stderr.txt to me. EDIT: Number of compute units: 64. It's a real "monster", so it constitutes a high-end edge case. While I process the log, you could try adding -pref_wg_num_per_cu 1 or 2 and see if the driver restarts continue. News about SETI opt app releases: https://twitter.com/Raistmer |
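Copying stderr.txt out of the slot folder before BOINC cleans it up can be automated. The sketch below assumes the usual layout of one numbered folder per running task under a `slots` directory; the function name, the backup naming scheme and the example paths are hypothetical.

```python
import shutil
from pathlib import Path


def snapshot_stderr(slots_dir, backup_dir):
    """Copy every <slots_dir>/<n>/stderr.txt into backup_dir.

    Returns the list of files written. Each slot keeps only its most
    recent copy, so an older snapshot of the same slot is overwritten.
    """
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    copies = []
    for src in sorted(Path(slots_dir).glob("*/stderr.txt")):
        dest = backup / f"slot{src.parent.name}_stderr.txt"
        try:
            shutil.copy2(src, dest)
        except OSError:
            continue  # file vanished or is locked; try again on the next pass
        copies.append(dest)
    return copies


# Example call (hypothetical paths):
# snapshot_stderr(r"C:\ProgramData\BOINC\slots", r"C:\seti_logs")
```

Calling it from a loop or a scheduled task every minute or so would keep a recent copy of each slot's log. Stop it once the driver restart has happened, since a new task starting in the same slot would overwrite the saved file.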
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 |
"there is no discrepancy like 17:30:08 0.11434709 0.622830" That's because the <fraction_done> change, as Jinbocous pointed out at the beginning, is compressed into a tiny segment at the end of the run, less than 30 seconds long. I'll try to set up a logging run with one-second granularity (both checkpointing and logging), if that's the only way to convince you. |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
With the log ending at ~14%, I can't see whether both numbers eventually converge to 100% or one is left completely behind. Also take into account that changing the checkpoint time also changes the way the app interacts with the GPU (it does that on checkpoints). If checkpoints every ~1 s give more "linear" progress, there is nothing to fix. If even frequent checkpoints show a big discrepancy, perhaps some of the progress state update code was omitted. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 |
Well, I've been fetching work all day, but haven't been able to snag any VHARs like that lucky batch yesterday. This is the closest I've come:
wu_name: 24no10ab.26598.1703.8.42.239
WU true angle range is : 1.050587
time <prog> <fraction_done>
14:23:59
14:24:14 0.00157712 0.001948
14:24:29 0.00729066 0.001948
14:24:44 0.00729066 0.007534
14:24:59 0.01293547 0.013279
14:25:14 0.01861221 0.018956
14:25:29 0.02423934 0.024707
14:25:59 0.03566080 0.035957
14:26:14 0.04136161 0.041698
14:26:29 0.04703214 0.047397
14:26:44 0.05266307 0.053082
14:26:59 0.05831132 0.058615
14:27:14 0.06399740 0.064447
14:27:29 0.06964457 0.064447
14:27:43 0.06964457 0.070206
14:27:57 0.07527402 0.076051
14:28:12 0.08099277 0.081977
14:28:27 0.08668146 0.087977
14:28:42 0.09232784 0.094266
14:28:57 0.09790277 0.100736
14:29:12 0.10368747 0.107579
14:29:27 0.10916947 0.114494
14:29:42 0.11481814 0.122307
14:29:57 0.12045761 0.130827
14:30:13 0.12619096 0.140275
14:30:28 0.12619096 0.140275
14:30:43 0.13187701 0.150882
14:30:58 0.13751588 0.162950
14:31:13 0.14306452 0.176779
14:31:28 0.14887551 0.192903
14:31:43 0.15453547 0.211255
14:31:58 0.16020197 0.233611
14:32:13 0.18524558 0.287315
14:32:28 0.21310161 0.347672
14:32:43 0.24495262 0.428355
14:32:58 0.27631588 0.518778
14:33:13 0.30818283 0.518778
14:33:28 0.30818283 0.628669
14:33:43 0.33650467 0.743310
14:33:58 0.38950199 1.000000
So, <fraction_done> does reach 100% eventually (as I see it reporting live in BOINC Manager), but it rushes up in a hockey-stick curve at the end. Meanwhile, prog(ress) never made it past 40... (I didn't bother with 1-second sampling - I think this shows the effect clearly enough. I'll perhaps try again tomorrow.) |
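To put a number on the hockey-stick shape, the tail of this log can be analysed with a few lines of Python. The values below are copied from the last two minutes of the log in this post; the variable names and the two summary figures are just one illustrative way to slice the data.

```python
def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


START = "14:23:59"  # first (empty) sample in the log

# (time, <prog>, <fraction_done>) for the final two minutes of the run.
# All earlier samples, omitted here, have fraction_done below 0.24.
SAMPLES = [
    ("14:31:58", 0.16020197, 0.233611),
    ("14:32:13", 0.18524558, 0.287315),
    ("14:32:28", 0.21310161, 0.347672),
    ("14:32:43", 0.24495262, 0.428355),
    ("14:32:58", 0.27631588, 0.518778),
    ("14:33:13", 0.30818283, 0.518778),
    ("14:33:28", 0.30818283, 0.628669),
    ("14:33:43", 0.33650467, 0.743310),
    ("14:33:58", 0.38950199, 1.000000),
]

run_length = to_seconds(SAMPLES[-1][0]) - to_seconds(START)

# Share of the run that had elapsed when fraction_done first reached 50%.
first_half = next(t for t, _prog, frac in SAMPLES if frac >= 0.5)
half_share = (to_seconds(first_half) - to_seconds(START)) / run_length

# How much fraction_done was gained over this final window, and how
# large that window is relative to the whole run.
tail_gain = SAMPLES[-1][2] - SAMPLES[0][2]
tail_share = (to_seconds(SAMPLES[-1][0]) - to_seconds(SAMPLES[0][0])) / run_length

final_prog = SAMPLES[-1][1]

print(f"run length: {run_length} s")
print(f"fraction_done reached 0.5 after {half_share:.0%} of the run")
print(f"last {tail_share:.0%} of the run added {tail_gain:.0%} of fraction_done")
print(f"final prog value: {final_prog:.2f}")
```

With a linear progress report, the share of the run elapsed at the 50% mark and the fraction gained in the final window would both sit close to the elapsed-time proportions; here they do not, while `<prog>` ends well short of 1.0.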
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 |
"If checkpoints ~1s give more "linear" progress - there is nothing to fix." Nobody in their right mind would use 1-second checkpoint intervals for production running. If anything, people extend checkpoint intervals to save wear and tear on their storage devices. |
Joined: 18 Jan 06 Posts: 1038 Credit: 18,734,730 RAC: 0 |
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/beta/html/inc/result.inc on line 80
hostid 51991: if I look at the error list of that host I get the above server notice. Is there anything to fix on the server side? _\|/_ U r s |
Joined: 18 Jan 06 Posts: 1038 Credit: 18,734,730 RAC: 0 |
I'll pick one or two WUs from that host with the characteristic "Gaussian(s) missing compared to co-host" and try to rerun them standalone, comparing against the CPU app(s). "I am getting quite a few inconclusives. Seeing that on my Mac Pro both on Main and beta, but not on a Nvidia 570 in Windows. In each case I'm missing a gaussian compared to my wingman." _\|/_ U r s |
Joined: 27 Aug 12 Posts: 56 Credit: 127,133 RAC: 0 |
Let me know if you need any additional information. Thanks, Chris |
Joined: 7 Jun 09 Posts: 285 Credit: 2,822,466 RAC: 0 |
Raistmer wrote: "try to enable -v 8 option (nothing else, defaults!) and copy stderr.txt from slot folder before it will be deleted (from driver restart to task abortion you should have some hours to act). Then send stderr.txt to me."
You mean copy/paste the contents of the stderr.txt file and send them to you in a private message? With BBCode [ pre ] tags? Or send the file via e-mail? ...Not a few hours, though: these are fast VGA cards. ;-) Should I test it here or at Main? Because there it happens after a few minutes (so maybe just 2 or 3 tasks error out, or with good luck no task).
Could a SETI GPU app 'destroy' the AMD driver? I tested the r3401 app (with default settings) at SETI-Main. I let two tasks run in succession to create the .WISDOM file. Then I let the PC run fully loaded, and after ~2 minutes the driver restarts began, one every ~10 seconds. After a few restarts the Windows (8.1 Pro x64) desktop disappeared and a blue screen appeared with a message something like: the AMD*****.*** file has been destroyed or has disappeared. Then the PC rebooted itself. This was with Crimson 16.3 Hotfix (Beta), and it is the reason I installed Crimson 15.12 again (in case the 16.3 driver is no longer usable?), of course using DDU in between ;-). |
©2025 University of California