SETI@home v8 beta to begin on Tuesday

Author	Message
Rasputin42 Volunteer tester Joined: 6 Mar 09 Posts: 8 Credit: 72,401 RAC: 0	Message 57321 - Posted: 14 Mar 2016, 20:43:19 UTC Last modified: 14 Mar 2016, 20:44:37 UTC
How do I stop it detecting my GPU as low performance? (sog r3401)
    Work Unit Info:
    ...............
    Credit multiplier is : 2.85
    WU true angle range is : 5.890797
    Low-performance GPU detected, default period_iterations_num set to 500
    For low-performance GPU path use_sleep enabled with 5ms per iteration
    Used GPU device parameters are:
    Number of compute units: 2
    Single buffer allocation size: 128MB
    Total device global memory: 1024MB
    max WG size: 1024
    local mem type: Real
    FERMI path used: yes
    LotOfMem path: no
    LowPerformanceGPU path: yes
    period_iterations_num=500

Raistmer Volunteer tester Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 57322 - Posted: 14 Mar 2016, 20:45:55 UTC - in response to Message 57321.
Rasputin42 wrote: "How do I stop it detecting my GPU as low performance? (sog r3401)"
ReadMe read?
News about SETI opt app releases: https://twitter.com/Raistmer

Richard Haselgrove Volunteer tester Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0	Message 57323 - Posted: 14 Mar 2016, 20:55:27 UTC - in response to Message 57318.
Quoted from Message 57318: "17:30:08 0.11434709 0.622830 The difference is big enough indeed. How long did the task last as a whole? What was the checkpoint interval? What if the checkpoint interval is set to 10 seconds?"
Times were logged automatically as I went along, so the interval between 17:22:52 (before the initial checkpoint files state.sah and boinc_task_state.xml had been written) and 17:30:08 is pretty much the whole run. As the linked result states, BOINC recorded the elapsed time as
    Run time 7 min 33 sec
so if my intervals had exactly matched the start point and end point, I might have squeezed one extra in - but no more.
state.sah and boinc_task_state.xml are only updated when the task checkpoints, so I had already set a checkpoint interval of 15 seconds before capturing that log. You can see that <prog> (extracted from the master checkpoint file) is updating at every sample interval, which confirms that the reduced checkpoint interval was indeed active.
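
For anyone who wants to capture a comparable log, here is a minimal Python sketch of the sampling described above. It is not the batch file used for the posted logs. It assumes that <prog> in state.sah and <fraction_done> in boinc_task_state.xml are simple paired tags holding one number each; the slot directory argument and the function names are hypothetical.

```python
import re
import time
from pathlib import Path
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the text inside the first <tag>...</tag> pair, or None if absent."""
    pattern = rf"<{re.escape(tag)}>\s*([^<]*?)\s*</{re.escape(tag)}>"
    match = re.search(pattern, text)
    return match.group(1) if match else None


def sample(slot_dir: Path) -> str:
    """Build one log line: wall-clock time, <prog>, <fraction_done>."""
    fields = [time.strftime("%H:%M:%S")]
    sources = (("state.sah", "prog"), ("boinc_task_state.xml", "fraction_done"))
    for filename, tag in sources:
        path = slot_dir / filename
        # Both files are absent until the first checkpoint has been written.
        value = extract_tag(path.read_text(errors="replace"), tag) if path.exists() else None
        fields.append(value or "")
    return " ".join(fields).rstrip()


def watch(slot_dir: Path, interval_s: float = 15.0) -> None:
    """Print one sample every interval_s seconds until interrupted."""
    while True:
        print(sample(slot_dir), flush=True)
        time.sleep(interval_s)
```

Calling watch(Path(...)) with the slot folder of the running task prints one line per interval. Because both files are only rewritten at checkpoints, the sampling interval only shows fresh values if the checkpoint interval is at least as short.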

Dirk Sadowski Volunteer tester Joined: 7 Jun 09 Posts: 285 Credit: 2,822,466 RAC: 0	Message 57325 - Posted: 14 Mar 2016, 21:49:18 UTC - in response to Message 57312. Last modified: 14 Mar 2016, 22:34:43 UTC
Quoted from Message 57312:
Q: The cmdline settings/usage for/of r3401 is different than for/of r3330?
A: Yes, new added. -sbs N will act differently, so new tuning is required for it.
Q: The 'default' settings for/of r3401 'use' the GPU 'better'?
A: Yes, default settings improved.
Q: If so, I should test the apps with 'default' settings?
A: Yes, it's preferable for initial testing. Only when a baseline is established would I recommend starting further optimization.
Q: But with at least '-no_cpu_lock -hp'?
A: No, most probably -no_cpu_lock resulted in EXIT_TIME_LIMIT_EXCEEDED. Either keep a CPU free or don't use this option.
For around 6 months, as I started the PC, IIRC with Catalyst 15.7.1, all 4 GPU apps (1 WU/GPU) were fixed at Core#0. The calculation time was very high, because Core#0 was fully loaded. I had to use -no_cpu_lock so that the 4 GPU apps (1 WU/GPU) aren't fixed at Core#0. I have used this option up to now (Crimson 16.3 Hotfix (Beta)).
Do you think it will work now with Crimson 15.12 like it should (Core#0, #1, #2 and #3 each with 1 fixed GPU app)?
Will test: '-instances_per_device 1 -hp' (so that the cpu_lock will work properly?).

Dirk Sadowski Volunteer tester Joined: 7 Jun 09 Posts: 285 Credit: 2,822,466 RAC: 0	Message 57326 - Posted: 14 Mar 2016, 22:34:28 UTC - in response to Message 57325. Last modified: 14 Mar 2016, 22:42:36 UTC
The 'default' settings (with automatic cpu_lock) work now with Crimson 15.12 like they should... Core#0, #1, #2 and #3 each with 1 fixed GPU app.
Is it still OK to use -instances_per_device N (automatic cpu_lock works also without)? I like to read it in the stderr_txt. ;-)
I use '-instances_per_device 1 -hp' and 1 CPU core reserved for 1 GPU app (app_config.xml).

Raistmer Volunteer tester Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 57327 - Posted: 14 Mar 2016, 23:21:22 UTC - in response to Message 57323.
Richard Haselgrove wrote: "updating at every sample interval, which confirms that the reduced checkpoint interval was indeed active."
Then show the same log but with the normal checkpoint interval of 1 min - how will prog differ?

Raistmer Volunteer tester Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 57328 - Posted: 14 Mar 2016, 23:23:11 UTC - in response to Message 57325.
Dirk Sadowski wrote: "Will test: '-instances_per_device 1 -hp' (so that the cpu_lock will work properly?)."
Not actually required (with 1 instance it's the default). If they get pinned to a single core, report it with a screenshot.

Raistmer Volunteer tester Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 57329 - Posted: 14 Mar 2016, 23:24:07 UTC - in response to Message 57326.
Dirk Sadowski wrote: "It's OK to use -instances_per_device N still (automatic cpu_lock work also without)? I like it to read it in the stderr_txt. ;-)"
Yes.

Richard Haselgrove Volunteer tester Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0	Message 57330 - Posted: 14 Mar 2016, 23:39:45 UTC - in response to Message 57327.
Quoted from Message 57327: "updating at every sample interval, which confirms that the reduced checkpoint interval was indeed active. then show same log but with normal checkpoint of 1 min - how prog will differ?"
Here's one from earlier in the development of my batch file - before I'd added name extraction, and before I'd reduced checkpointing to 15 seconds.
    WU true angle range is : 9.964068
              <prog>      <fraction_done>
    15:51:28
    15:51:43
    15:51:58
    15:52:13
    15:52:28  0.01403484
    15:52:43  0.01403484  0.000023
    15:52:58  0.01403484  0.000023
    15:53:13  0.01403484  0.000023
    15:53:28  0.01403484  0.000023
    15:53:43  0.03066970  0.000023
    15:53:58  0.03066970  0.000023
    15:54:13  0.03066970  0.000023
    15:54:28  0.03066970  0.000023
    15:54:43  0.04720692  0.042726
    15:54:58  0.04720692  0.042726
    15:55:13  0.04720692  0.042726
    15:55:28  0.04720692  0.042726
    15:55:43  0.04720692  0.042726
    15:55:58  0.06383516  0.042726
    15:56:13  0.06383516  0.042726
    15:56:28  0.06383516  0.042726
    15:56:43  0.06383516  0.042726
    15:56:58  0.08046837  0.042726
    15:57:13  0.08046837  0.042726
    15:57:28  0.08046837  0.042726
    15:57:43  0.08046837  0.042726
    15:57:58  0.09699277  0.146938
    15:58:13  0.09699277  0.146938
    15:58:28  0.09699277  0.146938
    15:58:43  0.09699277  0.146938
Initial values are missing for the minute before the first checkpoint (files not yet created), and <prog> only updates every 60 seconds.

Raistmer Volunteer tester Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 57331 - Posted: 14 Mar 2016, 23:54:01 UTC - in response to Message 57330. Last modified: 14 Mar 2016, 23:55:55 UTC
There is no discrepancy like
    17:30:08 0.11434709 0.622830
I would like to see the whole log from 0 to ~100%.

Dirk Sadowski Volunteer tester Joined: 7 Jun 09 Posts: 285 Credit: 2,822,466 RAC: 0	Message 57337 - Posted: 15 Mar 2016, 6:56:41 UTC Last modified: 15 Mar 2016, 7:14:01 UTC
This was the 1st test:
http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=76831&offset=40&show_names=0&state=0&appid=
A lot of EXIT_TIME_LIMIT_EXCEEDED marked as 'Error while computing' (also marked as 'Aborted'; does BOINC do this automatically?).
(After the 2nd test, I guess the errors happened because of AMD driver restarts - and then the GPUs can't work properly.)
2nd test:
http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=78440
Until now I never saw AMD driver restarts.
Driver restart with (I saw it because I was in front of the screen):
    8.09 SETI@home v8 (opencl_ati5_SoG_nocal)
No progress -> it would finish with EXIT_TIME_LIMIT_EXCEEDED (in Task-Manager still there after BOINC exit)
SoG app uses 8% CPU, this is 1 core.
With this task:
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=23324856
I saw 3 GPUs are not running (the FuryX VGA card has an indicator with a few LEDs):
No progress: 3x 8.09 SETI@home v8 (opencl_ati_nocal) -> they would finish with EXIT_TIME_LIMIT_EXCEEDED
Progress: 1x 8.09 SETI@home v8 (opencl_ati5_nocal)
Driver restart with (test with -no_cpu_lock) (I saw it because I was in front of the screen):
    3x 8.09 SETI@home v8 (opencl_atiapu_SoG)
    1x 8.09 SETI@home v8 (opencl_ati5_nocal)
SoG app uses 12% CPU, that is 1 1/2 cores.
I have no idea why the r3401 apps don't run smoothly on my PC. The r3330 apps run very well.

Raistmer Volunteer tester Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 57338 - Posted: 15 Mar 2016, 7:46:23 UTC - in response to Message 57337. Last modified: 15 Mar 2016, 7:50:29 UTC
Try to enable the -v 8 option (nothing else, defaults!) and copy stderr.txt from the slot folder before it is deleted (from driver restart to task abortion you should have some hours to act). Then send stderr.txt to me.
EDIT: Number of compute units: 64 - it's a real "monster", so it constitutes a high-end edge case.
And while I process the log you could try adding -pref_wg_num_per_cu 1 or 2 and see if the driver restarts continue.

Richard Haselgrove Volunteer tester Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0	Message 57341 - Posted: 15 Mar 2016, 8:14:54 UTC - in response to Message 57331.
Raistmer wrote: "there is no discrepancy like 17:30:08 0.11434709 0.622830 I would like to see whole log from 0 to ~100%"
That's because the <fraction_done>, as Jinbocous pointed out at the beginning, is compressed into a tiny segment at the end of the run - less than 30 seconds. I'll try and set up a logging run with one second granularity (both checkpointing and logging), if that's the only way to convince you.

Raistmer Volunteer tester Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 57348 - Posted: 15 Mar 2016, 15:19:24 UTC - in response to Message 57341.
With the log ending @ ~14% I can't see whether both numbers converge to 100% eventually or one is completely left behind.
Also take into account that changing the checkpoint time also changes the way the app interacts with the GPU (it does that on checkpoints).
If checkpoints of ~1s give more "linear" progress - there is nothing to fix. If even frequent checkpoints show a big discrepancy - perhaps some of the progress state update code was omitted.

Richard Haselgrove Volunteer tester Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0	Message 57349 - Posted: 15 Mar 2016, 15:28:02 UTC - in response to Message 57348.
Well, I've been fetching work all day, but haven't been able to snag any VHARs like that lucky batch yesterday. This is the closest I've come:
    wu_name: 24no10ab.26598.1703.8.42.239
    WU true angle range is : 1.050587
              <prog>      <fraction_done>
    14:23:59
    14:24:14  0.00157712  0.001948
    14:24:29  0.00729066  0.001948
    14:24:44  0.00729066  0.007534
    14:24:59  0.01293547  0.013279
    14:25:14  0.01861221  0.018956
    14:25:29  0.02423934  0.024707
    14:25:59  0.03566080  0.035957
    14:26:14  0.04136161  0.041698
    14:26:29  0.04703214  0.047397
    14:26:44  0.05266307  0.053082
    14:26:59  0.05831132  0.058615
    14:27:14  0.06399740  0.064447
    14:27:29  0.06964457  0.064447
    14:27:43  0.06964457  0.070206
    14:27:57  0.07527402  0.076051
    14:28:12  0.08099277  0.081977
    14:28:27  0.08668146  0.087977
    14:28:42  0.09232784  0.094266
    14:28:57  0.09790277  0.100736
    14:29:12  0.10368747  0.107579
    14:29:27  0.10916947  0.114494
    14:29:42  0.11481814  0.122307
    14:29:57  0.12045761  0.130827
    14:30:13  0.12619096  0.140275
    14:30:28  0.12619096  0.140275
    14:30:43  0.13187701  0.150882
    14:30:58  0.13751588  0.162950
    14:31:13  0.14306452  0.176779
    14:31:28  0.14887551  0.192903
    14:31:43  0.15453547  0.211255
    14:31:58  0.16020197  0.233611
    14:32:13  0.18524558  0.287315
    14:32:28  0.21310161  0.347672
    14:32:43  0.24495262  0.428355
    14:32:58  0.27631588  0.518778
    14:33:13  0.30818283  0.518778
    14:33:28  0.30818283  0.628669
    14:33:43  0.33650467  0.743310
    14:33:58  0.38950199  1.000000
So, <fraction_done> does reach 100% eventually (as I see it reporting live in BOINC Manager), but it rushes up in a hockey-stick curve at the end. Meanwhile, prog(ress) never made it past 40%...
(I didn't bother with 1-second sampling - I think this shows the effect clearly enough. I'll perhaps try again tomorrow.)
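
To put a number on the hockey-stick, here is a small Python sketch that takes five rows copied from the log above and prints how far <fraction_done> runs ahead of <prog> at each point. The choice of rows and the function names are illustrative only; nothing here comes from the application's code.

```python
# (time, <prog>, <fraction_done>) rows copied from the log above.
SAMPLES = [
    ("14:24:14", 0.00157712, 0.001948),
    ("14:29:12", 0.10368747, 0.107579),
    ("14:31:58", 0.16020197, 0.233611),
    ("14:32:58", 0.27631588, 0.518778),
    ("14:33:58", 0.38950199, 1.000000),
]


def gaps(samples):
    """Return (time, fraction_done - prog) for each sample."""
    return [(t, fraction_done - prog) for t, prog, fraction_done in samples]


def report(samples):
    """Format the gaps as one line per sample, in percentage points."""
    lines = []
    for t, gap in gaps(samples):
        lines.append(f"{t}  gap = {gap * 100:6.2f} percentage points")
    return "\n".join(lines)


print(report(SAMPLES))
```

For these rows the gap stays below half a percentage point for the first five minutes and then grows with every later sample, ending at roughly 61 percentage points when <fraction_done> reports 1.000000.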

Richard Haselgrove Volunteer tester Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0	Message 57350 - Posted: 15 Mar 2016, 15:33:33 UTC - in response to Message 57348.
Raistmer wrote: "If checkpoints ~1s give more 'linear' progress - there is nothing to fix."
Nobody in their right mind would use 1-second checkpoint intervals for production running. If anything, people extend checkpoint intervals to save wear and tear on their storage devices.

Urs Echternacht Volunteer tester Joined: 18 Jan 06 Posts: 1038 Credit: 18,734,730 RAC: 0	Message 57357 - Posted: 15 Mar 2016, 21:50:35 UTC
    Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/beta/html/inc/result.inc on line 80
    Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/beta/html/inc/result.inc on line 81
    Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/beta/html/inc/result.inc on line 82
    Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/beta/html/inc/result.inc on line 83
    Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/beta/html/inc/result.inc on line 83
    Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/beta/html/inc/result.inc on line 84
    Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/beta/html/inc/result.inc on line 89
hostid 51991: if I look at the error list of that host I get the above server notices. Anything to fix there on the server side?
_\\|/_ U r s

Urs Echternacht Volunteer tester Joined: 18 Jan 06 Posts: 1038 Credit: 18,734,730 RAC: 0	Message 57358 - Posted: 15 Mar 2016, 22:04:57 UTC - in response to Message 57320. Last modified: 15 Mar 2016, 22:13:08 UTC
Quoted from Message 57320:
"I am getting quite a few inconclusives. Seeing that on my Mac Pro both on Main and beta, but not on a Nvidia 570 in Windows. In each case I'm missing a gaussian compared to my wingman.
Main: http://setiathome.berkeley.edu/results.php?hostid=6105482&offset=0&show_names=0&state=3&appid=
Beta: https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=60016&offset=0&show_names=0&state=3&appid=
Let me know if you need something else. Thanks, Chris"
"Worth checking whether it's a SOG-specific or Mac-specific issue. I could process those beta tasks later with my own GPU, maybe Urs could check them on Mac. Urs?"
I'll pick one or two WUs from that host with the characteristic "Gaussian(s) missing compared to co-host" and try to rerun them standalone, comparing against the CPU app(s).
_\\|/_ U r s

Chris Adamek Volunteer tester Joined: 27 Aug 12 Posts: 56 Credit: 127,133 RAC: 0	Message 57363 - Posted: 16 Mar 2016, 1:06:13 UTC - in response to Message 57358.
Let me know if you need any additional information.
Thanks, Chris

Dirk Sadowski Volunteer tester Joined: 7 Jun 09 Posts: 285 Credit: 2,822,466 RAC: 0	Message 57364 - Posted: 16 Mar 2016, 3:11:39 UTC - in response to Message 57338. Last modified: 16 Mar 2016, 3:14:19 UTC
Raistmer wrote: "try to enable -v 8 option (nothing else, defaults!) and copy stderr.txt from slot folder before it will be deleted (from driver restart to task abortion you should have some hours to act). Then send stderr.txt to me. EDIT: Number of compute units: 64 - it's the real 'monster' so constitutes high-end edge case. And while I will process log you could try to add -pref_wg_num_per_cu 1 or 2 and see if driver restarts continue"
You mean copy/paste the entries of the stderr.txt file and send a private message to you? With BBCode [ pre ] ? Or the file via e-mail?
...not a few hours - fast VGA cards. ;-)
Should I test it here or at Main? Because there it happens after a few minutes. (So maybe just 2 or 3 tasks error out - or with good luck, no task.)
Could a SETI GPU app 'destroy' the AMD driver?
I tested the r3401 app (with default settings) at SETI-Main. I let two tasks run one after the other to create the .WISDOM file. Then I let the PC run fully loaded, and after ~ 2 minutes the driver restarts started, every ~ 10 seconds. After a few restarts the Windows (8.1 Pro x64) Desktop disappeared and a blue screen came up with something like the following written on it: the AMD***.* file has been destroyed or the file disappeared. Then the PC rebooted itself.
This was with Crimson 16.3 Hotfix (Beta) - and the reason that I installed Crimson 15.12 again (if the 16.3 driver is no longer usable?) (of course, with DDU used in between ;-).
