SETI@home v8 beta to begin on Tuesday

Author	Message
Raistmer Volunteer tester Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 57366 - Posted: 16 Mar 2016, 5:59:52 UTC - in response to Message 57349. Meanwhile, prog(ress) never made it past 40... (I didn't bother with 1-second sampling - I think this shows the effect clearly enough. I'll perhaps try again tomorrow.) What prog value recorded at state.sah when task finishes? I would like to have some test case for offline checking. News about SETI opt app releases: https://twitter.com/Raistmer ID: 57366 ·

Raistmer Volunteer tester Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 57367 - Posted: 16 Mar 2016, 7:48:49 UTC - in response to Message 57364. Or the file via E-Mail? Better this way. I should test it here or at Main? Testing on beta preferable cause results remain visible long enough. Also, if app not working properly it's not wise to use it on main. Could be a SETI GPU app 'destroy' the AMD driver? No. It's just Windows who destroys (restarts) driver. Windows watchdog timer exceeded most probably. This was with Crimson 16.3 Hotfix (Beta) - and the reason that I installed again Crimson 15.12 (if the 16.3 driver isn't longer usable?) (of course, between usage of DDU ;-). With what driver r3330 worked well? News about SETI opt app releases: https://twitter.com/Raistmer ID: 57367 ·

Richard Haselgrove Volunteer tester Send message Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0	Message 57374 - Posted: 16 Mar 2016, 12:25:52 UTC - in response to Message 57366. Meanwhile, prog(ress) never made it past 40... (I didn't bother with 1-second sampling - I think this shows the effect clearly enough. I'll perhaps try again tomorrow.) What prog value recorded at state.sah when task finishes? I would like to have some test case for offline checking. I managed to snag one of the very, very high AR WUs that started all this off. I won't bore you with all 464 lines, but here are the significant ones.

wu_name: 24no10ab.7605.6611.8.42.47
WU true angle range is : 136.505732

          <prog>      <fraction_done>
11:35:33
11:35:34  0.00000745
11:35:35  0.00000745
11:35:36  0.00031612  0.000012
11:35:37  0.00056272  0.000012
11:35:38  0.00086394  0.000012
11:35:39  0.00111551  0.000012
11:35:40  0.00136377  0.000012
<snip>
11:43:12  0.12065697  0.000012
11:43:13  0.12165332  0.000012
11:43:14  0.12312135  0.000012
11:43:15  0.12312135  1.000000
11:43:16  0.12312135  1.000000

I also saw what happens to estimates if you tell BOINC that your application has only made 0.0012% progress in 7 minutes:

I've kept the WU file (saves downloading it again later), and I'll try and organise an offline run after the budget speech. ID: 57374 ·
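A rough illustration of why the estimate in message 57374 explodes. This is only a naive linear extrapolation (total time = elapsed / fraction done); BOINC's real estimator is more involved, so the function name and the formula here are assumptions made for illustration, not the client's actual code. The inputs are the readings from the log in that message.

```python
def naive_remaining_seconds(elapsed_s: float, fraction_done: float) -> float:
    """Linear extrapolation (an assumption, not BOINC's real estimator)."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s / fraction_done - elapsed_s


# Readings from the log: 11:35:33 to 11:43:14 is 7 min 41 s.
elapsed = 7 * 60 + 41

# Extrapolate from the <fraction_done> value the app was reporting.
stuck = naive_remaining_seconds(elapsed, 0.000012)

# Extrapolate from the <prog> value at the same moment.
from_prog = naive_remaining_seconds(elapsed, 0.12312135)

print(f"from fraction_done: about {stuck / 86400:.0f} days remaining")
print(f"from prog:          about {from_prog / 60:.0f} minutes remaining")
```

Both figures are wrong for this task, which finished one second later. The fraction_done figure is off by many orders of magnitude, and the prog figure would still have predicted most of an hour.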

Richard Haselgrove Volunteer tester Send message Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0	Message 57375 - Posted: 16 Mar 2016, 15:06:46 UTC - in response to Message 57366. What prog value recorded at state.sah when task finishes? OK, offline results.

1) Running MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG with standard 60-second checkpoint interval.
Final checkpoint file timestamped 16 March 2016, 14:34:41
App Ended at : 14:34:43.804 (2.8 seconds later)
Final state.sah starts
<ncfft>99182</ncfft>
<cr>-9.999761e+001</cr>
<fl>32768</fl>
<prog>0.12312135</prog>
<potfreq>-1</potfreq>
<potactivity>0</potactivity>
<signal_count>5</signal_count>
<flops>964166035.817873</flops>
<spike_count>5</spike_count>
<autocorr_count>0</autocorr_count>
<pulse_count>0</pulse_count>
<gaussian_count>0</gaussian_count>
<triplet_count>0</triplet_count>

2) Reference run with Lunatics_x41zi_win32_cuda50
Final checkpoint file timestamped 16 March 2016, 14:26:01
App Ended at : 14:27:05.574 (64.5 seconds later)
Final state.sah starts
<ncfft>84154</ncfft>
<cr>2.916597e+001</cr>
<fl>131072</fl>
<prog>0.85205204</prog>
<potfreq>-1</potfreq>
<potactivity>0</potactivity>
<signal_count>4</signal_count>
<flops>14347800817644.363000</flops>
<spike_count>4</spike_count>
<autocorr_count>0</autocorr_count>
<pulse_count>0</pulse_count>
<gaussian_count>0</gaussian_count>
<triplet_count>0</triplet_count>
(but validated Q= 99.96% - the fifth spike must have been found in the last minute)

3) Running MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG with special 1-second checkpoint interval.
Final checkpoint file timestamped 16 March 2016, 14:55:22
App Ended at : 14:55:24.560 (2.5 seconds later)
Final state.sah starts
<ncfft>99182</ncfft>
<cr>-9.999761e+001</cr>
<fl>32768</fl>
<prog>0.12312135</prog>
<potfreq>-1</potfreq>
<potactivity>0</potactivity>
<signal_count>5</signal_count>
<flops>964166035.817873</flops>
<spike_count>5</spike_count>
<autocorr_count>0</autocorr_count>
<pulse_count>0</pulse_count>
<gaussian_count>0</gaussian_count>
<triplet_count>0</triplet_count>

I was checking at intervals throughout both SoG runs, and state.sah was being updated at the prescribed intervals - but it looks as if Murphy's law intervened and the final 60-second checkpoint occurred just as the app was preparing to clean up anyway. It looks to me as if both <prog> and <fraction_done> are broken, but in different ways. I have to put this research to one side now, and go out - back tomorrow (Thursday) evening, and we can pick it up again then. ID: 57375 ·
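For offline checking, the final <prog> value can be pulled out of a state.sah header with a few lines of script. This is a minimal sketch: the helper name `read_fields` is hypothetical, the sample text is an excerpt of the run 1 header quoted above, and it assumes the header consists of simple `<tag>value</tag>` pairs as shown there.

```python
import re

# Excerpt of the run 1 state.sah header quoted in message 57375.
STATE_SAH_HEAD = """<ncfft>99182</ncfft>
<cr>-9.999761e+001</cr>
<fl>32768</fl>
<prog>0.12312135</prog>
<signal_count>5</signal_count>
<spike_count>5</spike_count>
"""

# Matches simple <tag>value</tag> pairs with no nested markup.
TAG_RE = re.compile(r"<(\w+)>([^<]*)</\1>")


def read_fields(text: str) -> dict[str, str]:
    """Return the first value seen for each simple tag (hypothetical helper)."""
    fields: dict[str, str] = {}
    for name, value in TAG_RE.findall(text):
        fields.setdefault(name, value.strip())
    return fields


fields = read_fields(STATE_SAH_HEAD)
prog = float(fields["prog"])
print(f"final prog = {prog:.8f}, signals = {fields['signal_count']}")
if prog < 0.99:
    print("prog stopped well short of 1.0 at the last checkpoint")
```

Running the same helper over the header from the reference run would make the difference between the two apps' final <prog> values easy to compare.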

Raistmer Volunteer tester Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 57382 - Posted: 16 Mar 2016, 22:22:14 UTC - in response to Message 57375. Last modified: 16 Mar 2016, 22:46:54 UTC I think I have idea how to fix resulting readings a little (though it's all cosmetic) EDIT: what if BOINC will have 100% completion instead of 0.0012% after few seconds from task start? Will it be preferable than sit at 0.0012% for most of time? News about SETI opt app releases: https://twitter.com/Raistmer ID: 57382 ·

Dirk Sadowski Volunteer tester Send message Joined: 7 Jun 09 Posts: 285 Credit: 2,822,466 RAC: 0	Message 57394 - Posted: 17 Mar 2016, 19:25:24 UTC - in response to Message 57367. Last modified: 17 Mar 2016, 19:32:23 UTC Or the file via E-Mail? Better this way. I should test it here or at Main? Testing on beta preferable cause results remain visible long enough. Also, if app not working properly it's not wise to use it on main. Could be a SETI GPU app 'destroy' the AMD driver? No. It's just Windows who destroys (restarts) driver. Windows watchdog timer exceeded most probably. This was with Crimson 16.3 Hotfix (Beta) - and the reason that I installed again Crimson 15.12 (if the 16.3 driver isn't longer usable?) (of course, between usage of DDU ;-). With what driver r3330 worked well? So 'nothing' could 'destroy' the AMD driver this way, so I would need to install the AMD driver again? With (Catalyst 15.7.1(?) up to 15.11) Crimson 15.11 all others after up to Crimson 16.3 Hotfix (Beta) r3330 work/ed fine - IIRC. -> The PC is running since ~ October last year and I had the newest/current (also Beta) AMD drivers installed. But, the default cpu_lock of r3330 don't work properly (with all used drivers). If default enabled all 4 GPU (1 WU/GPU) apps are fixed at Core#0. I need to use -no_cpu_lock, so all Cores are used. With r3401 the default cpu_lock work like it should. Core#0, #1, #2 and #3 each with one fixed GPU app. Please send me your E-Mail via private message. ;-) ID: 57394 ·

Urs Echternacht Volunteer tester Send message Joined: 18 Jan 06 Posts: 1038 Credit: 18,734,730 RAC: 0	Message 57405 - Posted: 18 Mar 2016, 2:17:28 UTC - in response to Message 57363. Last modified: 18 Mar 2016, 2:22:18 UTC Let me know if you need any additional information. Thanks, Chris Chris, my reruns of two WUs from your missed-Gaussians results have finished (see results if interested). No signals were found to be missing. From my point of view the Mac apps work OK for these WUs. Additionally, I've looked over nearly all the results in your result list and found that at some point the second GPU starts to run at a lower frequency. That could point to a driver crash, but I'm not quite sure yet. Do your system logs show any problem that could come from a crashed GPU driver on the second GPU? There is also another Mac with AMD D700 GPUs at beta, which seems to have no trouble at all with these apps. https://setiweb.ssl.berkeley.edu/beta//show_host_detail.php?hostid=71984 _\\|/_ U r s ID: 57405 ·

Chris Adamek Volunteer tester Send message Joined: 27 Aug 12 Posts: 56 Credit: 127,133 RAC: 0	Message 57414 - Posted: 18 Mar 2016, 12:08:35 UTC - in response to Message 57405. Hmm, interesting. I have not noticed any driver crashes, but I will certainly take a look at my system log to see if there is anything going on there. Where are you seeing the clock drop? In the OpenCL report in the output file? I have occasionally seen what I'd call erroneous (i.e. 150MHz) info reported there, as I never see hugely different completion times between the two units. Thank you for the information. I'll dig into my system logs this morning and see what's happening. Might be a case of me running beta OS X. I have a new version of 10.11.4 to install today as well. Thanks, Chris ID: 57414 ·

Mike Volunteer tester Send message Joined: 16 Jun 05 Posts: 2531 Credit: 1,074,556 RAC: 0	Message 57417 - Posted: 18 Mar 2016, 13:25:14 UTC But, the default cpu_lock of r3330 don't work properly (with all used drivers). If default enabled all 4 GPU (1 WU/GPU) apps are fixed at Core#0. I need to use -no_cpu_lock, so all Cores are used. With r3401 the default cpu_lock work like it should. Core#0, #1, #2 and #3 each with one fixed GPU app. That's why I suggest using r3401. It's also more stable. I just have to run a few more speed benches to make sure about the speed settings. With each crime and every kindness we birth our future. ID: 57417 ·

Chris Adamek Volunteer tester Send message Joined: 27 Aug 12 Posts: 56 Credit: 127,133 RAC: 0	Message 57422 - Posted: 18 Mar 2016, 16:32:00 UTC - in response to Message 57414. Hmm, interesting. I have not noticed any driver crashes, but I will certainly take a look at my system log to see if there is anything going on there. Where are you seeing the clock drop? In the OpenCL report in the output file? I have occasionally seen what I'd call erroneous (i.e. 150MHz) info reported there, as I never see hugely different completion times between the two units. Thank you for the information. I'll dig into my system logs this morning and see what's happening. Might be a case of me running beta OS X. I have a new version of 10.11.4 to install today as well. FYI, I looked at my system logs. I don't see any driver restarts per se, but it looks like there is an OpenCL error every so often with the 8.07 app. My guess is that it is due to the beta version of OS X I'm running. About to update to the 7th beta of it, so we'll see if it continues. Thanks, Chris ID: 57422 ·

Chris Adamek Volunteer tester Send message Joined: 27 Aug 12 Posts: 56 Credit: 127,133 RAC: 0	Message 57423 - Posted: 18 Mar 2016, 18:24:42 UTC - in response to Message 57422. Hmm, interesting. I have not noticed any driver crashes, but I will certainly take a look at my system log to see if there is anything going on there. Where are you seeing the clock drop? In the OpenCL report in the output file? I have occasionally seen what I'd call erroneous (i.e. 150MHz) info reported there, as I never see hugely different completion times between the two units. Thank you for the information. I'll dig into my system logs this morning and see what's happening. Might be a case of me running beta OS X. I have a new version of 10.11.4 to install today as well. FYI, I looked at my system logs. I don't see any driver restarts per se, but it looks like there is an OpenCL error every so often with the 8.07 app. My guess is that it is due to the beta version of OS X I'm running. About to update to the 7th beta of it, so we'll see if it continues. Thanks, Chris It also reports that 8.07 is making too many wakeup calls: it allows 150 per second and sometimes there are as many as 1300 per second. Doesn't seem to cause a crash exactly, but it is reported in the system log. Chris ID: 57423 ·

Urs Echternacht Volunteer tester Send message Joined: 18 Jan 06 Posts: 1038 Credit: 18,734,730 RAC: 0	Message 57425 - Posted: 18 Mar 2016, 22:27:02 UTC - in response to Message 57423. Last modified: 18 Mar 2016, 22:27:57 UTC Hmm, interesting. I have not noticed any driver crashes, but I will certainly take a look at my system log to see if there is anything going on there. Where are you seeing the clock drop? In the OpenCL report in the output file? I have occasionally seen what I'd call erroneous (i.e. 150MHz) info reported there, as I never see hugely different completion times between the two units. Thank you for the information. I'll dig into my system logs this morning and see what's happening. Might be a case of me running beta OS X. I have a new version of 10.11.4 to install today as well. FYI, I looked at my system logs. I don't see any driver restarts per se, but it looks like there is an OpenCL error every so often with the 8.07 app. My guess is that it is due to the beta version of OS X I'm running. About to update to the 7th beta of it, so we'll see if it continues. Thanks, Chris It also reports that 8.07 is making too many wakeup calls: it allows 150 per second and sometimes there are as many as 1300 per second. Doesn't seem to cause a crash exactly, but it is reported in the system log. Chris Does the log say whether ati5 or ati5_SoG causes the wakeup calls? _\\|/_ U r s ID: 57425 ·

Chris Adamek Volunteer tester Send message Joined: 27 Aug 12 Posts: 56 Credit: 127,133 RAC: 0	Message 57426 - Posted: 18 Mar 2016, 23:55:48 UTC - in response to Message 57425. Both the SoG and non-SoG are guilty based on the log files. Chris ID: 57426 ·

Urs Echternacht Volunteer tester Send message Joined: 18 Jan 06 Posts: 1038 Credit: 18,734,730 RAC: 0	Message 57430 - Posted: 19 Mar 2016, 18:08:39 UTC - in response to Message 57426. Both the SoG and non-SoG are guilty based on the log files. Chris Was it with optimized settings or with defaults? Could someone with a similar Mac (Pro, 2x D300/500/700 GPUs) please look up how many wakeup calls happen on their host when OpenCL apps are running, to see if this is normal? _\\|/_ U r s ID: 57430 ·

Chris Adamek Volunteer tester Send message Joined: 27 Aug 12 Posts: 56 Credit: 127,133 RAC: 0	Message 57432 - Posted: 20 Mar 2016, 3:57:10 UTC - in response to Message 57430. If you don't hear from anyone, I'll have a second Mac Pro in about a month and a half with D500 cards and I'll see if it does the same on it. It looks like the wakeup calls were occurring with both default and optimized settings. It's kinda hard to correlate the log files to specific WUs (I actually think it may match some of the inconclusives as well, but it's hard to figure that out exactly), but I'll go back to default settings tomorrow and verify for you. Thanks, Chris ID: 57432 ·

Raistmer Volunteer tester Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 57447 - Posted: 21 Mar 2016, 14:15:47 UTC Seems Sutaru found bug in current RC apps. I need task with AR of 1.047818 for offline benchmarking. Please find such. News about SETI opt app releases: https://twitter.com/Raistmer ID: 57447 ·

Richard Haselgrove Volunteer tester Send message Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0	Message 57448 - Posted: 21 Mar 2016, 14:37:56 UTC - in response to Message 57447. Seems Sutaru found bug in current RC apps. I need task with AR of 1.047818 for offline benchmarking. Please find such. How precise do you need the match to be (+/- ?) I had a number around that range during last week's test run, like 24no10ab.26598.1703.8.42.240_0 with WU true angle range is : 1.050587 And if that's not close enough, you can always edit the header for testing... ID: 57448 ·

Raistmer Volunteer tester Send message Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0	Message 57454 - Posted: 22 Mar 2016, 7:54:47 UTC - in response to Message 57448. Seems just AR field change not enough to simulate same PulseFind geometry through task. So I prefer to get exact task. News about SETI opt app releases: https://twitter.com/Raistmer ID: 57454 ·

Mike Volunteer tester Send message Joined: 16 Jun 05 Posts: 2531 Credit: 1,074,556 RAC: 0	Message 57457 - Posted: 22 Mar 2016, 9:17:31 UTC Will not be easy to find exactly same AR out in the field. With each crime and every kindness we birth our future. ID: 57457 ·

Richard Haselgrove Volunteer tester Send message Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0	Message 57458 - Posted: 22 Mar 2016, 10:05:22 UTC - in response to Message 57457. You could try a command like findstr "<true_angle_range>1.0478" . in a batch file that you run periodically in the project directory of a machine with a busy cache - either interactively with a pause command to eyeball the results, or scheduled with a redirect/append to a log file for later analysis. I did it like that, with a 4 decimal place trim to catch near misses, on this machine, but nothing. ID: 57458 ·
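The same periodic search can be done numerically instead of by string prefix, which avoids missing a value such as 1.04779 that a 4-decimal text trim would skip. A minimal sketch: the function name `find_by_angle_range` is hypothetical, and it assumes that cached workunit files are readable as text and carry a `<true_angle_range>` tag near the top, as the findstr pattern above implies.

```python
import re
from pathlib import Path

# Tag name taken from the findstr pattern in message 57458.
AR_RE = re.compile(r"<true_angle_range>\s*([0-9.eE+-]+)\s*</true_angle_range>")


def find_by_angle_range(directory, target, tolerance):
    """Return (file name, AR) pairs whose angle range is within tolerance."""
    matches = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # Assumption: the tag sits in the header, so the first 64 KiB suffices.
        with path.open("r", encoding="latin-1") as handle:
            head = handle.read(65536)
        found = AR_RE.search(head)
        if found is None:
            continue
        try:
            angle_range = float(found.group(1))
        except ValueError:
            continue
        if abs(angle_range - target) <= tolerance:
            matches.append((path.name, angle_range))
    return matches


if __name__ == "__main__":
    # Run from the project directory; the tolerance is an arbitrary choice.
    for name, angle_range in find_by_angle_range(".", 1.047818, 0.00005):
        print(f"{name}: AR {angle_range}")
```

Scheduled with output appended to a log file, this plays the same role as the batch file, with the tolerance widened or narrowed to taste.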
