SETI@home v8 beta to begin on Tuesday
Author | Message |
---|---|
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
What prog value is recorded in state.sah when the task finishes? I would like to have a test case for offline checking. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Better this way.
Testing on beta is preferable because results remain visible long enough. Also, if an app is not working properly, it's not wise to use it on main.
No. It's just Windows destroying (restarting) the driver, most probably because the Windows watchdog timer was exceeded.
With which driver did r3330 work well? News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 |
Meanwhile, prog(ress) never made it past 40...

I managed to snag one of the very, very high AR WUs that started all this off. I won't bore you with all 464 lines, but here are the significant ones.

wu_name: 24no10ab.7605.6611.8.42.47

WU true angle range is : 136.505732

Time | <prog> | <fraction_done> |
---|---|---|
11:35:33 | | |
11:35:34 | 0.00000745 | |
11:35:35 | 0.00000745 | |
11:35:36 | 0.00031612 | 0.000012 |
11:35:37 | 0.00056272 | 0.000012 |
11:35:38 | 0.00086394 | 0.000012 |
11:35:39 | 0.00111551 | 0.000012 |
11:35:40 | 0.00136377 | 0.000012 |
<snip> | | |
11:43:12 | 0.12065697 | 0.000012 |
11:43:13 | 0.12165332 | 0.000012 |
11:43:14 | 0.12312135 | 0.000012 |
11:43:15 | 0.12312135 | 1.000000 |
11:43:16 | 0.12312135 | 1.000000 |

I also saw what happens to estimates if you tell BOINC that your application has only made 0.0012% progress in 7 minutes: [screenshot]

I've kept the WU file (saves downloading it again later), and I'll try and organise an offline run after the budget speech. |
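For a sense of scale, here is a minimal sketch of the arithmetic behind that runaway estimate. It assumes a purely linear extrapolation from elapsed time and `fraction_done`; BOINC's real estimator takes more into account, so the function name and formula are illustrative assumptions, not the client's actual code.

```python
def naive_remaining_seconds(elapsed_s: float, fraction_done: float) -> float:
    """Linear extrapolation: projected total = elapsed / fraction_done.

    Hypothetical helper for illustration only; not part of BOINC.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s / fraction_done - elapsed_s


elapsed = 7 * 60        # roughly seven minutes of run time, as in the post
fraction = 0.000012     # the <fraction_done> value from the log (0.0012%)

remaining = naive_remaining_seconds(elapsed, fraction)
print(f"naive estimate: {remaining:,.0f} s remaining "
      f"(about {remaining / 86400:.0f} days)")
```

Under that assumption a task that actually finishes in under eight minutes would be projected to run for more than a year, which is why a stuck `fraction_done` matters even when the science result is fine.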
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 |
What prog value recorded at state.sah when task finishes?

OK, offline results.

1) Running MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG with standard 60-second checkpoint interval.

Final checkpoint file timestamped 16 March 2016, 14:34:41

App Ended at : 14:34:43.804 (2.8 seconds later)

Final state.sah starts <ncfft>99182</ncfft> <cr>-9.999761e+001</cr> <fl>32768</fl> <prog>0.12312135</prog> <potfreq>-1</potfreq> <potactivity>0</potactivity> <signal_count>5</signal_count> <flops>964166035.817873</flops> <spike_count>5</spike_count> <autocorr_count>0</autocorr_count> <pulse_count>0</pulse_count> <gaussian_count>0</gaussian_count> <triplet_count>0</triplet_count>

2) Reference run with Lunatics_x41zi_win32_cuda50

Final checkpoint file timestamped 16 March 2016, 14:26:01

App Ended at : 14:27:05.574 (64.5 seconds later)

Final state.sah starts <ncfft>84154</ncfft> <cr>2.916597e+001</cr> <fl>131072</fl> <prog>0.85205204</prog> <potfreq>-1</potfreq> <potactivity>0</potactivity> <signal_count>4</signal_count> <flops>14347800817644.363000</flops> <spike_count>4</spike_count> <autocorr_count>0</autocorr_count> <pulse_count>0</pulse_count> <gaussian_count>0</gaussian_count> <triplet_count>0</triplet_count>

(but validated Q= 99.96% - the fifth spike must have been found in the last minute)

3) Running MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG with special 1-second checkpoint interval.

Final checkpoint file timestamped 16 March 2016, 14:55:22

App Ended at : 14:55:24.560 (2.5 seconds later)

Final state.sah starts <ncfft>99182</ncfft> <cr>-9.999761e+001</cr> <fl>32768</fl> <prog>0.12312135</prog> <potfreq>-1</potfreq> <potactivity>0</potactivity> <signal_count>5</signal_count> <flops>964166035.817873</flops> <spike_count>5</spike_count> <autocorr_count>0</autocorr_count> <pulse_count>0</pulse_count> <gaussian_count>0</gaussian_count> <triplet_count>0</triplet_count>

I was checking at intervals throughout both SoG runs, and state.sah was being updated at the prescribed intervals - but it looks as if Murphy's law intervened and the final 60-second checkpoint occurred just as the app was preparing to clean up anyway.

It looks to me as if both <prog> and <fraction_done> are broken, but in different ways.

I have to put this research to one side now, and go out - back tomorrow (Thursday) evening, and we can pick it up again then. |
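For anyone repeating this kind of offline check, the comparison of final state.sah headers can be scripted. The following is a minimal sketch, assuming the header is a flat run of `<tag>value</tag>` pairs as quoted above; `parse_state_header` is a hypothetical helper, not part of any SETI@home tool, and the two input strings are abbreviated copies of the values from runs 1 and 2.

```python
import re

# Matches one flat <tag>value</tag> pair; nested elements are not handled.
TAG_RE = re.compile(r"<(\w+)>([^<]*)</\1>")


def parse_state_header(text: str) -> dict:
    """Return {tag: value-as-string} for every flat tag pair in text."""
    return {name: value.strip() for name, value in TAG_RE.findall(text)}


# Abbreviated headers copied from the offline runs in the post.
sog = parse_state_header(
    "<ncfft>99182</ncfft> <cr>-9.999761e+001</cr> <fl>32768</fl> "
    "<prog>0.12312135</prog> <signal_count>5</signal_count>"
)
ref = parse_state_header(
    "<ncfft>84154</ncfft> <cr>2.916597e+001</cr> <fl>131072</fl> "
    "<prog>0.85205204</prog> <signal_count>4</signal_count>"
)

# Print the fields side by side so differences stand out.
for key in ("ncfft", "fl", "prog", "signal_count"):
    print(f"{key:>13}: SoG={sog[key]:<12} reference={ref[key]}")
```

Values are kept as strings so nothing is lost to float formatting; convert with `float()` or `int()` only where a numeric comparison is needed.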
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
I think I have an idea how to fix the resulting readings a little (though it's all cosmetic). EDIT: what if BOINC shows 100% completion instead of 0.0012% a few seconds after the task starts? Would that be preferable to sitting at 0.0012% for most of the time? News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 7 Jun 09 Posts: 285 Credit: 2,822,466 RAC: 0 |
So 'nothing' could 'destroy' the AMD driver this way, so I would need to install the AMD driver again? With Catalyst 15.7.1(?) up to 15.11, Crimson 15.11, and all later drivers up to Crimson 16.3 Hotfix (Beta), r3330 works/worked fine, IIRC. The PC has been running since about October last year, and I have always had the newest AMD drivers (including betas) installed. But the default cpu_lock of r3330 doesn't work properly (with any of the drivers I used). With the default enabled, all 4 GPU apps (1 WU/GPU) are pinned to Core#0. I need to use -no_cpu_lock so that all cores are used. With r3401 the default cpu_lock works as it should: Core#0, #1, #2 and #3 each get one pinned GPU app. Please send me your e-mail via private message. ;-) |
Joined: 18 Jan 06 Posts: 1038 Credit: 18,734,730 RAC: 0 |
Let me know if you need any additional information. Chris, my reruns of two WUs from your missed-Gaussians results have finished (see results if interested). No signals were found to be missing. From my point of view the Mac apps work OK for these WUs. Additionally, I've looked over most of the results in your result list and found that at some point the second GPU starts to run at a lower frequency. That could point to a driver crash, but I'm not quite sure yet. Do your system logs show any problem that could come from a crashed GPU driver on the second GPU? There is also another Mac with AMD D700 GPUs at beta, which seems to have no trouble at all with these apps. https://setiweb.ssl.berkeley.edu/beta//show_host_detail.php?hostid=71984 _\|/_ U r s |
Joined: 27 Aug 12 Posts: 56 Credit: 127,133 RAC: 0 |
Hmm, interesting. I have not noticed any driver crashes, but I will certainly take a look at my system log to see if there is anything going on there. Where are you seeing the clock drop? In the OpenCL report in the output file? I have occasionally seen what I'd call erroneous (i.e. 150 MHz) info reported there, as I never see hugely different completion times between the two units. Thank you for the information. I'll dig into my system logs this morning and see what's happening. Might be a case of me running beta OS X. I have a new version of 10.11.4 to install today as well. Thanks, Chris |
Joined: 16 Jun 05 Posts: 2531 Credit: 1,074,556 RAC: 0 |
But, the default cpu_lock of r3330 don't work properly (with all used drivers). That's why I suggest using 3401. It's also more stable. I just have to run a few more speed benchmarks to make sure about the speed settings. With each crime and every kindness we birth our future. |
Joined: 27 Aug 12 Posts: 56 Credit: 127,133 RAC: 0 |
Hmm, interesting. I have not noticed any driver crashes, but I will certainly take a look at my system log to see if there is anything going on there. Where are you seeing the clock drop? In the OpenCL report in the output file? I have occasionally seen what I'd call erroneous (i.e. 150 MHz) info reported there, as I never see hugely different completion times between the two units. Thank you for the information. I'll dig into my system logs this morning and see what's happening. Might be a case of me running beta OS X. I have a new version of 10.11.4 to install today as well. It also reports that 8.07 is making too many wakeup calls: it allows 150 per second, and sometimes there are as many as 1300 per second. It doesn't seem to cause a crash exactly, but it is reported in the system log. Chris |
Joined: 18 Jan 06 Posts: 1038 Credit: 18,734,730 RAC: 0 |
Hmm, interesting. I have not noticed any driver crashes, but I will certainly take a look at my system log to see if there is anything going on there. Where are you seeing the clock drop? In the OpenCL report in the output file? I have occasionally seen what I'd call erroneous (i.e. 150 MHz) info reported there, as I never see hugely different completion times between the two units. Thank you for the information. I'll dig into my system logs this morning and see what's happening. Might be a case of me running beta OS X. I have a new version of 10.11.4 to install today as well. Does the log say whether ati5 or ati5_SoG causes the wakeup calls? _\|/_ U r s |
Joined: 27 Aug 12 Posts: 56 Credit: 127,133 RAC: 0 |
Both the SoG and non-SoG are guilty based on the log files. Chris |
Joined: 18 Jan 06 Posts: 1038 Credit: 18,734,730 RAC: 0 |
Both the SoG and non-SoG are guilty based on the log files. Was it with optimized settings or with defaults? Could someone with a similar Mac (Pro, 2x D300/500/700 GPUs) please check how many wakeup calls happen on their hosts when OpenCL apps are running, to see if this is normal? _\|/_ U r s |
Joined: 27 Aug 12 Posts: 56 Credit: 127,133 RAC: 0 |
If you don't hear from anyone, I'll have a second Mac Pro in about a month and a half with D500 cards, and I'll see if it does the same on it. It looks like the wakeup calls were occurring with both default and optimized settings. It's kinda hard to correlate the log files to specific WUs (I actually think it may match some of the inconclusives as well, but it's hard to figure that out exactly), but I'll go back to default settings tomorrow and verify for you. Thanks, Chris |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Seems Sutaru found a bug in the current RC apps. I need a task with an AR of 1.047818 for offline benchmarking. Please find one. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 |
Seems Sutaru found bug in current RC apps. How precise do you need the match to be (+/- ?)? I had a number around that range during last week's test run, like 24no10ab.26598.1703.8.42.240_0, with "WU true angle range is : 1.050587". And if that's not close enough, you can always edit the header for testing... |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
It seems that changing just the AR field is not enough to reproduce the same PulseFind geometry throughout the task, so I'd prefer to get the exact task. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 16 Jun 05 Posts: 2531 Credit: 1,074,556 RAC: 0 |
It will not be easy to find exactly the same AR out in the field. With each crime and every kindness we birth our future. |
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 |
You could try a command like findstr "<true_angle_range>1.0478" *.* in a batch file that you run periodically in the project directory of a machine with a busy cache - either interactively with a pause command to eyeball the results, or scheduled with a redirect/append to a log file for later analysis. I did it like that, with a 4 decimal place trim to catch near misses, on this machine, but nothing. |
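On machines without findstr (or for scheduled runs with more control over logging), the same search could be done in a few lines of Python. This is a rough sketch that assumes the workunit header is plain text containing a `<true_angle_range>` tag, as the findstr pattern implies; `find_workunits` and its default needle are illustrative names, not an existing tool.

```python
from pathlib import Path


def find_workunits(directory, needle="<true_angle_range>1.0478"):
    """Return names of files in directory that contain needle.

    Hypothetical helper mirroring: findstr "<true_angle_range>1.0478" *.*
    Only the top level of the directory is scanned, like the *.* wildcard.
    """
    matches = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            # Workunit files carry encoded data after the header, so
            # undecodable bytes are ignored rather than treated as errors.
            with path.open("r", errors="ignore") as handle:
                if any(needle in line for line in handle):
                    matches.append(path.name)
        except OSError:
            continue  # file vanished or is locked by the client; skip it
    return matches


if __name__ == "__main__":
    # Run from the project directory; prints one matching file per line.
    for name in find_workunits("."):
        print(name)
```

As with the batch-file approach, trimming the needle to four decimal places catches near misses; redirecting the output to a log file gives the same append-for-later-analysis workflow.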
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.