SETI@home v8 beta to begin on Tuesday
Author | Message |
---|---|
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
"about what is expected before an application is accepted for deployment as a stock application on that SETI Main server," Richard, the binary passed beta testing. In every possible sense, there is no point in discussing it. Beta testing itself was inadequate due to a lack of tasks and/or host diversity. If the app stayed on beta for one more year, nothing would change until beta itself is improved. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0 |
But the post you're quoting from makes no mention of 'testing at Beta'. I don't think any of the application versions you and I have worked on together have relied exclusively on testing at Beta. We've done offline bench testing with known stock apps for comparison: run under app_info at Main (which produces exactly the same result files as stock deployment): monitored host result lists for valid/inconclusive/invalid: downloaded live data files for checking in bench tests when results look unusual: and so on. "about what is expected before an application is accepted for deployment as a stock application on that SETI Main server," "Richard, binary passed beta testing. In all senses possible, there is no point for discussion. Beta testing itself was inadequate due to lack of tasks and/or hosts diversity. If it will be on beta one more year it would change nothing until beta will be improved itself." I suspect the issue is that Eric assumes that the whole gamut of testing has been run before an app is submitted to him for deployment: others may assume that an app offered to Eric for Beta deployment subsequently goes through all necessary testing stages before being transferred to Main. |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
"Well, we definitely have 2 issues here:" You forgot the third one, 3) A Sanity checker set a touch too high. You could also add a fourth, 4) A New version of BOINC that can't read the GPU Memory correctly helping the Sanity Check Fail on some machines. The Failures with the nVidia version on hardware slightly different than stock indicate a Sanity Checker a little too unforgiving. The nVidia version should be replaced as well, you shouldn't get Sanity failures from just using third party hardware. I believe if you check you will find the machines with the most failures are the ones reporting Zero or Lower GPU Memory. Some Projects are also having Failures from BOINC reporting Negative GPU Memory, Negative amount of GPU VRAM and valid Einstein results discarded. The requirements for the ATI App are Extremely simple, the OpenCL driver is built-in to the OS. All you need for that ATI App is OSX 10.7.5 (Darwin 11.4.2) or above, you can't get much simpler. There is nothing I can do about software that has a Sanity Checker set too high, there also isn't much you can do about BOINC reporting Negative GPU Memory to the Sanity Check that is already set too high. Fortunately, My ATI Cards along with all the other HD 5000 & HD 6000's didn't have the problem with the Sanity Checker. If you check the Macs at Beta you will find many of them are HD 5770 and some HD 6700 & 6900s which wouldn't have problems with the Sanity. That leaves few Macs that would have found the problem, and apparently those few at Beta are working well enough to pass the Sanity Check. It appears the r3610 version has a more tolerant Checker as the machines running it on Main haven't had any trouble in the months they've been running it. No one person, or couple of people, can check the software better than running it on Beta.
To keep harping that more testing should be done before arriving at Beta is just a lame attempt to Blame someone else rather than placing the blame on the Beta procedure where it belongs. |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
What do you mean by "Sanity checker set too high"? The sanity check is a procedure that compares acquired results with theoretically possible (more precisely, impossible) ones. The single adjustable limit is for the Autocorr search. If the sanity check fails on another type of signal, it means processing gave a theoretically impossible result, and there is nothing to adjust. More or fewer sanity check failures then just mean that the app's computations are less or more stable for some reason. Maybe in low-memory conditions some of the kernels fail silently (it could be a cuFFT kernel, because cuFFT requires a huge amount of RAM, or some of the new Petri kernels, or some of the baseline kernels), and this failure results in theoretically impossible numbers in some of the result fields. That causes the sanity check to fail, but that's exactly what it is designed for: to catch errors. News about SETI opt app releases: https://twitter.com/Raistmer |
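To make the idea of a sanity check concrete, here is a minimal Python sketch of "reject results that are theoretically impossible". It is not the project's actual checker: the `Signal` fields, the `MAX_TASK_TIME` bound, the `AUTOCORR_PEAK_LIMIT` value and the specific rules are all assumptions chosen for illustration. The only detail taken from the post above is that the Autocorr limit is the single adjustable one.

```python
import math
from dataclasses import dataclass


@dataclass
class Signal:
    kind: str    # e.g. "Triplet", "Pulse", "Autocorr"
    peak: float  # reported peak power
    time: float  # reported time offset within the task, in seconds


# Hypothetical limits for illustration only; not the project's real values.
MAX_TASK_TIME = 110.0
AUTOCORR_PEAK_LIMIT = 1e6  # stands in for the single adjustable limit


def is_sane(sig: Signal) -> bool:
    """Return False when a reported signal is impossible under the assumed rules."""
    # NaN or infinity in a result field points at a failed computation.
    if not (math.isfinite(sig.peak) and math.isfinite(sig.time)):
        return False
    # Assumption: a power value cannot be zero or negative.
    if sig.peak <= 0:
        return False
    # Assumption: a signal cannot occur outside the recorded time span.
    if not 0 <= sig.time <= MAX_TASK_TIME:
        return False
    # Only the Autocorr check uses a tunable threshold in this sketch.
    if sig.kind == "Autocorr" and sig.peak > AUTOCORR_PEAK_LIMIT:
        return False
    return True
```

Under these assumptions, a kernel that fails silently and leaves garbage in a result field would typically trip the first or second rule, which matches the "catch errors" purpose described above.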
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
Whatever is causing the problem is occurring at startup, the App is failing to start numerous times. What else could cause random startup failures? Since the cards that aren't affected are the Legacy cards it would appear it's some difference with the newer driver. The 5000 & 6000 series were labeled Legacy by AMD some time ago, the 7000 series and higher use a different driver and are the ones suffering the startup failures. The same problem with the nVidia version. Most people using the Apple Supplied GPUs are using the Built-in Apple Supplied driver while those using the Non-Apple cards are using the Web Driver from nVidia that must be downloaded and installed. So, what's being randomly triggered by the different drivers in r3552 that's not being triggered by r3610? This is a normal startup in r3552; LotOfMem path: no LowPerformanceGPU path: no HighPerformanceGPU path: no period_iterations_num=50 Triplet: peak=11.68754, time=37.56, period=23.58, d_freq=2225387591.59, chirp=0.64733, fft_len=128 Autocorr: peak=19.66915, time=17.18, delay=3.7208, d_freq=2225385894.55, chirp=-8.8951, fft_len=128k Pulse: peak=1.974179, time=45.9, period=3.657, d_freq=2225389062.62, score=1.023, chirp=-19.039, fft_len=2k D: threshold 0.3587262; unscaled peak power: 0.3640594 exceeds threshold for 1.487% Triplet: peak=11.68612, time=65.65, period=20.01, d_freq=2225385645.68, chirp=-38.806, fft_len=32 This is a bad startup in 3552; LotOfMem path: no LowPerformanceGPU path: no HighPerformanceGPU path: no period_iterations_num=50 OpenCL platform detected: Apple Number of OpenCL devices found : 2 So what is happening right after 'period_iterations_num=50' is printed to cause the task to restart? |
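One way to quantify how often this happens on a host is to scan a task's stderr text for the pattern shown above: the `period_iterations_num=` line followed directly by a fresh `OpenCL platform detected` banner instead of signal lines. The Python sketch below is a heuristic under that assumption. It expects one log entry per line, and the function name `count_restarts` is made up for this example.

```python
STARTUP_MARKER = "period_iterations_num="
RESTART_BANNER = "OpenCL platform detected"


def count_restarts(lines):
    """Return (startups, restarts) for an iterable of stderr lines.

    A restart is counted when the first non-empty line after a startup
    marker is the platform-detection banner, as in the bad startup above.
    """
    startups = 0
    restarts = 0
    after_marker = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if after_marker and line.startswith(RESTART_BANNER):
            restarts += 1
        after_marker = line.startswith(STARTUP_MARKER)
        if after_marker:
            startups += 1
    return startups, restarts


bad_log = [
    "period_iterations_num=50",
    "OpenCL platform detected: Apple",
    "Number of OpenCL devices found : 2",
    "period_iterations_num=50",
    "Triplet: peak=11.68754, time=37.56, period=23.58",
]
print(count_restarts(bad_log))  # (2, 1)
```

Run over the stderr of several r3552 and r3610 tasks, counts like these would show whether the restarts really are confined to one build.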
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
That could be checked with an increased-verbosity build. But we have no hardware to test on. I have asked for a guinea-pig host many times already; no one volunteered. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
"But the post you're quoting from makes no mention of 'testing at Beta'. I don't think any of the application versions you and I have worked on together have relied exclusively on testing at Beta. We've done offline bench testing with known stock apps for comparison: run under app_info at Main (which produces exactly the same result files as stock deployment): monitored host result lists for valid/inconclusive/invalid: downloaded live data files for checking in bench tests when results look unusual: and so on." All that is right. But how does it relate to the current situation with the ATi app release? Why did we do that testing? Because I had no suitable hardware and you had. Do we have an eligible host currently? No. Regarding collecting bad cases: there were no such cases (according to TBar) before deployment on main, so what was there to collect? There were no failures on the hardware at the offline testers' disposal. That is, after your post we still can't do anything additional to improve the situation. Besides that, I fully agree. The single reason I encourage using the beta servers is the much bigger diversity they can provide versus the limited diversity that I and the offline testers team could provide. And now we clearly see that even beta diversity is not enough. So? We need to improve that! Load different tapes. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
"That could be checked with increased verbosity build. But we have no hardware to test." Well, my machine doesn't have the startup problem. It also, like all other 5000 & 6000 series ATI cards, doesn't have the problem with BOINC 7.8.2 & 7.8.3 detecting Negative GPU Memory. Check it out, I suppose it's just another one of those coincidences. The GPUs that don't have the startup problem with r3552 also don't have the GPU memory problem with BOINC 7.8.x. Instead of trying to troubleshoot year-old software wouldn't it be easier to just use the current software that doesn't have the problem? You could just do a limited release of r3610 to see if it really does work, say, release it to the latest OS version, Darwin 17.0.0 and above, and to the rest later. Coprocessors : AMD ATI Radeon HD 5770 (1024MB) OpenCL: 1.2 Operating System : Darwin 16.7.0 BOINC version : 7.8.3 Coprocessors : AMD ATI Radeon HD 5770 (1024MB) OpenCL: 1.2 Operating System : Darwin 17.3.0 BOINC version : 7.8.3 Coprocessors [2] AMD AMD Radeon HD - FirePro D500 Compute Engine (-1024MB) OpenCL: 1.2 Operating System : Darwin 16.7.0 BOINC version : 7.8.3 Coprocessors : AMD AMD Radeon R9 M290X Compute Engine (-2048MB) OpenCL: 1.2 Operating System : Darwin 17.2.0 BOINC version : 7.8.3 Coprocessors : AMD ATI Radeon HD 6750M (512MB) OpenCL: 1.2 Operating System : Darwin 16.7.0 BOINC version : 7.8.3 etc...etc....etc... BTW, anyone is free to look through the ATI Hosts at Beta and see if you can find a single Error result declaring Too Many Exits. I couldn't find any. |
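Picking out the affected hosts from a list like the one above can be automated. The Python sketch below pulls the memory figure out of each coprocessor description and flags values that are zero or negative. The descriptions are copied from the post; the regular expression and the names `reported_memory_mb` and `suspicious_hosts` are assumptions for this example, not anything BOINC provides.

```python
import re

# Matches the trailing "(1024MB)" or "(-2048MB)" in a coprocessor description.
MEM_RE = re.compile(r"\((-?\d+)MB\)")

HOSTS = [
    "AMD ATI Radeon HD 5770 (1024MB)",
    "AMD AMD Radeon HD - FirePro D500 Compute Engine (-1024MB)",
    "AMD AMD Radeon R9 M290X Compute Engine (-2048MB)",
    "AMD ATI Radeon HD 6750M (512MB)",
]


def reported_memory_mb(description):
    """Return the reported GPU memory in MB, or None if no figure is present."""
    match = MEM_RE.search(description)
    return int(match.group(1)) if match else None


def suspicious_hosts(descriptions):
    """Return the descriptions whose reported memory is zero or negative."""
    flagged = []
    for description in descriptions:
        memory = reported_memory_mb(description)
        if memory is not None and memory <= 0:
            flagged.append(description)
    return flagged


for host in suspicious_hosts(HOSTS):
    print(host)
```

On this list it flags the FirePro D500 and the R9 M290X and passes both legacy HD 5770 / HD 6750M entries, which is the split described in the post.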
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
What toolset was used to build r3552 and r3610? Different ones? News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
Both Apps were built in Darwin 10.12.6 with the same tools. The difference is I tested different Defines and came up with a different set from what You had suggested for r3552. The set I used for r3610 works better on all the machines that tested it. r3552; Build features: SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_ZERO_COPY OCL_CHIRP3 ASYNC_SPIKE FFTW JSPF SSSE3 64bit r3610; Build features: SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_CHIRP3 ASYNC_SPIKE FFTW SSSE3 64bit The version for nVidia works better with adding OCL_ZERO_COPY. r3551, Build features: SETI8 Non-graphics OpenCL USE_OPENCL_INTEL OCL_ZERO_COPY OCL_CHIRP3 ASYNC_SPIKE FFTW JSPF SSSE3 64bit The newer version of the nVidia App also doesn't use JSPF, SSSE3x OS X 64bit Build 3709 BTW, this Windows SoG App was found to have a Bad Best Gaussian when compared against the ATI Non-SoG App, the CPU agrees with the ATI App, SSSE3x OS X 64bit Build 3710 |
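The difference between the two ATI builds is easier to see as a set difference. The Python snippet below uses the "Build features" strings exactly as posted above; the helper name `parse_features` is made up for this example.

```python
R3552 = ("SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_ZERO_COPY "
         "OCL_CHIRP3 ASYNC_SPIKE FFTW JSPF SSSE3 64bit")
R3610 = ("SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx "
         "OCL_CHIRP3 ASYNC_SPIKE FFTW SSSE3 64bit")


def parse_features(line):
    """Split a whitespace-separated 'Build features' string into a set."""
    return set(line.split())


removed = sorted(parse_features(R3552) - parse_features(R3610))
added = sorted(parse_features(R3610) - parse_features(R3552))

print("dropped in r3610:", removed)  # ['JSPF', 'OCL_ZERO_COPY']
print("added in r3610:", added)      # []
```

So the two ATI builds differ only in `OCL_ZERO_COPY` and `JSPF`. The comparison cannot say which of the two defines, if either, is behind the startup failures.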
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
Is r3610 on the beta servers already? Under what plan class? News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
8.22 (opencl_ati5_mac) : 17 Oct 2017, 23:48:14 UTC : 45 GigaFLOPS http://setiweb.ssl.berkeley.edu/beta/setiathome_v8_x86_64-apple-darwin__opencl_ati5_mac.html |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
OK, let's try... Still, I would prefer to know for sure whether r3610 is really free from these silent terminations or not. But this requires a volunteer with the right hardware, and it seems we don't have one. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
"The newer version of the nVidia App also doesn't use JSPF, SSSE3x OS X 64bit Build 3709" For what reason? News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
I don't see it here anywhere, https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/AKv8/ConfigureOSX_AKv8d_OPENCL_SSE3_MBv8.txt In fact, I don't see it in any GPU Application. The only place I see JSPF is in CPU Apps, so, why did You suggest putting it in the OSX GPU Apps? The Apps seem to work just fine without it. |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
"Still I would prefer to know for sure if r3610 really free from this silent terminations or not. But this requires volunteer with right hardware and seems we don't have one." These two machines have been running r3610 for months; the only Errors I see are from what I suspect is the cmdline 'Target kernel sequence time set to 600ms'. The other machine isn't using that line and doesn't have any Errors; Running 2 instances per GPU, https://setiathome.berkeley.edu/results.php?hostid=8243589 Running 3 instances per GPU with a problematic cmdline, https://setiathome.berkeley.edu/results.php?hostid=6105482 A refugee from Q & A still clearing Ghosts, https://setiathome.berkeley.edu/results.php?hostid=8248108 I don't see any Too Many Exits there. |
Joined: 2 Jul 13 Posts: 505 Credit: 5,019,318 RAC: 0 |
"The newer version of the nVidia App also doesn't use JSPF, SSSE3x OS X 64bit Build 3709" Have you ever put JSPF in one of Your GPU Apps? The other Mac Apps, 8.20 r3556 and even 8.10 r3430, don't have JSPF. None of my other GPU builds have it either. So, why do you think JSPF should be in the Mac GPU Apps? The Only GPU Apps that I know of with JSPF are r3552 & r3551, which are the only two Apps I know of that have the Too Many Exits problem. Perhaps it would be better without it? BTW, Chris says he's had other problems with his D700 Mac, and you should just look at his other machine, the D500 one, https://setiathome.berkeley.edu/results.php?hostid=8243589. He also says the D500 is also running Three Tasks at once, just like the D700. As far as I know the other Platforms' AMD GPUs can't run 3 or even 2 Tasks at once. |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0 |
"So, why do you think JSPF should be in the Mac GPU Apps?" This option governs CPU-based Pulse computations, so it should have minimal influence on GPU builds. The correlation you noticed may be a random one (or not). Regarding running several tasks at once: the stock build is intended to run correctly with one task per device. Anything above that is the user's responsibility and choice. News about SETI opt app releases: https://twitter.com/Raistmer |
Joined: 10 Mar 12 Posts: 1700 Credit: 13,216,373 RAC: 0 |
16 days later, and still 100% *DIAG_KIC* Isn't the idea of Beta that we should test all kinds of WU's and apps? One WU type is surely not enough. Yes, I know that there really isn't much testing going on right now, but nevertheless, as long as we get tasks, provide a good mix at least. Not many running beta though (377 Users in last 24 hours). Even when we had new apps to test, it was rarely over 400 Users in last 24 hours. With so few testers/computers, there's no way in h*** we can find all the bugs. There's a need for a Beta push, to get new Beta participants for the next round of app tests. And of course a better mix of WU's. |
Joined: 9 Apr 07 Posts: 1702 Credit: 4,622,751 RAC: 0 |
"16 days later, and still 100% *DIAG_KIC*" Well, make cafe SETI at Beta a hip place to be and at least a few more will have to crunch to be there. |
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.