@Pre-FERMI nVidia GPU users: Important warning
Author | Message |
---|---|
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
Okay. That helps a little. Is there any chance you can isolate an exact operation that is returning an incorrect result? |
styxdogg Joined: 7 Mar 01 Posts: 3 Credit: 1,540,836 RAC: 0 |
Lately I am really only running on a school machine that I won't be upgrading. I got a message from someone as well, so I reverted to 337 on that machine. I am still getting invalid results on some apps according to the records I looked up. opencl_nvidia_cc1 comes up OK: 3816687402 1613138364 6773137 4 Nov 2014, 15:01:46 UTC 5 Nov 2014, 7:05:34 UTC Completed and validated 11,223.64 10,908.10 585.17 AstroPulse v7 v7.05 (opencl_nvidia_cc1) but cuda23 is still being pulled and fails: 3817796369 1632271283 6773137 5 Nov 2014, 14:42:50 UTC 5 Nov 2014, 15:06:52 UTC Error while computing 0.00 0.00 --- SETI@home v7 v7.00 (cuda23) I went to my "Separate preferences for school" and set: SETI@home v7: no. Is this the right thing to do? Any other suggestions? Thanks. |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
Raistmer, can you please relay to the NVIDIA team that those 4 SDK examples are failing? You could copy/paste what I wrote (so they will have my GPU make, driver version info, OS version, etc.). Additionally, I am considering logging my own request with them to get it fixed. I strongly suspect that whatever is causing those SDK examples to fail is also responsible for making your application fail. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14671 Credit: 200,643,578 RAC: 874 |
Raistmer, I've posted the first three samples in the development area at Lunatics, from a confirmation run on my 9800GT with drivers 337.88 (test apps ran successfully) and 340.52 (tests failed). I'd completed that run and dismounted the card again before I saw that you'd found a fourth example, so I'll complete the set later when the test machine is free again. I think that the best approach - as I've also written at Lunatics - would be to concentrate for the time being on convincing the ODE driver team that a fix is necessary: these test failures with NVidia's own demonstration suite are a very potent weapon in that battle. Any overlap we can identify between the functions implicated in the test suite failures, and the failure of Raistmer's Astropulse application, will be of most use in phase two, when we pressure NVidia to replicate the ODE fixes into consumer drivers. BTW, Raistmer works during the week (in the sense of earning a living), and I don't think he's seen my report yet. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"Okay. That helps a little. Is there any chance you can isolate an exact operation that is returning an incorrect result?" Not a priority for now. Guys, it seems you don't understand how nVidia treats 1) OpenCL and 2) old pre-FERMI cards. EDIT: Try reading these forums a little: https://devtalk.nvidia.com/ As a first goal, try to find an OpenCL subforum there... |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Thomas Arnold wrote: "Hello, I need your insight and help." The driver is old enough so it doesn't have the issue which started this thread. I don't know why all SETI@home v7 7.00 windows_intelx86 (cuda22) and (cuda23) tasks are failing on your host 6648399, but it does very well on (cuda32), (cuda42), and (cuda50). Perhaps one of the CUDA experts here can figure out why the servers aren't sending tasks for the plan classes which work well. Joe |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Thomas Arnold wrote: "Hello, I need your insight and help." I will likely be digging out the scheduler code again on the weekend, if someone doesn't beat me to it. No accumulated data for the app versions, plus a logic hole with respect to systematically issuing to all app versions while ignoring the error count & quota, seems to be along the lines of what's happening. [I'll need to start by checking whether that server code has been changed since a couple of months ago.] For the host side, FWIW the application (2.2 & 2.3 planclasses) appears to not even be making it to device initialisation. That would suggest to me that the DLLs are somehow damaged, or the driver install has gone awry. I'd imagine a clean driver install (of a suitable known good older version for this GPU) and a project reset may be in order. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"For the host side, FWIW the application (2.2 & 2.3 planclasses) appears to not even be making it to device initialisation. That would seem to me the DLLs are somehow damaged, or the driver install has gone awry. I'd imagine a clean driver install (of a suitable known good older version for this GPU) and a project reset may be in order." I was thinking a project detach/reattach might be in order. A project reset doesn't clear out files not mentioned in client_state.xml, so the setienhanced cuda22 and cuda23 DLLs may still be hanging around. (Eric renamed them for the SETI v7 release.) Claggy |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
For what it's worth, I have filed my official bug report of it. The full details are below. Basically, I can reproduce the SDK example problems and the Astropulse problem, on any R340 driver for my GPU. And when I use R337, it works fine. Hopefully they fix it so that the OpenCL SDK examples work correctly, and then hopefully that makes your app work correctly too :)
Regards, Jacob

==================================================================

Filed Bug #1574543
https://developer.nvidia.com/nvbugs/cuda/edit/1574543
Summary: R340 drivers cause OpenCL data errors on pre-Fermi GPUs (see OpenCL SDK Code Samples)
Relevant Area: CUDA C/C++ Runtime
Description: R340 drivers cause OpenCL data errors on pre-Fermi GPUs (see OpenCL SDK Code Samples in Duplication Steps). R337 drivers were working correctly.
Duplication Steps:

-------------------------------

I have a laptop with a Quadro FX 3800M GPU, on Windows 8.1 x64. Recently, I installed the latest drivers (R340 341.05 WHQL), and have noticed data errors on some OpenCL applications. I then ran the full suite of OpenCL SDK Code Samples (from https://developer.nvidia.com/opencl).
The full results of my testing, including an Excel summary file and a folder called "NVIDIA OpenCL SDK Code Samples - Testing Results", can be found on my OneDrive folder, here: http://1drv.ms/1zwM8k7
I've found that the following samples are presently failing on all R340 drivers for my GPU (340.43, 340.52, 340.66, 340.84, 341.05). The samples are NOT failing on the older R337 drivers (337.88):
- oclFDTD3d (FAILED): CompareData (tolerance 0.000100)… Data error at point (0,0,0) 3.678468 instead of 10.912090
- oclDXTCompression (FAILED): RMS(reference, result) = 5606.724609
- oclQuasirandomGenerator (FAILED): ckQuasirandomGenerator deviations ABOVE Allowable Tolerance
- oclConvolutionSeparable (FAILED): Relative L2 norm: 1.204e-001
Additionally, the following 2 samples appear to possibly have some other regression using R340:
- oclVolumeRender: Passed, but it seemed to look different than other times that I tested this.
- oclParticles: Failed - TDR - Out of Memory? - Error # -5 (CL_OUT_OF_RESOURCES) at line 99 , in file .\src\oclManager.cpp
One colleague has confirmed the exact same behavior (those first 4 tests fail on R340, but succeed on R337), on a GT 9800.
My questions are:
- Is this a known bug/limitation with using R340 WHQL on this GPU?
- Is it at all related to the errata on the Cuda6.5 toolkit, regarding csr2csc() and bsr2bsc()
- Would you please consider fixing it in a future R340 driver release?
- Do you plan on supporting this GPU (and its OpenCL execution) until April 2016, per http://nvidia.custhelp.com/app/answers/detail/a_id/3473
Note: Whatever is causing these silent data errors, may also be the cause of bug 1554016, reported by an acquaintance of mine.
If there's anything else you need to get this resolved promptly, please don't hesitate to contact me.

-------------------------------

Product: FX 3800M, GT 9800
CUDA Toolkit Version: CUDA Toolkit 6.5
Bug Priority: High
CUDA Toolkit Details: OpenCL is failing data integrity on R340 on pre-Fermi GPUs
Operating System(s): Windows8-x64
Operating System Details: Confirmed on Windows 8.1 x64

================================================================== |
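For readers unfamiliar with the checks quoted in the bug report above, here is a minimal Python sketch of two common ways GPU output is compared with a CPU reference: a pointwise tolerance check and a relative L2 norm. This is not the NVIDIA SDK code. The function names are hypothetical, and treating the 0.0001 tolerance from the oclFDTD3d message as an absolute difference is an assumption.

```python
import math


def first_mismatch(reference, result, tolerance=1e-4):
    """Return the index of the first element that differs from the
    reference by more than `tolerance`, or None if all elements match.

    Assumption: the tolerance is an absolute difference.
    """
    for i, (ref, got) in enumerate(zip(reference, result)):
        if abs(ref - got) > tolerance:
            return i
    return None


def relative_l2_norm(reference, result):
    """Return ||reference - result||_2 / ||reference||_2."""
    diff_sq = sum((ref - got) ** 2 for ref, got in zip(reference, result))
    ref_sq = sum(ref ** 2 for ref in reference)
    if ref_sq == 0.0:
        raise ValueError("reference has zero norm")
    return math.sqrt(diff_sq / ref_sq)


# The values reported for point (0,0,0) of oclFDTD3d in the bug report:
print(first_mismatch([10.912090], [3.678468]))  # index 0 is flagged
```

A result that is merely affected by rounding would give a relative L2 norm close to zero, which is why the value of 1.204e-001 reported for oclConvolutionSeparable indicates a real data error.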
shizaru Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0 |
"For the host side, FWIW the application (2.2 & 2.3 planclasses) appears to not even be making it to device initialisation. That would seem to me the DLLs are somehow damaged, or the driver install has gone awry. I'd imagine a clean driver install (of a suitable known good older version for this GPU) and a project reset may be in order." And maybe uncheck the old v6 apps (SETI@home Enhanced) in SETI@home preferences and/or "If no work for selected applications is available, accept work from other applications"? I honestly can't remember anymore but I'm pretty sure I had to forgo v6 to get started on v7. Is this ringing any bells? (I could probably manage to dig up the thread if need be) |
DanHansen@Denmark Joined: 14 Nov 12 Posts: 194 Credit: 5,881,465 RAC: 0 |
Hi, I haven't read all the posts, only as far as TBar's post: "Attention, if you have a nVidia card that's 4 years old or older, and have updated to Driver 340.xx, you are now Flooding SETI with Bad AstroPulse Science. This includes just about any nVidia card that's not at least a 400 series or around 4 years old or newer. Except many people don't have a clue about 'FERMI' and the title doesn't mention a thing about Flooding SETI with BAD SCIENCE. To make matters worse, there is a thread about this New Driver, with over 1000 views, without a mention about it causing the older cards to Flood SETI with BAD SCIENCE." Hi TBar, you couldn't be more right ;) Maybe you can help me. Are these cards affected? OS: Win7 32bit/64bit - Asus GeForce GTX770; OS: Linux 64 bit - Asus GeForce GT640. Currently I'm using these drivers, so apparently I'm not affected yet: OS: Win - Driver Version 327.23; OS: Linux - Driver Version 340.29. Thanks in advance ;) . Project Headless CLI Linux Multiple GPU Boinc Servers Ubuntu Server 14.04.1 64bit Kernel 3.13.0-32-generic CPU's i5-4690K GPU's GT640/GTX750TI Nvidia v.340.29 BOINC v.7.2.42 |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"For what it's worth, I have filed my official bug report of it. The full details are below." Let's hope... hope dies last, as they say... :) wbr |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"Maybe you can help me. Are these cards affected?" No, none of those GPUs are pre-Fermi. Claggy |
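The rule of thumb used in this thread can be written down as a small check. The following Python sketch rests on stated assumptions: "pre-Fermi" is taken to mean CUDA compute capability 1.x (Fermi is 2.x), and any driver numbered 340.00 or higher is treated as suspect. The thread itself only confirms failures on Windows drivers 340.43 through 341.05 and correct results on 337.88. The function names are hypothetical.

```python
def is_pre_fermi(compute_capability_major):
    """Pre-Fermi NVIDIA GPUs report compute capability 1.x."""
    return compute_capability_major < 2


def driver_major(version):
    """Extract the major number from a version string such as '340.52'."""
    return int(version.split(".")[0])


def likely_affected(compute_capability_major, driver_version):
    """Heuristic only: pre-Fermi GPU combined with a 340.xx or later driver.

    Assumption: the threshold of 340 is inferred from the driver versions
    reported in this thread, not from any NVIDIA statement.
    """
    return (is_pre_fermi(compute_capability_major)
            and driver_major(driver_version) >= 340)


# 9800 GT (compute capability 1.1) as reported earlier in the thread:
print(likely_affected(1, "340.52"))  # True
print(likely_affected(1, "337.88"))  # False
# GTX 770 (compute capability 3.0) on 327.23:
print(likely_affected(3, "327.23"))  # False
```

The compute capability of a given card should be looked up in NVIDIA's published list rather than guessed from the model number, since some model names were reused across architectures.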
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Another problem with the Pre-Fermi cards is they apparently produce the same type of problem, Invalid APv7 results, when using the APv7 CommandLine option -oclFFT_plan xxx xx xxx. I noticed the problem with my NV 8800 GT running Driver 266.58 in Windows XP. I ran the AP Benchmark suite and it confirmed the issue was with the -oclFFT_plan option and the driver works fine when not using that option. I don't have any results for any of the other Pre-Fermi cards or drivers. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"Another problem with the Pre-Fermi cards is they apparently produce the same type of problem, Invalid APv7 results, when using the APv7 CommandLine option -oclFFT_plan xxx xx xxx. I noticed the problem with my NV 8800 GT running Driver 266.58 in Windows XP. I ran the AP Benchmark suite and it confirmed the issue was with the -oclFFT_plan option and the driver works fine when not using that option. I don't have any results for any of the other Pre-Fermi cards or drivers." What particular values did you use? It's known that not all theoretically possible combos are OK even for ATi cards, and little testing has been done for NV so far. |
Mike Joined: 17 Feb 01 Posts: 34343 Credit: 79,922,639 RAC: 80 |
"Another problem with the Pre-Fermi cards is they apparently produce the same type of problem, Invalid APv7 results, when using the APv7 CommandLine option -oclFFT_plan xxx xx xxx. I noticed the problem with my NV 8800 GT running Driver 266.58 in Windows XP. I ran the AP Benchmark suite and it confirmed the issue was with the -oclFFT_plan option and the driver works fine when not using that option. I don't have any results for any of the other Pre-Fermi cards or drivers." I only had the chance to test oclFFT_plan on some Fermi and later cards. With each crime and every kindness we birth our future. |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
I have received the following updates to my NVIDIA bug 1574543: https://developer.nvidia.com/nvbugs/cuda/edit/1574543 Status changed from "Open - pending review" to "Open - in progress" 5 November 2014 9:32 pm Kevin Kang |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14671 Credit: 200,643,578 RAC: 874 |
"I have received the following updates to my NVIDIA bug 1574543:" Well, at least you got a named contact out of it - that's more than Raistmer's rather less specific version of the same report got. I do think that the identified failure of NVidia's own sample code on professional hardware with enterprise drivers stands a better chance of being fixed than a third-party application on the consumer platform. Best of luck. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"Another problem with the Pre-Fermi cards is they apparently produce the same type of problem, Invalid APv7 results, when using the APv7 CommandLine option -oclFFT_plan xxx xx xxx. I noticed the problem with my NV 8800 GT running Driver 266.58 in Windows XP. I ran the AP Benchmark suite and it confirmed the issue was with the -oclFFT_plan option and the driver works fine when not using that option. I don't have any results for any of the other Pre-Fermi cards or drivers." Besides the ones in the link, the test ran: 4 science app(s) found: (AP7_win_x86_SSE2_OpenCL_NV_r2721.exe) (AP7_win_x86_SSE2_OpenCL_NV_r2721.exe -oclFFT_plan 128 8 64) (AP7_win_x86_SSE2_OpenCL_NV_r2721.exe -oclFFT_plan 128 8 128) (AP7_win_x86_SSE2_OpenCL_NV_r2721.exe -oclFFT_plan 128 8 256) I still have the full results from the test. |
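As an illustration of the test matrix listed above, the following Python sketch builds one baseline command line plus one per -oclFFT_plan setting. It only constructs argument lists and does not run anything. The helper name is hypothetical, and the three numbers are passed through as opaque values because their meaning is not explained in this thread.

```python
def build_commands(executable, plans):
    """Return argument lists: a baseline run without the option,
    followed by one run per -oclFFT_plan triple."""
    commands = [[executable]]
    for plan in plans:
        commands.append([executable, "-oclFFT_plan", *(str(v) for v in plan)])
    return commands


# Executable name and plan values as listed in the post above.
commands = build_commands(
    "AP7_win_x86_SSE2_OpenCL_NV_r2721.exe",
    [(128, 8, 64), (128, 8, 128), (128, 8, 256)],
)
for argv in commands:
    print(" ".join(argv))
```

Keeping the baseline run in the same list makes it easy to see whether invalid results appear only when the option is present, which is the comparison the benchmark run describes.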
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
"For the host side, FWIW the application (2.2 & 2.3 planclasses) appears to not even be making it to device initialisation. That would seem to me the DLLs are somehow damaged, or the driver install has gone awry. I'd imagine a clean driver install (of a suitable known good older version for this GPU) and a project reset may be in order." Bells are definitely ringing, very old ones. Take a look at thread "v7 cuda23 WUs getting ERR_TOO_MANY_EXITS" from June, 2013. The problem was never actually addressed directly, the hope apparently being that it would just eventually go away all by itself. That was 16+ months ago. |