Need help trying to understand what happened on v7 Cuda50 WUs

Profile Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1546626 - Posted: 24 Jul 2014, 1:52:00 UTC

I installed a 2nd GTX750Ti FTW yesterday. I did reinstall the NVidia driver and recycled the machine prior to restarting BOINC again. I'm running Win7 (x64), BOINC 7.4.8 (x64) with Lunatics 0.41. The app_config.xml is:

<app_config>
    <app>
        <name>astropulse_v6</name>
        <max_concurrent>8</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>1</cpu_usage>
        </gpu_versions>
    </app>
    <app>
        <name>setiathome_v7</name>
        <gpu_versions>
            <gpu_usage>.25</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
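
For readers unfamiliar with these settings, here is a rough sketch of the arithmetic they imply. It assumes gpu_usage is the fraction of one GPU reserved per task (so 1/gpu_usage tasks can share a GPU) and that both GTX 750 Ti cards are in use. The function name tasks_per_gpu and the constant NUM_GPUS are made up for this illustration, and the real BOINC scheduler weighs more factors than this.

```python
import math

NUM_GPUS = 2  # assumption: both GTX 750 Ti cards are usable

# gpu_usage / cpu_usage values copied from the app_config.xml above
SETTINGS = {
    "astropulse_v6": {"gpu_usage": 0.5, "cpu_usage": 1.0},
    "setiathome_v7": {"gpu_usage": 0.25, "cpu_usage": 0.5},
}


def tasks_per_gpu(gpu_usage: float) -> int:
    """How many tasks fit on one GPU if each reserves gpu_usage of it."""
    # Small epsilon guards against floating-point error in the division.
    return math.floor(1.0 / gpu_usage + 1e-9)


for app, cfg in SETTINGS.items():
    per_gpu = tasks_per_gpu(cfg["gpu_usage"])
    total = per_gpu * NUM_GPUS
    cpus = total * cfg["cpu_usage"]
    print(f"{app}: {per_gpu} per GPU, {total} on {NUM_GPUS} GPUs, "
          f"about {cpus:g} CPU cores budgeted")
```

Under those assumptions, setiathome_v7 would run up to 4 tasks per card (8 in total), which is consistent with the work cache draining faster once the second card was added.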



Tonight I spotted this http://setiathome.berkeley.edu/results.php?hostid=5501972&offset=0&show_names=0&state=6&appid=11 showing 100 WUs aborted by user with an error status 201 (0xc9) EXIT_MISSING_COPROC.


Snippet of stdoutdae.txt from 22 July:


22-Jul-2014 18:26:58 [---] Starting BOINC client version 7.4.8 for windows_x86_64
22-Jul-2014 18:26:58 [---] log flags: file_xfer, sched_ops, task, cpu_sched, dcf_debug
22-Jul-2014 18:26:58 [---] Libraries: libcurl/7.33.0 OpenSSL/1.0.1h zlib/1.2.8
22-Jul-2014 18:26:58 [---] Data directory: D:\BOINC
22-Jul-2014 18:26:58 [---] Running under account Cliff Harding
22-Jul-2014 18:26:58 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 750 Ti (driver version 340.43, device version OpenCL 1.1 CUDA, 2048MB, 2048MB available, 101 GFLOPS peak)
22-Jul-2014 18:26:58 [---] OpenCL: NVIDIA GPU 1: GeForce GTX 750 Ti (driver version 340.43, device version OpenCL 1.1 CUDA, 2048MB, 2048MB available, 101 GFLOPS peak)
22-Jul-2014 18:26:58 [---] OpenCL: Intel GPU 0 (ignored by config): Intel(R) HD Graphics 4600 (driver version 10.18.10.3621, device version OpenCL 1.2, 1195MB, 1195MB available, 200 GFLOPS peak)
22-Jul-2014 18:26:58 [---] OpenCL CPU: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 3.0.1.10878, device version OpenCL 1.2 (Build 76413))
22-Jul-2014 18:26:58 [Milkyway@Home] Found app_info.xml; using anonymous platform
22-Jul-2014 18:26:58 [---] App version needs CUDA but GPU doesn't support it
22-Jul-2014 18:26:58 [SETI@home] Found app_info.xml; using anonymous platform
22-Jul-2014 18:26:58 [---] App version needs CUDA but GPU doesn't support it
22-Jul-2014 18:26:58 [---] App version needs CUDA but GPU doesn't support it
22-Jul-2014 18:26:58 [---] App version needs CUDA but GPU doesn't support it
...
...
22-Jul-2014 18:26:58 [---] App version needs CUDA but GPU doesn't support it
22-Jul-2014 18:26:58 [---] App version needs CUDA but GPU doesn't support it
22-Jul-2014 18:26:58 [---] App version needs CUDA but GPU doesn't support it
22-Jul-2014 18:26:58 [---] App version needs CUDA but GPU doesn't support it
22-Jul-2014 18:26:58 [SETI@home] Missing coprocessor for task 21ja09ac.7210.12751.438086664200.12.145_1
22-Jul-2014 18:26:58 [SETI@home] Missing coprocessor for task 22mr08ab.19719.17250.438086664196.12.236_0
22-Jul-2014 18:26:58 [SETI@home] Missing coprocessor for task 21ja09ac.7210.19704.438086664200.12.191_0
...
...
22-Jul-2014 18:26:58 [SETI@home] Missing coprocessor for task 22mr08ab.6744.15205.438086664205.12.90_0
22-Jul-2014 18:26:58 [---] Host name: A-SYS
22-Jul-2014 18:26:58 [---] Processor: 8 GenuineIntel Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz [Family 6 Model 60 Stepping 3]
22-Jul-2014 18:26:58 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx tm2 pbe fsgsbase bmi1 smep bmi2
22-Jul-2014 18:26:58 [---] OS: Microsoft Windows 7: Ultimate x64 Edition, Service Pack 1, (06.01.7601.00)
22-Jul-2014 18:26:58 [---] Memory: 7.67 GB physical, 8.06 GB virtual
22-Jul-2014 18:26:58 [---] Disk: 119.24 GB total, 88.74 GB free
22-Jul-2014 18:26:58 [---] Local time is UTC -4 hours
22-Jul-2014 18:26:58 [Milkyway@Home] Found app_config.xml
22-Jul-2014 18:26:58 [SETI@home] Found app_config.xml
22-Jul-2014 18:26:58 [---] Config: report completed tasks immediately
22-Jul-2014 18:26:58 [---] Config: use all coprocessors
22-Jul-2014 18:26:58 [---] Config: ignoring Intel GPU 0
22-Jul-2014 18:26:58 [---] Config: GUI RPCs allowed from:
22-Jul-2014 18:26:58 [Milkyway@Home] URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 543798; resource share 0
22-Jul-2014 18:26:58 [SETI@home] URL http://setiathome.berkeley.edu/; Computer ID 5501972; resource share 100
22-Jul-2014 18:26:58 [SETI@home] General prefs: from SETI@home (last modified 18-Aug-2013 12:00:17)
22-Jul-2014 18:26:58 [SETI@home] Host location: none
22-Jul-2014 18:26:58 [SETI@home] General prefs: using your defaults
22-Jul-2014 18:26:58 [---] Reading preferences override file
22-Jul-2014 18:26:58 [---] Preferences:
22-Jul-2014 18:26:58 [---] max memory usage when active: 7462.60MB
22-Jul-2014 18:26:58 [---] max memory usage when idle: 7855.37MB
22-Jul-2014 18:26:58 [---] max disk usage: 79.46GB
22-Jul-2014 18:26:58 [---] max CPUs used: 6
22-Jul-2014 18:26:58 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
22-Jul-2014 18:26:58 [---] Not using a proxy
22-Jul-2014 18:27:00 Initialization completed

The machine has been recycled several times since then and there doesn't appear to be any more aborts. Temps seem to be within nominal range for both the CPU & GPUs. It does appear though that with both GPUs working nominally that they are sucking up WUs like a Pac Man.


I don't buy computers, I build them!!
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1546650 - Posted: 24 Jul 2014, 2:26:43 UTC - in response to Message 1546626.  

Where did you get your NVIDIA driver from? The current NVIDIA driver is 337.88.
Profile Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1546679 - Posted: 24 Jul 2014, 3:14:45 UTC - in response to Message 1546650.  

Where did you get your NVIDIA driver from? The current NVIDIA driver is 337.88.


340.43 is the beta driver dated 17 June from the NVidia web site, where I get all of my version upgrades. It has been on my machine since 13 July with no problems. I always d/l a beta version after it has been up for a while and only had one problem that I can remember. If you look at WUs from this machine prior to yesterday there were not any problems.


I don't buy computers, I build them!!
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1546872 - Posted: 24 Jul 2014, 11:04:25 UTC

Apparently something is broken or missing in your driver, or there is an incompatibility with your configuration.

Try reinstalling the driver, but use the recommended (more tested and stable) GeForce 337.88 driver instead of the beta one. Download it directly from the NVIDIA site, of course.

Don't forget: do a clean installation.
Profile Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1546884 - Posted: 24 Jul 2014, 12:12:05 UTC

I always do a clean driver install, and the machine has been crunching steadily on one GPU since 13 July with no errors, so I don't believe that is/was the problem. The problem only lasted for approx. 100 WUs. If the driver were at fault, it would have blown all GPU tasks down the drain, which is not the case: the problem lasted for less than 1 minute (clock time) and then ceased. If it had lasted longer or had affected other types (OpenCL), I would have suspected the driver. The errors, some of which have dropped off, are the only errors that I have seen on this machine with this driver; in fact, they are the only ones since it first came online in May.


I don't buy computers, I build them!!
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1546889 - Posted: 24 Jul 2014, 12:29:50 UTC - in response to Message 1546884.  

I'm going to include a link from another thread I saw from 2 years ago. While not completely describing what is happening, it's pretty close. Might want to read all the way thru the thread and see what you think

http://boinc.berkeley.edu/dev/forum_thread.php?id=7600
Profile Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1546919 - Posted: 24 Jul 2014, 13:48:02 UTC - in response to Message 1546889.  

I'm going to include a link from another thread I saw from 2 years ago. While not completely describing what is happening, it's pretty close. Might want to read all the way thru the thread and see what you think

http://boinc.berkeley.edu/dev/forum_thread.php?id=7600


I read the thread and two things immediately pop out:

1) The driver for the card was not recognized upon startup, and my driver was immediately detected.
2) He is using Linux and I'm using Win7 (x64). I also noticed that sometimes you have to play games to get BOINC to recognize NVidia devices while using Linux or other OSs whereas you don't have this problem with Windows regardless of where the data directory is located.

BOINC resides in C:\Program Files\BOINC

The data directory, and therefore all the .EXEs, resides in D:\BOINC\Projects, with sub-folders for each particular project attached to the machine. This has been the configuration for all of my machines for several years.


I don't buy computers, I build them!!
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1547090 - Posted: 25 Jul 2014, 1:14:16 UTC - in response to Message 1546919.  

...
1) The driver for the card was not recognized upon startup, and my driver was immediately detected.
...

Detection of the CUDA capability is separate from detection of OpenCL, and the snippet of STDOUTDAE in your original post was missing the expected lines for CUDA. From subsequent posts it seems that may have been a one-time glitch, and the cause may remain inscrutable. OTOH, it might make sense to report the incident to the BOINC developers since you're using the current alpha version. I doubt they have many alpha testers who have transitioned from one GPU to two.
Joe
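
The observation that OpenCL detection lines were present while the CUDA ones were missing can be checked mechanically. The sketch below is only an illustration: the three sample lines are copied from the log in the original post, but the "CUDA: NVIDIA GPU n:" prefix used for a successful CUDA detection line is an assumption about how the client words that message, and the names PATTERNS and summarize_startup are invented here.

```python
import re
from collections import Counter

# Patterns for the startup messages of interest. The "cuda_detected" pattern
# is an ASSUMED format; the other three match lines quoted in this thread.
PATTERNS = {
    "cuda_detected": re.compile(r"\] CUDA: NVIDIA GPU \d+:"),
    "opencl_detected": re.compile(r"\] OpenCL: NVIDIA GPU \d+:"),
    "cuda_unsupported": re.compile(r"App version needs CUDA but GPU doesn't support it"),
    "missing_coproc": re.compile(r"Missing coprocessor for task \S+"),
}


def summarize_startup(lines):
    """Count how many log lines match each pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Three lines taken verbatim from the log in the first post.
sample = [
    "22-Jul-2014 18:26:58 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 750 Ti "
    "(driver version 340.43, device version OpenCL 1.1 CUDA, 2048MB, "
    "2048MB available, 101 GFLOPS peak)",
    "22-Jul-2014 18:26:58 [---] App version needs CUDA but GPU doesn't support it",
    "22-Jul-2014 18:26:58 [SETI@home] Missing coprocessor for task "
    "21ja09ac.7210.12751.438086664200.12.145_1",
]

summary = summarize_startup(sample)
if summary["opencl_detected"] and not summary["cuda_detected"]:
    print("OpenCL saw the NVIDIA GPU(s), but no CUDA detection line was logged.")
print(dict(summary))
```

To run it against a real log, pass the lines of the client's stdoutdae.txt to summarize_startup instead of the sample list; the path to that file depends on where the data directory lives.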
