SETI@home v8 beta to begin on Tuesday

mimo
Volunteer tester

Joined: 4 Aug 08
Posts: 11
Credit: 1,437,079
RAC: 0
Slovakia
Message 55761 - Posted: 8 Jan 2016, 21:01:40 UTC - in response to Message 55755.  

ya i mean SETI v8 not ARM v8
ok, it's good that the Pi and Pi 2 will be useful
ID: 55761 · Report as offensive
Claggy
Volunteer tester

Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 0
United Kingdom
Message 55762 - Posted: 8 Jan 2016, 21:09:01 UTC - in response to Message 55761.  

ok, it's good that the Pi and Pi 2 will be useful

I said ARMv7, so Pi 2. Not ARMv6, so not the original Pi.

Claggy
ID: 55762 · Report as offensive
MarkJ
Volunteer tester

Joined: 18 Oct 09
Posts: 48
Credit: 73,283
RAC: 0
Australia
Message 55763 - Posted: 8 Jan 2016, 21:34:22 UTC

Just attached this host

It's an i7-6700 with HD Graphics 530. My main aim is to try to shake out the OpenCL multi-beam app. I have this host and 5 others doing CPU work on main without any issues.
ID: 55763 · Report as offensive
Juha
Volunteer tester

Joined: 18 Jun 08
Posts: 76
Credit: 113,089
RAC: 0
Finland
Message 55764 - Posted: 8 Jan 2016, 21:48:22 UTC - in response to Message 55598.  

Eric,

On Main there's a Mac failing with the following message:

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
process got signal 5
</message>
<stderr_txt>
dyld: Symbol not found: ___stack_chk_guard
  Referenced from: /Library/Application Support/BOINC Data/slots/2/../../projects/setiathome.berkeley.edu/setiathome_8.00_i686-apple-darwin
  Expected in: /usr/lib/libSystem.B.dylib


</stderr_txt>
]]>


Did some googling. Looks like Apple introduced stack protection in 10.6. If you still want to support 10.5 and earlier, that error might go away if the app is compiled with -fno-stack-protector.
ID: 55764 · Report as offensive
firebyrdman
Volunteer tester

Joined: 5 Jan 16
Posts: 6
Credit: 63,298
RAC: 0
United States
Message 55765 - Posted: 8 Jan 2016, 21:51:35 UTC

Not getting any WUs. If you're not too busy, please check it out.


Thanks
ID: 55765 · Report as offensive
Claggy
Volunteer tester

Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 0
United Kingdom
Message 55766 - Posted: 8 Jan 2016, 21:54:02 UTC - in response to Message 55765.  

Not getting any WUs. If you're not too busy, please check it out.


Thanks

Try getting CPU WUs. If you get those, then you've just hit your daily quota for GPU WUs.

Claggy
ID: 55766 · Report as offensive
Alex Storey
Volunteer tester
Joined: 10 Feb 12
Posts: 107
Credit: 305,151
RAC: 0
Greece
Message 55767 - Posted: 8 Jan 2016, 23:44:15 UTC - in response to Message 55695.  
Last modified: 8 Jan 2016, 23:48:34 UTC

What version of NVIDIA driver is required here?

I use:
Coprocessors NVIDIA GeForce GTX 750 Ti (2048MB) driver: 347.88 OpenCL: 1.1

I like it and it has been working sooooo well on the main project.
But I guess one change means another has to come.

OpenCL NV requires 350.xx+ drivers.


Hi Raistmer!

Is it normal for earlier drivers to get work too?

Got a few tasks on a 270.xx driver and 306.xx driver but anytime I try to open IE or Chrome I get driver restarts. Funny thing is, app appears to want to run as long as I don't actually USE my computer ;)

Getting this msg:
WARNING: Used device has low amount of local memory,local memory FFT size will be reduced. User-defined local FFT size will be ignored if exceeds allowed value

Edit:
Is there anyone with an x86-android device that can tell me whether the same problem exists there?

Unfortunately I managed to lose my Intel Atom Asus Fonepad
ID: 55767 · Report as offensive
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 55768 - Posted: 8 Jan 2016, 23:55:23 UTC - in response to Message 55764.  


Did some googling. Looks like Apple introduced stack protection in 10.6. If you still want to support 10.5 and earlier, that error might go away if the app is compiled with -fno-stack-protector.


I'll try it.
ID: 55768 · Report as offensive
woohoo
Volunteer tester

Joined: 4 Aug 14
Posts: 4
Credit: 16,555,053
RAC: 0
United States
Message 55769 - Posted: 9 Jan 2016, 0:11:39 UTC

so i haven't been here much in a while but i want to help get the bugs worked out.

for those who don't know my setup i'm running one 390x, three 290x(s), two 295x2(s) and a couple of bitcoin miners but who cares about that. i don't run cpu work units because i'm impatient.

if i look at the applications list there are seven different ati apps and if it were up to me i would pick ati5 over ati and atiapu. of course i'm trying to let the server figure that out on its own, this is not lunatics. i'm not running any command lines; one of my rigs is having a thread affinity issue but i'm ignoring it for now. i'm also running just one wu per gpu, not trying to create more problems. if i get a chance i will try catalyst 16.1 hotfix
ID: 55769 · Report as offensive
Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 27,183,456
RAC: 0
United States
Message 55770 - Posted: 9 Jan 2016, 0:48:23 UTC - in response to Message 55769.  

LTNS. Welcome back.
ID: 55770 · Report as offensive
Wedge009
Volunteer tester

Joined: 2 Aug 12
Posts: 14
Credit: 7,429,417
RAC: 0
Australia
Message 55771 - Posted: 9 Jan 2016, 2:18:19 UTC

For what it's worth, I'm still using Catalyst 14.12 Omega (Driver 1642.5) since every driver release thereafter produces a fairly high rate of invalid results with MB (at least v7, haven't tried v8).

This is only for my Hawaii running on Windows, however. Interestingly, I've had no apparent troubles with the latest stable releases running Hawaii on Linux (currently Crimson 15.12, driver 1912.5), nor Tahiti on Windows.

It does make me wonder if there are similar issues with Tonga/Antigua or Fiji on Windows (I don't have either of these newer GPUs).
ID: 55771 · Report as offensive
woohoo
Volunteer tester

Joined: 4 Aug 14
Posts: 4
Credit: 16,555,053
RAC: 0
United States
Message 55772 - Posted: 9 Jan 2016, 4:18:56 UTC

Tahiti and Tonga seem similar to me, nobody has issues with them, meaning you can use any os and any number of wus per gpu without problems

if i want to run multiple wu per gpu i can't use a driver newer than 14.4, maybe something changed when they released opencl 2.0. but 14.4 with multiple wu per gpu feels slow so i'm on 16.1 now. i do think that linux is a solution, but i use my rigs as test environments for office versions so windows is the way to go for me. i can't get an old driver like 14.4 to work on my 390x; i don't think it would work with fiji either. i'm not currently trying to extract every last bit of performance from my gpus; i use the bitcoin miners to hoard cobblestones
ID: 55772 · Report as offensive
Jeff Buck
Volunteer tester

Joined: 11 Dec 14
Posts: 96
Credit: 1,240,941
RAC: 0
United States
Message 55773 - Posted: 9 Jan 2016, 5:41:12 UTC

This report is going to be long because I want to be thorough and record any details that I've been able to cobble together.

Had an apparent NVIDIA driver (361.43) crash on my T7400 earlier this evening. I discovered it when I went to look at BOINC Manager here on my daily driver, which was remotely monitoring the T7400, and found:



All 3 GPUs were showing greater than 99% complete, with 0% remaining, but were showing run times that were about 10 times what they should have taken to complete (and still incrementing).

When I went to check on the T7400 (which is in another room), I found that the computer was still humming along but the monitor would not wake up. So I came back and took the above screenshot and a minute or so later BOINC Manager lost the connection to the T7400. When I went back to check on it again, it was in the process of rebooting itself. After the reboot, all 3 tasks restarted from scratch (apparently due to my checkpoint interval being set to 300 seconds) and finished normally.

The 3 tasks are: 21709414, 21709524, and 21709625. The Stderr of the first one starts repeating itself after the line "WU true angle range is : 2.725146", while the other two start repeating after "OpenCL platform detected: NVIDIA Corporation".

BOINC Event Log for the relevant period shows:
08-Jan-2016 18:25:26 [SETI@home Beta Test] [cpu_sched] Starting task 06ap11ag.26830.18063.8.42.169_1 using setiathome_v8 version 805 (opencl_nvidia_sah) in slot 0
08-Jan-2016 18:25:28 [SETI@home Beta Test] Started upload of 06ap11ag.26830.18063.8.42.200_2_0
08-Jan-2016 18:25:30 [SETI@home Beta Test] Finished upload of 06ap11ag.26830.18063.8.42.200_2_0
08-Jan-2016 18:26:55 [SETI@home Beta Test] Message from task: 0
08-Jan-2016 18:27:00 [SETI@home Beta Test] Computation for task 06ap11ag.26830.18063.8.42.205_0 finished
08-Jan-2016 18:27:00 [SETI@home Beta Test] Starting task 06ap11ag.26830.18063.8.42.206_0
08-Jan-2016 18:27:00 [SETI@home Beta Test] [cpu_sched] Starting task 06ap11ag.26830.18063.8.42.206_0 using setiathome_v8 version 805 (opencl_nvidia_sah) in slot 3
08-Jan-2016 18:27:01 [SETI@home Beta Test] Started upload of 06ap11ag.26830.18063.8.42.205_0_0
08-Jan-2016 18:27:03 [SETI@home Beta Test] Finished upload of 06ap11ag.26830.18063.8.42.205_0_0
08-Jan-2016 18:29:21 [SETI@home Beta Test] Message from task: 0
08-Jan-2016 18:29:26 [SETI@home Beta Test] Computation for task 06ap11ag.26830.18063.8.42.72_2 finished
08-Jan-2016 18:29:26 [SETI@home Beta Test] Starting task 06ap11ag.26830.18063.8.42.239_2
08-Jan-2016 18:29:26 [SETI@home Beta Test] [cpu_sched] Starting task 06ap11ag.26830.18063.8.42.239_2 using setiathome_v8 version 805 (opencl_nvidia_sah) in slot 4
08-Jan-2016 18:29:27 [SETI@home Beta Test] Started upload of 06ap11ag.26830.18063.8.42.72_2_0
08-Jan-2016 18:29:29 [SETI@home Beta Test] Finished upload of 06ap11ag.26830.18063.8.42.72_2_0
08-Jan-2016 19:40:13 [SETI@home Beta Test] Sending scheduler request: To report completed tasks.
08-Jan-2016 19:40:13 [SETI@home Beta Test] Reporting 9 completed tasks
08-Jan-2016 19:40:13 [SETI@home Beta Test] Requesting new tasks for NVIDIA GPU
08-Jan-2016 19:40:16 [SETI@home Beta Test] Scheduler request completed: got 13 new tasks
08-Jan-2016 19:40:18 [SETI@home Beta Test] Started download of 06ap11ag.15032.19290.8.42.13
08-Jan-2016 19:40:18 [SETI@home Beta Test] Started download of 06ap11ag.15032.19290.8.42.15
08-Jan-2016 19:40:20 [SETI@home Beta Test] Finished download of 06ap11ag.15032.19290.8.42.13
08-Jan-2016 19:40:20 [SETI@home Beta Test] Started download of 06ap11ag.15032.19290.8.42.16
08-Jan-2016 19:40:21 [SETI@home Beta Test] Finished download of 06ap11ag.15032.19290.8.42.15
08-Jan-2016 19:40:21 [SETI@home Beta Test] Finished download of 06ap11ag.15032.19290.8.42.16
08-Jan-2016 19:40:21 [SETI@home Beta Test] Started download of 06ap11ag.15032.18881.8.42.145
08-Jan-2016 19:40:21 [SETI@home Beta Test] Started download of 06ap11ag.15032.18881.8.42.43
08-Jan-2016 19:40:23 [SETI@home Beta Test] Finished download of 06ap11ag.15032.18881.8.42.145
08-Jan-2016 19:40:23 [SETI@home Beta Test] Finished download of 06ap11ag.15032.18881.8.42.43
08-Jan-2016 19:40:23 [SETI@home Beta Test] Started download of 06ap11ag.15032.18881.8.42.196
08-Jan-2016 19:40:23 [SETI@home Beta Test] Started download of 06ap11ag.15032.18881.8.42.249
08-Jan-2016 19:40:25 [SETI@home Beta Test] Finished download of 06ap11ag.15032.18881.8.42.196
08-Jan-2016 19:40:25 [SETI@home Beta Test] Finished download of 06ap11ag.15032.18881.8.42.249
08-Jan-2016 19:40:25 [SETI@home Beta Test] Started download of 06ap11ag.15032.18881.8.42.154
08-Jan-2016 19:40:25 [SETI@home Beta Test] Started download of 06ap11ag.15032.19290.8.42.10
08-Jan-2016 19:40:28 [SETI@home Beta Test] Finished download of 06ap11ag.15032.18881.8.42.154
08-Jan-2016 19:40:28 [SETI@home Beta Test] Finished download of 06ap11ag.15032.19290.8.42.10
08-Jan-2016 19:40:28 [SETI@home Beta Test] Started download of 06ap11ag.15032.18881.8.42.178
08-Jan-2016 19:40:28 [SETI@home Beta Test] Started download of 06ap11ag.15032.18881.8.42.136
08-Jan-2016 19:40:31 [SETI@home Beta Test] Finished download of 06ap11ag.15032.18881.8.42.178
08-Jan-2016 19:40:31 [SETI@home Beta Test] Finished download of 06ap11ag.15032.18881.8.42.136
08-Jan-2016 19:40:31 [SETI@home Beta Test] Started download of 06ap11ag.15032.18881.8.42.240
08-Jan-2016 19:40:31 [SETI@home Beta Test] Started download of 06ap11ag.15032.19290.8.42.24
08-Jan-2016 19:40:34 [SETI@home Beta Test] Finished download of 06ap11ag.15032.18881.8.42.240
08-Jan-2016 19:40:34 [SETI@home Beta Test] Finished download of 06ap11ag.15032.19290.8.42.24
08-Jan-2016 19:52:03 [---] Starting BOINC client version 7.6.9 for windows_x86_64

To me, that shows at least two things. First, although I took the screenshot at about 19:45 or so, which showed about 40-44 minutes of run time for the tasks, they had actually been started more than an hour earlier and must have been stalled for almost all of that time since they never hit their first checkpoint. Second, BOINC continued to run for long after the tasks stalled, reporting and downloading tasks at 19:40.
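
The length of the stall can be estimated directly from the Event Log timestamps. Here is a minimal Python sketch, not part of the original report: the helper names `parse_timestamp` and `largest_gap` are made up for illustration, the sample lines are a small subset copied from the log above, and it assumes every line starts with a `DD-Mon-YYYY HH:MM:SS` stamp using English month abbreviations.

```python
from datetime import datetime

# A few lines copied from the Event Log above (not the full log).
LOG_LINES = [
    "08-Jan-2016 18:29:26 [SETI@home Beta Test] Starting task 06ap11ag.26830.18063.8.42.239_2",
    "08-Jan-2016 18:29:29 [SETI@home Beta Test] Finished upload of 06ap11ag.26830.18063.8.42.72_2_0",
    "08-Jan-2016 19:40:13 [SETI@home Beta Test] Sending scheduler request: To report completed tasks.",
    "08-Jan-2016 19:52:03 [---] Starting BOINC client version 7.6.9 for windows_x86_64",
]


def parse_timestamp(line):
    """Parse the leading 'DD-Mon-YYYY HH:MM:SS' stamp (first 20 characters)."""
    return datetime.strptime(line[:20], "%d-%b-%Y %H:%M:%S")


def largest_gap(lines):
    """Return (gap, before, after) for the longest silence between consecutive lines."""
    stamps = [parse_timestamp(line) for line in lines]
    pairs = zip(stamps, stamps[1:])
    before, after = max(pairs, key=lambda pair: pair[1] - pair[0])
    return after - before, before, after


gap, before, after = largest_gap(LOG_LINES)
print(f"Longest silence: {gap} (from {before:%H:%M:%S} to {after:%H:%M:%S})")
```

On the sample lines this reports a silence of 1:10:44 between 18:29:29 and 19:40:13, which is the "more than an hour" window with no client activity.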

Checking the System Logs, I first find:
Log Name:      System
Source:        Display
Date:          1/8/2016 7:40:13 PM
Event ID:      4101
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      T7400
Description:
Display driver nvlddmkm stopped responding and has successfully recovered.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Display" />
    <EventID Qualifiers="0">4101</EventID>
    <Level>3</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2016-01-09T03:40:13.000000000Z" />
    <EventRecordID>15387</EventRecordID>
    <Channel>System</Channel>
    <Computer>T7400</Computer>
    <Security />
  </System>
  <EventData>
    <Data>nvlddmkm</Data>
    <Data>
    </Data>
  </EventData>
</Event>

That seems to show a driver restart at 19:40, but I can't find an earlier crash event in the log. BTW, 19:40 would have been about the time I was trying to wake the monitor up with the mouse and also corresponds exactly to the time of the BOINC scheduler request.
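
The match between the 19:40 BOINC entry and the `SystemTime` value in the event XML can be checked by converting the UTC stamp to local time. A minimal sketch, assuming the machine was on UTC-8 (inferred from the "7:40:13 PM" and "03:40:13Z" pair above, not stated anywhere in the post); `event_time_to_local` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is UTC-8, inferred from the event's Date vs SystemTime.
ASSUMED_LOCAL = timezone(timedelta(hours=-8))


def event_time_to_local(system_time, tz=ASSUMED_LOCAL):
    """Convert an event XML SystemTime string (UTC) to local time.

    The fractional seconds are dropped because the XML carries nine
    digits, more than datetime can store.
    """
    seconds_part = system_time.split(".")[0].rstrip("Z")
    utc = datetime.strptime(seconds_part, "%Y-%m-%dT%H:%M:%S")
    return utc.replace(tzinfo=timezone.utc).astimezone(tz)


local = event_time_to_local("2016-01-09T03:40:13.000000000Z")
print(local.strftime("%d-%b-%Y %H:%M:%S"))
```

Under that offset the driver-recovery event lands at 19:40:13 on 8 January, the same second as the scheduler request in the BOINC log.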

Then I have 3 other relevant System Log entries:
Log Name:      System
Source:        Microsoft-Windows-Kernel-Power
Date:          1/8/2016 7:50:54 PM
Event ID:      41
Task Category: (63)
Level:         Critical
Keywords:      (2)
User:          SYSTEM
Computer:      T7400
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Kernel-Power" Guid="{331C3B3A-2005-44C2-AC5E-77220C37D6B4}" />
    <EventID>41</EventID>
    <Version>3</Version>
    <Level>1</Level>
    <Task>63</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000002</Keywords>
    <TimeCreated SystemTime="2016-01-09T03:50:54.756842600Z" />
    <EventRecordID>15403</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="8" />
    <Channel>System</Channel>
    <Computer>T7400</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="BugcheckCode">159</Data>
    <Data Name="BugcheckParameter1">0x3</Data>
    <Data Name="BugcheckParameter2">0xffffe00014d3c810</Data>
    <Data Name="BugcheckParameter3">0xfffff8029a0d6a60</Data>
    <Data Name="BugcheckParameter4">0xffffe0001641c830</Data>
    <Data Name="SleepInProgress">0</Data>
    <Data Name="PowerButtonTimestamp">0</Data>
    <Data Name="BootAppStatus">0</Data>
  </EventData>
</Event>

Log Name:      System
Source:        Microsoft-Windows-WER-SystemErrorReporting
Date:          1/8/2016 7:51:12 PM
Event ID:      1001
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      T7400
Description:
The computer has rebooted from a bugcheck.  The bugcheck was: 0x0000009f (0x0000000000000003, 0xffffe00014d3c810, 0xfffff8029a0d6a60, 0xffffe0001641c830). A dump was saved in: C:\WINDOWS\MEMORY.DMP. Report Id: 010816-34468-01.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-WER-SystemErrorReporting" Guid="{ABCE23E7-DE45-4366-8631-84FA6C525952}" EventSourceName="BugCheck" />
    <EventID Qualifiers="16384">1001</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2016-01-09T03:51:12.000000000Z" />
    <EventRecordID>15392</EventRecordID>
    <Correlation />
    <Execution ProcessID="0" ThreadID="0" />
    <Channel>System</Channel>
    <Computer>T7400</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="param1">0x0000009f (0x0000000000000003, 0xffffe00014d3c810, 0xfffff8029a0d6a60, 0xffffe0001641c830)</Data>
    <Data Name="param2">C:\WINDOWS\MEMORY.DMP</Data>
    <Data Name="param3">010816-34468-01</Data>
  </EventData>
</Event>

Log Name:      System
Source:        EventLog
Date:          1/8/2016 7:51:11 PM
Event ID:      6008
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      T7400
Description:
The previous system shutdown at 7:20:56 PM on ?1/?8/?2016 was unexpected.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="EventLog" />
    <EventID Qualifiers="32768">6008</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2016-01-09T03:51:11.000000000Z" />
    <EventRecordID>15388</EventRecordID>
    <Channel>System</Channel>
    <Computer>T7400</Computer>
    <Security />
  </System>
  <EventData>
    <Data>7:20:56 PM</Data>
    <Data>?1/?8/?2016</Data>
    <Data>
    </Data>
    <Data>
    </Data>
    <Data>89281</Data>
    <Data>
    </Data>
    <Data>
    </Data>
    <Binary>E0070100050008001300140038009303E0070100060009000300140038009303600900003C000000010000006009000001000000B00400000100000000000000</Binary>
  </EventData>
</Event>

This last one seems really odd because it references an unexpected shutdown at 7:20:56 PM, which, as far as I could tell, would have been about half an hour before it actually crashed and rebooted. Very puzzling!
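
The `<Binary>` field of that last event can be cross-checked against the description text. The first 32 bytes appear to decode as two Windows `SYSTEMTIME` structures (eight little-endian 16-bit fields each); that reading is an assumption based on how cleanly the values decode, not something the log states. A minimal Python sketch, with `decode_systemtime` as a hypothetical helper name:

```python
import struct

# First 32 bytes (64 hex digits) of the <Binary> field from Event ID 6008 above.
BINARY_HEX = (
    "E0070100050008001300140038009303"
    "E0070100060009000300140038009303"
)

# Field order of the Windows SYSTEMTIME structure.
FIELDS = ("year", "month", "day_of_week", "day",
          "hour", "minute", "second", "millisecond")


def decode_systemtime(raw16):
    """Decode 16 bytes as eight little-endian unsigned 16-bit fields."""
    values = struct.unpack("<8H", raw16)
    return dict(zip(FIELDS, values))


raw = bytes.fromhex(BINARY_HEX)
first = decode_systemtime(raw[:16])
second = decode_systemtime(raw[16:32])
print(first)
print(second)
```

The first structure decodes to 2016-01-08 19:20:56.915 (day_of_week 5, Friday when Sunday is 0) and the second to 2016-01-09 03:20:56.915, eight hours later. That is consistent with one instant stored in local time and in UTC, and it agrees with the "7:20:56 PM" in the description. It does not by itself explain why the recorded time is about half an hour before the reboot.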

Anyway, since the restart, everything has been running smoothly again. Hopefully it won't crash again during the night, because I'm sure not going to stay up to monitor it. ;^)
ID: 55773 · Report as offensive
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 55774 - Posted: 9 Jan 2016, 7:09:46 UTC - in response to Message 55767.  
Last modified: 9 Jan 2016, 7:10:01 UTC

What version of NVIDIA driver is required here?

I use:
Coprocessors NVIDIA GeForce GTX 750 Ti (2048MB) driver: 347.88 OpenCL: 1.1

I like it and it has been working sooooo well on the main project.
But I guess one change means another has to come.

OpenCL NV requires 350.xx+ drivers.


Hi Raistmer!

Is it normal for earlier drivers to get work too?

Got a few tasks on a 270.xx driver and 306.xx driver but anytime I try to open IE or Chrome I get driver restarts. Funny thing is, app appears to want to run as long as I don't actually USE my computer ;)

Getting this msg:
WARNING: Used device has low amount of local memory,local memory FFT size will be reduced. User-defined local FFT size will be ignored if exceeds allowed value

FERMI path used: no


It seems your ION is being detected as a pre-Fermi device (CC 1.x).
With pre-Fermi devices, older drivers can indeed be used.
The restarts you get are most probably caused by the Windows watchdog timer on the video driver.
Try the usual measures, such as changing -sbs N (try something from 32 to 256) and setting -period_iterations_num to 100 or more.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 55774 · Report as offensive
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 55775 - Posted: 9 Jan 2016, 7:30:04 UTC - in response to Message 55773.  
Last modified: 9 Jan 2016, 7:38:39 UTC

This report is going to be long because I want to be thorough and record any details that I've been able to cobble together.

Had an apparent NVIDIA driver (361.43) crash on my T7400 earlier this evening.

This doesn't look like the usual driver restart caused by the watchdog timer. Such restarts will eventually cause an OS reboot, but a number of driver restarts should happen before that.
What's interesting is that BOINC appears to do nothing for more than an hour, just as if the computer had been switched off or suspended.
The bugcheck number also supports this:
"0x0000009F" Stop error in Windows 7 or in Windows Server 2008 R2 when the computer enters or resumes from the Soft Off (S5) power state
https://support.microsoft.com/en-us/kb/2459268
So please check whether your PC could be entering sleep mode (suspend).
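
For anyone matching the numbers up: the Kernel-Power event (ID 41) logs the bugcheck code in decimal (159), while the WER event (ID 1001) and the KB title use hex. A small Python sketch of the conversion; `bugcheck_hex` is a made-up helper name. Microsoft documents 0x9F as DRIVER_POWER_STATE_FAILURE.

```python
def bugcheck_hex(code):
    """Format a decimal bugcheck code as eight hex digits, as the WER event prints it."""
    return f"0x{code:08x}"


kernel_power_code = 159      # <Data Name="BugcheckCode"> in Event ID 41
wer_code = "0x0000009f"      # first value reported in Event ID 1001

print(bugcheck_hex(kernel_power_code), int(wer_code, 16))
```

Both events therefore refer to the same stop code.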
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 55775 · Report as offensive
TRuEQ & TuVaLu
Volunteer tester
Joined: 28 Jan 11
Posts: 619
Credit: 2,580,051
RAC: 0
Sweden
Message 55776 - Posted: 9 Jan 2016, 8:16:18 UTC
Last modified: 9 Jan 2016, 8:22:52 UTC

Wow, I finally got my first driver restart.
But something is different here than it used to be.
Normally the GPU that causes the restart stops running tasks, so only 2 of 3 GPUs keep running tasks until I restart the computer.
This morning the GPU that caused the restart kept running and produced error tasks. Normally they stop producing work.
http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=72388&offset=0&show_names=0&state=6&appid= between 0400 and 0800, Jan 09.

So then I wonder... ATI: why bother restarting a driver when it doesn't restart properly?

Or, if it did restart properly, what might be wrong with the third of the SETI apps that keeps running? It should also restart and keep producing tasks as usual, but only if the driver restarted properly.

Is that possible/worth looking into?
ID: 55776 · Report as offensive
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 55777 - Posted: 9 Jan 2016, 9:13:59 UTC - in response to Message 55773.  

This report is going to be long because I want to be thorough and record any details that I've been able to cobble together.

Had an apparent NVIDIA driver (361.43) crash on my T7400 earlier this evening. I discovered it when I went to look at BOINC Manager here on my daily driver, which was remotely monitoring the T7400, and found:

All 3 GPUs were showing greater than 99% complete, with 0% remaining, but were showing run times that were about 10 times what they should have taken to complete (and still incrementing).

To me, that shows at least two things. First, although I took the screenshot at about 19:45 or so, which showed about 40-44 minutes of run time for the tasks, they had actually been started more than an hour earlier and must have been stalled for almost all of that time since they never hit their first checkpoint. Second, BOINC continued to run for long after the tasks stalled, reporting and downloading tasks at 19:40.

That machine (it says it's an E5430, but you know them better than me) is running BOINC v7.6.9. That, and subsequent, versions of BOINC display a pseudo-progress value, designed to reassure users that something is happening, even when a rogue application (from another project, of course, not here!) is incapable of reporting its own progress. The pseudo-progress displayed is designed to converge asymptotically towards 100%, at a rate determined by the initial estimate for the job. It looks as if that's what you're seeing, and the tasks indeed never started running.
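
To illustrate what "converging asymptotically towards 100%" means, here is a minimal Python sketch. The exponential form below is an assumption chosen for illustration only; this post does not give the formula BOINC actually uses, and `pseudo_progress` is a hypothetical name.

```python
import math


def pseudo_progress(elapsed_seconds, estimated_seconds):
    """Illustrative pseudo-progress: climbs towards 1.0 but never reaches it.

    Assumed shape: 1 - exp(-elapsed / estimate). The initial runtime
    estimate sets how quickly the value approaches 100%.
    """
    if estimated_seconds <= 0:
        raise ValueError("estimated_seconds must be positive")
    return 1.0 - math.exp(-elapsed_seconds / estimated_seconds)


# Hypothetical task with a 400-second estimate, sampled at several run times.
for elapsed in (0, 400, 1200, 4000):
    print(elapsed, f"{pseudo_progress(elapsed, 400):.4%}")
```

With a curve of this shape, a task that has run about 10 times its estimate would show more than 99% done without ever finishing, which is the general pattern reported above.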

I have seen similar problems on one of my Windows 7 machines. Occasional GPU task stalls, accompanied by a complete screen freeze (even the clock stops updating), and no response to mouse or keyboard input. It appears that the complete Windows graphics sub-system goes south, but the kernel processes keep running. That includes the BOINC client, and CPU applications launched by it. I've been able to connect to the machine via a remote BOINC Manager, shut down the running client cleanly, and restart the whole computer with the Big Red Switch.

You mention remote monitoring. I use BoincView, which still calculates and displays CPU efficiency (I believe BoincTasks may do the same - unfortunately the concept was removed from BOINC itself some years ago). A BOINC CPU task will show 97-100% efficiency, and a good GPU task in the low single figure range. But one of the stalled tasks we're talking about will display exactly 0.0000 CPU efficiency - that's a useful warning that a visit to the remote machine is required urgently.
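
A rough sketch of that check in Python, assuming CPU efficiency is simply CPU time divided by elapsed (wall-clock) time; how BoincView actually computes it is not shown here. The function names and the 300-second minimum are hypothetical choices.

```python
def cpu_efficiency(cpu_seconds, elapsed_seconds):
    """CPU time divided by wall-clock time; None if the task has not run yet."""
    if elapsed_seconds <= 0:
        return None
    return cpu_seconds / elapsed_seconds


def looks_stalled(cpu_seconds, elapsed_seconds, min_elapsed=300.0):
    """Flag a task that has been 'running' for a while with zero CPU time.

    min_elapsed is an arbitrary illustrative threshold to avoid flagging
    tasks that have only just started.
    """
    efficiency = cpu_efficiency(cpu_seconds, elapsed_seconds)
    return (efficiency is not None
            and elapsed_seconds >= min_elapsed
            and efficiency == 0.0)


print(looks_stalled(cpu_seconds=0.0, elapsed_seconds=2600.0))   # stalled GPU task
print(looks_stalled(cpu_seconds=45.0, elapsed_seconds=900.0))   # healthy GPU task
```

A healthy GPU task still uses a little CPU time, so only an efficiency of exactly zero after a reasonable run time is treated as a warning sign.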
ID: 55777 · Report as offensive
Mike
Volunteer tester
Joined: 16 Jun 05
Posts: 2531
Credit: 1,074,556
RAC: 0
Germany
Message 55778 - Posted: 9 Jan 2016, 9:39:10 UTC - in response to Message 55776.  

Wow, I finally got my first driver restart.
But something is different here than it used to be.
Normally the GPU that causes the restart stops running tasks, so only 2 of 3 GPUs keep running tasks until I restart the computer.
This morning the GPU that caused the restart kept running and produced error tasks. Normally they stop producing work.
http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=72388&offset=0&show_names=0&state=6&appid= between 0400 and 0800, Jan 09.

So then I wonder... ATI: why bother restarting a driver when it doesn't restart properly?

Or, if it did restart properly, what might be wrong with the third of the SETI apps that keeps running? It should also restart and keep producing tasks as usual, but only if the driver restarted properly.

Is that possible/worth looking into?


Are you running Cat 13.4 on that host?
I only had trouble with it on my old 5850.
With each crime and every kindness we birth our future.
ID: 55778 · Report as offensive
TRuEQ & TuVaLu
Volunteer tester
Joined: 28 Jan 11
Posts: 619
Credit: 2,580,051
RAC: 0
Sweden
Message 55779 - Posted: 9 Jan 2016, 10:07:57 UTC - in response to Message 55778.  

Wow, I finally got my first driver restart.
But something is different here than it used to be.
Normally the GPU that causes the restart stops running tasks, so only 2 of 3 GPUs keep running tasks until I restart the computer.
This morning the GPU that caused the restart kept running and produced error tasks. Normally they stop producing work.
http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=72388&offset=0&show_names=0&state=6&appid= between 0400 and 0800, Jan 09.

So then I wonder... ATI: why bother restarting a driver when it doesn't restart properly?

Or, if it did restart properly, what might be wrong with the third of the SETI apps that keeps running? It should also restart and keep producing tasks as usual, but only if the driver restarted properly.

Is that possible/worth looking into?


Are you running Cat 13.4 on that host?
I only had trouble with it on my old 5850.



Yes, I do. I also run it on my Win7 64-bit machine, which has two 5850s in it, with no problems detected whatsoever.
I am sure that if I install Win 7 64-bit on this host as well, the problems will disappear, but then I wouldn't have any "bugs" to report.
I consider the driver very stable.
But Win Vista with 1.75GB RAM and a 5850 + 5970 is a perfect machine for Beta.
When I play Clash of Clans in the BlueStacks virtual Android, the swap file goes above 4.5GB and there are still no problems with 3 SETI tasks running + 2 Gerasim tasks on the CPU. I consider that pretty OK for an old computer.
This Beta test has had very little trouble, and so far I consider the ATI app very good.
Before, I had problems with Flash applications... But that seems fixed now.
Some problems show up in Firefox when accessing some login ActiveX or Java apps/scripts. I haven't been able to pinpoint exactly what is causing the trouble yet...
I am not even sure it is related to the SETI app or the ATI driver, so... It might be Firefox or Windows...
ID: 55779 · Report as offensive
TRuEQ & TuVaLu
Volunteer tester
Joined: 28 Jan 11
Posts: 619
Credit: 2,580,051
RAC: 0
Sweden
Message 55780 - Posted: 9 Jan 2016, 10:53:03 UTC

Mike.

Just because I wrote what I did in the last post, the computer died with a blue screen during a Clash of Clans attack 5 minutes later. No errors on WUs, so I guess it was due to an "out of memory" Windows issue.

I have 4GB in the computer but need to re-install Windows Vista 32-bit to use it.
I'll do that after the Beta test is finished.
ID: 55780 · Report as offensive
 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.