SETI@home v8 beta to begin on Tuesday

Message boards : News : SETI@home v8 beta to begin on Tuesday

Rasputin42
Volunteer tester
Joined: 6 Mar 09
Posts: 8
Credit: 72,401
RAC: 0
United States
Message 57321 - Posted: 14 Mar 2016, 20:43:19 UTC
Last modified: 14 Mar 2016, 20:44:37 UTC

How do I stop it detecting my GPU as low-performance? (SoG r3401)

Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 5.890797
Low-performance GPU detected, default period_iterations_num set to 500
For low-performance GPU path use_sleep enabled with 5ms per iteration
Used GPU device parameters are:
Number of compute units: 2
Single buffer allocation size: 128MB
Total device global memory: 1024MB
max WG size: 1024
local mem type: Real
FERMI path used: yes
LotOfMem path: no
LowPerformanceGPU path: yes
period_iterations_num=500
ID: 57321
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57322 - Posted: 14 Mar 2016, 20:45:55 UTC - in response to Message 57321.  

How do I stop it detecting my GPU as low-performance? (SoG r3401)

ReadMe read?
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57322
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 57323 - Posted: 14 Mar 2016, 20:55:27 UTC - in response to Message 57318.  


17:30:08 0.11434709 0.622830

The difference is big enough indeed. How long did the task last as a whole?
What was the checkpoint interval?
What if the checkpoint interval is set to 10 seconds?

Times were logged automatically as I went along, so the interval between 17:22:52 (before the initial checkpoint files state.sah and boinc_task_state.xml had been written) and 17:30:08 is pretty much the whole run. As the linked result states, BOINC recorded the elapsed time as

Run time	7 min 33 sec

so if my intervals had exactly matched the start point and end point, I might have squeezed one extra in - but no more.

state.sah and boinc_task_state.xml are only updated when the task checkpoints, so I had already set a checkpoint interval of 15 seconds before capturing that log. You can see that <prog> (extracted from the master checkpoint file) is updating at every sample interval, which confirms that the reduced checkpoint interval was indeed active.
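The batch file itself isn't posted. As a rough illustration only, here is a minimal Python sketch of the same sampling idea, assuming the task runs in a BOINC slot folder containing state.sah (with a <prog> element) and boinc_task_state.xml (with <fraction_done>), as described above. The helper names (extract_tag, read_tag, log_progress) are hypothetical, and the slot path has to be adjusted for the host.

```python
import re
import time
from datetime import datetime
from pathlib import Path
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the trimmed text inside the first <tag>...</tag>, or None."""
    match = re.search(rf"<{tag}>\s*([^<]*?)\s*</{tag}>", text)
    return match.group(1) if match else None


def read_tag(path: Path, tag: str) -> str:
    """Read one tag from a checkpoint file; blank until the file exists."""
    try:
        return extract_tag(path.read_text(errors="replace"), tag) or ""
    except FileNotFoundError:
        # Before the first checkpoint neither file has been written yet.
        return ""


def log_progress(slot: Path, interval_s: float = 15.0, samples: int = 40) -> None:
    """Print time, <prog> and <fraction_done> every interval_s seconds."""
    print(f"{'':12}<prog>        <fraction_done>")
    for _ in range(samples):
        prog = read_tag(slot / "state.sah", "prog")
        frac = read_tag(slot / "boinc_task_state.xml", "fraction_done")
        print(f"{datetime.now():%H:%M:%S}    {prog:<14}{frac}")
        time.sleep(interval_s)
```

For example, log_progress(Path(r"C:\ProgramData\BOINC\slots\0")), where the slot number is whatever folder the task is actually running in. Because both files change only at checkpoints, sampling faster than the checkpoint interval just repeats values.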
ID: 57323
Dirk Sadowski
Volunteer tester
Joined: 7 Jun 09
Posts: 285
Credit: 2,822,466
RAC: 0
Germany
Message 57325 - Posted: 14 Mar 2016, 21:49:18 UTC - in response to Message 57312.  
Last modified: 14 Mar 2016, 22:34:43 UTC


Are the cmdline settings/usage for r3401 different from those for r3330?

Yes, newly added. -sbs N will act differently, so new tuning is required for it.


The 'default' settings for/of r3401 'use' the GPU 'better'?

Yes, default settings improved.


If so, I should test the apps with 'default' settings?

Yes, it's preferable for initial testing. Only once a baseline is established would I recommend starting further optimization.


But with at least '-no_cpu_lock -hp'?

No, most probably -no_cpu_lock resulted in EXIT_TIME_LIMIT_EXCEEDED.
Either keep a CPU core free or don't use this option.


About 6 months ago, when I first set up the PC (IIRC with Catalyst 15.7.1), all 4 GPU apps (1 WU/GPU) were pinned to Core#0.
The calculation time was very high, because Core#0 was fully loaded.

I had to use -no_cpu_lock so that the 4 GPU apps (1 WU/GPU) weren't pinned to Core#0.

I have used this option up to now (Crimson 16.3 Hotfix (Beta)).

Do you think it will now work as it should with Crimson 15.12 (Core#0, #1, #2 and #3 each with 1 pinned GPU app)?

Will test: '-instances_per_device 1 -hp' (so that the cpu_lock will work properly?).
ID: 57325
Dirk Sadowski
Volunteer tester
Joined: 7 Jun 09
Posts: 285
Credit: 2,822,466
RAC: 0
Germany
Message 57326 - Posted: 14 Mar 2016, 22:34:28 UTC - in response to Message 57325.  
Last modified: 14 Mar 2016, 22:42:36 UTC

The 'default' settings (with automatic cpu_lock) now work as they should with Crimson 15.12...

Core#0, #1, #2 and #3 each with 1 fixed GPU app.

Is it still OK to use -instances_per_device N (automatic cpu_lock also works without it)?
I like to read it in the stderr_txt. ;-)

I use '-instances_per_device 1 -hp' and 1 CPU-Core reserved for 1 GPU app (app_config.xml).
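For reference, a sketch of how the setup described here (one CPU core budgeted per GPU task, plus command-line options) could be written as a BOINC app_config.xml. It is generated from Python only to keep the examples in one language. The <app_version> elements used (app_name, plan_class, avg_ncpus, ngpus, cmdline) are standard app_config.xml fields; the app name setiathome_v8 and the choice of plan class are assumptions, so check them against client_state.xml on the host before use.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str, cmdline: str,
                     cpus_per_task: float = 1.0, gpus_per_task: float = 1.0) -> str:
    """Return app_config.xml text for one app version (hypothetical helper)."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # avg_ncpus is the CPU budget the BOINC client schedules per task;
    # it reserves capacity but does not pin the task to a particular core.
    ET.SubElement(version, "avg_ncpus").text = str(cpus_per_task)
    ET.SubElement(version, "ngpus").text = str(gpus_per_task)
    # Options passed to the science app on its command line.
    ET.SubElement(version, "cmdline").text = cmdline
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_config("setiathome_v8", "opencl_ati5_SoG_nocal",
                           "-instances_per_device 1 -hp"))
```

Core pinning itself is left to the app's own cpu_lock logic; avg_ncpus only keeps BOINC from scheduling CPU tasks onto that capacity.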
ID: 57326
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57327 - Posted: 14 Mar 2016, 23:21:22 UTC - in response to Message 57323.  

updating at every sample interval, which confirms that the reduced checkpoint interval was indeed active.

Then show the same log but with the normal checkpoint interval of 1 min - how will prog differ?
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57327
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57328 - Posted: 14 Mar 2016, 23:23:11 UTC - in response to Message 57325.  


Will test: '-instances_per_device 1 -hp' (so that the cpu_lock will work properly?).


Not actually required (with 1 instance, that's the default).
If they get pinned to a single core, report it with a screenshot.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57328
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57329 - Posted: 14 Mar 2016, 23:24:07 UTC - in response to Message 57326.  


Is it still OK to use -instances_per_device N (automatic cpu_lock also works without it)?
I like to read it in the stderr_txt. ;-)

yes
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57329
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 57330 - Posted: 14 Mar 2016, 23:39:45 UTC - in response to Message 57327.  

updating at every sample interval, which confirms that the reduced checkpoint interval was indeed active.

Then show the same log but with the normal checkpoint interval of 1 min - how will prog differ?

Here's one from earlier in the development of my batch file - before I'd added name extraction, and before I'd reduced checkpointing to 15 seconds.

WU true angle range is :  9.964068

            <prog>        <fraction_done>
15:51:28
15:51:43
15:51:58
15:52:13
15:52:28    0.01403484
15:52:43    0.01403484    0.000023
15:52:58    0.01403484    0.000023
15:53:13    0.01403484    0.000023
15:53:28    0.01403484    0.000023
15:53:43    0.03066970    0.000023
15:53:58    0.03066970    0.000023
15:54:13    0.03066970    0.000023
15:54:28    0.03066970    0.000023
15:54:43    0.04720692    0.042726
15:54:58    0.04720692    0.042726
15:55:13    0.04720692    0.042726
15:55:28    0.04720692    0.042726
15:55:43    0.04720692    0.042726
15:55:58    0.06383516    0.042726
15:56:13    0.06383516    0.042726
15:56:28    0.06383516    0.042726
15:56:43    0.06383516    0.042726
15:56:58    0.08046837    0.042726
15:57:13    0.08046837    0.042726
15:57:28    0.08046837    0.042726
15:57:43    0.08046837    0.042726
15:57:58    0.09699277    0.146938
15:58:13    0.09699277    0.146938
15:58:28    0.09699277    0.146938
15:58:43    0.09699277    0.146938

Initial values are missing for the minute before the first checkpoint (files not yet created), and <prog> only updates every 60 seconds.
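The staircase can be put into numbers with a small Python sketch. It takes the (time, <prog>) rows copied from the log above, starting at the first checkpoint and leaving out the <fraction_done> column, and reports the seconds between successive changes of <prog>. Since a change can only be seen at the next 15-second sample, the gaps come out as multiples of 15 s clustered around the 60 s checkpoint interval rather than exactly 60 s. The function name is illustrative.

```python
from datetime import datetime
from typing import List

# (time, <prog>) rows copied from the 60-second-checkpoint log above.
LOG = """\
15:52:28 0.01403484
15:52:43 0.01403484
15:52:58 0.01403484
15:53:13 0.01403484
15:53:28 0.01403484
15:53:43 0.03066970
15:53:58 0.03066970
15:54:13 0.03066970
15:54:28 0.03066970
15:54:43 0.04720692
15:54:58 0.04720692
15:55:13 0.04720692
15:55:28 0.04720692
15:55:43 0.04720692
15:55:58 0.06383516
15:56:13 0.06383516
15:56:28 0.06383516
15:56:43 0.06383516
15:56:58 0.08046837
15:57:13 0.08046837
15:57:28 0.08046837
15:57:43 0.08046837
15:57:58 0.09699277
15:58:13 0.09699277
15:58:28 0.09699277
15:58:43 0.09699277
"""


def change_intervals(log: str) -> List[int]:
    """Seconds between successive changes of the value column in 'HH:MM:SS value' rows."""
    rows = [line.split() for line in log.splitlines() if line.strip()]
    change_times = [
        datetime.strptime(stamp, "%H:%M:%S")
        for (_, previous), (stamp, value) in zip(rows, rows[1:])
        if value != previous
    ]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(change_times, change_times[1:])]


print(change_intervals(LOG))  # [60, 75, 60, 60]
```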
ID: 57330
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57331 - Posted: 14 Mar 2016, 23:54:01 UTC - in response to Message 57330.  
Last modified: 14 Mar 2016, 23:55:55 UTC

There is no discrepancy like 17:30:08 0.11434709 0.622830

I would like to see the whole log, from 0 to ~100%.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57331
Dirk Sadowski
Volunteer tester
Joined: 7 Jun 09
Posts: 285
Credit: 2,822,466
RAC: 0
Germany
Message 57337 - Posted: 15 Mar 2016, 6:56:41 UTC
Last modified: 15 Mar 2016, 7:14:01 UTC

This was the 1st test:
http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=76831&offset=40&show_names=0&state=0&appid=
Lots of EXIT_TIME_LIMIT_EXCEEDED, marked as 'Error while computing' (also marked as 'Aborted'; does BOINC do this automatically?).
(After the 2nd test, I guess the errors happened because of AMD driver restarts, after which the GPUs can't work properly.)


2nd test:
http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=78440

Until now I had never seen AMD driver restarts.

Driver restart while running (I saw it because I was in front of the screen):
8.09 SETI@home v8 (opencl_ati5_SoG_nocal)
No progress -> it would finish with EXIT_TIME_LIMIT_EXCEEDED
(still listed in Task Manager after BOINC exited)

The SoG app uses 8% CPU, i.e. 1 core.

With this task: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=23324856


I saw that 3 GPUs were not running (the FuryX card has an indicator with a few LEDs):
No progress:
3x 8.09 SETI@home v8 (opencl_ati_nocal) -> they would finish with EXIT_TIME_LIMIT_EXCEEDED

Progress:
1x 8.09 SETI@home v8 (opencl_ati5_nocal)


Driver restart while running (test with -no_cpu_lock) (I saw it because I was in front of the screen):
3x 8.09 SETI@home v8 (opencl_atiapu_SoG)
1x 8.09 SETI@home v8 (opencl_ati5_nocal)

The SoG app uses 12% CPU, i.e. 1 1/2 cores.
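Windows Task Manager reports a process's CPU usage as a share of all logical processors, so turning a percentage into "cores" needs the logical-CPU count, which the post doesn't give. A tiny Python sketch of the conversion, assuming 12 logical CPUs purely because that is what makes 8% come out at about one core:

```python
def busy_logical_cpus(task_manager_percent: float, logical_cpus: int) -> float:
    """Convert a Task Manager CPU% (share of all logical CPUs) into busy logical CPUs."""
    return task_manager_percent / 100.0 * logical_cpus


# 12 logical CPUs is an assumption, not a figure from the post.
for percent in (8, 12):
    print(f"{percent}% -> {busy_logical_cpus(percent, 12):.2f} logical CPUs")
```

On that assumption the two readings correspond to roughly 1 and 1.4 busy logical CPUs; the exact figure depends on the host's real thread count.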


I have no idea why the r3401 apps don't run smoothly on my PC.
The r3330 apps ran very well.
ID: 57337
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57338 - Posted: 15 Mar 2016, 7:46:23 UTC - in response to Message 57337.  
Last modified: 15 Mar 2016, 7:50:29 UTC

Try enabling the -v 8 option (nothing else, defaults!) and copy stderr.txt from the slot folder before it gets deleted (from driver restart to task abortion you should have some hours to act). Then send stderr.txt to me.
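One way to avoid racing the cleanup is to snapshot stderr.txt from every slot folder at intervals. A minimal Python sketch, assuming the BOINC data directory is C:\ProgramData\BOINC (the usual Windows default; adjust if yours differs) and an arbitrary backup folder. The helper names are hypothetical, and the script only copies files; it does not touch the running tasks.

```python
import shutil
import time
from datetime import datetime
from pathlib import Path
from typing import List


def snapshot_stderr(slots_dir: Path, backup_dir: Path) -> List[Path]:
    """Copy each <slots_dir>/<n>/stderr.txt into backup_dir with a timestamped name."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    copied = []
    for src in sorted(slots_dir.glob("*/stderr.txt")):
        dest = backup_dir / f"slot{src.parent.name}-{stamp}-stderr.txt"
        try:
            shutil.copy2(src, dest)
            copied.append(dest)
        except OSError:
            # The slot may be cleaned up (or the file locked) mid-copy; skip it.
            continue
    return copied


def watch(slots_dir: Path, backup_dir: Path, every_s: float = 30.0) -> None:
    """Keep taking snapshots until interrupted with Ctrl+C."""
    while True:
        snapshot_stderr(slots_dir, backup_dir)
        time.sleep(every_s)
```

For example, watch(Path(r"C:\ProgramData\BOINC\slots"), Path(r"C:\stderr_backup")) left running in a console while the tasks crunch. Each pass writes new timestamped copies, so the last copy taken before a slot is cleared survives.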

EDIT: Number of compute units: 64 - it's a real "monster", so it constitutes a high-end edge case.

And while I process the log, you could try adding -pref_wg_num_per_cu 1 or 2 and see whether the driver restarts continue.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57338
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 57341 - Posted: 15 Mar 2016, 8:14:54 UTC - in response to Message 57331.  

There is no discrepancy like 17:30:08 0.11434709 0.622830

I would like to see the whole log, from 0 to ~100%.

That's because the <fraction_done>, as Jinbocous pointed out at the beginning, is compressed into a tiny segment at the end of the run - less than 30 seconds.

I'll try and set up a logging run with one second granularity (both checkpointing and logging), if that's the only way to convince you.
ID: 57341
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57348 - Posted: 15 Mar 2016, 15:19:24 UTC - in response to Message 57341.  

With the log ending at ~14%, I can't see whether both numbers eventually converge to 100% or one is left completely behind.
Also take into account that changing the checkpoint interval also changes the way the app interacts with the GPU (it does so at checkpoints). If ~1 s checkpoints give more "linear" progress, there is nothing to fix. If even frequent checkpoints show a big discrepancy, perhaps some of the progress-state update code was omitted.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57348
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 57349 - Posted: 15 Mar 2016, 15:28:02 UTC - in response to Message 57348.  

Well, I've been fetching work all day, but haven't been able to snag any VHARs like that lucky batch yesterday. This is the closest I've come:

wu_name: 24no10ab.26598.1703.8.42.239
WU true angle range is :  1.050587

            <prog>        <fraction_done>
14:23:59
14:24:14    0.00157712    0.001948
14:24:29    0.00729066    0.001948
14:24:44    0.00729066    0.007534
14:24:59    0.01293547    0.013279
14:25:14    0.01861221    0.018956
14:25:29    0.02423934    0.024707
14:25:59    0.03566080    0.035957
14:26:14    0.04136161    0.041698
14:26:29    0.04703214    0.047397
14:26:44    0.05266307    0.053082
14:26:59    0.05831132    0.058615
14:27:14    0.06399740    0.064447
14:27:29    0.06964457    0.064447
14:27:43    0.06964457    0.070206
14:27:57    0.07527402    0.076051
14:28:12    0.08099277    0.081977
14:28:27    0.08668146    0.087977
14:28:42    0.09232784    0.094266
14:28:57    0.09790277    0.100736
14:29:12    0.10368747    0.107579
14:29:27    0.10916947    0.114494
14:29:42    0.11481814    0.122307
14:29:57    0.12045761    0.130827
14:30:13    0.12619096    0.140275
14:30:28    0.12619096    0.140275
14:30:43    0.13187701    0.150882
14:30:58    0.13751588    0.162950
14:31:13    0.14306452    0.176779
14:31:28    0.14887551    0.192903
14:31:43    0.15453547    0.211255
14:31:58    0.16020197    0.233611
14:32:13    0.18524558    0.287315
14:32:28    0.21310161    0.347672
14:32:43    0.24495262    0.428355
14:32:58    0.27631588    0.518778
14:33:13    0.30818283    0.518778
14:33:28    0.30818283    0.628669
14:33:43    0.33650467    0.743310
14:33:58    0.38950199    1.000000

So, <fraction_done> does reach 100% eventually (as I see it reporting live in BOINC Manager), but it rushes up in a hockey-stick curve at the end.

Meanwhile, prog(ress) never made it past 40%...

(I didn't bother with 1-second sampling - I think this shows the effect clearly enough. I'll perhaps try again tomorrow.)
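To put figures on the hockey stick, a short Python sketch over a handful of rows copied from the AR 1.050587 log above: it computes how far <fraction_done> (the value BOINC displays) runs ahead of <prog> (from state.sah). It only restates the logged numbers and says nothing about the cause of the divergence.

```python
# (time, <prog>, <fraction_done>) rows copied from the log above.
ROWS = [
    ("14:26:59", 0.05831132, 0.058615),
    ("14:29:12", 0.10368747, 0.107579),
    ("14:31:58", 0.16020197, 0.233611),
    ("14:32:58", 0.27631588, 0.518778),
    ("14:33:58", 0.38950199, 1.000000),
]


def divergence(rows):
    """Yield (time, fraction_done - prog, fraction_done / prog) for each row."""
    for stamp, prog, fraction_done in rows:
        yield stamp, fraction_done - prog, fraction_done / prog


for stamp, gap, ratio in divergence(ROWS):
    print(f"{stamp}  gap={gap:.3f}  ratio={ratio:.2f}")
```

Across these rows the ratio climbs from about 1.0 to about 2.6, almost all of it in the last two minutes of the run.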
ID: 57349
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 57350 - Posted: 15 Mar 2016, 15:33:33 UTC - in response to Message 57348.  

If ~1 s checkpoints give more "linear" progress, there is nothing to fix.

Nobody in their right minds would use 1-second checkpoint intervals for production running. If anything, people extend checkpoint intervals to save wear and tear on their storage devices.
ID: 57350
Urs Echternacht
Volunteer tester
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 57357 - Posted: 15 Mar 2016, 21:50:35 UTC

Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/beta/html/inc/result.inc on line 80
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/beta/html/inc/result.inc on line 81
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/beta/html/inc/result.inc on line 82
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/beta/html/inc/result.inc on line 83
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/beta/html/inc/result.inc on line 83
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/beta/html/inc/result.inc on line 84
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/beta/html/inc/result.inc on line 89
hostid 51991: if I look at the error list of that host, I get the server notices above. Anything to fix there on the server side?
_\|/_
U r s
ID: 57357
Urs Echternacht
Volunteer tester
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 57358 - Posted: 15 Mar 2016, 22:04:57 UTC - in response to Message 57320.  
Last modified: 15 Mar 2016, 22:13:08 UTC

I am getting quite a few inconclusives. I'm seeing that on my Mac Pro, both on Main and on beta, but not on an Nvidia 570 in Windows. In each case I'm missing a Gaussian compared to my wingman.

Main:

http://setiathome.berkeley.edu/results.php?hostid=6105482&offset=0&show_names=0&state=3&appid=

Beta:

https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=60016&offset=0&show_names=0&state=3&appid=

Let me know if you need something else.

Thanks,

Chris


Worth checking whether it's a SoG-specific or a Mac-specific issue.
I could process those beta tasks later with own GPU, maybe Urs could check them on Mac. Urs?
I'll pick one or two WUs from that host with the characteristics "Gaussian(s) missing compared to co-host" and try to rerun them standalone, comparing against the CPU app(s).
_\|/_
U r s
ID: 57358
Chris Adamek
Volunteer tester
Joined: 27 Aug 12
Posts: 56
Credit: 127,133
RAC: 0
United States
Message 57363 - Posted: 16 Mar 2016, 1:06:13 UTC - in response to Message 57358.  

Let me know if you need any additional information.

Thanks,

Chris
ID: 57363
Dirk Sadowski
Volunteer tester
Joined: 7 Jun 09
Posts: 285
Credit: 2,822,466
RAC: 0
Germany
Message 57364 - Posted: 16 Mar 2016, 3:11:39 UTC - in response to Message 57338.  
Last modified: 16 Mar 2016, 3:14:19 UTC

Raistmer wrote:
Try enabling the -v 8 option (nothing else, defaults!) and copy stderr.txt from the slot folder before it gets deleted (from driver restart to task abortion you should have some hours to act). Then send stderr.txt to me.

EDIT: Number of compute units: 64 - it's a real "monster", so it constitutes a high-end edge case.

And while I process the log, you could try adding -pref_wg_num_per_cu 1 or 2 and see whether the driver restarts continue.


You mean copy/paste the contents of the stderr.txt file and send them to you in a private message? With BBCode [ pre ]?
Or the file via E-Mail?

...not a few hours - fast VGA cards. ;-)

Should I test it here or at Main?
Because there it happens after a few minutes.
(so maybe just 2 or 3 tasks error out - or with good luck, no task)


Could a SETI GPU app 'destroy' the AMD driver?

I tested the r3401 app (with default settings) at SETI-Main.
I let two tasks run successively, to create the .WISDOM file.
Then I let the PC run fully loaded, and after ~2 minutes the driver restarts started, every ~10 seconds.
After a few restarts the Windows (8.1 Pro x64) desktop disappeared and a blue screen came up with something like: the AMD*****.*** file has been destroyed or has disappeared. Then the PC rebooted itself.

This was with Crimson 16.3 Hotfix (Beta), and it is the reason I reinstalled Crimson 15.12 (in case the 16.3 driver is no longer usable?) (of course, using DDU in between ;-).
ID: 57364

©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.