benchmark stock vs. optimized -- problem

Gene Project Donor
Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1735123 - Posted: 18 Oct 2015, 5:12:51 UTC

@juha

re: your point #1. I set up the benchmark script to supply a command line option of "-device 0" and the result was the same, i.e. SEGV at the same place.

re: your point #4. I think this works o.k. in the "boinc managed" context, maybe in conjunction with the app_info.xml file reference tags. But I don't see anything in the benchmark script that would do the same thing. To work around that, I've put both libcufft.so.3 and libcudart.so.3 in the /lib/x86_64-linux-gnu/ directory, where ldd seems happy and where the seti application also finds them. If they are not present there or in any of a number of other places, like /usr/lib, the application terminates with "cannot open shared object file" long before it reaches the potential SEGV point.

The remaining boinc actions to set up the science application do not appear to be critical. And, yes, /dev/null has write permission.

@petri33

See second paragraph above.
echo $LIBPATH returns <null>
By way of comparison, to run the benchmark on the x41zc (optimized) cuda application I've put its cuda libraries (libcudart.so.6 and libcufft.so.6) in the /usr/lib/ directory and it runs fine. As noted above, putting those libs in the BENCH working directory, alongside the executable application, will not work as that directory is never searched. My sole reason for choosing the /usr/lib/ directory is that, according to the strace logs, that is the first place searched - might as well return success as soon as possible.

As Jason observed early in this thread, the fact that x41zc does not exhibit any benchmark anomaly suggests that whatever subtle "bug" (and I hesitate to use that word since x41g has been running fine for many years) there might have been is no longer a concern. One of Murphy's rules, however, is that "Problems that go away for no reason have a way of coming back - for no reason!"
ID: 1735123
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1735245 - Posted: 18 Oct 2015, 19:07:28 UTC - in response to Message 1735123.  
Last modified: 18 Oct 2015, 19:10:29 UTC

So far, what seems to mostly confirm the observations is that I had injected x41zc with the executable's origin in the LD library path. As far as I can tell (having still not gotten to the Linux machine for a bit), x41g would have been relying on whatever path had been exported by the client to the execution environment. It makes sense, if a little flaky. What should enable the application to work wherever would be to insert an LD_LIBRARY_PATH export command at the start of the bench, pointing to the executable's folder.

[Edit:] or a command to echo the LD_LIBRARY_PATH environment variable, then put the requisite libraries on one of those path(s)
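A minimal sketch of that second suggestion, assuming a POSIX shell. The function name list_ld_paths is made up for illustration and is not part of the bench script. It prints each LD_LIBRARY_PATH entry on its own line, so it is easy to see which directories the libraries could be dropped into:

```shell
# Hypothetical helper, not part of the benchmark script.
# Prints each entry of LD_LIBRARY_PATH on its own line.
list_ld_paths() {
    if [ -z "${LD_LIBRARY_PATH:-}" ]; then
        # Nothing exported: only the system default search paths apply.
        echo "LD_LIBRARY_PATH is empty or unset" >&2
        return 1
    fi
    printf '%s\n' "$LD_LIBRARY_PATH" | tr ':' '\n'
}
```

If it reports that the variable is empty, only the ldconfig-configured and default system paths are in play.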
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1735245
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1735255 - Posted: 18 Oct 2015, 19:52:02 UTC - in response to Message 1735123.  

is no longer a concern.


I agree. I just can't resist a good mystery.

But it shall remain a mystery unless someone comes up with a good idea because I've got nothing any more :(
ID: 1735255
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1735299 - Posted: 19 Oct 2015, 0:01:28 UTC - in response to Message 1735245.  
Last modified: 19 Oct 2015, 0:26:09 UTC

... whatever path had been exported by the client to the execution environment ...

I don't think the client exports anything. David was working on the assumption that (Linux) libraries in the same folder that the executable was launched from are always searched automatically. Under client control, the executables are always launched as if they reside in the slot directory where the intermediate working files reside: I don't know if the fact that what is (typically) launched is a softlink pointing back to the main project folder makes any difference - especially since the CUrt and CUfft are typically themselves softlinks in the slot directory, pointing back to the same place.

Edit: thinking about it on the way to bed, I may not have phrased that quite right. I think David's point was that all the working data files (input and output) would be in the working folder - which under client control would be set to be the allocated slot directory. And that the OS's automatic library search would scan that same working directory.
ID: 1735299
Gene Project Donor
Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1735348 - Posted: 19 Oct 2015, 6:11:47 UTC
Last modified: 19 Oct 2015, 6:15:19 UTC

Regarding Library search path...

As "user" echo $LD_LIBRARY_PATH returns <null>
However, "cat /proc/<pid>/environ" lists the full user environment PLUS
LD_LIBRARY_PATH=../../projects/setiathome.berkeley.edu:.:../..

which suggests to me that the boinc client exported it in the process of setting up the "slots" directory and launching the application.
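For anyone who wants to repeat that check, here is a rough sketch, assuming Linux with /proc mounted and a process owned by the same user. The function name show_proc_ld_path is hypothetical. The environ file is NUL-separated, so it is converted to one variable per line before filtering:

```shell
# Hypothetical helper: print the LD_LIBRARY_PATH a running process
# was started with. $1 is the process id.
show_proc_ld_path() {
    # /proc/<pid>/environ separates entries with NUL bytes, not newlines.
    tr '\0' '\n' < "/proc/$1/environ" | grep '^LD_LIBRARY_PATH='
}
```

Call it with the pid of the running science application; it prints nothing and returns non-zero if the variable was not set for that process.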

Everything in the /slots/ is a <softlink>, including links to libcudart and libcufft, except for real files init_data.xml, result.sah, state.sah, and stderr.txt .

The usual OS library search path is configured by the ldconfig files to add additional paths to the default /lib and /usr/lib paths.

So, give me a couple of days and I'll modify the benchmark script to export an LD_LIBRARY_PATH similar to that shown above but with appropriate changes to align with the actual benchmark working directory. I'll also put the libcu* libs there so that (presumably) the linker finds them there instead of in the other system-configured search paths. I'm not very optimistic but it's a simple thing to try.

Is there a Linux "doctor" in the house??? I would certainly be curious to know what happens when an x41g benchmark is run with a different kernel version and a different boinc version. See <http://lunatics.kwsn.info> for details.

/EDIT/: The /proc/<pid> alluded to is for an x41g application running in the normal boinc managed context, NOT as a benchmark.
ID: 1735348
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1735478 - Posted: 19 Oct 2015, 18:41:08 UTC - in response to Message 1735348.  
Last modified: 19 Oct 2015, 18:43:11 UTC

Yeah, now the pieces of the puzzle are starting to fall into place. Looks like the client starts new environments for the child processes. I do recall there were some export / ldconfig complexities arising back when I was building x41zc, while trying to get a working Cuda toolkit and samples going, through a change in the way the environments were loaded. That was under ubuntu at the time, and involved exporting from some buried .conf instead of the familiar means, and did involve running ldconfig as well. In the end that was why I added the origin in the exe, as complex installation for standalone testing/operation wasn't an option.

In any case, yeah will be interesting to nut out the full details in time, because while adding the origin into the paths within the exe worked for our purposes, it'd be non-ideal for other generic tools/apps. I more or less just stopped at what would work as expected for x41zc, rather than completely explore the proper system mechanism.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1735478
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1735497 - Posted: 19 Oct 2015, 20:46:16 UTC - in response to Message 1735348.  

I'll modify the benchmark script


Drop the libs on the root of the benchmark directory and then either run

LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./benchmark


or add the following line to the benchmark script

export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
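One small wrinkle with that line: when LD_LIBRARY_PATH starts out empty, the result is ".:" with a trailing empty entry. The dynamic loader treats an empty entry as the current directory, so it is harmless here, but a variant that avoids it could look like the sketch below. The function name prepend_ld_path is made up for illustration:

```shell
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH without
# leaving an empty entry when the variable starts out empty or unset.
prepend_ld_path() {
    # ${VAR:+text} expands to "text" only if VAR is set and non-empty,
    # so the colon separator is added only when there is something to join.
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}
```

In the benchmark script that would be a call like prepend_ld_path . before the application is launched.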



I would certainly be curious to know what happens when an x41g benchmark is run with a different kernel version and a different boinc version.


Doesn't blow up. Of course I don't have an NVIDIA GPU or drivers so yeah very useful test...
ID: 1735497
Gene Project Donor
Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1736261 - Posted: 22 Oct 2015, 18:41:30 UTC

BREAKTHROUGH
The good news is that the fix described below allows the benchmark process to run the ..x41g.. application. The "bad" news is that I have no idea why!

With the caveat that the following "solution" is very specific for my benchmark directory context and, thus, not meant as a patch to the benchmark script for general use:

In the benchmark (v2.01.08) script at about line 141, insert the export LD_LIBRARY_PATH line shown below, pointing to the working directory for the benchmark. I did this JUST for the test APP; no corresponding change was made for the REF_APP because the ref app runs fine anyway:
...
declare -i myStarttime=$(date -u +%s)
export LD_LIBRARY_PATH='/tmp/BENCH/KWSN-Bench-Linux-MBv7_v2.01.08'
/usr/bin/time --format="%C %e sec %U sec %S sec" \
-o testData/$hostName.testlog.$timeNowshort.txt -a ./$BENCH_APP
...

I intentionally did NOT append to LD_LIBRARY_PATH since an echo of that environment variable returns <null>.

The relevant libcudart.so.3 and libcufft.so.3 files are in that working directory, but also exist in /lib/x86_64-linux-gnu, with read and execute permissions in both places (and md5sums match for both places).
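As a sketch of that "same file in both places" check, here is a byte-for-byte comparison using cmp instead of md5sum. The function name same_lib is hypothetical; the directories are passed as arguments:

```shell
# Hypothetical helper: report whether the copies of a library in two
# directories are byte-identical.
# $1 = library file name, $2 and $3 = directories to compare
same_lib() {
    if cmp -s "$2/$1" "$3/$1"; then
        echo "$1: identical"
    else
        echo "$1: differs or missing"
        return 1
    fi
}
```

For the case above that would be something like same_lib libcudart.so.3 /lib/x86_64-linux-gnu /tmp/BENCH/KWSN-Bench-Linux-MBv7_v2.01.08, and likewise for libcufft.so.3.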

Without the "patch", cuda libs are read from the /lib/x86_64.. path (according to strace log) but the application fails as discussed in this thread.

With the "patch", cuda libs are read from ./ (according to strace log) and now the application runs to completion normally.
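For anyone comparing strace logs the same way, a rough filter is sketched below. It assumes the log was written with something like strace -f -o LOGFILE and that lines have the usual open(...) or openat(...) shape; the function name opened_cuda_libs is made up for illustration:

```shell
# Hypothetical helper: list the CUDA library paths that were opened
# successfully according to an strace log (log file name passed as $1).
# Assumes the usual strace line shape, e.g.
#   open("/path/libcudart.so.3", O_RDONLY|O_CLOEXEC) = 3
# Failed attempts end in "= -1 ENOENT ..." and are skipped.
opened_cuda_libs() {
    grep -E 'open(at)?\(.*"[^"]*libcu(dart|fft)\.so[^"]*".*\) = [0-9]+' "$1" |
        sed -E 's/.*"([^"]*libcu(dart|fft)\.so[^"]*)".*/\1/' |
        sort -u
}
```

Running it on the "without patch" and "with patch" logs should show the two different directories the libs were actually loaded from.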

I leave it to the Linux gurus to make sense of this.
ID: 1736261
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1736267 - Posted: 22 Oct 2015, 20:33:01 UTC - in response to Message 1736261.  

Try this command:

user@host> env | grep LD_LIBRARY_PATH
_\|/_
U r s
ID: 1736267
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1736272 - Posted: 22 Oct 2015, 20:46:03 UTC - in response to Message 1736261.  

Great! Yeah, it seems as though some ldconfig-related changes, made to the system long after Cuda3.2 was current, limit permissions in some way. Perhaps I'll need to understand that in more detail for installer and utility development down the road, where installing libraries side-by-side is non-ideal. A good package maintainer will probably have some good tips and best-practice recommendations there. Good to know about a potential future issue :D
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1736272


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.