[hpct] Re: disassembly on bgp
Vitali A. Morozov
morozov at anl.gov
Tue Oct 28 14:36:05 CDT 2008
Hi Guojing,
Yes, we had a very productive discussion with I-Hsin about ways to
make better use of Xprofiler, and during that discussion we noticed a
very strange handling of double FPU instructions.
Let me first describe the example.
The example measures the performance of two implementations of the BLAS1
function daxpy: one written naively in C, and a second written in ASM
using the double FPU. All time measurements are taken by reading the
time SPRs through inline asm calls.
The main function is written in C. First, it allocates the arrays and
measures the function call overhead (by repeatedly calling an "empty"
function written in ASM), and then it starts the main loop. The main loop
iterates over the array sizes, 16 to 512 in steps of 16 in this example.
For each size N2, it calls the C function K times, then the ASM function
K times, and then prints the average time to stderr.
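The measurement loop described above can be sketched roughly as follows. This is only an illustration, not the code from the attached tarball: the names kernel_fn, now_ticks, average_ticks, sweep, daxpy_ref and empty_kernel are made up here, clock() stands in for the inline-asm read of the time SPRs, and both kernels are plain C so that the sketch builds on any machine.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Common signature for the routines being timed (hypothetical). */
typedef void (*kernel_fn)(size_t n, double a, const double *x, double *y);

/* Plain C daxpy, y = a*x + y; stand-in for both the C and the ASM routine. */
void daxpy_ref(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Does nothing; used to estimate the cost of the call itself. */
void empty_kernel(size_t n, double a, const double *x, double *y)
{
    (void)n; (void)a; (void)x; (void)y;
}

/* Stand-in timer: the original reads the time SPRs via inline asm. */
static double now_ticks(void)
{
    return (double)clock();
}

/* Average ticks per call over k calls, minus a previously measured overhead. */
double average_ticks(kernel_fn f, size_t n, double a,
                     const double *x, double *y, int k, double overhead)
{
    if (k <= 0)
        return 0.0;
    double t0 = now_ticks();
    for (int i = 0; i < k; i++)
        f(n, a, x, y);
    double t1 = now_ticks();
    return (t1 - t0) / k - overhead;
}

/* Sweep sizes 16, 32, ..., 512 and print one line per size to stderr.
 * Returns the number of sizes measured, or -1 if allocation fails. */
int sweep(kernel_fn c_impl, kernel_fn asm_impl, kernel_fn empty, int k)
{
    double *x = calloc(512, sizeof *x);
    double *y = calloc(512, sizeof *y);
    if (!x || !y) {
        free(x);
        free(y);
        return -1;
    }
    /* Call overhead is measured once, before the main loop. */
    double overhead = average_ticks(empty, 0, 0.0, x, y, k, 0.0);
    int count = 0;
    for (size_t n = 16; n <= 512; n += 16) {
        double tc = average_ticks(c_impl, n, 2.0, x, y, k, overhead);
        double ta = average_ticks(asm_impl, n, 2.0, x, y, k, overhead);
        fprintf(stderr, "%zu %g %g\n", n, tc, ta);
        count++;
    }
    free(x);
    free(y);
    return count;
}
```

Calling sweep(daxpy_ref, daxpy_ref, empty_kernel, K) walks the same 32 sizes as the example; in the real test the second argument would be the ASM routine.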
Notes:
1) It is very important to compile the C implementation with -O5 to ensure
that double FPU instructions get generated.
2) The ASM routine is compiled with the C compiler, using the -g -pg options.
My observations:
1) gprof and Xprofiler give slightly different flat profiles. In
Xprofiler, daxpy_ASM_small_N is not recognized, but another function
called __mcount_internal appears.
2) Both gprof and Xprofiler identify a function called EX1, which takes
28% of the time. However,
2.1) When I try to open the source code for it, Xprofiler says "Cannot open
file "daxpy_C_small_N.c" for reading". That file does exist; it
contains the daxpy_C_small_N function and does not contain an EX1 function.
2.2) EX1 was wrongly derived from the daxpy_ASM_small_N
function, given in the daxpy_ASM_small_N.s file. However, EX1 is not a
global label, so it should not be interpreted as an entry point.
2.3) I cannot disassemble the EX1 function: Xprofiler gives an empty window.
3) Xprofiler correctly finds the source code for the daxpy_C_small_N
function. The problem starts when I try to disassemble this
function: looking at offsets 1001D14 and below, the
double FPU opcodes are interpreted incorrectly. For example, at 1001D14 the
opcode 7C87439C should be disassembled as "lfpdx f4,r7,r8", which can be
obtained from
/bgsys/drivers/ppcfloor/gnu-linux/powerpc-bgp-linux/bin/objdump -D
daxpy_C_small_N.o
So, Xprofiler uses the front-end version of objdump, but should use the
back-end version.
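As a quick cross-check of the expected disassembly, the register operands can be read directly out of the instruction word. The sketch below assumes the instruction follows the usual PowerPC X-form field layout (primary opcode, RT, RA, RB, extended opcode), which is consistent with the operands printed by the back-end objdump; the names xform_fields and decode_xform are only illustrative.

```c
#include <stdint.h>

/* Fields of an X-form instruction word (IBM bit numbering, bit 0 = MSB). */
typedef struct {
    unsigned opcd; /* primary opcode,  bits 0-5   */
    unsigned rt;   /* target register, bits 6-10  */
    unsigned ra;   /* base register,   bits 11-15 */
    unsigned rb;   /* index register,  bits 16-20 */
    unsigned xo;   /* extended opcode, bits 21-30 */
} xform_fields;

/* Split a 32-bit instruction word into its X-form fields. */
xform_fields decode_xform(uint32_t word)
{
    xform_fields f;
    f.opcd = (word >> 26) & 0x3Fu;
    f.rt   = (word >> 21) & 0x1Fu;
    f.ra   = (word >> 16) & 0x1Fu;
    f.rb   = (word >> 11) & 0x1Fu;
    f.xo   = (word >> 1)  & 0x3FFu;
    return f;
}
```

For the word 0x7C87439C this gives primary opcode 31, RT = 4, RA = 7 and RB = 8, which matches the operands f4, r7, r8 in the expected output.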
The sources and example run are attached.
Thank you for your help, and sorry for such a long message.
Best regards,
Vitali
Guojing Cong wrote:
>
> Hi Vitali,
>
> I heard from I-hsin that you are having problem with showing
> double hummer assembly. Can you give me the binary and the code
> region you are interested, so that I can take a look here?
>
> Regards
> Guojing
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 007.3-Xprofiler.tar
Type: application/x-tar
Size: 2795520 bytes
Desc: not available
URL: <http://lists.alcf.anl.gov/pipermail/hpct/attachments/20081028/48e32172/attachment.tar>