[hpct] Re: disassembly on BG/P

Vitali A. Morozov morozov at anl.gov
Tue Oct 28 14:36:05 CDT 2008


Hi Guojing,

Yes, we had a very productive discussion with I-Hsin about ways to 
make better use of Xprofiler, and during that discussion we saw Xprofiler 
handle double FPU instructions in a very strange way.

Let me first describe the example.

The programs measure the performance of two implementations of the BLAS1 
function daxpy: one written naively in C, and the second written in ASM 
using the double FPU. All timings are taken from the time SPRs, which 
are read with inline asm calls.
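For readers who have not done this before, here is a minimal sketch of reading the time SPRs with inline asm. It assumes GCC-style inline asm and the standard 32-bit PowerPC time base, which is split across two SPRs (TBU/TBL) and read with mftbu/mftb. The helper names read_timebase and elapsed_ticks are hypothetical and are not taken from the attached sources. On non-PowerPC hosts the sketch falls back to clock_gettime, which returns nanoseconds rather than time-base ticks.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read a 64-bit time stamp.
 * On 32-bit PowerPC the time base lives in two 32-bit SPRs (TBU, TBL);
 * TBU is read twice so that a carry out of TBL between reads is caught. */
static inline uint64_t read_timebase(void)
{
#if defined(__powerpc__) && !defined(__powerpc64__)
    uint32_t hi, lo, hi2;
    do {
        __asm__ volatile ("mftbu %0" : "=r"(hi));
        __asm__ volatile ("mftb %0"  : "=r"(lo));
        __asm__ volatile ("mftbu %0" : "=r"(hi2));
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
#else
    /* Fallback for non-PowerPC hosts: nanoseconds, not time-base ticks. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Difference between two readings; both sources are monotonic. */
static inline uint64_t elapsed_ticks(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start;
}
```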

The main function is written in C. First, it allocates the arrays and 
measures the function call overhead (by repeatedly calling an "empty" 
function written in ASM); then it starts the main loop. The main loop 
iterates over the array size, from 16 to 512 in steps of 16 in this 
example. For each size N2, it calls the C function K times, then the ASM 
function K times, and then prints the average times to stderr.
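The measurement loop described above might look roughly like the sketch below. It is a hypothetical reconstruction, not the attached code. clock_gettime stands in for the time SPRs, a C function with an empty body stands in for the "empty" ASM routine, K is a parameter, and the ASM daxpy is left out so that the sketch runs on any host. The names daxpy_c, empty_call, time_kernel, and run_benchmark are illustrative.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef void (*kernel_fn)(int, double, const double *, double *);

/* Naive C daxpy: y = a*x + y. */
static void daxpy_c(int n, double a, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Stand-in for the "empty" ASM routine used to measure call overhead.
 * noinline plus the volatile asm keep the compiler from removing calls. */
static __attribute__((noinline))
void empty_call(int n, double a, const double *x, double *y)
{
    (void)n; (void)a; (void)x; (void)y;
    __asm__ volatile ("");
}

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Average wall time per call of fn over K calls on arrays of length n. */
static double time_kernel(kernel_fn fn, int K, int n,
                          const double *x, double *y)
{
    double t0 = now_sec();
    for (int k = 0; k < K; k++)
        fn(n, 1.0001, x, y);
    return (now_sec() - t0) / K;
}

/* Sweep N2 = 16..512 step 16 and print one line per size to out.
 * Returns the number of sizes measured, or -1 if allocation fails. */
int run_benchmark(FILE *out, int K)
{
    assert(K > 0);
    enum { NMAX = 512 };
    double *x = malloc(NMAX * sizeof *x);
    double *y = malloc(NMAX * sizeof *y);
    if (!x || !y) { free(x); free(y); return -1; }
    for (int i = 0; i < NMAX; i++) { x[i] = 1.0; y[i] = 0.0; }

    /* Call overhead, measured once and subtracted from every size. */
    double overhead = time_kernel(empty_call, K, NMAX, x, y);
    int sizes = 0;
    for (int n2 = 16; n2 <= NMAX; n2 += 16) {
        double t_c = time_kernel(daxpy_c, K, n2, x, y) - overhead;
        /* The real program times the ASM kernel here as well. */
        fprintf(out, "%4d %.3e s/call\n", n2, t_c);
        sizes++;
    }
    free(x);
    free(y);
    return sizes;
}
```

Calling run_benchmark(stderr, 1000) prints one line per array size, 32 lines in all. Subtracting a single overhead figure from every size is a simplification. It ignores any size dependence of the call cost itself.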

Notes:

1) It is very important to compile the C implementation with -O5 to 
ensure that double FPU instructions are generated.
2) The ASM routine is compiled with the C compiler, using the -g -pg options.

My observations:

1) gprof and Xprofiler give slightly different flat profiles. In 
Xprofiler, daxpy_ASM_small_N is not recognized, but another function, 
__mcount_internal, appears.

2) Both gprof and Xprofiler identify a function called EX1, which takes 
28% of the time. However,

2.1) When I try to open the source code for it, Xprofiler says "Cannot 
open file 'daxpy_C_small_N.c' for reading". In fact, that file exists, 
contains the daxpy_C_small_N function, and does not contain an EX1 function.
2.2) EX1 is actually a label inside the daxpy_ASM_small_N function in 
the daxpy_ASM_small_N.s file, and it was wrongly interpreted as a separate 
function. However, EX1 is not a global label, so it should not be 
interpreted as an entry point.
2.3) I cannot disassemble EX1: Xprofiler shows an empty window.

3) Xprofiler correctly finds the source code for the daxpy_C_small_N 
function. The problem starts when I try to disassemble this function: 
for example, at offsets 1001D14 and below, the double FPU opcodes are 
decoded incorrectly. The opcode 7C87439C at 1001D14 should be 
disassembled as "lfpdx f4,r7,r8", which is what the following command 
produces:

/bgsys/drivers/ppcfloor/gnu-linux/powerpc-bgp-linux/bin/objdump -D 
daxpy_C_small_N.o

So Xprofiler uses the front-end version of objdump, but it should use 
the back-end (powerpc-bgp-linux) version.
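As a cross-check on the objdump output, the instruction word can be split into its X-form fields by hand. The sketch below assumes the standard PowerPC X-form layout: primary opcode, RT, RA, RB, a 10-bit extended opcode, and the Rc bit. The mapping of primary opcode 31 with extended opcode 462 to lfpdx is inferred from the 7C87439C / "lfpdx f4,r7,r8" pair quoted above. The names decode_xform and format_lfpdx are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of a PowerPC X-form instruction (ISA bit 0 = most significant). */
struct xform {
    unsigned opcd;  /* bits 0-5:   primary opcode */
    unsigned rt;    /* bits 6-10:  target register (FRT for FP loads) */
    unsigned ra;    /* bits 11-15: RA */
    unsigned rb;    /* bits 16-20: RB */
    unsigned xo;    /* bits 21-30: extended opcode */
    unsigned rc;    /* bit 31 */
};

static struct xform decode_xform(uint32_t w)
{
    struct xform f;
    f.opcd = (w >> 26) & 0x3F;
    f.rt   = (w >> 21) & 0x1F;
    f.ra   = (w >> 16) & 0x1F;
    f.rb   = (w >> 11) & 0x1F;
    f.xo   = (w >> 1)  & 0x3FF;
    f.rc   =  w        & 0x1;
    return f;
}

/* If w is primary opcode 31 with XO 462 (assumed to be lfpdx), write its
 * mnemonic into buf and return the snprintf length; otherwise return -1. */
static int format_lfpdx(uint32_t w, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    struct xform f = decode_xform(w);
    if (f.opcd != 31 || f.xo != 462)
        return -1;
    return snprintf(buf, len, "lfpdx f%u,r%u,r%u", f.rt, f.ra, f.rb);
}
```

For 7C87439C this yields opcode 31, RT 4, RA 7, RB 8, XO 462, Rc 0, and the string "lfpdx f4,r7,r8".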

The sources and example run are attached.
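Regarding observation 2.2: a profiler could avoid reporting EX1 as a function by attributing each sampled address only to the nearest preceding global symbol. Local labels inside daxpy_ASM_small_N would then be skipped. The sketch below assumes the symbol table has already been read into an array sorted by address. The struct layout, the function name owning_function, and the addresses used in the test are all hypothetical. This is not a description of how Xprofiler actually works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol-table entry, as a profiler might read it from ELF. */
struct symbol {
    const char *name;
    uint32_t    addr;
    int         global;   /* nonzero for global symbols */
};

/* Return the name of the global symbol with the greatest address <= pc,
 * skipping local labels such as EX1. syms must be sorted by addr.
 * Returns NULL if pc lies before the first global symbol. */
const char *owning_function(const struct symbol *syms, size_t n, uint32_t pc)
{
    const char *best = NULL;
    for (size_t i = 0; i < n && syms[i].addr <= pc; i++) {
        if (i > 0)
            assert(syms[i - 1].addr <= syms[i].addr);
        if (syms[i].global)
            best = syms[i].name;
    }
    return best;
}
```

Filtering on binding alone would also hide static C functions, which are local symbols too. Checking whether a symbol is typed as a function may be a better criterion, since an asm label with no type directive, such as EX1, typically is not.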

Thank you for your help, and sorry for such a long message.

Best regards,

Vitali

Guojing Cong wrote:
>
> Hi Vitali,
>
>         I heard from I-hsin that you are having a problem with showing 
> double hummer assembly.   Can you give me the binary and the code 
> region you are interested in, so that I can take a look here?
>
> Regards
> Guojing

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 007.3-Xprofiler.tar
Type: application/x-tar
Size: 2795520 bytes
Desc: not available
URL: <http://lists.alcf.anl.gov/pipermail/hpct/attachments/20081028/48e32172/attachment.tar>

