<font size=2 face="sans-serif">Hal,</font>
<br>
<br><font size=2 face="sans-serif">Tom and I have discovered what appears to be a code-generation problem when clang compiles calls to the Kernel_RanksToCoords() CNK syscall. See Tom's analysis below.</font>
<br>
<br><font size=2 face="sans-serif">In the syscall trace, GPR3 receives the first parameter and carries the return value, GPR4 receives the second parameter, and GPR5 receives the third parameter.</font>
<br>
<br><font size=2 face="sans-serif">---</font>
<br>
<br><font size=2 face="sans-serif">BTW, I'm using the latest "bgclang"
wrapper script, but I've renamed it to </font><tt><font size=2>powerpc64-bgq-linux-clang</font></tt><font size=2 face="sans-serif">
so it plays nice with autoconf.</font>
<br>
<br><font size=2 face="sans-serif"><br>
Michael Blocksome<br>
Blue Gene Messaging<br>
blocksom@us.ibm.com<br>
</font>
<br><font size=1 color=#800080 face="sans-serif">----- Forwarded by Michael
Blocksome/Rochester/IBM on 07/26/2013 01:20 PM -----</font>
<br>
<br><font size=1 color=#5f5f5f face="sans-serif">From:
</font><font size=1 face="sans-serif">Thomas Gooding/Rochester/IBM</font>
<br><font size=1 color=#5f5f5f face="sans-serif">To:
</font><font size=1 face="sans-serif">Michael Blocksome/Rochester/IBM@IBMUS,
</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Date:
</font><font size=1 face="sans-serif">07/26/2013 11:20 AM</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Subject:
</font><font size=1 face="sans-serif">Re: clang</font>
<br>
<hr noshade>
<br>
<br><font size=2 face="sans-serif">It looks like the addresses are being truncated to 32 bits.</font>
<br>
<br><font size=2 face="sans-serif">after ... Kernel_RanksToCoords (8, </font><font size=2 color=blue face="sans-serif">0x1dbfffba18</font><font size=2 face="sans-serif">,
</font><font size=2 color=red face="sans-serif">0x1dbfffba10 </font><font size=2 face="sans-serif">{0})
= </font><font size=2 color=#008000 face="sans-serif">14</font>
<br>
<br><font size=2 face="sans-serif">{4}.16.1: TB=0000000773c39a5c FL_SYSCALLAT:0
Syscall 1055 at IP=0x0000000001001314 LR=0x00000000010012bc
SP=0x0000001dbfffb9a0 (RANKS2COORDS)</font>
<br><font size=2 face="sans-serif">{4}.16.1: TB=0000000773c39acc FL_SYSCALLEN:0
Syscall Entry GPR3=0x0000000000000008 GPR4=</font><font size=2 color=blue face="sans-serif">0x00000000bfffba18
</font><font size=2 face="sans-serif">GPR5=</font><font size=2 color=red face="sans-serif">0x00000000bfffba10
</font><font size=2 face="sans-serif">GPR6=0x0000001dbfffba50</font>
<br><font size=2 face="sans-serif">{4}.16.1: TB=0000000773c3a1a4 FL_SYSCALLRT:0
Syscall Return GPR3=</font><font size=2 color=#008000 face="sans-serif">0x000000000000000e</font>
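<br><font size=2 face="sans-serif">To make the truncation concrete: the pointers printed by the program (0x1dbfffba18 and 0x1dbfffba10) arrive at syscall entry in GPR4 and GPR5 with their upper 32 bits cleared. The small C sketch below checks a printed value against the traced register value; the helper names low32() and looks_truncated() are hypothetical and not part of the CNK SPI.</font>

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low-order 32 bits of a 64-bit value. */
static uint64_t low32(uint64_t value)
{
    return value & UINT64_C(0xffffffff);
}

/* Nonzero if 'received' (a GPR value from the syscall-entry trace) is
 * 'passed' (the value printed by the program) with its upper 32 bits
 * cleared. Values that already fit in 32 bits cannot reveal truncation,
 * so they are reported as not truncated. */
static int looks_truncated(uint64_t passed, uint64_t received)
{
    return passed != received && received == low32(passed);
}
```

<br><font size=2 face="sans-serif">For example, looks_truncated(0x1dbfffba18, 0x00000000bfffba18) is true. The first argument (8 in GPR3) looks correct only because it already fits in 32 bits.</font>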
<br>
<br>
<br><tt><font size=2> 10012c0: 38 60 04 1f
li r3,1055</font></tt>
<br><tt><font size=2> 10012c4: 38 9f 00 a8
addi r4,r31,168</font></tt>
<br><tt><font size=2> 10012c8: 38 df 00 b0
addi r6,r31,176</font></tt>
<br><tt><font size=2> 10012cc: 38 ff 00 b8
addi r7,r31,184</font></tt>
<br><tt><font size=2> 10012d0: 39 1f 00 c0
addi r8,r31,192</font></tt>
<br><tt><font size=2> 10012d4: e8 bf 00 80 ld r5,128(r31) <---- quite a few loads/stores... seems very wasteful (probably just unoptimized code, since the test was compiled without -O).</font></tt>
<br><tt><font size=2> 10012d8: fb df 00 d0
std r30,208(r31)</font></tt>
<br><tt><font size=2> 10012dc: fb bf 00 c8
std r29,200(r31)</font></tt>
<br><tt><font size=2> 10012e0: e9 3f 00 d0
ld r9,208(r31)</font></tt>
<br><tt><font size=2> 10012e4: e9 5f 00 c8
ld r10,200(r31)</font></tt>
<br><tt><font size=2> 10012e8: f8 bf 00 d8
std r5,216(r31)</font></tt>
<br><tt><font size=2> 10012ec: e8 bf 00 d8
ld r5,216(r31)</font></tt>
<br><tt><font size=2> 10012f0: f9 3f 00 b0
std r9,176(r31)</font></tt>
<br><tt><font size=2> 10012f4: f9 5f 00 a8
std r10,168(r31)</font></tt>
<br><tt><font size=2> 10012f8: f8 bf 00 b8
std r5,184(r31)</font></tt>
<br><tt><font size=2> 10012fc: f8 7f 00 c0
std r3,192(r31)</font></tt>
<br><tt><font size=2 color=red> 1001300: 80 a4 00 04 lwz r5,4(r4) <---- these are 32-bit loads (lwz); they should be 64-bit loads (ld).</font></tt>
<br><tt><font size=2 color=red> 1001304: 80 86
00 04 lwz r4,4(r6)</font></tt>
<br><tt><font size=2 color=red> 1001308: 80 67
00 04 lwz r3,4(r7)</font></tt>
<br><tt><font size=2 color=red> 100130c: 80 08
00 04 lwz r0,4(r8)</font></tt>
<br><tt><font size=2> 1001310: 44 00 00 02
sc</font></tt>
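<br><font size=2 face="sans-serif">Why lwz with a displacement of 4 yields exactly the low half: r4, r6, r7 and r8 point at the 64-bit stack slots at 168(r31) through 192(r31), which hold doublewords in big-endian byte order. The word at offset 4 of each slot is therefore its low-order 32 bits, and lwz zero-extends that word into the 64-bit register, which matches the 0x00000000bfffba18 seen in GPR4. The following C sketch models the two loads on a plain byte array so it runs on any host; the function names are hypothetical.</font>

```c
#include <assert.h>
#include <stdint.h>

/* Store a 64-bit value into 8 bytes in big-endian order, as a std
 * instruction does on the big-endian BG/Q core. */
static void store_be64(uint8_t mem[8], uint64_t value)
{
    for (int i = 0; i < 8; i++)
        mem[i] = (uint8_t)(value >> (56 - 8 * i));
}

/* Model of "ld rD,0(rA)": read all 8 bytes as a big-endian doubleword. */
static uint64_t model_ld(const uint8_t mem[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | mem[i];
    return v;
}

/* Model of "lwz rD,4(rA)": read the 4-byte word at offset 4 and
 * zero-extend it to 64 bits, discarding the high-order word. */
static uint64_t model_lwz_off4(const uint8_t mem[8])
{
    uint32_t w = 0;
    for (int i = 4; i < 8; i++)
        w = (w << 8) | mem[i];
    return (uint64_t)w;
}
```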
<br>
<br>
<br><font size=2 face="sans-serif">The inline assembly piece is:</font>
<br><tt><font size=2>#define CNK_SPI_SYSCALL_3(name, arg0, arg1, arg2) \</font></tt>
<br><tt><font size=2>({ \</font></tt>
<br><tt><font size=2> register uint64_t r0 __asm__ ("r0") = (__NR_ ## name); \</font></tt>
<br><tt><font size=2> register uint64_t r3 __asm__ ("r3") = ((uint64_t) (arg0)); \</font></tt>
<br><tt><font size=2> register uint64_t r4 __asm__ ("r4") = ((uint64_t) (arg1)); \</font></tt>
<br><tt><font size=2> register uint64_t r5 __asm__ ("r5") = ((uint64_t) (arg2)); \</font></tt>
<br><tt><font size=2> __asm__ __volatile__ \</font></tt>
<br><tt><font size=2> ("sc" \</font></tt>
<br><tt><font size=2> : "=&r"(r0),"=&r"(r3),"=&r"(r4),"=&r"(r5) \</font></tt>
<br><tt><font size=2> : "0"(r0), "1"(r3), "2"(r4), "3"(r5) \</font></tt>
<br><tt><font size=2> : "r6","r7","r8","r9","r10","r11","r12","cr0","memory"); \</font></tt>
<br><tt><font size=2> r3; \</font></tt>
<br><tt><font size=2>})</font></tt>
<br>
<br><font size=2 face="sans-serif">I don't see anything that would cause these values to be treated as 32-bit. The variables are declared uint64_t, and in the input/output operand lists "r" is GCC's general-purpose register constraint, which should mean a 64-bit register in a 64-bit compile. I presume LLVM follows GCC's inline-assembly semantics here.</font>
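<br><font size=2 face="sans-serif">One way to take the syscall and the fixed-register variables out of the picture is a reduced test that passes four 64-bit values through an empty inline-asm statement using the same output/matching-input pattern ("0" through "3") as CNK_SPI_SYSCALL_3. This is a hypothetical sketch that has not been run against bgclang: it uses plain "=r" instead of the macro's early-clobber "=&r" (an empty template clobbers nothing) and lets the compiler pick the registers, so it builds on any 64-bit GCC- or Clang-compatible target. It may or may not reproduce the problem without the explicit register variables; if it does, the upper halves would be lost and the function would return 0.</font>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced test: r0, r3, r4 and r5 here are ordinary locals
 * (not bound to machine registers) named after the macro's variables.
 * Returns nonzero if every 64-bit value survives the asm statement. */
static int tied_operands_preserve_64bit(uint64_t a, uint64_t b,
                                        uint64_t c, uint64_t d)
{
    uint64_t r0 = a, r3 = b, r4 = c, r5 = d;

    /* Empty template: each output is tied to the matching input, so a
     * correct compiler must hand back the inputs unchanged. */
    __asm__ __volatile__(""
                         : "=r"(r0), "=r"(r3), "=r"(r4), "=r"(r5)
                         : "0"(r0), "1"(r3), "2"(r4), "3"(r5));

    return r0 == a && r3 == b && r4 == c && r5 == d;
}
```

<br><font size=2 face="sans-serif">Using the values from the trace, tied_operands_preserve_64bit(1055, 8, 0x1dbfffba18, 0x1dbfffba10) should return 1 on a compiler that handles 64-bit tied operands correctly.</font>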
<br>
<br><font size=2 face="sans-serif">One other comment: Hal seems to be directing people to his "bgclang" wrapper script. I'm not sure it will make a difference, but it could be worth trying.</font>
<br>
<br><font size=2 face="sans-serif">Tom</font>
<br>
<br><font size=2 face="sans-serif">Tom Gooding<br>
Senior Engineer / Blue Gene Kernels<br>
507-253-0747 (internal: 553-0747)<br>
</font>
<br>
<br>
<br>
<table width=100% style="border-collapse:collapse;">
<tr valign=top height=8>
<td width=96 style="border-style:solid;border-color:#000000;border-width:0px 0px 0px 0px;padding:0px 0px;"><font size=1 color=#5f5f5f face="sans-serif">From:</font>
<td style="border-style:solid;border-color:#000000;border-width:0px 0px 0px 0px;padding:0px 0px;"><font size=1 face="sans-serif">Michael
Blocksome/Rochester/IBM</font>
<tr valign=top height=8>
<td width=96 style="border-style:solid;border-color:#000000;border-width:0px 0px 0px 0px;padding:0px 0px;"><font size=1 color=#5f5f5f face="sans-serif">To:</font>
<td style="border-style:solid;border-color:#000000;border-width:0px 0px 0px 0px;padding:0px 0px;"><font size=1 face="sans-serif">Thomas
Gooding/Rochester/IBM@IBMUS, </font>
<tr valign=top height=8>
<td width=96 style="border-style:solid;border-color:#000000;border-width:0px 0px 0px 0px;padding:0px 0px;"><font size=1 color=#5f5f5f face="sans-serif">Date:</font>
<td style="border-style:solid;border-color:#000000;border-width:0px 0px 0px 0px;padding:0px 0px;"><font size=1 face="sans-serif">07/26/2013
08:56 AM</font>
<tr valign=top height=8>
<td width=96 style="border-style:solid;border-color:#000000;border-width:0px 0px 0px 0px;padding:0px 0px;"><font size=1 color=#5f5f5f face="sans-serif">Subject:</font>
<td style="border-style:solid;border-color:#000000;border-width:0px 0px 0px 0px;padding:0px 0px;"><font size=1 face="sans-serif">clang</font></table>
<br>
<hr noshade>
<br>
<br><font size=2 face="sans-serif">Tom,</font>
<br>
<br><font size=2 face="sans-serif">I wrote a very simple test and compiled it with the latest version of llvm/clang from Hal.</font>
<br>
<br><tt><font size=2>$ cat kernel_rankstocoords.c </font></tt>
<br>
<br><tt><font size=2>#include <stdlib.h></font></tt>
<br><tt><font size=2>#include <stdio.h></font></tt>
<br><tt><font size=2>#include <stdint.h></font></tt>
<br>
<br><tt><font size=2>#include "kernel/location.h"</font></tt>
<br>
<br>
<br><tt><font size=2>int main (int argc, char * argv[])</font></tt>
<br><tt><font size=2>{</font></tt>
<br><tt><font size=2> uint32_t rc = 1;</font></tt>
<br><tt><font size=2> size_t mapsize = 2*sizeof(BG_CoordinateMapping_t);</font></tt>
<br><tt><font size=2> BG_CoordinateMapping_t map[2];</font></tt>
<br><tt><font size=2> uint64_t n = 0;</font></tt>
<br>
<br><tt><font size=2> fprintf (stdout, "before .. Kernel_RanksToCoords (%zu, %p, %p {%lu}) = %u\n", mapsize, (void *) map, (void *) &n, n, rc);</font></tt>
<br><tt><font size=2> rc = Kernel_RanksToCoords (mapsize, map, &n);</font></tt>
<br><tt><font size=2> fprintf (stdout, "after ... Kernel_RanksToCoords (%zu, %p, %p {%lu}) = %u\n", mapsize, (void *) map, (void *) &n, n, rc);</font></tt>
<br>
<br><tt><font size=2> return 0;</font></tt>
<br><tt><font size=2>}</font></tt>
<br>
<br><tt><font size=2>$ /bghome/blocksom/development/c++11/install/powerpc64-bgq-linux-clang
kernel_rankstocoords.c -o kernel_rankstocoords.clang -I/bgsys/drivers/ppcfloor/spi/include
-I/bgsys/drivers/ppcfloor/spi/include/kernel/cnk -I/bgsys/drivers/ppcfloor</font></tt>
<br>
<br>
<br><font size=2 face="sans-serif">When I run this, I get the same error as when I ran PAMI compiled with clang..</font>
<br>
<br><tt><font size=2>$ runjob --block R00-M1-N10 --np 2 : kernel_rankstocoords.clang
</font></tt>
<br><tt><font size=2>before .. Kernel_RanksToCoords (8, 0x1dbfffba18, 0x1dbfffba10
{0}) = 1</font></tt>
<br><tt><font size=2>before .. Kernel_RanksToCoords (8, 0x1dbfffba18, 0x1dbfffba10
{0}) = 1</font></tt>
<br><tt><font size=2>after ... Kernel_RanksToCoords (8, 0x1dbfffba18, 0x1dbfffba10
{0}) = 14</font></tt>
<br><tt><font size=2>after ... Kernel_RanksToCoords (8, 0x1dbfffba18, 0x1dbfffba10
{0}) = 14</font></tt>
<br>
<br><font size=2 face="sans-serif">.. but it runs fine when compiled with
gcc..</font>
<br>
<br><tt><font size=2>$ /bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-gcc
kernel_rankstocoords.c -o kernel_rankstocoords.gcc -I/bgsys/drivers/ppcfloor/spi/include
-I/bgsys/drivers/ppcfloor/spi/include/kernel/cnk -I/bgsys/drivers/ppcfloor</font></tt>
<br>
<br><tt><font size=2>$ runjob --block R00-M1-N10 --np 2 : kernel_rankstocoords.gcc
</font></tt>
<br><tt><font size=2>before .. Kernel_RanksToCoords (8, 0x1dbfffba54, 0x1dbfffba60
{0}) = 1</font></tt>
<br><tt><font size=2>before .. Kernel_RanksToCoords (8, 0x1dbfffba54, 0x1dbfffba60
{0}) = 1</font></tt>
<br><tt><font size=2>after ... Kernel_RanksToCoords (8, 0x1dbfffba54, 0x1dbfffba60
{2}) = 0</font></tt>
<br><tt><font size=2>after ... Kernel_RanksToCoords (8, 0x1dbfffba54, 0x1dbfffba60
{2}) = 0</font></tt>
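<br><font size=2 face="sans-serif">For reference, and assuming CNK reports errors using Linux errno numbering (not confirmed in this thread), the clang build's return code of 14 is EFAULT ("Bad address") on Linux, which would be consistent with the kernel rejecting the truncated pointers. A trivial check, with a hypothetical helper name:</font>

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if a Kernel_RanksToCoords() return code
 * equals EFAULT. Assumes CNK uses Linux errno numbering, where EFAULT
 * is 14. */
static int rc_is_efault(uint32_t rc)
{
    return rc == (uint32_t)EFAULT;
}
```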
<br>
<br>
<br><font size=2 face="sans-serif">Any ideas? Is it something I'm
doing wrong, or should I post this to the llvm-bgq mailing list?</font>
<br>
<br>
<br>