[Llvm-bgq-discuss] Patches for r176829-20130309 (the current vesta version)
Michael Kruse
MichaelKruse at meinersbur.de
Thu Apr 18 08:45:33 CDT 2013
2013/4/17 Hal Finkel <hfinkel at anl.gov>:
>> Not a problem; I'd rather get reports early and often.
Good to know. I only get embarrassed when I report something that was
caused by my own stupidity and consumed someone else's time.
>> > 2. bgclang is quite unreliable on inline assembler. With this:
>> > asm (
>> > "dcbt 0,%[ptr] \n"
>> > "dcbt %[c64],%[ptr] \n"
>> > "dcbt %[c128],%[ptr] \n"
>> > "dcbt %[c192],%[ptr] \n"
>> > "dcbt %[c256],%[ptr] \n"
>> > "dcbt %[c320],%[ptr] \n"
>> > : :
>> > [ptr] "+r" (ptr),
>> > [c64] "b" (64),
>> > [c128] "b" (128),
>> > [c192] "b" (192),
>> > [c256] "b" (256),
>> > [c320] "b" (320)
>> > );
>> >
>> > I sometimes get
>> > error: invalid input constraint '+r' in asm
>> > other times
>> > fatal error: error in backend: Do not know how to split the result of this operator!
>> > (though I am not sure it's this piece of code, clang doesn't give me a location)
>
> Also, in the meantime, you can use the __dcbt intrinsic. It works just like in xlc (no special header currently required).
There is actually a reason why I use this inline assembly.
I have a big loop body working on two contiguous streams of data. Per
iteration, there are 1536 + 2304 bytes to read that I want to
prefetch. Using
__dcbt(p+0)
__dcbt(p+64)
__dcbt(p+128)
...
will make xlc generate code like
dcbt 0, r1
li r2, 64
dcbt r2, r1
li r2, 128
dcbt r2, r1
...
because there are more constants involved than general purpose
registers available. So the constants have to be rematerialised in
every loop iteration.
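To put rough numbers on that, here is a small sketch in C. It assumes one prefetch per 64-byte step (the stride used in the listings above) and that offset 0 needs no register because "dcbt 0, rX" encodes a literal zero. The names CACHE_LINE, PPC_GPRS and nonzero_offsets are made up for this illustration.

```c
#include <stddef.h>

/* Assumptions for this sketch: 64-byte prefetch stride, 32 PowerPC GPRs. */
enum { CACHE_LINE = 64, PPC_GPRS = 32 };

/* Number of distinct nonzero offsets needed to touch `bytes` bytes
 * from a single, fixed base pointer, one prefetch per CACHE_LINE. */
size_t nonzero_offsets(size_t bytes)
{
    size_t lines = (bytes + CACHE_LINE - 1) / CACHE_LINE;
    return lines > 0 ? lines - 1 : 0;
}
```

With this, nonzero_offsets(1536) is 23 and nonzero_offsets(2304) is 35. Even if both streams share the same constants, 35 offsets plus the two base pointers do not fit into 32 general purpose registers, some of which are reserved anyway.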
Using a scheme like
dcbt 0, r1
dcbt r3, r1
dcbt r4, r1
dcbt r5, r1
dcbt r6, r1
addi r1, r1, 320
dcbt 0, r1
dcbt r3, r1
dcbt r4, r1
dcbt r5, r1
dcbt r6, r1
addi r1, r1, 320
only 4 constants have to be held in registers, and they can stay there
across loop iterations.
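Expressed in C, the idea looks roughly like the sketch below. It uses the GCC/clang builtin __builtin_prefetch instead of __dcbt so that it compiles anywhere; prefetch_stream and the two enum constants are hypothetical names. Whether a compiler really keeps the four offsets in registers and emits the instruction sequence above is not guaranteed, which is why I resorted to inline assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for this sketch: 64-byte stride, 5 hints per block (320 bytes). */
enum { CACHE_LINE = 64, LINES_PER_BLOCK = 5 };

/* Issue prefetch hints for `bytes` bytes starting at `start`.
 * Returns the number of hints issued. */
size_t prefetch_stream(const void *start, size_t bytes)
{
    assert(start != NULL);
    const char *p = (const char *)start;
    size_t lines = (bytes + CACHE_LINE - 1) / CACHE_LINE;
    size_t issued = 0;

    /* Full blocks: the same five offsets every time, then bump the base. */
    for (; lines >= LINES_PER_BLOCK; lines -= LINES_PER_BLOCK) {
        __builtin_prefetch(p + 0 * CACHE_LINE);
        __builtin_prefetch(p + 1 * CACHE_LINE);
        __builtin_prefetch(p + 2 * CACHE_LINE);
        __builtin_prefetch(p + 3 * CACHE_LINE);
        __builtin_prefetch(p + 4 * CACHE_LINE);
        p += LINES_PER_BLOCK * CACHE_LINE;   /* corresponds to addi r1, r1, 320 */
        issued += LINES_PER_BLOCK;
    }

    /* Remaining lines that do not fill a whole block. */
    for (; lines > 0; --lines) {
        __builtin_prefetch(p);
        p += CACHE_LINE;
        ++issued;
    }
    return issued;
}
```

For the two streams here this issues 24 hints for the 1536 bytes and 36 for the 2304 bytes, using only the offsets 0, 64, 128, 192 and 256.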
The situation is even worse when using vec_ld, which generates code
with just a single qvlfdux (with u=update) instruction and lots of
"li"s.
I don't know yet how clang behaves here.
Regards,
Michael
--
Tardyzentrismus verboten!