<div dir="ltr">On Wed, Jun 19, 2013 at 1:24 PM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="HOEnZb"><div class="h5">----- Original Message -----<br>

><br>

> On Wed, Jun 19, 2013 at 1:15 PM, Hal Finkel < <a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a> ><br>

> wrote:<br>

><br>

><br>

><br>

><br>

><br>

> ----- Original Message -----<br>

> ><br>

> > Actually, the Intel AVX instructions have a similar issue: rint()<br>

> > has<br>

> > a fast instruction, but round() does not. On this architecture,<br>

> > round() still does the "right thing", even with -ffast-math, both<br>

> > with gcc and clang.<br>

> ><br>

> ><br>

> > Actually, as I now see, gcc generates a short sequence of<br>

> > instructions (five or so) to implement round() properly, whereas<br>

> > clang calls _round.<br>

><br>

> LLVM only seems to have special handling of:<br>

> FCEIL, FTRUNC, FRINT, FNEARBYINT, FFLOOR<br>

> (as there is no FROUND and so round() will always give you the<br>

> library call).<br>

><br>

><br>

> ><br>

> ><br>

> > Given this, the behaviour on BGQ is indeed special. I would expect<br>

> > clang to behave consistently -- to either apply this optimization<br>

> > across the board, or nowhere. Do you want to raise the issue on the<br>

> > llvm mailing list?<br>

><br>

> Unfortunately, this optimization (like many low-level fast-math<br>

> optimizations) is target-specific. As a result, I'm not sure that<br>

> you'll even really get the cross-platform consistency that you'd<br>

> like. That having been said, if this change is too strong, then we<br>

> should back it out.<br>

><br>

><br>

><br>

> Yes, it's target specific. Nevertheless, whether rint's tie-breaking<br>

> can be influenced by __FAST_MATH__ should be a consensus decision.<br>

> Either BGQ is over-zealous, or Intel is missing a possible<br>

> optimisation, or llvm makes different speed/accuracy trade-offs on<br>

> different architectures. And the latter would be bad for users.<br>

<br>

</div></div>Agreed. Would you like to write to the list or should I?<br></blockquote><div><br></div><div>I'd prefer it if you did -- I'd hope that a question from you has less chance of being dismissed with "don't complain, that's what you get when you use __FAST_MATH__".</div>

<div><br></div><div>-erik</div><div><br></div></div>-- <br>Erik Schnetter <<a href="mailto:schnetter@cct.lsu.edu" target="_blank">schnetter@cct.lsu.edu</a>><br><a href="http://www.perimeterinstitute.ca/personal/eschnetter/" target="_blank">http://www.perimeterinstitute.ca/personal/eschnetter/</a>

</div></div>