<div dir="ltr">On Wed, Jun 19, 2013 at 1:24 PM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb"><div class="h5">----- Original Message -----<br>
><br>
> On Wed, Jun 19, 2013 at 1:15 PM, Hal Finkel < <a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a> ><br>
> wrote:<br>
><br>
><br>
><br>
><br>
><br>
> ----- Original Message -----<br>
> ><br>
> > Actually, the Intel AVX instructions have a similar issue: rint() has<br>
> > a fast instruction, but round() does not. On this architecture,<br>
> > round() still does the "right thing", even with -ffast-math, both<br>
> > with gcc and clang.<br>
> ><br>
> ><br>
> > Actually, as I now see, gcc generates a short sequence of<br>
> > instructions (five or so) to implement round() properly, whereas<br>
> > clang calls _round.<br>
><br>
> LLVM seems to have special handling only for:<br>
> FCEIL, FTRUNC, FRINT, FNEARBYINT, FFLOOR<br>
> (there is no FROUND, so round() will always give you the<br>
> library call).<br>
><br>
><br>
> ><br>
> ><br>
> > Given this, the behaviour on BGQ is indeed special. I would expect<br>
> > clang to behave consistently -- to either apply this optimization<br>
> > across the board, or nowhere. Do you want to raise the issue on the<br>
> > llvm mailing list?<br>
><br>
> Unfortunately, this optimization (like many low-level fast-math<br>
> optimizations) is target-specific. As a result, I'm not sure that<br>
> you'll really get the cross-platform consistency that you'd<br>
> like. That said, if this change is too aggressive, then we<br>
> should back it out.<br>
><br>
><br>
><br>
> Yes, it's target-specific. Nevertheless, whether rint's tie-breaking<br>
> can be influenced by __FAST_MATH__ should be a consensus decision.<br>
> Either BGQ is over-zealous, or Intel is missing a possible<br>
> optimisation, or LLVM makes different speed/accuracy trade-offs on<br>
> different architectures. And the last of these would be bad for users.<br>
<br>
</div></div>Agreed. Would you like to write to the list or should I?<br></blockquote><div><br></div><div>I'd prefer that you did it -- I'd hope that a request coming from you is less likely to be brushed off with "don't complain, that's what you get when you use __FAST_MATH__".</div>
<div><br></div><div>-erik</div><div><br></div></div>-- <br>Erik Schnetter <<a href="mailto:schnetter@cct.lsu.edu" target="_blank">schnetter@cct.lsu.edu</a>><br><a href="http://www.perimeterinstitute.ca/personal/eschnetter/" target="_blank">http://www.perimeterinstitute.ca/personal/eschnetter/</a>
</div></div>