[Llvm-bgq-discuss] Status of OpenMP support in BG/Q

Hal Finkel hfinkel at anl.gov
Fri Feb 7 08:35:01 CST 2014


----- Original Message -----
> From: "Michael Schlottke" <m.schlottke at fz-juelich.de>
> To: llvm-bgq-discuss at lists.alcf.anl.gov
> Sent: Friday, February 7, 2014 3:46:34 AM
> Subject: Re: [Llvm-bgq-discuss] Status of OpenMP support in BG/Q
> 
> Since a couple of people asked me privately, here are the results for
> everyone. Please note however, that my previous statement on the
> compiler performance difference (clang being only 5% slower than
> IBM) does not hold up anymore: unfortunately I did have some old
> results in mind, but the numbers in this email are the most up to
> date ones as obtained by one of my students in the past two months.
> It now seems like clang is up to 20% slower than IBM's XL compiler,
> depending on the compiler settings:
> 
> ZFS/Finite Volume kernel (IBM/clang):
> production: 971s / 955s (-1.6%)
> extreme: 873s / 977s (+11.8%)
> 
> ZFS/Lattice Boltzmann kernel (IBM/clang):
> production: 107s / 117s (+10%)
> extreme: 97s / 118s (+21%)
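
As a quick cross-check of the percentages quoted above, here is a minimal Python sketch that recomputes clang's time relative to the IBM baseline. The timings are copied from the table; the dictionary layout and the function name `relative_difference` are only illustrative choices, not anything from ZFS.

```python
# Timings in seconds as quoted above: (IBM XL, clang).
timings = {
    ("Finite Volume", "production"): (971, 955),
    ("Finite Volume", "extreme"): (873, 977),
    ("Lattice Boltzmann", "production"): (107, 117),
    ("Lattice Boltzmann", "extreme"): (97, 118),
}


def relative_difference(baseline_s, candidate_s):
    """Percent change of candidate relative to baseline (positive = slower)."""
    return 100.0 * (candidate_s - baseline_s) / baseline_s


for (kernel, build), (xl_s, clang_s) in timings.items():
    pct = relative_difference(xl_s, clang_s)
    print(f"{kernel:18s} {build:10s} {pct:+.1f}%")
```

Computed this way, the figures agree with the quoted ones to within about a percentage point; the Lattice Boltzmann production case comes out closer to +9.3% than the rounded +10%.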

Thanks for sharing these! Are these the 'extreme' numbers (or just best of each)?

If we've really had a significant slowdown in the latest release, I would like to understand that. I did disable use of type-based aliasing information during instruction scheduling in the latest build (because stress testing revealed correctness issues), and that could cause a slowdown. Hopefully, I'll be able to re-enable that soon. However, if there is some other cause, I'd like to know about it.

If it would be possible for you to provide me with the kernels, or even just the IR and assembly dumps (run with -S -emit-llvm and with just -S, respectively), that would help a lot. We can discuss this off list.
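
For anyone else who wants to send dumps, the two invocations can be sketched with a small Python helper that only builds the command lines (it does not run them). The `-S` and `-emit-llvm` flags and the `mpibgclang++` wrapper name come from this thread; the source file name `kernel.cpp`, the output names, and the helper `dump_commands` are hypothetical, and you would pass your usual optimization flags via `extra_flags`.

```python
import os
import shlex


def dump_commands(source, compiler="mpibgclang++", extra_flags=()):
    """Return (IR command, assembly command) as argument lists.

    The output file names (<stem>.ll and <stem>.s) are an assumption
    made for this sketch.
    """
    stem, _ext = os.path.splitext(source)
    common = [compiler, *extra_flags, "-S"]
    ir_cmd = common + ["-emit-llvm", source, "-o", stem + ".ll"]
    asm_cmd = common + [source, "-o", stem + ".s"]
    return ir_cmd, asm_cmd


# Hypothetical file name; print the commands so they can be pasted into a shell.
for cmd in dump_commands("kernel.cpp", extra_flags=["-O3", "-DNDEBUG"]):
    print(shlex.join(cmd))
```

Using the same flags as the timed build matters here, since the dumps are only useful if they reflect the code that was actually measured.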

> 
> The baseline is always the IBM compiler. All results were obtained
> using a single thread on BG/Q "JUQUEEN" at FZ Juelich between Nov.
> '13 and Jan. '14. The measurements were taken from our inner loops
> (total inner loop time > 100s), without any substantial I/O taking
> place (I/O time is less than 0.1%). These are the compiler flags
> used for the two build types "production" and "extreme":
> 
> mpixlcxx/*: -qarch=qp -qtune=qp -qmaxmem=-1 -qreport -qlist
> -qlanglvl=variadictemplates
> mpixlcxx/production:  -g -O2
> mpixlcxx/extreme: -O5 -qstrict
> 
> mpibgclang++/*: -std=c++11 -stdlib=libc++  -O3 -DNDEBUG -mtune=native
> -fvectorize -fslp-vectorize

-mtune=native seems wrong in general here: you're cross compiling! (I don't think it does anything with bgclang, but if it did, it would cause you to optimize for the P7 and not the A2, and the instruction scheduling is very different for those two cores).

> mpibgclang++/production:  -g
> mpibgclang++/extreme:  -fstrict-aliasing -fslp-vectorize-aggressive
> -fno-rtti -fno-exceptions -fomit-frame-pointer
> 
> Of course these numbers are not a perfect measurement: only one
> sample is not enough, the compiler flags could use some tuning etc.
> However, they might offer an initial insight for others who are
> thinking about switching completely.
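
On the point that a single sample is not enough: a minimal Python sketch of how repeated timings of the same build could be summarized before comparing compilers. The function name `summarize` is hypothetical, and the sample values in the usage line are made-up placeholders, not measurements from JUQUEEN.

```python
import statistics


def summarize(samples_s):
    """Summarize repeated wall-clock timings (in seconds) of one build."""
    mean = statistics.mean(samples_s)
    # The sample standard deviation needs at least two runs.
    stdev = statistics.stdev(samples_s) if len(samples_s) > 1 else 0.0
    return {
        "n": len(samples_s),
        "min": min(samples_s),
        "mean": mean,
        "stdev": stdev,
        "cv_pct": 100.0 * stdev / mean,  # run-to-run variation in percent
    }


# Placeholder values for illustration only.
print(summarize([100.0, 102.0, 98.0]))
```

If the run-to-run variation (`cv_pct`) is of the same order as the difference between two compilers, that difference should not be read as significant.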

[To everyone:]

In general, I don't recommend that anyone switch without benchmarking ;)

In my experience, the performance of bgclang vs. xl has been steadily improving, but there is substantial variance. xl's performance has also improved substantially over the last year or so on some benchmarks (and decreased on others -- they also have a large variance).

Generally speaking, to make an unscientific statement, xl outperforms bgclang on somewhere around 3 out of every 4 benchmark kernels (which also seems to be about what you're seeing). Because of the way that clang internally deals with pointer aliasing information, which I'm working on improving, I suspect it will be another year or so before we can really seriously compete on many of these things, especially on an in-order core like the A2.

That having been said, having actual kernels that people care about to tune against will help a lot.

Thanks again,
Hal

> 
> Regards,
> 
> Michael
> 
> 
> 
> 
> ------------------------------------------------------------------------------------------------
> ------------------------------------------------------------------------------------------------
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Sitz der Gesellschaft: Juelich
> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
> Geschaeftsfuehrung: Prof. Dr. Achim Bachem (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Prof. Dr. Sebastian M. Schmidt
> ------------------------------------------------------------------------------------------------
> ------------------------------------------------------------------------------------------------
> 
> _______________________________________________
> llvm-bgq-discuss mailing list
> llvm-bgq-discuss at lists.alcf.anl.gov
> https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

