[Llvm-bgq-discuss] clang on BGQ performance
Biddiscombe, John A.
biddisco at cscs.ch
Tue Mar 25 12:24:48 CDT 2014
Hal,
Interesting. (One clarification: my own code doesn't use OpenMP at all, so my own slow application must just be suffering from poor thread scheduling/placement/contention.)
JB
> -----Original Message-----
> From: Hal Finkel [mailto:hfinkel at anl.gov]
> Sent: 25 March 2014 18:03
> To: Biddiscombe, John A.
> Cc: llvm-bgq-discuss at lists.alcf.anl.gov
> Subject: Re: [Llvm-bgq-discuss] clang on BGQ performance
>
> John,
>
> Thanks for looking into this (and providing a useful benchmark)! You'll find
> this interesting:
>
> bgclang -O3 -fopenmp with 1 thread:
>
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:             635.7     0.251708     0.251708     0.251709
> Scale:            519.7     0.307855     0.307855     0.307856
> Add:              802.0     0.299267     0.299266     0.299267
> Triad:            753.4     0.318716     0.318566     0.318735
>
> gcc 4.7.2 -O3 -fopenmp with 1 thread:
>
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            2067.4     0.077393     0.077392     0.077395
> Scale:           1329.4     0.120353     0.120353     0.120354
> Add:             1943.5     0.123490     0.123489     0.123490
> Triad:           1872.4     0.128179     0.128178     0.128179
>
> gcc without OpenMP is actually slightly worse, go figure ;)
>
> bgclang -O3 (no -fopenmp) with 1 thread:
>
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           15660.2     0.010296     0.010217     0.010870
> Scale:           5523.7     0.028967     0.028966     0.028967
> Add:             6283.2     0.038198     0.038197     0.038198
> Triad:           6331.9     0.037906     0.037903     0.037920
>
> bgxlc_r -O3 -qsmp=omp with 1 thread:
>
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            3762.0     0.042535     0.042531     0.042538
> Scale:           5083.5     0.031481     0.031474     0.031494
> Add:             7394.2     0.032487     0.032458     0.032510
> Triad:           7397.6     0.032481     0.032443     0.032499
>
> bgxlc_r -O3 (no -qsmp=omp) with 1 thread:
>
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            3574.1     0.044768     0.044767     0.044769
> Scale:           3301.2     0.048468     0.048467     0.048469
> Add:             4233.2     0.056696     0.056694     0.056699
> Triad:           4350.1     0.055173     0.055171     0.055177
>
> All of these were built with TUNED defined (just because it puts the
> kernels into separate functions). It seems that the OpenMP outlining in
> Clang/LLVM is seriously
> interfering with the ability of the vectorizer and instruction scheduler to do
> useful work. I assume that most of this is because of pointer aliasing
> information being lost in the OpenMP transformation. We'll need to work on
> this! (I'm actually in the middle of working on a new pointer aliasing
> framework for LLVM, and I'll be able to use that to solve a lot of these
> issues).
>
> -Hal
>
> ----- Original Message -----
> > From: "John A. Biddiscombe" <biddisco at cscs.ch>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: llvm-bgq-discuss at lists.alcf.anl.gov
> > Sent: Tuesday, March 25, 2014 11:44:50 AM
> > Subject: RE: [Llvm-bgq-discuss] clang on BGQ performance
> >
> > > Can you please provide details on exactly what you did? What compile
> > > flags did you use, did you define TUNED?
> >
> > I edited the Makefile to skip the Fortran build and set the bgclang variables:
> >
> > bbpbgas040:~/bgas/clang/build/stream$ cat Makefile
> >
> > CC = bgclang
> > CFLAGS = -O3 -fopenmp -L/gpfs/bbp.cscs.ch/home/biddisco/apps/clang/bgclang/omp/lib/
> >
> > all: stream_c.exe
> >
> > stream_c.exe: stream.c
> > 	$(CC) $(CFLAGS) stream.c -o stream_c.exe
> >
> > clean:
> > 	rm -f stream_c.exe *.o
> >
> >
> > Then I just ran make. I didn't set any other variables (such as TUNED).
> >
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory