[Llvm-bgq-discuss] clang on BGQ performance
Hal Finkel
hfinkel at anl.gov
Tue Mar 25 12:02:59 CDT 2014
John,
Thanks for looking into this (and providing a useful benchmark)! You'll find this interesting:
bgclang -O3 -fopenmp with 1 thread:
Function    Best Rate MB/s  Avg time  Min time  Max time
Copy:                635.7  0.251708  0.251708  0.251709
Scale:               519.7  0.307855  0.307855  0.307856
Add:                 802.0  0.299267  0.299266  0.299267
Triad:               753.4  0.318716  0.318566  0.318735
gcc 4.7.2 -O3 -fopenmp with 1 thread:
Function    Best Rate MB/s  Avg time  Min time  Max time
Copy:               2067.4  0.077393  0.077392  0.077395
Scale:              1329.4  0.120353  0.120353  0.120354
Add:                1943.5  0.123490  0.123489  0.123490
Triad:              1872.4  0.128179  0.128178  0.128179
gcc without OpenMP is actually slightly worse, go figure ;)
bgclang -O3 (no -fopenmp) with 1 thread:
Function    Best Rate MB/s  Avg time  Min time  Max time
Copy:              15660.2  0.010296  0.010217  0.010870
Scale:              5523.7  0.028967  0.028966  0.028967
Add:                6283.2  0.038198  0.038197  0.038198
Triad:              6331.9  0.037906  0.037903  0.037920
bgxlc_r -O3 -qsmp=omp with 1 thread:
Function    Best Rate MB/s  Avg time  Min time  Max time
Copy:               3762.0  0.042535  0.042531  0.042538
Scale:              5083.5  0.031481  0.031474  0.031494
Add:                7394.2  0.032487  0.032458  0.032510
Triad:              7397.6  0.032481  0.032443  0.032499
bgxlc_r -O3 (no -qsmp=omp) with 1 thread:
Function    Best Rate MB/s  Avg time  Min time  Max time
Copy:               3574.1  0.044768  0.044767  0.044769
Scale:              3301.2  0.048468  0.048467  0.048469
Add:                4233.2  0.056696  0.056694  0.056699
Triad:              4350.1  0.055173  0.055171  0.055177
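For reference, STREAM's "Best Rate" is the number of bytes a kernel moves divided by its minimum time, with MB = 10^6 bytes. The sketch below assumes STREAM's default of 10,000,000 8-byte doubles per array, which matches the numbers above (for Copy, 635.7 MB/s × 0.251708 s ≈ 160 MB = 2 arrays × 8 bytes × 10^7). The names stream_rate_mbs and arrays_touched are hypothetical helpers for illustration, not part of stream.c.

```c
#include <stddef.h>

/* Kernels in the order STREAM reports them. */
enum { COPY, SCALE, ADD, TRIAD };

/* Arrays touched per element (reads + writes), as STREAM counts them:
 * Copy  c = a          -> 2 arrays
 * Scale b = s*c        -> 2 arrays
 * Add   c = a + b      -> 3 arrays
 * Triad a = b + s*c    -> 3 arrays */
static const int arrays_touched[] = { 2, 2, 3, 3 };

/* Best rate in MB/s (MB = 1e6 bytes) from the minimum time in seconds,
 * for n doubles per array (assumed element type: double). */
double stream_rate_mbs(int kernel, size_t n, double min_time_s)
{
    double bytes = (double)arrays_touched[kernel] * sizeof(double) * (double)n;
    return 1.0e-6 * bytes / min_time_s;
}
```

For example, stream_rate_mbs(COPY, 10000000, 0.251708) gives about 635.66, matching the 635.7 MB/s in the first table; the "Best Rate" column is always derived from the "Min time" column, not the average.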
All of these runs defined TUNED (just because it puts the kernels into separate functions). Comparing the bgclang runs with and without -fopenmp, it seems that the OpenMP outlining in Clang/LLVM is seriously interfering with the ability of the vectorizer and instruction scheduler to do useful work. I assume that most of this is because pointer-aliasing information is lost in the OpenMP transformation. We'll need to work on this! (I'm actually in the middle of working on a new pointer-aliasing framework for LLVM, and I'll be able to use that to solve a lot of these issues.)
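To make the aliasing point concrete, here is a minimal hand-written sketch; it is not Clang's actual outlined code. It assumes, consistent with the guess above, that outlining turns the kernel's direct accesses to STREAM's arrays into accesses through pointer parameters, after which the compiler can no longer see that the arrays are distinct. The function name triad_outlined and its signature are hypothetical. The restrict qualifiers restate the no-overlap fact that such a transformation would otherwise drop.

```c
#include <stddef.h>

/* Hypothetical analogue of an OpenMP-outlined STREAM Triad body:
 * the arrays arrive as pointer parameters together with one thread's
 * [lo, hi) slice of the iteration space.
 * restrict promises that a, b and c never overlap, so the compiler may
 * reorder loads of b[j] and c[j] across stores to a[j] and vectorize
 * without runtime overlap checks. Removing the three restrict
 * qualifiers models the "aliasing information lost" case. */
void triad_outlined(double *restrict a, const double *restrict b,
                    const double *restrict c, double scalar,
                    size_t lo, size_t hi)
{
    for (size_t j = lo; j < hi; j++)
        a[j] = b[j] + scalar * c[j];
}
```

In this model, each thread of the team would call the function on its own contiguous slice; the loop body is identical to the serial kernel, so any performance difference comes from what the compiler is allowed to assume about the pointers.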
-Hal
----- Original Message -----
> From: "John A. Biddiscombe" <biddisco at cscs.ch>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-bgq-discuss at lists.alcf.anl.gov
> Sent: Tuesday, March 25, 2014 11:44:50 AM
> Subject: RE: [Llvm-bgq-discuss] clang on BGQ performance
>
> > Can you please provide details on exactly what you did? What
> > compile flags did you use? Did you define TUNED?
>
> I edited the Makefile to skip the Fortran build and set the bgclang variables:
>
> bbpbgas040:~/bgas/clang/build/stream$ cat Makefile
>
> CC = bgclang
> # -L is a linker flag; it works here because compiling and linking happen in one command
> CFLAGS = -O3 -fopenmp -L/gpfs/bbp.cscs.ch/home/biddisco/apps/clang/bgclang/omp/lib/
>
> all: stream_c.exe
>
> stream_c.exe: stream.c
> 	$(CC) $(CFLAGS) stream.c -o stream_c.exe
>
> clean:
> 	rm -f stream_c.exe *.o
>
>
> Then I just ran make. I didn't set any other variables (such as TUNED).
>
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory