[Llvm-bgq-discuss] clang on BGQ performance

Thomas Gooding tgooding at us.ibm.com
Tue Mar 25 12:58:36 CDT 2014


bgclang's good non-OMP COPY performance is due to an implicit call to
memcpy(), which is QPX optimized.  (see previous thread about built-ins  ;)
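
To illustrate the point about the implicit memcpy(): a Copy-style loop and an explicit memcpy() do the same work, so an optimizer that recognizes the loop idiom can hand it to the library routine. The sketch below is only an illustration; the function names copy_loop and copy_memcpy are hypothetical and are not taken from stream.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy kernel written as an explicit element-by-element loop
 * (hypothetical name, in the style of the STREAM Copy kernel). */
void copy_loop(double *restrict c, const double *restrict a, size_t n)
{
    assert(c != NULL && a != NULL);
    for (size_t j = 0; j < n; j++)
        c[j] = a[j];
}

/* The same operation expressed as a single memcpy() call, which is
 * what a compiler may substitute once it proves the loop is a plain copy. */
void copy_memcpy(double *restrict c, const double *restrict a, size_t n)
{
    assert(c != NULL && a != NULL);
    memcpy(c, a, n * sizeof(double));
}
```

Both functions leave identical bytes in the destination; only the second is guaranteed to go through the tuned library routine.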

If it helps, my sampling profiler shows this breakdown for the -fopenmp build:
   thread  0 count=  772 (19.30%)   ..omp_microtask.37
   thread  0 count=  748 (18.70%)   ..omp_microtask.35
   thread  0 count=  717 (17.93%)   ..omp_microtask.36
   thread  0 count=  622 (15.55%)   ..omp_microtask.34
   thread  0 count=  102 (2.55%)    .checkSTREAMresults
   thread  0 count=   72 (1.80%)    ..omp_microtask.15
   thread  0 count=   44 (1.10%)    ..omp_microtask.12


runjob --strace 0 shows several calls to gettimeofday, but not much other
kernel activity.  So I suspect it's spending its time in the OMP runtime,
which would be an optimization opportunity.
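
The gettimeofday calls seen in the strace most likely come from the benchmark's own timer rather than from the OpenMP runtime, since the stock stream.c takes its timestamps with gettimeofday. A minimal sketch of such a timer follows; the name wallclock_seconds is hypothetical, chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds since the epoch, with microsecond
 * resolution. Each call is one gettimeofday() system call, which is
 * what a system-call trace of a timed benchmark loop would show. */
double wallclock_seconds(void)
{
    struct timeval tp;
    int rc = gettimeofday(&tp, NULL);
    assert(rc == 0);
    (void)rc; /* keep the variable used when NDEBUG disables assert */
    return (double)tp.tv_sec + (double)tp.tv_usec * 1.0e-6;
}
```

Calling it before and after a kernel and subtracting gives the kind of per-kernel time that the benchmark reports.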

Tom Gooding
Senior Engineer / Blue Gene SW Lead / CAPI
tgooding at us.ibm.com   507-253-0747



From:    Hal Finkel <hfinkel at anl.gov>
To:      "John A. Biddiscombe" <biddisco at cscs.ch>
Cc:      llvm-bgq-discuss at lists.alcf.anl.gov
Date:    03/25/2014 12:03 PM
Subject: Re: [Llvm-bgq-discuss] clang on BGQ performance
Sent by: llvm-bgq-discuss-bounces at lists.alcf.anl.gov





John,

Thanks for looking into this (and providing a useful benchmark)! You'll
find this interesting:

bgclang -O3 -fopenmp with 1 thread:

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             635.7     0.251708     0.251708     0.251709
Scale:            519.7     0.307855     0.307855     0.307856
Add:              802.0     0.299267     0.299266     0.299267
Triad:            753.4     0.318716     0.318566     0.318735
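
As a cross-check on tables like the one above, the "Best Rate" column follows from the "Min time" column: bytes moved per kernel divided by the minimum time. The sketch below assumes 8-byte doubles and an array length of 10,000,000 elements. That length is not stated anywhere in this thread; it is an inference, chosen because it reproduces the reported rates. The function name stream_rate_mbs is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Rate in MB/s (1 MB = 1e6 bytes) for one kernel.
 * arrays_touched: 2 for Copy and Scale (one read, one write),
 *                 3 for Add and Triad (two reads, one write).
 * n:              array length in elements (ASSUMED, see lead-in).
 * min_time_s:     the "Min time" column, in seconds. */
double stream_rate_mbs(int arrays_touched, size_t n, double min_time_s)
{
    assert(arrays_touched > 0 && min_time_s > 0.0);
    double bytes = (double)arrays_touched * (double)sizeof(double) * (double)n;
    return 1.0e-6 * bytes / min_time_s;
}
```

Under that assumption, stream_rate_mbs(2, 10000000, 0.251708) comes out at roughly 635.7, matching the Copy row, and stream_rate_mbs(3, 10000000, 0.318566) at roughly 753.4, matching the Triad row.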

gcc 4.7.2 -O3 -fopenmp with 1 thread:

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2067.4     0.077393     0.077392     0.077395
Scale:           1329.4     0.120353     0.120353     0.120354
Add:             1943.5     0.123490     0.123489     0.123490
Triad:           1872.4     0.128179     0.128178     0.128179

gcc without OpenMP is actually slightly worse, go figure ;)

bgclang -O3 (no -fopenmp) with 1 thread:

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           15660.2     0.010296     0.010217     0.010870
Scale:           5523.7     0.028967     0.028966     0.028967
Add:             6283.2     0.038198     0.038197     0.038198
Triad:           6331.9     0.037906     0.037903     0.037920

bgxlc_r -O3 -qsmp=omp with 1 thread:

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            3762.0     0.042535     0.042531     0.042538
Scale:           5083.5     0.031481     0.031474     0.031494
Add:             7394.2     0.032487     0.032458     0.032510
Triad:           7397.6     0.032481     0.032443     0.032499

bgxlc_r -O3 (no -qsmp=omp) with 1 thread:

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            3574.1     0.044768     0.044767     0.044769
Scale:           3301.2     0.048468     0.048467     0.048469
Add:             4233.2     0.056696     0.056694     0.056699
Triad:           4350.1     0.055173     0.055171     0.055177

All of these runs defined TUNED (just because it puts the kernels into separate
functions). It seems that the OpenMP outlining in Clang/LLVM is seriously
interfering with the ability of the vectorizer and instruction scheduler to
do useful work. I assume that most of this is because of pointer aliasing
information being lost in the OpenMP transformation. We'll need to work on
this! (I'm actually in the middle of working on a new pointer aliasing
framework for LLVM, and I'll be able to use that to solve a lot of these
issues).
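
A simplified picture of the aliasing problem, as a sketch only: this is not Clang's actual generated code, and the struct triad_ctx, the array names, and the three function names are all hypothetical. In the direct form the compiler can see three distinct global arrays. In an outlined form the loop body only receives pointers through an opaque argument, so without further information it has to assume they may overlap. Copying them into restrict-qualified locals is one way to state the no-overlap guarantee by hand.

```c
#include <assert.h>
#include <stddef.h>

#define TRIAD_N 1024

double triad_a[TRIAD_N], triad_b[TRIAD_N], triad_c[TRIAD_N];

/* Direct form: the three arrays are distinct globals, so the compiler
 * knows that stores to triad_a cannot change triad_b or triad_c. */
void triad_direct(double scalar)
{
    for (size_t j = 0; j < TRIAD_N; j++)
        triad_a[j] = triad_b[j] + scalar * triad_c[j];
}

/* Hypothetical context block standing in for whatever an outlining
 * transformation passes to the outlined loop body. */
struct triad_ctx {
    double *a;
    const double *b;
    const double *c;
    double scalar;
    size_t n;
};

/* Outlined form: as far as the compiler can tell, a, b and c may
 * overlap, which restricts vectorization and scheduling. */
void triad_outlined(void *arg)
{
    struct triad_ctx *ctx = arg;
    assert(ctx != NULL);
    for (size_t j = 0; j < ctx->n; j++)
        ctx->a[j] = ctx->b[j] + ctx->scalar * ctx->c[j];
}

/* Same outlined body, with the no-overlap promise restored by hand
 * through restrict-qualified local pointers. The caller must make
 * sure the arrays really do not overlap. */
void triad_outlined_restrict(void *arg)
{
    struct triad_ctx *ctx = arg;
    assert(ctx != NULL);
    double *restrict pa = ctx->a;
    const double *restrict pb = ctx->b;
    const double *restrict pc = ctx->c;
    const double scalar = ctx->scalar;
    const size_t n = ctx->n;
    for (size_t j = 0; j < n; j++)
        pa[j] = pb[j] + scalar * pc[j];
}
```

All three versions compute the same values; the difference is only in how much the optimizer is allowed to assume about the pointers.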

 -Hal

----- Original Message -----
> From: "John A. Biddiscombe" <biddisco at cscs.ch>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-bgq-discuss at lists.alcf.anl.gov
> Sent: Tuesday, March 25, 2014 11:44:50 AM
> Subject: RE: [Llvm-bgq-discuss] clang on BGQ performance
>
> > Can you please provide details on exactly what you did? What
> > compile flags
> > did you use, did you define TUNED?
>
> I edited the Makefile to skip the Fortran build and set the bgclang variables:
>
> bbpbgas040:~/bgas/clang/build/stream$ cat Makefile
>
> CC = bgclang
> CFLAGS = -O3 -fopenmp -L/gpfs/bbp.cscs.ch/home/biddisco/apps/clang/bgclang/omp/lib/
>
> all:  stream_c.exe
>
> stream_c.exe: stream.c
>         $(CC) $(CFLAGS) stream.c -o stream_c.exe
>
> clean:
>         rm -f stream_c.exe *.o
>
>
> Then I just ran make. I didn't set any other variables (such as TUNED).
>
>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
_______________________________________________
llvm-bgq-discuss mailing list
llvm-bgq-discuss at lists.alcf.anl.gov
https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss

