[Llvm-bgq-discuss] clang on BGQ performance

Thomas Gooding tgooding at us.ibm.com
Tue Mar 25 09:58:03 CDT 2014


John,

I/O nodes have 68 hardware threads available; however, a few services
running on the I/O node will take CPU time. Core 0 handles PCIe interrupts
(which impacts performance on "cpus" 0-3), and bgvrnic uses cpus 66 and 67.
I'm not sure how clang's OpenMP runtime binds software threads to cpus;
maybe there's a way to avoid those cpus.

I assume you're seeing this (lack of) performance only with the OpenMP
builds?

Tom

Tom Gooding
Senior Engineer / Blue Gene SW Lead / CAPI
tgooding at us.ibm.com   507-253-0747



From:      "Biddiscombe, John A." <biddisco at cscs.ch>
To:        "llvm-bgq-discuss at lists.alcf.anl.gov" <llvm-bgq-discuss at lists.alcf.anl.gov>
Date:      03/25/2014 08:58 AM
Subject:   [Llvm-bgq-discuss] clang on BGQ performance
Sent by:   llvm-bgq-discuss-bounces at lists.alcf.anl.gov

Dear people

I’ve been getting terrible performance from my application, which is
intended to run on I/O nodes, so I’ve been poking around to find out what
might be wrong.

Today I compiled the simple STREAM memory bandwidth test from
http://www.cs.virginia.edu/stream/FTP/Code/
I’ve run it with up to 60 OpenMP threads (because, for reasons I don’t
understand, the I/O node only shows 15*4 threads).

The results for bgclang seem to echo what I’ve been finding with my code. I
have not yet fully tested my own code with gcc, as I only installed it
recently.

Any advice on what I might try to improve the bgclang numbers? In some
cases gcc looks 2x better.

Note that my program doesn’t use OpenMP, so I don’t directly care much about
this particular example, but the trend mirrors what I’m seeing with HPX
threads.

thanks

JB

Using bgclang version 20140309:

export OMP_NUM_THREADS=1
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             659.5     0.242635     0.242601     0.242724
Scale:            536.2     0.298403     0.298376     0.298535
Add:              828.5     0.289701     0.289669     0.289839
Triad:            711.8     0.337206     0.337151     0.337325
-------------------------------------------------------------
export OMP_NUM_THREADS=2
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            1318.8     0.121335     0.121322     0.121360
Scale:           1072.5     0.149223     0.149185     0.149375
Add:             1657.2     0.144868     0.144823     0.145036
Triad:           1423.8     0.168611     0.168565     0.168755
-------------------------------------------------------------
export OMP_NUM_THREADS=4
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2636.4     0.060729     0.060688     0.060919
Scale:           2236.9     0.071580     0.071529     0.071774
Add:             3311.2     0.072555     0.072482     0.072750
Triad:           2845.6     0.084426     0.084341     0.084540
-------------------------------------------------------------
export OMP_NUM_THREADS=8
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            5265.6     0.030446     0.030386     0.030614
Scale:           4468.1     0.035848     0.035809     0.036030
Add:             6611.9     0.036341     0.036298     0.036526
Triad:           5684.9     0.042258     0.042217     0.042420
-------------------------------------------------------------
export OMP_NUM_THREADS=16
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            9390.8     0.018977     0.017038     0.025704
Scale:           7688.2     0.021786     0.020811     0.029255
Add:            11985.7     0.020990     0.020024     0.028394
Triad:          10875.0     0.023131     0.022069     0.031470
-------------------------------------------------------------
export OMP_NUM_THREADS=32
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           15556.4     0.011463     0.010285     0.012906
Scale:          13361.1     0.013228     0.011975     0.014883
Add:            20438.0     0.012872     0.011743     0.014259
Triad:          18047.8     0.014270     0.013298     0.016016
-------------------------------------------------------------
export OMP_NUM_THREADS=60
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           11472.0     0.016570     0.013947     0.022287
Scale:          10145.1     0.019031     0.015771     0.028346
Add:            15317.9     0.018322     0.015668     0.025756
Triad:          14106.8     0.018959     0.017013     0.025986
-------------------------------------------------------------

Using GCC 4.8.2:
export OMP_NUM_THREADS=1
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            3534.4     0.045289     0.045270     0.045306
Scale:           1318.8     0.121390     0.121325     0.121632
Add:             1899.0     0.126403     0.126384     0.126428
Triad:           1910.3     0.125667     0.125637     0.125724
-------------------------------------------------------------
export OMP_NUM_THREADS=2
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            7053.2     0.022716     0.022685     0.022744
Scale:           2613.9     0.061247     0.061211     0.061278
Add:             3794.3     0.063271     0.063252     0.063292
Triad:           3794.4     0.063288     0.063251     0.063449
-------------------------------------------------------------
export OMP_NUM_THREADS=4
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           13999.4     0.011470     0.011429     0.011494
Scale:           5218.5     0.030683     0.030660     0.030729
Add:             7585.3     0.031647     0.031640     0.031681
Triad:           7583.4     0.031663     0.031648     0.031690
-------------------------------------------------------------
export OMP_NUM_THREADS=8
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           25910.8     0.006205     0.006175     0.006233
Scale:          10432.9     0.015373     0.015336     0.015484
Add:            15130.5     0.015922     0.015862     0.016092
Triad:          15116.2     0.015971     0.015877     0.016139
-------------------------------------------------------------
export OMP_NUM_THREADS=16
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           28433.5     0.005643     0.005627     0.005665
Scale:          20547.1     0.007831     0.007787     0.007860
Add:            27006.3     0.008922     0.008887     0.008948
Triad:          27758.5     0.008658     0.008646     0.008672
-------------------------------------------------------------
export OMP_NUM_THREADS=32
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           28368.6     0.005673     0.005640     0.005742
Scale:          26302.8     0.006115     0.006083     0.006175
Add:            27164.4     0.008878     0.008835     0.008960
Triad:          27691.3     0.008702     0.008667     0.008744
-------------------------------------------------------------
export OMP_NUM_THREADS=60
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           25715.2     0.008484     0.006222     0.012176
Scale:          22472.2     0.012979     0.007120     0.021724
Add:            25319.6     0.014178     0.009479     0.023234
Triad:          25591.9     0.013839     0.009378     0.023146
-------------------------------------------------------------



--
John Biddiscombe,                        email:biddisco @.at.@ cscs.ch
http://www.cscs.ch/
CSCS, Swiss National Supercomputing Centre  | Tel:  +41 (91) 610.82.07
Via Trevano 131, 6900 Lugano, Switzerland   | Fax:  +41 (91) 610.82.82
 _______________________________________________
llvm-bgq-discuss mailing list
llvm-bgq-discuss at lists.alcf.anl.gov
https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss
