[Llvm-bgq-discuss] clang on BGQ performance

Biddiscombe, John A. biddisco at cscs.ch
Tue Mar 25 08:57:56 CDT 2014


Dear people

I've had terrible performance from my application, which is intended to run on IO nodes, so I've been poking around to find out what might be wrong.

Today I compiled the simple STREAM memory bandwidth test from http://www.cs.virginia.edu/stream/FTP/Code/
I ran it with OpenMP, using up to 60 threads (for reasons I don't understand, the IO node only shows 15*4 threads).

The results for bgclang seem to echo what I've been finding with my code. I have not fully tested my own code with gcc yet, as I only got it installed recently.

Any advice on what I might try to improve the bgclang numbers? In some cases gcc looks 2x better.

Note that my program doesn't use OpenMP, so I don't care much about this particular example directly, but the trend mirrors what I'm seeing with HPX threads.

thanks

JB

using bgclang version 20140309

export OMP_NUM_THREADS=1
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             659.5     0.242635     0.242601     0.242724
Scale:            536.2     0.298403     0.298376     0.298535
Add:              828.5     0.289701     0.289669     0.289839
Triad:            711.8     0.337206     0.337151     0.337325
-------------------------------------------------------------
export OMP_NUM_THREADS=2
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            1318.8     0.121335     0.121322     0.121360
Scale:           1072.5     0.149223     0.149185     0.149375
Add:             1657.2     0.144868     0.144823     0.145036
Triad:           1423.8     0.168611     0.168565     0.168755
-------------------------------------------------------------
export OMP_NUM_THREADS=4
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2636.4     0.060729     0.060688     0.060919
Scale:           2236.9     0.071580     0.071529     0.071774
Add:             3311.2     0.072555     0.072482     0.072750
Triad:           2845.6     0.084426     0.084341     0.084540
-------------------------------------------------------------
export OMP_NUM_THREADS=8
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            5265.6     0.030446     0.030386     0.030614
Scale:           4468.1     0.035848     0.035809     0.036030
Add:             6611.9     0.036341     0.036298     0.036526
Triad:           5684.9     0.042258     0.042217     0.042420
-------------------------------------------------------------
export OMP_NUM_THREADS=16
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            9390.8     0.018977     0.017038     0.025704
Scale:           7688.2     0.021786     0.020811     0.029255
Add:            11985.7     0.020990     0.020024     0.028394
Triad:          10875.0     0.023131     0.022069     0.031470
-------------------------------------------------------------
export OMP_NUM_THREADS=32
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           15556.4     0.011463     0.010285     0.012906
Scale:          13361.1     0.013228     0.011975     0.014883
Add:            20438.0     0.012872     0.011743     0.014259
Triad:          18047.8     0.014270     0.013298     0.016016
-------------------------------------------------------------
export OMP_NUM_THREADS=60
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           11472.0     0.016570     0.013947     0.022287
Scale:          10145.1     0.019031     0.015771     0.028346
Add:            15317.9     0.018322     0.015668     0.025756
Triad:          14106.8     0.018959     0.017013     0.025986
-------------------------------------------------------------
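
To put the bgclang scaling in numbers, here is a minimal Python sketch that turns the Triad "Best Rate MB/s" column above into speedup and parallel efficiency relative to the 1-thread run. The helper name `scaling` and the variable names are made up for illustration; only the rates are taken from the tables.

```python
# Triad "Best Rate MB/s" for bgclang, copied from the tables above.
threads = [1, 2, 4, 8, 16, 32, 60]
triad_mb_s = [711.8, 1423.8, 2845.6, 5684.9, 10875.0, 18047.8, 14106.8]


def scaling(threads, rates):
    """Return (threads, speedup, efficiency) tuples relative to the first entry.

    Hypothetical helper: speedup is the rate divided by the per-thread rate
    of the first run, efficiency is speedup divided by the thread count.
    """
    base = rates[0] / threads[0]
    rows = []
    for n, rate in zip(threads, rates):
        speedup = rate / base
        rows.append((n, speedup, speedup / n))
    return rows


bgclang_triad = scaling(threads, triad_mb_s)
for n, speedup, eff in bgclang_triad:
    print(f"{n:3d} threads: speedup {speedup:6.2f}, efficiency {eff:5.2f}")
```

An efficiency close to 1.0 means the rate grows linearly with the thread count; the drop in rate between 32 and 60 threads shows up as a lower speedup on the last row.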

using GCC 4.8.2

export OMP_NUM_THREADS=1
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            3534.4     0.045289     0.045270     0.045306
Scale:           1318.8     0.121390     0.121325     0.121632
Add:             1899.0     0.126403     0.126384     0.126428
Triad:           1910.3     0.125667     0.125637     0.125724
-------------------------------------------------------------
export OMP_NUM_THREADS=2
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            7053.2     0.022716     0.022685     0.022744
Scale:           2613.9     0.061247     0.061211     0.061278
Add:             3794.3     0.063271     0.063252     0.063292
Triad:           3794.4     0.063288     0.063251     0.063449
-------------------------------------------------------------
export OMP_NUM_THREADS=4
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           13999.4     0.011470     0.011429     0.011494
Scale:           5218.5     0.030683     0.030660     0.030729
Add:             7585.3     0.031647     0.031640     0.031681
Triad:           7583.4     0.031663     0.031648     0.031690
-------------------------------------------------------------
export OMP_NUM_THREADS=8
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           25910.8     0.006205     0.006175     0.006233
Scale:          10432.9     0.015373     0.015336     0.015484
Add:            15130.5     0.015922     0.015862     0.016092
Triad:          15116.2     0.015971     0.015877     0.016139
-------------------------------------------------------------
export OMP_NUM_THREADS=16
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           28433.5     0.005643     0.005627     0.005665
Scale:          20547.1     0.007831     0.007787     0.007860
Add:            27006.3     0.008922     0.008887     0.008948
Triad:          27758.5     0.008658     0.008646     0.008672
-------------------------------------------------------------
export OMP_NUM_THREADS=32
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           28368.6     0.005673     0.005640     0.005742
Scale:          26302.8     0.006115     0.006083     0.006175
Add:            27164.4     0.008878     0.008835     0.008960
Triad:          27691.3     0.008702     0.008667     0.008744
-------------------------------------------------------------
export OMP_NUM_THREADS=60
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           25715.2     0.008484     0.006222     0.012176
Scale:          22472.2     0.012979     0.007120     0.021724
Add:            25319.6     0.014178     0.009479     0.023234
Triad:          25591.9     0.013839     0.009378     0.023146
-------------------------------------------------------------
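
The gap between the two compilers can be read off directly as a ratio of the "Best Rate MB/s" columns. The following is a rough Python sketch, not part of the benchmark itself; the dictionary layout and the `ratios` helper are assumptions chosen for illustration, and only the 1-thread and 32-thread rows from the tables above are included.

```python
# "Best Rate MB/s" values copied from the tables above,
# keyed by OMP_NUM_THREADS and then by STREAM kernel.
bgclang = {
    1: {"Copy": 659.5, "Scale": 536.2, "Add": 828.5, "Triad": 711.8},
    32: {"Copy": 15556.4, "Scale": 13361.1, "Add": 20438.0, "Triad": 18047.8},
}
gcc = {
    1: {"Copy": 3534.4, "Scale": 1318.8, "Add": 1899.0, "Triad": 1910.3},
    32: {"Copy": 28368.6, "Scale": 26302.8, "Add": 27164.4, "Triad": 27691.3},
}


def ratios(numerator, denominator):
    """Hypothetical helper: per-kernel ratio of two rate dictionaries."""
    return {kernel: numerator[kernel] / denominator[kernel] for kernel in numerator}


gcc_over_bgclang = {n: ratios(gcc[n], bgclang[n]) for n in bgclang}

for n, per_kernel in gcc_over_bgclang.items():
    line = ", ".join(f"{kernel} {value:.2f}x" for kernel, value in per_kernel.items())
    print(f"OMP_NUM_THREADS={n}: gcc/bgclang = {line}")
```

With the figures above, the largest single-thread gap is in Copy, and the ratios at 32 threads are smaller than at 1 thread for every kernel.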



--
John Biddiscombe,                        email:biddisco @.at.@ cscs.ch
http://www.cscs.ch/
CSCS, Swiss National Supercomputing Centre  | Tel:  +41 (91) 610.82.07
Via Trevano 131, 6900 Lugano, Switzerland   | Fax:  +41 (91) 610.82.82
