[Llvm-bgq-discuss] clang on BGQ performance

Biddiscombe, John A. biddisco at cscs.ch
Tue Mar 25 10:25:29 CDT 2014


Tom

Well, I’m not using OpenMP myself; I am using HPX, which has its own thread scheduling (Thomas Heller reads this list and knows the details).

My best results so far have been obtained using a command line that passes thread placement settings via hwloc:


bin/H5FDdsmRaw_bandwidth_rw --hpx:print-bind --hpx:threads=15 --hpx:bind=thread:0-14=socket:0-14 2048 Block 16777216 VirtualRAM

Here I’m attempting to place one thread on each of the 15 CPUs that I can see with hwloc. What I’d like to know now is whether there’s a way to avoid the cores used by the IO node services that are running (for example, there are always two bgvrnic processes running, each consuming 100% CPU; I assume these are servicing IO requests from CNK).

I was planning on asking that exact question to the IBM contacts here to see if they know how to skip the cores that the services are using (if it is just one). The problem is that hwloc doesn’t seem to give the correct results either, so I’m experimenting a bit.

I just looked in my email from last week, and I see this about bgvrnic: “If there's no communication with the compute nodes, they are "just" spin-waiting and shouldn't have an impact - unless you get processes scheduled onto the same core (i.e. CPU 56-59).”
You mention that they are running on 66/67. Is it possible to reconcile these numbers by taking into account a different counting method (i.e., one that does not include some CPUs)?
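
To make the comparison concrete, here is a minimal sketch that maps a hardware-thread ("cpu") number to a core index. It assumes 4 hardware threads per core, numbered contiguously; that assumption and the helper names are mine, not something hwloc or the IO node kernel reported. Under that assumption, CPUs 56-59 land on core 14 and CPUs 66/67 on core 16, i.e. two cores apart. Whether that gap comes from a numbering that leaves some cores out is not something this sketch can tell.

```python
# Sketch only: map hardware-thread numbers to core indices.
# ASSUMPTION: 4 hardware threads per core, numbered contiguously from 0.
HWTHREADS_PER_CORE = 4


def core_of(cpu: int) -> int:
    """Return the core index that hardware thread `cpu` belongs to."""
    return cpu // HWTHREADS_PER_CORE


def cores_of(cpus):
    """Return the sorted list of distinct cores covered by `cpus`."""
    return sorted({core_of(c) for c in cpus})


if __name__ == "__main__":
    # Numbers quoted in last week's email (56-59) and in Tom's reply (66/67).
    print("CPUs 56-59 -> cores", cores_of(range(56, 60)))
    print("CPUs 66-67 -> cores", cores_of([66, 67]))
```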

JB




From: Thomas Gooding [mailto:tgooding at us.ibm.com]
Sent: 25 March 2014 15:58
To: Biddiscombe, John A.
Cc: llvm-bgq-discuss at lists.alcf.anl.gov
Subject: Re: [Llvm-bgq-discuss] clang on BGQ performance


John,

IO nodes have 68 hwthreads available; however, there are a few services running on the IO node that will take CPU.  Core 0 takes PCIe interrupts (impacts performance on "cpus" 0-3) and bgvrnic takes cpus 66 and 67.  I'm not sure how clang's OMP binds software threads to cpus; maybe there's a way to avoid those cpus.
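
As a rough illustration of "avoiding those cpus", the sketch below builds the list of hardware threads left over once cpus 0-3 and 66/67 are excluded, and prints it as a range string. The numbers come from the paragraph above; the function names are hypothetical. A string in this form is what GCC's libgomp accepts in GOMP_CPU_AFFINITY, but whether bgclang's OpenMP runtime or HPX honours an equivalent setting on the IO node is an open question here.

```python
# Sketch only: list the hardware threads not claimed by IO node services.
TOTAL_HWTHREADS = 68             # hwthreads available on an IO node
PCIE_CPUS = set(range(0, 4))     # core 0 ("cpus" 0-3) takes PCIe interrupts
BGVRNIC_CPUS = {66, 67}          # cpus used by bgvrnic


def usable_cpus(total=TOTAL_HWTHREADS, reserved=PCIE_CPUS | BGVRNIC_CPUS):
    """Return the cpu numbers in [0, total) that are not reserved."""
    return [c for c in range(total) if c not in reserved]


def to_ranges(cpus):
    """Compress cpu numbers into a string such as '0-1,3,5-6'."""
    ranges = []
    start = prev = None
    for c in sorted(cpus):
        if start is None:
            start = prev = c
        elif c == prev + 1:
            prev = c
        else:
            ranges.append((start, prev))
            start = prev = c
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


if __name__ == "__main__":
    cpus = usable_cpus()
    print(len(cpus), "usable hwthreads:", to_ranges(cpus))
```

On Linux, a list like this could also be handed to os.sched_setaffinity(0, cpus) before starting worker threads; I have not tried that on an IO node.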

I assume you're seeing this (lack of) performance only with the OpenMP builds?

Tom

Tom Gooding
Senior Engineer / Blue Gene SW Lead / CAPI
tgooding at us.ibm.com   507-253-0747



From: "Biddiscombe, John A." <biddisco at cscs.ch>
To: "llvm-bgq-discuss at lists.alcf.anl.gov" <llvm-bgq-discuss at lists.alcf.anl.gov>
Date: 03/25/2014 08:58 AM
Subject: [Llvm-bgq-discuss] clang on BGQ performance
Sent by: llvm-bgq-discuss-bounces at lists.alcf.anl.gov

________________________________



Dear people

I’ve had terrible performance from my application, which is intended to run on IO nodes, so I’ve been poking around to try to find out what might be wrong.

Today I compiled a simple STREAM memory bandwidth test from http://www.cs.virginia.edu/stream/FTP/Code/
I’ve run it using OpenMP with up to 60 threads (because, for reasons I don’t understand, the IO node only shows 15*4 threads).

The results for bgclang seem to echo what I’ve been finding with my code. I have not tested my stuff fully with gcc as I only just got that installed recently.

Any advice on what I might try to improve the bgclang numbers? In some cases gcc looks 2x better or more.

Note that my program doesn’t use OpenMP, so I don’t directly care much about this particular example, but the trend mirrors what I’m seeing with HPX threads.

thanks

JB

using bgclang version 20140309

export OMP_NUM_THREADS=1
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             659.5     0.242635     0.242601     0.242724
Scale:            536.2     0.298403     0.298376     0.298535
Add:              828.5     0.289701     0.289669     0.289839
Triad:            711.8     0.337206     0.337151     0.337325
-------------------------------------------------------------
export OMP_NUM_THREADS=2
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            1318.8     0.121335     0.121322     0.121360
Scale:           1072.5     0.149223     0.149185     0.149375
Add:             1657.2     0.144868     0.144823     0.145036
Triad:           1423.8     0.168611     0.168565     0.168755
-------------------------------------------------------------
export OMP_NUM_THREADS=4
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2636.4     0.060729     0.060688     0.060919
Scale:           2236.9     0.071580     0.071529     0.071774
Add:             3311.2     0.072555     0.072482     0.072750
Triad:           2845.6     0.084426     0.084341     0.084540
-------------------------------------------------------------
export OMP_NUM_THREADS=8
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            5265.6     0.030446     0.030386     0.030614
Scale:           4468.1     0.035848     0.035809     0.036030
Add:             6611.9     0.036341     0.036298     0.036526
Triad:           5684.9     0.042258     0.042217     0.042420
-------------------------------------------------------------
export OMP_NUM_THREADS=16
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            9390.8     0.018977     0.017038     0.025704
Scale:           7688.2     0.021786     0.020811     0.029255
Add:            11985.7     0.020990     0.020024     0.028394
Triad:          10875.0     0.023131     0.022069     0.031470
-------------------------------------------------------------
export OMP_NUM_THREADS=32
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           15556.4     0.011463     0.010285     0.012906
Scale:          13361.1     0.013228     0.011975     0.014883
Add:            20438.0     0.012872     0.011743     0.014259
Triad:          18047.8     0.014270     0.013298     0.016016
-------------------------------------------------------------
export OMP_NUM_THREADS=60
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           11472.0     0.016570     0.013947     0.022287
Scale:          10145.1     0.019031     0.015771     0.028346
Add:            15317.9     0.018322     0.015668     0.025756
Triad:          14106.8     0.018959     0.017013     0.025986
-------------------------------------------------------------

using GCC 4.8.2
export OMP_NUM_THREADS=1
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            3534.4     0.045289     0.045270     0.045306
Scale:           1318.8     0.121390     0.121325     0.121632
Add:             1899.0     0.126403     0.126384     0.126428
Triad:           1910.3     0.125667     0.125637     0.125724
-------------------------------------------------------------
export OMP_NUM_THREADS=2
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            7053.2     0.022716     0.022685     0.022744
Scale:           2613.9     0.061247     0.061211     0.061278
Add:             3794.3     0.063271     0.063252     0.063292
Triad:           3794.4     0.063288     0.063251     0.063449
-------------------------------------------------------------
export OMP_NUM_THREADS=4
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           13999.4     0.011470     0.011429     0.011494
Scale:           5218.5     0.030683     0.030660     0.030729
Add:             7585.3     0.031647     0.031640     0.031681
Triad:           7583.4     0.031663     0.031648     0.031690
-------------------------------------------------------------
export OMP_NUM_THREADS=8
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           25910.8     0.006205     0.006175     0.006233
Scale:          10432.9     0.015373     0.015336     0.015484
Add:            15130.5     0.015922     0.015862     0.016092
Triad:          15116.2     0.015971     0.015877     0.016139
-------------------------------------------------------------
export OMP_NUM_THREADS=16
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           28433.5     0.005643     0.005627     0.005665
Scale:          20547.1     0.007831     0.007787     0.007860
Add:            27006.3     0.008922     0.008887     0.008948
Triad:          27758.5     0.008658     0.008646     0.008672
-------------------------------------------------------------
export OMP_NUM_THREADS=32
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           28368.6     0.005673     0.005640     0.005742
Scale:          26302.8     0.006115     0.006083     0.006175
Add:            27164.4     0.008878     0.008835     0.008960
Triad:          27691.3     0.008702     0.008667     0.008744
-------------------------------------------------------------
export OMP_NUM_THREADS=60
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           25715.2     0.008484     0.006222     0.012176
Scale:          22472.2     0.012979     0.007120     0.021724
Add:            25319.6     0.014178     0.009479     0.023234
Triad:          25591.9     0.013839     0.009378     0.023146
-------------------------------------------------------------
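
To put a number on the gap, here is a small sketch that computes the gcc/bgclang ratio of the "Best Rate MB/s" column for each kernel and thread count. The rates are copied from the two sets of tables above; the dictionary layout and function names are only for illustration. With these figures gcc is ahead in every cell, by roughly 1.3x to 5.4x, and the largest gaps are in Copy at low thread counts.

```python
# Sketch only: ratio of gcc to bgclang STREAM "Best Rate MB/s" values.
KERNELS = ("Copy", "Scale", "Add", "Triad")

# threads -> (Copy, Scale, Add, Triad), copied from the tables above.
BGCLANG = {
    1: (659.5, 536.2, 828.5, 711.8),
    2: (1318.8, 1072.5, 1657.2, 1423.8),
    4: (2636.4, 2236.9, 3311.2, 2845.6),
    8: (5265.6, 4468.1, 6611.9, 5684.9),
    16: (9390.8, 7688.2, 11985.7, 10875.0),
    32: (15556.4, 13361.1, 20438.0, 18047.8),
    60: (11472.0, 10145.1, 15317.9, 14106.8),
}
GCC = {
    1: (3534.4, 1318.8, 1899.0, 1910.3),
    2: (7053.2, 2613.9, 3794.3, 3794.4),
    4: (13999.4, 5218.5, 7585.3, 7583.4),
    8: (25910.8, 10432.9, 15130.5, 15116.2),
    16: (28433.5, 20547.1, 27006.3, 27758.5),
    32: (28368.6, 26302.8, 27164.4, 27691.3),
    60: (25715.2, 22472.2, 25319.6, 25591.9),
}


def ratio(threads: int, kernel: str) -> float:
    """Return gcc rate divided by bgclang rate for one table cell."""
    i = KERNELS.index(kernel)
    return GCC[threads][i] / BGCLANG[threads][i]


def all_ratios():
    """Return {(threads, kernel): ratio} for every cell."""
    return {(t, k): ratio(t, k) for t in BGCLANG for k in KERNELS}


if __name__ == "__main__":
    print("threads " + " ".join(f"{k:>7}" for k in KERNELS))
    for t in sorted(BGCLANG):
        print(f"{t:7d} " + " ".join(f"{ratio(t, k):7.2f}" for k in KERNELS))
```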



--
John Biddiscombe,                        email:biddisco @.at.@ cscs.ch
http://www.cscs.ch/
CSCS, Swiss National Supercomputing Centre  | Tel:  +41 (91) 610.82.07
Via Trevano 131, 6900 Lugano, Switzerland   | Fax:  +41 (91) 610.82.82
 _______________________________________________
llvm-bgq-discuss mailing list
llvm-bgq-discuss at lists.alcf.anl.gov
https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss
