[Llvm-bgq-discuss] clang on BGQ performance

Jeff Hammond jhammond at anl.gov
Tue Mar 25 10:32:56 CDT 2014


And what happens if you turn off OpenMP at compile time?  I wonder if the
LLVM OpenMP runtime just sucks too much right now on BGQ.  Hal and I have
looked at it enough that I would believe this explanation.

Jeff

On Tue, Mar 25, 2014 at 10:25 AM, Biddiscombe, John A. <biddisco at cscs.ch> wrote:

>  Tom
>
>
>
> Well, I'm not using OpenMP myself; I am using HPX, which has its own thread
> scheduling (Thomas Heller reads this list and knows the details).
>
>
>
> My best results so far have been obtained using a command line that passes
> some location settings via hwloc:
>
>
>
> bin/H5FDdsmRaw_bandwidth_rw --hpx:print-bind --hpx:threads=15
> --hpx:bind=thread:0-14=socket:0-14 2048 Block 16777216 VirtualRAM
>
>
>
> Here I'm attempting to place one thread on each of the 15 cpus that I can
> see with hwloc. Now, is there a way I can avoid the IO node services that
> are running? (For example, there are always 2 bgvrnic processes running,
> each consuming 100% cpu; I assume these are servicing IO requests from CNK.)
>
>
>
> I was planning on asking that exact question to the IBM contacts here to
> see if they know how to skip the cores that the services are using (if just
> one). The problem is that hwloc doesn't seem to give the correct results
> either, so I'm experimenting a bit.
>
>
>
> I just looked in my email from last week and I see this about bgvrnic: "If
> there's no communication with the compute nodes, they are "just"
> spin-waiting and shouldn't have an impact - unless you get processes
> scheduled onto the same core (i.e. CPU 56-59)."
>
> You mention that they are running on 66/67 - is it possible to reconcile
> these numbers by taking into account a different counting method (i.e. not
> including some)?
>
>
>
> JB
>
> *From:* Thomas Gooding [mailto:tgooding at us.ibm.com]
> *Sent:* 25 March 2014 15:58
> *To:* Biddiscombe, John A.
> *Cc:* llvm-bgq-discuss at lists.alcf.anl.gov
> *Subject:* Re: [Llvm-bgq-discuss] clang on BGQ performance
>
>
>
> John,
>
> IO nodes have 68 hwthreads available; however, there are a few services
> running on the IO node that will take CPU.  Core 0 takes PCIe interrupts
> (impacts performance on "cpus" 0-3) and bgvrnic takes cpus 66 and 67.  I'm
> not sure how clang's OMP binds software threads to cpus; maybe there's a
> way to avoid those cpus.
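
A minimal sketch of the cpu arithmetic in the paragraph above, written in Python for brevity. It assumes the numbering Tom gives (68 hwthreads, core 0 covering cpus 0-3, bgvrnic on cpus 66 and 67). The names `usable_cpus` and `pin_current_process` are hypothetical helpers, not part of HPX, hwloc, or any OpenMP runtime, and the reserved set would have to change if the node counts cpus differently (for example the 56-59 figure quoted earlier in the thread).

```python
import os

def usable_cpus(total_hwthreads=68, reserved=(0, 1, 2, 3, 66, 67)):
    """Return hwthread ids that are not in the reserved set.

    The defaults are assumptions taken from Tom's description:
    cpus 0-3 (core 0, PCIe interrupts) and cpus 66-67 (bgvrnic).
    """
    reserved = set(reserved)
    return [cpu for cpu in range(total_hwthreads) if cpu not in reserved]

def pin_current_process(cpus):
    """Restrict the calling process to `cpus` where the OS allows it.

    Only ids that the OS already exposes to this process are kept.
    Returns the set actually applied, or None if nothing was changed.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # affinity calls are only available on Linux
    allowed = os.sched_getaffinity(0) & set(cpus)
    if not allowed:
        return None
    os.sched_setaffinity(0, allowed)
    return allowed

cpus = usable_cpus()
print(len(cpus), cpus[0], cpus[-1])
```

With Tom's numbers this leaves cpus 4 through 65, that is 62 hwthreads. Whether a given threading runtime honours an affinity mask inherited from the process is a separate question that this sketch does not answer.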
>
> I assume you're seeing this (lack of) performance only with the OpenMP
> builds?
>
> Tom
>
> Tom Gooding
> Senior Engineer / Blue Gene SW Lead / CAPI
> tgooding at us.ibm.com   507-253-0747
>
>
>
> From: "Biddiscombe, John A." <biddisco at cscs.ch>
> To: "llvm-bgq-discuss at lists.alcf.anl.gov" <llvm-bgq-discuss at lists.alcf.anl.gov>
> Date: 03/25/2014 08:58 AM
> Subject: [Llvm-bgq-discuss] clang on BGQ performance
> Sent by: llvm-bgq-discuss-bounces at lists.alcf.anl.gov
>    ------------------------------
>
>
>
>
> Dear people
>
> I've had terrible performance from my application, which is intended to run
> on IO nodes, so I've been poking around to try to find out what might be wrong.
>
> Today I compiled a simple STREAM memory bandwidth test from
> http://www.cs.virginia.edu/stream/FTP/Code/
> I've run it using OpenMP threads up to 60 (because, for reasons I don't
> understand, the IO node only shows 15*4 threads).
>
> The results for bgclang seem to echo what I've been finding with my code.
> I have not tested my stuff fully with gcc as I only just got that installed
> recently.
>
> Any advice on what I might try to improve the bgclang numbers? In some
> cases gcc looks 2x better.
>
> Note that my program doesn't use OpenMP, so I don't directly care much
> about this particular example, but the trend mirrors what I'm seeing with
> HPX threads.
>
> thanks
>
> JB
>
> using bgclang version 20140309
>
> export OMP_NUM_THREADS=1
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:             659.5     0.242635     0.242601     0.242724
> Scale:            536.2     0.298403     0.298376     0.298535
> Add:              828.5     0.289701     0.289669     0.289839
> Triad:            711.8     0.337206     0.337151     0.337325
> -------------------------------------------------------------
> export OMP_NUM_THREADS=2
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            1318.8     0.121335     0.121322     0.121360
> Scale:           1072.5     0.149223     0.149185     0.149375
> Add:             1657.2     0.144868     0.144823     0.145036
> Triad:           1423.8     0.168611     0.168565     0.168755
> -------------------------------------------------------------
> export OMP_NUM_THREADS=4
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            2636.4     0.060729     0.060688     0.060919
> Scale:           2236.9     0.071580     0.071529     0.071774
> Add:             3311.2     0.072555     0.072482     0.072750
> Triad:           2845.6     0.084426     0.084341     0.084540
> -------------------------------------------------------------
> export OMP_NUM_THREADS=8
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            5265.6     0.030446     0.030386     0.030614
> Scale:           4468.1     0.035848     0.035809     0.036030
> Add:             6611.9     0.036341     0.036298     0.036526
> Triad:           5684.9     0.042258     0.042217     0.042420
> -------------------------------------------------------------
> export OMP_NUM_THREADS=16
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            9390.8     0.018977     0.017038     0.025704
> Scale:           7688.2     0.021786     0.020811     0.029255
> Add:            11985.7     0.020990     0.020024     0.028394
> Triad:          10875.0     0.023131     0.022069     0.031470
> -------------------------------------------------------------
> export OMP_NUM_THREADS=32
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           15556.4     0.011463     0.010285     0.012906
> Scale:          13361.1     0.013228     0.011975     0.014883
> Add:            20438.0     0.012872     0.011743     0.014259
> Triad:          18047.8     0.014270     0.013298     0.016016
> -------------------------------------------------------------
> export OMP_NUM_THREADS=60
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           11472.0     0.016570     0.013947     0.022287
> Scale:          10145.1     0.019031     0.015771     0.028346
> Add:            15317.9     0.018322     0.015668     0.025756
> Triad:          14106.8     0.018959     0.017013     0.025986
> -------------------------------------------------------------
>
> using GCC 4.8.2
> export OMP_NUM_THREADS=1
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            3534.4     0.045289     0.045270     0.045306
> Scale:           1318.8     0.121390     0.121325     0.121632
> Add:             1899.0     0.126403     0.126384     0.126428
> Triad:           1910.3     0.125667     0.125637     0.125724
> -------------------------------------------------------------
> export OMP_NUM_THREADS=2
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            7053.2     0.022716     0.022685     0.022744
> Scale:           2613.9     0.061247     0.061211     0.061278
> Add:             3794.3     0.063271     0.063252     0.063292
> Triad:           3794.4     0.063288     0.063251     0.063449
> -------------------------------------------------------------
> export OMP_NUM_THREADS=4
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           13999.4     0.011470     0.011429     0.011494
> Scale:           5218.5     0.030683     0.030660     0.030729
> Add:             7585.3     0.031647     0.031640     0.031681
> Triad:           7583.4     0.031663     0.031648     0.031690
> -------------------------------------------------------------
> export OMP_NUM_THREADS=8
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           25910.8     0.006205     0.006175     0.006233
> Scale:          10432.9     0.015373     0.015336     0.015484
> Add:            15130.5     0.015922     0.015862     0.016092
> Triad:          15116.2     0.015971     0.015877     0.016139
> -------------------------------------------------------------
> export OMP_NUM_THREADS=16
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           28433.5     0.005643     0.005627     0.005665
> Scale:          20547.1     0.007831     0.007787     0.007860
> Add:            27006.3     0.008922     0.008887     0.008948
> Triad:          27758.5     0.008658     0.008646     0.008672
> -------------------------------------------------------------
> export OMP_NUM_THREADS=32
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           28368.6     0.005673     0.005640     0.005742
> Scale:          26302.8     0.006115     0.006083     0.006175
> Add:            27164.4     0.008878     0.008835     0.008960
> Triad:          27691.3     0.008702     0.008667     0.008744
> -------------------------------------------------------------
> export OMP_NUM_THREADS=60
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           25715.2     0.008484     0.006222     0.012176
> Scale:          22472.2     0.012979     0.007120     0.021724
> Add:            25319.6     0.014178     0.009479     0.023234
> Triad:          25591.9     0.013839     0.009378     0.023146
> -------------------------------------------------------------
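
To put a number on the gap between the two sets of tables above, here is a small sketch in Python. The Triad "Best Rate MB/s" values are copied from the posted output; the function names are hypothetical, and "scaling efficiency" here simply means the measured rate divided by perfect linear scaling from the 1-thread rate, which is one possible yardstick rather than anything the benchmark itself reports.

```python
# Triad "Best Rate MB/s" copied from the tables above, keyed by OMP_NUM_THREADS.
BGCLANG_TRIAD = {1: 711.8, 2: 1423.8, 4: 2845.6, 8: 5684.9,
                 16: 10875.0, 32: 18047.8, 60: 14106.8}
GCC_TRIAD = {1: 1910.3, 2: 3794.4, 4: 7583.4, 8: 15116.2,
             16: 27758.5, 32: 27691.3, 60: 25591.9}

def gcc_over_bgclang(threads):
    """Ratio of the GCC Triad rate to the bgclang Triad rate."""
    return GCC_TRIAD[threads] / BGCLANG_TRIAD[threads]

def scaling_efficiency(rates, threads):
    """Measured rate divided by linear scaling from the 1-thread rate."""
    return rates[threads] / (threads * rates[1])

print("threads  gcc/bgclang  bgclang eff.  gcc eff.")
for n in sorted(BGCLANG_TRIAD):
    print(f"{n:7d}  {gcc_over_bgclang(n):11.2f}  "
          f"{scaling_efficiency(BGCLANG_TRIAD, n):12.2f}  "
          f"{scaling_efficiency(GCC_TRIAD, n):8.2f}")
```

On the posted numbers the Triad ratio is about 2.7x up to 8 threads and narrows to roughly 1.5x at 32 threads, where the GCC build has already stopped scaling while the bgclang build is still gaining.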
>
>
>
> --
> John Biddiscombe,                        email:biddisco @.at.@ cscs.ch
> http://www.cscs.ch/
> CSCS, Swiss National Supercomputing Centre  | Tel:  +41 (91) 610.82.07
> Via Trevano 131, 6900 Lugano, Switzerland   | Fax:  +41 (91) 610.82.82
>  _______________________________________________
> llvm-bgq-discuss mailing list
> llvm-bgq-discuss at lists.alcf.anl.gov
> https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss
>
>


-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at anl.gov <jhammond at alcf.anl.gov> / jhammond at uchicago.edu / (630)
252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond