[Llvm-bgq-discuss] clang on BGQ performance
Jeff Hammond
jhammond at anl.gov
Tue Mar 25 10:32:56 CDT 2014
And what happens if you turn off OpenMP at compile time? I wonder if the
LLVM OpenMP runtime just sucks too much right now on BGQ. Hal and I have
looked at it enough that I would believe this explanation.
Jeff
On Tue, Mar 25, 2014 at 10:25 AM, Biddiscombe, John A. <biddisco at cscs.ch> wrote:
> Tom
>
> Well, I'm not using OpenMP myself; I am using HPX, which has its own thread
> scheduling (Thomas Heller reads this list and knows the details).
>
> My best results so far have been obtained using a command line that passes
> some placement settings via hwloc:
>
> bin/H5FDdsmRaw_bandwidth_rw --hpx:print-bind --hpx:threads=15
> --hpx:bind=thread:0-14=socket:0-14 2048 Block 16777216 VirtualRAM
>
> Here I'm attempting to place one thread on each of the 15 cpus that I can
> see with hwloc. What I'd like is a way to avoid the I/O node services that
> are running (for example, there are always two bgvrnic processes running,
> each consuming 100% CPU; I assume these are servicing I/O requests from CNK).
>
> I was planning on asking that exact question to the IBM contacts here to
> see if they know how to skip the cores that the services are using (if just
> one). The problem is that hwloc doesn't seem to give the correct results
> either, so I'm experimenting a bit.
>
> I just looked in my email from last week and I see that for bgvrnic: "If
> there's no communication with the compute nodes, they are "just"
> spin-waiting and shouldn't have an impact - unless you get processes
> scheduled onto the same core (i.e. CPU 56-59).".
>
> You mention that they are running on 66/67. Is it possible to reconcile
> these numbers by taking into account a different counting method (i.e., not
> including some)?
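One way to compare the two sets of cpu numbers is to map each cpu id to a core index. The sketch below assumes a plain linear numbering with four hardware threads per core, which fits the "15*4 threads" and "68 hwthreads" figures in this thread but is not confirmed for the I/O node kernel. The helper names are made up for illustration.

```python
THREADS_PER_CORE = 4  # assumption: 4 hardware threads per core, numbered linearly


def core_of(cpu):
    """Return the core index that a cpu id falls on under linear numbering."""
    return cpu // THREADS_PER_CORE


def cpus_of(core):
    """Return the cpu ids that belong to a core under linear numbering."""
    first = core * THREADS_PER_CORE
    return list(range(first, first + THREADS_PER_CORE))


if __name__ == "__main__":
    for cpu in (56, 59, 66, 67):
        print("cpu", cpu, "-> core", core_of(cpu))
```

Under that assumption, cpus 56-59 are the last core of a 60-thread view and cpus 66/67 are on the last core of a 68-thread view, so the two reports could be describing the same core counted differently. That is only a guess.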
>
>
>
> JB
>
> *From:* Thomas Gooding [mailto:tgooding at us.ibm.com]
> *Sent:* 25 March 2014 15:58
> *To:* Biddiscombe, John A.
> *Cc:* llvm-bgq-discuss at lists.alcf.anl.gov
> *Subject:* Re: [Llvm-bgq-discuss] clang on BGQ performance
>
>
>
> John,
>
> I/O nodes have 68 hwthreads available; however, a few services running
> on the I/O node will take CPU. Core 0 takes PCIe interrupts (which impacts
> performance on "cpus" 0-3) and bgvrnic takes cpus 66 and 67. I'm not sure
> how clang's OMP binds software threads to cpus; maybe there's a way to
> avoid those cpus.
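A minimal sketch of how the busy cpus could be skipped, assuming the numbering Tom gives (68 hwthreads, cpus 0-3 affected by PCIe interrupts, cpus 66 and 67 used by bgvrnic). The function names are hypothetical; os.sched_getaffinity and os.sched_setaffinity are the standard Python wrappers for the Linux affinity calls and are not available on every platform. Whether the OpenMP or HPX runtime then respects the process mask is not something this thread confirms.

```python
import os

TOTAL_HWTHREADS = 68                      # "68 hwthreads available"
RESERVED = set(range(0, 4)) | {66, 67}    # cpus 0-3 (PCIe interrupts), 66 and 67 (bgvrnic)


def usable_cpus(total=TOTAL_HWTHREADS, reserved=RESERVED):
    """Return the sorted cpu ids that are not claimed by a node service."""
    return [cpu for cpu in range(total) if cpu not in reserved]


def pin_to_usable(cpus):
    """Restrict the calling process to `cpus` where the OS supports it.

    Only ids the process is already allowed to run on are kept, so the call
    does not fail on a machine with fewer cpus. Returns the set that was
    applied, or None if nothing was changed.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None                       # platform without affinity calls
    allowed = os.sched_getaffinity(0) & set(cpus)
    if not allowed:
        return None                       # none of the requested cpus exist here
    os.sched_setaffinity(0, allowed)
    return allowed
```

Calling pin_to_usable(usable_cpus()) before the worker threads are created would leave cpus 4-65 in the mask. Threads created afterwards normally inherit it.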
>
> I assume you're seeing this (lack of) performance only with the OpenMP
> builds?
>
> Tom
>
> Tom Gooding
> Senior Engineer / Blue Gene SW Lead / CAPI
> tgooding at us.ibm.com 507-253-0747
>
>
> From: "Biddiscombe, John A." <biddisco at cscs.ch>
> To: "llvm-bgq-discuss at lists.alcf.anl.gov" <llvm-bgq-discuss at lists.alcf.anl.gov>
> Date: 03/25/2014 08:58 AM
> Subject: [Llvm-bgq-discuss] clang on BGQ performance
> Sent by: llvm-bgq-discuss-bounces at lists.alcf.anl.gov
> ------------------------------
>
>
>
>
> Dear people
>
> I've had terrible performance from my application, which is intended to run
> on IO nodes, so I've been poking around to try to find out what might be wrong.
>
> Today I compiled a simple STREAM memory test from
> http://www.cs.virginia.edu/stream/FTP/Code/
> and ran it using OpenMP threads up to 60 (because, for reasons I don't
> understand, the IO node only shows 15*4 threads).
>
> The results for bgclang seem to echo what I've been finding with my code.
> I have not tested my stuff fully with gcc, as I only just got that
> installed.
>
> Any advice on what I might try to improve the bgclang numbers? In some
> cases gcc looks 2x better.
>
> Note that my program doesn't use OpenMP, so I don't directly care much
> about this particular example, but the trend mirrors what I'm seeing with
> HPX threads.
>
> thanks
>
> JB
>
> using bgclang version 20140309
>
> export OMP_NUM_THREADS=1
> -------------------------------------------------------------
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 659.5 0.242635 0.242601 0.242724
> Scale: 536.2 0.298403 0.298376 0.298535
> Add: 828.5 0.289701 0.289669 0.289839
> Triad: 711.8 0.337206 0.337151 0.337325
> -------------------------------------------------------------
> export OMP_NUM_THREADS=2
> -------------------------------------------------------------
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 1318.8 0.121335 0.121322 0.121360
> Scale: 1072.5 0.149223 0.149185 0.149375
> Add: 1657.2 0.144868 0.144823 0.145036
> Triad: 1423.8 0.168611 0.168565 0.168755
> -------------------------------------------------------------
> export OMP_NUM_THREADS=4
> -------------------------------------------------------------
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 2636.4 0.060729 0.060688 0.060919
> Scale: 2236.9 0.071580 0.071529 0.071774
> Add: 3311.2 0.072555 0.072482 0.072750
> Triad: 2845.6 0.084426 0.084341 0.084540
> -------------------------------------------------------------
> export OMP_NUM_THREADS=8
> -------------------------------------------------------------
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 5265.6 0.030446 0.030386 0.030614
> Scale: 4468.1 0.035848 0.035809 0.036030
> Add: 6611.9 0.036341 0.036298 0.036526
> Triad: 5684.9 0.042258 0.042217 0.042420
> -------------------------------------------------------------
> export OMP_NUM_THREADS=16
> -------------------------------------------------------------
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 9390.8 0.018977 0.017038 0.025704
> Scale: 7688.2 0.021786 0.020811 0.029255
> Add: 11985.7 0.020990 0.020024 0.028394
> Triad: 10875.0 0.023131 0.022069 0.031470
> -------------------------------------------------------------
> export OMP_NUM_THREADS=32
> -------------------------------------------------------------
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 15556.4 0.011463 0.010285 0.012906
> Scale: 13361.1 0.013228 0.011975 0.014883
> Add: 20438.0 0.012872 0.011743 0.014259
> Triad: 18047.8 0.014270 0.013298 0.016016
> -------------------------------------------------------------
> export OMP_NUM_THREADS=60
> -------------------------------------------------------------
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 11472.0 0.016570 0.013947 0.022287
> Scale: 10145.1 0.019031 0.015771 0.028346
> Add: 15317.9 0.018322 0.015668 0.025756
> Triad: 14106.8 0.018959 0.017013 0.025986
> -------------------------------------------------------------
>
> using GCC 4.8.2
> export OMP_NUM_THREADS=1
> -------------------------------------------------------------
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 3534.4 0.045289 0.045270 0.045306
> Scale: 1318.8 0.121390 0.121325 0.121632
> Add: 1899.0 0.126403 0.126384 0.126428
> Triad: 1910.3 0.125667 0.125637 0.125724
> -------------------------------------------------------------
> export OMP_NUM_THREADS=2
> -------------------------------------------------------------
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 7053.2 0.022716 0.022685 0.022744
> Scale: 2613.9 0.061247 0.061211 0.061278
> Add: 3794.3 0.063271 0.063252 0.063292
> Triad: 3794.4 0.063288 0.063251 0.063449
> -------------------------------------------------------------
> export OMP_NUM_THREADS=4
> -------------------------------------------------------------
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 13999.4 0.011470 0.011429 0.011494
> Scale: 5218.5 0.030683 0.030660 0.030729
> Add: 7585.3 0.031647 0.031640 0.031681
> Triad: 7583.4 0.031663 0.031648 0.031690
> -------------------------------------------------------------
> export OMP_NUM_THREADS=8
> -------------------------------------------------------------
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 25910.8 0.006205 0.006175 0.006233
> Scale: 10432.9 0.015373 0.015336 0.015484
> Add: 15130.5 0.015922 0.015862 0.016092
> Triad: 15116.2 0.015971 0.015877 0.016139
> -------------------------------------------------------------
> export OMP_NUM_THREADS=16
> -------------------------------------------------------------
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 28433.5 0.005643 0.005627 0.005665
> Scale: 20547.1 0.007831 0.007787 0.007860
> Add: 27006.3 0.008922 0.008887 0.008948
> Triad: 27758.5 0.008658 0.008646 0.008672
> -------------------------------------------------------------
> export OMP_NUM_THREADS=32
> -------------------------------------------------------------
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 28368.6 0.005673 0.005640 0.005742
> Scale: 26302.8 0.006115 0.006083 0.006175
> Add: 27164.4 0.008878 0.008835 0.008960
> Triad: 27691.3 0.008702 0.008667 0.008744
> -------------------------------------------------------------
> export OMP_NUM_THREADS=60
> -------------------------------------------------------------
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 25715.2 0.008484 0.006222 0.012176
> Scale: 22472.2 0.012979 0.007120 0.021724
> Add: 25319.6 0.014178 0.009479 0.023234
> Triad: 25591.9 0.013839 0.009378 0.023146
> -------------------------------------------------------------
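To put a number on "2x better", the Triad rows of the tables above can be reduced to ratios. This is a small sketch that only does arithmetic on the Best Rate values copied from this thread; the variable and function names are made up for illustration.

```python
# Best Rate MB/s for the Triad kernel, copied from the tables in this thread.
THREADS = [1, 2, 4, 8, 16, 32, 60]
TRIAD_BGCLANG = [711.8, 1423.8, 2845.6, 5684.9, 10875.0, 18047.8, 14106.8]
TRIAD_GCC = [1910.3, 3794.4, 7583.4, 15116.2, 27758.5, 27691.3, 25591.9]


def ratios(numerators, denominators):
    """Element-wise ratio of two equally long lists of rates."""
    return [n / d for n, d in zip(numerators, denominators)]


def speedups(rates):
    """Rate at each thread count relative to the single-thread rate."""
    return [r / rates[0] for r in rates]


gcc_over_clang = ratios(TRIAD_GCC, TRIAD_BGCLANG)
clang_scaling = speedups(TRIAD_BGCLANG)
gcc_scaling = speedups(TRIAD_GCC)

print("threads  gcc/bgclang  bgclang speedup  gcc speedup")
for row in zip(THREADS, gcc_over_clang, clang_scaling, gcc_scaling):
    print("%7d  %11.2f  %15.2f  %11.2f" % row)
```

For Triad the gcc/bgclang ratio comes out at roughly 2.7 with one thread and roughly 1.8 with 60 threads. The same helpers can be pointed at the Copy, Scale, or Add rows.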
>
>
>
> --
> John Biddiscombe, email:biddisco @.at.@ cscs.ch
> http://www.cscs.ch/
> CSCS, Swiss National Supercomputing Centre | Tel: +41 (91) 610.82.07
> Via Trevano 131, 6900 Lugano, Switzerland | Fax: +41 (91) 610.82.82
> _______________________________________________
> llvm-bgq-discuss mailing list
> llvm-bgq-discuss at lists.alcf.anl.gov
> https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss
>
>
>
--
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at anl.gov <jhammond at alcf.anl.gov> / jhammond at uchicago.edu / (630)
252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond