[Llvm-bgq-discuss] clang on BGQ performance
Hal Finkel
hfinkel at anl.gov
Tue Mar 25 11:34:00 CDT 2014
----- Original Message -----
> From: "John A. Biddiscombe" <biddisco at cscs.ch>
> To: llvm-bgq-discuss at lists.alcf.anl.gov
> Sent: Tuesday, March 25, 2014 8:57:56 AM
> Subject: [Llvm-bgq-discuss] clang on BGQ performance
>
>
>
>
>
> Dear people
>
>
>
> I’ve had terrible performance from my application, which is intended
> to run on IO nodes, so I’ve been poking around to try to find out
> what might be wrong.
>
>
>
> Today I compiled the simple STREAM memory bandwidth test from
> http://www.cs.virginia.edu/stream/FTP/Code/
Can you please provide details on exactly what you did? Which compile flags did you use, and did you define TUNED?
-Hal
>
> I’ve run it with up to 60 OpenMP threads (for reasons I don’t
> understand, the IO node only shows 15*4 threads).
>
>
>
> The results for bgclang seem to echo what I’ve been finding with my
> code. I have not tested my code fully with gcc, as I only recently
> got that installed.
>
>
>
> Any advice on what I might try to improve the bgclang numbers? In
> some cases gcc looks 2x better.
>
>
>
> Note that my program doesn’t use OpenMP, so I don’t directly care
> much about this particular example, but the trend mirrors what I’m
> seeing with HPX threads.
>
>
>
> thanks
>
>
>
> JB
>
>
>
> using bgclang version 20140309
>
>
>
> export OMP_NUM_THREADS=1
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:             659.5     0.242635     0.242601     0.242724
> Scale:            536.2     0.298403     0.298376     0.298535
> Add:              828.5     0.289701     0.289669     0.289839
> Triad:            711.8     0.337206     0.337151     0.337325
> -------------------------------------------------------------
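For readers comparing the columns: STREAM reports "Best Rate" as bytes moved divided by the minimum time, where Copy and Scale touch two arrays and Add and Triad touch three. The following is a minimal sketch of that relationship. The array length of 10,000,000 eight-byte elements is an assumption (it is not stated in this thread), chosen because it reproduces the single-thread bgclang figures quoted above; the function name is illustrative only.

```python
# Sketch: how a STREAM "Best Rate MB/s" value relates to its "Min time".
# ASSUMPTION (not stated in the thread): each array holds 10,000,000
# elements of 8 bytes; this happens to reproduce the quoted figures.
ARRAY_SIZE = 10_000_000
BYTES_PER_ELEMENT = 8

# Number of arrays each kernel reads or writes per iteration.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def best_rate_mbps(kernel, min_time_s):
    """Return the bandwidth in MB/s (1 MB = 10**6 bytes) for one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_SIZE
    return 1.0e-6 * bytes_moved / min_time_s


if __name__ == "__main__":
    # Min times taken from the bgclang OMP_NUM_THREADS=1 run above.
    for kernel, min_time in [("Copy", 0.242601), ("Scale", 0.298376),
                             ("Add", 0.289669), ("Triad", 0.337151)]:
        print(f"{kernel:6s} {best_rate_mbps(kernel, min_time):8.1f} MB/s")
```

Because the rate is derived from the best (minimum) time only, a large spread between Min time and Max time does not show up in the Best Rate column.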
>
> export OMP_NUM_THREADS=2
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            1318.8     0.121335     0.121322     0.121360
> Scale:           1072.5     0.149223     0.149185     0.149375
> Add:             1657.2     0.144868     0.144823     0.145036
> Triad:           1423.8     0.168611     0.168565     0.168755
> -------------------------------------------------------------
>
> export OMP_NUM_THREADS=4
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            2636.4     0.060729     0.060688     0.060919
> Scale:           2236.9     0.071580     0.071529     0.071774
> Add:             3311.2     0.072555     0.072482     0.072750
> Triad:           2845.6     0.084426     0.084341     0.084540
> -------------------------------------------------------------
>
> export OMP_NUM_THREADS=8
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            5265.6     0.030446     0.030386     0.030614
> Scale:           4468.1     0.035848     0.035809     0.036030
> Add:             6611.9     0.036341     0.036298     0.036526
> Triad:           5684.9     0.042258     0.042217     0.042420
> -------------------------------------------------------------
>
> export OMP_NUM_THREADS=16
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            9390.8     0.018977     0.017038     0.025704
> Scale:           7688.2     0.021786     0.020811     0.029255
> Add:            11985.7     0.020990     0.020024     0.028394
> Triad:          10875.0     0.023131     0.022069     0.031470
> -------------------------------------------------------------
>
> export OMP_NUM_THREADS=32
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           15556.4     0.011463     0.010285     0.012906
> Scale:          13361.1     0.013228     0.011975     0.014883
> Add:            20438.0     0.012872     0.011743     0.014259
> Triad:          18047.8     0.014270     0.013298     0.016016
> -------------------------------------------------------------
>
> export OMP_NUM_THREADS=60
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           11472.0     0.016570     0.013947     0.022287
> Scale:          10145.1     0.019031     0.015771     0.028346
> Add:            15317.9     0.018322     0.015668     0.025756
> Triad:          14106.8     0.018959     0.017013     0.025986
> -------------------------------------------------------------
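One way to read the bgclang runs above is as a thread-scaling curve. The sketch below computes speedup and parallel efficiency relative to the single-thread run, using the Triad "Best Rate" values copied from the output above; the function and variable names are illustrative. It only restates the quoted numbers and makes no claim about why the 60-thread run falls below the 32-thread run.

```python
# bgclang Triad "Best Rate" (MB/s) keyed by OMP_NUM_THREADS,
# copied from the STREAM output quoted above.
TRIAD_MBPS = {
    1: 711.8, 2: 1423.8, 4: 2845.6, 8: 5684.9,
    16: 10875.0, 32: 18047.8, 60: 14106.8,
}


def scaling(rates):
    """Map thread count -> (speedup vs. 1 thread, efficiency per thread)."""
    base = rates[1]
    return {n: (r / base, r / base / n) for n, r in sorted(rates.items())}


if __name__ == "__main__":
    for n, (speedup, efficiency) in scaling(TRIAD_MBPS).items():
        print(f"{n:2d} threads: speedup {speedup:5.2f}x, "
              f"efficiency {efficiency:4.0%}")
```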
>
>
>
> using GCC 4.8.2
>
> export OMP_NUM_THREADS=1
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            3534.4     0.045289     0.045270     0.045306
> Scale:           1318.8     0.121390     0.121325     0.121632
> Add:             1899.0     0.126403     0.126384     0.126428
> Triad:           1910.3     0.125667     0.125637     0.125724
> -------------------------------------------------------------
>
> export OMP_NUM_THREADS=2
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            7053.2     0.022716     0.022685     0.022744
> Scale:           2613.9     0.061247     0.061211     0.061278
> Add:             3794.3     0.063271     0.063252     0.063292
> Triad:           3794.4     0.063288     0.063251     0.063449
> -------------------------------------------------------------
>
> export OMP_NUM_THREADS=4
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           13999.4     0.011470     0.011429     0.011494
> Scale:           5218.5     0.030683     0.030660     0.030729
> Add:             7585.3     0.031647     0.031640     0.031681
> Triad:           7583.4     0.031663     0.031648     0.031690
> -------------------------------------------------------------
>
> export OMP_NUM_THREADS=8
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           25910.8     0.006205     0.006175     0.006233
> Scale:          10432.9     0.015373     0.015336     0.015484
> Add:            15130.5     0.015922     0.015862     0.016092
> Triad:          15116.2     0.015971     0.015877     0.016139
> -------------------------------------------------------------
>
> export OMP_NUM_THREADS=16
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           28433.5     0.005643     0.005627     0.005665
> Scale:          20547.1     0.007831     0.007787     0.007860
> Add:            27006.3     0.008922     0.008887     0.008948
> Triad:          27758.5     0.008658     0.008646     0.008672
> -------------------------------------------------------------
>
> export OMP_NUM_THREADS=32
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           28368.6     0.005673     0.005640     0.005742
> Scale:          26302.8     0.006115     0.006083     0.006175
> Add:            27164.4     0.008878     0.008835     0.008960
> Triad:          27691.3     0.008702     0.008667     0.008744
> -------------------------------------------------------------
>
> export OMP_NUM_THREADS=60
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           25715.2     0.008484     0.006222     0.012176
> Scale:          22472.2     0.012979     0.007120     0.021724
> Add:            25319.6     0.014178     0.009479     0.023234
> Triad:          25591.9     0.013839     0.009378     0.023146
> -------------------------------------------------------------
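To put a number on the "2x" remark, the gap can be computed per kernel and per thread count as the ratio of the GCC 4.8.2 rate to the bgclang rate. The sketch below does this with the "Best Rate" values copied from the two sets of runs above; the names are illustrative, and it says nothing about the cause of the difference.

```python
# "Best Rate" (MB/s) values copied from the STREAM output quoted above.
# Each tuple is ordered (Copy, Scale, Add, Triad); keys are OMP_NUM_THREADS.
KERNELS = ("Copy", "Scale", "Add", "Triad")

BGCLANG = {
    1: (659.5, 536.2, 828.5, 711.8),
    2: (1318.8, 1072.5, 1657.2, 1423.8),
    4: (2636.4, 2236.9, 3311.2, 2845.6),
    8: (5265.6, 4468.1, 6611.9, 5684.9),
    16: (9390.8, 7688.2, 11985.7, 10875.0),
    32: (15556.4, 13361.1, 20438.0, 18047.8),
    60: (11472.0, 10145.1, 15317.9, 14106.8),
}

GCC = {
    1: (3534.4, 1318.8, 1899.0, 1910.3),
    2: (7053.2, 2613.9, 3794.3, 3794.4),
    4: (13999.4, 5218.5, 7585.3, 7583.4),
    8: (25910.8, 10432.9, 15130.5, 15116.2),
    16: (28433.5, 20547.1, 27006.3, 27758.5),
    32: (28368.6, 26302.8, 27164.4, 27691.3),
    60: (25715.2, 22472.2, 25319.6, 25591.9),
}


def ratios(threads):
    """Return {kernel: gcc_rate / bgclang_rate} for one thread count."""
    return {k: g / b
            for k, g, b in zip(KERNELS, GCC[threads], BGCLANG[threads])}


if __name__ == "__main__":
    print("threads " + " ".join(f"{k:>6s}" for k in KERNELS))
    for n in sorted(GCC):
        r = ratios(n)
        print(f"{n:7d} " + " ".join(f"{r[k]:6.2f}" for k in KERNELS))
```

On these figures the ratio is largest for Copy at low thread counts (a little over 5x at one thread) and narrows to under 2x for every kernel at 32 threads.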
>
>
>
>
>
>
>
> --
>
> John Biddiscombe, email:biddisco @.at.@ cscs.ch
>
> http://www.cscs.ch/
>
> CSCS, Swiss National Supercomputing Centre | Tel: +41 (91) 610.82.07
>
> Via Trevano 131, 6900 Lugano, Switzerland | Fax: +41 (91) 610.82.82
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory