[Llvm-bgq-discuss] more issues from trying bgclang with GROMACS

Mark Abraham mark.j.abraham at gmail.com
Fri Feb 7 20:12:16 CST 2014


They were .c files, unfortunately.
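
On the question quoted below about preprocessing only certain files: a minimal sketch of one option is to inline the quoted #include "..." directives textually and leave every #include <...> line alone. This is only an illustration, not something from bgclang or GROMACS. The function name expand_local_includes and the search_dirs parameter are made up for this sketch, and it does not evaluate #ifdef or macros, so it is not a replacement for a real -E run.

```python
import re
from pathlib import Path

# Matches only the quoted form, e.g. #include "file.h".
# Angle-bracket (system) includes never match and are passed through.
LOCAL_INCLUDE = re.compile(r'^\s*#\s*include\s*"([^"]+)"')


def expand_local_includes(path, search_dirs=(), _stack=()):
    """Return the text of `path` with #include "..." lines inlined.

    Assumption: purely textual inlining. Conditionals and macros are
    left untouched, so include guards still do their job at compile time.
    """
    path = Path(path).resolve()
    if path in _stack:
        raise RuntimeError(f"recursive include of {path}")
    out = []
    for line in path.read_text().splitlines():
        match = LOCAL_INCLUDE.match(line)
        if match is None:
            out.append(line)  # ordinary code and <system.h> includes
            continue
        # Look next to the including file first, then in the extra directories.
        candidates = [path.parent / match.group(1)]
        candidates += [Path(d) / match.group(1) for d in search_dirs]
        target = next((c for c in candidates if c.is_file()), None)
        if target is None:
            out.append(line)  # not found locally: keep the directive as-is
        else:
            out.append(expand_local_includes(target, search_dirs, _stack + (path,)))
    return "\n".join(out)
```

Usage would be something like print(expand_local_includes("kernel.c")), where kernel.c is a placeholder file name. The result still contains the system #include lines, so the compiler resolves those itself.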

Mark


On Sat, Feb 8, 2014 at 3:02 AM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "Mark Abraham" <mark.j.abraham at gmail.com>
> > Cc: llvm-bgq-discuss at lists.alcf.anl.gov
> > Sent: Friday, February 7, 2014 7:57:34 PM
> > Subject: Re: [Llvm-bgq-discuss] more issues from trying bgclang with GROMACS
> >
> >
> >
> > Hi,
> >
> >
> > The preprocessing I hoped to do turns out to be non-trivial. Doing it
> > with -E using bgclang leads to qpxintrin.h being included. Then when
> > I go to do a test compilation on the preprocessed source,
> > qpxintrin.h appears to get magically re-included, and apparently I
> > can't keep redefining vec_ld, etc. Hacking out the local copy of
> > qpxintrin.h didn't help. The -no*inc options of the preprocessor
> > didn't help - they only suppress search, not the attempt to
> > #include.
> >
> >
> > Does anyone know how to do C preprocessing, but limit the scope to
> > just certain files? e.g. leave system #includes intact?
>
> This is a bug that I've not fixed yet. As I recall, it has to do with
> file-name extensions (in part). What is the file-name extension on your
> preprocessed file? I think if you name it .c or .cpp, etc. it should work.
> Naming it .i (or whatever) confuses the logic that wants to include
> qpxintrin.h.
>
>  -Hal
>
> >
> >
> > Mark
> >
> >
> >
> > On Fri, Feb 7, 2014 at 8:10 PM, Hal Finkel < hfinkel at anl.gov > wrote:
> >
> >
> >
> > ----- Original Message -----
> > > From: "Mark Abraham" < mark.abraham at scilifelab.se >
> >
> >
> > > To: llvm-bgq-discuss at lists.alcf.anl.gov
> > > Sent: Friday, February 7, 2014 12:35:26 PM
> > > Subject: Re: [Llvm-bgq-discuss] more issues from trying bgclang
> > > with GROMACS
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Feb 7, 2014 at 3:44 PM, Hal Finkel < hfinkel at anl.gov >
> > > wrote:
> > >
> > >
> > >
> > > ----- Original Message -----
> > > > From: "Mark Abraham" < mark.abraham at scilifelab.se >
> > > > Cc: llvm-bgq-discuss at lists.alcf.anl.gov
> > > > Sent: Friday, February 7, 2014 8:29:18 AM
> > > > Subject: Re: [Llvm-bgq-discuss] more issues from trying bgclang
> > > > with GROMACS
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > > > Hi Hal,
> > > >
> > > > AFAICS not the problem, but it's hard for a mere user to find
> > > > this
> > > > kind of stuff out:
> > > >
> > > > [juqueen3 ~ (juq-homedir)] $ ls /bgsys/drivers/
> > > > ppcfloor toolchain V1R2M0 V1R2M1
> > >
> > > Yep; that seems good. FYI, if you run ls -l /bgsys/drivers you can
> > > see to which driver ppcfloor is symlinked, and that will give you
> > > the answer.
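
The ls -l check above can also be scripted. A small sketch in Python, assuming only that ppcfloor is a symlink under /bgsys/drivers as the listing shows; the function name active_driver is made up for illustration.

```python
import os


def active_driver(link="/bgsys/drivers/ppcfloor"):
    """Return the fully resolved path that the `ppcfloor` symlink points to."""
    # os.path.realpath follows symlinks, which is the same information
    # as the arrow shown in `ls -l` output.
    return os.path.realpath(link)


if __name__ == "__main__":
    print(active_driver())
```

The driver version is then whichever V... directory name appears in the printed path.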
> > >
> > >
> > >
> > >
> > > Yes, V1R2M1.
> > >
> > >
> > >
> > >
> > > >
> > > > Jeff helpfully tried to sort me out with an ALCF account last
> > > > year,
> > > > but the cryptocard they shipped never worked with the PIN they
> > > > sent
> > > > with it, and the helpdesk insisted on me calling from Sweden to
> > > > get
> > > > any help at all, so I gave up. :-( That was about the time
> > > > Congress
> > > > decided to act more like children than usual, so maybe things
> > > > were
> > > > messier than usual! :-)
> > >
> > > That's odd. They should be able to authenticate your identity by
> > > some mechanism other than caller ID. :( When you have a chance,
> > > please try again; if they still won't help you, I'll raise the
> > > issue internally.
> > >
> > >
> > >
> > > OK.
> > >
> > >
> > >
> > > On the bgclang issue, if you can reasonably provide instructions on
> > > how to repeat a test showing this issue, I can try it on my end as
> > > well. More likely than not, if there is a correctness issue to
> > > debug, I'd need to do that at some point anyway.
> > >
> > >
> > >
> > > I haven't pinpointed where the problem arises this time, but last
> > > time it was with an omp parallel do over threads executing the
> > > innermost kernels for MD nonbonded interactions.
> > >
> > > I've looked into how simple I can make reproducing the OpenMP
> > > crash,
> > > and it is ugly. The OpenMP debug build (-g) runs OK, with plain C
> > > and QPX-specific kernels. The OpenMP release build (-O3) runs OK
> > > with plain C, but gives junk results with QPX somewhere leading to
> > > a
> > > subsequent segfault. So the core file stack trace is not useful.
> > >
> > > Altogether, that strongly suggests that the problems start to occur
> > > with the same omp parallel do, and that they are at least somewhat
> > > specific to bgclang. It might be possible to do a quick and dirty
> > > comparison of the resulting energy and force at -O3 -g between the
> > > C and QPX versions. My guess is that the problem will be visible by
> > > the end of the very first inner loop, which should be binary
> > > reproducible between the two versions. If not, then the problem
> > > probably arises when the subsequent reduction from the thread-local
> > > force buffers occurs. I just don't have the time to try that: I
> > > might be on the wrong track, or sit waiting while trying to run two
> > > parallel debugging sessions that lead to no conclusion, and we
> > > don't actually need bgclang to work because xlc does. Happy to
> > > advise if there's something you can identify, though. I can send
> > > you a tarball of code + build instructions + a single input file
> > > off list if you'd like to try.
> >
> > Yes, please.
> >
> > -Hal
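
The comparison suggested above, checking energy and force values from the plain C run against the QPX run, might be sketched as follows. This is a hypothetical helper, not GROMACS code: the name first_divergence and the assumption that both runs have been dumped into equal-length sequences of floats are illustrative only. It defaults to exact equality because the first inner loop is expected to be binary reproducible, and it treats NaN as a mismatch since NaN was the reported symptom.

```python
import math


def first_divergence(reference, candidate, rel_tol=0.0, abs_tol=0.0):
    """Return the index of the first entry where the sequences disagree, or None.

    With the default tolerances of zero, only exactly equal values agree.
    A NaN in either sequence always counts as a disagreement.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences differ in length")
    for index, (ref, cand) in enumerate(zip(reference, candidate)):
        if math.isnan(ref) or math.isnan(cand):
            return index
        if not math.isclose(ref, cand, rel_tol=rel_tol, abs_tol=abs_tol):
            return index
    return None


if __name__ == "__main__":
    c_forces = [0.5, -1.25, 3.0]              # made-up values from the C kernel
    qpx_forces = [0.5, float("nan"), 3.0]     # made-up values from the QPX kernel
    print(first_divergence(c_forces, qpx_forces))
```

Running it on values captured right after the first inner loop, and again after the reduction over thread-local buffers, would show which of the two stages goes wrong first.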
> >
> >
> >
> > >
> > > Both plain C and QPX kernels are a horrible mess of nested file
> > > #inclusion and subsequent #ifdefs, because we have about 70
> > > different kernels per SIMD flavour (and growing). (We're working on
> > > a python generator instead, but that's not here yet.) I'll
> > > preprocess them by hand into correct single files if you want.
> > >
> > >
> > > The other correctness issue could be anywhere - unfortunately
> > > basically all of our tests are end-to-end, so when a bunch of them
> > > fail you have to work out the theme(s). I'll have to do that and
> > > that probably won't be soon!
> > >
> > >
> > >
> > >
> > > Ah, one more thing: are you linking against anything that also
> > > links
> > > in IBM's OpenMP runtime (like ESSL SMP)? That can also cause issues
> > > like this.
> > >
> > >
> > >
> > > There had been a dependency on a system FFTW, but the above was
> > > done
> > > with a fully independent GROMACS. So I think the issue is not an
> > > OpenMP-runtime-version clash.
> > >
> > > Mark
> > >
> > >
> > >
> > >
> > > -Hal
> > >
> > >
> > >
> > > >
> > > > Mark
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Feb 7, 2014 at 2:27 PM, Hal Finkel < hfinkel at anl.gov >
> > > > wrote:
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > From: "Mark Abraham" < mark.j.abraham at gmail.com >
> > > > > To: llvm-bgq-discuss at lists.alcf.anl.gov
> > > >
> > > > > Sent: Friday, February 7, 2014 4:55:24 AM
> > > > > Subject: Re: [Llvm-bgq-discuss] more issues from trying bgclang
> > > > > with GROMACS
> > > > >
> > > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > >
> > > >
> > > > > Unfortunately, the OpenMP runs failed outright (results from
> > > > > reduction over threads were nan, reason unclear), and there was
> > > > > some
> > > > > other issue. That will take some time to dig into, because we
> > > > > don't
> > > > > have a "known good with bgclang" code version with which to
> > > > > compare.
> > > > > I'll get back to this, but it'll be a few weeks, sorry.
> > > >
> > > > What driver version is the machine running? (there are known
> > > > issues
> > > > with OpenMP and driver version V1R1M2 (and earlier) -- which I
> > > > did
> > > > not think anyone was still using, but it seems some folks still
> > > > are).
> > > >
> > > > -Hal
> > > >
> > > >
> > > >
> > > > >
> > > > >
> > > > > Thanks again,
> > > > >
> > > > >
> > > > > Mark
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Feb 7, 2014 at 2:23 AM, Mark Abraham <
> > > > > mark.j.abraham at gmail.com > wrote:
> > > > >
> > > > >
> > > > >
> > > > > > Oops, I did indeed forget to unpack that RPM. Thanks for the
> > > > > > tip! With it, the OpenMP aspect of the build was flawless. I
> > > > > > was able to work around the other bug by compiling those
> > > > > > files at -O2, which is fine for normal GROMACS. A test run is
> > > > > > in the queue :-)
> > > > >
> > > > >
> > > > > Mark
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Feb 7, 2014 at 1:37 AM, Hal Finkel < hfinkel at anl.gov >
> > > > > wrote:
> > > > >
> > > > >
> > > > >
> > > > > ----- Original Message -----
> > > > > > From: "Mark Abraham" < mark.j.abraham at gmail.com >
> > > > > > To: llvm-bgq-discuss at lists.alcf.anl.gov
> > > > > > Sent: Thursday, February 6, 2014 6:26:55 PM
> > > > > > Subject: [Llvm-bgq-discuss] more issues from trying bgclang
> > > > > > with
> > > > > > GROMACS
> > > > > >
> > > > > >
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > >
> > > > > > > I had another go at compiling GROMACS 5.0 beta with the
> > > > > > > latest bgclang RPM (r200401-20140129). CMake detection of
> > > > > > > OpenMP support in mpiclang failed. Detection should just
> > > > > > > work, because using the -fopenmp flag is a standard way to
> > > > > > > do it. When I tried a manual compile:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > $ ~/progs/bgclang/current/bin/bgclang -fopenmp test.c -o test
> > > > > > > /homea/slbio/slbio013/progs/bgclang/r200401-20140129/binutils/bin/ld: cannot find -liomp5
> > > > > > > clang: error: linker command failed with exit code 1 (use -v to see invocation)
> > > > > >
> > > > > >
> > > > > > That looks like a lingering Intel-ism?
> > > > >
> > > > > Yes, but that's okay, the libomp package should create the
> > > > > necessary
> > > > > symlink for you. Did you install
> > > > > bgclang-libomp-r200401-20140129-1-1.ppc64.rpm?
> > > > >
> > > > > -Hal
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > The MPI plus non-OpenMP build seemed to go OK, but a file in
> > > > > > our
> > > > > > bundled lapack subset provoked a bug (attached in tarball).
> > > > > > That
> > > > > > file was not a problem in ~August 2013.
> > > > > >
> > > > > >
> > > > > > Thanks again for the effort!
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > >
> > > > > > Mark
> > > > > > _______________________________________________
> > > > > > llvm-bgq-discuss mailing list
> > > > > > llvm-bgq-discuss at lists.alcf.anl.gov
> > > > > > https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss
> > > > > >
> > > > >
> > > > > --
> > > > > Hal Finkel
> > > > > Assistant Computational Scientist
> > > > > Leadership Computing Facility
> > > > > Argonne National Laboratory
> > > > >
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > llvm-bgq-discuss mailing list
> > > > > llvm-bgq-discuss at lists.alcf.anl.gov
> > > > > https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss
> > > > >
> > > >
> > > > --
> > > > Hal Finkel
> > > > Assistant Computational Scientist
> > > > Leadership Computing Facility
> > > > Argonne National Laboratory
> > > > _______________________________________________
> > > > llvm-bgq-discuss mailing list
> > > > llvm-bgq-discuss at lists.alcf.anl.gov
> > > > https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss
> > > >
> > > >
> > > > _______________________________________________
> > > > llvm-bgq-discuss mailing list
> > > > llvm-bgq-discuss at lists.alcf.anl.gov
> > > > https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss
> > > >
> > >
> > > --
> > > Hal Finkel
> > > Assistant Computational Scientist
> > > Leadership Computing Facility
> > > Argonne National Laboratory
> > >
> > >
> > > _______________________________________________
> > > llvm-bgq-discuss mailing list
> > > llvm-bgq-discuss at lists.alcf.anl.gov
> > > https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss
> > >
> >
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> >
> >
> > _______________________________________________
> > llvm-bgq-discuss mailing list
> > llvm-bgq-discuss at lists.alcf.anl.gov
> > https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>