[Llvm-bgq-discuss] clang fatal error compiling GROMACS for BlueGene/Q

Mark Abraham mark.abraham at scilifelab.se
Sun Sep 29 15:09:05 CDT 2013


-O2 and -O3 both crashed the same way on nbnxn_kernel_ref.c, but -O1 succeeded.

-O1 crashed on three other files, though :-( I sent Hal the gory details.

To be fair, I was being evil and using OpenMP, so I tried without, but
observed the same symptoms in all cases.
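
For anyone wanting to repeat this kind of check, here is a minimal sketch of
a sweep over optimization levels combined with the -fno-vectorize and
-fno-slp-vectorize flags suggested in the quoted message below. It is
illustrative only: the compiler name "clang", the helper names
build_commands and run_sweep, and compiling the file standalone (without the
include paths and defines a real GROMACS build passes) are assumptions, not
what was actually run.

```python
import itertools
import os
import subprocess

# Assumptions (not from the thread): the compiler is reachable as "clang" and
# the file compiles standalone; a real GROMACS build needs its -I/-D flags too,
# which can be supplied through extra_args.
COMPILER = "clang"
OPT_LEVELS = ["-O1", "-O2", "-O3"]
VECTORIZER_FLAGS = [
    [],                                        # vectorizers left at defaults
    ["-fno-vectorize"],                        # loop vectorizer off
    ["-fno-slp-vectorize"],                    # SLP vectorizer off
    ["-fno-vectorize", "-fno-slp-vectorize"],  # both off
]


def build_commands(source, extra_args=()):
    """Return one compile command (argv list) per opt level / flag set pair."""
    return [
        [COMPILER, opt, *vec, *extra_args, "-c", source, "-o", os.devnull]
        for opt, vec in itertools.product(OPT_LEVELS, VECTORIZER_FLAGS)
    ]


def run_sweep(source, extra_args=()):
    """Compile with every combination and collect (command, exit code) pairs."""
    results = []
    for cmd in build_commands(source, extra_args):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((cmd, proc.returncode))
    return results


if __name__ == "__main__":
    for cmd, rc in run_sweep("nbnxn_kernel_ref.c"):
        status = "ok" if rc == 0 else f"FAILED (exit {rc})"
        # Print only the flags that vary, not the compiler or the file names.
        print(" ".join(cmd[1:-4]), "->", status)
```

Any combination that exits non-zero is worth noting in the report. If a file
only fails with a vectorizer enabled, that points toward the vectorizer
exposing the problem, though the underlying bug may still sit elsewhere.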

Mark



On Sun, Sep 29, 2013 at 9:22 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> >
> > Hi,
> >
> > Thanks, Hal. The good news is I succeeded at dropping in the debug
> > version of that object file to the release build.
>
> Good. Does it also crash with -O2 or -O1? Also, if autovectorization is
> triggering it, you can try passing -fno-vectorize or -fno-slp-vectorize --
> if this object file is not performance critical, however, then don't worry
> about it for now.
>
>  -Hal
>
> > Compile times were
> > extremely pleasing. make mdrun -j 8 took the following numbers of
> > seconds at -O3
> >
> > Build           Seconds
> > XLC debug            99
> > XLC release         253
> > clang debug          46
> > clang release        57
> >
> > Now, off to look at some more important timing measurements ;-)
> >
> > Mark
> >
> >
> >
> >
> > On Sun, Sep 29, 2013 at 8:24 PM, Hal Finkel < hfinkel at anl.gov >
> > wrote:
> >
> >
> >
> > ----- Original Message -----
> > >
> > >
> > > Hi all,
> > >
> > >
> > > I'm the development manager for GROMACS, which will offer new SIMD
> > > support for BlueGene/Q in its impending 4.6.4 release. Following
> > > some off-list discussion with Jeff Hammond and Hal Finkel, I was
> > > happy to explore compiling with clang for BlueGene/Q. Today I tried
> > > the version installed on JUQUEEN (r190771-20130914), as I had
> > > trouble logging into Vesta (support request lodged).
> > >
> > >
> > > In debug mode, everything went great. clang even warned about some
> > > MPI_Alltoall calls that could have had some explicit pointer casts
> > > to reassure the reader, which I've now patched.
> > >
> > >
> > > > I even used qpxmath.h for a small handful of SIMD trig functions
> > > > we'd want - that worked perfectly.
> > >
> > >
> > > > In release mode, there was a fatal error from clang when compiling
> > > > the "plain C" version of the code for which I've now written SIMD
> > > > kernels. This kernel is compiled and built into mdrun as a fallback.
> > > > My guess would be that auto-vectorization is choking, but hopefully
> > > > you guys are better judges of that than me! I'm happy to pass this
> > > > upstream to LLVM if that's the correct place for this report. The
> > > > .c and .sh files to reproduce the issue can be found at
> >
> > Thanks for the bug report! This is an error in the backend (although
> > it certainly could be the autovectorization that is exposing it).
> > I'll fix this soon.
> >
> >
> >
> > >
> > >
> > >
> > > > https://docs.google.com/file/d/0B0H2SbsMc3_qTnVvcTI1OTNFMFE/edit?usp=sharing
> > > >
> > > > https://docs.google.com/file/d/0B0H2SbsMc3_qenZBX05KSEg1TnM/edit?usp=sharing
> > >
> > >
> > > The crash trace follows:
> > >
> > >
> > >
> > > > clang:
> > > > /gpfs/vesta-home/hfinkel/rpmbuild/BUILD/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:630:
> > > > llvm::SDValue<unnamed>::DAGCombiner::CombineTo(llvm::SDNode*, const
> > > > llvm::SDValue*, unsigned int, bool): Assertion `N->getNumValues() ==
> > > > NumTo && "Broken CombineTo call!"' failed.
> > > 0 libLLVM-3.4svn.so 0x00000fff7ec34a9c
> > > llvm::sys::PrintStackTrace(_IO_FILE*) + 4281424836
> > > 1 libLLVM-3.4svn.so 0x00000fff7ec34d00
> > > 2 libLLVM-3.4svn.so 0x00000fff7ec35ba4
> > > 3 0x00000fff7f980418 __kernel_sigtramp_rt64 + 0
> > > 4 libc.so.6 0x00000080c3766ef8 abort + 4293479848
> > > 5 libc.so.6 0x00000080c375b98c
> > > 6 libc.so.6 0x00000080c375baa4 __assert_fail + 4293437492
> > > 7 libLLVM-3.4svn.so 0x00000fff7ea0a94c
> > > 8 libLLVM-3.4svn.so 0x00000fff7ea0adfc
> > > 9 libLLVM-3.4svn.so 0x00000fff7ea2de20
> > > 10 libLLVM-3.4svn.so 0x00000fff7ea43554
> > > 11 libLLVM-3.4svn.so 0x00000fff7ea46ecc
> > > 12 libLLVM-3.4svn.so 0x00000fff7ea49c70
> > > llvm::SelectionDAG::Combine(llvm::CombineLevel,
> > > llvm::AliasAnalysis&, llvm::CodeGenOpt::Level) + 4279456680
> > > 13 libLLVM-3.4svn.so 0x00000fff7eb8fba8
> > > llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 4280770368
> > > 14 libLLVM-3.4svn.so 0x00000fff7eb909f8
> > >
> llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator<llvm::Instruction
> > > const>, llvm::ilist_iterator<llvm::Instruction const>, bool&) +
> > > 4280774016
> > > 15 libLLVM-3.4svn.so 0x00000fff7eb92dec
> > > llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&)
> > > + 4280783188
> > > 16 libLLVM-3.4svn.so 0x00000fff7eb93fbc
> > > llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&)
> > > + 4280787732
> > > 17 libLLVM-3.4svn.so 0x00000fff7e84d9c8
> > > 18 libLLVM-3.4svn.so 0x00000fff7e1e26cc
> > > llvm::MachineFunctionPass::runOnFunction(llvm::Function&) +
> > > 4270867196
> > > 19 libLLVM-3.4svn.so 0x00000fff7e4529b8
> > > llvm::FPPassManager::runOnFunction(llvm::Function&) + 4273352968
> > > 20 libLLVM-3.4svn.so 0x00000fff7e452afc
> > > llvm::FPPassManager::runOnModule(llvm::Module&) + 4273353276
> > > 21 libLLVM-3.4svn.so 0x00000fff7e4522bc
> > > llvm::MPPassManager::runOnModule(llvm::Module&) + 4273351228
> > > 22 libLLVM-3.4svn.so 0x00000fff7e4525e4
> > > llvm::PassManagerImpl::run(llvm::Module&) + 4273352020
> > > 23 libLLVM-3.4svn.so 0x00000fff7e4526f4
> > > llvm::PassManager::run(llvm::Module&) + 4273352276
> > > 24 clang 0x00000000103ae874
> > > 25 clang 0x00000000103af7f8
> > > clang::EmitBackendOutput(clang::DiagnosticsEngine&,
> > > clang::CodeGenOptions const&, clang::TargetOptions const&,
> > > clang::LangOptions const&, llvm::Module*, clang::BackendAction,
> > > llvm::raw_ostream*) + 4272665128
> > >
> > > 26 clang 0x00000000103ab4a4
> > > > 27 clang 0x000000001059f230 clang::ParseAST(clang::Sema&, bool, bool)
> > > > + 4274649152
> > > > 28 clang 0x00000000101e4b64 clang::ASTFrontendAction::ExecuteAction()
> > > > + 4270836484
> > > 29 clang 0x00000000103a9b00 clang::CodeGenAction::ExecuteAction() +
> > > 4272641808
> > > 30 clang 0x00000000101e4fb4 clang::FrontendAction::Execute() +
> > > 4270837524
> > > 31 clang 0x00000000101be154
> > > clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) +
> > > 4270679924
> > > 32 clang 0x000000001019f894
> > > clang::ExecuteCompilerInvocation(clang::CompilerInstance*) +
> > > 4270560804
> > > > 33 clang 0x00000000101959d8 cc1_main(char const**, char const**,
> > > > char const*, void*) + 4270520648
> > > 34 clang 0x000000001019d540 main + 4270551792
> > > 35 libc.so.6 0x00000080c374bcf8
> > > 36 libc.so.6 0x00000080c374bef0 __libc_start_main + 4293374496
> > > Stack dump:
> > > 0. Program arguments:
> > > /usr/local/bg_soft/clang/llvm.r190771/r190771-20130914/bin/clang
> > > -cc1 -fopenmp -triple powerpc64-bgq-linux -S -disable-free
> > > -main-file-name nbnxn_kernel_ref.c -static-define
> > > -mrelocation-model
> > > static -mdisable-fp-elim -ffp-contract=fast -mconstructor-aliases
> > > -target-cpu a2q -target-linker-version 2.20.51.0.2 -coverage-file
> > > /tmp/nbnxn_kernel_ref-bb4750.s -resource-dir
> > >
> /usr/local/bg_soft/clang/llvm.r190771/r190771-20130914/bin/../lib/clang/3.4
> > > -D __bgclang__=1 -D __bgclang_version__="r000000-00000000" -D
> > > HAVE_CONFIG_H -D md_EXPORTS -D NDEBUG -I
> > > /bgsys/local/clang/llvm.r190771/r190771-20130914/sleef/include -I
> > > /bgsys/local/clang/llvm.r190771/r190771-20130914/omp/include -I
> > > /bgsys/drivers/V1R2M1/ppc64/comm/include -I
> > > /bgsys/drivers/V1R2M1/ppc64/comm/lib/gnu -I
> > > /bgsys/drivers/V1R2M1/ppc64 -I
> > > /bgsys/drivers/V1R2M1/ppc64/comm/sys/include -I
> > > /bgsys/drivers/V1R2M1/ppc64/spi/include -I
> > > /bgsys/drivers/V1R2M1/ppc64/spi/include/kernel/cnk -I
> > > /homeb/zdv518/zdv518/git/bluegene-dev-r46/build-cmake-clang/src -I
> > > /homeb/zdv518/zdv518/git/bluegene-dev-r46/build-cmake-clang/include
> > > -I /homeb/zdv518/zdv518/git/bluegene-dev-r46/include -I
> > > /homeb/zdv518/zdv518/progs/bgsys-clang/include -I
> > > /bgsys/drivers/V1R2M1/ppc64/comm/include -internal-isystem
> > > /usr/local/include -internal-isystem
> > >
> /usr/local/bg_soft/clang/llvm.r190771/r190771-20130914/bin/../lib/clang/3.4/include
> > > -internal-externc-isystem /include -internal-externc-isystem
> > > /usr/include -O3 -Wall -Wno-unused -Wunused-value
> > > -fno-dwarf-directory-asm -fdebug-compilation-dir
> > > /homeb/zdv518/zdv518/git/bluegene-dev-r46/build-cmake-clang/src/mdlib
> > > -ferror-limit 19 -fmessage-length 108 -mstackrealign
> > > -fno-signed-char -fobjc-runtime=gcc
> > > -fobjc-default-synthesize-properties -fdiagnostics-show-option
> > > -fcolor-diagnostics -vectorize-loops -vectorize-slp -isystem
> > > /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/sys-include
> > > -mllvm -optimize-regalloc -mllvm -fast-isel=0 -o
> > > /tmp/nbnxn_kernel_ref-bb4750.s -x c
> > >
> /homeb/zdv518/zdv518/git/bluegene-dev-r46/src/mdlib/nbnxn_kernels/nbnxn_kernel_ref.c
> > > 1. <eof> parser at end of file
> > > 2. Code generation
> > > 3. Running pass 'Function Pass Manager' on module
> > >
> '/homeb/zdv518/zdv518/git/bluegene-dev-r46/src/mdlib/nbnxn_kernels/nbnxn_kernel_ref.c'.
> > > 4. Running pass 'PowerPC DAG->DAG Pattern Instruction Selection' on
> > > function '@nbnxn_kernel_ref_rf_noener'
> > >
> > > clang: error: unable to execute command: Aborted (core dumped)
> > > > clang: error: clang frontend command failed due to signal (use -v
> > > > to see invocation)
> > > clang version 3.4 (trunk)
> > > Target: powerpc64-bgq-linux
> > > Thread model: posix
> > > clang: note: diagnostic msg: PLEASE submit a bug report to
> > > http://llvm.org/bugs/ and include the crash backtrace, preprocessed
> > > source, and associated run script.
> > > clang: note: diagnostic msg:
> > > ********************
> > >
> > >
> > > PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
> > > Preprocessed source(s) and associated run script(s) are located at:
> > > clang: note: diagnostic msg: /tmp/nbnxn_kernel_ref-96ac7c.c
> > > clang: note: diagnostic msg: /tmp/nbnxn_kernel_ref-96ac7c.sh
> > > clang: note: diagnostic msg:
> > >
> > >
> > > ********************
> > >
> > >
> > > > I tried to check that the .sh file would reproduce the above, but
> > > > it failed with
> > >
> > >
> > >
> > > In file included from <built-in>:167:
> > > <command line>:6:10: fatal error: 'qpxintrin.h' file not found
> > > #include "qpxintrin.h"
> >
> > Ah, I keep forgetting to add this to my TODO list to fix. Thanks for
> > reminding me :)
> >
> >
> > >
> > > Hope that is useful - do let me know if I can be of further help!
> >
> > Quite useful.
> >
> > -Hal
> >
> > >
> > >
> > > Cheers,
> > >
> > >
> > > Mark
> > > _______________________________________________
> > > llvm-bgq-discuss mailing list
> > > llvm-bgq-discuss at lists.alcf.anl.gov
> > > https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss
> > >
> >
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> >