[dcmf] running my own mpich

Kazutomo Yoshii kazutomo at mcs.anl.gov
Tue Feb 12 16:34:43 CST 2008


I found that we need a new SPI library to fix this problem.

To check whether the SPI library was really the cause, I built a new
SPI library (DMA part only) by mixing ppcfloor's binary libSPI.a with
objects compiled from BGP_DMA_runtime.tar.gz. With this hacked SPI
library, my dcmf ran without a segv.
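
For anyone who wants to try the same kind of hack, here is a minimal
sketch of one way to do the mixing: copy the binary archive, then let
`ar r` replace the members whose names match the freshly compiled
objects. This is an assumption about the procedure, not a record of the
exact commands used. The object names, the output name libSPI_hacked.a,
and the plain `ar` default are all hypothetical; on a cross-compiled
system you would pass the toolchain's own ar instead.

```python
import shutil
import subprocess
from pathlib import Path


def ar_replace_command(archive, objects, ar="ar"):
    """Build the `ar r` command that replaces (or adds) members in a static archive."""
    if not objects:
        raise ValueError("no object files given")
    return [ar, "r", str(archive), *map(str, objects)]


def build_patched_archive(original, patched, objects, ar="ar"):
    """Copy `original` to `patched`, then overwrite matching members with `objects`."""
    # Work on a copy so the library shipped with the driver is left untouched.
    shutil.copyfile(original, patched)
    subprocess.run(ar_replace_command(patched, objects, ar), check=True)
    return Path(patched)


if __name__ == "__main__":
    # Hypothetical object names, derived from the DMA_*.c files listed in this message.
    dma_objects = ["DMA_Counter.o", "DMA_Descriptors.o", "DMA_InjFifo.o", "DMA_RecFifo.o"]
    print(" ".join(ar_replace_command("libSPI_hacked.a", dma_objects)))
```

`ar r` only replaces a member when the names match, so if the member
names inside the shipped libSPI.a differ from the compiled object names
you end up with both copies in the archive. Listing the archive with
`ar t` before and after is a quick sanity check.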

Could someone upload all of the SPI source code to the wiki page?
The runtime SPI library probably contains the following files:

bgp_cna_SPI.c
spi_collective.c
UPC.c
DMA_Counter.c
DMA_Descriptors.c
DMA_InjFifo.c
DMA_RecFifo.c

We already have the DMA_*.c files.

- kaz



> I've made some changes to ROMIO and would like to test them out.  I've
> built an mpich library with the 'make mpich' rule, and that goes
> just fine: I've got an install/bin/mpicc which links in
> install/lib/libdcmfcoll.cnk.a install/lib/libdcmf.cnk.a and
> install/lib/libmpich.cnk.a
> 
> So far, everything looks normal.
> 
> When I try to run the resulting program, I get a segfault.  Here's the
> output after running the stack dump in one of the lightweight core
> files through addr2line:
> 
> 0x010fa338
> DMA_InjFifoRgetFifoFullInit
> ??:0
> 0x01304834
> ??
> ??:0
> 0x010cd56c
> DCMF::DMA::Device::initGroups()
> /home/robl/src/bgp.comm/sys/build-dcmf/../messaging/devices/prod/dma/Init.cc:308
> 0x010cd91c
> DCMF::DMA::Device::initDMADevice()
> /home/robl/src/bgp.comm/sys/build-dcmf/../messaging/devices/prod/dma/Init.cc:397
> 0x010bc2dc
> BGPMessager
> /home/robl/src/bgp.comm/sys/build-dcmf/../messaging/messager/prod/bgp/msgr.h:81
> 0x010b2cec
> DCMF::BGPMessager::generate()
> /home/robl/src/bgp.comm/sys/build-dcmf/../messaging/messager/prod/bgp/msgr.h:105
> 0x0102563c
> MPID_Init
> /gpfs/home/robl/src/bgp.comm/lib/mpi/mpich2/src/mpid/dcmf/src/misc/mpid_init.c:63
> 0x0100cf58
> MPIR_Init_thread
> /gpfs/home/robl/src/bgp.comm/lib/mpi/mpich2/src/mpi/init/initthread.c:236
> 0x0100cd1c
> PMPI_Init
> /gpfs/home/robl/src/bgp.comm/lib/mpi/mpich2/src/mpi/init/init.c:93
> 0x010013a4
> main
> /home/robl/src/darray-io.c:51
> 0x011004c0
> generic_start_main
> ../csu/libc-start.c:231
> 0x01100734
> __libc_start_main
> ../sysdeps/unix/sysv/linux/powerpc/libc-start.c:137
> 0xfffffffc
> ??
> ??:0
> 
> 
> If I had to guess, I'd say that the libdcmf in the development tree is
> incompatible with Argonne's V1R1M2_500_2007-071213P driver.  What's
> the best way to test out my ROMIO changes?
> 
> Thanks
> ==rob
> 
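
The addr2line step in the quoted message can be sketched roughly as
follows. This is only an illustration: the executable name darray-io is
a guess based on the source file in the trace, and the two addresses are
copied from the dump above. The flags used (-a, -f, -C, -e) are standard
binutils addr2line options and would give address / function / file:line
triples like the ones quoted.

```python
import subprocess


def addr2line_command(executable, addresses, tool="addr2line"):
    """Build an addr2line call that prints address, function, and file:line."""
    # -a: echo each address, -f: show function names,
    # -C: demangle C++ names, -e: the executable to look the addresses up in
    return [tool, "-a", "-f", "-C", "-e", str(executable), *addresses]


def resolve(executable, addresses, tool="addr2line"):
    """Run addr2line and return its raw text output."""
    result = subprocess.run(
        addr2line_command(executable, addresses, tool),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # "darray-io" is a hypothetical binary name; the addresses come from the trace above.
    print(" ".join(addr2line_command("darray-io", ["0x010fa338", "0x010cd56c"])))
```

Frames that resolve to `??` and `??:0`, like DMA_InjFifoRgetFifoFullInit
above, usually mean the code at that address was built without debug
information, which fits a binary-only libSPI.a.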