<br><font size=2 face="sans-serif">Sameer,</font>
<br>
<br><font size=2 face="sans-serif">I'm not exactly sure what you mean.
I don't know *exactly* how many times the locking code is called
from MPI (possibly recursively), but I think that it is one lock and one
unlock.</font>
<br>
<br><font size=2 face="sans-serif">I am sure that the enter/exit functions
are called in exactly the same places and exactly as often in SINGLE and
in MULTIPLE. The only difference is the length of time that it takes
to call them. The test I wrote is strictly a comparison of the critical
section code when locking or not.</font>
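<br>
<br><font size=2 face="sans-serif">As a rough, portable illustration of that kind of comparison, the sketch below times an enter/exit pair with locking switched off and on. It is not the DCMF implementation: <tt>cs_enter</tt>, <tt>cs_exit</tt>, <tt>cs_locking</tt> and <tt>time_cs_ns</tt> are hypothetical names, a C11 <tt>atomic_flag</tt> spin lock stands in for the real lock, and POSIX <tt>clock_gettime</tt> stands in for <tt>DCMF_Timebase</tt>, so the result is in nanoseconds rather than cycles.</font>
<br>

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the critical section: a spin lock that is only
 * taken when locking is enabled (as a MULTIPLE-style thread level would). */
static atomic_flag cs_flag = ATOMIC_FLAG_INIT;
static volatile int cs_locking = 0;   /* volatile: re-checked on every call */

static void cs_enter(void)
{
    if (cs_locking)
        while (atomic_flag_test_and_set_explicit(&cs_flag, memory_order_acquire))
            ; /* spin until the flag is released */
}

static void cs_exit(void)
{
    if (cs_locking)
        atomic_flag_clear_explicit(&cs_flag, memory_order_release);
}

/* Average cost, in nanoseconds, of one enter/exit pair over n iterations. */
static double time_cs_ns(uint32_t n, int locking)
{
    struct timespec start, stop;
    uint32_t i;

    assert(n > 0);
    cs_locking = locking;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < n; ++i) {
        cs_enter();
        cs_exit();
    }
    clock_gettime(CLOCK_MONOTONIC, &stop);

    return ((double)(stop.tv_sec - start.tv_sec) * 1e9 +
            (double)(stop.tv_nsec - start.tv_nsec)) / (double)n;
}
```

<br><font size=2 face="sans-serif">Calling <tt>time_cs_ns(10000, 0)</tt> and <tt>time_cs_ns(10000, 1)</tt> back to back gives the two numbers to compare. The call sites and call counts are identical, so only the cost of the lock itself differs.</font>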
<br>
<br><font size=2 face="sans-serif">Joe Ratterman</font>
<br><font size=2 face="sans-serif">jratt@us.ibm.com</font>
<br>
<br>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td width=40%><font size=1 face="sans-serif"><b>Sameer Kumar/Watson/IBM</b></font>
<p><font size=1 face="sans-serif">02/05/08 12:02 PM</font>
<td width=59%>
<table width=100%>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">To</font></div>
<td><font size=1 face="sans-serif">Joseph Ratterman/Rochester/IBM@IBMUS</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">cc</font></div>
<td><font size=1 face="sans-serif">DCMF &lt;dcmf@lists.anl-external.org&gt;,
dcmf-bounces@lists.anl-external.org, Joseph Ratterman/Rochester/IBM@IBMUS</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">Subject</font></div>
<td><font size=1 face="sans-serif">Re: [dcmf] Re: [PATCH] This test will
repeatedly call the low-level critical-section functions for performance
testing.</font></table>
<br>
<br></table>
<br>
<br><font size=2 face="sans-serif">Joe,</font>
<br><font size=2 face="sans-serif"> If
the lock is called once per message transaction, the rate will be 4.7 MMPS;
if it's called twice, it will be 2.3 MMPS, and if four times,
1.2 MMPS. So we need to investigate how many times the lock is called
per procnull message send and recv. Maybe someone at Argonne can
answer that.</font>
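<br>
<br><font size=2 face="sans-serif">The arithmetic behind those figures can be sketched as below, assuming (as the numbers above imply) that the message rate scales inversely with the number of lock calls per message. The helper names <tt>rate_mmps</tt> and <tt>implied_locks_per_msg</tt> are hypothetical. With a 4.7 MMPS one-lock rate, the model gives 2.35 MMPS for two calls and 1.175 MMPS for four, which match the approximate 2.3 and 1.2 quoted.</font>
<br>

```c
#include <assert.h>

/* Hypothetical helper: message rate (in MMPS) if each message takes the lock
 * `locks_per_msg` times and the rate scales inversely with that count. */
static double rate_mmps(double one_lock_rate, unsigned locks_per_msg)
{
    assert(locks_per_msg > 0);
    return one_lock_rate / (double)locks_per_msg;
}

/* Inverse question: how many lock calls per message would explain an
 * observed rate under the same simple model? */
static double implied_locks_per_msg(double one_lock_rate, double observed_rate)
{
    assert(observed_rate > 0.0);
    return one_lock_rate / observed_rate;
}
```

<br><font size=2 face="sans-serif">Under this model, <tt>implied_locks_per_msg(4.7, 1.0)</tt> comes out a little under five lock calls per message. The model ignores every other per-message cost, so it is only a first-order guide.</font>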
<br>
<br><font size=2 face="sans-serif">
sameer.</font>
<br>
<br>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td width=40%><font size=1 face="sans-serif"><b>Joseph Ratterman/Rochester/IBM@IBMUS</b>
</font>
<br><font size=1 face="sans-serif">Sent by: dcmf-bounces@lists.anl-external.org</font>
<p><font size=1 face="sans-serif">02/05/2008 12:23 PM</font>
<td width=59%>
<table width=100%>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">To</font></div>
<td><font size=1 face="sans-serif">Joseph Ratterman/Rochester/IBM@IBMUS</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">cc</font></div>
<td><font size=1 face="sans-serif">DCMF &lt;dcmf@lists.anl-external.org&gt;</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">Subject</font></div>
<td><font size=1 face="sans-serif">[dcmf] Re: [PATCH] This test will repeatedly
call the low-level critical-section functions for performance testing.</font></table>
<br>
<br></table>
<br>
<br>
<br><font size=2 face="sans-serif"><br>
Here are the results from running this test:</font><font size=3> <br>
</font><tt><font size=2><br>
$ mpirun -np 1 -nofree -mode SMP ./build-tests/perf/dcmf/CS.cnk</font></tt><font size=3>
</font><tt><font size=2><br>
DCMF_THREAD_SINGLE: Called Enter/Exit 10000 times at 78.0386 cycles each.</font></tt><font size=3>
</font><tt><font size=2><br>
DCMF_THREAD_FUNNELED: Called Enter/Exit 10000 times at 78.0054 cycles each.</font></tt><font size=3>
</font><tt><font size=2><br>
DCMF_THREAD_SERIALIZED: Called Enter/Exit 10000 times at 78.0029 cycles
each.</font></tt><font size=3> </font><tt><font size=2><br>
DCMF_THREAD_MULTIPLE: Called Enter/Exit 10000 times at 180.063 cycles each.</font></tt><font size=3>
</font><tt><font size=2><br>
$ mpirun -np 1 -nofree -mode DUAL ./build-tests/perf/dcmf/CS.cnk</font></tt><font size=3>
</font><tt><font size=2><br>
DCMF_THREAD_SINGLE: Called Enter/Exit 10000 times at 78.0378 cycles each.</font></tt><font size=3>
</font><tt><font size=2><br>
DCMF_THREAD_FUNNELED: Called Enter/Exit 10000 times at 78.0032 cycles each.</font></tt><font size=3>
</font><tt><font size=2><br>
DCMF_THREAD_SERIALIZED: Called Enter/Exit 10000 times at 78.0019 cycles
each.</font></tt><font size=3> </font><tt><font size=2><br>
DCMF_THREAD_MULTIPLE: Called Enter/Exit 10000 times at 196.044 cycles each.</font></tt><font size=3>
<br>
</font><font size=2 face="sans-serif"><br>
While this is more than a doubling of the time it takes to lock/unlock
(roughly 2.3x in SMP mode and 2.5x in DUAL mode), that alone
wouldn't drop the one process/thread performance from 4.47 to 1 MMPS. We
will look into it more after we get the benchmark.</font><font size=3>
<br>
<br>
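<br><font size=2 face="sans-serif">A quick bound supports this. Assume the enter/exit functions are called equally often with and without locking, and take the unrealistic worst case in which a message consists of nothing but enter/exit calls. The message rate can then fall by at most the same factor as the lock cost. The helper name <tt>worst_case_rate_mmps</tt> is hypothetical; the inputs are the figures quoted above. With a 4.47 MMPS baseline, the bound is roughly 1.9 MMPS for the SMP numbers and 1.8 MMPS for the DUAL numbers, both still above 1 MMPS.</font>
<br>

```c
#include <assert.h>

/* Worst-case message rate if the *entire* per-message cost were enter/exit
 * calls, so the whole message slows down by the same factor as the lock. */
static double worst_case_rate_mmps(double base_rate_mmps,
                                   double cycles_unlocked,
                                   double cycles_locked)
{
    assert(cycles_unlocked > 0.0 && cycles_locked >= cycles_unlocked);
    return base_rate_mmps * (cycles_unlocked / cycles_locked);
}
```

<br><font size=2 face="sans-serif">For example, <tt>worst_case_rate_mmps(4.47, 78.0386, 180.063)</tt> covers the SMP run and <tt>worst_case_rate_mmps(4.47, 78.0378, 196.044)</tt> the DUAL run. Any real message does other work as well, so the actual slowdown from locking alone should be smaller than this bound.</font>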
</font><font size=2 face="sans-serif"><br>
Thanks,</font><font size=3> </font><font size=2 face="sans-serif"><br>
Joe Ratterman</font><font size=3> <br>
<br>
<br>
<br>
</font>
<table width=100%>
<tr valign=top>
<td width=33%><font size=1 face="sans-serif"><b>Joseph Ratterman/Rochester/IBM@IBMUS</b>
</font>
<p><font size=1 face="sans-serif">02/05/08 11:19 AM</font><font size=3>
</font>
<td width=66%>
<br>
<table width=100%>
<tr valign=top>
<td width=7%>
<div align=right><font size=1 face="sans-serif">To</font></div>
<td width=92%><font size=1 face="sans-serif">DCMF &lt;dcmf@lists.anl-external.org&gt;</font><font size=3>
</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">cc</font></div>
<td><font size=1 face="sans-serif">Joseph Ratterman/Rochester/IBM@IBMUS</font><font size=3>
</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">Subject</font></div>
<td><font size=1 face="sans-serif">[PATCH] This test will repeatedly call
the low-level critical-section functions for performance testing.</font></table>
<br>
<br>
<br></table>
<br><font size=3><br>
<br>
</font><tt><font size=2><br>
This is helpful when trying to understand performance degradations in MPI_THREAD_MULTIPLE.<br>
<br>
Signed-off-by: Joe Ratterman &lt;jratt@us.ibm.com&gt;<br>
---<br>
sys/tests/perf/Makefile.in | 2 +-<br>
sys/tests/perf/dcmf/CS.c | 64 +++++++++++++++++++++++++++++++++<br>
sys/tests/perf/{ => dcmf}/Makefile.in | 4 +-<br>
3 files changed, 67 insertions(+), 3 deletions(-)<br>
create mode 100644 sys/tests/perf/dcmf/CS.c<br>
copy sys/tests/perf/{ => dcmf}/Makefile.in (96%)<br>
<br>
diff --git a/sys/tests/perf/Makefile.in b/sys/tests/perf/Makefile.in<br>
index 22a7ac7..4266989 100644<br>
--- a/sys/tests/perf/Makefile.in<br>
+++ b/sys/tests/perf/Makefile.in<br>
@@ -12,6 +12,6 @@<br>
# end_generated_IBM_copyright_prolog #<br>
<br>
VPATH = @abs_srcdir@<br>
-SUBDIRS = mpi spi mpid<br>
+SUBDIRS = mpi spi mpid dcmf<br>
TESTS = <br>
include @abs_top_builddir@/Make.rules<br>
diff --git a/sys/tests/perf/dcmf/CS.c b/sys/tests/perf/dcmf/CS.c<br>
new file mode 100644<br>
index 0000000..080f5df<br>
--- /dev/null<br>
+++ b/sys/tests/perf/dcmf/CS.c<br>
@@ -0,0 +1,64 @@<br>
+/* begin_generated_IBM_copyright_prolog */<br>
+/* */<br>
+/* ---------------------------------------------------------------- */<br>
+/* (C)Copyright IBM Corp. 2007, 2008 */<br>
+/* IBM CPL License */<br>
+/* ---------------------------------------------------------------- */<br>
+/* */<br>
+/* end_generated_IBM_copyright_prolog */<br>
+/**<br>
+ * \file perf/dcmf/CS.c<br>
+ * \brief Test the performance of the low-level critical-section functions<br>
+ */<br>
+<br>
+<br>
+#include &lt;tests.h&gt;<br>
+#define NUM 10000<br>
+DCMF_Configure_t config;<br>
+<br>
+<br>
+double time_CS(uint32_t x)<br>
+{<br>
+ uint64_t start, stop;<br>
+ uint32_t i;<br>
+<br>
+ start = DCMF_Timebase();<br>
+ for (i=0; i&lt;x; ++i) {<br>
+ DCMF_CriticalSection_enter(0);<br>
+ DCMF_CriticalSection_exit(0);<br>
+ }<br>
+ stop = DCMF_Timebase();<br>
+<br>
+ return (double)(stop-start)/(double)x;<br>
+}<br>
+<br>
+<br>
+#define time_run(c) time_run_long(c, #c)<br>
+void time_run_long(DCMF_Thread thread_level, char* thread_string)<br>
+{<br>
+ double time;<br>
+ DCMF_Result rc;<br>
+<br>
+ config.thread_level = thread_level;<br>
+ rc = DCMF_Messager_configure (&config, &config);<br>
+ assert(rc == DCMF_SUCCESS);<br>
+ assert(config.thread_level == thread_level);<br>
+ time = time_CS(NUM);<br>
+ printf("%s: Called Enter/Exit %u times at %g cycles each.\n", thread_string, NUM, time);<br>
+}<br>
+<br>
+<br>
+int main()<br>
+{<br>
+ config.interrupts = DCMF_INTERRUPTS_OFF;<br>
+<br>
+ MPI_INIT;<br>
+<br>
+ time_run(DCMF_THREAD_SINGLE);<br>
+ time_run(DCMF_THREAD_FUNNELED);<br>
+ time_run(DCMF_THREAD_SERIALIZED);<br>
+ time_run(DCMF_THREAD_MULTIPLE);<br>
+<br>
+ MPI_FINALIZE;<br>
+ return (0);<br>
+}<br>
diff --git a/sys/tests/perf/Makefile.in b/sys/tests/perf/dcmf/Makefile.in<br>
similarity index 96%<br>
copy from sys/tests/perf/Makefile.in<br>
copy to sys/tests/perf/dcmf/Makefile.in<br>
index 22a7ac7..4c474b6 100644<br>
--- a/sys/tests/perf/Makefile.in<br>
+++ b/sys/tests/perf/dcmf/Makefile.in<br>
@@ -12,6 +12,6 @@<br>
# end_generated_IBM_copyright_prolog #<br>
<br>
VPATH = @abs_srcdir@<br>
-SUBDIRS = mpi spi mpid<br>
-TESTS = <br>
+SUBDIRS = <br>
+TESTS = CS.c<br>
include @abs_top_builddir@/Make.rules<br>
-- <br>
1.5.4<br>
</font></tt><font size=3><br>
</font><tt><font size=2>_______________________________________________<br>
dcmf mailing list<br>
dcmf@lists.anl-external.org<br>
http://lists.anl-external.org/cgi-bin/mailman/listinfo/dcmf<br>
</font></tt>
<br>
<br>