[intrepid-notify] Intrepid instability

tstacey at alcf.anl.gov
Thu Jan 22 22:58:37 CST 2009


We continue to experience extremely unstable behavior when running jobs
at scale on the new V1R3M0 driver.  Debugging has been complicated by
diagnostics calling out intermittent hardware issues.  We currently have
a level 1 PMR (very high priority trouble ticket) open with IBM and they
are working on this problem.  If we have not resolved the problem by
tomorrow afternoon, we will revert to our previous driver so that jobs
can run over the weekend.

We apologize for any inconvenience this outage has caused and are
grateful for your continued patience as we work through this issue.
Please contact support at alcf.anl.gov if you have any further questions.

Thanks, 
The ALCF Support Team