[notify] NOTE - Possible BG/P outage at 2 PM today

Andrew Cherry acherry at mcs.anl.gov
Thu Apr 17 11:29:17 CDT 2008


Late last night, we identified a problem with the BG/P environmental
monitoring on intrepid that needs to be addressed.  Unfortunately,
fixing it may require restarting the control system again (we are
still working with IBM to explore all other options before we resort
to a control system restart).  We have therefore reserved the entire
system for a possible restart at 2 PM.  New jobs queued on intrepid
will not be started if they are long enough to run past 2 PM.  We
don't expect any currently running production jobs to be impacted
(since they will be finished by the time the work begins), but
long-running early science jobs may need to be killed if they are
still running and we determine that the restart is needed.

Access to the login nodes and the filesystems will not be affected by
this work; only the BlueGene itself will be affected.

We will send out another note when we know for sure what the impact  
will be.

Thanks...

-Andrew Cherry
  ALCF Support
