[intrepid-notify] Cobalt issues on Intrepid
Andrew Cherry
acherry at alcf.anl.gov
Wed Feb 11 23:23:32 CST 2009
We've encountered an issue with Cobalt on Intrepid that has caused it to lose track of which partitions are and aren't in use. As a result, a number of jobs have attempted to start on partitions that were already blocked or busy, and those jobs failed. To remedy the situation, we have temporarily stopped all queues to allow most of the remaining jobs to finish running. Once the system has drained, we will restart Cobalt and resume service. Based on the walltimes of currently running jobs, I estimate that we can restore service in about two hours.
We apologize for any inconvenience.
Andrew Cherry
ALCF Support