[intrepid-notify] Problems with job submission (Wed, 10/21/09)

Jini Ramprakash jini at alcf.anl.gov
Wed Oct 21 09:57:56 CDT 2009


Dear Users,

On the morning of Wednesday, October 21st, intrepid's control system
database exhausted a database instance resource, which caused the job
control system to fail.  Additional resources have since been
allocated, but doing so required shutting down both cobalt and the
control system and restarting the database, which made cobalt and
job status information unavailable for a short period.

Unfortunately, both cobalt and the control system reached a highly
inconsistent state before the additional resources were allocated,
which may have caused job loss.  Users should examine dep_hold jobs in
particular to determine whether resubmission is warranted.  We
apologize for this situation and for the job interruption.

Please feel free to email us if you have further questions or concerns.

Thanks,
ALCF Support Team.

