[intrepid-notify] ALCF Intrepid Job State Lost
Bowen Cheetah Goletz
cheetah at alcf.anl.gov
Tue Oct 12 03:19:24 CDT 2010
Dear Intrepid Users:
During upgrades to Intrepid's scheduling system, it became apparent that some
job attributes had not been fully recorded for restoration once the upgrades
were complete. These attributes included dependency holds and project name.
Although significant effort was made to restore the original job state, many
dependency hold chains were incomplete because of parent job failures, and
accurate project assignment could not be guaranteed.
Given these factors, queued jobs whose intended project could not be
determined were not restored. Jobs with dependency holds that could not be
fully resolved were requeued but placed on user hold pending user intervention.
We understand the inconvenience of resubmitting jobs, but our primary concern
was ensuring that projects were not charged inappropriately and that the order
of execution was maintained. Jobs that were requeued keep the same job IDs
they had when originally submitted.
Intrepid will be released to users shortly. Users should verify dependencies,
release user holds, and requeue jobs where appropriate. Please direct any
questions to support at alcf.anl.gov.
-ALCF Support Team