[intrepid-notify] ALCF Intrepid Job State Lost

Bowen Cheetah Goletz cheetah at alcf.anl.gov
Tue Oct 12 03:19:24 CDT 2010


Dear Intrepid Users:

During upgrades to Intrepid's scheduling system, it became apparent that some
job attributes had not been fully recorded for restoration after the upgrades
were complete.  These attributes included dependency holds and project names.
While significant effort was made to restore the original job state, many
dependency hold chains were incomplete due to parent job failures, and
accurate project assignment could not be guaranteed.

Given these factors, queued jobs whose intended project could not be
determined were not restored.  Jobs with dependency holds that could not be
fully resolved were queued but placed on user hold pending user intervention.

While we understand the inconvenience of resubmitting jobs, our primary
concern was ensuring that projects were not inappropriately charged and that
the order of execution was maintained.  Jobs that were requeued retain their
originally submitted job IDs.

Intrepid will be released shortly.  Users should verify dependencies, release
user holds, and requeue jobs where appropriate.  Please direct any questions
to support at alcf.anl.gov.

-ALCF Support Team

