[intrepid-notify] UPDATE: intrepid-fs0 scan complete, preparing for repair scan

ALCF Support support at alcf.anl.gov
Mon Dec 10 19:02:58 CST 2012


The scan phase of the fsck has completed successfully, and the process 
that identifies the corrupted files is almost complete. After it 
completes, the remaining corrupted files will be copied off, and all the 
corrupted files will be deleted. Once these files have been removed, the 
repair scan will be started. The timeline estimate has not changed: the 
repair scan is expected to take a minimum of a week.

Several users have asked why we are so concerned about finding and 
repairing such a small number of corrupted files. The goal is not to 
save potentially corrupted files. It is critical to find and remove all 
cross-linked files in order to protect the file system from further 
corruption, which could have a much wider impact if any cross-linked 
files remain. The repair scan is required to ensure that all 
cross-linked files have been removed.

To better explain the issue, here is the scenario:

- Two files, D and E, think they have block "foo" (they are cross-linked).
- E was the last file written, so E is good and D is corrupt.
- File D is deleted, which tells GPFS that block foo is free to be 
re-allocated.
- File F is written, and gets allocated block foo, corrupting file E.
- The user discovers file E is corrupted and deletes it, which tells 
GPFS that block foo is free to be re-allocated.
- File G is written, and gets allocated block foo, corrupting file F.
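
The scenario above can be replayed with a small toy model. This is only 
an illustrative sketch in Python, not how GPFS is implemented: the 
ToyFS class, its methods, and run_scenario are hypothetical names, and 
the model assumes one block per file and an allocator that frees a block 
as soon as any file referencing it is deleted.

```python
class ToyFS:
    """Toy block allocator (hypothetical; not the GPFS implementation)."""

    def __init__(self):
        self.refs = {}         # file name -> block the file points at
        self.free = set()      # blocks the allocator believes are free
        self.last_writer = {}  # block -> file whose data is really in it

    def cross_link(self, block, older, newer):
        # Damaged starting state: two files both reference the same block.
        self.refs[older] = block
        self.refs[newer] = block
        self.last_writer[block] = newer

    def delete(self, name):
        # The allocator frees the block without knowing that another
        # file may still reference it.
        self.free.add(self.refs.pop(name))

    def write_new(self, name):
        if not self.free:
            raise RuntimeError("no free blocks in this toy model")
        block = sorted(self.free)[0]
        self.free.remove(block)
        self.refs[name] = block
        self.last_writer[block] = name

    def corrupt_files(self):
        # A file is corrupt if its block now holds another file's data.
        return sorted(f for f, b in self.refs.items()
                      if self.last_writer[b] != f)


def run_scenario():
    """Replay the D/E/F/G steps; return corrupt files after each step."""
    fs = ToyFS()
    history = []
    fs.cross_link("foo", older="D", newer="E")
    history.append(fs.corrupt_files())   # D is corrupt, E is good
    fs.delete("D")
    history.append(fs.corrupt_files())   # nothing looks corrupt yet
    fs.write_new("F")
    history.append(fs.corrupt_files())   # F took block foo, E is corrupt
    fs.delete("E")
    history.append(fs.corrupt_files())
    fs.write_new("G")
    history.append(fs.corrupt_files())   # G took block foo, F is corrupt
    return history


if __name__ == "__main__":
    for step, corrupt in enumerate(run_scenario(), start=1):
        print(step, corrupt)
```

In this model, deleting the currently corrupted file never ends the 
cycle: each deletion frees block foo while another file still references 
it, so the next new file to be written damages the previous one.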

We apologize for the inconvenience and are working to resolve this 
issue as quickly as possible. We still expect the fsck repair to take a 
minimum of a week.

If you have any questions, please don't hesitate to contact your 
Catalyst or the ALCF help desk (support at alcf.anl.gov).

Thank you,
ALCF Support

