[intrepid-notify] UPDATE: intrepid-fs0 scan complete, preparing for repair scan
ALCF Support
support at alcf.anl.gov
Mon Dec 10 19:02:58 CST 2012
The scan phase of the fsck has completed successfully and the process
that identifies the corrupted files is almost completed. After it
completes, the remaining corrupted files will be copied off, and all the
corrupted files will be deleted. Once these files have been removed, the
repair scan will be started. The timeline estimate has not changed: the
repair scan is expected to take a minimum of a week.
Several users have asked why we are so concerned about finding and
repairing such a small number of corrupted files. It is critical to find
and remove all cross-linked files, not to save potentially corrupted
files, but to protect the file system from further corruption that could
have a much wider impact if any cross-linked files are left in place.
The repair scan is required to ensure that all cross-linked
files have been removed.
To better explain the issue, here is the scenario:
- Two files, D and E, think they have block "foo" (they are cross-linked).
- E was the last file written, so E is good and D is corrupt.
- File D is deleted, which tells GPFS that block foo is free to be
re-allocated.
- File F is written, and gets allocated block foo, corrupting file E.
- The user discovers file E is corrupted and deletes it, which tells
GPFS that block foo is free to be re-allocated.
- File G is written, and gets allocated block foo, corrupting file F.
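The scenario above can be sketched as a small simulation. This is a toy
model only: the ToyFS class, its method names, and the single-block
allocator are hypothetical illustrations and do not reflect how GPFS
actually tracks or allocates blocks. It assumes that a file counts as
corrupt whenever it claims a block that some other file wrote last.

```python
class ToyFS:
    """Toy block allocator (hypothetical; not GPFS's real data structures)."""

    def __init__(self):
        self.owners = {}       # file name -> set of blocks the file claims
        self.free = set()      # blocks the allocator believes are free
        self.last_writer = {}  # block -> file whose data is really on it

    def add_cross_linked(self, block, stale, current):
        # Both files claim the block, but only `current` wrote it last.
        self.owners[stale] = {block}
        self.owners[current] = {block}
        self.last_writer[block] = current

    def delete(self, name):
        # Deleting a file frees every block it claims, even when
        # another file still claims the same block.
        self.free |= self.owners.pop(name)

    def write(self, name):
        # A new file takes a free block and overwrites its contents.
        block = sorted(self.free)[0]
        self.free.remove(block)
        self.owners[name] = {block}
        self.last_writer[block] = name

    def corrupt_files(self):
        # A file is corrupt if a block it claims was last written by
        # a different file.
        return sorted(
            name
            for name, blocks in self.owners.items()
            if any(self.last_writer[b] != name for b in blocks)
        )


def run_scenario():
    """Replay the D/E/F/G steps and record the corrupt files after each."""
    fs = ToyFS()
    history = []
    fs.add_cross_linked("foo", stale="D", current="E")
    history.append(fs.corrupt_files())  # D is corrupt, E is good
    fs.delete("D")                      # block foo is marked free
    fs.write("F")                       # F gets foo, overwriting E's data
    history.append(fs.corrupt_files())
    fs.delete("E")                      # block foo is marked free again
    fs.write("G")                       # G gets foo, overwriting F's data
    history.append(fs.corrupt_files())
    return history


if __name__ == "__main__":
    for step in run_scenario():
        print(step)
```

In this sketch, deleting the corrupt file never ends the problem, because
each deletion frees a block that another file still claims. Corruption only
stops once no two files claim the same block.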
We apologize for the inconvenience, and are working to resolve this
issue as quickly as possible. We still expect the fsck repair to take a
minimum of a week.
If you have any questions, please don't hesitate to contact your Catalyst
or the ALCF help desk (support at alcf.anl.gov).
Thank you,
ALCF Support