<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>If you've recompiled any of your code for the new BlueGene V1R3M0 driver on Intrepid, or are in the process of doing so, please read this note. </div><div><br></div><div>As mentioned in the message below, we may back out the new V1R3M0 driver installation on Intrepid if we can't resolve the problems we are seeing by close of business today. If that happens, binaries rebuilt for the new driver will most likely not work on the old one. If you are rebuilding your code, please keep a copy of your old binaries in case we need to revert.</div><div><br></div><div>If you have already rebuilt your code and overwritten or deleted your original binaries, you can still recover the original files from one of the GPFS snapshots. If you cd into the .snapshots directory in your home directory, you will find several directories named after the days of the week. Each one contains a snapshot of your home directory from the most recent occurrence of that day; for example, to recover your files from last Saturday, look in the "saturday" directory. Each snapshot is kept for only a week (i.e., the "saturday" directory is reused every Saturday), so there is limited time to copy data out of them. Unfortunately, we did not get a snapshot this past Sunday, so if you need something older than Monday's copy, you'll need to get it from the Saturday copy. If you need any files from last Saturday's or Monday's snapshot, please copy them out today, before those snapshots are overwritten tomorrow and on Monday, respectively.</div><div><br></div><div>Sorry for any inconvenience. 
We will let you know later today whether we end up reverting to the old driver.</div><div><br></div><div>Andrew Cherry</div><div>ALCF Support</div><div><br></div><div><div>On Jan 22, 2009, at 10:58 PM, <a href="mailto:tstacey@alcf.anl.gov">tstacey@alcf.anl.gov</a> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>We continue to experience extremely unstable behavior when running jobs<br>at scale on the new V1R3M0 driver. Debugging has been complicated by<br>diagnostics calling out intermittent hardware issues. We currently have<br>a level 1 PMR (very high priority trouble ticket) open with IBM, and they<br>are working on this problem. If we have not been able to resolve the<br>problem by tomorrow afternoon, we will revert to our previous<br>driver so that jobs can run over the weekend.<br><br>We apologize for any inconvenience this outage has caused and are<br>grateful for your continued patience as we work through this issue.<br>Please contact <a href="mailto:support@alcf.anl.gov">support@alcf.anl.gov</a> if you have any further questions.<br><br>Thanks, The ALCF Support Team<br>_______________________________________________<br>intrepid-notify mailing list<br><a href="mailto:intrepid-notify@alcf.anl.gov">intrepid-notify@alcf.anl.gov</a><br><a href="http://lists.alcf.anl.gov/cgi-bin/mailman/listinfo/intrepid-notify">http://lists.alcf.anl.gov/cgi-bin/mailman/listinfo/intrepid-notify</a><br></div></blockquote></div><br></body></html>