Monday, June 4, 2007

OS X Tiger and bad sectors on the hard drive

Last week, my Mac suddenly started beachballing and got progressively slower. I could not reboot from the disk, and was worried that the disk had crashed and that I would need to invest in putty knives etc. to open the case and replace the hard drive. Here is what I did to get my Mac working again.

  1. So, the disk is not very well. Booted into single-user mode and tried "fsck -fy /dev/disk0s3" (/dev/disk0s3 was my drive; yours may have a different ID, so run df to check). Did not help.
  2. Read somewhere about "fsck_hfs -r /dev/disk0s3", which had reportedly repaired catalogs etc. as if by magic. It could not repair the disk and gave me an "Invalid sibling link" and other errors.
  3. Booted off the install disk (press C while booting) and tried the disk repair utility. It gave messages similar to those in step 2 (probably the same tool behind a different interface). Bottom line: it said it could not repair the disk.
  4. At this point I had given up on keeping the data on the disk intact, and went for an erase and install from the install disk. Took a while. One problem was that the repair attempt had unmounted the drive, and I had to erase it to get it mounted again. Not a big deal.
  5. Booted up fine, and I thought I had solved my problem. Started restoring files from the backup disk, and suddenly started getting beachballs and system slowdown again. Checked /var/log/system.log and found the dreaded "disk0s3: I/O error". Booted off the install disk again, ran disk repair, and got the same "Invalid sibling link" and "disk cannot be repaired" messages from Disk Utility.
  6. Repeated step 5 a number of times and convinced myself that, after an erase and install, everything worked OK until I started moving lots of data onto the disk. This was strange, as a crashed disk should not have let me get that far. Possibly the failure on the disk was intermittent and sooner or later I would hit it. Still, it was odd that after a clean install I could download, install and run the utilities I needed without trouble, right up until the huge disk transfers.
  7. Posted my problem on a number of websites and got one good hint about something that had already crossed my mind but that I had not followed through on: zero out the data, then do an erase and install. The person who advised this also told me to zero out the free space on the drive, i.e. zero out everything. (Zeroing once is enough.)
  8. THE REASON: Apparently OS X (Tiger) does not deal with bad sectors on the disk very well through its disk repair utilities. So if you have a few bad sectors on the drive, you may never be able to tell OS X not to use them. Consequently, everything works until the OS uses a bad sector in a critical way (for the catalog or a file table, say), and it goes from bad to worse from there. Zeroing out the hard drive seems to mark these bad sectors so the OS does not use them anymore.
  9. This did seem to fit the symptoms I saw, with the clean install working and then failing later.
  10. Tried it out, and the Mac has been up for the last 4 days working fine.

So I have postponed buying the putty knives for now, although bad sectors on the disk are not good news, and I will probably have to invest in a new disk at some point. Will back up my stuff more regularly for now.

The zero out, then erase and install approach does not seem to be very well known as a fix for this kind of disk problem. People usually invest in disk repair tools after the first few steps fail.
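
The log check in step 5 can be automated. The following is only a minimal sketch: the function name `count_io_errors` is made up for illustration, and the pattern is inferred from the single "disk0s3: I/O error" message quoted above, so other wordings of the error would be missed.

```python
import re

# Pattern inferred from the one message quoted in step 5
# ("disk0s3: I/O error"); this is an assumption, not a documented log format.
IO_ERROR = re.compile(r"\b(disk\d+(?:s\d+)?): I/O error")


def count_io_errors(log_lines):
    """Return a dict mapping device name to its number of I/O error lines."""
    counts = {}
    for line in log_lines:
        match = IO_ERROR.search(line)
        if match:
            device = match.group(1)
            counts[device] = counts.get(device, 0) + 1
    return counts
```

To use it, pass any iterable of lines, for example an open file handle on /var/log/system.log. A count that keeps growing for the same device during large transfers would match the symptoms described in steps 5 and 6.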

2 comments:

stew said...

I know of someone with a similar problem on an Intel-platform laptop.
Although they are running XP, unrepairable 'Volume Bitmap' errors (same message in XP or OSX) require the same type of repair.
What about a bootable CD which reads the data, writes zeros (or low-level formats a section), then writes the data back?
Since both platforms have Intel-compatible hardware, should the technique work on both Macs and PCs?

CB said...

Not sure if read->zero->write will help in the case of bad sectors. There should be a way to have the OS be aware of bad sectors through some sort of a disk scan.
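
As a rough illustration of what such a scan might look like, here is a read-only sketch that walks a file or device in fixed-size blocks and records the offsets where the OS reports a read error. The name `find_unreadable_blocks` and the 4096-byte block size are arbitrary choices. It also assumes a Unix-like system and that seeking to the end of the target reports its size, which holds for regular files but may not hold for device nodes on every system. It only finds unreadable blocks; it does nothing to make the OS avoid them.

```python
import os


def find_unreadable_blocks(path, block_size=4096):
    """Read `path` block by block; return byte offsets where a read failed."""
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Assumption: seeking to the end yields the size of the target.
        size = os.lseek(fd, 0, os.SEEK_END)
        for offset in range(0, size, block_size):
            try:
                os.pread(fd, block_size, offset)
            except OSError:
                # The OS reported an I/O error for this block.
                bad_offsets.append(offset)
    finally:
        os.close(fd)
    return bad_offsets
```

On a healthy file this returns an empty list. Reading a raw disk this way would normally need administrator rights and can take a long time on a large drive.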