MOVED TO code.revolt.org/tcp/documentation/-/wik...

New Disks

Upon receiving a new disk, we do the following:

  1. If this is a new machine we are setting up, we boot d-i (the Debian installer), configure the network, drop to a shell, and download our tools (smartctl from the smartmontools package, bonnie++, and time). If this is an existing machine, we just apt-get install the tools.
  2. For SSDs, ensure that firmware is up to date
  3. Run smartctl tests
    1. Check the health of the drive
      # smartctl -H /dev/sdX
    2. Make sure there are no existing errors by reading the error and selftest logs
      # smartctl -l error /dev/sdX
      # smartctl -l selftest /dev/sdX
      
    3. Run the short test and check its results after a couple of minutes
      # smartctl -t short /dev/sdX
      # smartctl -l selftest /dev/sdX
      
    4. Run the long test and check its results once it finishes, usually after a couple of hours (smartctl prints an estimated completion time when the test starts)
      # smartctl -t long /dev/sdX
      # smartctl -l selftest /dev/sdX
      
    5. Read the SMART attributes and make sure nothing looks wrong. In particular, check “Reallocated_Sector_Ct”: the number of sectors the drive has had to reallocate. A high value means the drive has used up many of its spare sectors.
      # smartctl -A /dev/sdX
      
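  4. Run a timed badblocks on the disk and compare the time to existing results for this hardware. The -w flag makes this a destructive write test that erases everything on the disk; -b 4096 sets the block size and -c 10240 the number of blocks tested at a time.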
    # time badblocks -s -v -w -b 4096 -c 10240 /dev/sdX
    
  5. If you need performance numbers for the drive, create a partition and filesystem, run bonnie++ (v1.96 or newer) on it, and compare the results to existing results for this hardware. I usually run bonnie++ three times.
    # fdisk /dev/sdX
    # mke2fs -j /dev/sdX1
    # mount /dev/sdX1 /mnt
    # cd /mnt
    # bonnie++ -u 0 -s 16G -n 512
    

    (These bonnie++ settings are what’s currently needed to get useful output on the current generation of SSDs, and they work fine on modern HDDs too. For older HDDs you might want smaller values. Really, though, you want to use the same settings you already have results for on similar disks, so the numbers are comparable.)
  6. For SSDs, do a SATA secure erase to reset performance.
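
Once the short and long self-tests from step 3 have finished, the remaining checks and the timed badblocks run can be scripted so every disk is tested the same way and its badblocks time lands in a log for comparison. This is a minimal shell sketch, not our existing tooling: the function names (smart_health_ok, last_selftest_ok, new_disk_checks) and the default log file disk-checks.log are hypothetical, the parsers assume smartctl's usual output wording, and elapsed time is measured with `date +%s` rather than `time` so the number can be appended to the log.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching steps 3 and 4; names are illustrative only.

# Succeeds if `smartctl -H` output on stdin reports a passing result.
# ATA drives print "...self-assessment test result: PASSED";
# SAS/SCSI drives print "SMART Health Status: OK".
smart_health_ok() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Succeeds if the newest entry ("# 1") of `smartctl -l selftest` output on
# stdin finished with "Completed without error".
last_selftest_ok() {
    grep -E '^# 1 ' | grep -q 'Completed without error'
}

# Run after the short and long self-tests have finished. Checks health and
# the self-test log, saves the error log and attributes, then runs the
# DESTRUCTIVE badblocks write test and logs how long it took.
new_disk_checks() {
    local dev=$1
    local log=${2:-disk-checks.log}   # hypothetical default log file
    local start end

    if ! smartctl -H "$dev" | smart_health_ok; then
        echo "$dev: SMART health check did not pass" >&2
        return 1
    fi
    if ! smartctl -l selftest "$dev" | last_selftest_ok; then
        echo "$dev: newest self-test did not complete without error" >&2
        return 1
    fi

    # Keep the error log and attribute table (Reallocated_Sector_Ct etc.) for review.
    smartctl -l error "$dev" >> "$log"
    smartctl -A "$dev" >> "$log"

    # Elapsed wall-clock seconds, comparable to earlier runs on this hardware.
    start=$(date +%s)
    badblocks -s -v -w -b 4096 -c 10240 "$dev" || return 1
    end=$(date +%s)
    echo "$dev: badblocks took $((end - start)) s" >> "$log"
}
```

For example, run `new_disk_checks /dev/sdX` as root, only on a disk whose contents you no longer need; the bonnie++ runs in step 5 are left manual.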

Old Disks

When reusing an old disk, use the same procedure as for new disks, but pay careful attention to the starting and finishing “Reallocated_Sector_Ct” values and make sure they are not too high (if the count goes up a little, that’s OK). Assuming the disk tests OK, put a sticker on top of the drive indicating when badblocks was last run.
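
Comparing the starting and finishing counts by eye is easy to get wrong, so here is a small sketch that pulls the raw Reallocated_Sector_Ct value out of `smartctl -A` output and checks how much it rose. The function names (realloc_count, realloc_ok) are hypothetical, and since “too high” is a judgment call, the allowed increase is a parameter you pass in rather than a threshold we have agreed on. It assumes the raw value is the tenth column of smartctl's attribute table, as on typical ATA drives; a drive that does not report the attribute yields an empty value and the check fails.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing Reallocated_Sector_Ct before and after
# testing an old disk.

# Prints the RAW_VALUE of Reallocated_Sector_Ct from `smartctl -A` output on
# stdin (10th column of the attribute table). Prints nothing if absent.
realloc_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10; exit }'
}

# Usage: realloc_ok BEFORE AFTER MAX_INCREASE
# Returns 0 if the count rose by at most MAX_INCREASE, 1 if it rose more,
# and 2 if either count is missing.
realloc_ok() {
    local before=$1 after=$2 max_increase=$3
    [ -n "$before" ] && [ -n "$after" ] || return 2
    [ $((after - before)) -le "$max_increase" ]
}

# Example (as root; 8 is a placeholder limit, not an agreed threshold):
#   before=$(smartctl -A /dev/sdX | realloc_count)
#   ... run the new-disk procedure ...
#   after=$(smartctl -A /dev/sdX | realloc_count)
#   realloc_ok "$before" "$after" 8 && echo "badblocks last run: $(date +%F)"
```

The date printed at the end is what would go on the sticker.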

For Hitachi disks, run the Hitachi ‘Drive Fitness Tool’ to ensure Hitachi thinks the disk is OK. If it fails, RMA the disk.

For SSDs, ensure that firmware is up to date and do a SATA secure erase in order to reset performance.

Disposing of Disks

If the disk is ATA, it may support the “secure erase” command. Here are instructions for how to use it.
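
One common way to issue the ATA secure erase command on Linux is hdparm: set a temporary user password, then erase with it. The sketch below follows that sequence and refuses to continue if `hdparm -I` reports the drive's security state as frozen, since a frozen drive rejects security commands. The function names (security_frozen, ata_secure_erase) and the default password `erase` are hypothetical; check the Security section of `hdparm -I` yourself first to confirm the drive supports it, and note that this wipes the entire disk.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an ATA secure erase using hdparm.
# WARNING: this erases the ENTIRE disk.

# Succeeds if `hdparm -I` output on stdin shows security as "frozen"
# (hdparm prints "not frozen" otherwise). A frozen drive rejects security
# commands; suspending and resuming the machine or hot-plugging the drive
# often clears the frozen state.
security_frozen() {
    grep -Eq '^[[:space:]]+frozen'
}

ata_secure_erase() {
    local dev=$1
    local pass=${2:-erase}   # hypothetical throwaway password

    if hdparm -I "$dev" | security_frozen; then
        echo "$dev: security is frozen; unfreeze the drive first" >&2
        return 1
    fi
    # Set a temporary user password, then issue the erase with it.
    hdparm --user-master u --security-set-pass "$pass" "$dev" || return 1
    hdparm --user-master u --security-erase "$pass" "$dev"
}
```

Run it as root, e.g. `ata_secure_erase /dev/sdX`. If the erase is interrupted, the drive may be left with that password set, so keep a note of it.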

NOTE: Since we aren’t timing these badblocks runs, it’s OK to run them in parallel with other disks.
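
Running the untimed passes in parallel can be as simple as backgrounding one badblocks job per disk and waiting for all of them. In this sketch parallel_badblocks is a hypothetical name; it reuses the destructive flags from step 4 of the new-disk procedure, writes one log per disk, and returns non-zero if any badblocks process exited with an error (check each log for the blocks it reported). The LOGDIR and BADBLOCKS variables are assumptions added for illustration: they set the log directory and let you substitute a harmless command such as `true` for a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one DESTRUCTIVE badblocks write pass per disk, all in
# parallel. LOGDIR and BADBLOCKS are illustrative overrides, not standard.

parallel_badblocks() {
    local logdir=${LOGDIR:-.}
    local cmd=${BADBLOCKS:-badblocks}
    local pids="" dev pid failed=0

    for dev in "$@"; do
        # Each disk gets its own background job and log file.
        (
            if ! "$cmd" -s -v -w -b 4096 -c 10240 "$dev" \
                    > "$logdir/badblocks-$(basename "$dev").log" 2>&1; then
                echo "$dev: badblocks exited with an error" >&2
                exit 1
            fi
        ) &
        pids="$pids $!"
    done

    # `wait PID` returns that job's exit status; remember any failure.
    for pid in $pids; do
        wait "$pid" || failed=1
    done
    return "$failed"
}
```

For example, `parallel_badblocks /dev/sdb /dev/sdc /dev/sdd` as root, then read badblocks-sdb.log and the others.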

WARNING: Even after running badblocks, some data may remain on the drive. If the drive has reallocated sectors, the old sectors it marked bad may still be readable and contain data. The ATA secure erase procedure is supposed to erase these as well.