Quick HDD Health Check

Recently I have had to deal with two hardware disk failures in quick succession, which has led to two changes:

  1. A much increased pace of development on emu (more on that soon!).
  2. Taking a more programmatic and astute approach to keeping an eye on system health.

To help with the second point, I adapted a simple one-liner from this disk cloning with ddrescue tutorial to give a quick overview of the health of each hard drive:

for i in /dev/sd[a-z]; do echo "=== $i ==="; \
sudo smartctl -a "$i" | grep -E --color=never -i \
"reallocated_sector|ata error|serial|model|user capacity"; \
sudo smartctl -H "$i" &>/dev/null || sudo smartctl -H "$i"; echo; done

Example output for a system with 5 drives, the fourth of which (/dev/sdd) is failing:

=== /dev/sda ===
Device Model:     WDC WD40EFRX-68WT0N0
Serial Number:    WD-XXXXXXXXXXXX
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

=== /dev/sdb ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-XXXXXXXXXXXX
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

=== /dev/sdc ===
Model Family:     Western Digital Red (AF)
Device Model:     WDC WD20EFRX-68EUZN0
Serial Number:    WD-XXXXXXXXXXXX
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

=== /dev/sdd ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-XXXXXXXXXXXX
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-29-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   020   020   051    Pre-fail  Always   FAILING_NOW 89263


=== /dev/sde ===
Model Family:     HP 250GB SATA disk VB0250EAVER
Device Model:     VB0250EAVER
Serial Number:    XXXXXXXX
User Capacity:    250,059,350,016 bytes [250 GB]
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
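
The last column of each Reallocated_Sector_Ct line is the raw count, which is the number a script would most likely want to compare against zero. As a minimal sketch, assuming the ten-column attribute table layout shown in the output above, a small helper could extract it from smartctl output on stdin. The function name reallocated_raw is made up for this example.

```shell
# Print the raw Reallocated_Sector_Ct value from `smartctl -a` output on stdin.
# Assumes the ATA attribute table layout shown above, where RAW_VALUE is
# the tenth whitespace-separated field. Prints nothing if the attribute
# is not present (for example, on drives that do not report it).
reallocated_raw() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10; exit }'
}

# Hypothetical usage (commented out because it needs root and a real drive):
#   count=$(sudo smartctl -a /dev/sda | reallocated_raw)
#   if [ "${count:-0}" -gt 0 ]; then
#       echo "/dev/sda: $count reallocated sectors"
#   fi
```

A non-zero count is not necessarily a sign of imminent failure, so this is probably best treated as a prompt to look at the full smartctl output rather than as a verdict.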

Here is the one-liner in expanded form. Note that it runs the SMART health check and shows the output only if the check fails.

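for i in /dev/sd[a-z]; do
    echo "=== $i ==="
    # Show only the identifying fields and the key failure indicators.
    sudo smartctl -a "$i" | grep -E --color=never \
        -i "reallocated_sector|ata error|serial|model|user capacity"
    # Run the health check silently; print its output only if it fails.
    sudo smartctl -H "$i" &>/dev/null || sudo smartctl -H "$i"
    echo
done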
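
The loop treats any non-zero exit status from smartctl -H as a reason to print the report. According to the smartctl man page, that exit status is a bitmask, so a failing disk (bit 3) can be told apart from, say, a device that could not be opened (bit 1). The following is a rough sketch of decoding it. The function name and the labels are invented for this example and are not part of smartmontools.

```shell
# Decode a smartctl exit status into short labels, one per line.
# Bit meanings follow the "EXIT STATUS" section of the smartctl man page;
# the label strings themselves are made up for this sketch.
smart_exit_flags() {
    status=$1
    bit=0
    for label in bad-command-line device-open-failed smart-command-failed \
                 disk-failing prefail-below-threshold past-below-threshold \
                 error-log-entries selftest-log-errors; do
        # Test each bit in turn, lowest bit first.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "$label"
        fi
        bit=$(( bit + 1 ))
    done
}
```

Inside the loop it could be used along the lines of `sudo smartctl -H "$i" >/dev/null; smart_exit_flags $?`, which prints nothing for a drive that passes.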

Note also that this depends on the smartctl binary, which is probably not installed by default on most distributions. On Debian, you need to install the smartmontools package with apt-get:

sudo apt-get install smartmontools
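
If the one-liner ends up in a script, it may be worth checking for smartctl up front rather than letting every iteration of the loop fail. A minimal sketch, where have_command is a hypothetical helper name:

```shell
# Return success if the named command can be found on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Warn early instead of failing once per drive inside the loop.
if ! have_command smartctl; then
    echo "smartctl not found; install the smartmontools package first" >&2
fi
```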

AI researcher specializing in compilers and code optimization.