ewams

Overview of breaking the RAID
The last article demonstrated how to simulate RAID failures and recover from them. The problem with a simulation is that it is only a simulation: the consequences of a real failure, and how the system will handle it, are not truly known until one happens.

This article is not a how-to but merely a demonstration of what happens when a disk is hard removed from a RAID 5 array, pulling the plug so to speak. The five hard drives are /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde and /dev/sdf. The RAID 5 device is /dev/md0. The ext4 filesystem lives on the partition /dev/md0p1 and is mounted at /data.

Everything was performed on a Debian 6.0 "Squeeze" system (Linux debian 2.6.32-5-amd64) with a RAID 5 array created out of five 5GB disks, one as a hot spare, with the ext4 filesystem as the primary partition. It is running as a guest in a virtual machine on VMware's ESXi 4.1.0 build 348481.
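
As a quick sanity check on the numbers that show up in /proc/mdstat throughout this article, here is a small Python sketch of the RAID 5 capacity arithmetic. The function name is made up for illustration; the per-member block count (5241344) is taken from the recovery lines shown later in the article.

```python
def raid5_usable_blocks(active_members: int, blocks_per_member: int) -> int:
    """RAID 5 spends one member's worth of space on parity."""
    return (active_members - 1) * blocks_per_member

# 4 active members; the fifth disk is a hot spare and holds no data.
usable = raid5_usable_blocks(4, 5241344)
print(usable)  # 15724032, the block count mdstat reports for md0
```

That is roughly 15G, which agrees with what df reports for /data.
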

Performing the activities below is not recommended, as they may damage your systems.

Pulling the plug on an idle system
  1. The status of the array is good: 5 disks in total, 4 active in the array and 1 hot spare.
    root@debian~#: cat /proc/mdstat
    md0 : active raid5 sde[5] sdf[4](S) sdd[2] sdc[1] sdb[0]
        15724032 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
    root@debian~#: df -h | grep md0
    /dev/md0p1    15G    176M    15G    2%    /data

  2. Go into the VM settings and delete one of the presented hard drives. Then run ls on /data and monitor /var/log/messages and /proc/mdstat for changes.
    root@debian:/data# ls
    lost+found testfile1 testfile2
    root@debian:/data# tail -f /var/log/messages
    sd 2:0:1:0: [sdb] Unhandled error code
    sd 2:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
    sd 2:0:1:0: [sdb] CDB: Write(10): 2a 00 00 00 00 08 00 00 02 00
    md: super_written gets error=-5, uptodate=0
    RAID5 conf printout:
    - - - rd:4 wd:3
    disk 0, o:0, dev:sdb
    disk 1, o:1, dev:sdc
    disk 2, o:1, dev:sdd
    disk 3, o:1, dev:sde
    RAID5 conf printout:
    - - - rd:4 wd:3
    disk 1, o:1, dev:sdc
    disk 2, o:1, dev:sdd
    disk 3, o:1, dev:sde
    RAID5 conf printout:
    - - - rd:4 wd:3
    disk 0, o:1, dev:sdf
    disk 1, o:1, dev:sdc
    disk 2, o:1, dev:sdd
    disk 3, o:1, dev:sde
    md: recovery of RAID array md0
    The system discovers that the drive is unreachable and starts rebuilding the array.

    root@debian:/data# cat /proc/mdstat
    md0 : active raid5 sde[5] sdf[4] sdd[2] sdc[1] sdb[0](F)
        15724032 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
        [>. . . . . . . . . . .] recovery = 7.2% (380928/5241344) finish=4.0min speed=20048K/sec
    mdstat shows sdb marked as failed and /dev/sdf moved into the array, where md has started the recovery process.
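
The mdstat output above can also be read by a script instead of by eye. The following is only a minimal Python sketch, not a complete mdstat parser: the function name and the returned dictionary keys are made up for illustration, and it assumes the text passed in describes a single array in the format shown above.

```python
import re

def parse_md_status(mdstat_text: str) -> dict:
    """Pull member counts and recovery progress out of one array's mdstat text."""
    # Matches the "[4/3] [_UUU]" pair; "sde[5]" has no slash so it is skipped.
    counts = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", mdstat_text)
    if counts is None:
        raise ValueError("no [n/m] [UUUU] status found")
    expected = int(counts.group(1))
    working = int(counts.group(2))
    slots = counts.group(3)
    recovery = re.search(r"recovery\s*=\s*([\d.]+)%", mdstat_text)
    return {
        "expected": expected,
        "working": working,
        "missing_slots": [i for i, c in enumerate(slots) if c == "_"],
        "degraded": working < expected,
        "recovery_percent": float(recovery.group(1)) if recovery else None,
    }

sample = """md0 : active raid5 sde[5] sdf[4] sdd[2] sdc[1] sdb[0](F)
    15724032 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
    [>. . . . . . . . . . .] recovery = 7.2% (380928/5241344) finish=4.0min speed=20048K/sec
"""
status = parse_md_status(sample)
print(status)
```

For the sample above this reports a degraded array with slot 0 missing and a recovery in progress.
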


Pulling the plug with I/O
  1. Created a 500MB file named /data/testfile3 with data from /dev/urandom and started copying it to /data/testfile4. While the copy was underway, a drive was deleted from the system.
    root@debian:/data# ls
    lost+found testfile1 testfile2 testfile3
    root@debian:/data# cat testfile3 >> testfile4
    root@debian:/data# tail -f /var/log/messages
    sd 2:0:3:0: [sdd] Unhandled error code
    sd 2:0:3:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
    sd 2:0:3:0: [sdd] CDB: Write(10): 2a 00 00 14 20 00 00 30 00 00
    RAID5 conf printout:
    - - - rd:4 wd:3
    disk 0, o:1, dev:sdb
    disk 1, o:1, dev:sdc
    disk 2, o:0, dev:sdd
    disk 3, o:1, dev:sde
    RAID5 conf printout:
    - - - rd:4 wd:3
    disk 0, o:1, dev:sdb
    disk 1, o:1, dev:sdc
    disk 3, o:1, dev:sde
    RAID5 conf printout:
    - - - rd:4 wd:3
    disk 0, o:1, dev:sdb
    disk 1, o:1, dev:sdc
    disk 2, o:1, dev:sdf
    disk 3, o:1, dev:sde
    md: recovery of RAID array md0
    The system discovers that the drive is unreachable and starts rebuilding the array.

    root@debian:/data# cat /proc/mdstat
    md0 : active raid5 sde[5] sdf[4] sdd[2](F) sdc[1] sdb[0]
        15724032 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
        [>. . . . . . . . . . .] recovery = 5.0% (263424/5241344) finish=20.0min speed=4143K/sec
    mdstat shows sdd marked as failed and /dev/sdf moved into the array, where md has started the recovery process.
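
The two recovery lines, idle and under I/O, show very different rebuild speeds. As a rough check, the finish estimate can be recomputed from the other numbers on the line. This Python sketch uses a made-up function name and assumes the estimate is simply the remaining blocks divided by the current speed.

```python
import re

def recovery_eta_minutes(recovery_line: str) -> float:
    """Estimate minutes left: (total - done) blocks / speed in K/sec / 60."""
    m = re.search(r"\((\d+)/(\d+)\).*speed=(\d+)K/sec", recovery_line)
    if m is None:
        raise ValueError("not a recovery progress line")
    done, total, speed = (int(g) for g in m.groups())
    return (total - done) / speed / 60

idle = "recovery = 7.2% (380928/5241344) finish=4.0min speed=20048K/sec"
busy = "recovery = 5.0% (263424/5241344) finish=20.0min speed=4143K/sec"

print(round(recovery_eta_minutes(idle), 1))  # 4.0
print(round(recovery_eta_minutes(busy), 1))  # 20.0
```

Both results match the finish= values md printed. With the copy running, the rebuild was going at roughly a fifth of the idle speed.
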


Pulling the plug and then pulling the plug
  1. The previous test is repeated, and while the RAID array is still rebuilding a second disk is pulled. This is what happens.
    root@debian:/data# ls
    lost+found testfile1 testfile2 testfile3
    root@debian:/data# cat testfile3 >> testfile4

    [first disk is pulled here]

    root@debian:/data# tail -f /var/log/messages
    sd 2:0:3:0: [sdd] Unhandled error code
    sd 2:0:3:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
    sd 2:0:3:0: [sdd] CDB: Write(10): 2a 00 00 14 20 00 00 30 00 00
    RAID5 conf printout:
    - - - rd:4 wd:3
    disk 0, o:1, dev:sdb
    disk 1, o:1, dev:sdc
    disk 2, o:0, dev:sdd
    disk 3, o:1, dev:sde
    RAID5 conf printout:
    - - - rd:4 wd:3
    disk 0, o:1, dev:sdb
    disk 1, o:1, dev:sdc
    disk 3, o:1, dev:sde
    RAID5 conf printout:
    - - - rd:4 wd:3
    disk 0, o:1, dev:sdb
    disk 1, o:1, dev:sdc
    disk 2, o:1, dev:sdf
    disk 3, o:1, dev:sde
    md: recovery of RAID array md0

    [second disk is pulled here]

    sd 2:0:1:0: [sdb] Unhandled error code
    sd 2:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
    sd 2:0:1:0: [sdb] CDB: Write(10): 2a 00 00 13 28 00 00 04 00 00
    lost page write due to I/O error on md0p1
    lost page write due to I/O error on md0p1
    lost page write due to I/O error on md0p1
    lost page write due to I/O error on md0p1
    lost page write due to I/O error on md0p1
    JBD2: Detected IO errors while flushing file data on md0p1-8
    JBD2: Detected IO errors while flushing file data on md0p1-8
    md: md0: recovery done.
    RAID5 conf printout:
    - - - rd:4 wd:2
    disk 0, o:0, dev:sdb
    disk 1, o:1, dev:sdc
    disk 2, o:1, dev:sdf
    disk 3, o:1, dev:sde
    RAID5 conf printout:
    - - - rd:4 wd:2
    disk 1, o:1, dev:sdc
    disk 2, o:1, dev:sdf
    disk 3, o:1, dev:sde
    RAID5 conf printout:
    - - - rd:4 wd:2
    disk 1, o:1, dev:sdc
    disk 3, o:1, dev:sde

    cat: write error: Read-only file system
    This box is an unhappy camper. As usual, when the first drive is pulled md detects it, removes the bad drive from the array and places the spare in. About 20 seconds later I pulled another drive from the VM, and that is detected as well. Since reads and writes were being performed on the volume, the system complains about I/O errors. Interestingly, md removes the second failed disk (/dev/sdb) from the array and then removes the hot spare that was rebuilding (/dev/sdf). RAID 5 can only survive the loss of one member, so with two of the four members gone the array can no longer serve data, and the filesystem is placed in read-only mode.

    root@debian:/data# cat /proc/mdstat
    md0 : active raid5 sde[5] sdf[4](S) sdd[2](F) sdc[1] sdb[0](F)
        15724032 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/2] [_U_U]
    mdstat shows that md has failed /dev/sdd and /dev/sdb. /dev/sdf is changed back to a spare device. /data is no longer usable but the system does keep it mounted.
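
The device line of mdstat carries the per-member flags, so the state described above can be pulled out with a few lines of Python. This is a sketch with made-up function names; it only understands the (F) and (S) flags seen in this article and treats every other member as active.

```python
import re

def classify_members(device_line: str) -> dict:
    """Map each member such as 'sdd[2](F)' to failed, spare or active."""
    members = {}
    for name, flag in re.findall(r"(\w+)\[\d+\](?:\((\w)\))?", device_line):
        members[name] = {"F": "failed", "S": "spare"}.get(flag, "active")
    return members

def raid5_can_serve_data(expected: int, working: int) -> bool:
    """RAID 5 tolerates at most one missing member."""
    return expected - working <= 1

line = "md0 : active raid5 sde[5] sdf[4](S) sdd[2](F) sdc[1] sdb[0](F)"
members = classify_members(line)
failed = sorted(name for name, state in members.items() if state == "failed")

print(failed)                      # ['sdb', 'sdd']
print(members["sdf"])              # spare
print(raid5_can_serve_data(4, 2))  # False, the [4/2] case above
```

Note that the first line of mdstat still says "active raid5" here, so that word alone is not a safe health check.
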


Conclusion
md behaved as expected when a single drive was pulled.

This test is a great example of why simulated failures are not real failures: the system may behave differently. Above, when a second drive is pulled while the RAID array is still rebuilding, /var/log/messages contains the entry "md: md0: recovery done." It is quite possible that someone testing failures noticed that this line appears once md has rebuilt an array, and then wrote a log analyzer that looks for the line and reports to the sysadmins / NOC that all is green checkmarks. With two failed drives this is a problem: the filesystem is no longer usable, yet the monitoring program may say the system is green checkmarks when it is in fact red X's.
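
To make the pitfall concrete, here is a hedged Python sketch of such a log check, next to a slightly more careful one. Both function names are hypothetical. The second check rests on an assumption not stated in the logs themselves: that rd and wd in the "RAID5 conf printout" lines are the number of array members and the number of working members, and that a healthy array's last printout shows them equal.

```python
import re

def naive_is_green(log_lines) -> bool:
    """The trap: trust the 'recovery done' line on its own."""
    return any("recovery done" in line for line in log_lines)

def checked_is_green(log_lines) -> bool:
    """Also require the last rd/wd printout to show every member working."""
    if not naive_is_green(log_lines):
        return False
    last = None
    for line in log_lines:
        m = re.search(r"rd:(\d+)\s+wd:(\d+)", line)
        if m:
            last = (int(m.group(1)), int(m.group(2)))
    # ASSUMPTION: rd = array members, wd = working members.
    return last is not None and last[0] == last[1]

# Trimmed from the double-failure log above.
log = [
    "md: recovery of RAID array md0",
    "lost page write due to I/O error on md0p1",
    "md: md0: recovery done.",
    "RAID5 conf printout:",
    "- - - rd:4 wd:2",
    "disk 1, o:1, dev:sdc",
    "disk 3, o:1, dev:sde",
]

print(naive_is_green(log))    # True
print(checked_is_green(log))  # False
```

A real monitor would probably be better off reading /proc/mdstat directly rather than relying on log lines at all.
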

Simulated failures are a great way to train personnel and test systems, but they do not compare to the real thing. When an actual failure happens, will you be ready?

Written by Eric Wamsley
Posted: July 29th, 2011 12:38am.
Topic: breaking RAID5
Tags: testing RAID5 Debian Linux mdadm failures


 Eric Wamsley - ewams.net