Tags: RAID
Ouch! This morning I discovered that one of the disks in the system mirror array had failed on one of the servers, as shown in the /var/log/messages log file:
Jul 1 08:22:56 bento kernel: ad4: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=14<NID_NOT_FOUND,ABORTED> LBA=16082271
Jul 1 08:22:56 bento kernel: ar0: WARNING - mirror lost
Jul 1 08:22:57 bento kernel: ad4: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=14<NID_NOT_FOUND,ABORTED> LBA=16099295
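To see at a glance which disk is complaining and how often, the kernel messages can be summarized per device. This is only a minimal sketch, assuming the syslog layout shown above (month, day, time, host, "kernel:", device, "FAILURE"); the function name ata_failures is hypothetical, not part of any tool:

```shell
#!/bin/sh
# Count kernel "FAILURE" messages per ATA disk in a syslog-style file.
# ata_failures is a hypothetical helper name; the log path is passed as $1.
ata_failures() {
    # Assumed layout: "Mon D HH:MM:SS host kernel: adN: FAILURE - ..."
    # so field 5 is "kernel:", field 6 is the device with a trailing
    # colon (e.g. "ad4:") and field 7 is the word "FAILURE".
    awk '$5 == "kernel:" && $6 ~ /^ad[0-9]+:$/ && $7 == "FAILURE" {
             sub(/:$/, "", $6)   # strip the trailing colon from the device
             count[$6]++
         }
         END { for (d in count) print d, count[d] }' "$1"
}
```

Called as `ata_failures /var/log/messages`, it prints one line per disk with its failure count; for the three-line excerpt above that would be `ad4 2`.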
So, assuming that the hardware error is not too serious and may be recoverable (I really don't want to restore from the last backups if I can avoid it), I followed these steps to rebuild the degraded RAID1 array...
Check for more information on ATA devices and the impacted array:
# atacontrol list
ATA channel 0:
Master: acd0 <SAMSUNG CD-ROM SC-152L/C100> ATA/ATAPI revision 0
Slave: no device present
ATA channel 1:
Master: no device present
Slave: no device present
ATA channel 2:
Master: ad4 <Maxtor 6Y080P0/YAR41VW0> ATA/ATAPI revision 7
Slave: no device present
ATA channel 3:
Master: ad6 <Maxtor 6Y080P0/YAR41VW0> ATA/ATAPI revision 7
Slave: no device present
ATA channel 4:
Master: ad8 <Maxtor 6Y120P0/YAR41VW0> ATA/ATAPI revision 7
Slave: no device present
ATA channel 5:
Master: ad10 <Maxtor 6Y120P0/YAR41VW0> ATA/ATAPI revision 7
Slave: no device present
#
# atacontrol status ar0
ar0: ATA RAID1 subdisks: ad6 status: DEGRADED
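The detach and attach steps take an ATA channel number, not a disk name, so the channel has to be read off the listing (ad4 is the master on channel 2 here). A small sketch of that lookup, assuming the "atacontrol list" layout shown above; channel_of is a hypothetical helper name:

```shell
#!/bin/sh
# Print the ATA channel number that hosts a given disk, reading
# "atacontrol list" style output on stdin.
# channel_of is a hypothetical helper name; the disk name is passed as $1.
channel_of() {
    awk -v disk="$1" '
        # Remember the most recent "ATA channel N:" header.
        /^[ \t]*ATA channel [0-9]+:/ { ch = $3; sub(/:$/, "", ch) }
        # A Master:/Slave: line naming our disk belongs to that channel.
        ($1 == "Master:" || $1 == "Slave:") && $2 == disk { print ch; exit }
    '
}
```

With the listing above, `atacontrol list | channel_of ad4` prints 2. It prints nothing if the disk is not found.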
Detach the ATA channel that hosts the failed disk (ad4 is the master on channel 2); the disk can then be safely removed if necessary:
# atacontrol detach 2
#
# grep ad4 /var/log/messages | tail -1
Jul 2 11:13:40 bento kernel: ad4: WARNING - removed from configuration
Reattach the channel so that the disk is probed and added back to the configuration:
# atacontrol attach 2
Master: ad4 <Maxtor 6Y080P0/YAR41VW0> ATA/ATAPI revision 7
Slave: no device present
#
# grep ad4 /var/log/messages | tail -1
Jul 2 11:13:47 bento kernel: ad4: 78167MB <Maxtor 6Y080P0/YAR41VW0> [158816/16/63] at ata2-master UDMA133
Add the disk to the existing system RAID as a spare (in our case, the same disk as before):
# atacontrol addspare ar0 ad4
#
# grep ad4 /var/log/messages | tail -1
Jul 2 11:14:03 bento kernel: ad4: inserted into ar0 disk0 as spare
Start rebuilding the RAID1 array while the system stays online:
# atacontrol rebuild ar0
Check the progress of the rebuild:
# atacontrol status ar0
ar0: ATA RAID1 subdisks: ad4 ad6 status: REBUILDING 7% completed
When the rebuild is complete, atacontrol(8) reports the array as READY:
# atacontrol status ar0
ar0: ATA RAID1 subdisks: ad4 ad6 status: READY
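Rather than re-running the status command by hand, the check can be put in a loop. This is a rough sketch only, assuming the single-line status format shown above; raid_state and wait_for_rebuild are hypothetical helper names, and the 60-second default interval is an arbitrary choice:

```shell
#!/bin/sh
# Extract the state word that follows "status:" in an
# "atacontrol status" line (e.g. DEGRADED, REBUILDING, READY).
raid_state() {
    printf '%s\n' "$1" | sed -n 's/.*status: \([A-Z]*\).*/\1/p'
}

# Poll the array until it reports READY.
# $1 is the array name (e.g. ar0), $2 the polling interval in seconds.
wait_for_rebuild() {
    array=$1
    interval=${2:-60}
    while :; do
        line=$(atacontrol status "$array") || return 1
        state=$(raid_state "$line")
        echo "$(date '+%H:%M:%S') $line"
        [ "$state" = "READY" ] && return 0
        # Give up if the array is no longer rebuilding
        # (for example, if it fell back to DEGRADED).
        [ "$state" = "REBUILDING" ] || return 1
        sleep "$interval"
    done
}
```

Running `wait_for_rebuild ar0` after the rebuild has started prints a timestamped status line on each pass and returns 0 once the array is READY, or non-zero if the state is anything other than REBUILDING or READY.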