Replace faulty disk in SVM mirror

A disk I have in a production machine went bad:

d4: Mirror
    Submirror 0: d14
      State: Okay
    Submirror 1: d24
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 120850176 blocks (57 GB)

d14: Submirror of d4
    State: Okay
    Size: 120850176 blocks (57 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s4          0     No            Okay   Yes

d24: Submirror of d4
    State: Needs maintenance
    Invoke: metareplace d4 c1t1d0s4
    Size: 120850176 blocks (57 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s4          0     No     Maintenance   Yes

The first thing I did was check iostat to see how bad the situation was:

bash-3.00# iostat -En
...
c1t1d0           Soft Errors: 9 Hard Errors: 98 Transport Errors: 27
Vendor: SEAGATE  Product: ST373207LSUN72G  Revision: 045A Serial No: 060133PK2W
Size: 73.40GB <73400057856>
Media Error: 84 Device Not Ready: 0 No Device: 14 Recoverable: 9
Illegal Request: 0 Predictive Failure Analysis: 0...

98 Hard Errors doesn't look good. (The count was probably lower the first time I noticed the problem.) Let's do a surface scan: format -> 1 -> analyze -> read -> y
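Roughly, the interactive session looks like this (the disk number and prompts are from my machine; yours may differ):

bash-3.00# format
# pick disk 1 (c1t1d0 in this case) at the disk selection prompt
format> analyze
analyze> read
# answer y to confirm the read-only surface pass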

Without posting the output, suffice it to say that I need to replace the disk. To do this, we will have to detach it from the mirror and offline the disk. If your disk is also part of a ZFS pool, you will need to detach it from there as well.

Assuming the bad disk is c1t1d0, this will break the mirror:

# For each submirror on the bad disk (anything matching c1t1 in the compact
# metastat output), derive its parent mirror name (by the naming convention
# used here, replace the last digit with 0, e.g. d24 -> d20), force-detach
# the submirror from the mirror, then clear the submirror metadevice.
for a in `metastat -c | grep c1t1 | awk '{print $1}'`; do
     A=`echo $a | sed 's/.$/0/'`
     metadetach -f $A $a
     metaclear -f $a
done

You can use zpool detach poolname device to detach the failing disk from any basic ZFS mirrors.
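For example, assuming a pool named tank mirrored across the same two disks (the pool name here is just an illustration):

bash-3.00# zpool detach tank c1t1d0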

Then delete any metadbs (state database replicas) that you have on the bad disk. This can be a little tricky: you want at least three replicas to remain. If you followed Sun's advice and put two replicas on each of the two disks (SunFire V210), then you might want to add some more before you delete the ones on the bad disk. FYI: you cannot add replicas to a slice which already has replicas on it.
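For example, you could first drop two extra replicas on a free slice of the good disk (s7 here is only an assumption; it must be a slice that doesn't already hold replicas):

bash-3.00# metadb -a -c 2 c1t0d0s7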

Assuming the metadbs are on slice 3, metadb -d c1t1d0s3 will delete them and leave you free to offline the disk.
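In other words, something like this, checking afterwards that at least three replicas remain:

bash-3.00# metadb -d c1t1d0s3
bash-3.00# metadb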

bash-3.00# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 CD-ROM       connected    configured   unknown
c1                             scsi-bus     connected    configured   unknown
c1::dsk/c1t0d0                 disk         connected    configured   unknown
c1::dsk/c1t1d0                 disk         connected    configured   unknown
c2                             scsi-bus     connected    unconfigured unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
bash-3.00# cfgadm -c unconfigure c1::dsk/c1t1d0

At this point, a blue LED should light up next to the disk which needs to be replaced (at least it does in a V210, other hardware might be different). Replace the disk and get ready to undo everything we did 😉

bash-3.00# cfgadm -c configure c1::dsk/c1t1d0
bash-3.00# format 
# Label the disk with format if necessary
bash-3.00# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
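# this copies the VTOC (slice table) from the good disk onto the replacement so the slice layout matches the existing metadevices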
bash-3.00# metadb 
     flags           first blk       block count
  a m  p  luo        16              8192            /dev/dsk/c1t0d0s3
  a    p  luo        8208            8192            /dev/dsk/c1t0d0s3
  a    p  luo        16400           8192            /dev/dsk/c1t0d0s3
  a    p  luo        24592           8192            /dev/dsk/c1t0d0s3
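# add four replicas on the new disk's slice 3 to match the four on c1t0d0s3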
bash-3.00# metadb -a -c 4 c1t1d0s3
bash-3.00# metadb
     flags           first blk       block count
  a m  p  luo        16              8192            /dev/dsk/c1t0d0s3
  a    p  luo        8208            8192            /dev/dsk/c1t0d0s3
  a    p  luo        16400           8192            /dev/dsk/c1t0d0s3
  a    p  luo        24592           8192            /dev/dsk/c1t0d0s3
  a        u         16              8192            /dev/dsk/c1t1d0s3
  a        u         8208            8192            /dev/dsk/c1t1d0s3
  a        u         16400           8192            /dev/dsk/c1t1d0s3
  a        u         24592           8192            /dev/dsk/c1t1d0s3
bash-3.00# metastat -c
d20              m  4.0GB d21
    d21          s  4.0GB c1t0d0s1
d10              m  4.0GB d11
    d11          s  4.0GB c1t0d0s0
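# recreate each submirror on the new disk as a single-slice concat, then attach it to its mirror; SVM resyncs the data automatically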
bash-3.00# metainit d22 1 1 c1t1d0s1
d22: Concat/Stripe is setup
bash-3.00# metainit d12 1 1 c1t1d0s0
d12: Concat/Stripe is setup
bash-3.00# metattach d20 d22
d20: submirror d22 is attached
bash-3.00# metattach d10 d12
d10: submirror d12 is attached
bash-3.00# metastat
d20: Mirror
    Submirror 0: d21
      State: Okay
    Submirror 1: d22
      State: Resyncing
    Resync in progress: 8 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 8392080 blocks (4.0 GB)

d21: Submirror of d20
    State: Okay
    Size: 8392080 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s1          0     No            Okay   Yes

d22: Submirror of d20
    State: Resyncing
    Size: 8392080 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s1          0     No            Okay   Yes

d10: Mirror
    Submirror 0: d11
      State: Okay
    Submirror 1: d12
      State: Resyncing
    Resync in progress: 0 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 8392080 blocks (4.0 GB)

d11: Submirror of d10
    State: Okay
    Size: 8392080 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s0          0     No            Okay   Yes

d12: Submirror of d10
    State: Resyncing
    Size: 8392080 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s0          0     No            Okay   Yes

Device Relocation Information:
Device   Reloc  Device ID
c1t1d0   Yes    id1,sd@SFUJITSU_MAW3147NC____DAA0P7203F0V
c1t0d0   Yes    id1,sd@SFUJITSU_MAW3147NC____DAA0P7203F1N

Don't forget to re-attach the disk to your ZFS pool if necessary.
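If the disk was a plain member of a ZFS mirror, something like the following should re-attach it and start a resilver (again, tank is just a placeholder pool name):

bash-3.00# zpool attach tank c1t0d0 c1t1d0
bash-3.00# zpool status tank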

Yonah Russ
