blog'o thnet

Wednesday 3 December 2008

GRUB Boot Archive With SVM, A Better Approach

In a previous discussion about the GRUB boot archive and how it can be regenerated in Failsafe mode, I mentioned that this is not as easy when the root file system uses the md driver. I previously showed a method that requires unmirroring one or more file systems when the root file system is built upon an SVM mirror. This was far from optimal: it involves many manipulations, which invites human error, and it may seem a little complicated.

This method is based on Performing System Recovery from the official Solaris Volume Manager documentation, which showed up last month on the Sun-Managers mailing list.

Note: Although this test case was done using Solaris 10 10/08 in a virtual machine running under VirtualBox on the latest OpenSolaris release, the instructions should be valid for Solaris 10 1/06 and later.

Initial setup

As we saw before, the system uses only a root file system and a swap device. Both are encapsulated with SVM:

# df -k -F ufs
Filesystem     kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d0 6147798 3455578 2630743      57%  /
# swap -l
swapfile             dev  swaplo blocks   free
/dev/md/dsk/d1      85,1       8 4194288 4194288
# metastat -c d0 d1
d0               m  6.0GB d10 d20
    d10          s  6.0GB c0d0s0
    d20          s  6.0GB c1d1s0
d1               m  2.0GB d11 d21
    d11          s  2.0GB c0d0s1
    d21          s  2.0GB c1d1s1

Regenerate the GRUB boot archive

The idea is to boot into the GRUB Failsafe mode, get the md configuration from the local root file system, and manually load the md module, now properly configured. The main advantage is that everything is done from within the Failsafe mode, without manipulating SVM more than necessary. In particular, the mirror is never broken, so redundancy is not lost, even for a short time.

[...]
Booting to milestone "milestone/single-user:default".
Configuring devices.
Searching for installed OS instances...
/dev/dsk/c0d0s0 is under md control, skipping.
/dev/dsk/c1d1s0 is under md control, skipping.
No installed OS instance found.

Starting shell.
# mount -F ufs -o ro /dev/dsk/c0d0s0 /a
# cp -p /a/kernel/drv/md.conf /kernel/drv
# umount /a
# update_drv -f md
devfsadm: mkdir failed for /dev 0x1ed: Read-only file system
# metainit -r
# metasync d0
# fsck /dev/md/rdsk/d0
# mount -F ufs /dev/md/dsk/d0 /a
# bootadm update-archive -R /a
# umount /a
# reboot
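
The session above can be collected into a small script. The following is only a sketch under stated assumptions: the names run, rebuild_boot_archive and DRY_RUN are illustrative and not part of Solaris, the commands are exactly those of the listing, and the slice and metadevice are passed as arguments. Setting DRY_RUN=1 prints the commands instead of running them, which is a cheap way to review the sequence first.

```shell
#!/bin/sh
# Sketch only: replays the failsafe session shown above.
# run, rebuild_boot_archive and DRY_RUN are hypothetical names.

run() {
    # Print the command, then execute it unless DRY_RUN is set.
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

rebuild_boot_archive() {
    slice=$1    # one side of the root mirror, e.g. c0d0s0
    meta=$2     # root metadevice, e.g. d0

    # Fetch the md configuration from the on-disk root file system,
    # mounted read-only so that the submirror is left untouched.
    run mount -F ufs -o ro "/dev/dsk/$slice" /a &&
    run cp -p /a/kernel/drv/md.conf /kernel/drv &&
    run umount /a || return 1

    # Reload the md driver. In the session above this step printed a
    # devfsadm warning (read-only /dev) and the procedure still went on,
    # so the exit status is deliberately not checked here.
    run update_drv -f md

    # Bring the metadevices up, check the root mirror, then rebuild
    # the boot archive on the alternate root.
    run metainit -r &&
    run metasync "$meta" &&
    run fsck "/dev/md/rdsk/$meta" &&
    run mount -F ufs "/dev/md/dsk/$meta" /a &&
    run bootadm update-archive -R /a &&
    run umount /a
}
```

For the system used here, the call would be rebuild_boot_archive c0d0s0 d0, followed by a manual reboot once the output has been reviewed.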

Really interesting!

Thursday 13 March 2008

Update A Corrupted GRUB Boot Archive, With SVM

In a previous discussion about the GRUB boot archive and how it can be regenerated, I mentioned that this is not as easy when the root file system uses the md driver. I will now show two different methods to do the same thing when the root file system is built upon an SVM mirror (RAID-1):

  1. Unmirror the root file system only.
  2. Unmirror the entire system, i.e. all devices.

Note: Although this test case was done using Solaris 10 8/07 in a virtual machine running under VirtualBox on the latest Solaris Express Community Edition, the instructions should be valid for Solaris 10 1/06 and later.

Initial setup

As we can see, the system uses only a root file system and a swap device. Both are encapsulated with SVM.

# df -k -F ufs
Filesystem     kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d0 6147798 3455578 2630743      57%  /
# swap -l
swapfile             dev  swaplo blocks   free
/dev/md/dsk/d1      85,1       8 4194288 4194288
# metastat -c d0 d1
d0               m  6.0GB d10 d20
    d10          s  6.0GB c0d0s0
    d20          s  6.0GB c1d1s0
d1               m  2.0GB d11 d21
    d11          s  2.0GB c0d0s1
    d21          s  2.0GB c1d1s1

Unmirror the root file system only

The idea is to boot into the GRUB Failsafe mode, select the first side of the mirror, and modify the system and vfstab configuration files to use the correct device path. For the system file, this means actually removing the rootdev:/pseudo/md@0:0,0,blk entry, not just commenting it out. For the vfstab file, this means replacing the root file system metadevice path /dev/md/[r]dsk/d0 with the first underlying device path, i.e. /dev/[r]dsk/c0d0s0. Last, regenerate the boot archive on the alternate root path.

[...]
Booting to milestone "milestone/single-user:default".
Configuring devices.
Searching for installed OS instances...
/dev/dsk/c0d0s0 is under md control, skipping.
/dev/dsk/c1d1s0 is under md control, skipping.
No installed OS instance found.

Starting shell.
# fsck /dev/rdsk/c0d0s0
# mount -F ufs /dev/dsk/c0d0s0 /a
# cp /a/etc/system /a/etc/system.bckp
# cp /a/etc/vfstab /a/etc/vfstab.bckp
# TERM=vt100 vi /a/etc/system
# TERM=vt100 vi /a/etc/vfstab
# bootadm update-archive -R /a
# umount /a
# fsck /dev/rdsk/c0d0s0
# reboot
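
The two vi edits in the session above can also be done non-interactively. Here is a minimal sketch, assuming the files look like the ones described in this entry; the function name unmirror_root_config and its three arguments are illustrative only. It keeps the same .bckp backups, deletes the rootdev line from the system file, and rewrites the block and raw root paths in vfstab.

```shell
#!/bin/sh
# Sketch: scripted equivalent of the two vi edits (hypothetical helper).
unmirror_root_config() {
    altroot=$1   # alternate root, e.g. /a
    slice=$2     # underlying slice of the first submirror, e.g. c0d0s0
    meta=$3      # root metadevice, e.g. d0

    # Same backups as in the manual session.
    cp "$altroot/etc/system" "$altroot/etc/system.bckp" &&
    cp "$altroot/etc/vfstab" "$altroot/etc/vfstab.bckp" || return 1

    # Delete (not comment out) the line forcing the md root device.
    sed '/^rootdev:\/pseudo\/md@0:/d' \
        "$altroot/etc/system.bckp" > "$altroot/etc/system" || return 1

    # Point the root entry at the physical slice, block and raw paths.
    # The trailing non-digit guard avoids touching d00, d01, and so on.
    sed -e "s@/dev/md/dsk/$meta\([^0-9]\)@/dev/dsk/$slice\1@" \
        -e "s@/dev/md/rdsk/$meta\([^0-9]\)@/dev/rdsk/$slice\1@" \
        "$altroot/etc/vfstab.bckp" > "$altroot/etc/vfstab"
}
```

In the Failsafe shell this would be called as unmirror_root_config /a c0d0s0 d0, right after mounting the slice on /a and before running bootadm update-archive. Comparing each file with its .bckp copy before going further remains a good habit.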

Then, boot into the milestone/multi-user:default level and detach the second half of the mirror, since the first half corresponds to the valid and updated underlying device. Next, restore the original configuration files, which refer to the encapsulated metadevices, and reboot.

# df -k -F ufs
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0d0s0      6147798 3458810 2627511    57%    /
# swap -l
swapfile             dev  swaplo blocks   free
/dev/md/dsk/d1      85,1       8 4194288 4194288
# metastat -c d0
d0               m  6.0GB d10 d20
    d10          s  6.0GB c0d0s0
    d20          s  6.0GB c1d1s0
# metadetach d0 d20
d0: submirror d20 is detached
# metastat -c d0
d0               m  6.0GB d10
    d10          s  6.0GB c0d0s0
# cp /etc/system.bckp /etc/system
# cp /etc/vfstab.bckp /etc/vfstab
# shutdown -y -i 6 -g 0

After the reboot, just reattach the second half of the mirror, and wait for complete synchronization to be fully redundant again.

# df -k -F ufs
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d0       6147798 3458714 2627607    57%    /
# swap -l
swapfile             dev  swaplo blocks   free
/dev/md/dsk/d1      85,1       8 4194288 4194288
# metattach d0 d20
d0: submirror d20 is attached
# metastat -c d0
d0               m  6.0GB d10 d20 (resync-29%)
    d10          s  6.0GB c0d0s0
    d20          s  6.0GB c1d1s0
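
Waiting for the resynchronization can be scripted by watching the (resync-NN%) marker shown above. This is a minimal sketch that assumes the compact metastat -c format of this listing; the function names resyncing_mirrors and wait_for_resync, and the 60-second polling interval, are arbitrary choices.

```shell
#!/bin/sh
# Sketch: follow a mirror resync using the compact `metastat -c` output.
# Function names and polling interval are illustrative.

resyncing_mirrors() {
    # Read `metastat -c` output on stdin and print "<mirror> <percent>"
    # for each top-level line carrying a (resync-NN%) marker.
    sed -n 's/^\([^ ][^ ]*\) .*(resync-\([0-9][0-9]*\)%).*$/\1 \2/p'
}

wait_for_resync() {
    # Poll until no metadevice reports a resync in progress.
    # Arguments, if any, are passed to metastat (e.g. d0).
    while metastat -c "$@" | grep 'resync-' > /dev/null; do
        sleep 60
    done
}
```

For example, metastat -c d0 | resyncing_mirrors would report d0 29 for the state shown above, and wait_for_resync d0 returns once the mirror is fully redundant again.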

Unmirror the entire system, i.e. all devices

The idea is exactly the same as for unmirroring the root file system only, but adapting the vfstab file to change the swap entry, too. (So, I didn't reproduce the code listing here.)

Then, boot into the milestone/single-user:default level by modifying the corresponding GRUB entry as follows: kernel /platform/i86pc/multiboot -s. Completely delete all the metadevices and the metadb replicas to clear the SVM settings. Last, continue to the milestone/multi-user:default level to boot unmirrored.

# metaclear -f -r d0 d1
# metadb -f -d c0d0s4 c1d1s4
# ^D

Now, the system must be fully encapsulated with SVM again. Please refer to the online Sun Documentation, or to some past entries on this subject, depending on the system's architecture: SPARC systems, or x86 platforms.

Sunday 9 March 2008

Update A Corrupted GRUB Boot Archive, Without SVM

Solaris 10 systems on x86 architecture use the GNU GRand Unified Bootloader (GRUB) which is the boot loader responsible for loading a boot archive into a system's memory. The boot archive is a collection of critical files (kernel modules and configuration files) that are required to boot the Solaris OS. As stated in the Sun documentation:

These files are needed during system startup before the root file system is mounted. Two boot archives are maintained on a system:

  • The boot archive that is used to boot the Solaris OS on a system. This boot archive is sometimes called the primary boot archive.
  • The boot archive that is used for recovery when the primary boot archive is damaged. This boot archive starts the system without mounting the root file system. On the GRUB menu, this boot archive is called failsafe. The archive's essential purpose is to regenerate the primary boot archive, which is usually used to boot the system.

The Solaris OS generally keeps the boot archive properly synchronized on its own. Sometimes, the boot archive gets corrupted, for example when (bad) patches are applied or when the operating system crashes. In these cases, the boot archive must be regenerated. This is easily accomplished by following the Sun documents x86: How to Boot the Failsafe Archive for Recovery Purposes, and x86: How to Boot the Failsafe Archive to Forcibly Update a Corrupt Boot Archive. The main drawback appears when the system is encapsulated under an SVM mirror (RAID-1), since the md driver is not managed in failsafe mode. Please refer to this blog entry on this subject, if needed.

Wednesday 30 May 2007

RAID-1 Volume From the root File System Using SVM on x86 Platform

Here is a little step-by-step guide to create a soft mirror of the root file system, known as encapsulating the system disk. This provides full protection against the failure of one disk. At the same time, it speeds up read requests (since multiple backing devices host the same data), but write performance is generally degraded. First, know your running system, particularly which disk it is currently installed on and which other device is available for the second side of the mirror.

# df -hF ufs
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c1d0s0        7.9G   5.2G   2.6G    67%    /
# swap -lh
swapfile             dev    swaplo   blocks     free
/dev/dsk/c1d0s1     102,65       4K     4.0G     4.0G
#
# echo | format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1d0 
          /pci@0,0/pci-ide@8/ide@0/cmdk@0,0
       1. c2d0 
          /pci@0,0/pci-ide@8/ide@1/cmdk@0,0
[...]

We will use c2d0 as the second submirror. So, we need to default to one Solaris partition that uses the whole disk and make it bootable (we are using GRUB in this case). The slice for the second submirror must have a slice tag of root, and the root slice must be slice 0 (so, we will duplicate the label's content from the boot disk to the mirror disk).

# fdisk -B /dev/rdsk/c2d0p0
# fdisk /dev/rdsk/c2d0p0
             Total disk size is 36483 cylinders
             Cylinder size is 16065 (512 byte) blocks

                                               Cylinders
      Partition   Status    Type          Start   End   Length    %
      =========   ======    ============  =====   ===   ======   ===
          1       Active    Solaris2          1  36482    36482    100

SELECT ONE OF THE FOLLOWING:
   1. Create a partition
   2. Specify the active partition
   3. Delete a partition
   4. Change between Solaris and Solaris2 Partition IDs
   5. Exit (update disk configuration and exit)
   6. Cancel (exit without updating disk configuration)
Enter Selection:
#
# prtvtoc /dev/rdsk/c1d0s2 | fmthard -s - /dev/rdsk/c2d0s2
fmthard:  New volume table of contents now in place.
#
# /sbin/installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2d0s0
stage1 written to partition 0 sector 0 (abs 16065)
stage2 written to partition 0, 260 sectors starting at 50 (abs 16115)

Create replicas of the metadevice state database:

# metadb -a -c 3 -f c1d0s4 c2d0s4
# metadb
        flags           first blk       block count
     a        u         16              8192            /dev/dsk/c1d0s4
     a        u         8208            8192            /dev/dsk/c1d0s4
     a        u         16400           8192            /dev/dsk/c1d0s4
     a        u         16              8192            /dev/dsk/c2d0s4
     a        u         8208            8192            /dev/dsk/c2d0s4
     a        u         16400           8192            /dev/dsk/c2d0s4

Flag -f is needed because it is the first invocation/creation of metadb(1m).
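
To check at a glance that the replicas are evenly spread over both disks, the metadb output can be summarized per slice. This is a minimal sketch that assumes the output layout shown above (one header line, device path in the last column); the function name replicas_per_slice is illustrative.

```shell
#!/bin/sh
# Sketch: count state database replicas per slice from `metadb` output.
# replicas_per_slice is a hypothetical helper name.
replicas_per_slice() {
    # Keep only rows whose last field is a device path (this skips the
    # header), count them per device, and sort for a stable output.
    awk '$NF ~ /^\/dev\// { n[$NF]++ }
         END { for (d in n) print d, n[d] }' | sort
}
```

With the configuration above, metadb | replicas_per_slice should list /dev/dsk/c1d0s4 and /dev/dsk/c2d0s4 with three replicas each.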

Set up the RAID-0 metadevices (stripe or concatenation volumes) corresponding to the / file system and the swap space, and automatically configure system files (/etc/vfstab and /etc/system) for the root metadevice.

# metainit -f d10 1 1 c1d0s0
d10: Concat/Stripe is setup
# metainit -f d11 1 1 c1d0s1
d11: Concat/Stripe is setup
# metainit d20 1 1 c2d0s0
d20: Concat/Stripe is setup
# metainit d21 1 1 c2d0s1
d21: Concat/Stripe is setup
# metainit d0 -m d10
d0: Mirror is setup
# metainit d1 -m d11
d1: Mirror is setup
#
# cp /etc/vfstab /etc/vfstab.beforesvm
# sed -e 's@/dev/dsk/c1d0s1@/dev/md/dsk/d1@' /etc/vfstab.beforesvm > /etc/vfstab
# metaroot d0
# diff /etc/vfstab /etc/vfstab.beforesvm
6,7c6,7
< /dev/md/dsk/d1   -                 -   swap   -   no   -
< /dev/md/dsk/d0   /dev/md/rdsk/d0   /   ufs    1   no   -
---
> /dev/dsk/c1d0s1  -                 -   swap   -   no   -
> /dev/dsk/c1d0s0  /dev/rdsk/c1d0s0  /   ufs    1   no   -

Flag -f is needed because the file systems on the slices used to initialize these metadevices are currently mounted (in use).

Reboot on the metadevices: the operating system will now boot encapsulated, on a one-side mirror. Last, attach the second part of the mirror and adapt the system dump configuration.

# lockfs -af && shutdown -y -g 0 -i 6
[...]
# metattach d0 d20
d0: submirror d20 is attached
# metattach d1 d21
d1: submirror d21 is attached
#
# metastat -p
d1 -m /dev/md/rdsk/d11 /dev/md/rdsk/d21 1
d11 1 1 /dev/rdsk/c1d0s1
d21 1 1 /dev/rdsk/c2d0s1
d0 -m /dev/md/rdsk/d10 /dev/md/rdsk/d20 1
d10 1 1 /dev/rdsk/c1d0s0
d20 1 1 /dev/rdsk/c2d0s0
# metastat | grep %
    Resync in progress: 41 % done
    Resync in progress: 46 % done
#
# rmdir /var/crash/*
# mkdir /var/crash/`hostname`
# chmod 700 /var/crash/`hostname`
# dumpadm -s /var/crash/`hostname` -d /dev/md/dsk/d1
      Dump content: kernel pages
       Dump device: /dev/md/dsk/d1 (swap)
Savecore directory: /var/crash/bento
  Savecore enabled: yes

Last, define the alternative boot path in the menu.lst GRUB configuration file: the Solaris/BSD slice 0 on the first fdisk partition on the second BIOS disk.

# cat << 'EOF' >> /boot/grub/menu.lst
title Solaris Nevada snv_65 X86 (Alternate Boot Path)
root (hd1,0,a)
kernel$ /platform/i86pc/kernel/$ISADIR/unix
module$ /platform/i86pc/$ISADIR/boot_archive
EOF
#
# bootadm list-menu 
The location for the active GRUB menu is: /boot/grub/menu.lst
default 0
timeout 10
0 Solaris Nevada snv_65 X86
1 Solaris failsafe
2 Solaris Nevada snv_65 X86 (Alternate Boot Path)
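
The root (hd1,0,a) line above encodes the BIOS disk number, the fdisk partition number (both counted from 0), and the slice as a letter. As a small illustration, the following sketch builds that notation from three numbers, assuming slices 0 to 7 map to the letters a to h; the function name grub_root is illustrative.

```shell
#!/bin/sh
# Sketch: build a GRUB device name from disk, fdisk partition and slice
# numbers, assuming slices 0..7 map to letters a..h (hypothetical helper).
grub_root() {
    disk=$1
    part=$2
    slice=$3
    case $slice in
        [0-7]) ;;
        *) echo "slice must be between 0 and 7" >&2; return 1 ;;
    esac
    # Pick the (slice + 1)-th letter of the alphabet prefix.
    letter=$(echo abcdefgh | cut -c "$((slice + 1))")
    echo "(hd$disk,$part,$letter)"
}

# Second BIOS disk, first fdisk partition, slice 0: prints (hd1,0,a)
grub_root 1 0 0
```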

For further (and deeper) information on this subject, please refer to the excellent Sun Microsystems Documentation on Solaris Volume Manager, and particularly x86: Creating a RAID-1 Volume From the root (/) File System.

Monday 1 May 2006

How to Patch a Live System Mirrored with SVM

Aim of this memo

The main purpose of this technical note is to demonstrate how to patch a running (live) system currently mirrored using SVM, minimizing the downtime as far as possible.

The idea is simple: detach one side of the mirror, apply the cluster patch to it, and reboot on it. If all seems OK, re-encapsulate the system. This achieves a goal similar to that of the Live Upgrade feature of the Solaris OS (see live_upgrade(5)), with less complexity and different requirements (LVM RAID-1 vs. spare disk, or free slice).

Using this solution, the service is unavailable for between 10 and 30 minutes (depending on the hardware POST), and a maximum of two reboots is required, whatever the number of patches to apply.

Here it is

Here is a system encapsulated using SDS 4.x or SVM 1.x, and the associated SVM encapsulation configuration:

# metastat -p
d3 -m d13 d23 1
d13 1 1 c0t0d0s3
d23 1 1 c0t1d0s3
d1 -m d11 d21 1
d11 1 1 c0t0d0s1
d21 1 1 c0t1d0s1
d0 -m d10 d20 1
d10 1 1 c0t0d0s0
d20 1 1 c0t1d0s0
#
# cat /etc/vfstab
#device         device          mount   FS      fsck    mount   mount
#to mount       to fsck         point   type    pass    at boot options
#
fd      -       /dev/fd fd      -       no      -
/proc   -       /proc   proc    -       no      -
/dev/md/dsk/d3  -       -       swap    -       no      -
/dev/md/dsk/d0  /dev/md/rdsk/d0 /       ufs     1       no      -
/dev/md/dsk/d1  /dev/md/rdsk/d1 /var    ufs     1       no      -
swap    -       /tmp    tmpfs   -       yes     -

Run an explorer and generate a cluster patch, based on tools provided by the OSE for example, if you are lucky enough to have one included with your support plan (or just pick one provided at SunSolve).

Then, be sure to be able to boot on the two disks, just in case:

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s0
# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0

The next step is to voluntarily detach one side of the mirror: take the first one for the sake of simplicity (i.e. c0t0d0). Indeed, in this case we are pretty sure that its alias name at the OBP is disk.

Note: You can always create it at the OBP (using the usual set of commands, such as show-disks, devalias, etc.) if you want. That is just a matter of personal preferences.

# lockfs -af /* Just to minimize the fs inconsistencies at next fsck(1m). */
#
# metadetach d0 d10
# metadetach d1 d11
# metadetach d3 d13
#
# metaclear d10
# metaclear d11
# metaclear d13

Check and repair the file systems if necessary, since we will boot on them the next time:

# fsck /dev/dsk/c0t0d0s0
# fsck /dev/dsk/c0t0d0s1

The next steps are to mount the recently detached file systems and prepare the first disk to boot without SVM encapsulation:

# mkdir /mirror
# mount /dev/dsk/c0t0d0s0 /mirror
# mount /dev/dsk/c0t0d0s1 /mirror/var
#
# cat << EOF > /mirror/etc/vfstab
#device         device          mount   FS      fsck    mount   mount
#to mount       to fsck         point   type    pass    at boot options
#
fd      -       /dev/fd fd      -       no      -
/proc   -       /proc   proc    -       no      -
/dev/dsk/c0t0d0s3       -       -       swap    -       no      -
/dev/dsk/c0t0d0s0       /dev/rdsk/c0t0d0s0      /       ufs     1       no      -
/dev/dsk/c0t0d0s1       /dev/rdsk/c0t0d0s1      /var    ufs     1       no      -
swap    -       /tmp    tmpfs   -       yes     -
EOF
#
# cp /mirror/etc/system /mirror/etc/system.orig
# sed -e 's;rootdev:/pseudo/md@0:0,0,blk;*rootdev:/pseudo/md@0:0,0,blk;' \
   /mirror/etc/system.orig > /mirror/etc/system

Last, install the patches against the first disk, clean things up a little, and reboot if the installation went smoothly:

# ./install_all_patches -R /mirror
#
# umount /mirror/var
# umount /mirror
# rmdir /mirror
#
# shutdown -y -g 0 -i 6

After rebooting, carefully review the behavior of the freshly patched system. If all seems well, don't forget to re-encapsulate the second disk. Here is a quick and easy way to do this:

/* Recreate the metadb. */
# metadb -d -f c0t0d0s4 c0t1d0s4
# metadb -a -c3 -f c0t0d0s4 c0t1d0s4
#
/* Clean the system metadevices still present. */
# metaclear d0
# metaclear d1
# metaclear d3
# metaclear d20
# metaclear d21
# metaclear d23
#
/* Re-create them as part of a mirror. */
# metainit -f d10 1 1 c0t0d0s0
# metainit d0 -m d10
# metainit -f d11 1 1 c0t0d0s1
# metainit d1 -m d11
# metainit -f d13 1 1 c0t0d0s3
# metainit d3 -m d13
#
/* Be able to boot on the new metadevices. */
# metaroot d0
#
/* Reboot, and create the second side of the mirror. */
# shutdown -y -g 0 -i 6
[...]
# metainit d20 1 1 c0t1d0s0
# metattach d0 d20
# metainit d21 1 1 c0t1d0s1
# metattach d1 d21
# metainit d23 1 1 c0t1d0s3
# metattach d3 d23

For a little more detailed explanation about encapsulating the system using SVM on Sun Solaris, please refer to the dedicated entry in this blog.

Last, it must be mentioned that this documentation was written by our OSE, and that this procedure was officially marked as supported by Sun Microsystems.

Monday 6 June 2005

Encapsulation of the System's Disk Using SVM

  1. c0t0d0s2 represents the first system disk (boot)
  2. c0t1d0s2 represents the second disk (mirror)

Duplicate the label's content from the boot disk to the mirror disk:

# prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2

Create replicas of the metadevice state database:

# metadb -a -c3 -f c0t0d0s4 c0t1d0s4
# metadb

Option -f is needed because it is the first invocation/creation of metadb(1m).

Creation of metadevices:

# metainit -f d10 1 1 c0t0d0s0
# metainit -f d11 1 1 c0t0d0s1
# metainit -f d13 1 1 c0t0d0s3
# metainit -f d16 1 1 c0t0d0s6
#
# metainit d20 1 1 c0t1d0s0
# metainit d21 1 1 c0t1d0s1
# metainit d23 1 1 c0t1d0s3
# metainit d26 1 1 c0t1d0s6

Option -f is needed because the file systems on the slices used to initialize these metadevices are already mounted.

Create the first part of the mirror:

# metainit d0 -m d10
# metainit d1 -m d11
# metainit d3 -m d13
# metainit d6 -m d16
#
# cp /etc/vfstab /etc/vfstab.beforesvm
# metaroot d0

Don't forget to edit /etc/vfstab in order to reflect the other metadevices:

  • s@/dev/dsk/cXtYdZsN@/dev/md/dsk/dN@
  • s@/dev/rdsk/cXtYdZsN@/dev/md/rdsk/dN@
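
The two substitutions above can be applied in one pass with sed. This is a minimal sketch, assuming (as in this entry) that slice sN of the boot disk is encapsulated as metadevice dN; the function name to_metadevices is illustrative, and only the disk given as argument is rewritten.

```shell
#!/bin/sh
# Sketch: rewrite vfstab device paths of one disk to their metadevices,
# assuming slice sN maps to metadevice dN (hypothetical helper).
to_metadevices() {
    disk=$1    # boot disk name, e.g. c0t0d0
    # Reads a vfstab on stdin and writes the converted one on stdout.
    sed -e "s@/dev/dsk/${disk}s\([0-7]\)@/dev/md/dsk/d\1@" \
        -e "s@/dev/rdsk/${disk}s\([0-7]\)@/dev/md/rdsk/d\1@"
}
```

It is safer to run it on a copy first, for instance to_metadevices c0t0d0 < /etc/vfstab > /tmp/vfstab.new, and to compare the result with the original using diff before installing it.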

Install the boot block code on the alternate boot disk and set it in the OpenBoot PROM (OBP):

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0
# eeprom boot-device="disk disk1 net"   /* Or just "disk disk1". */

Reboot on the new metadevices (the operating system will now boot encapsulated):

# shutdown -y -g 0 -i 6

Attach the second part of the mirror:

# metattach d0 d20
# metattach d1 d21
# metattach d3 d23
# metattach d6 d26

Verify all:

# metastat -p
# metastat | grep \%

Modify the system dump configuration:

# mkdir /var/crash/`hostname`
# chmod 700 /var/crash/`hostname`
# dumpadm -s /var/crash/`hostname`
# dumpadm -d /dev/md/dsk/d1

Monday 30 May 2005

Encapsulation of the Data's Disk using SVM

Examples of a raw device and a file system using soft partitions.

  1. c0t10d0s2 represents the first data disk
  2. c0t11d0s2 represents the second disk (mirror)

Duplicate the label's content from the data disk to the mirror disk. There are two slices on it:

  • s4 for the replicas
  • s7 for all the data using soft partitions

# prtvtoc /dev/rdsk/c0t10d0s2 | fmthard -s - /dev/rdsk/c0t11d0s2

Create replicas of the metadevice state database:

# metadb -a -c3 c0t10d0s4
# metadb

Creation of metadevices:

# metainit d17 1 1 c0t10d0s7
# metainit d27 1 1 c0t11d0s7

Create the first part of the mirror:

# metainit d7 -m d17

Attach the second part of the mirror:

# metattach d7 d27

Verify all:

# metastat -p
# metastat | grep \%

Example #1: create a raw device for Sybase...

# metainit d30 -p d7 2g
# chown sybase:sybase /dev/md/rdsk/d30

Example #2: create a new file system for Sybase...

# mkdir -p /files/sybase
# metainit d31 -p d7 1g
# newfs /dev/md/rdsk/d31
# grep d31 /etc/vfstab
/dev/md/dsk/d31 /dev/md/rdsk/d31 /files/sybase ufs 2 yes logging
# mount /files/sybase
# chown sybase:sybase /files/sybase
# ln -s /files/sybase /opt/sybase

Example #3: Grow an existing file system for Sybase...

# metattach d31 +1g
# growfs -M /files/sybase /dev/md/rdsk/d31