blog'o thnet


Monday 22 September 2008

About GNU/Linux Software Mirroring And LVM

Here, the final aim was to provide data access redundancy through SAN storage hosted on remote sites across Wide Area Network (WAN) links. After some relatively long and painful attempts to mimic the software mirroring found on the HP-UX platform using Logical Volume Management (LVM), i.e. mirroring at the logical volume level, I finally gave up, deciding that this functionality will definitely not fit my needs. Why? Here are my comments.

  1. It is not possible to provide a clear and manageable multipath setup when it is important to distinguish between the multiple sites, à la the mirror-across-controllers feature found in Veritas VxVM on Sun Solaris systems, for example. So managing many physical volumes along with lots of logical volumes becomes very complicated.
  2. There is no exact mapping capability between a logical volume and its storage on a given physical volume.
  3. The need to have a disk-based log, i.e. a persistent log. Yes, one can always pass the --corelog option at logical volume creation time and have an in-memory log, i.e. a non-persistent log, but this requires the entire copies (mirrors) to be resynchronized upon reboot. Not really viable in multi-TB environments.
  4. A write-intensive workload on a file system living on a mirrored logical volume will suffer high latency: the overhead is important, and the time to complete mostly-write jobs grows dramatically. It is also really hard to get high-level statistics; only low-level metrics seem consistent, i.e. the sd SCSI devices and the dm- device mapper components for each path entry--not the multipath devices themselves, which are the more interesting from the end user and SA points of view.
  5. You can't extend a mirrored logical volume, which is really a no-go per se. On that point, Red Hat support answered that this functionality may be added in a future release, and that the current state may eventually become a Request For Enhancement (RFE) if a proper business justification is provided. For now, one must break the logical volume mirror copy, then rebuild it completely. Not realistic when the logical volume is made of a lot of physical extents spread across multiple physical volumes.
  6. An LVM configuration can end up totally blocked by itself, and not usable at all. The fact is, LVM uses persistent storage blocks to keep track of its own metadata. The metadata size is set at physical volume creation time only, and can't be changed afterward. This size is statically defined as 255 physical volume blocks, and can be adjusted from the LVM configuration file. The problem is, when this circular buffer space (stored in ASCII) fills up--such as when there are a lot of logical volumes in a mirrored environment--it is not possible to do anything more with LVM. So you can't add more logical volumes, can't add more logical volume copies... and can't even delete them to try to reestablish a proper LVM configuration. Here are the answers given by Red Hat support to two key questions in this situation:
    • How to size the metadata, i.e. if we need to change it from the default value, how can we determine the new proper and appropriate size, and from which information?
      I am afraid but Metadata size can only be defined at the time of PV creation and there is no real formula for calculating the size in advance. By changing the default value of 255 you can get a higher space value. For general LVM setup (with less LV's and VG's) default size works fine however in cases where high number of LV's are required a custom value will be required.
    • We just want to delete all LV copies, i.e. return to the initial situation with 0 copies for each LV (only one LV per se), in order to be able to change the LVM configuration again (we can't do anything on our production server right now). How can we do that?
      I discussed this situation with peer engineers and also referenced a similar previous case. From the notes of the same the workaround is to use the backup file (/etc/lvm/backup) and restore the PV's. I agree that this really not a production environment method however seems the only workaround.
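Two of the points above map directly to creation-time options; for the record, here is how they look (the device, volume group, LV names and sizes are hypothetical):

```shell
# Point 3: build a mirrored LV with an in-memory (non-persistent) log.
# Beware: the whole mirror is resynchronized at every reboot.
lvcreate -L 10G -m 1 --corelog -n lvdata vgdata

# Point 6: the metadata area size can only be chosen at pvcreate time;
# oversize it on purpose when lots of mirrored LVs are planned.
pvcreate --metadatasize 2m /dev/sdd
```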

So, the production Oracle RDBMS server is now finally being evacuated to another machine. Hmm... I hope to have a better enterprise experience using the mdadm package to handle software RAID, instead of LVM mirroring (RAID-1). Maybe more about that in another blog entry?

Friday 16 May 2008

Comparison: EMC PowerPath vs. GNU/Linux dm-multipath

I will present some notes about the use of multipath solutions on Red Hat systems: EMC PowerPath and GNU/Linux dm-multipath. While reading those notes, keep in mind that they are based on tests done when pressure was very high to put new systems into production, so lack of time resulted in less complete tests than expected. These tests were done more than a year ago, before the release of RHEL4 Update 5 and some RHBA advisories related to both the LVM and dm-multipath technologies.

Keep in mind that without purchasing an appropriate EMC license, PowerPath can only be used in failover mode (active-passive): multiple path access is not supported in this case--no round-robin and no I/O load balancing, for example.

EMC PowerPath

Pros:

  1. Not specific to the SAN Host Bus Adapter (HBA).
  2. Support for multiple and heterogeneous SAN storage providers.
  3. Support for most UNIX and Unix-like platforms.
  4. Even without a valid license, it can still work, albeit in degraded (failover-only) mode.
  5. Not sensitive to SCSI LUN renumbering: the multiple sd devices (the different paths to a given device) are adapted accordingly in the multipath definition of the corresponding emcpower device.
  6. Easily provides the ID of the SAN storage array.

Cons:

  1. Not integrated with the operating system (which generally has its own solution).
  2. The need to force an RPM re-installation after a kernel upgrade on RHEL systems, because the kernel modules are stored in a path containing the exact major and minor versions of the installed (booted) kernel.
  3. Non-automatic update procedure.

GNU/Linux device-mapper-multipath

Pros:

  1. Not specific to the SAN Host Bus Adapter (HBA).
  2. Support for multiple and heterogeneous SAN storage providers.
  3. Well integrated with the operating system.
  4. Automatic update using RHN (you must be a licensed and registered user in this case).
  5. No additional license cost.

Cons:

  1. Only available on GNU/Linux systems.
  2. Configuration (files and keywords) very tedious and difficult.
  3. Without the use of LVM (Logical Volume Management), it does not have the ability to follow SCSI LUN renumbering! Even in this case, be sure not to have blacklisted the newly discovered SCSI devices (sd).
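To give an idea of the tedium, a minimal /etc/multipath.conf sketch looks something like this (the WWID and alias below are hypothetical, and the exact keywords to use depend on the storage array and the dm-multipath release):

```
defaults {
        user_friendly_names     yes
}
blacklist {
        devnode "^(ram|loop|fd|md|sr|scd|st)[0-9]*"
}
multipaths {
        multipath {
                wwid    360060e8004f2b3000000f2b300000009
                alias   oradata
        }
}
```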

Last, please find some interesting documentation on the subject below:

Friday 15 February 2008

LVM2 Simple Mirroring On RHEL4

When the need arose to evacuate all persistent SAN storage from EMC DMX1K to HP XP12K (HDS) arrays, three main solutions were envisaged. The first one was brute-force data copy (tar, cpio, etc.), but it was not very practical given the size of the data (multiple terabytes) and the time involved in copying it. The two others were based on LVM technologies: mirroring, or moving.

Although the choice was to use the online and transparent data moving technology (see pvmove for more information), it was interesting to note that Red Hat has backported support for the creation and manipulation of simple mirrors to their RHEL4 distribution. These functionalities were introduced with the RHBA-2006:0504-15 advisory issued on 2006-08-10, i.e. between RHEL4 Update 4 and RHEL4 Update 5 (and so available via RHN at that time). It is just too bad that the online help for the LVM commands is not properly synchronized with, nor fully documented by, the corresponding manual pages: clearly, this doesn't help to use them in the best conditions (no, Google isn't always the better option when using these kinds of functionalities in big companies).
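The pvmove-based migration itself boils down to a few fully online steps; a minimal sketch, with hypothetical device and volume group names (/dev/sdd for the old DMX1K LUN, /dev/sdh for the new XP12K one):

```shell
# Bring the new LUN into the VG, migrate all extents online,
# then retire the old physical volume:
pvcreate /dev/sdh
vgextend vgdata /dev/sdh
pvmove /dev/sdd /dev/sdh
vgreduce vgdata /dev/sdd
```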

Saturday 9 February 2008

Deleting SCSI Device Paths For A Multipath SAN LUN

When releasing a multipath device under RHEL4, the different SCSI devices corresponding to the different paths must be cleared properly before effectively removing the SAN LUN. Even when the LUN was deleted before cleaning up the paths at the OS level, it is still possible to remove them afterwards. In the following example, it is assumed that the freeing LVM manipulations were already done, and that the LUN is managed by EMC PowerPath.

  1. First, get and verify the SCSI devices corresponding to the multipath LUN:
    # grep "I/O error on device" /var/log/messages | tail -2
    Feb  4 00:20:47 beastie kernel: Buffer I/O error on device sdo, \
     logical block 12960479
    Feb  4 00:20:47 beastie kernel: Buffer I/O error on device sdp, \
     logical block 12960479
    # powermt display dev=sdo
    Bad dev value sdo, or not under Powerpath control.
    # powermt display dev=sdp
    Bad dev value sdp, or not under Powerpath control.
  2. Then, get the appropriate scsi#:channel#:id#:lun# informations:
    # find /sys/devices -name "*block" -print | \
     xargs \ls -l | awk -F\/ '$NF ~ /sdo$/ || $NF ~ /sdp$/ \
     {print "HBA: "$7"\tscsi#:channel#:id#:lun#: "$9}'
    HBA: host0      scsi#:channel#:id#:lun#: 0:0:0:9
    HBA: host0      scsi#:channel#:id#:lun#: 0:0:1:9
  3. When the individual SCSI paths are known, remove them from the system:
    # echo 1 > /sys/bus/scsi/devices/0\:0\:0\:9/delete
    # echo 1 > /sys/bus/scsi/devices/0\:0\:1\:9/delete
    # dmesg | grep "Synchronizing SCSI cache"
    Synchronizing SCSI cache for disk sdp:
    Synchronizing SCSI cache for disk sdo:
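The per-path deletion of step 3 can also be driven directly from the sd device names; a small sketch, reusing the same (here hypothetical) sdo/sdp paths:

```shell
# Ask the 2.6 kernel to delete every stale path device through sysfs:
for dev in sdo sdp; do
    echo 1 > /sys/block/${dev}/device/delete
done
```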

Saturday 9 June 2007

Getting Emulex HBA Information on a GNU/Linux System

As a GNU/Linux environment always behaves a little differently from other UNIX platforms, here is a little sample of the commands I find useful when working with our SAN administrators. In this example, the server is connected using an Emulex Fibre Channel HBA (Host Bus Adapter) and is based on an updated RHEL4U4 system, as can be seen below:

# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
# uname -a
Linux 2.6.9-42.0.10.ELsmp #1 SMP Fri Feb 16 17:13:42 EST 2007 \
 x86_64 x86_64 x86_64 GNU/Linux

Here is some information about the HBA itself. You can see--respectively--the HBA description, the firmware revision level, the WWNN (World Wide Node Name), the WWPN (World Wide Port Name), the operating system driver version, the associated serial number, and the current speed and link state.

# lspci | grep -i emulex
05:0d.0 Fibre Channel: Emulex Corporation LP9802 Fibre Channel Host Adapter (rev 01)
# cat /sys/class/scsi_host/host0/fwrev
1.90A4 (H2D1.90A4)
# cat /sys/class/scsi_host/host0/node_name
# cat /sys/class/scsi_host/host0/port_name
# cat /sys/class/scsi_host/host0/lpfc_drvr_version
Emulex LightPulse Fibre Channel SCSI driver
# cat /sys/class/scsi_host/host0/serialnum
# cat /sys/class/scsi_host/host0/speed
2 Gigabit
# cat /sys/class/scsi_host/host0/state
Link Up - Ready:

Saturday 9 December 2006

Memory Behaviour: Tuning Linux's Kernel Overcommit

After encountering some problems at work running large Oracle databases on a RHEL4 system, we needed to prevent kernel overcommit from exceeding a certain threshold. In fact, the problem was that the system began to kill processes under heavy load: in our case, ssh connections... Ouch!

The solution was to alter the default behavior of the Linux kernel in this area, as mentioned in Documentation/vm/overcommit-accounting in the corresponding source code tarball.
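Concretely, the change amounts to switching the kernel to strict accounting mode; a sketch of the two relevant sysctl knobs (the 80% ratio is an example value, to be sized against the installed RAM and swap):

```shell
# Mode 2: refuse any allocation beyond swap + overcommit_ratio% of RAM,
# instead of letting the OOM killer pick victims under memory pressure.
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80

# Make the setting persistent across reboots:
cat >> /etc/sysctl.conf << EOF
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
EOF
```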

Interestingly, an overall and great explanation of this mechanism is available today on the O'Reilly website. It is worth reading, I think.

Thursday 23 November 2006

How to Add a New "ctmag" Service on RHEL4 (Control-M Agent v6)

Create the Control-M service file:

# cat /etc/rc.d/init.d/ctmag
#!/usr/bin/env sh

# chkconfig: 345 57 23
# description: Control-M agent daemons

# Source functions library.
. /etc/rc.d/init.d/functions

LC_ALL=en_US; export LC_ALL
ctmag_user="controlm"   # site-specific Control-M agent account (adjust)
ctmag_home=`getent passwd ${ctmag_user} | awk -F\: '{print $6}'`
ctmag_opts="-u ${ctmag_user} -p ALL"

start() {
  echo -n $"Starting `basename $0`:"
  initlog -c "${ctmag_home}/ctm/scripts/start-ag ${ctmag_opts}" > /dev/null \
   && success || failure
  rc=$?
  echo
}

stop() {
  echo -n $"Stopping `basename $0`:"
  ${ctmag_home}/ctm/scripts/shut-ag ${ctmag_opts} > /dev/null
  rc=$?
  if [ ${rc} -eq 0 ]; then
    success $"Stopping `basename $0`"
  else
    failure $"Stopping `basename $0`"
  fi
  echo
}

case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  restart)
    stop
    start
    ;;
  *)
    echo "Usage: `basename $0` {start|stop|restart}"
    rc=1
    ;;
esac

exit ${rc}

Add and configure the Control-M service:

# chmod 755 /etc/rc.d/init.d/ctmag
# chkconfig --add ctmag
# chkconfig --level 0126 ctmag off
# chkconfig --level 345 ctmag on

Start, or restart, the Control-M service:

# service ctmag restart

Thursday 16 November 2006

How to Add a New TSM Scheduler Daemon Service on RHEL4

  • dsmc sched represents the TSM scheduler daemon

Create the TSM service file:

# cat /etc/rc.d/init.d/tsm
#!/usr/bin/env sh

# chkconfig: 345 56 24
# description: TSM scheduler daemon
# processname: /usr/bin/dsmc

# Source functions library.
. /etc/rc.d/init.d/functions

LC_ALL=en_US; export LC_ALL
dsmc="/usr/bin/dsmc"
options="sched"

start() {
  echo -n $"Starting `basename ${dsmc}`:"
  # Doesn't behave as expected because of the launching command
  # of the `tsm' scheduler :-(
  #initlog -c "${dsmc} ${options} &" && success || failure
  # So...
  [ -x ${dsmc} ] && (${dsmc} ${options} &) > /dev/null 2>&1
  rc=$?
  if [ ${rc} -eq 0 ]; then
    success $"Starting `basename ${dsmc}`"
  else
    failure $"Starting `basename ${dsmc}`"
  fi
  echo
}

stop() {
  echo -n $"Stopping `basename ${dsmc}`:"
  if [ -n "`pidofproc ${dsmc}`" ]; then
    killproc ${dsmc} -TERM
    rc=$?
  else
    failure $"Stopping `basename ${dsmc}`"
    rc=1
  fi
  echo
}

case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  restart)
    stop
    start
    ;;
  status)
    status ${dsmc}
    rc=$?
    ;;
  *)
    echo "Usage: `basename $0` {start|stop|restart|status}"
    rc=1
    ;;
esac

exit ${rc}

Add and configure the TSM service:

# chmod 744 /etc/rc.d/init.d/tsm
# chkconfig --add tsm
# chkconfig --level 0126 tsm off
# chkconfig --level 345 tsm on

Start, or restart, the TSM service, and monitor it:

# service tsm restart
# service tsm status

Wednesday 31 August 2005

Use the NIS and NFS Infrastructure on Red Hat Advanced Server 2.1

Here are the steps to be able to use the current NIS and NFS infrastructure from a Linux server.


Be sure to resolve the NIS servers (slave and/or master) for the int domain name:

# egrep "nasty|bigup" /etc/hosts           nasty           bigup

Configure the NIS client:

# cat << EOF >> /etc/yp.conf
domain int server nasty
domain int server bigup
EOF
# grep NIS /etc/sysconfig/authconfig /etc/sysconfig/network


The NFS part is relatively simple since the autofs maps are looked up in the NIS maps (already managed by the corresponding boot script's service).

So, it is just necessary to modify the automount service to add some arguments that must be passed to the program. This is a necessary step to be able to automount the correct remote paths using our customized autofs server. Here is how to do so.

Check the configuration of the run-level information for the autofs service:

# chkconfig --list autofs
autofs          0:off   1:off   2:off   3:on    4:on    5:on    6:off

Modify the initial service configuration and reload it:

# diff -u /etc/init.d/autofs.orig /etc/init.d/autofs
--- /etc/init.d/autofs.orig     Thu Aug 25 13:12:38 2005
+++ /etc/init.d/autofs  Thu Aug 25 17:25:47 2005
@@ -67,7 +67,10 @@
 # We can add local options here
 # e.g. localoptions='rsize=8192,wsize=8192'
+localoptions="-DOSNAME=`uname -s` \
+              -DCPU=x86 \
+              -DNATISA=32 \
+              -DOSREL=`uname -r | awk -F\. '{print $1\".\"$2}'`"
 # Daemon options
 # e.g. --timeout 60
# service autofs restart

Verify if all is ok:

# service autofs status
Configured Mount Points:
/usr/sbin/automount /Soft yp auto.soft -ro,hard,bg,intr -DOSNAME=Linux        -DCPU=x86        -DNATISA=32        -DOSREL=2.4
/usr/sbin/automount /NTFS yp auto.nt  -DOSNAME=Linux        -DCPU=x86        -DNATISA=32        -DOSREL=2.4
/usr/sbin/automount /Home yp auto.home -rw,hard,bg,intr -DOSNAME=Linux        -DCPU=x86        -DNATISA=32        -DOSREL=2.4
/usr/sbin/automount /Apps yp auto.apps -ro,hard,bg,intr -DOSNAME=Linux        -DCPU=x86        -DNATISA=32        -DOSREL=2.4
/usr/sbin/automount /- yp  -DOSNAME=Linux        -DCPU=x86        -DNATISA=32        -DOSREL=2.4

Active Mount Points:
/usr/sbin/automount /Soft yp auto.soft -ro,hard,bg,intr -DOSNAME=Linux -DCPU=x86 -DNATISA=32 -DOSREL=2.4
/usr/sbin/automount /NTFS yp auto.nt -DOSNAME=Linux -DCPU=x86 -DNATISA=32 -DOSREL=2.4
/usr/sbin/automount /Home yp auto.home -rw,hard,bg,intr -DOSNAME=Linux -DCPU=x86 -DNATISA=32 -DOSREL=2.4
/usr/sbin/automount /Apps yp auto.apps -ro,hard,bg,intr -DOSNAME=Linux -DCPU=x86 -DNATISA=32 -DOSREL=2.4
/usr/sbin/automount /- yp -DOSNAME=Linux -DCPU=x86 -DNATISA=32 -DOSREL=2.4

Sunday 21 August 2005

Compile and Install a New Kernel on Red Hat Advanced Server 2.1

To be able to recompile our Linux kernel for our IBM BladeCenter, here is a list of the necessary prerequisites:

  • Have the sources installed and available under /usr/src/linux-2.4.9-e.35
  • The bcm5700 driver located at /usr/src/redhat (provided by IBM)
  • Our standard configuration kernel file, i.e. .config provided as an attached file

Adapt our custom kernel configuration file and the top Makefile:

# cd /usr/src/linux-2.4.9-e.35   /* Go to the top source directory. */
# make mrproper                  /* Make sure you have no stale .o files and
                                    dependencies lying around. */
# sum .config                    /* Verify our customized kernel configuration
64319    16                         file. */
# make oldconfig                 /* Default all questions based on the contents
                                    of the existing .config file. */
# cp Makefile Makefile.orig      /* Check the top Makefile for further
                                    site-dependent configuration. */
# diff -u Makefile.orig Makefile
--- Makefile.orig    Wed Aug 17 12:51:14 2005
+++ Makefile Tue Aug 16 14:43:16 2005
@@ -1,7 +1,7 @@
+EXTRAVERSION = -e.35smp-custom

Build and install the corresponding custom modules and kernel:

# make dep               /* Set up all the dependencies correctly. */
# make -j4 bzImage       /* Create a compressed kernel image. */
# make -j4 modules       /* Create the chosen modules. */
# make modules_install   /* Install the corresponding modules. */
# make install           /* Install the newly created kernel. */

Add it to the boot loader:

# cd /boot/grub
# cp -p grub.conf /boot/grub/grub.conf.orig
# diff -u grub.conf.orig grub.conf     
--- grub.conf.orig      Wed Aug 17 17:26:25 2005
+++ grub.conf   Wed Aug 17 17:29:35 2005
@@ -10,6 +10,10 @@
+title Red Hat Enterprise (2.4.9-e.35smp-custom)
+       root (hd0,0)
+       kernel /vmlinuz-2.4.9-e.35smp-custom ro root=/dev/hda2
+       initrd /initrd-2.4.9-e.35smp-custom.img
 title Red Hat Linux (2.4.9-e.35smp)
        root (hd0,0)
        kernel /vmlinuz-2.4.9-e.35smp ro root=/dev/hda2

Verify the files are present in /boot!

Verify that all this new stuff is OK, then reinstall the network driver using the provided RPM package:

# shutdown -ry now
 * If all is ok, then access the machine through the IBM Management Module
 * Console (MMC).
# cd /usr/src/redhat
# rpm -bb SPECS/bcm5700.spec
# rpm -ivh --force RPMS/i386/bcm5700-8.1.11-1.i386.rpm
# grep bcm5700 /etc/modules.conf
alias eth0 bcm5700
alias eth1 bcm5700
# modprobe bcm5700

Test if the network is running OK for now, then reboot.
