blog'o thnet

Tag - multipath

Monday 22 September 2008

About GNU/Linux Software Mirroring And LVM

Here, the final aim was to provide data access redundancy through SAN storage hosted on remote sites across Wide Area Network (WAN) links. After some relatively long and painful attempts to mimic the software mirroring found on the HP-UX platform using Logical Volume Management (LVM), i.e. at the logical volume level, I finally gave up, deciding that this functionality will definitely not fit my needs. Why? Here are my comments.

  1. It is not possible to provide clear and manageable storage multipathing when it is important to distinguish between the multiple sites, à la the "mirror across controllers" feature found in Veritas VxVM on Sun Solaris systems, for example. As a result, managing many physical volumes along with lots of logical volumes is very complicated.
  2. There is no exact mapping capability between a logical volume and its storage on a given physical volume.
  3. Mirroring needs a disk-based log, i.e. a persistent log. Yes, one can always pass the --corelog option when the logical volume is first created to get an in-memory log, i.e. a non-persistent log, but this requires all the copies (mirrors) to be fully resynchronized upon reboot. Not really viable in multi-TB environments.
  4. A write-intensive workload on a file system living on a mirrored logical volume will suffer from high latency: the overhead is significant, and the time needed for mostly-write jobs grows dramatically. It is also really hard to get high-level statistics; only low-level metrics seem consistent: the sd SCSI devices and the dm- device mapper components for each path entry. Nothing is available at the multipath device level, which is the most interesting one from the end user and SA point of view.
  5. You can't extend a mirrored logical volume, which is really a no-go per se. On that point, Red Hat support answered that this functionality may be added in a future release, and that the current state could become a Request For Enhancement (RFE) if a proper business justification is provided. In the meantime, one must break the logical volume mirror copy, then rebuild it completely. Not realistic when the logical volume is made of a lot of physical extents across multiple physical volumes.
  6. An LVM configuration can block itself completely and become unusable. The fact is, LVM uses persistent storage blocks to keep track of its own metadata. The metadata size is set at physical volume creation time only, and can't be changed afterward. This size defaults to 255 physical volume blocks, and the default can be adjusted in the LVM configuration file. The problem is, when this circular buffer space (stored in ASCII) fills up, such as when there are a lot of logical volumes in a mirrored environment, it is not possible to do anything more with LVM. So you can't add more logical volumes, can't add more logical volume copies... and can't delete them to try to reestablish a proper LVM configuration. Well, here are the answers given by Red Hat support to two key questions in this situation:
    • How to size the metadata, i.e. if we need to change it from the default value, how can we determine the new proper and appropriate size, and from which information?
      I am afraid but Metadata size can only be defined at the time of PV creation and there is no real formula for calculating the size in advance. By changing the default value of 255 you can get a higher space value. For general LVM setup (with less LV's and VG's) default size works fine however in cases where high number of LV's are required a custom value will be required.
    • We just want to delete all LV copies, which means to return to the initial situation and have 0 copy for all LV, i.e. only one LV per-se, in order to be able to change LVM configuration again (we can't do anything on our production server right now)?
      I discussed this situation with peer engineers and also referenced a similar previous case. From the notes of the same the workaround is to use the backup file (/etc/lvm/backup) and restore the PV's. I agree that this really not a production environment method however seems the only workaround.
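
For reference, the LVM operations mentioned in points 3 and 6 can be sketched as a dry run. The helpers below only print the commands they would run. The volume group, logical volume and device names (vg_data, lv_ora, /dev/emcpowera) and the sizes are hypothetical, and the use of vgcfgrestore for the workaround suggested by the support is an assumption on my side; check the option spellings against the man pages of the installed LVM2 release.

```shell
#!/bin/sh
# Dry-run sketch: each helper only PRINTS the LVM command it would run.
# All names and sizes used with these helpers are hypothetical examples.

# Initialise a physical volume with a larger metadata area than the
# default. The size can only be chosen at pvcreate time.
plan_pvcreate() {          # usage: plan_pvcreate DEVICE METADATA_SIZE
    echo "pvcreate --metadatasize $2 $1"
}

# Create a two-way mirrored logical volume with an in-memory
# (non-persistent) log, as discussed in point 3.
plan_mirror_corelog() {    # usage: plan_mirror_corelog VG LV SIZE
    echo "lvcreate -m 1 --corelog -L $3 -n $2 $1"
}

# Restore the volume group metadata from the automatic backup file
# (assumed here to be what the support workaround refers to).
plan_restore() {           # usage: plan_restore VG
    echo "vgcfgrestore -f /etc/lvm/backup/$1 $1"
}
```

For example, `plan_pvcreate /dev/emcpowera 1m` prints the pvcreate command line without touching the device, which makes it easy to review a plan before running anything on a production server.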

So, the production Oracle RDBMS server is now finally being evacuated to another machine. Hum... I hope to get a better enterprise experience using the mdadm package to handle software RAID, instead of LVM mirroring (RAID-1). Maybe more about that in another blog entry?

Saturday 9 February 2008

Deleting SCSI Device Paths For A Multipath SAN LUN

When releasing a multipath device under RHEL4, the different SCSI devices corresponding to the different paths must be cleared properly before the SAN LUN is actually removed. If the LUN was deleted before the paths were cleaned up at the OS level, it is still possible to remove them afterwards. In the following example, it is assumed that the LVM operations needed to free the device were already done, and that the LUN is managed by EMC PowerPath.

  1. First, get and verify the SCSI devices corresponding to the multipath LUN:
    # grep "I/O error on device" /var/log/messages | tail -2
    Feb  4 00:20:47 beastie kernel: Buffer I/O error on device sdo, \
     logical block 12960479
    Feb  4 00:20:47 beastie kernel: Buffer I/O error on device sdp, \
     logical block 12960479
    # powermt display dev=sdo
    Bad dev value sdo, or not under Powerpath control.
    # powermt display dev=sdp
    Bad dev value sdp, or not under Powerpath control.
    
  2. Then, get the appropriate scsi#:channel#:id#:lun# information:
    # find /sys/devices -name "*block" -print | \
     xargs \ls -l | awk -F\/ '$NF ~ /sdo$/ || $NF ~ /sdp$/ \
     {print "HBA: "$7"\tscsi#:channel#:id#:lun#: "$9}'
    HBA: host0      scsi#:channel#:id#:lun#: 0:0:0:9
    HBA: host0      scsi#:channel#:id#:lun#: 0:0:1:9
    
  3. When the individual SCSI paths are known, remove them from the system:
    # echo 1 > /sys/bus/scsi/devices/0\:0\:0\:9/delete
    # echo 1 > /sys/bus/scsi/devices/0\:0\:1\:9/delete
    # dmesg | grep "Synchronizing SCSI cache"
    Synchronizing SCSI cache for disk sdp:
    Synchronizing SCSI cache for disk sdo:
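
The awk filter in step 2 relies on fixed field positions ($7 and $9), which depend on how deep the HBA sits in the PCI tree. A minimal sketch of a position-independent variant follows; the function names are mine, and the sysfs path layout used in the example is an assumption based on the output shown above.

```shell
#!/bin/sh
# Extract the "hostN" and "H:C:T:L" components from a sysfs device path
# by matching their shape instead of counting path fields.
scsi_path_info() {         # usage: scsi_path_info SYSFS_PATH
    printf '%s\n' "$1" | awk -F/ '{
        host = ""; hctl = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^host[0-9]+$/) host = $i
            if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) hctl = $i
        }
        if (host != "" && hctl != "") print host, hctl
    }'
}

# Print the sysfs file to write "1" into in order to drop one path.
# Nothing is written here: the actual deletion stays a manual step.
scsi_delete_target() {     # usage: scsi_delete_target H:C:T:L
    echo "/sys/bus/scsi/devices/$1/delete"
}
```

With a (hypothetical) path such as /sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/host0/target0:0:0/0:0:0:9/block, scsi_path_info prints the host and the scsi#:channel#:id#:lun# tuple, which can then be passed to scsi_delete_target to get the file used in step 3.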
    

Saturday 6 August 2005

Details About SAN Disks and MPxIO Capabilities on a VIOS

This kind of specific information (such as MultiPath I/O status) can easily be obtained from a Virtual I/O Server using the following (long) one-line shell script, with the help of the lsdev(1), lscfg(1) and lspath commands:

# for disk in `lsdev | grep hdisk | egrep  -v "SCSI Disk Drive|Raid1" | awk '{print $1}'`
> do
> lscfg -v -l ${disk} | egrep "${disk}|Manufacturer|Machine Type|ROS Level and ID|Serial Number|Part Number"
> echo "`lspath -H -l ${disk} | grep ${disk} | awk '{print\"\tMultiPath I/O (MPIO) status: \"$1\" on parent \"$3}'`"
> echo ""
> done

  hdisk3           U787B.001.DNW3897-P1-C3-T1-W5006048448930A41-L9000000000000  EMC Symmetrix FCP MPIO RaidS
        Manufacturer................EMC     
        Machine Type and Model......SYMMETRIX       
        ROS Level and ID............5670
        Serial Number...............9312A020
        Part Number.................000000000000510001000287
        MultiPath I/O (MPIO) status: Enabled on parent fscsi0
        MultiPath I/O (MPIO) status: Enabled on parent fscsi1

  hdisk4           U787B.001.DNW3897-P1-C3-T1-W5006048448930A41-LA000000000000  EMC Symmetrix FCP MPIO RaidS
        Manufacturer................EMC     
        Machine Type and Model......SYMMETRIX       
        ROS Level and ID............5670
        Serial Number...............9312E020
        Part Number.................000000000000510001000287
        MultiPath I/O (MPIO) status: Enabled on parent fscsi0
        MultiPath I/O (MPIO) status: Enabled on parent fscsi1
[...]

The pattern SCSI Disk Drive is excluded since it represents local SCSI disks, as is the pattern Raid1, because it is a view corresponding to parity disks (which are logical disks only used by SAN administrators).
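
The two filtering steps of the loop above can also be written as small functions reading from standard input, which makes them easy to check against captured output away from the VIOS. This is only a sketch: the function names are mine, and the column order assumed for lspath (status, name, parent) is inferred from the fields used in the original one-liner.

```shell
#!/bin/sh
# Keep only the SAN hdisk names from lsdev output read on stdin,
# excluding local SCSI disks and the Raid1 parity views.
san_disks() {              # usage: lsdev | san_disks
    grep hdisk | grep -E -v 'SCSI Disk Drive|Raid1' | awk '{print $1}'
}

# Format "lspath -H -l DISK" output read on stdin. The disk name is
# compared exactly against the second column, so that hdisk3 does not
# also match hdisk30 (a plain grep on the name would).
mpio_status() {            # usage: lspath -H -l hdisk3 | mpio_status hdisk3
    awk -v disk="$1" '$2 == disk {
        printf "\tMultiPath I/O (MPIO) status: %s on parent %s\n", $1, $3
    }'
}
```

The header line printed by the -H flag is dropped naturally, since its second column never equals a disk name.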