blog'o thnet

To content | To menu | To search

Sunday 9 March 2008

Update A Corrupted GRUB Boot Archive, Without SVM

Solaris 10 systems on x86 architecture use the GNU GRand Unified Bootloader (GRUB) which is the boot loader responsible for loading a boot archive into a system's memory. The boot archive is a collection of critical files (kernel modules and configuration files) that are required to boot the Solaris OS. As stated in the Sun documentation:

These files are needed during system startup before the root file system is mounted. Two boot archives are maintained on a system:

  • The boot archive that is used to boot the Solaris OS on a system. This boot archive is sometimes called the primary boot archive.
  • The boot archive that is used for recovery when the primary boot archive is damaged. This boot archive starts the system without mounting the root file system. On the GRUB menu, this boot archive is called failsafe. The archive's essential purpose is to regenerate the primary boot archive, which is usually used to boot the system.

The Solaris OS generally keeps the boot archive properly synchronized on its own. Sometimes, the boot archive gets corrupted--for example when (bad) patches are applied, or the the operating system crashed. In these cases, the boot archive must be regenerated. This is easily accomplished following the Sun documentations x86: How to Boot the Failsafe Archive for Recovery Purposes, and x86: How to Boot the Failsafe Archive to Forcibly Update a Corrupt Boot Archive. The main drawback is when the system is encapsulated under a SVM mirror (RAID-1) since the md driver is not managed under the failsafe mode. Please refer to this blog entry on this subject, if needed.

Wednesday 5 March 2008

Create And Remove A Remote Printer Queue (CLI)

You can easily create and remove a remote printer queue using the BSD type spooler. You just have to fill the configuration file /tmp/lp.list properly, i.e. provide the local printer name, the remote LPD server, and the remote printer queue:

# cat << EOF > /tmp/lp.list
locname1 lpdserv1 remname1
locname2 lpdserv2 remname2
EOF

Then, just run the appropriate script depending of the desired behavior. Follow, an example when removing the two queues:

# cat << EOF > /tmp/lp.remove
#!/usr/bin/env sh

for lplocal in `awk '{print $1}' /tmp/lp.list`; do
  /usr/sbin/lpshut
  /usr/bin/cancel ${lplocal} -e 2> /dev/null
  /usr/sbin/lpadmin -x${lplocal}
  /usr/sbin/lpsched -v
  sleep 1
done

exit 0
EOF
# sh /tmp/lp.remove
scheduler stopped
scheduler is running
scheduler stopped
scheduler is running
# lpstat -olocname1
no system default destination
lpstat: "locname1" not a request id or a destination

And now, the creation:

# cat << EOF > /tmp/lp.create
#!/usr/bin/env sh

while read lp; do
  eval set -- `IFS=" "; printf '"%s" ' ${lp}`
  lplocal="$1"
  lpserver="$2"
  lpremote="$3"

  /usr/sbin/lpshut
  /usr/sbin/lpadmin -p${lplocal} -orm${lpserver} -orp${lpremote} \
   -mrmodel -v/dev/null -orc -ob3 -ocmrcmodel -osmrsmodel
  /usr/sbin/accept ${lplocal}
  /usr/bin/enable ${lplocal}
  /usr/sbin/lpsched -v
  sleep 1
done < /tmp/lp.list

exit 0
EOF
# sh /tmp/lp.create
scheduler stopped
destination "locname1" now accepting requests
printer "locname1" now enabled
scheduler is running
scheduler stopped
destination "locname2" now accepting requests
printer "locname2" now enabled
scheduler is running
# lpstat -olocname1
no system default destination
printer queue for locname1
                         Windows LPD Server
                   Printer \\lpdserv1
emname1
Owner       Status         Jobname          Job-Id    Size   Pages  Priority
----------------------------------------------------------------------------
hostname: locname1: ready and waiting
no entries

That's it!

Friday 15 February 2008

LVM2 Simple Mirroring On RHEL4

When the need to evacuate all persistent SAN storage from EMC DMX1K to HP XP12K (HDS), three main solutions were envisaged. The first one was brute data copy (tar, cpio, etc.) but was not very practical with the size of the data (multi-terabytes) and the time involved in copying them. The two others were based on LVM technologies: mirroring, or moving.

Although the choice has been to use the online and transparent moving data technology (see pvmove for more information), it was interesting to note that Red Hat has backported support for the creation and manipulation of simple mirrors to their RHEL4 distribution. These functionalities were introduced with the RHBA-2006:0504-15 advisory issued on 2006-08-10, i.e. between RHEL4 Update 4 and RHEL4 Update 5 (and so available via RHN at this time). It is just too bad that the online help for LVM commands are not properly synchronized nor fully documented by the corresponding manual page: clearly, this doesn't help to use them in the best conditions (no, Google isn't always the better option when using these kinds of functionalities in big companies).

Saturday 9 February 2008

Deleting SCSI Device Paths For A Multipath SAN LUN

When releasing a multipath device under RHEL4, different SCSI devices corresponding to different paths must be cleared properly before removing the SAN LUN effectively. When the LUN was delete before to clean up the paths at the OS level, it is always possible to remove them afterwards. In the following example, it is assume that the freeing LVM manipulations were already done, and that the LUN is managed by EMC PowerPath.

  1. First, get and verify the SCSI devices corresponding to the multipath LUN:
    # grep "I/O error on device" /var/log/messages | tail -2
    Feb  4 00:20:47 beastie kernel: Buffer I/O error on device sdo, \
     logical block 12960479
    Feb  4 00:20:47 beastie kernel: Buffer I/O error on device sdp, \
     logical block 12960479
    # powermt display dev=sdo
    Bad dev value sdo, or not under Powerpath control.
    # powermt display dev=sdp
    Bad dev value sdp, or not under Powerpath control.
    
  2. Then, get the appropriate scsi#:channel#:id#:lun# informations:
    # find /sys/devices -name "*block" -print | \
     xargs \ls -l | awk -F\/ '$NF ~ /sdo$/ || $NF ~ /sdp$/ \
     {print "HBA: "$7"\tscsi#:channel#:id#:lun#: "$9}'
    HBA: host0      scsi#:channel#:id#:lun#: 0:0:0:9
    HBA: host0      scsi#:channel#:id#:lun#: 0:0:1:9
    
  3. When the individual SCSI paths are known, remove them from the system:
    # echo 1 > /sys/bus/scsi/devices/0\:0\:0\:9/delete
    # echo 1 > /sys/bus/scsi/devices/0\:0\:1\:9/delete
    # dmesg | grep "Synchronizing SCSI cache"
    Synchronizing SCSI cache for disk sdp:
    Synchronizing SCSI cache for disk sdo:
    

Sunday 23 December 2007

Tuning Is Evil

Each month, I hear many coworkers or specific application management teams asking about putting some system tunings in place, even on very recent operating system releases. All the time. Most of these settings comes from the Internet, are found in forum posts, or articles related to a subsystem, or in technical publications. And some of them comes from third party software providers, or editors. A very, very few settings are proposed or recommended by system administrators, or by knowledgeable people in tuning area.

The problem is that, most of the time, these tunings are related to another release of the operating system, are not updated to keep current with the Best Of Practices for a given OS release, or simply are not well understood and not applicable without affecting (badly) current running environments. More, already present tunings are reported as-is on upgraded and fresh installed systems without more thinking, or without be assured these are always applicable (or obsolete) and what are the new defaults (if not dynamic). One of the most representative example today of this is the new System V IPC facilities found from the GA Solaris 10, and later, operating system, where some Oracle DBAs always ask SA team for shared memory settings as found on Solaris 8 systems.

Although extract from the Solaris Internals and Performance FAQ for ZFS, here is a great excerpt we all must read carefully and try to keep in mind when modifying default behavior of a system:

Tuning is evil and should not be done...in general.

First, consider that the default values are set by the people who know most things about the effects of the tuning. If a better value exists, it would be the default. While alternative values might help a given workload, it could quite possibly degrade some other aspects of performance. Maybe, catastrophically so.

Over time, tuning recommendations might become stale at best or might lead to performance degradations. Customers are leery of changing a tuning that is in place and the net effect is a worse product than what it could be. Moreover, tuning enabled on a given system might spread to other systems, where it might not be warranted at all.

Monday 17 December 2007

A Process Can Not Be Killed Upon Hanging In sendfilev()

Here is a little but annoying bug I faced recently on a production system running Solaris 10 11/06: a process can not be killed.

The KILL signal can't be ignored by a process. The associated handler must be the default--taking the signal into account, and honor it--and it is the case (the incriminated process id is 26632):

# psig 26632 | grep KILL
KILL    default

Nevertheless, the process seems not to stop as expected:

# echo $$
2913
#
# kill -s KILL 26632
#
# dtrace -n 'fbt::sigtoproc:entry /pid == 2913/ {trace(arg2); trace(execname);}'
dtrace: description 'fbt::sigtoproc:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
 18  16387                  sigtoproc:entry                 9  tcsh
#
# ps -ef | awk '$2 ~ /26632/ {print $0}'
nonpriv 26632     1   0   Nov 13 ?         0:01 /usr/sbin/in.ftpd -a

This not the correct behavior since a pending signal is not honored:

# pflags 26632
26632:  /usr/sbin/in.ftpd -a
       data model = _ILP32  flags = ORPHAN|MSACCT|MSFORK
       sigpend = 0x00006100,0x00000000
/1:    flags = 0

More, the process seems in a non-coherent system state: it is now nearly impossible to trace it using, truss, the proctools,...

# truss -alef -p 26632
truss: unanticipated system error: 26632
#
# pstack 26632
pstack: cannot examine 26632: unanticipated system error
#
# pfiles 26632
pfiles: unanticipated system error: 26632
#
# pldd 26632
pldd: cannot examine 26632: unanticipated system error

... or helped by DTace:

# dtrace -n 'profile-1000hz /pid ==26632/ { @[stack()] = count() }'
dtrace: description 'profile-1000hz ' matched 1 probe
^C
#
# dtrace -n 'profile-1000hz /pid ==26632/ { @[ustack()] = count() }'
dtrace: description 'profile-1000hz ' matched 1 probe
^C

Follow some information gathered through mdb in kernel debugging mode:

> ::status
debugging live kernel (64-bit) on socrate
operating system: 5.10 Generic_118833-36 (sun4u)
>
> ::showrev
Hostname: socrate
Release: 5.10
Kernel architecture: sun4u
Application architecture: sparcv9
Kernel version: SunOS 5.10 sun4u Generic_118833-36
Platform: SUNW,Sun-Fire-V490
>
> ::pgrep in.ftpd
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R  26632      1    261    261  30501 0x5a024b00 00000600144b0c28 in.ftpd
>
> 00000600144b0c28::kill
mdb: command is not supported by current target
>
> 00000600144b0c28::thread
            ADDR    STATE    FLG PFLG SFLG   PRI  EPRI PIL             INTR
00000600144b0c28 inval/2000 157e 5778    0     0     0   6                2
>
> 00000600144b0c28::walk thread | ::findstack
stack pointer for thread 30002aba360: 2a1022e7ec1
[ 000002a1022e7ec1 cv_wait+0x38() ]
  000002a1022e7f71 page_lock_es+0x204()
  000002a1022e8021 pvn_vplist_dirty+0x2a4()
  000002a1022e8101 nfs_putpages+0x124()
  000002a1022e81c1 nfs3_putpage+0xcc()
  000002a1022e8271 fop_putpage+0x1c()
  000002a1022e8321 nfs_purge_caches+0xe4()
  000002a1022e83d1 nfs_attr_cache+0x20c()
  000002a1022e8481 nfs3_getattr_otw+0x1b8()
  000002a1022e85f1 nfs3_validate_caches+0x4c()
  000002a1022e8731 nfs3_getpage+0xa4()
  000002a1022e8861 fop_getpage+0x44()
  000002a1022e8931 segmap_getmapflt+0x588()
  000002a1022e8a41 snf_segmap+0x13c()
  000002a1022e8bc1 sosendfile64+0x298()
  000002a1022e8d21 sendvec64+0xf8()
  000002a1022e8f61 sendfilev+0x178()
  000002a1022e92e1 syscall_trap32+0xcc()

We can see the use of the sendfilev() syscall: it appears to be a known bug around this (you must have a registered Sun support customer account to be able to view the second document).

The good news is the root cause is already fix in the development branch of the operating system. The bad news is, after opening a call to the Sun support team, that no patch will be released in a near future to correct this: the incorporation of the fix is currently planned for the next major Solaris 10 Update (Update 5) which is scheduled for summer 2008. However, the fix is already available via the Solaris Express program, if that is applicable to your environment. As a last note, and if the non-killable process owns a resource necessary to another program, there seems no other option than to plan a reboot of the system.

Saturday 1 December 2007

Nifty Tool For Querying Heterogeneous SCSI Devices

Lasse Østerild remind us about the EMC inq tool, which is able to query SCSI buses to find a large range of devices, of many sort. This great utility support non-EMC targets, and is freely available (just be aware that the latest link seems not to be updated frequently, so check the latest version yourself in the list).

Here are two examples taken respectively from a Sun Fire V490 UltraSPARC system running Solaris 9...

# ./inq.sol64
Inquiry utility, Version V7.3-845 (Rev 2.0)      (SIL Version V6.4.2.0
(Edit Level 845)
Copyright (C) by EMC Corporation, all rights reserved.
For help type inq -h.
---------------------------------------------------------------------------------------
DEVICE                            :VEND     :PROD             :REV  :SERNUM  :CAP(kb)
---------------------------------------------------------------------------------------
/dev/rdsk/c0t0d0s2                :TSSTcorp :DVD-ROM TS-H352C :SI00 :        :  -----
/dev/rdsk/c1t0d0s2                :FUJITSU  :MAX3147FCSUN146G :1103:0639G02A :143369664
/dev/rdsk/c1t3d0s2                :FUJITSU  :MAX3147FCSUN146G :1103:0638G02A :143369664
/dev/rdsk/c2t50060E8004F2F520d0s2 :HP       :OPEN-V*2         :5007 :500F2F5 :103683840
/dev/rdsk/c3t50060E8004F2F510d6s2 :HP       :OPEN-V*7         :5007 :500F2F5 :362893440
/dev/vx/rdmp/XP12K_SQC_0s2        :HP       :OPEN-V*3         :5007 :500F2F5 :155525760
/dev/vx/rdmp/XP12K_SQC_6s2        :HP       :OPEN-V*7         :5007 :500F2F5 :362893440
/dev/vx/rdmp/c1t0d0s2             :FUJITSU  :MAX3147FCSUN146G :1103:0639G02A :143369664
/dev/vx/rdmp/c1t3d0s2             :FUJITSU  :MAX3147FCSUN146G :1103:0638G02A :143369664
# 
# ./inq.sol64 -hba
Inquiry utility, Version V7.3-845 (Rev 2.0)      (SIL Version V6.4.2.0
(Edit Level 845)
Copyright (C) by EMC Corporation, all rights reserved.
For help type inq -h.
---------------------------------------------------
HBA name:           QLogic Corp.-2200-0
host WWN:           200000144F415386
vendor name:        QLogic Corp.
model:              2200
firmware version:   2.1.144
driver version:     20060630-2.16
serial number:      Unknown
vendor code:        0x144f
HBA type:           Fibre Channel
port count:         1

port number:       1
    port WWN:           210000144F415386
    Port OS name:       /dev/cfg/c1
    port type:          LPORT
    port speed:         1GBIT
    supported speed:    1GBIT
    port state:         ONLINE
    port FCID:          0x1
---------------------------------------------------
HBA name:           Sun Microsystems, Inc.-LP10000-S-1
host WWN:           20000000C957A8E8
vendor name:        Sun Microsystems, Inc.
model:              LP10000-S
firmware version:   1.91a5
driver version:     1.11i (2006.07.11.10.53)
serial number:      0999BG0-0635000219
vendor code:        0xc9
HBA type:           Fibre Channel
port count:         1

port number:       1
    port WWN:           10000000C957A8E8
    Port OS name:       /dev/cfg/c2
    port type:          NPORT
    port speed:         2GBIT
    supported speed:    2GBIT
    port state:         ONLINE
    port FCID:          0x10900
[...]

... and a HP DL585G2 AMD64 system running RHEL4U5. Both are connected to a remote SAN served by a HP XP12K (HDS refurbished) storage system:

# ./inq.LinuxAMD64 -f_powerpath -f_hds
Inquiry utility, Version V7.3-845 (Rev 2.0)      (SIL Version V6.4.2.0
(Edit Level 845)
Copyright (C) by EMC Corporation, all rights reserved.
For help type inq -h.
----------------------------------------------------------------------
DEVICE         :VEND    :PROD            :REV   :SER NUM    :CAP(kb)
----------------------------------------------------------------------
/dev/emcpowere :HP      :OPEN-V*6        :5007  :50 0F2F5   :311051520
/dev/emcpowerf :HP      :OPEN-V*6        :5007  :50 0F2F5   :311051520
/dev/emcpowerg :HP      :OPEN-V*10       :5007  :50 0F2F5   :307785600
/dev/emcpowerh :HP      :OPEN-V*10       :5007  :50 0F2F5   :307785600
/dev/emcpowerj :HP      :OPEN-V*6        :5007  :50 0F2F5   :307203840
/dev/emcpowerk :HP      :OPEN-V*4        :5007  :50 0F2F5   :206085120

Although this tool works great with an OpenSolaris distribution (say, the Solaris Express family), there appear not to have a x86 declination which is a pity knowing the growing Solaris/OpenSolaris community in the marketplace today. Well, maybe for a next inq release, at least I hope.

Tuesday 16 October 2007

Error While Patching A New Boot Environment

After creating a new boot environment (BE) named beastie, see below, to upgrade a system running Solaris 8 to Solaris 10 11/06 (as I do many times in the past without a hiccup), I encounter a problem when I tried to apply the appropriate Recommended cluster patch to the new BE with this message:

# luupgrade -t -n beastie -s /var/tmp/10_Recommended

Validating the contents of the media .
The media contains 76 software patches that can be added.
All 76 patches will be added because you did not specify any specific
patches to add.
Mounting the BE .
ERROR: The boot environment  supports non-global
zones.The current boot environment does not support non-global zones.
Releases prior to Solaris 10 cannot be used to maintain Solaris 10 and
later releases that include support for non-global zones. You may only
execute the specified operation on a system with Solaris 10 (or later)
installed.

I can't find any reference to a known bug or problem after looking for this against SunSolve, Sun Support, and Googling. Has anyone already seen this error, and solved it The Right Way? As for me, I needed to boot from the BE and apply the cluster patch: this was a pain since this bundle include the -36 kernel patch which is known to be relatively disruptive, since it need two reboot to apply the entire cluster patch (it contains a new version for the kernel).

Update #1 (2009-03-22): Seems to be explained in this excellent BigAdmin article.

Thursday 27 September 2007

European Type6 USB Keyboard Circumflex Key Issue... Solved

I recently worked with Dermot Malone, the Responsible Engineer for this bug report.

The fact is the described problem persist as explained in the Description field of the bug report, running snv_70. But I must say I found something recently: as of now, I used the default C locale. But after reading this documentation (go to Unicode Locale: en_US.UTF-8 Support), I decided to try the en_US.UTF-8 locale, directly chosen from the dtlogin screen... and found that I can now have a similar English environment as before (using C locale), while supporting accent and circumflex characters from applications which support this locale (for example the GNOME Terminal, or Mozilla Firefox). Hum, I just find a little big curious that most of accent characters (say 'é') works properly using the C locale, but that I need to set an other locale to be able to use the circumflex characters (say 'ô').

As a matter of interest, Dermot ask me why I didn't use the fr_FR.UTF-8 locale. Here is my answer:

Well, I am a French guy, but systematically install English operating systems, and use the default C locale. In IT, English is _the_ standard, and all things (manual pages, messages, format strings, etc.) are all homogeneous this way. As a side note, I am sure not to encounter the problems found on RedHat Linux systems when the default locale is not properly supported by the OS sub-systems themselves (the RC scripts generally sets the LANG=C (or something like that) for 'grep', 'sed' and 'awk' for example), or some third party products (such as the IBM TSM Backup Archive client).

Now, using the en_US.UTF-8 locale on Solaris and Solaris Express, I can have best of both worlds: a fully functional (and supported) English environment, and be able to use extended characters specific to my language.

Friday 21 September 2007

64-bit System (Kernel), But 32-bit Binaries?

Once again, well known and great developer Casper Dik from Sun Microsystems give us an interesting answer on the fact that there is not 32-bit emulation in the Solaris OS when running a 64-bit kernel, and why most installed binaries seems to be compiled as 32-bit executables, even if you boot on a 64-bit platform, as in the following example:

# isainfo -kv
64-bit amd64 kernel modules
# file /usr/bin/ls
/usr/bin/ls:    ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, \
 not stripped, no debugging information available

Here is the explanation, shown as a little Q&A:

Q: Once I boot it in 64bit mode, i'd have to run emulation libraries to run 32bit bins right?

A: No; you run the exact same binaries and libraries under 32 and 64 bit. It's not emulation; it's basically two syscall entry tables one for 32 bit and one for 64 bit, mostly sharing the same code except where pointer sizes matter.

Q: Since it will be a very light server (only bare binaries to run my req.s) they probably end up most likely all being 64bit?

A: No, most binaries are 32 bit only those that need to be 64 bit are 64 bit.

Q: so you are saying that i can run both 32bit and 64bit code simultaneously, natively with the solaris kernel? That's pretty damn cool

A: Correct. There are several reasons for an OS which generally comes in binary distributions to do so:

  • maintain complete binary compatibility with old applications (and yes, closed source does matter, even to a lot of Linux customers).
  • allows a single distribution to work on both 32 and 64 bit systems of the same architecture.

As a last note, it is interesting to see that a full install of SXCE snv_70 provide 1188 32-bit files and only 82 64-bit files in the following paths /bin, /usr/bin, /usr/openwin/bin, /usr/ucb, /usr/sfw/bin, and /opt/SUNWspro/bin. The 64-bit files are mostly stored in a specific architecture subdirectory; i.e. amd64 in this case, while some of them are provided as both 32-bit and 64-bit incarnations:

# file /usr/bin/pfiles /usr/bin/amd64/pfiles
/usr/bin/pfiles:        ELF 32-bit LSB executable 80386 Version 1 [FPU], \
 dynamically linked, not stripped, no debugging information available
/usr/bin/amd64/pfiles:  ELF 64-bit LSB executable AMD64 Version 1 [SSE FXSR FPU], \
 dynamically linked, not stripped, no debugging information available

Update #1: 2008-04-07

Be sure to consult the excellent 64-bit FAQ for Solaris.

- page 4 of 16 -