blog'o thnet

Sunday 23 December 2007

Tuning Is Evil

Every month, coworkers and application management teams ask me to put system tunings in place, even on very recent operating system releases. Most of these settings come from the Internet: forum posts, articles about a given subsystem, or technical publications. Some of them come from third-party software providers or vendors. Very few are proposed or recommended by system administrators, or by people knowledgeable in the tuning area.

The problem is that, most of the time, these tunings were written for another release of the operating system, have not been kept current with the best practices for a given OS release, or are simply not well understood and cannot be applied without (badly) affecting running environments. Moreover, existing tunings are carried over as-is to upgraded and freshly installed systems without further thought, and without checking whether they are still applicable (or obsolete) and what the new defaults are (if they are not dynamic). One of the most representative examples today is the new System V IPC facilities found in the Solaris 10 operating system since its GA release, where some Oracle DBAs still ask the SA team for shared memory settings as found on Solaris 8 systems.

Although it is extracted from the Solaris Internals and Performance FAQ for ZFS, here is a great excerpt that we should all read carefully and keep in mind when modifying the default behavior of a system:

Tuning is evil and should not be done...in general.

First, consider that the default values are set by the people who know most things about the effects of the tuning. If a better value exists, it would be the default. While alternative values might help a given workload, it could quite possibly degrade some other aspects of performance. Maybe, catastrophically so.

Over time, tuning recommendations might become stale at best or might lead to performance degradations. Customers are leery of changing a tuning that is in place and the net effect is a worse product than what it could be. Moreover, tuning enabled on a given system might spread to other systems, where it might not be warranted at all.

Monday 17 December 2007

A Process Can Not Be Killed Upon Hanging In sendfilev()

Here is a small but annoying bug I recently faced on a production system running Solaris 10 11/06: a process cannot be killed.

The KILL signal cannot be caught or ignored by a process: its disposition must be the default one, which honors the signal. That is indeed the case here (the offending process ID is 26632):

# psig 26632 | grep KILL
KILL    default

Nevertheless, the process does not terminate as expected:

# echo $$
2913
#
# kill -s KILL 26632
#
# dtrace -n 'fbt::sigtoproc:entry /pid == 2913/ {trace(arg2); trace(execname);}'
dtrace: description 'fbt::sigtoproc:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
 18  16387                  sigtoproc:entry                 9  tcsh
#
# ps -ef | awk '$2 ~ /26632/ {print $0}'
nonpriv 26632     1   0   Nov 13 ?         0:01 /usr/sbin/in.ftpd -a

This is not the correct behavior, since a pending signal is not honored:

# pflags 26632
26632:  /usr/sbin/in.ftpd -a
       data model = _ILP32  flags = ORPHAN|MSACCT|MSFORK
       sigpend = 0x00006100,0x00000000
/1:    flags = 0
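
The sigpend value printed by pflags is a bit mask of pending signals. As a minimal sketch, assuming the usual Solaris sigset layout where signal n is stored as bit (n - 1) % 32 of 32-bit word (n - 1) / 32, the following Python snippet decodes the two words shown above. The helper name pending_signals is purely illustrative and is not part of any Solaris tool.

```python
def pending_signals(*words):
    """Return the signal numbers set in a sequence of 32-bit sigset words.

    Assumption: signal n is stored as bit (n - 1) % 32 of word (n - 1) // 32.
    """
    signals = []
    for index, word in enumerate(words):
        for bit in range(32):
            if word & (1 << bit):
                signals.append(index * 32 + bit + 1)
    return signals


# The two words reported by pflags for process 26632.
print(pending_signals(0x00006100, 0x00000000))
```

Under that assumption the mask decodes to signals 9, 14 and 15, that is KILL, ALRM and TERM on Solaris: the KILL signal sent earlier is indeed still pending.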

Moreover, the process seems to be in an inconsistent state: it is now nearly impossible to trace it using truss, the proc tools...

# truss -alef -p 26632
truss: unanticipated system error: 26632
#
# pstack 26632
pstack: cannot examine 26632: unanticipated system error
#
# pfiles 26632
pfiles: unanticipated system error: 26632
#
# pldd 26632
pldd: cannot examine 26632: unanticipated system error

... or with the help of DTrace:

# dtrace -n 'profile-1000hz /pid ==26632/ { @[stack()] = count() }'
dtrace: description 'profile-1000hz ' matched 1 probe
^C
#
# dtrace -n 'profile-1000hz /pid ==26632/ { @[ustack()] = count() }'
dtrace: description 'profile-1000hz ' matched 1 probe
^C

Here is some information gathered through mdb in kernel debugging mode:

> ::status
debugging live kernel (64-bit) on socrate
operating system: 5.10 Generic_118833-36 (sun4u)
>
> ::showrev
Hostname: socrate
Release: 5.10
Kernel architecture: sun4u
Application architecture: sparcv9
Kernel version: SunOS 5.10 sun4u Generic_118833-36
Platform: SUNW,Sun-Fire-V490
>
> ::pgrep in.ftpd
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R  26632      1    261    261  30501 0x5a024b00 00000600144b0c28 in.ftpd
>
> 00000600144b0c28::kill
mdb: command is not supported by current target
>
> 00000600144b0c28::thread
            ADDR    STATE    FLG PFLG SFLG   PRI  EPRI PIL             INTR
00000600144b0c28 inval/2000 157e 5778    0     0     0   6                2
>
> 00000600144b0c28::walk thread | ::findstack
stack pointer for thread 30002aba360: 2a1022e7ec1
[ 000002a1022e7ec1 cv_wait+0x38() ]
  000002a1022e7f71 page_lock_es+0x204()
  000002a1022e8021 pvn_vplist_dirty+0x2a4()
  000002a1022e8101 nfs_putpages+0x124()
  000002a1022e81c1 nfs3_putpage+0xcc()
  000002a1022e8271 fop_putpage+0x1c()
  000002a1022e8321 nfs_purge_caches+0xe4()
  000002a1022e83d1 nfs_attr_cache+0x20c()
  000002a1022e8481 nfs3_getattr_otw+0x1b8()
  000002a1022e85f1 nfs3_validate_caches+0x4c()
  000002a1022e8731 nfs3_getpage+0xa4()
  000002a1022e8861 fop_getpage+0x44()
  000002a1022e8931 segmap_getmapflt+0x588()
  000002a1022e8a41 snf_segmap+0x13c()
  000002a1022e8bc1 sosendfile64+0x298()
  000002a1022e8d21 sendvec64+0xf8()
  000002a1022e8f61 sendfilev+0x178()
  000002a1022e92e1 syscall_trap32+0xcc()

The stack shows the process blocked in the sendfilev() system call, waiting on a page lock in the NFS client code: this appears to be a known bug (you must have a registered Sun support customer account to be able to view the second document).

The good news is that the root cause is already fixed in the development branch of the operating system. The bad news, learned after opening a call with the Sun support team, is that no patch will be released in the near future to correct this: the fix is currently planned for the next major Solaris 10 Update (Update 5), which is scheduled for summer 2008. However, the fix is already available via the Solaris Express program, if that is applicable to your environment. As a last note, if the unkillable process holds a resource needed by another program, there seems to be no option other than planning a reboot of the system.

Saturday 1 December 2007

Nifty Tool For Querying Heterogeneous SCSI Devices

Lasse Østerild reminds us about the EMC inq tool, which is able to query SCSI buses to find a large range of devices of many sorts. This great utility supports non-EMC targets and is freely available (just be aware that the "latest" link does not seem to be updated frequently, so check for the latest version yourself in the list).

Here are two examples taken respectively from a Sun Fire V490 UltraSPARC system running Solaris 9...

# ./inq.sol64
Inquiry utility, Version V7.3-845 (Rev 2.0)      (SIL Version V6.4.2.0
(Edit Level 845)
Copyright (C) by EMC Corporation, all rights reserved.
For help type inq -h.
---------------------------------------------------------------------------------------
DEVICE                            :VEND     :PROD             :REV  :SERNUM  :CAP(kb)
---------------------------------------------------------------------------------------
/dev/rdsk/c0t0d0s2                :TSSTcorp :DVD-ROM TS-H352C :SI00 :        :  -----
/dev/rdsk/c1t0d0s2                :FUJITSU  :MAX3147FCSUN146G :1103:0639G02A :143369664
/dev/rdsk/c1t3d0s2                :FUJITSU  :MAX3147FCSUN146G :1103:0638G02A :143369664
/dev/rdsk/c2t50060E8004F2F520d0s2 :HP       :OPEN-V*2         :5007 :500F2F5 :103683840
/dev/rdsk/c3t50060E8004F2F510d6s2 :HP       :OPEN-V*7         :5007 :500F2F5 :362893440
/dev/vx/rdmp/XP12K_SQC_0s2        :HP       :OPEN-V*3         :5007 :500F2F5 :155525760
/dev/vx/rdmp/XP12K_SQC_6s2        :HP       :OPEN-V*7         :5007 :500F2F5 :362893440
/dev/vx/rdmp/c1t0d0s2             :FUJITSU  :MAX3147FCSUN146G :1103:0639G02A :143369664
/dev/vx/rdmp/c1t3d0s2             :FUJITSU  :MAX3147FCSUN146G :1103:0638G02A :143369664
# 
# ./inq.sol64 -hba
Inquiry utility, Version V7.3-845 (Rev 2.0)      (SIL Version V6.4.2.0
(Edit Level 845)
Copyright (C) by EMC Corporation, all rights reserved.
For help type inq -h.
---------------------------------------------------
HBA name:           QLogic Corp.-2200-0
host WWN:           200000144F415386
vendor name:        QLogic Corp.
model:              2200
firmware version:   2.1.144
driver version:     20060630-2.16
serial number:      Unknown
vendor code:        0x144f
HBA type:           Fibre Channel
port count:         1

port number:       1
    port WWN:           210000144F415386
    Port OS name:       /dev/cfg/c1
    port type:          LPORT
    port speed:         1GBIT
    supported speed:    1GBIT
    port state:         ONLINE
    port FCID:          0x1
---------------------------------------------------
HBA name:           Sun Microsystems, Inc.-LP10000-S-1
host WWN:           20000000C957A8E8
vendor name:        Sun Microsystems, Inc.
model:              LP10000-S
firmware version:   1.91a5
driver version:     1.11i (2006.07.11.10.53)
serial number:      0999BG0-0635000219
vendor code:        0xc9
HBA type:           Fibre Channel
port count:         1

port number:       1
    port WWN:           10000000C957A8E8
    Port OS name:       /dev/cfg/c2
    port type:          NPORT
    port speed:         2GBIT
    supported speed:    2GBIT
    port state:         ONLINE
    port FCID:          0x10900
[...]

... and an HP DL585 G2 AMD64 system running RHEL4U5. Both are connected to a remote SAN served by an HP XP12K (rebadged HDS) storage system:

# ./inq.LinuxAMD64 -f_powerpath -f_hds
Inquiry utility, Version V7.3-845 (Rev 2.0)      (SIL Version V6.4.2.0
(Edit Level 845)
Copyright (C) by EMC Corporation, all rights reserved.
For help type inq -h.
----------------------------------------------------------------------
DEVICE         :VEND    :PROD            :REV   :SER NUM    :CAP(kb)
----------------------------------------------------------------------
/dev/emcpowere :HP      :OPEN-V*6        :5007  :50 0F2F5   :311051520
/dev/emcpowerf :HP      :OPEN-V*6        :5007  :50 0F2F5   :311051520
/dev/emcpowerg :HP      :OPEN-V*10       :5007  :50 0F2F5   :307785600
/dev/emcpowerh :HP      :OPEN-V*10       :5007  :50 0F2F5   :307785600
/dev/emcpowerj :HP      :OPEN-V*6        :5007  :50 0F2F5   :307203840
/dev/emcpowerk :HP      :OPEN-V*4        :5007  :50 0F2F5   :206085120

Although this tool works great with an OpenSolaris distribution (say, the Solaris Express family), there does not appear to be an x86 version, which is a pity given the growing Solaris/OpenSolaris community in the marketplace today. Well, maybe in a future inq release; at least I hope so.

Tuesday 16 October 2007

Error While Patching A New Boot Environment

After creating a new boot environment (BE) named beastie (see below) to upgrade a system running Solaris 8 to Solaris 10 11/06, as I have done many times in the past without a hiccup, I ran into a problem when I tried to apply the appropriate Recommended patch cluster to the new BE. It failed with this message:

# luupgrade -t -n beastie -s /var/tmp/10_Recommended

Validating the contents of the media .
The media contains 76 software patches that can be added.
All 76 patches will be added because you did not specify any specific
patches to add.
Mounting the BE .
ERROR: The boot environment  supports non-global
zones.The current boot environment does not support non-global zones.
Releases prior to Solaris 10 cannot be used to maintain Solaris 10 and
later releases that include support for non-global zones. You may only
execute the specified operation on a system with Solaris 10 (or later)
installed.

I could not find any reference to a known bug or problem after searching SunSolve, Sun Support, and Google. Has anyone already seen this error and solved it The Right Way? As for me, I had to boot from the new BE and apply the patch cluster there. This was a pain, since the bundle includes the -36 kernel patch, which is known to be relatively disruptive: it takes two reboots to apply the entire patch cluster (it contains a new version of the kernel).

Thursday 27 September 2007

European Type6 USB Keyboard Circumflex Key Issue... Solved

I recently worked with Dermot Malone, the Responsible Engineer for this bug report.

The fact is that the described problem persists when running snv_70, as explained in the Description field of the bug report. But I must say I found something recently: until now, I used the default C locale. After reading this documentation (go to Unicode Locale: en_US.UTF-8 Support), I decided to try the en_US.UTF-8 locale, chosen directly from the dtlogin screen... and found that I can now have an English environment similar to the one I had before (using the C locale), while supporting accented and circumflex characters in applications which support this locale (for example the GNOME Terminal, or Mozilla Firefox). Hmm, I just find it a little bit curious that most accented characters (say 'é') work properly using the C locale, but that I need to set another locale to be able to use circumflex characters (say 'ô').

As a matter of interest, Dermot asked me why I didn't use the fr_FR.UTF-8 locale. Here is my answer:

Well, I am a French guy, but I systematically install English operating systems and use the default C locale. In IT, English is _the_ standard, and everything (manual pages, messages, format strings, etc.) is homogeneous this way. As a side note, I am sure not to encounter the problems found on RedHat Linux systems when the default locale is not properly supported by the OS sub-systems themselves (the RC scripts generally set LANG=C (or something like that) for 'grep', 'sed' and 'awk', for example), or by some third party products (such as the IBM TSM Backup Archive client).

Now, using the en_US.UTF-8 locale on Solaris and Solaris Express, I can have the best of both worlds: a fully functional (and supported) English environment, and the ability to use extended characters specific to my language.

Friday 21 September 2007

64-bit System (Kernel), But 32-bit Binaries?

Once again, the well-known and great developer Casper Dik from Sun Microsystems gives us an interesting answer about the fact that there is no 32-bit emulation in the Solaris OS when running a 64-bit kernel, and about why most installed binaries seem to be compiled as 32-bit executables even if you boot a 64-bit platform, as in the following example:

# isainfo -kv
64-bit amd64 kernel modules
# file /usr/bin/ls
/usr/bin/ls:    ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, \
 not stripped, no debugging information available

Here is the explanation, shown as a little Q&A:

Q: Once I boot it in 64bit mode, i'd have to run emulation libraries to run 32bit bins right?

A: No; you run the exact same binaries and libraries under 32 and 64 bit. It's not emulation; it's basically two syscall entry tables one for 32 bit and one for 64 bit, mostly sharing the same code except where pointer sizes matter.

Q: Since it will be a very light server (only bare binaries to run my req.s) they probably end up most likely all being 64bit?

A: No, most binaries are 32 bit; only those that need to be 64 bit are 64 bit.

Q: so you are saying that i can run both 32bit and 64bit code simultaneously, natively with the solaris kernel? That's pretty damn cool

A: Correct. There are several reasons for an OS which generally comes in binary distributions to do so:

  • maintain complete binary compatibility with old applications (and yes, closed source does matter, even to a lot of Linux customers).
  • allows a single distribution to work on both 32 and 64 bit systems of the same architecture.

As a last note, it is interesting to see that a full install of SXCE snv_70 provides 1188 32-bit files and only 82 64-bit files in the following paths: /bin, /usr/bin, /usr/openwin/bin, /usr/ucb, /usr/sfw/bin, and /opt/SUNWspro/bin. The 64-bit files are mostly stored in an architecture-specific subdirectory (amd64 in this case), while some of them are provided in both 32-bit and 64-bit incarnations:

# file /usr/bin/pfiles /usr/bin/amd64/pfiles
/usr/bin/pfiles:        ELF 32-bit LSB executable 80386 Version 1 [FPU], \
 dynamically linked, not stripped, no debugging information available
/usr/bin/amd64/pfiles:  ELF 64-bit LSB executable AMD64 Version 1 [SSE FXSR FPU], \
 dynamically linked, not stripped, no debugging information available
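
A count like the one above can be approximated without file(1) by reading the ELF header directly: the fifth byte of an ELF file (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects. The following Python sketch is only an illustration under that assumption. The function names are hypothetical, the scan is not recursive, and it is not necessarily how the figures quoted in this entry were obtained.

```python
import os
from collections import Counter


def elf_class(path):
    """Return 32 or 64 for an ELF file, or None for anything else."""
    try:
        with open(path, "rb") as handle:
            header = handle.read(5)
    except OSError:
        return None  # unreadable file: treat it as "not ELF"
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # EI_CLASS: 1 means ELFCLASS32, 2 means ELFCLASS64.
    return {1: 32, 2: 64}.get(header[4])


def count_by_class(directories):
    """Count regular files per ELF class in each directory (top level only)."""
    counts = Counter()
    for directory in directories:
        if not os.path.isdir(directory):
            continue
        for entry in os.scandir(directory):
            if entry.is_file():
                counts[elf_class(entry.path)] += 1
    return counts


if __name__ == "__main__":
    # Non-ELF files (scripts, for example) are counted under the None key.
    print(count_by_class(["/usr/bin"]))
```

To include the 64-bit incarnations, the architecture subdirectory (such as /usr/bin/amd64) has to be listed explicitly, since subdirectories are not descended into.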

Update #1: 2008-04-07

Be sure to consult the excellent 64-bit FAQ for Solaris.

Wednesday 12 September 2007

The BrandZ Framework Enhancements

Although not very versed in BrandZ technologies, I tried the lx brand in the past, with success. But the fact is that the newly supported lx brand in the Solaris 10 8/07 release will not really help us, since it can only provide a Linux 2.4 kernel working environment, which is now a bit outdated (yes, I know the most recent Linux distributions will be supported via the Xen open-source hypervisor). Interestingly, two recent pieces of news will certainly (hopefully?) change this in the future.

The first one is the availability of experimental Linux 2.6 support in OpenSolaris. The work was done by a summer intern (!) in the Solaris kernel group, Evan Hoke. So much hard work done in so little time is really impressive. I just hope that the community will follow and enhance the proposed experimental branded zone extension.

The second piece of news is really exciting too: the upcoming availability of a SPARC-only brand designed to emulate the Solaris 8 environment. Although not yet ready for use, this is really, really amazing, since it enables a smoother upgrade to Solaris 10: all the great features provided by this version (DTrace, ZFS, FMA, SMF, etc.) can be used in the global zone while running a Solaris 8 branded zone... on top of the most recent hardware (say Sun Fire V215, T1000, T2000, etc.). Wow. I just can't wait to try this out, in particular since it will be available for the latest supported Solaris 10 8/07 release.

Wednesday 5 September 2007

Solaris 10 8/07 Release

The long-awaited Solaris 10 8/07 release, a.k.a. Update 4, is now available and ready to download. This release was postponed many times to allow better testing and validation of advanced features such as the new Deferred-Activation Patching, or the support for the upcoming next-generation AMD quad-core processor. Interestingly, this release reintroduces the Documentation DVD.

As always, you must consult the excellent What's New in the Solaris 10 8/07 Release web pages. The following points are those I prefer:

Last, it is interesting to mention that the release number corresponds to the build date of the release, not the general availability (GA) date. It is the case for the 8/07 release, made public at the beginning of September 2007. And it was the case in the past with the 11/06 release, which was made publicly available at the beginning of December although it was built in mid-November 2006:

# cat /etc/release
                       Solaris 10 11/06 s10s_u3wos_10 SPARC
           Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 14 November 2006

# cat /etc/release
                       Solaris 10 8/07 s10x_u4wos_12b X86
           Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 16 August 2007
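
The relationship between the release tag and the build date can be checked mechanically. Here is a small Python sketch that assumes /etc/release always carries a "Solaris <version> <month>/<year>" tag and an English "Assembled <day> <month> <year>" line, as in the two samples above. The function name release_matches_build is hypothetical.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def release_matches_build(release_text):
    """Tell whether the M/YY release tag matches the 'Assembled' month and year.

    Returns None when the text does not look like the samples above.
    """
    tag = re.search(r"Solaris \d+ (\d{1,2})/(\d{2})\b", release_text)
    built = re.search(r"Assembled \d{1,2} (\w+) (\d{4})", release_text)
    if tag is None or built is None or built.group(1) not in MONTHS:
        return None
    build_month = MONTHS.index(built.group(1)) + 1
    build_year = int(built.group(2)) % 100
    return (int(tag.group(1)), int(tag.group(2))) == (build_month, build_year)


u3 = "Solaris 10 11/06 s10s_u3wos_10 SPARC\nAssembled 14 November 2006"
u4 = "Solaris 10 8/07 s10x_u4wos_12b X86\nAssembled 16 August 2007"
print(release_matches_build(u3), release_matches_build(u4))
```

Both samples match: the tag follows the assembly date, not the date the release was made public.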

Update #1: 2007-09-10

I forgot to add one particular item to my list of favorites: the "lx Branded Zones: Solaris Containers for Linux Applications" feature. I will explain why in an upcoming blog entry.

Update #2: 2007-09-11

Sadly, the impressive work currently being done in the OpenSolaris community (and already available in the Solaris Express Community Edition) on the performance of NFS with ZFS, particularly the boot-time problem, has not yet been back-ported to the official release tree. This includes the improvements built on sharemgr(1m) and the in-kernel sharetab(4) facility. I hope it will make the next Update.

Monday 3 September 2007

Why Set a Local authorized_keys File in a NFS Shared Environment

Why set the authorized_keys file to a local pathname on large UNIX environments, especially when NFS shares are used for home directories? Because this can address security problems.

First, remember that this special SSH file stores the public key of a remote account, allowing its owner to log in using asymmetric keys along with the corresponding passphrase, instead of the more classical password challenge. (Later on, this can also enable non-interactive logins through the use of an SSH agent.)

The default path for the authorized_keys file is in a subdirectory of the home directory. This means that when the home of a UNIX account is hosted on an NFS share, all servers in the same domain as the NFS resource have access to the very same authorized_keys file. This is a security concern: by allowing a key for one account on one server, you open this account on all servers in the same domain.

So, the first benefit of storing the authorized_keys file in a local name space on each server is that a key authorizes access to one given machine, and only one. The direct drawback is that there will be as many authorized_keys files as there are servers in a domain (if SSH access is needed on all servers). A side effect is that the path, mode and owner of the directory which hosts the authorized_keys file may be better managed and hardened than before (even if SSH already checks those things for sane defaults). This is of particular interest when managing thousands of servers in a heterogeneous UNIX environment, where Solaris, AIX, Linux and HP-UX do not have the same ownership on the same system paths (such as /var, for example).
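
OpenSSH exposes the location of this file through the AuthorizedKeysFile directive of sshd_config. Once the file lives in a local directory, the hardening mentioned above can be audited with a few lines of Python. This is a simplified sketch under stated assumptions: it only checks that the owner is trusted and that neither group nor others can write, which is a rough approximation of what SSH itself verifies. The function name is_hardened and the /var/ssh path in the usage note are hypothetical.

```python
import os
import stat


def is_hardened(path, trusted_uids=(0,)):
    """Return True if path is owned by a trusted uid and is not writable
    by group or others. Simplified check, not the exact SSH logic."""
    info = os.stat(path)
    owner_ok = info.st_uid in trusted_uids
    not_writable_by_others = not (info.st_mode & (stat.S_IWGRP | stat.S_IWOTH))
    return owner_ok and not_writable_by_others
```

For example, is_hardened("/var/ssh/alice", trusted_uids=(0, 1001)) would accept a directory owned by root or by uid 1001 with mode 0755 or stricter, and reject one that is group- or world-writable.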

Wednesday 22 August 2007

SunSpectrum Enterprise Service Plan Supports Solaris 10 Across Non-Sun x64/x86 Servers

Are you a big corporate Sun customer? Are you particularly interested in running the Solaris operating system on various heterogeneous x64/x86 platforms, including non-Sun servers (more and more hardware providers are certifying their servers to run the Solaris 10 OS these days)?

Are you already aware that HP can be a SPOC (Single Point Of Contact) for both their hardware and the support of the Red Hat Enterprise Linux system, and would you enjoy the very same mechanism for the Solaris platform?

Well, it is now possible... and a little more, in fact: it is quite the opposite. I mean, you can ask Sun to be the SPOC for running Solaris on both Sun and non-Sun systems. I strongly feel that having Sun as the very first entry point is a must, in particular since they know really well (today) what can be done with Solaris on x64/x86 platforms, given the growing amount of work currently being achieved through the OpenSolaris project. Not only are Solaris and its open source counterpart distributions becoming more and more relevant in the x64/x86 AMD/Intel world, they are also leading innovation on these platforms.

So, enjoy the not-so-recent announcement about the new SunSpectrum Enterprise Service Plan for enterprise customers.
