blog'o thnet


Sunday 9 March 2008

Update A Corrupted GRUB Boot Archive, Without SVM

Solaris 10 systems on the x86 architecture use the GNU GRand Unified Bootloader (GRUB), the boot loader responsible for loading a boot archive into the system's memory. The boot archive is a collection of critical files (kernel modules and configuration files) that are required to boot the Solaris OS. As stated in the Sun documentation:

These files are needed during system startup before the root file system is mounted. Two boot archives are maintained on a system:

  • The boot archive that is used to boot the Solaris OS on a system. This boot archive is sometimes called the primary boot archive.
  • The boot archive that is used for recovery when the primary boot archive is damaged. This boot archive starts the system without mounting the root file system. On the GRUB menu, this boot archive is called failsafe. The archive's essential purpose is to regenerate the primary boot archive, which is usually used to boot the system.

The Solaris OS generally keeps the boot archive properly synchronized on its own. Sometimes, though, the boot archive gets corrupted, for example when (bad) patches are applied or when the operating system crashes. In these cases, the boot archive must be regenerated. This is easily done by following the Sun documentation procedures x86: How to Boot the Failsafe Archive for Recovery Purposes and x86: How to Boot the Failsafe Archive to Forcibly Update a Corrupt Boot Archive. The main drawback arises when the root file system is encapsulated in an SVM mirror (RAID-1), since the md driver is not handled in failsafe mode. Please refer to this blog entry on this subject, if needed.

Monday 17 December 2007

A Process Can Not Be Killed Upon Hanging In sendfilev()

Here is a small but annoying bug I recently faced on a production system running Solaris 10 11/06: a process cannot be killed.

A process can neither catch nor ignore the KILL signal: its disposition is always the default action, which terminates the process. This is indeed the case for the offending process (PID 26632):

# psig 26632 | grep KILL
KILL    default

Nevertheless, the process does not terminate as expected:

# echo $$
2913
#
# kill -s KILL 26632
#
# dtrace -n 'fbt::sigtoproc:entry /pid == 2913/ {trace(arg2); trace(execname);}'
dtrace: description 'fbt::sigtoproc:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
 18  16387                  sigtoproc:entry                 9  tcsh
#
# ps -ef | awk '$2 ~ /26632/ {print $0}'
nonpriv 26632     1   0   Nov 13 ?         0:01 /usr/sbin/in.ftpd -a

This is not the correct behavior, since the process has pending signals that are never honored:

# pflags 26632
26632:  /usr/sbin/in.ftpd -a
       data model = _ILP32  flags = ORPHAN|MSACCT|MSFORK
       sigpend = 0x00006100,0x00000000
/1:    flags = 0
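The sigpend value is a bit mask: in its first word, bit n (counting from 0) stands for signal n+1. A minimal sketch to decode it by hand, assuming a POSIX shell whose arithmetic accepts hex constants (bash or dash, for instance); the function name decode_sigword is hypothetical, and it only handles one 32-bit word. Applied to 0x00006100, it yields signals 9, 14 and 15, which are SIGKILL, SIGALRM and SIGTERM on Solaris: our KILL is queued, but never delivered.

```shell
#!/bin/sh
# Hypothetical helper: decode one 32-bit word of a signal set as printed by
# pflags(1) ("sigpend = 0x00006100,0x00000000") into signal numbers.
# Bit n (from 0) set means signal n+1 is pending.
# Assumes a POSIX shell whose $(( )) accepts hex constants (bash, dash, ...).
decode_sigword() {
    word=$(( $1 ))          # hex input such as 0x00006100 is accepted
    sig=1
    out=""
    while [ "$word" -ne 0 ]; do
        if [ $(( word & 1 )) -eq 1 ]; then
            out="${out:+$out }$sig"   # append, space-separated
        fi
        word=$(( word >> 1 ))
        sig=$(( sig + 1 ))
    done
    echo "$out"
}

# First word of the sigpend mask reported for PID 26632:
decode_sigword 0x00006100     # prints: 9 14 15
```

Passing each number to kill -l turns it into a signal name, if you prefer names over numbers.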

Moreover, the process seems to be in an inconsistent state: it is now nearly impossible to trace it using truss, the proc tools, ...

# truss -alef -p 26632
truss: unanticipated system error: 26632
#
# pstack 26632
pstack: cannot examine 26632: unanticipated system error
#
# pfiles 26632
pfiles: unanticipated system error: 26632
#
# pldd 26632
pldd: cannot examine 26632: unanticipated system error

... or with DTrace:

# dtrace -n 'profile-1000hz /pid ==26632/ { @[stack()] = count() }'
dtrace: description 'profile-1000hz ' matched 1 probe
^C
#
# dtrace -n 'profile-1000hz /pid ==26632/ { @[ustack()] = count() }'
dtrace: description 'profile-1000hz ' matched 1 probe
^C

Here is some information gathered with mdb on the live kernel:

> ::status
debugging live kernel (64-bit) on socrate
operating system: 5.10 Generic_118833-36 (sun4u)
>
> ::showrev
Hostname: socrate
Release: 5.10
Kernel architecture: sun4u
Application architecture: sparcv9
Kernel version: SunOS 5.10 sun4u Generic_118833-36
Platform: SUNW,Sun-Fire-V490
>
> ::pgrep in.ftpd
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R  26632      1    261    261  30501 0x5a024b00 00000600144b0c28 in.ftpd
>
> 00000600144b0c28::kill
mdb: command is not supported by current target
>
> 00000600144b0c28::thread
            ADDR    STATE    FLG PFLG SFLG   PRI  EPRI PIL             INTR
00000600144b0c28 inval/2000 157e 5778    0     0     0   6                2
>
> 00000600144b0c28::walk thread | ::findstack
stack pointer for thread 30002aba360: 2a1022e7ec1
[ 000002a1022e7ec1 cv_wait+0x38() ]
  000002a1022e7f71 page_lock_es+0x204()
  000002a1022e8021 pvn_vplist_dirty+0x2a4()
  000002a1022e8101 nfs_putpages+0x124()
  000002a1022e81c1 nfs3_putpage+0xcc()
  000002a1022e8271 fop_putpage+0x1c()
  000002a1022e8321 nfs_purge_caches+0xe4()
  000002a1022e83d1 nfs_attr_cache+0x20c()
  000002a1022e8481 nfs3_getattr_otw+0x1b8()
  000002a1022e85f1 nfs3_validate_caches+0x4c()
  000002a1022e8731 nfs3_getpage+0xa4()
  000002a1022e8861 fop_getpage+0x44()
  000002a1022e8931 segmap_getmapflt+0x588()
  000002a1022e8a41 snf_segmap+0x13c()
  000002a1022e8bc1 sosendfile64+0x298()
  000002a1022e8d21 sendvec64+0xf8()
  000002a1022e8f61 sendfilev+0x178()
  000002a1022e92e1 syscall_trap32+0xcc()

The stack shows the thread blocked in cv_wait(), deep under the sendfilev() system call and the NFS client paging code: this appears to match a known bug (you must have a registered Sun support customer account to view the second document).

The good news is that the root cause is already fixed in the development branch of the operating system. The bad news, after opening a case with the Sun support team, is that no patch will be released in the near future: the fix is currently planned for the next major Solaris 10 Update (Update 5), scheduled for summer 2008. However, the fix is already available through the Solaris Express program, if that is applicable to your environment. As a last note, if the unkillable process holds a resource needed by another program, there seems to be no option other than scheduling a reboot of the system.

Tuesday 16 October 2007

Error While Patching A New Boot Environment

After creating a new boot environment (BE) named beastie to upgrade a system running Solaris 8 to Solaris 10 11/06 (as I have done many times in the past without a hiccup), I ran into a problem when I tried to apply the appropriate Recommended patch cluster to the new BE, with this message:

# luupgrade -t -n beastie -s /var/tmp/10_Recommended

Validating the contents of the media .
The media contains 76 software patches that can be added.
All 76 patches will be added because you did not specify any specific
patches to add.
Mounting the BE .
ERROR: The boot environment  supports non-global
zones.The current boot environment does not support non-global zones.
Releases prior to Solaris 10 cannot be used to maintain Solaris 10 and
later releases that include support for non-global zones. You may only
execute the specified operation on a system with Solaris 10 (or later)
installed.

I could not find any reference to a known bug or problem after searching SunSolve and Sun Support, and Googling. Has anyone already seen this error and solved it The Right Way? As for me, I had to boot from the new BE and apply the patch cluster there. This was a pain because the bundle includes the -36 kernel patch, which is known to be relatively disruptive: applying the entire patch cluster requires two reboots (the patch delivers a new kernel version).

Wednesday 5 September 2007

Solaris 10 8/07 Release

The long-awaited Solaris 10 8/07 release, a.k.a. Update 4, is now available for download. This release was postponed several times due to concerns about better testing and validating advanced features such as the new Deferred-Activation Patching, or the support for AMD's upcoming next-generation quad-core processor. Interestingly, this release reintroduces the Documentation DVD.

As always, be sure to read the excellent What's New in the Solaris 10 8/07 Release web pages. Here are the points I prefer:

Last, note that the release number corresponds to the build date of the release, not the general availability (GA) date. This is the case for the 8/07 release, made public at the beginning of September 2007, and it was already the case with the 11/06 release, made publicly available at the beginning of December although it was built in mid-November 2006:

# cat /etc/release
                       Solaris 10 11/06 s10s_u3wos_10 SPARC
           Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 14 November 2006

# cat /etc/release
                       Solaris 10 8/07 s10x_u4wos_12b X86
           Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 16 August 2007
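To compare the release tag and the build date at a glance, a small awk sketch can pull both out of /etc/release. It assumes the Solaris 10 layout shown above (line 1 is "Solaris 10 <m/yy> <build> <arch>", and one line reads "Assembled <day> <month> <year>"); the function name release_vs_build is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the release tag and the assembly (build) date
# found in a Solaris 10 release file (default: /etc/release).
# Assumes the Solaris 10 layout: line 1 is "Solaris 10 <m/yy> <build> <arch>",
# and one line reads "Assembled <day> <month> <year>".
release_vs_build() {
    awk '
        NR == 1           { rel = $1 " " $2 " " $3 }     # e.g. "Solaris 10 11/06"
        $1 == "Assembled" { built = $2 " " $3 " " $4 }   # e.g. "14 November 2006"
        END               { print rel ": assembled " built }
    ' "${1:-/etc/release}"
}

# Only run it where a release file actually exists.
if [ -r /etc/release ]; then
    release_vs_build
fi
```

On the 11/06 system above, this prints "Solaris 10 11/06: assembled 14 November 2006".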

Update #1: 2007-09-10

I forgot to add one particular point to my list of preferred features: lx Branded Zones (Solaris Containers for Linux Applications). I will explain why in an upcoming blog entry.

Update #2: 2007-09-11

Sadly, the impressive work currently being done in the OpenSolaris community (and already available in Solaris Express Community Edition) on the performance of NFS sharing with ZFS, particularly the boot time problem, has not yet been back-ported to the official release tree; this includes the enhancements to sharemgr(1M) and the in-kernel sharetab(4) facility. I hope they will make it into the next Update.

Wednesday 22 August 2007

SunSpectrum Enterprise Service Plan Supports Solaris 10 Across Non-Sun x64/x86 Servers

Are you a big corporate Sun customer? Are you particularly interested in running the Solaris operating system on various heterogeneous x64/x86 platforms, including non-Sun servers (more and more hardware vendors are certifying their servers to run the Solaris 10 OS these days)?

Are you already aware that H-P can act as a SPOC (Single Point Of Contact) for both its hardware and Red Hat Enterprise Linux support, and would you enjoy the very same mechanism for the Solaris platform?

Well, it is now possible... and a little more, in fact, since it is quite the opposite: you can ask Sun to be the SPOC for running Solaris on both Sun and non-Sun systems. I strongly feel that having Sun as the very first point of entry is a must, in particular since they know really well (today) what can be done with Solaris on x64/x86 platforms, given the growing amount of work currently being done through the OpenSolaris project. Not only are Solaris (and its open source counterpart distributions) becoming more and more relevant in the x64/x86 AMD/Intel world, they are also leading innovation on these platforms.

So, enjoy the not-so-recent announcement about the new SunSpectrum Enterprise Service Plan for enterprise customers.

Monday 23 July 2007

Tracking a Performance Problem with DTrace

Recently, I faced a performance problem, a high load average, on a Nagios server hosted on a Solaris 10 platform. To identify the origin of the problem, I decided to use DTrace, for the very first time. DTrace is a comprehensive dynamic tracing framework that provides a powerful infrastructure to concisely answer arbitrary questions about the behavior of the operating system and user programs.

In the following case, I will use some one-liners and DTrace scripts taken or derived from the excellent DTraceToolkit by Brendan Gregg, co-author of Solaris Performance and Tools, one of the two volumes of the second edition of Solaris Internals. Many, many thanks to him for his great and impressive work.

Back on my Nagios server, we can see a steady load average of around 4 to 5 on a uniprocessor system:

# uptime
 11:48am  up 85 day(s), 23:27,  2 users,  load average: 4.53, 4.44, 4.48
# 
# vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr m0 m1 m3 m4   in   sy   cs us sy id
 3 0 0 9286536 1484928 1029 8643 42 156 161 0 203 1 2 0 0 676 78530 949 47 44 8
 1 0 0 9263880 1467936 1123 10536 0 0 0 0 0 0  0  0  0  673 37455 1241 52 48 0
 1 0 0 9262232 1455472 1097 10656 0 0 0 0 0 0  0  0  0  771 35824 1496 52 48 0
 0 0 0 9302168 1484760 1236 11513 0 0 0 0 0 0  0  0  0  592 32530 864 50 50  0
 3 0 0 9260872 1458072 967 9788 0 1616 1616 0 0 0 0 0 0 1141 30488 1280 51 48 1
 3 0 0 9224896 1434208 1229 11008 0 0 0 0 0 0  0  0  0  638 36756 1198 53 47 0
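To put numbers on the cpu columns, a small awk filter can average them. This is a sketch under two assumptions: the first data line of vmstat output reports averages since boot and is skipped, and us, sy and id are always the last three fields whatever the number of disk columns. The function name cpu_avg is hypothetical. Fed the lines above, it reports us=51.6 sy=48.2 id=0.2 over 5 samples.

```shell
#!/bin/sh
# Hypothetical helper: average the us/sy/id CPU columns of vmstat output
# read on stdin. The first data line holds averages since boot, so it is
# skipped; us, sy and id are taken as the last three fields of each line,
# which works whatever the number of disk columns.
cpu_avg() {
    awk '
        $1 ~ /^[0-9]+$/ {                 # data lines start with the run-queue (r) count
            if (++n == 1) next            # skip the since-boot summary line
            us += $(NF-2); sy += $(NF-1); id += $NF; k++
        }
        END {
            if (k) printf "us=%.1f sy=%.1f id=%.1f over %d samples\n", us/k, sy/k, id/k, k
        }
    '
}

# Usage on the live system (6 one-second samples):
#   vmstat 1 6 | cpu_avg
```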

So, a lot of CPU time is spent in system mode (the sy column stays close to 50%), and the processor is almost never idle. Can we find out which program(s) consume most of this time?

# dtrace -n 'syscall:::entry { @num[execname] = count(); }'
dtrace: description 'syscall:::entry ' matched 226 probes
^C

  fmd                                                               1
  inetd                                                             1
  svc.configd                                                       1
  sshd                                                              8
  svc.startd                                                       12
  sendmail                                                         20
  dced                                                             30
  picld                                                            34
  xntpd                                                            50
  httpd                                                            68
  nscd                                                            236
  dtrace                                                          454
  init                                                           2265
  nagios_xng_1.2                                                 5170
  sh                                                             9645
  mysqld                                                        21901
  php                                                          250039
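Raw counts are readable enough here, but percentages make the picture clearer. A minimal sketch, assuming each aggregation line is a single-word key followed by an integer (true for execname and probefunc keys, not for multi-key aggregations); the function name agg_share is hypothetical. Applied to the output above, it gives php about 86.24% of all system calls.

```shell
#!/bin/sh
# Hypothetical helper: turn the "key count" lines printed by a DTrace
# count() aggregation (read on stdin) into percentages of the total,
# largest first. Lines that are not exactly "<word> <integer>" (the dtrace
# banner, ^C, blank lines) are ignored.
agg_share() {
    awk '
        NF == 2 && $2 ~ /^[0-9]+$/ { name[++n] = $1; cnt[n] = $2; total += $2 }
        END {
            if (total == 0) exit          # nothing to report
            for (i = 1; i <= n; i++)
                printf "%6.2f%%  %s\n", 100 * cnt[i] / total, name[i]
        }
    ' | sort -rn
}

# Usage (stop dtrace with ^C; the aggregation is written to agg.out):
#   dtrace -n 'syscall:::entry { @num[execname] = count(); }' > agg.out
#   agg_share < agg.out
```

The same filter works unchanged on the per-system-call (probefunc) aggregation, since its keys are single words too.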

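Well, php is the clear winner here, which is not abnormal since there are a lot of servers to monitor, including large Sybase data servers. Now, let's get the most frequently called system calls (note that this one-liner counts them system-wide; adding a /execname == "php"/ predicate would restrict it to php processes):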

# dtrace -n 'syscall:::entry { @num[probefunc] = count(); }'
dtrace: description 'syscall:::entry ' matched 226 probes
^C

[...]
  pollsys                                                         951
  llseek                                                         1263
  getcwd                                                         1371
  memcntl                                                        1667
  write                                                          2679
  lseek                                                          3552
  resolvepath                                                    3745
  munmap                                                         3807
  sigaction                                                      3917
  gtime                                                          4228
  fcntl                                                          4758
  close                                                          5185
  lwp_sigmask                                                    5279
  open                                                           5311
  mmap                                                           7933
  brk                                                           34003
  stat                                                          43151
  read                                                          86493

OK, read is by far the most frequently called system call. Using the readid.d DTrace script, we will now check whether the read calls always come from the same php process. Then, with the readfile.d script, we will get the exact names of the files read by the php processes:

# ./readid.d php 5s
Sampling for 5s ... Please wait.

PROGRAM                PID      COUNT
[...]
php                  26333       1351
php                  26345       1353
php                  26408       1353
php                  26448       1353
php                  26340       1353
php                  26440       1353
php                  26432       1353
php                  26424       1353
php                  26436       2376
php                  26372       2376
php                  26357       2376
php                  26353       2376
php                  26379       2376
php                  26369       2378
php                  26328       3136
#
# ./readfile.d php 5s
Sampling for 5s ... Please wait.

FILE NAME                                                                 COUNT
[...]
/data/p2/sysdoc/soft/php_5.0.3_3/php.ini                                    231
/data/sybase_client/V12.5.1_32bits_SunOS/charsets/iso_1/charset.loc         330
/data/sybase_client/V12.5.1_32bits_SunOS/config/objectid.dat                374
/data/sybase_client/V12.5.1_32bits_SunOS/locales/message/us_english/cslib.loc
                                                                            429
/data/sybase_client/V12.5.1_32bits_SunOS/charsets/utf8/iso_1.ctb            660
/data/sybase_client/V12.5.1_32bits_SunOS/locales/message/us_english/tcllib.loc
                                                                            858
/data/sybase_client/V12.5.1_32bits_SunOS/locales/locales.dat               1056
/data/sybase_client/V12.5.1_32bits_SunOS/locales/message/us_english/ctlib.loc
                                                                           2244
/data/sybase_client/V12.5.1_32bits_SunOS/charsets/utf8/charset.loc         5473
/tools/list/sybase/interfaces                                             33116
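The readfile.d report covers a 5-second sample, so dividing by the sampling period turns counts into rates. A sketch of an awk filter that does this while rejoining the entries whose count was wrapped onto the following line; it assumes the layout shown above (file names start with "/", the count is the last field) and the function name reads_per_second is hypothetical. With a period of 5, it puts the interfaces file at 6623.2 reads per second.

```shell
#!/bin/sh
# Hypothetical helper: convert readfile.d-style "FILE COUNT" lines (stdin)
# into reads per second, given the sampling period in seconds as $1.
# Long file names may have their count wrapped onto the next line; such
# pairs are rejoined. Output is sorted by rate, highest first.
reads_per_second() {
    [ "${1:-0}" -gt 0 ] 2>/dev/null || { echo "usage: reads_per_second SECONDS" >&2; return 1; }
    awk -v secs="$1" '
        /^\// {                          # a line starting with "/" begins a file entry
            if (NF == 2) { printf "%8.1f/s  %s\n", $2 / secs, $1; next }
            pending = $1; next           # long name: its count is on the next line
        }
        pending != "" && NF == 1 && $1 ~ /^[0-9]+$/ {
            printf "%8.1f/s  %s\n", $1 / secs, pending; pending = ""
        }
    ' | sort -rn
}

# Usage, for a 5-second sample saved to readfile.out:
#   reads_per_second 5 < readfile.out
```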

So, we saw that multiple php processes use the read system call, each with a relatively similar number of reads. The second output is more interesting, as it shows that the file read most often is the interfaces file, the Sybase RDBMS configuration file that holds the client-server connection definitions.

Last, the opensnoop.d DTrace script shows that part of our Nagios configuration needs to be redone, since the php processes try to open non-existent paths:

# sh ./opensnoop.d -x -n php
  UID    PID COMM          FD PATH
  620   8483 php           -1 /var/ld/ld.config
  620   8483 php           -1 /bin//php-cli.ini
  620   8483 php           -1 /soft/sun_free/php/php-cli.ini

  620   8483 php           -1 /bin//php.ini
  620   8483 php           -1 /usr/nagios/site/my_syb_client/OCS-12_5/config/ocs.cfg
  620   8483 php           -1 //getinfo.php
  620   8483 php           -1 /soft/sun_free/php/lib/php/getinfo.php
  620   8483 php           -1 /soft/sun_free/adodb/getinfo.php
  620   8483 php           -1 //recuperation_seuil.php
  620   8483 php           -1 /soft/sun_free/php/lib/php/recuperation_seuil.php
  620   8483 php           -1 /soft/sun_free/adodb/recuperation_seuil.php
  620   8483 php           -1 //mysql_connect.php
  620   8483 php           -1 /soft/sun_free/php/lib/php/mysql_connect.php
  620   8483 php           -1 /soft/sun_free/adodb/mysql_connect.php
  620   8483 php           -1 //sybase_check_appli.php
  620   8483 php           -1 /soft/sun_free/php/lib/php/sybase_check_appli.php
  620   8483 php           -1 /soft/sun_free/adodb/sybase_check_appli.php
  620   8483 php           -1 //sybase_check_system.php
  620   8483 php           -1 /soft/sun_free/php/lib/php/sybase_check_system.php
  620   8483 php           -1 /soft/sun_free/adodb/sybase_check_system.php
  620   8483 php           -1 //mysql_connect.php
  620   8483 php           -1 /soft/sun_free/php/lib/php/mysql_connect.php
  620   8483 php           -1 /soft/sun_free/adodb/mysql_connect.php
  620   8487 php           -1 /var/ld/ld.config
  620   8487 php           -1 /bin//php-cli.ini
  620   8487 php           -1 /soft/sun_free/php/php-cli.ini
  620   8487 php           -1 /bin//php.ini
[...]
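With many php runs, the same bad paths repeat over and over; tallying them shows which search paths to fix first. A sketch assuming the UID PID COMM FD PATH layout shown above, paths without spaces, and FD -1 marking a failed open; the function name failed_opens is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: tally failed opens (FD column == -1) from opensnoop
# output on stdin, most frequent path first.
# Assumes the column layout UID PID COMM FD PATH and paths without spaces.
failed_opens() {
    awk '$4 == -1 { n[$5]++ } END { for (p in n) print n[p], p }' | sort -rn
}

# Usage, on a saved capture:
#   failed_opens < opensnoop.out
```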

Well, DTrace helped us see where the excessive CPU consumption came from, and told us the root causes of this behavior: connections to the Sybase RDBMS, and bad search paths. I can now go and see the developers and give them this useful information to debug their Nagios agents.

Thursday 10 May 2007

Managing System and Process Core Dump Generation

Core Dump Management on the Solaris OS and Using the Solaris coreadm utility to control core file generation are two useful recent articles on how to enable and configure process and system core dumps on a Solaris system.

You will learn what the SIGSEGV and SIGBUS signals are, and the role they play in core generation. You will see how to easily alter the process and system configuration files (coreadm.conf and dumpadm.conf, respectively) using the appropriate system commands (coreadm(1M) and dumpadm(1M), respectively). Along the way, you will learn some basics about extracting and interpreting the content of core files. You will also find tips on how to deliberately generate a system dump on both UltraSPARC and x86 platforms. Last, Matty shows that you can even configure process core dumps to be logged through the syslog facility. Very nice!

Sunday 6 May 2007

Using LDAP Profiles to Store Client Information

One very interesting feature of the LDAP client implementation found on the Solaris platform is the ability to define most of a client's characteristics in a profile, which can itself be stored in an LDAP directory.

This way, you can easily deploy and modify common settings for many clients. You can even control how long clients keep these settings before refreshing them, using a Time To Live parameter (similar to the TTL found in DNS). This is handy when pushing new or updated settings.

For example, you can easily initialize a Solaris system as an LDAP client with all of the relevant characteristics stored in the workstation profile:

# ldapclient init \
 -a profileName=workstation \
 -a proxyDN=cn=proxyagent,ou=profile,dc=example,dc=com \
 -a proxyPassword=proxypassword \
 -a domainName=example.com \
 xx.xx.xx.xx

Tuesday 3 April 2007

Notes About Interesting ps(1b) Behaviour

I recently encountered a strange behavior with the /usr/ucb/ps command. Normally, the COMMAND column of ps(1B) output is truncated to the width of a default terminal. Adding one w flag widens the output to 132 columns. Adding a second w flag removes truncation entirely: ps then reads the complete argument list from the /proc/${PID}/as special file, which represents the address space of the corresponding process. The restriction that applies here is the permission to read this file, which is generally granted to the owner of the running process only.

Interestingly, at least on Solaris 8 and 9, you can print the untruncated argument list, even without the right to read the address space special file, if the SUNWscpux package is installed. This package does not seem to be delivered with Solaris 10.

As I remember, this package is a prerequisite on 64-bit platforms, without which you will see error messages such as Data Type Too Large, but it seems to change the behavior of ps in a very different way than I expected.

I don't know what to think about this right now. Any thoughts?

Thursday 22 March 2007

Patching an x86 Miniroot Image for the Solaris OS

Generally speaking, BigAdmin is a great and valuable resource for Sun system administrators. Here is an awesome article describing how to patch (update) the miniroot, the boot image used during an installation or system upgrade, on x86-based Solaris platforms.

At work, we ran into precisely such a case: a bug between Solaris 10 6/06 and the provided nVidia driver prevented us from JumpStarting a Sun Fire X4100 M2 server. The support team said we could apply specific patches, already included in Solaris 10 11/06 at that time. Because we didn't really know the exact procedure for updating the miniroot accordingly, and because these machines had to be provisioned very quickly, we didn't investigate that route much (we ended up installing them from DVD-ROM, in the server room). Now, after reading this article, we will certainly take the time to do so... if we can figure out how to get the proper bundle of patches to fix our bug.
