blog'o thnet

To content | To menu | To search

Sunday 17 April 2011

Replacing A Failed Drive In A SVM Configuration

Here is a roughly step-by-step procedure in case of a failed drive which is part of a Solaris Volume Manager mirror configuration (RAID-1), and how to replace it while the system is up and running.

Here are the kind of messages reported by the operating system:

# grep md_mirror /var/adm/messages
/var/adm/messages:Mar 30 15:41:36 beastie md_mirror: [ID 104909 kern.warning] WARNING: md: d17: /dev/dsk/c1t0d0s7 needs maintenance
/var/adm/messages:Mar 30 15:41:36 beastie md_mirror: [ID 104909 kern.warning] WARNING: md: d11: /dev/dsk/c1t0d0s1 needs maintenance
/var/adm/messages:Mar 30 15:41:36 beastie md_mirror: [ID 976326 kern.warning] WARNING: md d5: open error on /dev/dsk/c1t0d0s5
/var/adm/messages:Mar 30 15:41:36 beastie md_mirror: [ID 976326 kern.warning] WARNING: md d1: open error on /dev/dsk/c1t0d0s1
/var/adm/messages:Mar 30 15:41:36 beastie md_mirror: [ID 104909 kern.warning] WARNING: md: d10: /dev/dsk/c1t0d0s0 needs maintenance
/var/adm/messages:Mar 30 15:41:36 beastie md_mirror: [ID 104909 kern.warning] WARNING: md: d13: /dev/dsk/c1t0d0s3 needs maintenance
/var/adm/messages:Mar 30 15:41:37 beastie md_mirror: [ID 976326 kern.warning] WARNING: md d100: open error on /dev/dsk/c1t0d0s0
/var/adm/messages:Mar 30 15:41:37 beastie md_mirror: [ID 104909 kern.warning] WARNING: md: d14: /dev/dsk/c1t0d0s4 needs maintenance
/var/adm/messages:Mar 30 15:41:37 beastie md_mirror: [ID 976326 kern.warning] WARNING: md d103: open error on /dev/dsk/c1t0d0s3
/var/adm/messages:Mar 30 15:41:37 beastie md_mirror: [ID 976326 kern.warning] WARNING: md d104: open error on /dev/dsk/c1t0d0s4

Figure out the SVM configuration layout:

# metastat -c
d78              p  300MB d7
d72              p 1018MB d7
d76              p  100MB d7
d75              p  500MB d7
d74              p   50GB d7
d73              p  200MB d7
d77              p  256MB d7
d71              p  250MB d7
    d7           m  119GB d17 (maint) d27
        d17      s  119GB c1t0d0s7 (maint)
        d27      s  119GB c1t1d0s7
d104             m  2.4GB d24 d14 (maint)
    d24          s  2.4GB c1t1d0s4
    d14          s  2.4GB c1t0d0s4 (maint)
d103             m  2.0GB d23 d13 (maint)
    d23          s  2.0GB c1t1d0s3
    d13          s  2.0GB c1t0d0s3 (maint)
d100             m  4.9GB d20 d10 (maint)
    d20          s  4.9GB c1t1d0s0
    d10          s  4.9GB c1t0d0s0 (maint)
d5               m  4.0GB d15 (unavail) d25
    d15          s  4.0GB c1t0d0s5 (-)
    d25          s  4.0GB c1t1d0s5
d1               m  3.9GB d11 (maint) d21
    d11          s  3.9GB c1t0d0s1 (maint)
    d21          s  3.9GB c1t1d0s1

As we can see, the failed disk is reported as drive not to be available anymore:

# echo | format
[...]
AVAILABLE DISK SELECTIONS:
       0. c1t0d0 
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e012ab94d1,0
       1. c1t1d0 
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e012a66aa1,0
[...]

Find more information about the failed drive, such as type of errors, serial number, WWNN of the drive, etc.:

# cfgadm -alv c1
Ap_Id                          Receptacle   Occupant     Condition  Information
When         Type         Busy     Phys_Id
c1                             connected    configured   unknown
unavailable  fc-private   n        /devices/pci@9,600000/SUNW,qlc@2/fp@0,0:fc
c1::500000e012a66aa1           connected    configured   unknown    FUJITSU MAX3147FCSUN146G
unavailable  disk         y        /devices/pci@9,600000/SUNW,qlc@2/fp@0,0:fc::500000e012a66aa1
c1::500000e012ab94d1           connected    configured   failed     FUJITSU MAX3147FCSUN146G
unavailable  disk         y        /devices/pci@9,600000/SUNW,qlc@2/fp@0,0:fc::500000e012ab94d1
# iostat -En
[...]
c1t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: FUJITSU  Product: MAX3147FCSUN146G Revision: 1103 Serial No: 0634G021R2
Size: 146.81GB <146810536448 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
c1t0d0           Soft Errors: 0 Hard Errors: 1 Transport Errors: 73
Vendor: FUJITSU  Product: MAX3147FCSUN146G Revision: 1103 Serial No: 0634G023LB
Size: 146.81GB <146810536448 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
[...]

Well, first clear the metadb configuration by removing references to the bad disk drive:

# metadb
    flags           first blk       block count
  Wm  p  l          16            8192         /dev/dsk/c1t0d0s6
  W   p  l          8208          8192         /dev/dsk/c1t0d0s6
  W   p  l          16400         8192         /dev/dsk/c1t0d0s6
 a    p  luo        16            8192         /dev/dsk/c1t1d0s6
 a    p  luo        8208          8192         /dev/dsk/c1t1d0s6
 a    p  luo        16400         8192         /dev/dsk/c1t1d0s6
# metadb -d c1t0d0s6
# metadb
    flags           first blk       block count
 a    p  luo        16            8192         /dev/dsk/c1t1d0s6
 a    p  luo        8208          8192         /dev/dsk/c1t1d0s6
 a    p  luo        16400         8192         /dev/dsk/c1t1d0s6

Since the disk is completely gone, the proper way to remove a FC drive didn't work as expected:

# luxadm remove_device /dev/rdsk/c1t0d0s2

 WARNING!!! Please ensure that no filesystems are mounted on these device(s).
 All data on these devices should have been backed up.

 Error: SCSI failure. - /dev/rdsk/c1t0d0s2.

So, let's go by physically replacing the failed drive. Here is the output of the hardware events on the system's console:

# dmesg
[...]
Mar 31 18:06:17 beastie picld[152]: [ID 222282 daemon.error] Fault detected: DISK0
Mar 31 18:06:18 beastie qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Mar 31 18:06:18 beastie qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Mar 31 18:06:18 beastie fctl: [ID 517869 kern.warning] WARNING: fp(3)::fp_plogi_intr: fp 1 pd ef
Mar 31 18:06:19 beastie scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e012a66aa1,0 (ssd0):
Mar 31 18:06:19 beastie        Error for Command: write(10)               Error Level: Retryable
Mar 31 18:06:19 beastie scsi: [ID 107833 kern.notice]  Requested Block: 37369856                  Error Block: 37369856
Mar 31 18:06:19 beastie scsi: [ID 107833 kern.notice]  Vendor: FUJITSU                            Serial Number: 0634G021R2
Mar 31 18:06:19 beastie scsi: [ID 107833 kern.notice]  Sense Key: Unit Attention
Mar 31 18:06:19 beastie scsi: [ID 107833 kern.notice]  ASC: 0x29 (bus device reset message occurred), ASCQ: 0x3, FRU: 0x0
Mar 31 18:06:37 beastie scsi: [ID 799468 kern.info] ssd144 at fp3: name w500000e0125c4531,0, bus address ef
Mar 31 18:06:37 beastie genunix: [ID 936769 kern.info] ssd144 is /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e0125c4531,0
Mar 31 18:06:37 beastie genunix: [ID 408114 kern.info] /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e0125c4531,0 (ssd144) online
Mar 31 18:06:52 beastie scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e012ab94d1,0 (ssd3):
Mar 31 18:06:52 beastie        drive offline
[...]
Mar 31 18:07:31 beastie picld[152]: [ID 691918 daemon.error] FSP_GEN_FAULT_LED has turned ON
Mar 31 18:07:43 beastie picld[152]: [ID 861866 daemon.error] Notice: DISK0 okay
Mar 31 18:07:44 beastie picld[152]: [ID 114988 daemon.error] FSP_GEN_FAULT_LED has turned OFF
[...]

If necessary (if not done automatically), recreate and eventually clean the public interface from the /dev subtree, and verify the new drive is properly managed by the operating system:

# devfsadm -Cv
[...]
# echo | format
AVAILABLE DISK SELECTIONS:
       0. c1t0d0 
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e0125c4531,0
       1. c1t1d0 
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e012a66aa1,0
[...]

So, now create a proper VTOC on the new disk, and propagate the metadb configuration on it as before:

# prtvtoc /dev/rdsk/c1t1d0s2 | fmthard -s - /dev/rdsk/c1t0d0s2
fmthard:  New volume table of contents now in place.
# metadb -a -c 3 c1t0d0s6
# metadb
    flags           first blk       block count
 a        u         16            8192         /dev/dsk/c1t0d0s6
 a        u         8208          8192         /dev/dsk/c1t0d0s6
 a        u         16400         8192         /dev/dsk/c1t0d0s6
 a    p  luo        16            8192         /dev/dsk/c1t1d0s6
 a    p  luo        8208          8192         /dev/dsk/c1t1d0s6
 a    p  luo        16400         8192         /dev/dsk/c1t1d0s6

Then, just replace the new drive as if it was the old one in the SVM configuration, and let the mirror reconstruct itself automatically:

# metareplace -e d104 c1t0d0s4
d104: device c1t0d0s4 is replaced with c1t0d0s4
[...]
# metareplace -e d5 c1t0d0s5
d5: device c1t0d0s5 is replaced with c1t0d0s5
# metastat -c
d78              p  300MB d7
d72              p 1018MB d7
d76              p  100MB d7
d75              p  500MB d7
d74              p   50GB d7
d73              p  200MB d7
d77              p  256MB d7
d71              p  250MB d7
    d7           m  119GB d17 (resync-0%) d27
        d17      s  119GB c1t0d0s7 (resyncing)
        d27      s  119GB c1t1d0s7
d104             m  2.4GB d24 d14 (resync-41%)
    d24          s  2.4GB c1t1d0s4
    d14          s  2.4GB c1t0d0s4 (resyncing)
d103             m  2.0GB d23 d13 (resync-28%)
    d23          s  2.0GB c1t1d0s3
    d13          s  2.0GB c1t0d0s3 (resyncing)
d100             m  4.9GB d20 d10 (resync-10%)
    d20          s  4.9GB c1t1d0s0
    d10          s  4.9GB c1t0d0s0 (resyncing)
d5               m  4.0GB d15 (resync-6%) d25
    d15          s  4.0GB c1t0d0s5 (resyncing)
    d25          s  4.0GB c1t1d0s5
d1               m  3.9GB d11 (resync-13%) d21
    d11          s  3.9GB c1t0d0s1 (resyncing)
    d21          s  3.9GB c1t1d0s1

You are done.

Sunday 13 March 2011

Customized Solaris installation and patching experience

I recently faced a curious problem when trying to patch an Alternate Boot Environment created with Live Upgrade on Solaris 10. Although I initially though it was a LU problem, the solution is finally related to the patches to be applied and the way a Solaris is installed.

Assuming the ABE is named s10u9, I first tried to apply the Critical Patch Updates the new way, i.e. through a switch to the installcluster script , which quickly failed like this:

# cd /net/jumpstart/export/media/patch/cpu/10_Recommended_CPU_2010-10
# ./installcluster --apply-prereq --s10cluster
[...]
# ./installcluster -B s10u9 --s10cluster
ERROR: Patch set cannot be installed from a live boot environment without zones
       support, to a target boot environment that has zones support.

So, I tried to apply the patches using the luupgrade command, but it failed with a very similar message:

# ptime luupgrade -t -n s10u9 -s /net/jumpstart/export/media/patch/cpu/10_Recommended_CPU_2010-10/patches `cat patch_order`
Validating the contents of the media .
The media contains 198 software patches that can be added.
All 198 patches will be added because you did not specify any specific patches to add.
Mounting the BE .
ERROR: The boot environment  supports non-global zones. The current boot
environment does not support non-global zones. Releases prior to Solaris 10 cannot be
used to maintain Solaris 10 and later releases that include support for non-global zones.
You may only execute the specified operation on a system with Solaris 10 (or later)
installed.

The fact is, the Primary Boot Environment is a Solaris 10 installation. So, why complaining that the PBE is an older release? Looking on OTN discussion forums and in the README file which came with the Critical Patch Updates release, there is a known bug which can end this way. This will occur when /etc/zones/index in the inactive boot environment has an incorrect setting for the state for the global zone. The correct setting is installed. So, get check this one:

# lumount -n s10u9
/.alt.s10u9
# grep "^global:configured:" /.alt.s10u9/etc/zones/index
# luumount -n s10u9

So no luck here. But wait: if the PBE is a customized Solaris 10 installation, it may be that the installed packages missed the Zone feature, which seems to be mandatory by installcluster or liveupgrade -t to figure out if the PBE is a proper (usable) Solaris 10 installation. So, I just installed the missing packages from the install media...

# mount -r -F hsfs `lofiadm -a /net/jumpstart/export/media/iso/sol-10-u9-ga-sparc-dvd.iso` /mnt
# pkginfo -d /mnt/Solaris_10/Product | nawk '$2 ~ /zone/ || $2 ~ /pool$/ {print $0}'
application SUNWluzone                       Live Upgrade (zones support)
system      SUNWpool                         Resource Pools
system      SUNWzoner                        Solaris Zones (Root)
system      SUNWzoneu                        Solaris Zones (Usr)
# yes | pkgadd -d /mnt/Solaris_10/Product SUNWluzone SUNWzoner SUNWzoneu SUNWpool
[...]
# umount /mnt
# lofiadm -d /dev/lofi/1

... and this must be OK right now:

# ./installcluster -B s10u9 --s10cluster
Setup ...
CPU OS Cluster 2010/10 Solaris 10 SPARC (2010.10.06)
Application of patches started : 2011.02.07 11:17:08

Applying 120900-04 (  1 of 198) ... skipped
[...]
Installation of patch set to alternate boot environment complete.

Please remember to activate boot environment s10u9 with luactivate(1M)
before rebooting.
Install log files written :
  /.alt.s10u9/var/sadm/install_data/s10s_rec_cluster_short_2011.02.07_11.17.08.log
  /.alt.s10u9/var/sadm/install_data/s10s_rec_cluster_verbose_2011.02.07_11.17.08.log

And it is... The question is, why is the Zone feature necessary and mandatory in this case?

Saturday 19 February 2011

Solaris 11 Express: Problem #6

In this series, I will report the bugs or problems I find when running the Oracle Solaris 11 Express distribution. I hope this will give more visibility on those PR to Oracle to correct them before the release of Solaris 11 next year.

Well, this one is not a really a bug per-se, but it is really annoying nonetheless. The new interface where most network things are managed through dladm and ipadm is really interesting because this leads to a single point of management for the stack IP. Most things works pretty well, unless setting the mode and speed of an interface, were we always need to use the old ndd command, or the system kernel configuration file, or set parameters in the specific interface configuration file, as in old days on old Solaris releases.

Here is an example of such annoying behavior when hard setting a mode and speed for an interface is required:

# dladm show-link
LINK        CLASS     MTU    STATE    BRIDGE     OVER
eri0        phys      1500   up       --         --

# dladm show-phys
LINK         MEDIA                STATE      SPEED  DUPLEX    DEVICE
eri0         Ethernet             up         100    half      eri0

# dladm show-linkprop -p duplex eri0
LINK         PROPERTY        PERM VALUE          DEFAULT        POSSIBLE
eri0         duplex          r-   full           full           half,full

As you can see, the auto-negotiation leads to an unusual 100Mbps half-duplex mode. And because the associated link property is read-only, the only way to put the link at 100Mbps full-duplex is to set it another way as before Solaris 11:

# ndd -set /dev/eri0 adv_100fdx_cap 1
# ndd -set /dev/eri0 adv_autoneg_cap 0

# cat << EOF >> /etc/system
set eri:adv_autoneg_cap=0
set eri:adv_100fdx_cap=1
set eri:adv_100hdx_cap=0
set eri:adv_10fdx_cap=0
set eri:adv_10hdx_cap=0
EOF

As I said previously it is not a bug, but is is really a pity it not possible to use the new way to set these parameters.

Tuesday 1 February 2011

Solaris 11 Express: Problem #5

In this series, I will report the bugs or problems I find when running the Oracle Solaris 11 Express distribution. I hope this will give more visibility on those PR to Oracle to correct them before the release of Solaris 11 next year.

Some builds before Oracle decided not to provide a binary distribution of Solaris Next anymore (build snv_124 if I recall correctly), virtual consoles were introduced. This a well-known feature for the Linux and BSD people, but Solaris 11 Express is the first supported release of Solaris where we can run multiple virtual terminals on the console.

This long-time missing feature is not enable by default though. Here are the steps to do so:

$ pfexec svccfg -s vtdaemon setprop options/secure=false
$ pfexec svccfg -s vtdaemon setprop options/hotkeys=true
$ pfexec svcadm enable vtdaemon
$ pfexec svcadm enable console-login:vt2
$ pfexec svcadm enable console-login:vt3
$ pfexec svcadm enable console-login:vt4
$ pfexec svcadm enable console-login:vt5
$ pfexec svcadm enable console-login:vt6

Using Control-Alt-F1 to Control-Alt-F6, one can now switch to virtual consoles. And using Control-Alt-F7, one can switch back to the X server. Note: not setting the options/secure property to false will automatically lock the X server screen.

Although switching to virtual consoles works as expected, getting back to the X server is not really easy. From my experience, with the options/secure property set to true, using Control-Alt-F7 get me to a new login prompt, bypassing my (always?) logged-in session. With the options/secure property set to false, using Control-Alt-F7 leave me with a black screen and a blinking _ cursor... and nothing what I can do without a remote access.

FYI, this problem is covered by the Bug ID number 7001741. Note that you can add yourself to the interest list at the bottom of the bug report page:

Monday 17 January 2011

Solaris 11 Express: Problem #4

In this series, I will report the bugs or problems I find when running the Oracle Solaris 11 Express distribution. I hope this will give more visibility on those PR to Oracle to correct them before the release of Solaris 11 next year.

Here is one way you can use to create a local copy of an Oracle Solaris 11 Express IPS package repository. This is based on the Oracle document on this exact purpose, and the README file provided as part of the ISO media which provides the entire IPS package repository.

The dpool/store/data is a compressed ZFS dataset. Here, we duplicate the content of the Oracle Solaris 11 Express IPS package repository from the ISO media. Then, the pkg/server SMF is enabled.

# mkdir /dpool/store/data/repo_support
# mount -F hsfs `lofiadm -a /dpool/store/iso/sol-11-exp-201011-repo-full.iso` /mnt
# rsync -aP /mnt/repo /dpool/store/data/repo_support
# umount /mnt
# lofiadm -d /dev/lofi/1
# svccfg -s pkg/server setprop pkg/inst_root=/dpool/store/data/repo_support/repo
# svcadm refresh pkg/server
# svcadm enable pkg/server

Last, simply reset the origin for the solaris publisher URI, and verify the new configuration:

# pkg set-publisher -G http://pkg.oracle.com/solaris/release -g http://unic.thilelli.net/ solaris
# pkg publisher
PUBLISHER                             TYPE     STATUS   URI
solaris                  (preferred)  origin   online   http://unic.thilelli.net/

Then, try some listing and searching with the new local repository:

# pkg list -a | grep constructor
install/distribution-constructor              0.5.11-0.151.0.1 installed  -----
# pkg search constructor
pkg: Some repositories failed to respond appropriately:
solaris:
http protocol error: code: 503 reason: Service Unavailable
URL: 'http://unic.thilelli.net/solaris/search/1/False_2_None_None_%3A%3A%3Aconstructor'

Ouch, the newly created repository is unsearchable... And in fact there is no index directory provided in the /dpool/store/data/repo_support/repo/publisher/solaris sub-directory. So, in order to create one we need to force a refresh of the repository.

# svccfg -s pkg/server setprop pkg/readonly=false
# svcadm refresh pkg/server
# pkgrepo refresh -s http://unic.thilelli.net/
Repository refresh initiated.

But just after initiating the refresh of the repository, I started seeing a very high load on the disk on which the dpool is built, and which hosts now the local repository.

# iostat -xnz 2
[...]
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   92.0    0.0  195.0    0.0  3.1  1.0   33.3   10.8  89 100 c9t1d0

So, what is causing this huge I/O workload? Fire iosnoop, and iotop from the DTraceToolkit to observe the activity, and find who is doing these I/O:

# /opt/DTT/Bin/iosnoop
^C
    0   747 R 261342101   3072 pkg.depotd 
    0   747 R 266583819   3072 pkg.depotd 
    0   747 R 55676891    1536 pkg.depotd 
    0   747 R 260418077   2048 pkg.depotd 

# /opt/DTT/Bin/iotop
Tracing... Please wait.

2011 Jan 10 19:26:36,  load: 0.58,  disk_r:    910 KB,  disk_w:  12667 KB

  UID    PID   PPID CMD              DEVICE  MAJ MIN D            BYTES
    0      0      0                  sd3      94 192               5632
    0      5      0 zpool-rpool      sd2      94 128 W           215552
    0    312      0 zpool-dpool      sd3      94 192 R           401408
    0    747      9 pkg.depotd       sd3      94 192 R           530944
    0    312      0 zpool-dpool      sd3      94 192 W         12755968

Ok, so it seems that the local repository is the culprit. How is the process and its LWP working at this same time?

# prstat -cZmL -p `pgrep pkg.depotd` 2
Please wait...
[...]
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 
   747 root     0.1 0.6 0.0 0.0 0.0 0.0  99 0.0 118   2 254   0 pkg.depotd/2
   747 root     0.1 0.0 0.0 0.0 0.0 0.0 100 0.0 103   0 103   0 pkg.depotd/3
   747 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  80   1  48   0 pkg.depotd/1
   747 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  10   0   5   0 pkg.depotd/4
   747 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 pkg.depotd/15
   747 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 pkg.depotd/14
   747 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 pkg.depotd/13
   747 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 pkg.depotd/12
   747 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 pkg.depotd/11
   747 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 pkg.depotd/10
   747 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 pkg.depotd/9
   747 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 pkg.depotd/8
   747 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 pkg.depotd/7
   747 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 pkg.depotd/6
   747 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   0   0   0   0 pkg.depotd/5
ZONEID     NLWP  SWAP   RSS MEMORY      TIME  CPU ZONE                        
     0       64  635M  736M    36%   0:01:22 0.4% global                      
Total: 1 processes, 64 lwps, load averages: 0.29, 0.31, 0.25

Ok, the LWP #2 of the pkg.depotd program seems hot here. What is the content of the running stack trace for this particular LWP?

# pstack 747/2
747:    /usr/bin/python2.6 /usr/lib/pkg.depotd --cfg svc:/application/pkg/serv
-----------------  lwp# 2 / thread# 2  --------------------
[...]
   [ /usr/lib/python2.6/vendor-packages/pkg/indexer.py:692 (_generic_update_index) ]
 fed57e48 PyEval_EvalCodeEx (87c0530, 87bb8ac, 0, 8bc2d7c, 4, 8bc2d8c) + 91c
 fed594f6 fast_function (87e5df4, fcf1de3c, 4, 4, 0, fef70000) + 176
 fed58fea call_function (fcf1de3c, 3, 629a7d59, fed5337e) + ee
 fed56399 PyEval_EvalFrameEx (8bc2c34, 0, 87bb8ac, 0) + 3029
   [ /usr/lib/python2.6/vendor-packages/pkg/indexer.py:790 (server_update_index) ]
 fed57e48 PyEval_EvalCodeEx (87c05c0, 87bb8ac, 0, 8bc2bfc, 2, 8bc2c04) + 91c
 fed594f6 fast_function (87e5e64, fcf1e00c, 2, 2, 0, fedcc7f4) + 176
 fed58fea call_function (fcf1e00c, 1, 88e7a3ae, fed5337e) + ee
 fed56399 PyEval_EvalFrameEx (8bc2ab4, 0, 880a604, 0) + 3029
   [ /usr/lib/python2.6/vendor-packages/pkg/server/repository.py:1003 (__update_searchdb_unlocked) ]
 fed59488 fast_function (8898a04, fcf1e14c, 2, 2, 0, fedcc7f4) + 108
 fed58fea call_function (fcf1e14c, 1, 88e7a3ae, fed5337e) + ee
 fed56399 PyEval_EvalFrameEx (875ca24, 0, 880a604, 0) + 3029
   [ /usr/lib/python2.6/vendor-packages/pkg/server/repository.py:1431 (__run_update_index) ]
 fed59488 fast_function (8898ed4, fcf1e28c, 1, 1, 0, fedcc7f4) + 108
 fed58fea call_function (fcf1e28c, 0, 88e7a3ae, fed5337e) + ee
 fed56399 PyEval_EvalFrameEx (8a58904, 0, 880a604, 0) + 3029
   [ /usr/lib/python2.6/vendor-packages/pkg/server/repository.py:687 (__refresh_index) ]
 fed59488 fast_function (889880c, fcf1e3cc, 1, 1, 0, fedcc7f4) + 108
 fed58fea call_function (fcf1e3cc, 0, 55, 0) + ee
 fed56399 PyEval_EvalFrameEx (8a5876c, 0, 880a604, 0) + 3029
[...]

It is now pretty clear that refreshing the repository is at the heart of the the load. When updating the index went well, it will last for five to ten minutes, and the log will look like this:

# tail -f /var/svc/log/application-pkg-server:default.log
  [04/Nov/2008:09:46:45] ENGINE Listening for SIGTERM.
  [04/Nov/2008:09:46:45] ENGINE Listening for SIGUSR1.
  [04/Nov/2008:09:46:45] ENGINE Bus STARTING
  [04/Nov/2008:09:46:45] ENGINE Started monitor thread '_TimeoutMonitor'.
  [04/Nov/2008:09:46:46] ENGINE Serving on 0.0.0.0:80
  [04/Nov/2008:09:46:46] ENGINE Bus STARTED
  [04/Nov/2008:09:46:51] INDEX Search Available
  [04/Nov/2008:09:46:56] INDEX Updating search indexes
  [04/Nov/2008:09:50:15] INDEX Search indexes updated and available.

The question is why is it so hot, and especially so long with this repository? The update generate more than 1.5 billion of files under the index/TMP directory before dying, and trying again... indefinitely:

# tail -f /var/svc/log/application-pkg-server:default.log
[...]
[ Jan 12 04:16:10 Stopping because all processes in service exited. ]
[ Jan 12 04:16:18 Executing start method ("//lib/svc/method/svc-pkg-depot start"). ]
Dropping net_privaddr privilege.
ppriv -s A=basic,-file_link_any,-proc_info,-proc_session,net_privaddr -e /usr/lib/pkg.depotd --cfg svc:/application/pkg/server:default
[12/Jan/2011:04:16:26] ENGINE Listening for SIGHUP.
[12/Jan/2011:04:16:26] ENGINE Listening for SIGTERM.
[12/Jan/2011:04:16:26] ENGINE Listening for SIGUSR1.
[12/Jan/2011:04:16:26] ENGINE Bus STARTING
[12/Jan/2011:04:16:26] INDEX Checking for updated package data.
[12/Jan/2011:04:16:26] ENGINE Started monitor thread '_TimeoutMonitor'.
[12/Jan/2011:04:16:27] ENGINE Serving on 0.0.0.0:80
[12/Jan/2011:04:16:27] ENGINE Bus STARTED
[12/Jan/2011:04:16:27] INDEX Updating search indexes
[ Jan 12 07:36:09 Stopping because all processes in service exited. ]
[ Jan 12 07:36:17 Executing start method ("//lib/svc/method/svc-pkg-depot start"). ]
Dropping net_privaddr privilege.
ppriv -s A=basic,-file_link_any,-proc_info,-proc_session,net_privaddr -e /usr/lib/pkg.depotd --cfg svc:/application/pkg/server:default
[12/Jan/2011:07:36:28] ENGINE Listening for SIGHUP.
[12/Jan/2011:07:36:28] ENGINE Listening for SIGTERM.
[12/Jan/2011:07:36:28] ENGINE Listening for SIGUSR1.
[12/Jan/2011:07:36:28] ENGINE Bus STARTING
[12/Jan/2011:07:36:28] INDEX Checking for updated package data.
[12/Jan/2011:07:36:28] ENGINE Started monitor thread '_TimeoutMonitor'.
[12/Jan/2011:07:36:28] ENGINE Serving on 0.0.0.0:80
[12/Jan/2011:07:36:28] ENGINE Bus STARTED
[12/Jan/2011:07:36:29] INDEX Updating search indexes

So, I am asking me why the index is not provided as part of the ISO media... Nonetheless, the problem is that I can't now enable the pkg/server SMF service anymore without pkg.depotd trying to update the index... Too bad. So, I decided to reinstall the depot from the beginning, but how can I create an index properly with this release from the ISO media?

Update #1 (2011-01-19): Well, I think I find a better way (the proper way?) to achieve this without problem: use the file path as an argument to the pkgrepo command.

# svccfg -s pkg/server setprop pkg/readonly=true
# svcadm refresh pkg/server
# svcadm restart pkg/server
# pkgrepo -s /dpool/store/data/repo_support/repo refresh
Repository refresh initiated.
# tail /var/svc/log/application-pkg-server\:default.log 
[...]
[17/Jan/2011:21:51:37]  pkg://solaris/SUNWdcopy@0.5.11,5.11-0.133:20101027T183617Z
[17/Jan/2011:21:51:37]  pkg://solaris/driver/x11/xsvc@0.5.11,5.11-0.151.0.1:20101104T233133Z
[17/Jan/2011:21:51:37]  pkg://solaris/SUNWdedis@0.5.11,5.11-0.130:20101027T183621Z
[17/Jan/2011:21:51:40] INDEX Checking for updated package data.
# pkg search -s http://unic.thilelli.net/ constructor
INDEX           ACTION VALUE                                                       PACKAGE
pkg.description set    Distribution Constructor libraries, commands and data files pkg:/install/distribution-constructor@0.5.11-0.151.0.1

Monday 10 January 2011

Solaris 11 Express: Problem #3

In this series, I will report the bugs or problems I find when running the Oracle Solaris 11 Express distribution. I hope this will give more visibility on those PR to Oracle to correct them before the release of Solaris 11 next year.

I recently switch from the official Oracle release repository to the support repository for Solaris 11 Express. Before the switch, one non-global zone was created. Since there were some updates to this repository, I pkg update'ed, rebooted to the new boot environment, and tried to update the non-global zone:

# beadm list                                                                         
BE        Active Mountpoint Space Policy Created          
--        ------ ---------- ----- ------ -------          
solaris   -      -          9.88M static 2010-12-01 09:32 
solaris-1 NR     /          5.44G static 2011-01-03 19:35 

# zoneadm list -vc                                                                   
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              ipkg     shared
   - zone1            installed  /dpool/store/zone/zone1        ipkg     shared

# zoneadm -z zone1 detach

# zoneadm -z zone1 attach -u
Log File: /var/tmp/zone1.attach_log.lfa49e
Attaching...

preferred global publisher: solaris
       Global zone version: entire@0.5.11,5.11-0.151.0.1.1:20101222T214417Z
   Non-Global zone version: entire@0.5.11,5.11-0.151.0.1:20101105T054056Z

                     Cache: Using /var/pkg/download.
  Updating non-global zone: Output follows
Creating Plan                          
ERROR: Could not update attaching zone
                    Result: Attach Failed.

# cat /var/tmp/zone1.attach_log.lfa49e
[Monday, January  3, 2011 08:42:24 PM CET] Log File: /var/tmp/zone1.attach_log.lfa49e
[Monday, January  3, 2011 08:42:25 PM CET] Attaching...
[Monday, January  3, 2011 08:42:25 PM CET] existing
[Monday, January  3, 2011 08:42:25 PM CET] 
[Monday, January  3, 2011 08:42:25 PM CET]   Sanity Check: Passed.  Looks like an OpenSolaris system.
[Monday, January  3, 2011 08:42:31 PM CET] preferred global publisher: solaris
[Monday, January  3, 2011 08:42:32 PM CET]        Global zone version: entire@0.5.11,5.11-0.151.0.1.1:20101222T214417Z
[Monday, January  3, 2011 08:42:32 PM CET]    Non-Global zone version: entire@0.5.11,5.11-0.151.0.1:20101105T054056Z

[Monday, January  3, 2011 08:42:32 PM CET]                      Cache: Using /var/pkg/download.
[Monday, January  3, 2011 08:42:32 PM CET]   Updating non-global zone: Output follows
pkg set-publisher: 
Unable to locate certificate '/dpool/store/zone/zone1/root/dpool/store/zone/zone1/root/var/pkg/ssl/Oracle_Solaris_11_Express_Support.certificate.pem' needed to access 'https://pkg.oracle.com/solaris/support/'.
pkg unset-publisher: 
Removal failed for 'za23954': The preferred publisher cannot be removed.

pkg: The following pattern(s) did not match any packages in the current catalog.
Try relaxing the pattern, refreshing and/or examining the catalogs:
        entire@0.5.11,5.11-0.151.0.1.1:20101222T214417Z
[Monday, January  3, 2011 08:44:04 PM CET] ERROR: Could not update attaching zone
[Monday, January  3, 2011 08:44:06 PM CET]                     Result: Attach Failed.

FYI, this problem was covered by the Bug ID number 13000, but is always present at this time, at least for Solaris 11 Express 2010.11.

So, it seems that the change of repository for the solaris publisher was not well managed by the non-global zone update mechanism. Just to be sure, I tried to create a new non-global zone in the new boot environment, but the problem exists in this case, too:

# zoneadm -z zone2 install
A ZFS file system has been created for this zone.
   Publisher: Using solaris (https://pkg.oracle.com/solaris/support/ ).
   Publisher: Using opensolaris.org (http://pkg.opensolaris.org/dev/).
       Image: Preparing at /dpool/store/zone/zone2/root.
 Credentials: Propagating Oracle_Solaris_11_Express_Support.key.pem
 Credentials: Propagating Oracle_Solaris_11_Express_Support.certificate.pem
Traceback (most recent call last):
  File "/usr/bin/pkg", line 4225, in handle_errors
    __ret = func(*args, **kwargs)
  File "/usr/bin/pkg", line 4156, in main_func
    ret = image_create(pargs)
  File "/usr/bin/pkg", line 3836, in image_create
    variants=variants, props=set_props)
  File "/usr/lib/python2.6/vendor-packages/pkg/client/api.py", line 3205, in image_create
    uri=origins[0])
TypeError: 'set' object does not support indexing

pkg: This is an internal error.  Please let the developers know about this
problem by filing a bug at http://defect.opensolaris.org and including the
above traceback and this message.  The version of pkg(5) is '052adf36c3f4'.
ERROR: failed to create image

FYI, this problem is covered by the Bug ID number 17653.

Well, no luck here. I didn't see a Solaris IPS update for these problems yet, which are very annoying, at least.

Update #1 (2011-02-03): The problem is now fixed in the latest support pkg repository. Install or attach a non-global ipkg branded zone works now as expected:

# zoneadm -z zone1 install
A ZFS file system has been created for this zone.
   Publisher: Using solaris (https://pkg.oracle.com/solaris/support/ ).
   Publisher: Using opensolaris.org (http://pkg.opensolaris.org/dev/).
   Publisher: Using sunfreeware (http://pkg.sunfreeware.com:9000/).
       Image: Preparing at /dpool/export/zone/zone1/root.
 Credentials: Propagating Oracle_Solaris_11_Express_Support.key.pem
 Credentials: Propagating Oracle_Solaris_11_Express_Support.certificate.pem
       Cache: Using /var/pkg/download. 
Sanity Check: Looking for 'entire' incorporation.
  Installing: Core System (output follows)
               Packages to install:     1
           Create boot environment:    No
[...]
        Note: Man pages can be obtained by installing SUNWman
 Postinstall: Copying SMF seed repository ... done.
 Postinstall: Applying workarounds.
        Done: Installation completed in 332.525 seconds.

  Next Steps: Boot the zone, then log into the zone console (zlogin -C)
              to complete the configuration process.

And:

# zoneadm -z zone1 detach
# zoneadm -z zone1 attach -u
Log File: /var/tmp/zone1.attach_log.PhaOKf
Attaching...

preferred global publisher: solaris
       Global zone version: entire@0.5.11,5.11-0.151.0.1.2:20110127T225841Z
   Non-Global zone version: entire@0.5.11,5.11-0.151.0.1.2:20110127T225841Z

                     Cache: Using /var/pkg/download.
  Updating non-global zone: Output follows
No updates necessary for this image.   
  Updating non-global zone: Zone updated.
                    Result: Attach Succeeded.
# zoneadm list -vc
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              ipkg     shared
   - zone1            installed  /dpool/export/zone/zone1       ipkg     shared

Monday 3 January 2011

Solaris 11 Express: Problem #2

In this series, I will report the bugs or problems I find when running the Oracle Solaris 11 Express distribution. I hope this will give more visibility on those PR to Oracle to correct them before the release of Solaris 11 next year.

For some customers, I had the habit to clone a non-global zone using a template zone. But in order to save some space, I generally use the capability to use a ZFS snapshot as input for the clone, avoiding creating a snapshot each time a new clone is created.

It seems that this capability is not usable anymore on Solaris 11 Express at this time:

# zoneadm -z zone3 clone -s dpool/store/zone/zone1/ROOT/zbe@zone2_snap zone1
/usr/lib/brand/ipkg/clone: -s: unknown option

Nevertheless, this functionality is always described in the manual page:

brand-specific usage: clone {sourcezone} usage: clone [-m method] [-s ] [brand-specific args] zonename Clone the installation of another zone. The -m option can be used to specify 'copy' which forces a copy of the source zone. The -s option can be used to specify the name of a ZFS snapshot that was taken from a previous clone command. The snapshot will be used as the source instead of creating a new ZFS snapshot. All other arguments are passed to the brand clone function; see brands(5) for more information.

No luck here. Even if the space consideration may be minimized by the deduplication feature of ZFS in Solaris 11 Express, it is not always appropriate nor usable: on small size server for example.

FYI, this problem is covered by the Bug ID number 6383119. Note that you can add yourself to the interest list at the bottom of the bug report page:

Sunday 19 December 2010

Solaris 11 Express: Problem #1

In this series, I will report the bugs or problems I find when running the Oracle Solaris 11 Express distribution. I hope this will give more visibility on those PR to Oracle to correct them before the release of Solaris 11 next year.

After a fresh installation of Solaris 11 Express from the LiveUSB media, and with the default set of packages provided by this media (i.e. the slim_install profile), I just fell on the following error message from a secure-shell login connection:

invalid UTF-8 sequence: Cannot convert UTF-8 strings to the local codeset

In fact, this problem is not new and was caused by the integration of the locale fix number 6740240 which added a dependency of the libssh against the locale framework. This generate error messages on systems where the iconv data for UTF-8 is not available. So, this cause no problem for the proper execution of SSH per-se, but I find this a little bit annoying on a fresh installed system. So, my point here is that the dependency is not properly managed from the installation of Solaris 11 Express.

FYI, this problem is covered by the Bug ID number 6872504. Note that you can add yourself to the interest list at the bottom of the bug report page:

Thursday 7 October 2010

Live Upgrading To Solaris 10 9/10

If you try to update to the latest Solaris 10 Update (U9), one new step is now required in order to be able to successfully luupgrade to the desired Update. As mentioned in the Oracle Solaris 10 9/10 Release Notes, a new Auto Registration mecanism has been added to this release to facilitate registering the system using your Oracle support credentials.

So, if you try the classical luupgrade following incantation, it will fail with the reported message:

# luupgrade -u -n s10u9 -s /mnt -j /var/tmp/profile
System has findroot enabled GRUB
No entry for BE  in GRUB menu
Copying failsafe kernel from media.
61364 blocks
miniroot filesystem is 
Mounting miniroot at 
ERROR: The auto registration file <> does not exist or incomplete.
       The auto registration file is mandatory for this upgrade.
       Use -k  argument along with luupgrade command.

So, you now need to set the Auto Registration choice as a mandatory parameter. Here is how it resembles right now:

# echo "auto_reg=disable" > /var/tmp/sysidcfg
# luupgrade -u -n s10u9 -s /mnt -j /var/tmp/profile -k /var/tmp/sysidcfg
System has findroot enabled GRUB
No entry for BE  in GRUB menu
Copying failsafe kernel from media.
61364 blocks
miniroot filesystem is 
Mounting miniroot at 
#######################################################################
 NOTE: To improve products and services, Oracle Solaris communicates
 configuration data to Oracle after rebooting.

 You can register your version of Oracle Solaris to capture this data
 for your use, or the data is sent anonymously.

 For information about what configuration data is communicated and how
 to control this facility, see the Release Notes or
 www.oracle.com/goto/solarisautoreg.

 INFORMATION: After activated and booted into new BE ,
 Auto Registration happens automatically with the following Information

autoreg=disable
#######################################################################
Validating the contents of the media .
The media is a standard Solaris media.
The media contains an operating system upgrade image.
The media contains  version <10>.
Constructing upgrade profile to use.
Locating the operating system upgrade program.
Checking for existence of previously scheduled Live Upgrade requests.
Creating upgrade profile for BE .
Checking for GRUB menu on ABE .
Saving GRUB menu on ABE .
Checking for x86 boot partition on ABE.
Determining packages to install or upgrade for BE .
Performing the operating system upgrade of the BE .
CAUTION: Interrupting this process may leave the boot environment unstable
or unbootable.
Upgrading Solaris: 100% completed
Installation of the packages from this media is complete.
Restoring GRUB menu on ABE .
Updating package information on boot environment .
Package information successfully updated on boot environment .
Adding operating system patches to the BE .
The operating system patch installation is complete.
ABE boot partition backing deleted.
PBE GRUB has no capability information.
PBE GRUB has no versioning information.
ABE GRUB is newer than PBE GRUB. Updating GRUB.
GRUB update was successfull.
INFORMATION: The file  on boot
environment  contains a log of the upgrade operation.
INFORMATION: The file  on boot
environment  contains a log of cleanup operations required.
INFORMATION: Review the files listed above. Remember that all of the files
are located on boot environment . Before you activate boot
environment , determine if any additional system maintenance is
required or if additional media of the software distribution must be
installed.
The Solaris upgrade of the boot environment  is complete.
Creating miniroot device
Configuring failsafe for system.
Failsafe configuration is complete.
Installing failsafe
Failsafe install is complete.

Not sure this will ease the upgrade path to this Update, even if there is nothing really wrong with this. It may just have been less intrusive I think.

Sunday 26 September 2010

Deactivate The VMware HGFS File System

I recently faced a problem when our backup administrator was unable to browse remotely the root (/) file system on Solaris when the system was installed as a guest in a VMware ESX hypervisor. After digging around the system, I find that the Host-Guest File System made the HP DataProtector agent unable to stat the /hgfs pseudo-file system as can be seen in the /var/opt/omni/log/debug.log debug log file:

[...]
09/23/10 17:06:36  FSBRDA.11618.0 ["da/bda/solaris.c /main/blr_dp61/10":1324] A.06.11 b243
SolStatObj: /hgfs lstat failed! errno 5

Although it is not a bug per itself, installing the VMware Tools just enable the HGFS module independently of the virtualization stack: VMware ESX doesn't provide access to the Shared Folders facility, although the VMware Workstation does. So in my case, I can just disable it without loosing any useful functionality.

Since it may be advisable to have a configuration option at the VMware Tools level, I didn't find one. Some may argue that write a little script at the boot to unmount the /hgfs file system is good enough, I find painful and not very elegant. In fact, I prefer to disable at the kernel module level using the module's configuration file:

# cp -p /kernel/drv/vmhgfs.conf /kernel/drv/vmhgfs.conf.orig
/* Edit, and comment the vmhgfs line. */
# cat /kernel/drv/vmhgfs.conf
# name= parent="pseudo" instance=0;
#name="vmhgfs" parent="pseudo" instance=0;

Last, be sure to recreate the GRUB boot archive before rebooting the system, and all backup stuff went well again.

# bootadm update-archive
# shutdown -y -i 6 -g 0

- page 3 of 19 -