blog'o thnet


Tag - Zone


Thursday 22 March 2012

Problem with the beadm utility inside a Zone

Although the beadm utility is now supported inside a non-global zone, I found a case where its behavior does not work as expected. Connected inside a zone (say, myzone), I can create a new BE (say, solaris-1), activate it, and reboot onto it.

ZG# zoneadm list -vc
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              solaris  shared
   4 myzone           running    /zones/myzone                  solaris  excl

ZG# zlogin myzone
[Connected to zone 'myzone' pts/7]
Oracle Corporation      SunOS 5.11      11.0    February 2012

ZNG# beadm list
BE      Active Mountpoint Space   Policy Created
--      ------ ---------- -----   ------ -------
solaris NR     /          917.06M static 2012-03-21 14:04

ZNG# beadm create solaris-1

ZNG# beadm activate solaris-1

ZNG# beadm list
BE        Active Mountpoint Space   Policy Created
--        ------ ---------- -----   ------ -------
solaris   N      /          43.0K   static 2012-03-21 14:04
solaris-1 R      -          917.19M static 2012-03-21 16:48

ZNG# init 6
[Connection to zone 'myzone' pts/9 closed]

ZG# zlogin myzone
[Connected to zone 'myzone' pts/7]
Oracle Corporation      SunOS 5.11      11.0    February 2012

ZNG# beadm list
BE        Active Mountpoint Space   Policy Created
--        ------ ---------- -----   ------ -------
solaris   -      -          3.06M   static 2012-03-21 14:04
solaris-1 NR     /          979.46M static 2012-03-21 17:56

Everything works very well: I didn't run into any problem, and I can do whatever I want after that: fall back to the other BE, keep using this one and install new packages, create more new BEs, and so on.
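
Falling back, for instance, is simply a matter of re-activating the previous BE and rebooting, along these lines (using the BE names above):

ZNG# beadm activate solaris
ZNG# init 6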

But when I try to have the pkg utility create the new BE automatically, the resulting BE does not seem to contain everything it should.

ZNG# pkg install --require-new-be site/application/testpkg
           Packages to install:   1
       Create boot environment: Yes
Create backup boot environment:  No

DOWNLOAD                                  PKGS       FILES    XFER (MB)
Completed                                  1/1       20/20      0.0/0.0

PHASE                                        ACTIONS
Install Phase                                  73/73

PHASE                                          ITEMS
Package State Update Phase                       1/1
Image State Update Phase                         2/2
pkg: '/sbin/bootadm update-archive -R /tmp/tmpCqUVIT' failed.
with a return code of 1.

A clone of solaris exists and has been updated and activated.
On the next boot the Boot Environment solaris-1 will be
mounted on '/'.  Reboot when ready to switch to this updated BE.

ZNG# beadm list
BE        Active Mountpoint Space   Policy Created
--        ------ ---------- -----   ------ -------
solaris   N      /          102.0K  static 2012-03-21 14:04
solaris-1 R      -          950.52M static 2012-03-21 17:39

So a new BE was created, but this time something is wrong. Let's see what is missing:

ZNG# beadm list
BE         Active Mountpoint Space   Policy Created
--         ------ ---------- -----   ------ -------
newsolaris R      -          864.50M static 2012-03-21 17:52
solaris    N      /          80.84M  static 2012-03-21 14:04
ZNG# beadm mount newsolaris /mnt

ZNG# bootadm  update-archive -vn -R /mnt
file not found: /mnt//boot/solaris/bin/create_ramdisk
/mnt/: not a boot archive based Solaris instance

ZNG# ls -l /mnt/boot/solaris/bin/create_ramdisk
/mnt/boot/solaris/bin/create_ramdisk: No such file or directory

ZNG# ls -l /mnt/boot
/mnt/boot: No such file or directory

ZNG# ls -l /mnt
total 72
lrwxrwxrwx   1 root     root           9 Mar 21 14:17 bin -> ./usr/bin
drwxr-xr-x  17 root     sys           18 Mar 21 17:18 dev
drwxr-xr-x   2 root     root           2 Mar 21 14:26 dpool
drwxr-xr-x  48 root     sys          114 Mar 21 17:52 etc
drwxr-xr-x   2 root     sys            2 Mar 21 14:11 export
dr-xr-xr-x   2 root     root           2 Mar 21 14:11 home
drwxr-xr-x  12 root     bin          185 Mar 21 14:17 lib
drwxr-xr-x   2 root     sys            2 Mar 21 14:11 mnt
dr-xr-xr-x   2 root     root           2 Mar 21 14:26 net
dr-xr-xr-x   2 root     root           2 Mar 21 14:26 nfs4
drwxr-xr-x   2 root     sys            2 Mar 21 14:11 opt
dr-xr-xr-x   2 root     root           2 Mar 21 14:11 proc
drwx------   2 root     root           5 Mar 21 16:50 root
drwxr-xr-x   2 root     root           2 Mar 21 14:26 rpool
lrwxrwxrwx   1 root     root          10 Mar 21 14:17 sbin -> ./usr/sbin
drwxr-xr-x   5 root     root           5 Mar 21 14:11 system
drwxrwxrwt   2 root     sys            2 Mar 21 17:19 tmp
drwxr-xr-x   2 root     root           2 Mar 21 14:26 tools
drwxr-xr-x  22 root     sys           32 Mar 21 14:26 usr
drwxr-xr-x  28 root     sys           29 Mar 21 14:17 var

Well, I don't know why there is a difference between those two BEs, but the differences are significant enough to be a problem.
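
For what it's worth, with the faulty BE still mounted on /mnt as above, a crude way to enumerate the differences is to compare the top-level trees of the two BEs; /boot should show up among the missing entries (just a sketch, the temporary file names are arbitrary):

ZNG# ls / > /tmp/ls.running
ZNG# ls /mnt > /tmp/ls.faulty
ZNG# diff /tmp/ls.running /tmp/ls.faulty
ZNG# beadm unmount newsolaris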

Comments welcome!

Sunday 13 March 2011

Customized Solaris installation and patching experience

I recently faced a curious problem when trying to patch an Alternate Boot Environment created with Live Upgrade on Solaris 10. Although I initially thought it was an LU problem, the root cause turned out to be related to the patches to be applied and the way Solaris was installed.

Assuming the ABE is named s10u9, I first tried to apply the Critical Patch Updates the new way, i.e. by switching to the installcluster script, which quickly failed like this:

# cd /net/jumpstart/export/media/patch/cpu/10_Recommended_CPU_2010-10
# ./installcluster --apply-prereq --s10cluster
[...]
# ./installcluster -B s10u9 --s10cluster
ERROR: Patch set cannot be installed from a live boot environment without zones
       support, to a target boot environment that has zones support.

So, I tried to apply the patches using the luupgrade command, but it failed with a very similar message:

# ptime luupgrade -t -n s10u9 -s /net/jumpstart/export/media/patch/cpu/10_Recommended_CPU_2010-10/patches `cat patch_order`
Validating the contents of the media .
The media contains 198 software patches that can be added.
All 198 patches will be added because you did not specify any specific patches to add.
Mounting the BE <s10u9>.
ERROR: The boot environment <s10u9> supports non-global zones. The current boot
environment does not support non-global zones. Releases prior to Solaris 10 cannot be
used to maintain Solaris 10 and later releases that include support for non-global zones.
You may only execute the specified operation on a system with Solaris 10 (or later)
installed.

The fact is, the Primary Boot Environment is a Solaris 10 installation. So why does it complain that the PBE is an older release? Looking at the OTN discussion forums and at the README file that comes with the Critical Patch Updates release, there is a known bug which can end this way: it occurs when /etc/zones/index in the inactive boot environment has an incorrect state setting for the global zone (the correct setting is installed). So, let's check that:

# lumount -n s10u9
/.alt.s10u9
# grep "^global:configured:" /.alt.s10u9/etc/zones/index
# luumount -n s10u9
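
For reference, the index is a simple colon-separated file (zone name, state, zone path); a healthy entry for the global zone looks like the line below, whereas the bug described in the README leaves it in the configured state:

global:installed:/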

So no luck here. But wait: since the PBE is a customized Solaris 10 installation, it may be that the installed packages are missing the Zones feature, which installcluster and luupgrade -t apparently require in order to recognize the PBE as a proper (usable) Solaris 10 installation. So, I just installed the missing packages from the install media...

# mount -r -F hsfs `lofiadm -a /net/jumpstart/export/media/iso/sol-10-u9-ga-sparc-dvd.iso` /mnt
# pkginfo -d /mnt/Solaris_10/Product | nawk '$2 ~ /zone/ || $2 ~ /pool$/ {print $0}'
application SUNWluzone                       Live Upgrade (zones support)
system      SUNWpool                         Resource Pools
system      SUNWzoner                        Solaris Zones (Root)
system      SUNWzoneu                        Solaris Zones (Usr)
# yes | pkgadd -d /mnt/Solaris_10/Product SUNWluzone SUNWzoner SUNWzoneu SUNWpool
[...]
# umount /mnt
# lofiadm -d /dev/lofi/1
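
To double-check that the zone-related packages are now present on the live system, something like this should do:

# pkginfo SUNWluzone SUNWpool SUNWzoner SUNWzoneu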

With those packages in place, things should now be OK:

# ./installcluster -B s10u9 --s10cluster
Setup ...
CPU OS Cluster 2010/10 Solaris 10 SPARC (2010.10.06)
Application of patches started : 2011.02.07 11:17:08

Applying 120900-04 (  1 of 198) ... skipped
[...]
Installation of patch set to alternate boot environment complete.

Please remember to activate boot environment s10u9 with luactivate(1M)
before rebooting.
Install log files written :
  /.alt.s10u9/var/sadm/install_data/s10s_rec_cluster_short_2011.02.07_11.17.08.log
  /.alt.s10u9/var/sadm/install_data/s10s_rec_cluster_verbose_2011.02.07_11.17.08.log

And it is... The question remains: why is the Zones feature mandatory in this case?

Monday 10 January 2011

Solaris 11 Express: Problem #3

In this series, I report the bugs and problems I find while running the Oracle Solaris 11 Express distribution. I hope this will give these issues more visibility at Oracle, so they can be corrected before the release of Solaris 11 next year.

I recently switched from the official Oracle release repository to the support repository for Solaris 11 Express. Before the switch, one non-global zone had been created. Since there were some updates in this repository, I ran pkg update, rebooted into the new boot environment, and tried to update the non-global zone:

# beadm list                                                                         
BE        Active Mountpoint Space Policy Created          
--        ------ ---------- ----- ------ -------          
solaris   -      -          9.88M static 2010-12-01 09:32 
solaris-1 NR     /          5.44G static 2011-01-03 19:35 

# zoneadm list -vc                                                                   
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              ipkg     shared
   - zone1            installed  /dpool/store/zone/zone1        ipkg     shared

# zoneadm -z zone1 detach

# zoneadm -z zone1 attach -u
Log File: /var/tmp/zone1.attach_log.lfa49e
Attaching...

preferred global publisher: solaris
       Global zone version: entire@0.5.11,5.11-0.151.0.1.1:20101222T214417Z
   Non-Global zone version: entire@0.5.11,5.11-0.151.0.1:20101105T054056Z

                     Cache: Using /var/pkg/download.
  Updating non-global zone: Output follows
Creating Plan                          
ERROR: Could not update attaching zone
                    Result: Attach Failed.

# cat /var/tmp/zone1.attach_log.lfa49e
[Monday, January  3, 2011 08:42:24 PM CET] Log File: /var/tmp/zone1.attach_log.lfa49e
[Monday, January  3, 2011 08:42:25 PM CET] Attaching...
[Monday, January  3, 2011 08:42:25 PM CET] existing
[Monday, January  3, 2011 08:42:25 PM CET] 
[Monday, January  3, 2011 08:42:25 PM CET]   Sanity Check: Passed.  Looks like an OpenSolaris system.
[Monday, January  3, 2011 08:42:31 PM CET] preferred global publisher: solaris
[Monday, January  3, 2011 08:42:32 PM CET]        Global zone version: entire@0.5.11,5.11-0.151.0.1.1:20101222T214417Z
[Monday, January  3, 2011 08:42:32 PM CET]    Non-Global zone version: entire@0.5.11,5.11-0.151.0.1:20101105T054056Z

[Monday, January  3, 2011 08:42:32 PM CET]                      Cache: Using /var/pkg/download.
[Monday, January  3, 2011 08:42:32 PM CET]   Updating non-global zone: Output follows
pkg set-publisher: 
Unable to locate certificate '/dpool/store/zone/zone1/root/dpool/store/zone/zone1/root/var/pkg/ssl/Oracle_Solaris_11_Express_Support.certificate.pem' needed to access 'https://pkg.oracle.com/solaris/support/'.
pkg unset-publisher: 
Removal failed for 'za23954': The preferred publisher cannot be removed.

pkg: The following pattern(s) did not match any packages in the current catalog.
Try relaxing the pattern, refreshing and/or examining the catalogs:
        entire@0.5.11,5.11-0.151.0.1.1:20101222T214417Z
[Monday, January  3, 2011 08:44:04 PM CET] ERROR: Could not update attaching zone
[Monday, January  3, 2011 08:44:06 PM CET]                     Result: Attach Failed.

FYI, this problem was covered by Bug ID 13000, but it is still present at this time, at least on Solaris 11 Express 2010.11.
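
Until it is fixed, one avenue that might be worth trying (a rough, untested sketch; the zone path and certificate locations below are simply the ones from the log above) is to reset the solaris publisher directly inside the zone image, so that the attach code no longer ends up with the doubled certificate path:

# pkg -R /dpool/store/zone/zone1/root set-publisher \
    -k /var/pkg/ssl/Oracle_Solaris_11_Express_Support.key.pem \
    -c /var/pkg/ssl/Oracle_Solaris_11_Express_Support.certificate.pem \
    -O https://pkg.oracle.com/solaris/support/ solaris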

So, it seems that the change of repository for the solaris publisher is not handled well by the non-global zone update mechanism. Just to be sure, I tried to create a new non-global zone in the new boot environment, but the problem shows up in this case, too:

# zoneadm -z zone2 install
A ZFS file system has been created for this zone.
   Publisher: Using solaris (https://pkg.oracle.com/solaris/support/ ).
   Publisher: Using opensolaris.org (http://pkg.opensolaris.org/dev/).
       Image: Preparing at /dpool/store/zone/zone2/root.
 Credentials: Propagating Oracle_Solaris_11_Express_Support.key.pem
 Credentials: Propagating Oracle_Solaris_11_Express_Support.certificate.pem
Traceback (most recent call last):
  File "/usr/bin/pkg", line 4225, in handle_errors
    __ret = func(*args, **kwargs)
  File "/usr/bin/pkg", line 4156, in main_func
    ret = image_create(pargs)
  File "/usr/bin/pkg", line 3836, in image_create
    variants=variants, props=set_props)
  File "/usr/lib/python2.6/vendor-packages/pkg/client/api.py", line 3205, in image_create
    uri=origins[0])
TypeError: 'set' object does not support indexing

pkg: This is an internal error.  Please let the developers know about this
problem by filing a bug at http://defect.opensolaris.org and including the
above traceback and this message.  The version of pkg(5) is '052adf36c3f4'.
ERROR: failed to create image

FYI, this problem is covered by the Bug ID number 17653.

Well, no luck here. I have not seen a Solaris IPS update for these problems yet, and they are very annoying, to say the least.

Update #1 (2011-02-03): The problem is now fixed in the latest support pkg repository. Installing or attaching a non-global ipkg-branded zone now works as expected:

# zoneadm -z zone1 install
A ZFS file system has been created for this zone.
   Publisher: Using solaris (https://pkg.oracle.com/solaris/support/ ).
   Publisher: Using opensolaris.org (http://pkg.opensolaris.org/dev/).
   Publisher: Using sunfreeware (http://pkg.sunfreeware.com:9000/).
       Image: Preparing at /dpool/export/zone/zone1/root.
 Credentials: Propagating Oracle_Solaris_11_Express_Support.key.pem
 Credentials: Propagating Oracle_Solaris_11_Express_Support.certificate.pem
       Cache: Using /var/pkg/download. 
Sanity Check: Looking for 'entire' incorporation.
  Installing: Core System (output follows)
               Packages to install:     1
           Create boot environment:    No
[...]
        Note: Man pages can be obtained by installing SUNWman
 Postinstall: Copying SMF seed repository ... done.
 Postinstall: Applying workarounds.
        Done: Installation completed in 332.525 seconds.

  Next Steps: Boot the zone, then log into the zone console (zlogin -C)
              to complete the configuration process.

And:

# zoneadm -z zone1 detach
# zoneadm -z zone1 attach -u
Log File: /var/tmp/zone1.attach_log.PhaOKf
Attaching...

preferred global publisher: solaris
       Global zone version: entire@0.5.11,5.11-0.151.0.1.2:20110127T225841Z
   Non-Global zone version: entire@0.5.11,5.11-0.151.0.1.2:20110127T225841Z

                     Cache: Using /var/pkg/download.
  Updating non-global zone: Output follows
No updates necessary for this image.   
  Updating non-global zone: Zone updated.
                    Result: Attach Succeeded.
# zoneadm list -vc
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              ipkg     shared
   - zone1            installed  /dpool/export/zone/zone1       ipkg     shared

Monday 3 January 2011

Solaris 11 Express: Problem #2

In this series, I report the bugs and problems I find while running the Oracle Solaris 11 Express distribution. I hope this will give these issues more visibility at Oracle, so they can be corrected before the release of Solaris 11 next year.

For some customers, I used to clone non-global zones from a template zone. In order to save some space, I generally take advantage of the ability to use an existing ZFS snapshot as the source of the clone, which avoids creating a new snapshot each time a clone is created.

It seems that this capability is not usable anymore on Solaris 11 Express at this time:

# zoneadm -z zone3 clone -s dpool/store/zone/zone1/ROOT/zbe@zone2_snap zone1
/usr/lib/brand/ipkg/clone: -s: unknown option

Nevertheless, this functionality is still described in the manual page:

brand-specific usage: clone {sourcezone}
usage: clone [-m method] [-s <zfs snapshot>] [brand-specific args] zonename
        Clone the installation of another zone.  The -m option can be used to
        specify 'copy' which forces a copy of the source zone.  The -s option
        can be used to specify the name of a ZFS snapshot that was taken from
        a previous clone command.  The snapshot will be used as the source
        instead of creating a new ZFS snapshot.  All other arguments are
        passed to the brand clone function; see brands(5) for more information.

No luck here. Even if the space consideration can be mitigated by the ZFS deduplication feature in Solaris 11 Express, that is not always appropriate or usable: on small servers, for example.
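
In the meantime, the only fallback seems to be a plain clone, letting zoneadm take its own new snapshot each time (which is precisely the extra space the -s option used to save):

# zoneadm -z zone3 clone zone1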

FYI, this problem is covered by Bug ID 6383119. Note that you can add yourself to the interest list at the bottom of the bug report page.

Sunday 29 August 2010

Apropos Solaris

John Fowler (Oracle Executive Vice President for Server and Storage Systems) held an on-line webcast on August 10 about the strategy for SPARC- and x86-based servers, and the formalization of the upcoming release of Solaris 11 in 2011.

This post only aims at summarizing the main points; the complete slides of the presentation are available on the Oracle web site.

  1. Message #1: SPARC is alive and will continue. Solaris is alive and will continue. Both actively.
  2. Message #2: What is interesting here is that these are not mere intentions: it is a real roadmap covering up to five years for the well-known ex-Sun products. Oracle clearly has strong plans for Solaris and the SPARC and x86 platforms, and has just begun to speak publicly about them. We will probably see more about all of this at Oracle OpenWorld in a few weeks.

The points are:

  • A roadmap for SPARC and Solaris up to 2015.
  • SPARC performance is planned to double every two years, reaching by 2015:
    • Cores: 128 (32 in 2010).
    • Threads: 16384 (512 in 2010).
    • Memory capacity: 64TB (4TB in 2010).
    • Logical Domains: 256 (128 in 2010).
    • Java Ops per second: 50000 (5000 in 2010).
  • Very SPARC oriented: it seems that there will only be one SPARC brand at the end of 2015.
  • Two big families of SPARC servers: lots of threads (the T-Series) and lots of sockets (the M-Series).
  • At least one update to Solaris 10 around 2010Q3, a beta program for Solaris 11 known as Solaris 11 Express due in late 2010, then Solaris 11 due in 2011 and evolving up to 2015.

Solaris 11 will be based on the now-closed OpenSolaris distribution, and will include:

  • Image Packaging System (IPS): totally new packaging system fully integrated with ZFS and Boot Environment Administration (aimed at replacing Live Upgrade).
  • Crossbow network virtualization stack.
  • ZFS de-duplication, and lots of recent optimizations and functionalities.
  • CIFS file services: an in-kernel implementation of CIFS.
  • Enhanced Gnome user environment.
  • Updated installer and automated network installer ("AI", aimed at replacing JumpStart).
  • Network Automagic configuration.
  • And many more (I heard Solaris 10 BrandZ...).

Sunday 2 May 2010

Live Upgrading When Diagnostics Mode Is Enabled

Recently, we faced an interesting problem when using Live Upgrade on some of our SPARC servers (with lots of non-global zones hosted on SAN devices). Here are the basic steps we generally follow when using LU (a rough command sketch follows the list):

  1. Update the Live Upgrade functionality according to the Article ID #1004881.1, Solaris Live Upgrade Software: Patch Requirements.
  2. Create the ABE.
  3. Upgrade the ABE with an operating system image (and test the upgrade according to a JumpStart profile).
  4. Apply a determined Recommended Patch Cluster to the ABE.
  5. Activate the ABE to be the next booted BE.
  6. Reboot onto the new BE, and perform post-configuration steps if necessary.
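
For reference, steps 2 to 6 roughly translate into the following commands (the device comes from the example below, but the BE name and the image and patch paths are illustrative only):

# lucreate -n s10u8 -m /:/dev/dsk/c1t0d0s4:ufs
# luupgrade -u -n s10u8 -s /net/jumpstart/export/media/os/s10u8
# luupgrade -t -n s10u8 -s /var/tmp/10_Recommended `cat /var/tmp/10_Recommended/patch_order`
# luactivate s10u8
# init 6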

In some circumstances, even though all the steps went pretty well and the activation of the new BE was OK (we traced its activities), we ended up rebooting on the old BE:

# lustatus
Boot Environment     Is       Active Active    Can    Copy
Name                 Complete Now    On Reboot Delete Status
-------------------- -------- ------ --------- ------ -------
s10u4                yes      yes    yes       no     -
s10u8                yes      no     no        yes    -
# lucurr
s10u4
# luactivate -n s10u8
[...]
# lustatus
Boot Environment     Is       Active Active    Can    Copy
Name                 Complete Now    On Reboot Delete Status
-------------------- -------- ------ --------- ------ -------
s10u4                yes      yes    no        no     -
s10u8                yes      no     yes       no     -
# shutdown -y -g 0 -i 6
[...]
# lucurr
s10u4

Ouch. After a bit of digging, and seeing nothing wrong from the console via the Service Processor, we found the following message in the log of the SMF legacy script run by LU at reboot (at shutdown time, more precisely):

# cat /var/svc/log/rc6.log
[...]
Executing legacy init script "/etc/rc0.d/K62lu".
Live Upgrade: Deactivating current boot environment <s10u4>.
zlogin: login allowed only to running zones (zonename1 is 'installed').
zlogin: login allowed only to running zones (zonename2 is 'installed').
Live Upgrade: Executing Stop procedures for boot environment <s10u4>.
Live Upgrade: Current boot environment is <s10u4>.
Live Upgrade: New boot environment will be <s10u8>.
Live Upgrade: Activating boot environment <s10u8>.
Creating boot_archive for /.alt.tmp.b-9Tb.mnt
updating /.alt.tmp.b-9Tb.mnt/platform/sun4v/boot_archive
Live Upgrade: The boot device for boot environment <s10u8> is
</dev/dsk/c1t0d0s4>.
/etc/lib/lu/lubootdev: ERROR: Unable to get current boot devices.
/etc/lib/lu/lubootdev: INFORMATION: The system is running with the system
boot PROM diagnostics mode enabled. When diagnostics mode is
enabled, Live Upgrade is unable to access the system boot
device list, causing certain features of Live Upgrade (such
as changing the system boot device after activating a boot
environment) to fail. To correct this problem, please run
the system in normal, non-diagnostic mode. The system might
have a key switch or other external means of booting the
system in normal mode. If you do not have such a means, you
can set one or both of the EEPROM parameters 'diag-switch?'
or 'diagnostic-mode?' to 'false'.  After making a change,
either through external means or by changing an EEPROM
parameter, retry the Live Upgrade operation or command.
ERROR: Live Upgrade: Unable to change primary boot device to boot
environment <s10u8>.
ERROR: You must manually change the system boot prom to boot the system
from device </pci@0/pci@0/pci@2/scsi@0/sd@0,0:e>.
Live Upgrade: Activation of boot environment <s10u8> completed.
Legacy init script "/etc/rc0.d/K62lu" exited with return code 0.
[...]

Well, pretty explicit in fact, but very unexpected when the activation had gone so well beforehand. So, let's check the EEPROM and change it back if necessary:

# eeprom diag-switch?
diag-switch?=true
# eeprom diag-switch?=false

And everything returned to normal after activating again and rebooting. Although this case is self-explanatory in the corresponding log file, and is described in Bug ID #6949588, I think it could be made more visible to the system administrator, for example by checking the EEPROM configuration during BE activation (i.e. in the luactivate command).
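
For instance, a trivial check along these lines, run before (or by) luactivate, would have caught it early (a sketch only):

#!/bin/sh
# Refuse to activate a BE while the PROM diagnostics mode is enabled,
# since Live Upgrade cannot update the boot device list in that case.
if [ "`eeprom 'diag-switch?' | cut -d= -f2`" = "true" ]; then
    echo "WARNING: diag-switch? is true, luactivate will not update the boot device" >&2
    exit 1
fi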

Monday 26 April 2010

Problem Starting OCSSD In A Non-Global Zone

If you are not able to start the Oracle Cluster Synchronization Services Daemon (OCSSD) in a non-global zone on Solaris 10, I bet you are running Oracle 10.2.0.3 or higher. In this case, you will see something similar to this in the /var/adm/messages file, but nothing comes up:

Apr 26 10:39:51 zonename oracle: [ID 702911 user.error] Oracle Cluster Synchronization Service starting by user request.
Apr 26 10:39:52 zonename root: [ID 702911 user.error] Cluster Ready Services completed waiting on dependencies.

Tracing the ocssd.bin process during start-up gives something similar to:

[...]
12564:   0.0803 setrlimit(RLIMIT_CORE, 0xFFFFFFFF7FFFF900)      = 0
12564:   0.0804 priocntlsys(1, 0xFFFFFFFF7FFFF694, 6, 0xFFFFFFFF7FFFF768, 0) Err#1 EPERM [proc_priocntl]
12564:   0.0810 fstat(2, 0xFFFFFFFF7FFFE870)                    = 0
12564:   0.0811 brk(0x100229E80)                                = 0
12564:   0.0813 brk(0x10022DE80)                                = 0
12564:   0.0815 fstat(2, 0xFFFFFFFF7FFFE740)                    = 0
12564:   0.0816 ioctl(2, TCGETA, 0xFFFFFFFF7FFFE7AC)            Err#25 ENOTTY
12564:   0.0818 write(2, " s e t p r i o r i t y :".., 52)      = 52
12564:   0.0821 _exit(100)

So, in this case you just hit a privilege restriction which did not apply with older releases of Oracle. As clearly shown in the truss output, the proc_priocntl privilege is not available for Oracle's use in the non-global zone. A clean solution, available only with Solaris 10 11/06 (U3) and later, is to use the limitpriv configuration property to extend the basic privileges provided by the zones framework.

As stated in the privileges(5) man page:

PRIV_PROC_PRIOCNTL
    Allow a process to elevate its priority above its current level. Allow a
    process to change its scheduling class to any scheduling class, including
    the RT class.

Interestingly, this seems to be exactly the case for the Oracle Cluster Synchronization Services Daemon:

# zonecfg -z zonename set limitpriv=default,proc_priocntl
# zoneadm -z zonename reboot
# zlogin zonename "ps -o class,args -p `pgrep ocssd.bin`"
 CLS COMMAND
  RT /soft/oracle/10.2.0/asm_1/bin/ocssd.bin

OK, everything is fine now.

Thursday 25 March 2010

Prevent A Non-Global Zone Reaching Others

When using non-global zones, network traffic between zones on the same system never leaves the global zone. While this is very interesting performance-wise for multi-tier applications hosted in non-global zones on the same system, it can be a problem when it comes to segregating the different networks used by the different non-global zones.

To my knowledge, IP Filter can be used from the global zone to help in this case. But a cleaner approach is to block (reject) the route between those non-global zones. For example, if one non-global zone has an IP address of addrX, and a second non-global zone has an address of addrY, then the following commands will prevent network traffic from passing between the two zones:

# route add addrX addrY -interface -reject
# route add addrY addrX -interface -reject

The problem is, when there are a lot of non-global zones to segregate, you need two reject routes per pair of zones, i.e. n*(n-1) routes in total, which already represents 20 routes for only 5 non-global zones... Not very scalable, and hardly manageable. If someone knows a better solution, please feel free to comment on this post.
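
In the meantime, the creation of those routes can at least be scripted, for example along these lines (a sketch; the address list is a placeholder):

#!/bin/sh
# Add a reject route between every ordered pair of non-global zone addresses.
ZONE_ADDRS="addrX addrY addrZ"
for a in $ZONE_ADDRS; do
    for b in $ZONE_ADDRS; do
        [ "$a" != "$b" ] && route add $a $b -interface -reject
    done
done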

Monday 6 October 2008

Fake The hostid Of A Solaris Zone, Updated

As a little follow-up to Fake The hostid Of A Solaris Zone, and regarding the discussion about the ability to change the hostid of a Solaris non-global zone, it is interesting to mention the following updated information:

  1. The LD_PRELOAD trick proposed before is not a proper option, and is really ugly (and intrusive if you don't unset it before continuing the execution of a program).
  2. When using Solaris 8 or Solaris 9 Containers, there is a feature called Host ID Emulation, available through the zonecfg utility, which can do exactly that.
  3. Before the introduction of configurable privileges for non-global zones in Solaris 10 11/06 (a.k.a. Solaris 10 Update 3), the DTrace zhostid script (daemon) had to be run from the global zone. This is not mandatory anymore: with only the appropriate dtrace_user privilege, it can be run directly from the non-global zone:
    # zonecfg -z ngzone set limitpriv=default,dtrace_user
    # zoneadm -z ngzone boot
    # zlogin ngzone
    [Connected to zone 'ngzone' pts/5]
    Last login: Sat Oct  4 18:57:17 on pts/5
    Sun Microsystems Inc.   SunOS 5.11      snv_99  November 2008
    # /sbin/zonename 
    ngzone
    # /usr/bin/hostid
    837d47dd
    # ./zhostid &
    [1] 21506
    # /usr/bin/hostid
    20a82f32
    # ^D
    [Connection to zone 'ngzone' pts/5 closed]
    

Tuesday 16 October 2007

Error While Patching A New Boot Environment

After creating a new boot environment (BE) named beastie (see below) to upgrade a system running Solaris 8 to Solaris 10 11/06, as I had done many times in the past without a hiccup, I encountered a problem when I tried to apply the appropriate Recommended patch cluster to the new BE, with this message:

# luupgrade -t -n beastie -s /var/tmp/10_Recommended

Validating the contents of the media .
The media contains 76 software patches that can be added.
All 76 patches will be added because you did not specify any specific
patches to add.
Mounting the BE <beastie>.
ERROR: The boot environment <beastie> supports non-global
zones. The current boot environment does not support non-global zones.
Releases prior to Solaris 10 cannot be used to maintain Solaris 10 and
later releases that include support for non-global zones. You may only
execute the specified operation on a system with Solaris 10 (or later)
installed.

I couldn't find any reference to a known bug or problem after searching SunSolve, Sun Support, and Google. Has anyone already seen this error, and solved it The Right Way? As for me, I had to boot the new BE and apply the cluster patch from there: this was a pain, since this bundle includes the -36 kernel patch, which is known to be relatively disruptive and requires two reboots to apply the entire cluster.
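
For the record, the workaround boiled down to something like this (a sketch of the idea only; the exact patchadd invocation may differ, and a second pass was needed after the kernel-patch reboot):

# luactivate beastie
# init 6
[...]
# cd /var/tmp/10_Recommended
# patchadd -M . patch_order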

Update #1 (2009-03-22): Seems to be explained in this excellent BigAdmin article.
