
Recently, we faced an interesting problem when using Live Upgrade on some of our SPARC servers (with lots of non-global zones hosted on SAN devices). Here are the basic steps we generally follow when using LU:

  1. Update the Live Upgrade functionality according to Article ID #1004881.1, Solaris Live Upgrade Software: Patch Requirements.
  2. Create the ABE.
  3. Upgrade the ABE with an operating system image (and test the upgrade according to a JumpStart profile).
  4. Apply a given Recommended Patch Cluster to the ABE.
  5. Activate the ABE so that it becomes the next booted BE.
  6. Reboot into the new BE, then carry out any post-configuration steps.
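
For readers less familiar with LU, steps 2 to 6 could be sketched roughly as follows. This is only a dry-run outline, not our actual procedure: the function names `run` and `lu_workflow`, the image and patch paths, and the UFS slice layout are assumptions made for illustration, and the exact options should be checked against the lucreate(1M), luupgrade(1M) and luactivate(1M) manual pages of your release. By default the commands are only printed.

```shell
#!/bin/sh
# Dry-run sketch of the Live Upgrade steps listed above.
# Commands are only echoed unless DRY_RUN is set to 0.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# lu_workflow BE_NAME ROOT_SLICE OS_IMAGE PATCH_DIR
# All four arguments are site-specific; the values used in the
# usage note below are hypothetical.
lu_workflow() {
    be=$1        # name of the alternate boot environment (ABE)
    slice=$2     # disk slice that will hold its root file system
    image=$3     # path to the operating system image
    patches=$4   # directory containing the unpacked patch cluster

    run lucreate -n "$be" -m "/:$slice:ufs" &&        # step 2: create the ABE
    run luupgrade -u -n "$be" -s "$image" &&          # step 3: upgrade the ABE
    run luupgrade -t -n "$be" -s "$patches" &&        # step 4: patch the ABE
    run luactivate "$be" &&                           # step 5: activate the ABE
    run shutdown -y -g 0 -i 6                         # step 6: reboot
}
```

For example, `lu_workflow s10u8 /dev/dsk/c1t0d0s4 /net/install/s10u8 /var/tmp/patches` prints the five commands in order without touching the system.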

In some circumstances, even though every step went well, including the activation of the new BE (we traced its activity), the system still rebooted into the old BE:

# lustatus
Boot Environment     Is       Active Active    Can    Copy
Name                 Complete Now    On Reboot Delete Status
s10u4                yes      yes    no        no     -
s10u8                yes      no     yes       no     -
# shutdown -y -g 0 -i 6
[...]
# lucurr
s10u4
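
This kind of mismatch is easy to miss when many servers are upgraded in a row, so it may be worth checking it automatically. The following is a minimal sketch, assuming the `lustatus` column layout shown above (fourth column is "Active On Reboot"); the helper names `next_boot_be` and `check_booted_be` are hypothetical.

```shell
#!/bin/sh
# next_boot_be: read lustatus output on stdin and print the name of the
# BE whose "Active On Reboot" column (4th field) is "yes".
next_boot_be() {
    awk '$4 == "yes" && NF >= 5 { print $1 }'
}

# check_booted_be EXPECTED ACTUAL
# Compare the BE that was expected after the reboot with the one
# actually running (as reported by lucurr).
check_booted_be() {
    if [ "$1" = "$2" ]; then
        echo "OK: running on $2"
    else
        echo "MISMATCH: expected $1, running on $2"
        return 1
    fi
}

# Possible usage (the state file path is an assumption):
#   lustatus | next_boot_be > /var/tmp/expected_be      # before the reboot
#   check_booted_be "$(cat /var/tmp/expected_be)" "$(lucurr)"   # after it
```

With the output above, `next_boot_be` would report s10u8 before the reboot, and the comparison against `lucurr` afterwards would flag the mismatch.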

Ouch. After a bit of digging, and seeing nothing wrong on the console via the Service Processor, we found the following message in the log of the legacy init script that LU runs at reboot (at shutdown time, to be precise):

# cat /var/svc/log/rc6.log
[...]
Executing legacy init script "/etc/rc0.d/K62lu".
Live Upgrade: Deactivating current boot environment <s10u4>.
zlogin: login allowed only to running zones (zonename1 is 'installed').
zlogin: login allowed only to running zones (zonename2 is 'installed').
Live Upgrade: Executing Stop procedures for boot environment <s10u4>.
Live Upgrade: Current boot environment is <s10u4>.
Live Upgrade: New boot environment will be <s10u8>.
Live Upgrade: Activating boot environment <s10u8>.
Creating boot_archive for /.alt.tmp.b-9Tb.mnt
updating /.alt.tmp.b-9Tb.mnt/platform/sun4v/boot_archive
Live Upgrade: The boot device for boot environment <s10u8> is
</dev/dsk/c1t0d0s4>.
/etc/lib/lu/lubootdev: ERROR: Unable to get current boot devices.
/etc/lib/lu/lubootdev: INFORMATION: The system is running with the system
boot PROM diagnostics mode enabled. When diagnostics mode is
enabled, Live Upgrade is unable to access the system boot
device list, causing certain features of Live Upgrade (such
as changing the system boot device after activating a boot
environment) to fail. To correct this problem, please run
the system in normal, non-diagnostic mode. The system might
have a key switch or other external means of booting the
system in normal mode. If you do not have such a means, you
can set one or both of the EEPROM parameters 'diag-switch?'
or 'diagnostic-mode?' to 'false'.  After making a change,
either through external means or by changing an EEPROM
parameter, retry the Live Upgrade operation or command.
ERROR: Live Upgrade: Unable to change primary boot device to boot
environment <s10u8>.
ERROR: You must manually change the system boot prom to boot the system
from device </pci@0/pci@0/pci@2/scsi@0/sd@0,0:e>.
Live Upgrade: Activation of boot environment <s10u8> completed.
Legacy init script "/etc/rc0.d/K62lu" exited with return code 0.
[...]
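
Note that the legacy script exited with return code 0 despite the errors, so its exit status is not a usable signal. One possible safeguard is to scan the log for ERROR lines after each reboot. This is a small sketch only; the function name `lu_shutdown_errors` is hypothetical, and the default path is simply the log file shown above.

```shell
#!/bin/sh
# lu_shutdown_errors [LOGFILE]
# Print every line containing "ERROR:" from the given log
# (default: the rc6 log shown above). Returns 0 if at least one
# such line was found, non-zero otherwise.
lu_shutdown_errors() {
    grep 'ERROR:' "${1:-/var/svc/log/rc6.log}"
}

# Possible usage after a reboot:
#   if lu_shutdown_errors; then
#       echo "Live Upgrade reported errors at shutdown time" >&2
#   fi
```

The log is appended to at each shutdown, so older entries may show up as well; the sketch does not try to isolate the latest run.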

Pretty explicit, in fact, but very unexpected given that the activation had gone so well beforehand. So, check the EEPROM setting, and change it back if necessary:

# eeprom diag-switch?
diag-switch?=true
# eeprom diag-switch?=false

Everything returned to normal after activating again and rebooting. Although this case is self-explanatory from the corresponding log file, and is described in Bug ID #6949588, I think it should be made more visible to the system administrator, for example by checking the EEPROM configuration at BE activation time (in the luactivate command).
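
Until something like that exists in LU itself, a site-local wrapper can approximate the check. The sketch below is not part of Live Upgrade: `diag_mode_enabled` and `safe_luactivate` are hypothetical names, and it assumes that `eeprom` prints its parameters as `name=value` lines, as in the output shown earlier. It looks at the two parameters named in the lubootdev message, 'diag-switch?' and 'diagnostic-mode?'.

```shell
#!/bin/sh
# diag_mode_enabled: read "name=value" lines (eeprom output) on stdin.
# Returns 0 if diag-switch? or diagnostic-mode? is set to true,
# 1 otherwise (including when neither parameter is present).
diag_mode_enabled() {
    while IFS='=' read -r name value; do
        case "$name" in
            'diag-switch?'|'diagnostic-mode?')
                [ "$value" = "true" ] && return 0
                ;;
        esac
    done
    return 1
}

# safe_luactivate BE_NAME
# Refuse to activate while PROM diagnostics mode is enabled, since the
# boot device change would fail later, at shutdown time.
safe_luactivate() {
    if eeprom | diag_mode_enabled; then
        echo "PROM diagnostics mode is enabled; fix it before luactivate" >&2
        return 1
    fi
    luactivate "$@"
}
```

Calling `safe_luactivate s10u8` instead of `luactivate s10u8` would then fail early, at activation time, rather than silently at shutdown. A key switch or other external means of forcing diagnostics mode, as mentioned in the message, would not be detected by this check.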