Recently, we faced an interesting problem when using Live Upgrade on some of our SPARC servers (with lots of non-global zones hosted on SAN devices). Here are the basic steps we generally follow when using LU:
- Update the Live Upgrade software according to Article ID #1004881.1, Solaris Live Upgrade Software: Patch Requirements.
- Create the ABE.
- Upgrade the ABE with an operating system image (and test the upgrade according to a JumpStart profile).
- Apply a specific Recommended Patch Cluster to the ABE.
- Activate the ABE to be the next booted BE.
- Reboot on the new BE and, if needed, carry out post-configuration steps.
In some circumstances, even though every step went well and the activation of the new BE succeeded (we traced its activity), the system rebooted on the old BE:
    # lustatus
    Boot Environment     Is       Active Active    Can    Copy
    Name                 Complete Now    On Reboot Delete Status
    -------------------- -------- ------ --------- ------ -------
    s10u4                yes      yes    yes       no     -
    s10u8                yes      no     no        yes    -
    # lucurr
    s10u4
    # luactivate -n s10u8
    [...]
    # lustatus
    Boot Environment     Is       Active Active    Can    Copy
    Name                 Complete Now    On Reboot Delete Status
    -------------------- -------- ------ --------- ------ -------
    s10u4                yes      yes    no        no     -
    s10u8                yes      no     yes       no     -
    # shutdown -y -g 0 -i 6
    [...]
    # lucurr
    s10u4
Ouch. After a bit of digging, and seeing nothing wrong on the console via the Service Processor, we found the following message in the log of the legacy init script that SMF runs for LU at reboot (more precisely, at shutdown time):
    # cat /var/svc/log/rc6.log
    [...]
    Executing legacy init script "/etc/rc0.d/K62lu".
    Live Upgrade: Deactivating current boot environment <s10u4>.
    zlogin: login allowed only to running zones (zonename1 is 'installed').
    zlogin: login allowed only to running zones (zonename2 is 'installed').
    Live Upgrade: Executing Stop procedures for boot environment <s10u4>.
    Live Upgrade: Current boot environment is <s10u4>.
    Live Upgrade: New boot environment will be <s10u8>.
    Live Upgrade: Activating boot environment <s10u8>.
    Creating boot_archive for /.alt.tmp.b-9Tb.mnt
    updating /.alt.tmp.b-9Tb.mnt/platform/sun4v/boot_archive
    Live Upgrade: The boot device for boot environment <s10u8> is </dev/dsk/c1t0d0s4>.
    /etc/lib/lu/lubootdev: ERROR: Unable to get current boot devices.
    /etc/lib/lu/lubootdev: INFORMATION: The system is running with the system
    boot PROM diagnostics mode enabled. When diagnostics mode is enabled, Live
    Upgrade is unable to access the system boot device list, causing certain
    features of Live Upgrade (such as changing the system boot device after
    activating a boot environment) to fail. To correct this problem, please run
    the system in normal, non-diagnostic mode. The system might have a key
    switch or other external means of booting the system in normal mode. If you
    do not have such a means, you can set one or both of the EEPROM parameters
    'diag-switch?' or 'diagnostic-mode?' to 'false'. After making a change,
    either through external means or by changing an EEPROM parameter, retry the
    Live Upgrade operation or command.
    ERROR: Live Upgrade: Unable to change primary boot device to boot environment <s10u8>.
    ERROR: You must manually change the system boot prom to boot the system from device </pci@0/pci@0/pci@2/scsi@0/sd@0,0:e>.
    Live Upgrade: Activation of boot environment <s10u8> completed.
    Legacy init script "/etc/rc0.d/K62lu" exited with return code 0.
    [...]
Well, pretty explicit in fact, but very unexpected given that the activation had gone so well beforehand. So, check the EEPROM setting, and change it back if necessary:
    # eeprom diag-switch?
    diag-switch?=true
    # eeprom diag-switch?=false
After activating the BE again and rebooting, everything was back to normal.
Although this case is self-explanatory from the corresponding log file, and is described in Bug ID #6949588, I think it could be made more visible to the system administrator, for example by checking the EEPROM configuration in the BE activation code (in the luactivate command).
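
Until something like that exists, a small wrapper can approximate the idea. The following is only a sketch, not part of Live Upgrade: the function names diag_mode_enabled and safe_luactivate are made up for illustration, and it assumes that eeprom prints each parameter as name=value (as in the output above) and that luactivate is invoked with -n as earlier in this post.

```shell
#!/bin/sh
# Hypothetical pre-activation check; not part of Live Upgrade.

# Read eeprom-style "name=value" lines on stdin.
# Return 0 if either diagnostics parameter is set to true, 1 otherwise.
diag_mode_enabled() {
    while read line; do
        case "$line" in
            'diag-switch?=true'|'diagnostic-mode?=true') return 0 ;;
        esac
    done
    return 1
}

# Refuse to activate a BE while PROM diagnostics mode is enabled.
# $1 is the BE name. The luactivate call mirrors the one used in this post.
safe_luactivate() {
    be_name=$1
    if eeprom 'diag-switch?' 'diagnostic-mode?' 2>/dev/null | diag_mode_enabled; then
        echo "PROM diagnostics mode is enabled: the boot device cannot be changed." >&2
        echo "Set diag-switch? and diagnostic-mode? to false, then retry." >&2
        return 1
    fi
    luactivate -n "$be_name"
}
```

After sourcing the file, safe_luactivate s10u8 would stop before activation when diag-switch?=true is reported, instead of the failure showing up only in rc6.log at shutdown time. A parameter that the platform does not provide simply produces no matching line and is ignored.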
