blog'o thnet


Monday 10 October 2011

Encrypted SWAP Device Just Disappeared In Solaris 11 EA

For some months, I have been encrypting the SWAP device (which is a ZFS volume) and thus had an encrypted /tmp. This worked fine with Solaris 11 Express, but I encountered a strange behavior in Solaris 11 EA which causes the SWAP device to... well, just disappear.

Here is what I found after two boots, on several machines:

# swap -l
No swap devices configured

# zfs list -t volume
NAME         USED  AVAIL  REFER  MOUNTPOINT
rpool/dump  32.8G   240G  31.8G  -

# grep swap /etc/vfstab
swap            -               /tmp            tmpfs   -       yes     -
/dev/zvol/dsk/rpool/swap        -               -               swap    -       no      encrypted

So, the rpool/swap dataset disappeared. I am sure I did not destroy it myself, especially since this happens on multiple servers. Nevertheless, I found this in the history of the zpool command:

# zpool history | grep destroy
[...]
2011-10-05.10:22:49 zfs destroy rpool/swap

# last reboot | head -2
reboot    system boot                   Wed Oct  5 10:23
reboot    system down                   Wed Oct  5 10:20

So, this problem seems to be related to something that happens at boot time. What do the logs of the SMF services have to say about that?

# find /var/svc/log -print | xargs grep -i swap
/var/svc/log/system-filesystem-usr:default.log:cannot create 'rpool/swap': pool must be upgraded to set this property or value
/var/svc/log/system-filesystem-usr:default.log:cannot open 'rpool/swap': dataset does not exist
/var/svc/log/system-filesystem-usr:default.log:cannot create 'rpool/swap': pool must be upgraded to set this property or value
/var/svc/log/system-filesystem-usr:default.log:cannot open 'rpool/swap': dataset does not exist

# tail -3 /var/svc/log/system-filesystem-usr:default.log
[ Oct  5 12:00:05 Executing start method ("/lib/svc/method/fs-usr"). ]
cannot create 'rpool/swap': pool must be upgraded to set this property or value
[ Oct  5 12:00:13 Method "start" exited with status 0. ]

Ouch, what happened here? The message is interesting but a little misleading: these are fresh Solaris 11 EA installations, so the pools and datasets are all up to date:

# zpool upgrade && zfs upgrade
This system is currently running ZFS pool version 33.
All pools are formatted using this version.
This system is currently running ZFS filesystem version 5.
All filesystems are formatted with the current version.

So, it seems that the rpool/swap device is re-created at boot time and that, for some reason, the re-creation fails. Here is an attempt to discover where the device is re-created and why it fails.

# find /lib/svc/method -print | xargs grep -i sbin/swapadd
/lib/svc/method/fs-usr:/usr/sbin/swapadd -1
/lib/svc/method/nfs-client:     /usr/sbin/swapadd
/lib/svc/method/fs-local:/usr/sbin/swapadd >/dev/null 2>&1

# grep "zfs destroy" /usr/sbin/swapadd
                zfs destroy $zvol > /dev/null 2>&1

# sed -n '/zfs create/,/\$zvol/p' /usr/sbin/swapadd
        zfs create -V $volsize -o volblocksize=`/usr/bin/pagesize` \
            -o primarycache=$primarycache -o secondarycache=$secondarycache \
            -o encryption=$encryption -o keysource=raw,file:///dev/random $zvol

So, rpool/swap is re-created at boot time only when the volume is encrypted. After a bit of digging, here is what I found. At the first boot, this is the command used to create the encrypted volume:

zfs create -V 4G -o volblocksize=8192 -o primarycache=metadata -o secondarycache=all -o encryption=on -o keysource=raw,file:///dev/random rpool/swap

But on the second boot, a slightly different command is used:

zfs create -V 4G -o volblocksize=8192 -o primarycache=metadata -o secondarycache=all -o encryption=aes-128-ctr -o keysource=raw,file:///dev/random rpool/swap

This is because the arguments passed to the command are saved from the volume's settings just before it is destroyed, and then restored. As mentioned in the zfs(1m) manual page, only the following encryption algorithms are supported, so the one that is set is not valid (which makes the error message about the pool needing an upgrade to set this property or value a little clearer).

encryption=off | on | aes-128-ccm | aes-192-ccm | aes-256-ccm | aes-128-gcm | aes-192-gcm | aes-256-gcm
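As a minimal sketch, a value can be checked against that list before it is handed to zfs create. The function name is_supported_encryption and the variable SUPPORTED_ENCRYPTION are hypothetical helpers for illustration, and the list is simply copied from the manual page excerpt above; none of this is part of swapadd.

```shell
#!/bin/sh
# Values copied from the zfs(1m) excerpt above (assumption: the list is complete
# for this release).
SUPPORTED_ENCRYPTION="off on aes-128-ccm aes-192-ccm aes-256-ccm aes-128-gcm aes-192-gcm aes-256-gcm"

# is_supported_encryption VALUE
# Hypothetical helper: returns 0 if VALUE is in the list, 1 otherwise.
is_supported_encryption() {
    for candidate in $SUPPORTED_ENCRYPTION; do
        if [ "$candidate" = "$1" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, is_supported_encryption "$(zfs get -H -o value encryption rpool/swap)" would fail once the property has been switched to aes-128-ctr.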

The question is, how can this happen? Where does this algorithm come from? The answer is simple: it seems that the swap(1m) command itself alters some properties of the rpool/swap volume:

# swap -d /dev/zvol/dsk/rpool/swap
# zfs destroy rpool/swap
# zfs create -V 4G -o volblocksize=8192 -o primarycache=metadata -o secondarycache=all -o encryption=on -o keysource=raw,file:///dev/random rpool/swap
# zfs list -H -o type,volsize,volblocksize,encryption rpool/swap
volume  4G      8K      on
# swap -1 -a /dev/zvol/dsk/rpool/swap
# zfs list -H -o type,volsize,volblocksize,encryption rpool/swap
volume  4G      1M      aes-128-ctr

Not only is the algorithm changed to something not supported (yet?), but the volblocksize property is touched as well. This was not the case on Solaris 11 Express 2010.11.
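To spot this kind of drift on other machines, the two zfs list lines can be compared field by field. This is only a sketch: report_changed_props is a hypothetical helper, and it assumes both lines come from zfs list -H -o type,volsize,volblocksize,encryption, in that column order.

```shell
#!/bin/sh
# report_changed_props BEFORE AFTER
# Hypothetical helper: BEFORE and AFTER are single lines as printed by
#   zfs list -H -o type,volsize,volblocksize,encryption <volume>
# Prints one "property: old -> new" line for each column that differs.
report_changed_props() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        BEGIN { split("type volsize volblocksize encryption", name, " ") }
        NR == 1 { for (i = 1; i <= NF; i++) before[i] = $i; next }
        NR == 2 {
            for (i = 1; i <= NF; i++)
                if ($i != before[i])
                    printf "%s: %s -> %s\n", name[i], before[i], $i
        }'
}
```

Capturing the line before and after swap -1 -a and passing both to the function lists only the properties that were modified.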

I hope someone can help me with this, and that this is a known bug which is already (or will quickly be) addressed, in particular for Solaris 11 GA. I already posted a comment on Darren Moffat's blog, just in case it helps a bit.

Tuesday 1 February 2011

Solaris 11 Express: Problem #5

In this series, I will report the bugs or problems I find when running the Oracle Solaris 11 Express distribution. I hope this will give these PRs more visibility at Oracle, so that they can be corrected before the release of Solaris 11 next year.

A few builds before Oracle stopped providing a binary distribution of Solaris Next (build snv_124, if I recall correctly), virtual consoles were introduced. This is a well-known feature for Linux and BSD people, but Solaris 11 Express is the first supported release of Solaris where we can run multiple virtual terminals on the console.

This long-missing feature is not enabled by default, though. Here are the steps to enable it:

$ pfexec svccfg -s vtdaemon setprop options/secure=false
$ pfexec svccfg -s vtdaemon setprop options/hotkeys=true
$ pfexec svcadm enable vtdaemon
$ pfexec svcadm enable console-login:vt2
$ pfexec svcadm enable console-login:vt3
$ pfexec svcadm enable console-login:vt4
$ pfexec svcadm enable console-login:vt5
$ pfexec svcadm enable console-login:vt6
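The five console-login lines above follow one pattern, so they can be generated instead of typed. As a sketch, the hypothetical helper print_vt_enable_commands only prints the commands, so they can be reviewed before being run:

```shell
#!/bin/sh
# print_vt_enable_commands FIRST LAST
# Hypothetical helper: prints one "svcadm enable" command per console-login
# instance, from vtFIRST to vtLAST. Nothing is executed here.
print_vt_enable_commands() {
    vt=$1
    while [ "$vt" -le "$2" ]; do
        echo "pfexec svcadm enable console-login:vt$vt"
        vt=$((vt + 1))
    done
}
```

Running print_vt_enable_commands 2 6 | sh would then execute the same commands as listed above.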

Using Control-Alt-F1 to Control-Alt-F6, one can now switch to the virtual consoles, and using Control-Alt-F7, one can switch back to the X server. Note: unless the options/secure property is set to false, the X server screen is locked automatically.

Although switching to virtual consoles works as expected, getting back to the X server is not really easy. In my experience, with the options/secure property set to true, Control-Alt-F7 gets me to a new login prompt, bypassing my (still?) logged-in session. With the options/secure property set to false, Control-Alt-F7 leaves me with a black screen and a blinking _ cursor... and nothing I can do without remote access.

FYI, this problem is covered by Bug ID 7001741. Note that you can add yourself to the interest list at the bottom of the bug report page.

Tuesday 13 March 2007

Adding a Specific Logical Interface with Solaris 10

Under Solaris 10, there is no /etc/init.d/inetsvc network script anymore. You must use the new SMF to manage your network services, even special initialization ones. In order to plumb and up a new logical interface, you can now use the service instance named svc:/network/physical:default.

Here is a quick example of how to use it:

# ifconfig -au4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
nge0: flags=201004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4,CoS> mtu 1500 index 2
        inet 192.168.1.101 netmask ffffff00 broadcast 192.168.1.255
        ether 0:e0:81:58:88:ae 
# 
# cat /etc/hostname.nge0:1
192.168.1.50
# 
# svcadm restart network/physical
#
# ifconfig -au4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
nge0: flags=201004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4,CoS> mtu 1500 index 2
        inet 192.168.1.101 netmask ffffff00 broadcast 192.168.1.255
        ether 0:e0:81:58:88:ae 
nge0:1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 192.168.1.50 netmask ffffff00 broadcast 192.168.1.255
# 
# ifconfig nge0:1 down unplumb
#
# ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
nge0: flags=201004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4,CoS> mtu 1500 index 2
        inet 192.168.1.101 netmask ffffff00 broadcast 192.168.1.255
        ether 0:e0:81:58:88:ae
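The only input to the example above is the /etc/hostname.nge0:1 file, which holds the address of the logical interface. As a sketch, a small helper can create such a file. The function write_logical_if and its ROOT argument (useful for trying it outside of /) are hypothetical and not part of Solaris.

```shell
#!/bin/sh
# write_logical_if ROOT IFACE ADDRESS
# Hypothetical helper: writes ADDRESS to ROOT/etc/hostname.IFACE.
# IFACE must be a logical interface name such as nge0:1.
write_logical_if() {
    root=$1
    iface=$2
    addr=$3
    case $iface in
        *:*) ;;
        *) echo "not a logical interface name: $iface" >&2
           return 1 ;;
    esac
    printf '%s\n' "$addr" > "$root/etc/hostname.$iface"
}
```

With ROOT set to an empty string, write_logical_if "" nge0:1 192.168.1.50 writes the file shown above. The svcadm restart network/physical step is still needed afterwards.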

Tuesday 2 January 2007

Remote X11 Display Using the Secure By Default Network Profile

Although it has been present for months in Sun's OpenSolaris distribution, the new Secure By Default Network Profile feature is now available with the 11/06 update of the official Solaris release.

If you didn't get it through a fresh install, you can turn it on using the provided shell script (/usr/sbin/netservices). It all works well, except that in our case (at work) we want to be able to do remote display on X11. So, we use this quickly written script to do so easily:

# cat /tmp/svc.netservices
#!/usr/bin/env sh

echo "Setting the limited SMF configuration..."
netservices limited

echo "Setting the open form of x11-server..."
svccfg -s x11-server setprop options/tcp_listen = boolean: true

echo "Setting the open form of cde-login..."
svccfg -s cde-login setprop dtlogin/args = "\"\""
echo "Refresh the cde-login SMF service..."
svcadm refresh cde-login

echo "Restarting the SMF services..."
svcadm restart cde-login

exit 0

Hope this helps... at least, it will be an online memo for us!