blog'o thnet


Thursday 25 December 2008

More News To Come About Shrinking A zpool

As a small update to an older post on this subject: although this post from Matthew Ahrens is about the new scrub code recently introduced in OpenSolaris build 94 (which was in fact a priority before the launch of the Sun Storage 7000 Unified Storage Systems, a.k.a. Amber Road), it is interesting to note that some of the new code will also be usable to remove a disk from a ZFS pool.

As Matthew wrote:

This work lays a bunch of infrastructure that will be used by the upcoming device removal feature.

Saturday 18 October 2008

Discrepancies Between df And du Outputs

As an SA, it is not uncommon to regularly get requests about big differences between the du and df outputs on a UFS file system. (For ZFS specific considerations, please see the ZFS FAQ.)

The du utility reports the sum of space allocated to all files in the file hierarchy rooted in the directory plus the space allocated to the directory itself. The df utility reports the amount of disk space occupied by a mounted file system.

When a file is removed from the file system, i.e. is unlinked (the hard link count goes to zero), du no longer counts the space belonging to this file, but df keeps reporting it as used until all references to it (open file descriptors) are closed. In order to find the guilty process, one can follow the information found in the SunManagers Frequently Asked Questions. Here is an example of such an investigation, using a slightly different method to find the process currently holding the open descriptor on the deleted file.

Find the file which has been unlinked through the procfs interface:

# find /proc/*/fd \( -type f -a ! -size 0 -a -links 0 \) -print | xargs \ls -li
 415975 --w-------   0 user  group  2125803025 Oct 15 23:59 /proc/1252/fd/3

Optionally, get more detail about the process holding it:

# pargs -c 1252
1252:   rvd.basic -reliability 5 -listen tcp:9876 -logfile /path/to/log/rvd_9876.l
argv[0]: rvd.basic
argv[1]: -reliability
argv[2]: 5
argv[3]: -listen
argv[4]: tcp:9876
argv[5]: -logfile
argv[6]: /path/to/log/rvd_9876.log

Check whether you can make sense of the content of the unlinked file:

# tail /proc/1252/fd/3
-------------------------------------------------------------------------------
2008-10-15 23:59:32.002116 - [MSG] BBG_Transmitter_class.cc, line 792 (thread 25087:4)
[4060] Sent a heartbeat
-------------------------------------------------------------------------------
BBG_Transmitter_class.cc: [4111] No activity detected. Send a Heartbeat message
-------------------------------------------------------------------------------
2008-10-15 23:59:32.134829 - [MSG] BBG_Transmitter_class.cc, line 1138 (thread 25087:4)
[4065] Heartbeat acknowledged by Bloomberg

You can correlate the size of the removed, but still referenced, file with the difference between the space reported by the df and du tools:

# df -k /path/to
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d5       6017990 5874592   83219    99%    /path/to
# du -sk /path/to
3791632 /path/to
# echo "(5874616-3791632)*1024" | bc
2132975616
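
For the record, the same arithmetic can be done by the shell itself. This is only a sketch: the hidden_bytes function name is mine (it is not a standard tool), and it expects the two kilobyte figures as printed by df -k and du -sk.

```shell
#!/bin/sh
# hidden_bytes USED_KB DU_KB
# Hypothetical helper: space (in bytes) that df counts as used
# but that du cannot reach through the directory tree.
hidden_bytes() {
  echo $(( ($1 - $2) * 1024 ))
}

# Same figures as in the bc calculation above; prints 2132975616.
hidden_bytes 5874616 3791632
```

The result is close to the 2125803025 bytes reported for /proc/1252/fd/3, which is what points at this file as the culprit.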

So, we have now found the ~2GB log file which was still held open (used) by a process. There are two ways to get the space back:

  1. Truncate the unlinked file (quick workaround).
  2. Cleanly restart the corresponding program (better option).

Use the solution which best fits the needs of your environment.
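
The truncate workaround can be sketched as follows. This is only a sketch under assumptions: the truncate_unlinked name is mine, and it assumes that your platform's /proc lets you open /proc/PID/fd/FD for writing and that you have write permission on the file.

```shell
#!/bin/sh
# truncate_unlinked PID FD
# Hypothetical helper: empty the file behind an open descriptor through
# procfs, without restarting the process that holds it.
# Assumption: /proc/PID/fd/FD can be opened for writing on this platform.
truncate_unlinked() {
  : > "/proc/$1/fd/$2"
}

# Example with the values from the session above (not run here):
#   truncate_unlinked 1252 3
```

Keep in mind that if the process did not open its log file in append mode, it will keep writing at its previous offset, so the file becomes sparse: the blocks are freed, but the apparent size grows back.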

Tuesday 16 September 2008

Announcing Solaris 10 10/08

Although this may sound a little overconfident, the long-awaited Update 6 of the Solaris 10 operating system is just around the corner. This release (scheduled to be available in mid-October) will include virtualization enhancements, including the ability for a Solaris Container to automatically update its environment when moved from one system to another, Logical Domains support for dynamically reconfigurable disk and network I/O, and paravirtualization support when Solaris 10 is used as a guest OS in Xen-based environments such as Sun xVM Server. Solaris 10 10/08 also includes support for the latest systems from Sun and other vendors, such as those based on the Intel Xeon Processor 7400 Series.

This release will be the very first Solaris release able to boot from ZFS and use it as its root file system, as is already possible on OpenSolaris or Solaris Express Community Release today.

Check the What’s New web page for Solaris, and consult the Solaris Media Gallery videos for more information.

Update #1 (2008-10-14): Don't forget to consult the must-read What's New in Solaris 10 10/08? from the San Antonio OpenSolaris User Group.

Update #2 (2008-10-31): Get yours, and go read the official What's New in the Solaris 10 10/08 Release page.

Friday 30 March 2007

ZFS Recent News

Well. Rather than a real blog entry, this post is about keeping up with some recent additions in the ZFS area. First, you can read the ZFS Overview and Guide just published on BigAdmin. Second, you must watch the excellent Thumper do it yourself, which is a very nice showcase of ZFS use. Third, a great listing of recent additions put in the latest SXCE builds is available on Robert Milkowski's blog.

Last, be sure to check Tim Foster's explanations about the recently announced ZFS Boot support in build 62, for the x86 platform. All the interesting links are included, as is his script to set up a ZFS root automatically (since not all the bits are well integrated yet...).

Thursday 1 March 2007

Want to Shrink a zpool?

If so, be patient. In fact it is a highly wanted feature, and the ZFS team is working hard on it right now. You can learn more about this feature by following the Shrinking a zpool? thread on the zfs-discuss forum on opensolaris.org. Here are some chosen excerpts.

From Matthew A. Ahrens #1:

Regardless of where you want or don't want to use shrink, we are actively working on this, targeting delivery in s10u5.

From Matthew A. Ahrens #2:

Yeah, the implementation is nontrivial. Of course, this won't have any impact on snapshots, clones, etc. and will happen on-line. Any other solution would be unacceptable.

Howdy... I really like this kind of short and concise answer!

Thursday 11 January 2007

NFS and ZFS plus ZIL Interesting Notes

I recently learned about an NFS on ZFS interaction problem while reading the great blog of Ben Rockwood. Although not directly related to what he encountered, this recent great post about how NFS behaves with a ZFS backend, particularly on the performance comparison front, says a lot about why you might see poor performance when using these two technologies together.

To go deeper on this front, you can read more about the ZIL purpose on Eric Kustarz's weblog, and follow closely this ZFS thread on the OpenSolaris website.

Monday 11 December 2006

IBM TSM and Sun ZFS File System

Because ZFS is a relatively new file system, not all the big, well-known corporate backup/restore tools are able to support it. EMC NetWorker and Veritas NetBackup support it by now; for TSM, IBM support said it will not be before the first half of next year.

So, in order to back up ZFS file systems with this tool, we need to write a little script which launches the TSM CLI utility on each of the wanted ZFS file systems. (This hack does not support the new ZFS ACLs, though.)

Here it is:

# grep postschedulecmd /opt/tivoli/tsm/client/ba/bin/dsm.sys
postschedulecmd "/root/bin/tsm.do.post.sh >> \
 /opt/tivoli/tsm/client/ba/bin/dsmsched.log 2>&1"
#
# cat /root/bin/tsm.do.post.sh
#!/usr/bin/env sh

PATH=/usr/sbin:${PATH}
export PATH

# Incremental backup of each mounted ZFS file system selected by the awk filter.
for mount in `zfs mount | awk '$0 ~ /.*\/.*\/.*/ {print $2}'`; do
  dsmc i "${mount}" -subdir=yes
done

exit 0

Just relaunch the TSM scheduler, and watch your dsmsched.log log file with care.
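
To see what the awk filter of this script actually selects, here is a small sketch run against made-up zfs mount output. The dataset names and the filter_mounts function name are only illustrative, not taken from a real system.

```shell
#!/bin/sh
# Same awk filter as in tsm.do.post.sh: print the mount point (second
# column) of every line that contains at least two slashes.
filter_mounts() {
  awk '$0 ~ /.*\/.*\/.*/ {print $2}'
}

# Illustrative input only, in the two-column layout printed by `zfs mount`.
# The first line has a single slash, so it is filtered out.
printf '%s\n' \
  'pool0           /pool0' \
  'pool0/home      /export/home' \
  'pool0/home/jg   /export/home/jg' | filter_mounts
```

In other words, a pool mounted directly under / is skipped, while any line with two or more slashes is passed to dsmc. Adapt the pattern if your layout differs.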

Thursday 5 October 2006

Zones and ZFS Integration, and New Features in OpenSolaris

The aim of this little test case is to present new features of zones, and their tight integration with the powerful ZFS file system. In order to do so, we will:

  1. Create a specific ZFS namespace for non-global zone(s).
  2. Create a new zone.
  3. Move it to a new zonepath.
  4. Configure it automatically, using a sysidcfg file.
  5. Clone it.
  6. Duplicate the first zone using the clone.

Be sure to have a valid hostname and IP address for the two non-global zones:

# getent hosts | egrep 'beastie|watchie'
192.168.1.1   beastie.thilelli.net beastie
192.168.1.2   watchie.thilelli.net watchie

Create a valid ZFS dedicated namespace:

# zfs create -o compression=on \
   -o mountpoint=/export/zone \
   -o canmount=off pool0/zone
# zfs list -r pool0/zone
NAME         USED  AVAIL  REFER  MOUNTPOINT
pool0/zone  24.5K   228G  24.5K  /export/zone
# zfs get compression,mountpoint,canmount pool0/zone
NAME        PROPERTY     VALUE         SOURCE
pool0/zone  compression  on            local
pool0/zone  mountpoint   /export/zone  local
pool0/zone  canmount     off           local
# zfs mount | grep zone
#

Note: Since build 48 of Nevada, some new ZFS features were added. Create-time properties and the canmount property are two of them. Please refer to this excellent blog entry from Eric Schrock's weblog for more information on these putbacks.

Configure the two zones:

# zonecfg -z beastie 'create; set autoboot=true; \
   set zonepath=/export/zone/badbeastie; add net; \
   set address=192.168.1.1/24; set physical=nge0; \
   end; verify; commit; exit'
# zonecfg -z watchie 'create; set autoboot=true; \
   set zonepath=/export/zone/watchie; add net; \
   set address=192.168.1.2/24; set physical=nge0; \
   end; verify; commit; exit'
# zoneadm list -vc
  ID NAME             STATUS         PATH
   0 global           running        /
   - beastie          configured     /export/zone/badbeastie
   - watchie          configured     /export/zone/watchie
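
Since the two zonecfg invocations above differ only by zone name and address, the commands can also be generated. Here is a minimal sketch, assuming the same /export/zone layout. The zone_cmds helper is my own invention, and its output is meant to be fed to zonecfg through its -f (command file) option.

```shell
#!/bin/sh
# zone_cmds NAME ADDRESS/PREFIX NIC
# Hypothetical helper: print a zonecfg command file for one zone,
# one subcommand per line.
zone_cmds() {
  name=$1 addr=$2 nic=$3
  cat <<EOF
create
set autoboot=true
set zonepath=/export/zone/${name}
add net
set address=${addr}
set physical=${nic}
end
verify
commit
EOF
}

# Print the command file for the second zone.
zone_cmds watchie 192.168.1.2/24 nge0
```

Something like `zone_cmds watchie 192.168.1.2/24 nge0 > /tmp/watchie.cfg`, followed by `zonecfg -z watchie -f /tmp/watchie.cfg`, should then give the same configuration as the one-liner (the temporary file name is arbitrary).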

Then, install the first zone with the zoneadm command:

# zoneadm -z beastie install
A ZFS file system has been created for this zone.
Preparing to install zone <beastie>.
Creating list of files to copy from the global zone.
[...]

Instead of configuring it manually at first boot, create a configuration file which will do this task automatically for you, and start the zone:

# cat << EOF > /export/zone/badbeastie/root/etc/sysidcfg
system_locale=C
timezone=Europe/Paris
terminal=vt100
security_policy=NONE
root_password=xxxxxxxxxxxxx
timeserver=localhost
name_service=NONE
nfs4_domain=dynamic
network_interface=primary {
  hostname=beastie.thilelli.net
  ip_address=192.168.1.1
  netmask=255.255.255.0
  protocol_ipv6=no
  default_route=192.168.1.254
}
EOF
#
# zoneadm -z beastie boot && zlogin -C beastie
[Connected to zone 'beastie' console]
SunOS Release 5.11 Version snv_48 64-bit
Copyright 1983-2006 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: beastie
Loading smf(5) service descriptions: 119/119
Oct  4 04:30:22 svc.startd[3003]: svc:/system/dbus:default:
 Method "/lib/svc/method/svc-dbus start" failed with exit status 95.
Oct  4 04:30:22 svc.startd[3003]: system/dbus:default failed fatally:
 transitioned to maintenance (see 'svcs -xv' for details)
Creating new rsa public/private host key pair
Creating new dsa public/private host key pair
rebooting system due to change(s) in /etc/default/init

[NOTICE: Zone rebooting]
SunOS Release 5.11 Version snv_48 64-bit
Copyright 1983-2006 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: beastie.thilelli.net
Oct  4 13:30:40 svc.startd[3757]: svc:/system/dbus:default:
 Method "/lib/svc/method/svc-dbus start" failed with exit status 95.
Oct  4 13:30:40 svc.startd[3757]: system/dbus:default failed fatally:
 transitioned to maintenance (see 'svcs -xv' for details)

beastie.thilelli.net console login: ~.
[Connection to zone 'beastie' console closed]

Well done. Now, change the zone path to something more appropriate. If desired, adapt the ZFS dataset name accordingly:

# zoneadm -z beastie halt
# zoneadm -z beastie move /export/zone/beastie
# zoneadm list -vc
  ID NAME             STATUS         PATH
   0 global           running        /
   - beastie          installed      /export/zone/beastie
   - watchie          configured     /export/zone/watchie
#
# zfs list -r pool0/zone
NAME                    USED  AVAIL  REFER  MOUNTPOINT
pool0/zone              248M   227G  24.5K  /export/zone
pool0/zone/badbeastie   248M   227G   248M  /export/zone/beastie
# zfs rename pool0/zone/badbeastie pool0/zone/beastie
# zfs list -r pool0/zone
NAME                 USED  AVAIL  REFER  MOUNTPOINT
pool0/zone           248M   227G  24.5K  /export/zone
pool0/zone/beastie   248M   227G   248M  /export/zone/beastie
#
# zoneadm -z beastie boot

Wow... Very interesting feature, isn't it?!

Now, let's try the cloning feature bundled with the new zoneadm command. Do some zone-specific tuning first, then clone:

# zlogin beastie svcadm disable system/dbus
# zoneadm -z beastie halt
# zoneadm -z watchie clone beastie
Cloning snapshot pool0/zone/beastie@SUNWzone1
Instead of copying, a ZFS clone has been created for this zone.
# zoneadm list -vc
  ID NAME             STATUS         PATH
   0 global           running        /
   - beastie          installed      /export/zone/beastie
   - watchie          installed      /export/zone/watchie
#
# zfs list -r pool0/zone/beastie
NAME                           USED  AVAIL  REFER  MOUNTPOINT
pool0/zone/beastie             251M   227G   248M  /export/zone/beastie
pool0/zone/beastie@SUNWzone1  3.77M      -   248M  -
#
# sed -e 's/beastie/watchie/' \
   -e 's/ip_address=192.168.1.1/ip_address=192.168.1.2/' \
   /export/zone/beastie/root/etc/sysidcfg > \
   /export/zone/watchie/root/etc/sysidcfg
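
If you want to check the substitutions before touching the real file, the same idea can be tried on a few sample lines. This is a sketch only: the retarget_sysidcfg name is mine, and this variant escapes the dots and anchors the address so that 192.168.1.254 is never touched by accident.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a sysidcfg read on stdin for the cloned zone.
# The dots are escaped so they only match literal dots, and the trailing $
# keeps the rule from matching longer addresses such as 192.168.1.254.
retarget_sysidcfg() {
  sed -e 's/beastie/watchie/' \
      -e 's/ip_address=192\.168\.1\.1$/ip_address=192.168.1.2/'
}

# Three lines in the style of the sysidcfg file created earlier.
printf '%s\n' \
  '  hostname=beastie.thilelli.net' \
  '  ip_address=192.168.1.1' \
  '  default_route=192.168.1.254' | retarget_sysidcfg
```

Only the hostname and ip_address lines change; the default route is left as it is. Note that the anchored pattern will not match if the line has trailing white space.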

You can now enjoy the first boot of the newly created zone:

# zoneadm -z watchie boot && zlogin -C watchie 
[Connected to zone 'watchie' console]
Hostname: watchie
Creating new rsa public/private host key pair
Creating new dsa public/private host key pair

watchie.thilelli.net console login: ~.
[Connection to zone 'watchie' console closed]

Awesome features and technologies, I think! Really.

Last, note that Ben Rockwood already has a well written blog entry on this very same subject... he was the first to publish it though ;)

Note: It seems there is a little bug in the snv_48 SX:CR release which prevents the expected automatic ZFS file system creation or cloning from happening properly; the action fails with an error similar to this one:

cannot create ZFS dataset <zfs_name>: 'sharenfs' must be a string

This bug is already closed and fixed, and the fix will be available in the next Solaris Express release; see Bug ID: 6468554 for more information on this one.

Thursday 24 August 2006

ZFS on a USB Disk (removable Media)

With a relatively recent version of Solaris or OpenSolaris (say Nevada build 36, Solaris Express 4/06 or the Solaris 10 6/06 release), here is how easily ZFS can be used as the backing file system for such a removable device.

The first step is to disable the vold(1M) SMF service, so that the operating system does not try to mount the device automatically each time it is plugged in:

# svcadm disable volfs

Since the Solaris USB driver presents any USB storage device as removable media, the disk can be seen using both the format command in expert mode and the rmformat program:

# format -e < /dev/null
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1d0 <DEFAULT cyl 9720 alt 2 hd 255 sec 63>
          /pci@0,0/pci-ide@7/ide@0/cmdk@0,0
       1. c2d0 <ST325082-         4ND0XKT-0001-232.89GB>
          /pci@0,0/pci-ide@7/ide@1/cmdk@0,0
       2. c3t0d0 <ST940211-5A-0000-37.26GB>
          /pci@0,0/pci108e,5347@2,1/storage@3/disk@0,0
Specify disk (enter its number):
#
# rmformat
Looking for devices...
     1. Logical Node: /dev/rdsk/c0t0d0p0
        Physical Node: /pci@0,0/pci-ide@6/ide@0/sd@0,0
        Connected Device: LITE-ON  DVD SOHD-16P9S   F3S2
        Device Type: DVD Reader
        Bus: IDE
        Size: <Unknown>
        Label: <Unknown>
        Access permissions: <Unknown>
     2. Logical Node: /dev/rdsk/c3t0d0p0
        Physical Node: /pci@0,0/pci108e,5347@2,1/storage@3/disk@0,0
        Connected Device: ST940211 5A               0000
        Device Type: Removable
        Bus: USB
        Size: 38.2 GB
        Label: <Unknown>
        Access permissions: Medium is not write protected.

Now that the device name is clearly identified, it is possible to create a dedicated pool:

# zpool create rmzp c3t0d0
# zpool list rmzp
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
rmzp                     37G   28.6M   37.0G     0%  ONLINE     -

Since the purpose of this disk is, among other things, to be a backup of my home directory, here is how to do so.

First, create the correct ZFS hierarchy on the USB disk:

# zfs create rmzp/home
# zfs set compression=on rmzp/home

Take a snapshot of the current home directory, then send it and receive it on the fly into the new pool:

# zfs snapshot datazp/home/jgabel@rmzp.0
# zfs send datazp/home/jgabel@rmzp.0 | zfs receive rmzp/home/jgabel
# zfs destroy rmzp/home/jgabel@rmzp.0
# zfs list -r rmzp
NAME                   USED  AVAIL  REFER  MOUNTPOINT
rmzp                  28.6M  36.4G  25.5K  /rmzp
rmzp/home             28.5M  36.4G  26.5K  /rmzp/home
rmzp/home/jgabel      28.5M  36.4G  28.5M  /rmzp/home/jgabel

Then, when you are ready to take it away, just export the pool, as with any regular disk:

# zpool export rmzp
# zpool list rmzp
cannot open 'rmzp': no such pool
# zpool import
  pool: rmzp
    id: 1670601809438763813
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        rmzp        ONLINE
          c3t0d0    ONLINE

Wow! What an easy and powerful way to do backups on removable media, isn't it?

Since new ZFS porting work is on the way (e.g. ZFS on FUSE/Linux and ZFS on FreeBSD), we can expect to share this kind of device between Unix-like OSes very soon.