blog'o thnet

To content | To menu | To search

Sunday 26 October 2008

Solaris vs. RHEL Costs And Features Comparisons

Clearly, the costs involved in running Solaris and RHEL platforms are not well understood, and generally favors GNU/Linux environments. This is (most of the time) untrue, since this tend to be based on personal user experience, which is in fact far different from running lots of systems in high demand production data centers.

Here some interesting readings on these subjects--costs and features analysis--from:

YMMV for sure, but I personally think that Solaris costs are overestimated, and its features are mostly unknown, or at least underused... but this is a very large and hot topic nowadays, I know.

Update #1 (2008-11-14): Go to read interesting comment update from Jim Laurent.

Update #2 (2008-12-02): Go to read Joerg Moellenkamp's entry about similar points.

Update #3 (2008-12-03): Go to read this article appearing in Computerworld.com.

Saturday 18 October 2008

Discrepancies Between df And du Outputs

As a SA, it not uncommon to have regularly requests about big differences between the du and df outputs on a UFS file system. (For ZFS specific considerations, please see the ZFS FAQ.)

The du utility reports the sum of space allocated to all files in the file hierarchy rooted in the directory plus the space allocated to the directory itself. The df utility reports the amount of disk space occupied by a mounted file system.

When a file is remove from the file system, i.e. is unlinked (the hard link count goes to zero), the space belonging to this file is accounted against the du tool, but is not visible to the df utility until all references to it (open file descriptors) are closed. In order to find the guilty process, one can follow the information found in the SunManagers Frequently Asked Questions. Here is an example of such finding, but using a slightly different method to get the process currently holding the open descriptor to the deleted file.

Find the file which has been unlinked through the procfs interface:

# find /proc/*/fd \( -type f -a ! -size 0 -a -links 0 \) -print | xargs \ls -li
 415975 --w-------   0 user  group  2125803025 Oct 15 23:59 /proc/1252/fd/3

Eventually, get more detail about it:

# pargs -c 1252
1252:   rvd.basic -reliability 5 -listen tcp:9876 -logfile /path/to/log/rvd_9876.l
argv[0]: rvd.basic
argv[1]: -reliability
argv[2]: 5
argv[3]: -listen
argv[4]: tcp:9876
argv[5]: -logfile
argv[6]: /path/to/log/rvd_9876.log

Check to see if you can understand what is the content of the unlinked file:

# tail /proc/1252/fd/3
-------------------------------------------------------------------------------
2008-10-15 23:59:32.002116 - [MSG] BBG_Transmitter_class.cc, line 792 (thread 25087:4)
[4060] Sent a heartbeat
-------------------------------------------------------------------------------
BBG_Transmitter_class.cc: [4111] No activity detected. Send a Heartbeat message
-------------------------------------------------------------------------------
2008-10-15 23:59:32.134829 - [MSG] BBG_Transmitter_class.cc, line 1138 (thread 25087:4)
[4065] Heartbeat acknowledged by Bloomberg

You can correlate the size of the removed, but always referenced, file to the space accounted from the du and df tools:

# df -k /path/to
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d5       6017990 5874592   83219    99%    /path/to
# du -sk /path/to
3791632 /data
# echo "(5874616-3791632)*1024" | bc
2132975616

So, we now found the ~2GB log file which was always opened (used) by a process. Now, there are two solutions to be able to get back the freed space:

  1. Truncate the unlinked file (quick workaround).
  2. Simply restart properly the corresponding program (better option).

Use the solution which fits the best your need in your environment.

Saturday 11 October 2008

Anatomy Of An Attack

Well, although I didn't generally give credibility by speaking about public FUD, I will just let you know about really great, and point by point explanations on the recent InfoWorld (and New York Times) publication from Paul Krill Is Sun Solaris on its deathbed?

As Jim Grisanzio stated recently on the advocacy-discuss mailing list:

[...] More importantly, though, is that the original article not only fell flat but it was actually aggressively rejected by many in the open source community. That's an interesting shift out there. And a good one, too.

Well, I can't agree more with Jim on his points.

Update #1 (2008-10-28): Don't forget to read the interesting inputs from Joerg Moellenkamp.

Monday 6 October 2008

Fake The hostid Of A Solaris Zone, Updated

As a little follow-up to Fake The hostid Of A Solaris Zone, and regarding the discussion on the capacity to change the hostid of a Solaris non-global zone, it is interesting to mention these (updated) informations:

  1. The LD_PRELOAD trick proposed before is not a proper option, and is really ugly (and intrusive if you didn't unset it before continuing the execution of a program).
  2. When using Solaris 8 or Solaris 9 Containers, there is a feature called Host ID Emulation from the zonecfg utility which can do exactly that.
  3. Before the introduction of the privileges in a non-global zone with Solaris 11/06 (a.k.a. Solaris Update 3), you must run the DTrace zhostid script (daemon) within the global zone. It is not mandatory to run it from the global zone anymore. Using the appropriate dtrace_user privilege only, you can run it directly from the non-global zone:
    # zonecfg -z ngzone set limitpriv=default,dtrace_user
    # zoneadm -z ngzone boot
    # zlogin ngzone
    [Connected to zone 'ngzone' pts/5]
    Last login: Sat Oct  4 18:57:17 on pts/5
    Sun Microsystems Inc.   SunOS 5.11      snv_99  November 2008
    # /sbin/zonename 
    ngzone
    # /usr/bin/hostid
    837d47dd
    # ./zhostid &
    [1] 21506
    # /usr/bin/hostid
    20a82f32
    # ^D
    [Connection to zone 'ngzone' pts/5 closed]
    

Monday 29 September 2008

Sun's Free Proficiency Assessment System

Free. Online. Right now. As Peter Tribble mentioned in his blog, it is clearly not perfect and some questions (and answers) seems a little obscure sometimes. Nonetheless, as I have never had any (yes, any) formal training, I though interesting to try these tests, and here are the results for pre-assessment for:

  1. UNIX Essentials Featuring the Solaris 10 OS, I scored 37 out of 42.
  2. Sun Certified System Administrator for the Solaris 10 OS (Part 1), I scored 45 out of 48.
  3. Certified System Administrator for the Solaris 10 OS (Part 2), I scored 45 out of 48.

Maybe, is it time to pass to official Solaris Operating System certifications? ;)

  • Sun Certified Solaris Associate (SCSAS)
  • Sun Certified System Administrator (SCSA)
  • Sun Certified Network Administrator (SCNA)
  • Sun Certified Security Administrator (SCSECA)

Well... if time permit!

Monday 22 September 2008

About GNU/Linux Software Mirroring And LVM

Here, the final aim was to provide data access redundancy through SAN storage hosted on remote sites across Wide Area Network (WAN) links. After some relatively long and painful tries to mimic software mirroring as found on HP-UX platform using Logical Volume Management (LVM), i.e. at the logical volume level, I finally give up deciding this functionality will definitely not fit my need. Why? Here are my comments.

  1. It is not possible to provide clear and manageable storage multipath when the need to distinguish between the multiple sites is important, ala mirror across controllers found on Veritas VxVM on Sun Solaris system, for example. So, managing many physical volumes along with lots of logical volumes is very complicated.
  2. There is no exact mapping capability between logical volume storage on a given physical volume.
  3. The need to have a disk-based log, i.e. a persistent log. Yes, one can always provide the option --corelog at the creation time to the logical volume initial build and have an in-memory log , i.e. a non-persistent log, but this requires the entire copies (mirrors) be resynchronized upon reboot. Not really viable on multi-TB environments.
  4. A write-intensive workload on a file system living on a logical volume mirror will suffer high latency: the overhead is important, and the time to do mostly-write jobs grow dramatically. It is really hard to get high level statistics, only low level metrics seems consistent: sd SCSI devices and dm- device mapper components for each paths entries. Not from the multipath devices standpoint, which is the more interesting from the end user and SA point of view.
  5. You can't extend a logical volume, which is really a no-go per-se. On that point, the Red Hat support answered that this functionality may be added in a future release, the current state may eventually be a Request For Enhancement (RFE), if a proper business justification is provided. One must break the logical volume mirror copy, then rebuild it completely. Not realistic when the logical volume is made of a lot of physical extents across multiple physical volumes.
  6. A LVM configuration can be totally blocked by itself, and not usable at all. The fact is, LVM use persistent storage blocks to keep track of its own metadata. The metadata size is set at physical volume creation time only, and can't be change afterward. This size is statically defined as 255 physical volume blocks, and can be adjust from the LVM configuration file. The problem is, when this circular buffer space (stored in ASCII) fills up--such as when there are a lot of logical volumes in a mirrored environment--it is not possible to do anything more with LVM. So you can't add more logical volume, can't add more logical volume copies,... and can't delete them trying to reestablish a proper LVM configuration. Well, here are the answers given by the Red Hat support to two keys questions in this situation:
    • How to size the metadata, i.e. if we need to change it from the default value, how can we determine the new proper and appropriate size, and from which information?
      I am afraid but Metadata size can only be defined at the time of PV creation and there is no real formula for calculating the size in advance. By changing the default value of 255 you can get a higher space value. For general LVM setup (with less LV's and VG's) default size works fine however in cases where high number of LV's are required a custom value will be required.
    • We just want to delete all LV copies, which means to return to the initial situation and have 0 copy for all LV, i.e. only one LV per-se, in order to be able to change LVM configuration again (we can't do anything on our production server right now)?
      I discussed this situation with peer engineers and also referenced a similar previous case. From the notes of the same the workaround is to use the backup file (/etc/lvm/backup) and restore the PV's. I agree that this really not a production environment method however seems the only workaround.

So, the production RDBMS Oracle server is finally now being evacuate to an other machine. Hum... Hope to see better enterprise experience using the mdadm package to handle RAID software, instead of mirror (RAID-1) LVM. Maybe more about that in an other blog entry?

Tuesday 16 September 2008

Announcing Solaris 10 10/08

Although this seems a little bit confident, the long-awaited Update 6 to the Solaris 10 operating system release is just behind the door. This release (scheduled to be available in mid-October) will includes virtualization enhancements including the ability for a Solaris Container to automatically update its environment when moved from one system to another, Logical Domains support for dynamically reconfigurable disk and network I/O, and paravirtualization support when Solaris 10 is used as a guest OS in Xen-based environments such as Sun xVM Server. Solaris 10 10/08 also includes support for the latest systems from Sun and other vendors, such as those based on the Intel Xeon Processor 7400 Series.

This release will be the very first Solaris release to be able to boot from ZFS and use it as their root file system, such as what can be found on OpenSolaris or Solaris Express Community Release today.

Check the What’s New web page for Solaris, and consult the Solaris Media Gallery videos for more information.

Update #1 (2008-10-14): Don't forget to consult the must read What's New in Solaris 10 10/08? from the San Antonio OpenSolaris User Group.

Update #2 (2008-10-31): Get yours, and go reading the official What's New in the Solaris 10 10/08 Release page.

Friday 16 May 2008

Comparison: EMC PowerPath vs. GNU/Linux dm-multipath

I will present some notes about the use of multipath solutions on Red Hat systems: EMC PowerPath and GNU/Linux dm-multipath. Along those notes, keep in mind that they were based on tests done when pressure was very high to put new systems in production, so lack of time resulted in less complete tests than expected. These tests were done more than a year ago, and so before the release of RHEL4 Update 5 and some of RHBA related to both LVM and dm-multipath technologies.

Keep in mind that without purchasing an appropriate EMC license, PowerPath can only be used in failover mode (active-passive mode). Multiple paths accesses are not supported in this case: no round-robin, and no I/O load balancer for example.

EMC PowerPath

Advantages

  1. Not specific to the SAN Host Bus Adapter (HBA).
  2. Support for multiple and heterogeneous SAN storage provider.
  3. Support for most UNIX and Unix-like platforms.
  4. Without a valid license, can only work in degraded mode (failover).
  5. Is not sensible to a change in the SCSI LUN renumbering. Adapt accordingly the corresponding multiple sd devices (different paths to a given device) with its multipath definition of the emcpower device.
  6. Provide easily the ID of the SAN storage.

Drawbacks

  1. Not integrated with the operating system (which generally has its own solution).
  2. The need to force a RPM re-installation in case of a kernel upgrade on RHEL systems (due to the fact that kernel modules are stored in a path containing the exact major and minor versions of the installed (booted) kernel.
  3. Non-automatic update procedure.

GNU/Linux device-mapper-multipath

Advantages

  1. Not specific to the SAN Host Bus Adapter (HBA).
  2. Support for multiple and heterogeneous SAN storage provider.
  3. Well integrated with the operating system.
  4. Automatic update using RHN (you must be a licensed and registered user in this case).
  5. No additional license cost.

Drawbacks

  1. Only available on GNU/Linux systems.
  2. Configuration (files and keywords) very tedious and difficult.
  3. Without the use of LVM (Logical Volume Management), it has not the ability to follow SCSI LUN renumbering! Even in this case, be sure not to have blacklisted the newly discovered SCSI devices (sd).

Last, please find some interesting documentation on the subject below:

Friday 2 May 2008

PHP APC Extension Bug With Optimized Open Source Software Stack

To easily manage LDAP accounts (and general LDAP entries in fact), we have created a Solaris Zone and installed the excellent Cool Stack bundle to host the LAM (LDAP Account Manager) management web tool. But after upgrading the Cool Stack to version 1.2 we encountered a very annoying problem mostly with freezing web pages, and generally ending up in restarting the Apache web server provided by the Cool Stack. After some troubleshooting, we discover that this behavior was introduced by a bug in the APC-3.0.14 module bundled with the updated php-5.2.4 scripting software in this version of the Cool Stack.

Luckily, the bug was already fixed and a new version of the APC extension of PHP is available for download (in fact, just replace to original apc.so module by the new one). All the Cool Stack related problems, associated fixes and instructions are listed on the Cool Stack 1.2 Patches page: be sure to keep in sync' if you are a Cool Stack consumer.

Monday 28 April 2008

memconf And AMD Athlon 64 X2 Dual Core Processor

The last update to the excellent memconf utility (V2.5 22-Feb-2008) support properly recent Solaris Express releases, and my recent change from the stock AMD Opteron Processor 148 to an AMD Athlon 64 X2 Dual Core Processor 3800+. (I mostly did that change just to be able to access two run queues separately, not to gain more power per se.)

So, here is the new and appropriate memconf report:

# memconf -d
memconf:  V2.5 22-Feb-2008 http://www.4schmidts.com/unix.html
hostname: unic
manufacturer: Sun Microsystems, Inc.
model:    Sun Ultra 20 Workstation (AMD Athlon(tm) 64 X2 Dual Core \
 Processor 3800+ Socket 939 2010MHz)
Sun Family Part Number: A63
Solaris Express Community Edition snv_87 X86, 64-bit kernel, SunOS 5.11
1 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ Socket 939 2010MHz cpu
diagbanner = Sun Ultra 20 Workstation
cpubanner = AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ Socket 939 2010MHz
model = Sun Ultra 20 Workstation
machine = i86pc
platform = i86pc
perl version: 5.008004
CPU Units:
==== Processor Sockets ====================================
Version                          Location Tag
-------------------------------- --------------------------
AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ Socket 939
AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ Socket 939
Memory Units:
Type    Status Set Device Locator      Bank Locator
------- ------ --- ------------------- --------------------
unknown in use 0   A0                  Bank0/1
unknown in use 0   A1                  Bank2/3
unknown in use 0   A2                  Bank4/5
unknown in use 0   A3                  Bank6/7
total memory = 2048MB (2GB)

You can check and compare with the previous report on my blog.

- page 2 of 15 -