blog'o thnet


Sunday 28 February 2010

Oracle On The Future Of OpenSolaris, Finally

After Oracle's official and long-awaited public announcements on the merger with Sun Microsystems, some great news came on both the hardware and software portfolios, in particular the x86 and SPARC ecosystems and the Solaris operating system.

The main open question concerned the OpenSolaris community and its distribution model. For a while, several prominent voices in the community, in particular Ben Rockwood and Peter Tribble, remained without an answer.

Well, until recently. In fact, the OpenSolaris Annual Meeting (held on IRC in the #opensolaris-meeting channel on 26 February) quickly brought some answers, which are now beginning to spread through the community. I hope this will clear up some of the recent misunderstandings about the support model of the OpenSolaris distribution.

Tuesday 1 December 2009

Oracle Commitment To Sun Technologies

Here is more information about Oracle's commitment to Sun's business, and to the Solaris operating system in particular:

Lastly, the Oracle and Sun Overview and FAQ for customers and partners is a must-read for anyone interested in the future of all the current Sun Microsystems technologies and in Oracle's investment in them.

Thursday 26 November 2009

Oracle Database 11g Release 2 For Solaris

As a follow-up to the preceding entry on the upcoming availability of Oracle Database 11g Release 2 on both Solaris SPARC and x86, this is a reality as of today: both architectures are available for download.

As a system administrator, I find this very interesting and encouraging, not only because one of the most robust RDBMSs is now available on Solaris platforms, but also because Oracle's actions seem to match its words. The case for Solaris as an OS of choice is now even stronger.

Wednesday 7 October 2009

Upcoming Oracle RDBMS And Solaris News

Following the recent news about the future of Sun from Larry Ellison himself, we can now hope for more to come on Solaris, on both SPARC and x86 platforms. This quote is particularly encouraging:

We're a big supporter of Linux, but the fact is that Solaris is just a much more mature OS, it's just a fact. We became a big supporter of Linux years ago because it ran on smaller and cheaper X86 processors and Solaris did not, we had no choice. [...] So we are a supporter of Linux, but Solaris is a more mature operating system designed for bigger systems. We support both.

In the same vein, I have just heard from two different sources about these upcoming changes from Oracle:

  1. Oracle Database 11g Release 2, just released on 1 September 2009 and currently available only for Linux, will soon (within one to two months) be available for both Solaris SPARC and x86, at the same time.
  2. Solaris x86 will be raised from a Tier 3 to a Tier 2 platform.

Pretty good news indeed! It seems that Solaris will be a serious and growing competitor in the (near) future!

Sunday 22 March 2009

Finding The Process Responsible For Crashing A System

Recently, we encountered a wave of node suicides (self-initiated reboots) on most of the nodes of several Oracle RAC clusters running on many Sun M5000 domains. Although the Oracle RAC logs were interesting, they did not help us determine precisely the origin of the crashes. Since the domains panicked and saved crash dumps, we were able to quickly analyze them to identify the process that initiated the panics. Here is how to do so.

First, make sure a complete and usable crash dump (unix.N and vmcore.N) has been saved on persistent storage:

# cd /var/crash/nodename
# file *.0
unix.0:         ELF 64-bit MSB executable SPARCV9 Version 1, UltraSPARC1 Extensions Required, statically linked, not stripped, no debugging information available
vmcore.0:       SunOS 5.10 Generic_127111-11 64-bit SPARC crash dump from 'nodename'

Then, extract useful information using MDB dcmds such as ::status, ::showrev, and ::panicinfo, which give us the exact panic message and the address of the thread responsible for the system crash:

# mdb -k unix.0 vmcore.0
Loading modules: [ unix krtld genunix specfs dtrace ufs sd mpt px ssd fcp fctl md ip qlc hook neti sctp arp usba nca zfs random logindmux ptm cpc sppp crypto wrsmd fcip nfs ipc ]
> ::status
debugging crash dump vmcore.0 (64-bit) from nodename
operating system: 5.10 Generic_127111-11 (sun4u)
panic message: forced crash dump initiated at user request
dump content: kernel pages only
> ::showrev
Hostname: nodename
Release: 5.10
Kernel architecture: sun4u
Application architecture: sparcv9
Kernel version: SunOS 5.10 sun4u Generic_127111-11
Platform: SUNW,SPARC-Enterprise
> ::panicinfo
             cpu                0
          thread      300171c7300
         message forced crash dump initiated at user request
          tstate       4400001606
              g1                b
              g2                0
              g3          11c13e0
              g4              6e0
              g5         88000000
              g6                0
              g7      300171c7300
              o0          1208020
              o1      2a10176b9e8
              o2                1
              o3                0
              o4 fffffffffffffff5
              o5             1000
              o6      2a10176b0b1
              o7          10626a4
              pc          1044d8c
             npc          1044d90
               y                0
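When several dumps have to be checked, the thread address can be pulled out of ::panicinfo non-interactively instead of being read off the screen. The function below is a minimal sketch: panic_thread is a hypothetical helper name, and it assumes the ::panicinfo layout shown above (a line whose first field is "thread"). On Solaris 10, /usr/bin/awk is the old awk, so substitute nawk (or /usr/xpg4/bin/awk).

```shell
# panic_thread (hypothetical helper): read ::panicinfo output on stdin
# and print the address of the thread that was on CPU at panic time.
# On Solaris 10, replace awk with nawk.
panic_thread() {
  awk '$1 == "thread" { print $2; exit }'
}
```

For example, from the crash directory: echo "::panicinfo" | mdb -k unix.0 vmcore.0 | panic_thread should print 300171c7300 for the dump above.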

Now that we have the address of the panicking thread (300171c7300), we can find the corresponding UNIX process with the following script, run from the crash directory:

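# cat /var/tmp/findstack.vmcore.sh
#!/usr/bin/env sh
# Run from the crash directory (e.g. /var/crash/nodename).
# For each process in the dump, print its name followed by the
# stack of each of its threads into /tmp/core.<PID>.

UNIX=unix.0
VMCORE=vmcore.0
LIST=/tmp/.core.$$
OUT=/tmp/core.$$

# ::ps columns: S PID PPID PGID SID UID FLAGS ADDR NAME
# Keep the proc_t address ($8) and the process name (last field),
# skipping the header line.
echo "::ps" | mdb -k ${UNIX} ${VMCORE} | \
 nawk '$8 !~ /ADDR/ {print $8" "$NF}' > ${LIST}

cat /dev/null > ${OUT}

while read addr name; do
  echo "process name: ${name}" >> ${OUT}
  # Walk every thread of this process and print its stack.
  echo "${addr}::walk thread | ::findstack" | \
   mdb -k ${UNIX} ${VMCORE} >> ${OUT}
  echo >> ${OUT}
done < ${LIST}

rm -f ${LIST}
echo "Report written to ${OUT}"

exit 0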

Now, just look for the panicking thread in the output file. In our case, it belongs to the oprocd.bin process:

# vi /tmp/core.*
[...]
process name: oprocd.bin
stack pointer for thread 300171c7300: 2a10176b0b1
  000002a10176b161 kadmin+0x4a4()
  000002a10176b221 uadmin+0x11c()
  000002a10176b2e1 syscall_trap+0xac()
[...]
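Instead of searching the report by hand in vi, a small awk filter can print the name of the process owning a given thread. This is a sketch assuming the report layout produced by the script above ("process name: NAME" followed by "stack pointer for thread ADDR: SP" lines); thread_owner is a hypothetical helper name, and nawk should be used instead of awk on Solaris 10.

```shell
# thread_owner (hypothetical helper): print the process that owns a thread.
# $1: thread address; remaining args: report file(s), or stdin if none.
# On Solaris 10, replace awk with nawk.
thread_owner() {
  addr=$1; shift
  awk -v t="$addr" '
    # remember the process whose threads follow
    /^process name: / { name = $3 }
    # "stack pointer for thread ADDR: SP" -> field 5 is "ADDR:"
    $1 == "stack" && $5 == (t ":") { print name }
  ' "$@"
}
```

For example, thread_owner 300171c7300 /tmp/core.* should print oprocd.bin for the report above.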

This process is locked in memory to monitor the cluster and provide I/O fencing. oprocd.bin performs its check and then sleeps; if it wakes up later than expected, it resets the processor and reboots the node. An oprocd.bin failure results in Oracle Clusterware restarting the node. Please read the Oracle Clusterware and Oracle Real Application Clusters documentation for more information.

Although the incident is still under investigation, it seems the nodes were affected by the leap second that was added at the end of 2008...

Wednesday 25 February 2009

Problem Querying ru. Name Servers

In a previous position, I encountered a rather strange DNS resolution problem that I would like to share here. From the internal network of a company, it was not possible to get the IP address of the www.banks2ifrs.ru. host name.

We determined that the name servers for the banks2ifrs.ru. domain are ns3.nic.ru. and ns4.nic.ru.. At the same time, we saw that the name servers for the fbk.ru. domain are ns3.nic.ru. and gw.fbk.ru.. Interestingly, it was possible to resolve www.fbk.ru. using gw.fbk.ru., but not using ns3.nic.ru.. Moreover, we noted that the reverse resolution of the name servers ns3.nic.ru. and ns4.nic.ru. does not match their names: their addresses resolve to ns3.ripn.net. and ns4.ripn.net., respectively. It is worth mentioning that nothing went wrong when querying the name server gw.fbk.ru..

# host -t a ns3.nic.ru.
ns3.nic.ru has address 194.85.61.20
# host -t a ns4.nic.ru.
ns4.nic.ru has address 194.226.96.8
# host -t ptr 194.85.61.20
20.61.85.194.in-addr.arpa domain name pointer ns3.ripn.net.
# host -t ptr 194.226.96.8
8.96.226.194.in-addr.arpa domain name pointer ns4.ripn.net.

The DNS query error seems to be related to an incomplete answer from the server when the query is made over UDP (the truncated flag, TC, was set to 1 in the network trace). In this case, the resolver must automatically fall back to TCP, which was most likely prohibited by the company's network security policy. A truncated answer also suggests that it is larger than 512 bytes, the classic limit for DNS messages over UDP without EDNS0. So, we tried advertising different UDP message buffer sizes, without being confident that these messages went through the network devices properly. Nonetheless, it seemed odd to get an answer larger than 120 bytes.
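To check from a shell whether a given server truncates its UDP answer, dig can be told not to retry over TCP with +ignore, and the header flags can then be inspected for tc. The function below is a sketch: has_tc_flag is a hypothetical helper name, and it only assumes dig's usual ";; flags: ...;" header line. On Solaris 10, use nawk instead of awk.

```shell
# has_tc_flag (hypothetical helper): read dig output on stdin and
# succeed (exit 0) if the TC (truncated) flag is set in the header.
# On Solaris 10, replace awk with nawk.
has_tc_flag() {
  awk '
    /^;; flags:/ {
      line = $0
      sub(/^;; flags:/, "", line)   # drop the ";; flags:" prefix
      sub(/;.*/, "", line)          # keep only the header flag list
      n = split(line, f, " ")
      for (i = 1; i <= n; i++)
        if (f[i] == "tc") found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}
```

For example, dig +ignore @ns3.nic.ru. -t a www.banks2ifrs.ru. | has_tc_flag && echo truncated. Repeating the query with +bufsize=4096 (advertise a larger EDNS0 buffer) or +tcp helps tell whether the UDP size or a blocked TCP fallback is at fault.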

Lastly, note that a complex network layout (DMZ, firewalls, NAT, etc.) may interact badly with DNS queries and hamper them, at least in certain circumstances.

After further investigation, the network team decided to permit DNS queries over TCP. And it worked, letting the internal DNS servers do their job by themselves...

# dig +trace -t a www.banks2ifrs.ru.
; <<>> DiG 9.3.4-P1 <<>> +trace www.banks2ifrs.ru.
;; global options:  printcmd
.                       449798  IN   NS   L.ROOT-SERVERS.NET.
.                       449798  IN   NS   M.ROOT-SERVERS.NET.
.                       449798  IN   NS   A.ROOT-SERVERS.NET.
.                       449798  IN   NS   B.ROOT-SERVERS.NET.
.                       449798  IN   NS   C.ROOT-SERVERS.NET.
.                       449798  IN   NS   D.ROOT-SERVERS.NET.
.                       449798  IN   NS   E.ROOT-SERVERS.NET.
.                       449798  IN   NS   F.ROOT-SERVERS.NET.
.                       449798  IN   NS   G.ROOT-SERVERS.NET.
.                       449798  IN   NS   H.ROOT-SERVERS.NET.
.                       449798  IN   NS   I.ROOT-SERVERS.NET.
.                       449798  IN   NS   J.ROOT-SERVERS.NET.
.                       449798  IN   NS   K.ROOT-SERVERS.NET.
;; Received 512 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms

ru.                   172800  IN   NS   ns.ripn.net.
ru.                   172800  IN   NS   ns2.nic.fr.
ru.                   172800  IN   NS   ns2.ripn.net.
ru.                   172800  IN   NS   ns5.msk-ix.net.
ru.                   172800  IN   NS   ns9.ripn.net.
ru.                   172800  IN   NS   sunic.sunet.se.
;; Received 297 bytes from 199.7.83.42#53(L.ROOT-SERVERS.NET) in 125 ms

banks2ifrs.ru.   345600  IN   NS   ns4.nic.ru.
banks2ifrs.ru.   345600  IN   NS   ns3.nic.ru.
;; Received 107 bytes from 194.85.105.17#53(ns.ripn.net) in 66 ms

www.banks2ifrs.ru.   86400   IN   A     83.222.6.194
banks2ifrs.ru.            86400   IN   NS   ns4.nic.ru.
banks2ifrs.ru.            86400   IN   NS   ns3.nic.ru.
;; Received 91 bytes from 194.226.96.8#53(ns4.nic.ru) in 65 ms

... and it also worked when querying the name servers responsible for the domain directly:

# dig @ns4.nic.ru. -t a www.banks2ifrs.ru.
; <<>> DiG 9.3.4-P1 <<>> @ns4.nic.ru. -t a www.banks2ifrs.ru.
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1530
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0

;; QUESTION SECTION:
;www.banks2ifrs.ru.             IN   A

;; ANSWER SECTION:
www.banks2ifrs.ru.      86400   IN   A   83.222.6.194

;; AUTHORITY SECTION:
banks2ifrs.ru.          86400   IN   NS   ns4.nic.ru.
banks2ifrs.ru.          86400   IN   NS   ns3.nic.ru.

;; Query time: 65 msec
;; SERVER: 194.226.96.8#53(194.226.96.8)
;; WHEN: Tue Apr 15 21:05:12 2008
;; MSG SIZE  rcvd: 91

Note: The answer is only 91 bytes long, well under the 512-byte UDP limit, so the message size itself is not the problem.

I think we will never know what went wrong here, even though the heart of the problem seems specific to those two name servers.

Monday 5 January 2009

Quick Reference Guides For Network Technologies

I am always looking for good IT cheat sheets, on lots of topics. Here is one of my favorite websites about network technologies. This site offers regularly updated quick reference guides for networking, classified under the following categories (at the time of this writing):

  • Protocols
  • Applications
  • Reference
  • Syntax
  • Technologies
  • Miscellaneous

You can find them all at the Cheat Sheets Library section.

As the author wrote on his blog, if you notice an error or would like to see a new cheat sheet on a specific topic, don't hesitate to drop him a line.

Thursday 25 December 2008

More News To Come About Shrinking A zpool

As a little update to an older post on this subject: although this post from Matthew Ahrens is about the new scrub code recently introduced in OpenSolaris build 94 (which was in fact a priority before the launch of the Sun Storage 7000 Unified Storage Systems, a.k.a. Amber Road), it is interesting to note that some of the new code will be used to remove a disk from a ZFS pool.

As Matthew wrote:

This work lays a bunch of infrastructure that will be used by the upcoming device removal feature.

Wednesday 3 December 2008

GRUB Boot Archive With SVM, A Better Approach

In a previous discussion about the GRUB boot archive and how it can be regenerated in Failsafe mode, I mentioned that it is not as easy when the root file system uses the md driver. I previously showed a method that required unmirroring one or more file systems when the root file system is built on an SVM mirror. This was not optimal, since it involves a lot of manipulations, which may lead to human error and can seem a little complicated.

This method is based on Performing System Recovery from the official Solaris Volume Manager documentation, which was brought up last month on the Sun-Managers mailing list.

Note: Although this test case was done using Solaris 10 10/08 in a VirtualBox virtual machine running on the latest OpenSolaris release, the instructions should be valid for Solaris 10 1/06 and later.

Initial setup

As we saw before, the system uses only a root file system and a swap device, both encapsulated with SVM:

# df -k -F ufs
Filesystem     kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d0 6147798 3455578 2630743      57%  /
# swap -l
swapfile             dev  swaplo blocks   free
/dev/md/dsk/d1      85,1       8 4194288 4194288
# metastat -c d0 d1
d0               m  6.0GB d10 d20
    d10          s  6.0GB c0d0s0
    d20          s  6.0GB c1d1s0
d1               m  2.0GB d11 d21
    d11          s  2.0GB c0d0s1
    d21          s  2.0GB c1d1s1
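The Failsafe procedure needs the name of a physical slice of the root mirror (c0d0s0 here) to mount it read-only, so it may be worth recording these slices while the system is healthy. The function below is a sketch assuming the metastat -c layout shown above (a mirror line "dN m SIZE subm...", then one line per submirror "dN s SIZE slice..."); mirror_slices is a hypothetical helper name, and only simple stripes or concatenations of cXdYsZ slices are handled. On Solaris 10, use nawk instead of awk.

```shell
# mirror_slices (hypothetical helper): list the physical slices behind
# an SVM mirror, reading "metastat -c" output on stdin.
# $1: mirror metadevice name, e.g. d0. On Solaris 10, replace awk with nawk.
mirror_slices() {
  awk -v m="$1" '
    # the mirror line: remember its submirrors (skip state annotations)
    $1 == m && $2 == "m" {
      for (i = 4; i <= NF; i++)
        if ($i ~ /^d[0-9]+$/) subm[$i] = 1
      next
    }
    # a submirror line: print its component slices
    ($1 in subm) && $2 == "s" {
      for (i = 4; i <= NF; i++)
        if ($i ~ /^c[0-9]/) print $i
    }
  '
}
```

For example, metastat -c d0 d1 | mirror_slices d0 should print c0d0s0 and c1d1s0, either of which can be mounted read-only from the Failsafe shell.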

Regenerate the GRUB boot archive

The idea is to boot in GRUB Failsafe mode, get the md configuration from the local root file system, and manually reload the md module so that it is properly configured. The main advantage is that everything is done from within the Failsafe environment, without manipulating SVM more than necessary, and in particular without breaking the mirror and losing redundancy for a while.

[...]
Booting to milestone "milestone/single-user:default".
Configuring devices.
Searching for installed OS instances...
/dev/dsk/c0d0s0 is under md control, skipping.
/dev/dsk/c1d1s0 is under md control, skipping.
No installed OS instance found.

Starting shell.
# mount -F ufs -o ro /dev/dsk/c0d0s0 /a
# cp -p /a/kernel/drv/md.conf /kernel/drv
# umount /a
# update_drv -f md
devfsadm: mkdir failed for /dev 0x1ed: Read-only file system
# metainit -r
# metasync d0
# fsck /dev/md/rdsk/d0
# mount -F ufs /dev/md/dsk/d0 /a
# bootadm update-archive -R /a
# umount /a
# reboot

Really interesting!

Saturday 1 November 2008

System V IPC Now Managed By Resource Controls

With Solaris 10, all System V IPC facilities are either automatically configured or controlled by resource controls. At the same time, they get new default values, where applicable.

As an example, assume that we need to change the limit on the number of shared memory segments that can be created, because the new default (128) is still not enough. Before Solaris 10, you had to set the shmsys:shminfo_shmmni tunable parameter in the /etc/system kernel configuration file; this was a system-wide limit, and changing it required a reboot. This parameter is now listed among the parameters that are Obsolete or Have Been Removed, and its use is clearly deprecated.

To raise the corresponding limit to 256 shared memory segments, we now have to use the project.max-shm-ids resource control, which applies at the project level. The idea is to set the appropriate resource control on a project, then run a program in the context of this project. One method is to create a project (in the project(4) database) and populate the extended user attributes (in the user_attr(4) database) to associate this project with a user account, making it the user's default project. Alternatively, you can skip the extended user attribute and select the project explicitly through the newtask(1) command (or the login(1), cron(1M), and su(1M) programs, or the setproject(3PROJECT) function). But the simplest and least intrusive method is certainly to make the project the default one for a user account directly. Here is how to do so.

By default, no message is sent to the syslog daemon when a resource control is exceeded. To see an appropriate message in the messages log file, you must first globally enable the syslog action for the wanted resource control (the default level is notice).

# rctladm -e syslog project.max-shm-ids
# rctladm -l project.max-shm-ids
project.max-shm-ids   syslog=notice   [ no-basic deny count ]

When the limit on the number of shared memory segments is reached, a message similar to the following is written to the log file:

# grep rctl /var/adm/messages
/var/adm/messages:Oct 21 16:47:29 hostname genunix: [ID 883052 kern.notice] privileged rctl project.max-shm-ids (value 128) exceeded by project 3
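Once the syslog action is enabled, such messages can be summarized to see which resource controls are being hit and by which entity. The function below is a sketch assuming the message wording shown above ("rctl NAME (value N) exceeded by TYPE ID"); rctl_violations is a hypothetical helper name, and nawk should be used instead of awk on Solaris 10.

```shell
# rctl_violations (hypothetical helper): from syslog lines on stdin,
# print "RCTL VALUE ENTITY-TYPE ENTITY-ID" for each exceeded rctl.
# On Solaris 10, replace awk with nawk.
rctl_violations() {
  awk '/ rctl .* exceeded by / {
    name = ""; value = ""
    for (i = 1; i < NF; i++) {
      if ($i == "rctl") name = $(i + 1)
      if ($i == "(value") { value = $(i + 1); sub(/\)$/, "", value) }
    }
    # the last two fields are the entity type and its id, e.g. "project 3"
    print name, value, $(NF - 1), $NF
  }'
}
```

For example, grep rctl /var/adm/messages | rctl_violations | sort | uniq -c gives a quick count per resource control and project.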

Here is the definition of the new project and its configuration:

# getent project user.username
user.username:1000:Project To Increase The Limit Of SHM Segments:::project.max-shm-ids=(priv,256,deny)
#
# projects -l user.username
user.username
      projid : 1000
      comment: "Project To Increase The Limit Of SHM Segments"
      users  : (none)
      groups : (none)
      attribs: project.max-shm-ids=(priv,256,deny)
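An entry in the project(4) database is a single colon-separated line: projname:projid:comment:user-list:group-list:attributes. When the entry has to be added by hand to a NIS or LDAP source, a small function can build it consistently. This is a sketch: project_entry is a hypothetical helper name, it leaves the user and group lists empty (as in the entry above), and it only performs basic checks. For a local /etc/project file, projadd -p 1000 -c "..." -K "project.max-shm-ids=(priv,256,deny)" user.username should produce an equivalent entry.

```shell
# project_entry (hypothetical helper): print a project(4) line.
# usage: project_entry NAME PROJID COMMENT ATTRIBUTES
project_entry() {
  name=$1 projid=$2 comment=$3 attrs=$4
  # projid must be a non-empty decimal number
  case $projid in
    ''|*[!0-9]*) echo "project_entry: invalid projid '$projid'" >&2; return 1 ;;
  esac
  # ':' is the field separator, so it cannot appear in the name or comment
  case "$name$comment" in
    *:*) echo "project_entry: ':' not allowed in name or comment" >&2; return 1 ;;
  esac
  # user-list and group-list are left empty
  printf '%s:%s:%s:::%s\n' "$name" "$projid" "$comment" "$attrs"
}
```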

When a project name begins with the prefix user., the project is automatically used as the default one for the corresponding user, without the need to populate the extended user attributes database. Check that the project is the default project for the account username:

# id -p username
uid=100(username) gid=100(groupname) projid=1000(user.username)
#
# projects -d username
user.username

After logging in as username, the progname programs are launched. We can confirm that the shared memory segments are used in the context of the project user.username, and look at the per-project statistics:

# ipcs -mJ
IPC status from <running system> as of Wed Oct 29 11:39:59 CET 2008
T         ID KEY        MODE    OWNER     GROUP       PROJECT
Shared Memory:
m 1409286255   0 --rw-rw-rw- username groupname user.username
m  469762152   0 --rw-rw-rw- username groupname user.username
m         56   0 --rw-rw-rw- username groupname user.username
#
# prstat -n5 -cJ
   PID USERNAME  SIZE   RSS STATE PRI NICE    TIME  CPU PROCESS/NLWP
  3704 username  373M  284M cpu24   2   10 0:07:37 2.1% progname/26
  6785 username  285M  196M sleep  29   10 0:04:13 1.1% progname/26
  4480 username  785M  697M sleep  29   10 0:11:40 1.1% progname/26
  5836 username  293M  204M sleep  29   10 0:06:31 1.0% progname/26
  7635 username  277M  188M sleep  29   10 0:01:00 0.9% progname/26
PROJID    NPROC  SWAP   RSS MEMORY      TIME  CPU PROJECT
  1000       26 6472M 6333M    26%   3:57:24  23% user.username
     1       17   41M   87M   0.4%   2:39:58 0.0% user.root
     0       43  184M  267M   1.1%   4:07:25 0.0% system
     3        4 5856K   11M   0.0%   0:00:00 0.0% default
Total: 90 processes, 916 lwps, load averages: 4.41, 2.36, 1.04

Finally, we can verify the new setting for one progname instance, for example PID 3704:

# prctl -n project.max-shm-ids 3704
process: 3704: bin/progname 54 80 -Xmx192m
NAME    PRIVILEGE       VALUE    FLAG   ACTION      RECIPIENT
project.max-shm-ids
        privileged        256       -   deny                -
        system          16.8M     max   deny                -

The resource management facility can do much more than tune IPC settings, such as managing CPU usage and controlling physical memory. It is more fine-grained than what was in place before Solaris 10, and it no longer requires a reboot.

As a last word, note that there are command-line tools to help create and manage projects and extended user attributes in locally stored databases: projadd(1M) and projmod(1M), and useradd(1M) and usermod(1M), respectively. Since our information sources were hosted in NIS and LDAP network directories, we did not use them for this test case.
