blog'o thnet

To content | To menu | To search

Tag - Oracle

Entries feed - Comments feed

Sunday 28 February 2010

Oracle On The Future Of OpenSolaris, Finally

After the official and long-awaited public information from Oracle on the merge with Sun Microsystems, some great news came on both hardware and software portfolios, in particular the x86 ans SPARC ecosystems, and around the Solaris operating system.

The main unknown was about the OpenSolaris community, and distribution model. And until very recently, some important voices around the community stayed without answer, in particular from Ben Rockwood, or Peter Tribble.

Well, until recently. In fact, the OpenSolaris Annual Meeting (held on IRC through the #opensolaris-meeting canal last 26 February) brought some answers very shortly, which currently begin to spread through the community. I hope this will quiet some recent misunderstanding on the support model of the OpenSolaris distribution.

Tuesday 1 December 2009

Oracle Commitment To Sun Technologies

Here are more information about Oracle commitment to Sun business, and the Solaris operating system in particular:

Lastly, Oracle and Sun Overview and FAQ for customers and partners is a must read for all persons interested in the future and Oracle's investment in all the current technologies from Sun Microsystems.

Thursday 26 November 2009

Oracle Database 11g Release 2 For Solaris

As a follow-up to the preceding entry on the upcoming availability on the Oracle Database 11g Release 2 on both the Solaris SPARC and x86 releases, we can see that this is a reality as of today. Both architectures are readily available for download.

As a system administrator I think this very interesting and encouraging, not only because of the availability of one of the more robust RDBMS system on Solaris platforms, but because this is some actions taken after words from Oracle which seems to fit together. And so, the interest in Solaris as an OS of choice is more reinforced now.

Wednesday 7 October 2009

Upcoming Oracle RDBMS And Solaris News

Following the recent news about the future of Sun from Larry Ellison itself, we now can hope about more things to come on Solaris, both on SPARC and x86 platforms. In particular, this quote is particularly encouraging:

We're a big supporter of Linux, but the fact is that Solaris just a much more mature OS, its just a fact. We became a big supporter of Linux years ago because it ran on smaller and cheaper X86 processors and Solaris did not, we had no choice. [...] So we are a supporter of Linux, but Solaris is a more mature operating system designed for bigger systems. We support both.

In the very same vein, I just heard from two different sources these upcoming changes from Oracle:

  1. The just release Oracle Database 11g Release 2, currently available only for Linux (and released on 1 September 2009), will be available soon--one to two months--for both Solaris SPARC and x86, at the same time.
  2. Secondly, Solaris x86 will be raised to Tier 2 platform from Tier 3 currently.

Well, pretty good news in fact! Seems that Solaris will be a serious and growing competitor in the (near) future!

Sunday 22 March 2009

Finding The Process Responsible For Crashing A System

Recently, we encounter a wave of suicide on most of the nodes which formed some Oracle RAC cluster on lots of Sun M5000 domains platforms. Although the logs found on Oracle RAC were interesting, they didn't help us to determine precisely the origin of the crashes. Since the domains panic'ed, we were able to briefly analyze the cores generated at crash time to get the process which initiated the panics. Here is how to do so.

First, be sure to have proper and usable core on persistent storage:

# cd /var/crash/nodename
# file *.0
unix.0:         ELF 64-bit MSB executable SPARCV9 Version 1, UltraSPARC1 Extensions Required, statically linked, not stripped, no debugging information available
vmcore.0:       SunOS 5.10 Generic_127111-11 64-bit SPARC crash dump from 'nodename'

Then, extract useful information using MDB dcmds such as ::status, ::showrev and ::panicinfo which give us the exact panic message and provide us the message and thread responsible for the system crash:

# mdb -k unix.0 vmcore.0
Loading modules: [ unix krtld genunix specfs dtrace ufs sd mpt px ssd fcp fctl md ip qlc hook neti sctp arp usba nca zfs random logindmux ptm cpc sppp crypto wrsmd fcip nfs ipc ]
> ::status
debugging crash dump vmcore.0 (64-bit) from nodename
operating system: 5.10 Generic_127111-11 (sun4u)
panic message: forced crash dump initiated at user request
dump content: kernel pages only
> ::showrev
Hostname: nodename
Release: 5.10
Kernel architecture: sun4u
Application architecture: sparcv9
Kernel version: SunOS 5.10 sun4u Generic_127111-11
Platform: SUNW,SPARC-Enterprise
> ::panicinfo
             cpu                0
          thread      300171c7300
         message forced crash dump initiated at user request
          tstate       4400001606
              g1                b
              g2                0
              g3          11c13e0
              g4              6e0
              g5         88000000
              g6                0
              g7      300171c7300
              o0          1208020
              o1      2a10176b9e8
              o2                1
              o3                0
              o4 fffffffffffffff5
              o5             1000
              o6      2a10176b0b1
              o7          10626a4
              pc          1044d8c
             npc          1044d90
               y                0

Well. Now what we have the exact thread number (thread ID), we can find the corresponding UNIX process helped by the following script:

# cat /var/tmp/findstack.vmcore.sh
#!/usr/bin/env sh

echo "::ps" | mdb -k unix.0 vmcore.0 | \
 nawk '$8 !~ /ADDR/ {print $8" "$NF}' > /tmp/.core.$$

cat /dev/null > /tmp/core.$$

while read ps; do
  echo "process name: `echo ${ps} | nawk '{print $2}'`" >> /tmp/core.$$
  echo ${ps} | nawk '{print $1"::walk thread | ::findstack"}' | \
   mdb unix.0 vmcore.0 >> /tmp/core.$$
  echo >> /tmp/core.$$
done < /tmp/.core.$$

\rm /tmp/.core.$$

exit 0

Now, just find the lines for the guilty process in the output file. In our case, it is the oprocd.bin process:

# vi /tmp/core.*
[...]
process name: oprocd.bin
stack pointer for thread 300171c7300: 2a10176b0b1
  000002a10176b161 kadmin+0x4a4()
  000002a10176b221 uadmin+0x11c()
  000002a10176b2e1 syscall_trap+0xac()
[...]

This process is locked in memory to monitor the cluster and provide I/O fencing. oprocd.bin performs its check, stops running, and if the wake up is beyond the expected time, then it resets the processor and reboots the node. An oprocd.bin failure results in Oracle Clusterware restarting the node. Please read the Oracle Clusterware and Oracle Real Application Clusters documentation for more information.

Although the incident is always under investigation, it seems the nodes were impacted by the additional second that was added at the end of 2008...