Before going deep into CRS, it helps to understand what a cluster is and why it is required.
Let’s see –
A cluster is a High Availability feature that eliminates single points of failure.
In simple terms, a group of systems (nodes) works together as a single system to achieve a common objective. So, physically they are a group of systems (nodes), but logically they work as one.
There are different types of clusters – High Availability clusters, load-balancing clusters, and so on.
How do these clusters work?
All the servers in the cluster communicate with each other through polling. Each server continually polls every other server in the cluster to ensure that it is still operational. This polling takes place over a private “heartbeat” network, which requires its own separate cable run.
Each clustered server is connected to:
• The network, via a LAN connection
• A common mass storage device, via a shared data bus
• The other server(s) in the cluster, through a simple interconnect device such as a secondary network card
When a failover takes place, all three connections become involved. If the failed server is no longer available over the private heartbeat connection, it is polled on the LAN. If this yields no response, then the polling server must take over the failed server’s assignment of connecting users to the common data source via the data bus.
One subtle but serious condition that all clustering software must be able to handle is split-brain. Split-brain occurs when all of the private links go down simultaneously while the cluster nodes are still running. If that happens, each node in the cluster may mistakenly decide that every other node has gone down and attempt to start services that other nodes are still running. Having duplicate instances of services can cause data corruption on the shared storage.
There are quite a number of clustering solutions available from vendors like HP, IBM, Sun, Veritas, etc.
Until Oracle9i, Oracle depended on these vendors’ clustering solutions to implement RAC. Having different vendors for cluster management and the database was problematic and introduced many obstacles for administrators. Oracle 10g RAC eliminates these issues with the introduction of Cluster Ready Services.
So, Oracle now ships all the software needed to implement RAC; there is no need to depend on third-party vendors for a clustering solution. However, users who wish to continue using third-party clusterware for cluster support will still be able to utilize all the benefits of CRS.
CRS can run either on top of the vendor clusterware (Sun, HP, IBM, Veritas Cluster, etc.) or without vendor clusterware. Remember that vendor clusterware was required in 9i RAC but is optional in 10g RAC.
Cluster Ready Services (CRS) provides overall management of the cluster activities. CRS requires two key files that must be located on logical drives on the shared disks: one for the Voting Disk and one for the Oracle Cluster Registry (OCR). So, these two locations must be readily available on shared storage before the CRS installation.
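Once CRS is installed, you can verify both files from the command line. A quick sketch (run as root; the exact output and paths will vary by installation):

```shell
# Report the OCR location, size, and integrity
ocrcheck

# List the voting disk(s) configured for CSS
crsctl query css votedisk
```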
Installing CRS is mandatory prior to installing 10g RAC. The CRS software is installed in the cluster with its own set of binaries under a separate home directory (CRS_HOME).
The core of CRS depends on 3 main components that run as daemons or processes. They are –
CRSD – The CRS Daemon is the main background process for managing the HA operation of the service. CRSD will start, stop, and fail over application resources as well as spawn separate processes to check the application resource health if needed. It maintains its configuration data within the OCR.
OCSSD – The Cluster Synchronization Services daemon. This process is also associated with the ASM instance, as it manages shared access to the disk devices across the clustered nodes. It provides basic node-management services such as node membership, cluster locking, and split-brain protection.
EVMD – The Event Manager daemon. It monitors the message flow between the nodes and logs the relevant event information to its log files.
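On a running node you can confirm these daemons are up with a simple process listing (a sketch; exact process names and any init-managed wrapper scripts vary by platform):

```shell
# Look for the three CRS daemons
ps -ef | grep -E 'crsd|ocssd|evmd' | grep -v grep
```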
So these daemons (and in turn CRS) manage a number of resources, and these resources are categorized into 2 types –
* nodeapps-related resources like GSD, ONS, and VIPs
* database-related resources like the RAC database, instances, listeners, etc.
Whereas the nodeapps-related resources are created and registered in the OCR during the RAC installation, the database-related resources can be created either during RAC installation or later by using tools like SRVCTL, DBCA, and NETCA.
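As a sketch of how those registered resources are inspected and controlled afterwards (the database name racdb and node name node1 here are hypothetical):

```shell
# Show all registered resources and their state across the cluster
crs_stat -t

# Status of the nodeapps (GSD, ONS, VIP) on a given node
srvctl status nodeapps -n node1

# Status / start of a database resource registered in the OCR
srvctl status database -d racdb
srvctl start database -d racdb
```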
Let’s go operational and see how to manage Clusterware –
Oracle Clusterware can be managed using the utility CRSCTL. CRSCTL is an action-packed program that allows many different operations, like enabling/disabling clusterware startup, replacing or moving voting disks, checking the viability of the cluster, and advanced debugging.
1.) To start CRS
#crsctl start crs
2.) To stop CRS
#crsctl stop crs
3.) To enable CRS
#crsctl enable crs
4.) To disable CRS
#crsctl disable crs
The enable/disable options are quite useful during cluster maintenance, as they enable/disable automatic startup of the clusterware on reboot.
The various crsctl check commands are very useful when resources do not start up, or the cluster is suffering from stability issues.
1.) Check the status of CRS
#crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
2.) You can even check each daemon individually –
#crsctl check cssd/crsd/evmd
3.) To check the status of the cluster
#crsctl check cluster
4.) To check the clusterware version on a particular node
#crsctl query crs softwareversion
5.) To check the active version on the cluster
#crsctl query crs activeversion
Basically, we do not maintain different versions across nodes, but at times (for example, while upgrading the cluster one node at a time) it happens.
So, these are some of the basic commands for managing the clusterware. The best way to learn more about CRSCTL is the crsctl command itself, as it prints its own usage –
$crsctl or $crsctl help