In this post, let us look at the main concept of RAC – Cache Fusion.
Before going into Cache Fusion, let us first recall how a normal database instance behaves when there is a request for a data block.
Let us suppose a user process requests a data block. Since this process cannot read directly from the disk, the requested block must first be read (Physical Read) into the Buffer Cache of the SGA. Once it is read into the buffer cache, it remains there for further requests. Whenever there is another request for the same data block, it can be read directly from the buffer (Buffer Read), since it is already cached, thus avoiding another Physical Read. If a data block is found in the buffer cache, it is called a ‘Cache Hit’, and if it is not found, it is called a ‘Cache Miss’.
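The hit/miss behaviour described above can be sketched in a few lines of Python. This is only a toy model, not how Oracle implements its buffer cache: the `BufferCache` class, the `disk` dictionary standing in for data files, and the block numbers are all made up for illustration.

```python
class BufferCache:
    """Toy buffer cache that maps a block id to the block contents."""

    def __init__(self, disk):
        self.disk = disk      # dict standing in for the data files (assumption)
        self.buffers = {}     # blocks currently held in the cache
        self.hits = 0
        self.misses = 0

    def get_block(self, block_id):
        if block_id in self.buffers:
            self.hits += 1    # cache hit: served by a buffer read
        else:
            self.misses += 1  # cache miss: a physical read is needed
            self.buffers[block_id] = self.disk[block_id]
        return self.buffers[block_id]


disk = {101: "row data A", 102: "row data B"}
cache = BufferCache(disk)
cache.get_block(101)  # miss, block is read from "disk"
cache.get_block(101)  # hit, block is already cached
cache.get_block(102)  # miss
print(cache.hits, cache.misses)  # prints "1 2"
```

The second request for block 101 is served from memory, which is exactly the physical read that the buffer cache is meant to save.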
In order to maintain data integrity when there are concurrent requests for the same data block, Oracle uses a locking mechanism and multi-version consistency control. A data block can exist in the buffer cache in several versions. For example, when a block is dirtied, the previous values of the data are maintained in UNDO and the changes are recorded in REDO. Whenever a user requests a block that is already in the buffer and dirtied, the UNDO segment provides the information required to construct a read-consistent (CR) image of the data block. So, multi-version data blocks help to achieve read consistency.
The read consistency model guarantees that the data block seen by a statement is consistent with respect to a single point in time and does not change during the statement execution. Readers of data do not wait for writers of the same data or for other readers of the same data. At the same time, writers do not wait for readers of the same data. Writers wait only for other writers when they attempt to modify the same data.
In a single instance, the following happens when reading a block:
* When a reader reads a recently modified block, it might find an active transaction in the block.
* The reader will need to read the undo segment header to decide whether the transaction has been committed or not.
* If the transaction is not committed, the process creates a consistent read (CR) version of the block in the buffer cache using the data in the block and the data stored in the undo segment.
* If the undo segment shows the transaction is committed, the process has to revisit the block, clean out the block and generate the redo for the changes.
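The steps above can be sketched as a small Python model. It is a simplification under stated assumptions: the `UndoSegment` and `Block` classes, the single `active_txn` field and the row values are hypothetical, and the redo generated by a real block cleanout is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UndoSegment:
    # txn id -> True once the transaction has committed
    committed: Dict[int, bool] = field(default_factory=dict)
    # txn id -> {row id: value before the change}
    before_images: Dict[int, Dict[str, int]] = field(default_factory=dict)


@dataclass
class Block:
    rows: Dict[str, int]
    active_txn: Optional[int] = None  # transaction recorded in the block, if any


def read_block(block, undo):
    """Return the rows a reader is allowed to see."""
    txn = block.active_txn
    if txn is None:
        return dict(block.rows)          # no active transaction in the block
    if undo.committed[txn]:
        block.active_txn = None          # block cleanout (redo not modelled)
        return dict(block.rows)
    cr_image = dict(block.rows)          # build a CR copy, leave the block as is
    cr_image.update(undo.before_images[txn])  # apply undo to the copy
    return cr_image


undo = UndoSegment(committed={7: False}, before_images={7: {"r1": 100}})
block = Block(rows={"r1": 150, "r2": 20}, active_txn=7)

uncommitted_view = read_block(block, undo)  # reader sees the old value of r1
undo.committed[7] = True
committed_view = read_block(block, undo)    # reader now sees the new value
```

Note that the CR image is a separate copy: the current block keeps the uncommitted change, and only the reader's copy is rolled back.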
Now let us see how it goes in a RAC environment –
In RAC, two or more instances access the same database files, which reside on shared storage. Each instance has its own SGA and background processes, which means each instance has its own buffer cache (local to that instance). These buffer caches act individually at the instance level and fuse together at the database level to form a single entity (the Global Cache) so that data blocks can be shared between them. This is what we call ‘Cache Fusion’. Cache Fusion uses a high-speed IPC interconnect to provide cache-to-cache transfers of data blocks between instances in a cluster. This block shipping avoids disk I/O for blocks that are already cached on another instance and improves read/write concurrency.
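As a rough illustration of the idea, the sketch below looks for a block first in the local cache, then in the other instances' caches, and only then on disk. The `Instance` and `Cluster` classes are hypothetical, and scanning every instance is a shortcut for this example only; a real cluster locates the block through its resource directory rather than by searching.

```python
class Instance:
    def __init__(self, name):
        self.name = name
        self.buffer_cache = {}   # local buffer cache of this instance


class Cluster:
    def __init__(self, disk, instances):
        self.disk = disk               # dict standing in for shared storage
        self.instances = instances
        self.physical_reads = 0
        self.cache_transfers = 0

    def get_block(self, requester, block_id):
        # 1. Local cache hit: nothing to ship.
        if block_id in requester.buffer_cache:
            return requester.buffer_cache[block_id]
        # 2. Another instance has the block: ship it cache-to-cache.
        for other in self.instances:
            if other is not requester and block_id in other.buffer_cache:
                self.cache_transfers += 1
                requester.buffer_cache[block_id] = other.buffer_cache[block_id]
                return requester.buffer_cache[block_id]
        # 3. Nobody has it: fall back to a physical read.
        self.physical_reads += 1
        requester.buffer_cache[block_id] = self.disk[block_id]
        return requester.buffer_cache[block_id]


disk = {5: "block 5"}
inst1, inst2 = Instance("inst1"), Instance("inst2")
cluster = Cluster(disk, [inst1, inst2])
cluster.get_block(inst1, 5)   # physical read
cluster.get_block(inst2, 5)   # cache-to-cache transfer from inst1
cluster.get_block(inst2, 5)   # local hit
print(cluster.physical_reads, cluster.cache_transfers)  # prints "1 1"
```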
Now the question is how the integrity of the data is maintained in a RAC environment when there are concurrent requests for the same data block. Here too, Oracle uses locking and queuing mechanisms to coordinate lock resources, data and inter-instance data requests.
Cache Fusion is implemented by a controlling mechanism called the Global Cache Service (GCS), which is responsible for block transfers between instances. GCS works together with the Global Enqueue Service (GES), and the two are implemented by background processes such as
Global Cache Service Processes (LMSn)
Global Enqueue Service Daemon (LMD)
[Before going into those processes, let us see how Oracle treats data blocks and how it manages them –
Oracle treats data blocks as resources. Each of these resources can be held in different modes, which is an important mechanism for maintaining data integrity. The modes are classified into 3 types, depending on whether the resource holder intends to modify the data or read it. They are –
Null (N) mode – Null mode is usually held as a placeholder.
Shared (S) mode – In this mode, the data block cannot be modified by another session, but concurrent shared access is allowed.
Exclusive (X) mode – This mode grants the holding process exclusive access. Other processes cannot write to the resource, although consistent read versions of the block may exist.
Furthermore, these resources act in one of 2 roles – Local (L) and Global (G).
A resource (data block) is assigned the Local role when the block is first read into the cache and no other instance has requested the same data block.
A resource is assigned the Global role when the block is dirtied locally and transmitted to another instance.]
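The modes and roles above map naturally onto two small enumerations. The sketch below is an assumption-laden illustration: the `Mode` and `Role` names and the `compatible` helper are invented here, and the compatibility rule is the usual simplified one (Null is compatible with everything, Shared only with Shared, Exclusive with nothing but Null), not a statement of Oracle's internal rules.

```python
from enum import Enum


class Mode(Enum):
    NULL = "N"
    SHARED = "S"
    EXCLUSIVE = "X"


class Role(Enum):
    LOCAL = "L"
    GLOBAL = "G"


def compatible(held, requested):
    """True if `requested` can be granted while another holder has `held`.

    Simplified rule for illustration: Null never conflicts, Shared is
    compatible only with Shared, Exclusive conflicts with Shared and Exclusive.
    """
    if Mode.NULL in (held, requested):
        return True
    return held is Mode.SHARED and requested is Mode.SHARED


print(compatible(Mode.SHARED, Mode.SHARED))     # True: concurrent readers
print(compatible(Mode.SHARED, Mode.EXCLUSIVE))  # False: writer must wait
print(compatible(Mode.NULL, Mode.EXCLUSIVE))    # True: Null is a placeholder
```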
Now let us see what those processes do –
Global Cache Service Processes (LMSn)
Upon a request from an instance, GCS organizes the shipping of the block to the requesting instance, and the sending instance retains a copy of the block in memory. Each such copy is called a past image (PI). In the event of a node failure, Oracle can reconstruct the current version of a block by using a saved PI. A data block can have more than one PI, depending on how many times the block was requested while it was dirty.
Do not confuse a read-consistent (CR) image with a past image (PI); they are not the same, even though they appear similar. A PI is not a read-consistent image of the data block. To make it one, UNDO must be applied, which converts it into a CR image.
Keep in mind that a data block must be in a read-consistent state before you can read it. You are not allowed to read the uncommitted changes made by others.
Global Enqueue Service Daemon (LMD)
The Global Enqueue Service (GES) tracks the status of all Oracle enqueuing mechanisms. The GES performs concurrency control on dictionary cache locks, library cache locks, and transactions. It performs this operation for resources that are accessed by more than one instance. The GES controls access to data files and control files, but not to data blocks; in other words, GES processing covers the coordination of enqueues other than those for data blocks. The resources managed by the GES include the following:
Transaction locks – A transaction lock is acquired in exclusive mode when a transaction initiates its first row-level change. The lock is held until the transaction is committed or rolled back.
Library Cache locks – When a database object (such as a table, view, procedure, function, package, package body, trigger, index, cluster, or synonym) is referenced during parsing or compiling of a SQL (DML or DDL), PL/SQL, or Java statement, the process parsing or compiling the statement acquires the library cache lock in the correct mode.
Dictionary Cache locks – Global enqueues are used in cluster database mode. The data dictionary structure is the same for all Oracle instances in a cluster database, as it is for instances in a single-instance database. However, in Real Application Clusters, Oracle synchronizes all the dictionary caches throughout the cluster. Real Application Clusters uses latches to do this, just as in the case of a single-instance Oracle database.
Table locks – These are the GES locks that protect entire tables. A transaction acquires a table lock when a table is modified. A table lock can be held in any of several modes: null (N), row share (RS), row exclusive (RX), share (S), share row exclusive (SRX), or exclusive (X).
GCS (LMSn + LMD) keeps track of the resources, their locations and their statuses (mode, role), and this information is recorded in the Global Resource Directory (GRD). The GRD is distributed: each instance manages a portion of the directory. Whenever a block is transferred out of a local cache to another instance’s cache, the GRD is updated. The GRD therefore knows exactly where the most recent version of a data block is available.
To perform any operation on a data block, we need to know its current state. To determine that state, 3 things are required –
1. What is its current role? (Local (L) or Global (G))
2. What is its current mode? (Null (N), Shared (S) or Exclusive (X))
3. Does the requested block have any past images (PI)? (0 or 1)
But where can you get this information from? Yes – you are right – from the GRD.
The state of the data block is represented by a 3-character code – (mode, role, PI) – for example NL0, SL0, XL1, etc.
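To make the code format concrete, here is a minimal Python sketch that splits such a code into its three parts. The `BlockState` tuple and the `parse_state` function are hypothetical names used only for this illustration.

```python
from typing import NamedTuple


class BlockState(NamedTuple):
    mode: str          # "Null", "Shared" or "Exclusive"
    role: str          # "Local" or "Global"
    past_images: int   # 0 or 1


MODES = {"N": "Null", "S": "Shared", "X": "Exclusive"}
ROLES = {"L": "Local", "G": "Global"}


def parse_state(code):
    """Decode a 3-character state code such as 'SL0' into its parts."""
    if (len(code) != 3 or code[0] not in MODES
            or code[1] not in ROLES or code[2] not in "01"):
        raise ValueError(f"not a valid block state code: {code!r}")
    return BlockState(MODES[code[0]], ROLES[code[1]], int(code[2]))


print(parse_state("SL0"))  # shared mode, local role, no past image
print(parse_state("XL1"))  # exclusive mode, local role, one past image
```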
Now let me show the different scenarios of data block transfer –
1. Reading the data block from the disk – In this scenario, a data block is read from the data file (disk read), since no copy of it is currently available in any of the instances. Once the block is read into the buffer cache, its state is SL0. This indicates that the block is now in shared mode with a local role and has no past images. The resource information is updated accordingly in the GRD.
2. Reading the data block from the cache – In this scenario, the data block is already available in the buffer cache of one of the instances, so there is no need to read it from disk, thus avoiding a physical read. Let us say Instance 2 requests a data block and sends the request to GCS. GCS in turn passes the request to the owning instance (Instance 1). Upon receiving the request, Instance 1 sends a copy of the data block to the requesting instance (Instance 2), keeps its own copy in shared mode and retains its local role. No past image is created on Instance 1, as the data block has not been dirtied. The state of the data block in Instance 2 is now SL0 (the same as when reading from disk).
3. Modifying the data block – In this scenario, Instance 1 has modified a data block but has not yet committed. Instance 2 requests the same block in order to modify it and passes the request to GCS. GCS in turn passes the request to Instance 1 (the owner of that data block). Upon receiving the request, Instance 1 downgrades its resource to Null mode, keeps a copy of the current version of the block as a past image (PI), and sends the block to Instance 2. The role now becomes Global, since the block is dirty and has been shipped to another instance. Instance 1 also informs Instance 2 that it has retained a PI copy and a Null resource, which means that Instance 2 can hold the block in exclusive mode (X) with a global role (G). Upon receipt of the block, Instance 2 informs GCS about the mode and role of the block (X, G).
4. Writing dirty buffers to disk – In this scenario, Instance 1 wants to write the buffer to disk, so a request is sent to GCS. GCS forwards the request to Instance 2 (the current holder of the block). Upon receiving the request, Instance 2 writes the block to disk and informs GCS. Instance 2 also informs GCS that the resource role has now become local, because the instance has completed the write of the current block. Upon receiving the message, GCS orders all PI holders to discard their PIs, which are no longer needed for recovery because the current block has been written, and the PI buffers are released.
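The four scenarios can be replayed with a toy model of the directory that only tracks the state code per block and instance. Everything here is a sketch under assumptions: the `ToyGRD` class and its method names are invented, the codes NG1 and XG0 are simply the (mode, role, PI) encoding of what scenario 3 describes, and recording a discarded past image as NL0 in scenario 4 is a simplification chosen for this example.

```python
class ToyGRD:
    """Maps (block id, instance name) to a 3-character state code."""

    def __init__(self):
        self.states = {}

    def read_from_disk(self, block, inst):
        # Scenario 1: first reader gets shared mode, local role, no PI.
        self.states[(block, inst)] = "SL0"

    def read_from_cache(self, block, owner, requester):
        # Scenario 2: clean block is shipped, both copies stay SL0.
        if self.states[(block, owner)] != "SL0":
            raise ValueError("owner does not hold a clean shared copy")
        self.states[(block, requester)] = "SL0"

    def transfer_dirty(self, block, owner, requester):
        # Scenario 3: owner keeps a PI in Null mode, requester gets X.
        if self.states[(block, owner)][0] != "X":
            raise ValueError("owner does not hold the block exclusively")
        self.states[(block, owner)] = "NG1"
        self.states[(block, requester)] = "XG0"

    def write_to_disk(self, block, holder):
        # Scenario 4: holder writes the block, PIs are discarded,
        # and the role of the current block becomes local again.
        for key in list(self.states):
            if key[0] == block and self.states[key][2] == "1":
                self.states[key] = "NL0"   # simplification: PI dropped
        self.states[(block, holder)] = "XL0"


grd = ToyGRD()
grd.read_from_disk(100, "inst1")            # scenario 1
grd.read_from_cache(100, "inst1", "inst2")  # scenario 2

grd.states[(200, "inst1")] = "XL0"          # assumed: inst1 has modified block 200
grd.transfer_dirty(200, "inst1", "inst2")   # scenario 3
after_transfer = dict(grd.states)
grd.write_to_disk(200, "inst2")             # scenario 4
```

After scenario 3, `after_transfer` shows block 200 as NG1 on inst1 and XG0 on inst2; after scenario 4 the past image is gone and inst2 holds the block as XL0.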
That is a brief overview of Cache Fusion – we will discuss another topic in the next post.