Understand Buffer Cache in further --- Build up queryable Buffer Cache Chain data

Date: 2013-02-06 06:10:11

Share to

I received email from a reader who want I could explain more details about the content of 6.3.2 in "Oracle SQL Optimizing and Tuning Technologies", especially, how the view v_bufheaders comes, and how it shows the position of buffer blocks in the buffer chains.
As we know, Buffer Cache is an important part of Oracle memory, it caches the data block read from data files. Hence, it impacts the performance very much. To improve performance using this limited resource, Oracle adopts LRU algorithm to manage the buffer block allocation and deallocation. LRU algorithm is quite familiar to us, because it has been explained in Oracle online document and other books very well. However, it's not easy to identify the postion of a buffer block in the buffer chain (although we can use oradebug to print a buffer chain to a trace file, it's little bit complicated and hard to read). In this book, I created the view v_bufheaders to show the position in the buffer chain and other information of the buffer block.
To understand this view, we need look into the core base table of this view, X$BH. It stores the basic information in the buffer block header, which is the data struct used by Oracle to manage the buffer blocks. In this data struct, it indicates the position of the buffer block in buffer chains. It uses a pair of pointers, e.g. NXT_REPL and PRV_REPL, to point to the next node in the chain in both direction. Such description might not so clear to understand this issue. So, let's explain what is X$ table.
A X$ table is not a traditional table, but a data struct in memory. With a special component in SQL engine, Oracle makes it be able to queried as a normal table in a SQL statement. Regarding X$BH, we should try to build up the relationship between the records if we want to figure out the position of the block in the chains.
The data struct definition of X$BH looks like below (in this case, we don't need other information but the chain pointers),

C++代码

struct BUFFER_HEADER

{

  ub2        le_addr

  ...

  CHAIN_NODE us_chain;

  CHAIN_NODE wa_chain;

  ...

  CHAIN_NODE hash_chain;

  CHAIN_NODE repl_chain;

  ...

};

CHAIN_NODE is also a data struct, which includes a pair of pointers for both direction.

C++代码

struct CHAIN_NODE

{

  CHAIN_NODE *prv_prt;

  CHAIN_NODE *nxt_prt;

};

With such data struct, it's ease to go through and to manage all nodes in the chain.

C++代码

CHAIN_NODE *startPoint = &(first_node.repl_chain);

CHAIN_NODE *nextPoint = startPoint->nxt_prt;

while (nextPoint != startPoint) {

  nextPoint = nextPoint->nxt_prt;

  ...

}

However, with such data, we cannot figure out the relationship between the nodes. Therefore, we need transfer these data to transform them to a kind of relationship data:
1. Assign a key for each record;
2. Define a column storing the key of next record instead of a memory address of the pointer.

The ideal key must be the address of the buffer header. But we can find such data from X$BH, so we have to generate/calculate it from the existing data. To achieve such goal, we need find out the clue that indicates the relationship between the existing data and the buffer header address. Thus, we need another X$ table, X$KQFCO.
X$KQFCO describes the data type, offset and other information of the members in the data struct. Take Buffer Header (X$BH) for example, if the base address of a buffer header is 0x30000000, plusing offset, we get the address of each member of it.

SQL代码

HELLODBA.COM>select kqfconam, kqfcooff, '0x'||trim(to_char(to_number('30000000','XXXXXXXX')+kqfcooff, 'XXXXXXXX')) addr

  2  from x$kqfco c,x$kqfta t

  3  where t.indx = c.kqfcotab and t.kqftanam='X$BH'

  4  and kqfcodty = 23

  5  order by kqfcooff;



KQFCONAM                         KQFCOOFF ADDR

------------------------------ ---------- -----------

ADDR                                    0 0x30000000

HLADDR                                  0 0x30000000

LE_ADDR                                 0 0x30000000

US_NXT                                 16 0x30000010

US_PRV                                 20 0x30000014

WA_NXT                                 24 0x30000018

WA_PRV                                 28 0x3000001C

NXT_HASH                              124 0x3000007C

PRV_HASH                              128 0x30000080

NXT_REPL                              172 0x300000AC

PRV_REPL                              176 0x300000B0

BA                                    188 0x300000BC

SET_DS                                192 0x300000C0

OQ_NXT                                196 0x300000C4

OQ_PRV                                200 0x300000C8

AQ_NXT                                204 0x300000CC

AQ_PRV                                208 0x300000D0

But such information still cannot help us to transform the memory data to relationship data.
A point we should understand is that, there are other buffer chains information in the buffer header besides the replacement buffer chain, such as user holding chain (US_NXT, US_PRV) and waiting chain (WA_NXT, WA_PRV) for Latch management. Unlike replacement chain management, such events are only raised when requires. Normally, most of buffer blocks are not included in those chains. In such scenario, the points are pointing to address of the data struct CHAIN_NODE.

SQL代码

HELLODBA.COM>select us_nxt, us_prv, wa_nxt, wa_prv from x$bh where rownum<=6;



US_NXT   US_PRV   WA_NXT   WA_PRV

-------- -------- -------- --------

353EFABC 353EFABC 353EFAC4 353EFAC4

313F9EB4 313F9EB4 313F9EBC 313F9EBC

35BEFEE0 35BEFEE0 35BEFEE8 35BEFEE8

30BEAE8C 30BEAE8C 30BEAE94 30BEAE94

387E9ED0 387E9ED0 387E9ED8 387E9ED8

2EFE940C 2EFE940C 2EFE9414 2EFE9414



6 rows selected.

For these records, we can find that the data of NXT and PRV pointers are exact same, which is actually the address of the corresponding CHAIN_NODE in current buffer header. And because NXT is the first member of CHAIN_NODE, they actually points to the address of NXT.
So, we can get the address of the buffer header simply. Take US_NXT for example, its value minus the offset, we get the address of the buffer header. Look the first record in previous query result:

Buffer Header Address = US_NXT - 16 = WA_NXT - 24
                        = 0x353EFABC - 16 = 0x353EFAC4 - 24
                        = 0x353EFAAC

Theoretically, a buffer block will never be in the user holding chain and waiting chain simultaneously. Therefore, we can get the buffer header address with these two set of data.
With the same theory, since the pointer in CHAIN_NODE of replacement buffer chain is pointing to the address of CHAIN_NODE in the next buffer header, minus the offset of, we can get the address of next buffer header in the chain.

For instance, if there are 3 buffer blocks in the buffer cache, and none of them exists in the user holding chain, then we can draft their relationship as below diagram (forgive me, it looks not so beautiful):

C++代码

BufferHeader(1)      BufferHeader(2)     BufferHeader(3)

[0X30000000]        [0X30001000]        [0X30002000]

{                   {                   {

   ...                 ...                 ...

           +----+              +----+             +----+

           |    |              |    |             |    |

   ( 16)US_NXT<-+      ( 16)US_NXT<-+     ( 16)US_NXT<-+

       /|\                 /|\                 /|\

        |                   |                   |

   ( 20)US_PRV         ( 20)US_PRV         ( 20)US_PRV

   ...                 ...                 ...

+---------------------------------------------------------+

|                                                         |

+--(172)NXT_REPL<------(172)NXT_REPL<------(172)NXT_REPL<-+

       /|\                 /|\                 /|\

+-------+   +---------------+   +---------------+

|           |                   |

|  (176)PRV_REPL       (176)PRV_REPL       (176)PRV_REPL--+

|                                                         |

+---------------------------------------------------------+

   ...                 ...                 ...

}                   }                   }

US_NXT and US_PRV point to the address of US_NXT, while NXT_REPL and PRV_REPL point to the address of NXT_REPL in next buffer header in different direction, and all nodes construct to a loop chain (Actually, there might be a main replacement chain and an auxiliary replacement chain in each buffer cache set).
Now, we can assign the address of the buffer header as the key of each buffer header data record; and we are able to get a column storing the key of next buffer header in the chain. This is definitely a hierarchy data. But we need another important data, the first buffer header in the chain. We can get it from another X$ table, X$KCBWDS, which describes the buffer cache sets. There are 2 pointers in this data struct, NXT_REPL and NXT_REPL, which point to address of NXT_REPL in the buffer header at the MRU end and LRU end of the main replacement buffer chain. And there are other 2 pointers in this data struct, NXT_REPLAX and NXT_REPLAX, which point to address of NXT_REPL in the buffer header at the MRU end and LRU end of the auxiliary replacement buffer chain. In the same way, minus the offset, we can get the address of the buffer header at the MRU end and LRU end.

With a hierarchy query, we can get a whole buffer chain containing the buffer nodes and their position in the chain. That's how I created the view V_BUFLIST.
In addition, I also created a view v_bufhashlist to get the hash buffer chain. Join with other tables, I finnaly created the view v_bufheaders. Regarding the view definitions, you can refer to the scripts of the book.
HBUFS in the X$KCBWDS indicates the block number in the hot area of the main replacement chain. Because we generate the poisition starting with 1, HBUFS could be treated as the position of the last buffer in the main replacement chain, and the next buffer is actually the insert point.
So, we can easily get the head, tail buffer of the main/auxiliary replacement chain as well as the insert point. With such information, we can analyze and study the replacement algorithsm in further.

SQL代码

HELLODBA.COM>select /*+no_merge(v)*/

  2         b.set_id, b.listname, b.indx, headeraddr, obj, dbms_utility.make_data_block_address(dbarfil, dbablk) dba, tch

  3    from v_bufheaders b,

  4        (select set_id, listname, max(indx) last_id

  5           from v_buflist

  6          group by set_id, listname) v

  7   where b.set_id = v.set_id

  8     and b.listname = v.listname

  9     and b.indx >= v.last_id-2

10   union

11   select set_id, listname, indx, headeraddr, obj, dbms_utility.make_data_block_address(dbarfil,dbablk) dba, tch

12     from v_bufheaders b

13    where b.indx in 1

14       or b.indx between b.hbufs and b.hbufs+2

15    order by set_id, listname desc, indx;



    SET_ID LISTNAME        INDX HEADERADDR                OBJ        DBA        TCH

---------- --------- ---------- ------------------ ---------- ---------- ----------

         9 REPL_MAIN          1 2D7E4F40                  248    4196126          1

         9 REPL_MAIN          2 2D7E5014                  233    4206545          1

         9 REPL_MAIN       4689 337FAA3C                 6215    8392894          1

         9 REPL_MAIN       4690 333E7208                  254    4264246          1

         9 REPL_MAIN       4691 2DFF25BC                 6369    8393883          1

         9 REPL_AUX           1 303F1458                  473    8448867          0

         9 REPL_AUX           2 303F1384                  473    8448871          0

         9 REPL_AUX         704 2EFFA2C8                  473    8469751          0

         9 REPL_AUX         705 2EFFA1F4                  473    8469755          0

         9 REPL_AUX         706 2EFFA120                  473    8469759          0

...

Following this way, I believe that you can also construct other buffer chains to analyze other buffer cache related issues.

Downloand scripts of "Oracle SQL Optimizing and Tuning Technologies".

--- Wei Huang ---

Previous：Oracle SQL Hints --- Contents

Next：Rescue data in the worst situation (1)

Return To this Category