How to read ERRORSTACK output

Errorstack tracefiles are very useful for troubleshooting ORA-600 errors, crashes, hangs and even just bad performance.

Errorstack dumps are produced automatically by Oracle when critical errors such as ORA-600 happen. This is when you see an error reported in alert.log and the generated tracefile has a "ksedmp: internal or fatal error" string near its beginning, followed by the error code (usually ORA-7445 or ORA-600 with some parameters).

"ksedmp" means Kernel Service Error DuMP, so everything below that line is the errorstack dump.

Errorstack dumps can also be invoked manually by issuing ORADEBUG DUMP ERRORSTACK 3 (after attaching to the target process using ORADEBUG SETOSPID). This can be useful when a session seems to be hung (but without showing a reasonable wait event in V$SESSION_WAIT) or is consuming much more resources than normal and you want to know exactly which bind variable values are currently being used for executing the SQL.

An errorstack tracefile has a lot of information in it, some of it more and some less useful for practical troubleshooting. The vast amount of data and the binary dumps may scare you away, but I hope to show that there's a well understandable structure in there.

In this article we will go through the following techniques:

- reading the currently executing SQL statement text
- reading the currently executing PL/SQL package and PL/SQL source code line number
- reading the current bind variable values
- identifying how much private (UGA) memory a cursor is using

I will paste the relevant parts of a sample errorstack dump wherever relevant.

Reading the currently executing SQL statement text from an errorstack trace

Reading the currently running SQL statement (if any) is easy; the statement text is dumped below the "Current SQL statement for this session" header at the top of the tracefile:

*** 2010-02-14 15:30:32.830

*** SERVICE NAME:(SOL102) 2010-02-14 15:30:32.829

*** SESSION ID:(156.974) 2010-02-14 15:30:32.829

Received ORADEBUG command 'dump errorstack 3' from process Unix process pid: 8556, image:

*** 2010-02-14 15:30:32.830

ksedmp: internal or fatal error

Current SQL statement for this session:

DELETE FROM MYTAB WHERE OWNER = :B1

----- PL/SQL Call Stack -----

  object      line  object

  handle    number  name

38d172270         8  function SYS.DELETE_FUNC

38d1518f0         1  anonymous block

----- Call Stack Trace -----

calling              call     entry                argument values in hex

location             type     point                (? means dubious value)

-------------------- -------- -------------------- ----------------------------

ksedst()+23          ?        0000000000000001     0017B342C 000000000 0FFDF2420

                                                   0FFFFFD7F

ksedmp()+636         ?        0000000000000001     0017B1EC1 000000000 00601C7E0

                                                   000000000

ksdxfdmp()+1062      ?        0000000000000001     0018A3F03 000000000 00601C7E0

                                                   000000000

ksdxcb()+1238        ?        0000000000000001     0018A22D3 000000000 0FF2DCC80

....

Reading the currently executing PL/SQL package and PL/SQL source code line number from an errorstack trace

*** 2010-02-14 15:30:32.830

ksedmp: internal or fatal error

Current SQL statement for this session:

DELETE FROM MYTAB WHERE OWNER = :B1

----- PL/SQL Call Stack -----

  object      line  object

  handle    number  name

38d172270         8  function SYS.DELETE_FUNC

38d1518f0         1  anonymous block

----- Call Stack Trace -----

calling              call     entry                argument values in hex

location             type     point                (? means dubious value)

-------------------- -------- -------------------- ----------------------------

ksedst()+23          ?        0000000000000001     0017B342C 000000000 0FFDF2420

                                                   0FFFFFD7F

ksedmp()+636         ?        0000000000000001     0017B1EC1 000000000 00601C7E0

                                                   000000000

ksdxfdmp()+1062      ?        0000000000000001     0018A3F03 000000000 00601C7E0

                                                   000000000

....

If the process which dumped the tracefile was in a PL/SQL call, then the PL/SQL call stack is also dumped (under the PL/SQL Call Stack section). This shows you exactly in which PL/SQL procedure the problem happened and which chain of procedure calls led Oracle to the procedure where execution was when the dump was taken.

The PL/SQL call stack above reports 3 columns: the object handle (library cache handle address of the PL/SQL object), the source line number currently being executed in that object, and the object type and name.

To recap, the above little demo PL/SQL call stack consists of 2 lines only:

38d172270 8 function SYS.DELETE_FUNC

38d1518f0 1 anonymous block

You should read a PL/SQL call stack from the bottom up: the bottom line is the entry point (here an anonymous block), each line above it was called from the line below it, and the top line is where the execution currently was (here line 8 of function SYS.DELETE_FUNC).

As we found the owner and name of the object plus the line number info from the above dump, we can query DBA_SOURCE using these (DBA_SOURCE has a LINE field showing the PL/SQL source line number):

SQL> select line, text

  2  from dba_source

  3  where owner = 'SYS'

  4  and   name = 'DELETE_FUNC'

  5  -- and line between X and Y

  6  and type = 'FUNCTION' -- PROCEDURE,PACKAGE BODY,TYPE BODY

  7  order by line asc

  8  /

      LINE TEXT

---------- -------------------------------------------------------------------

         1 function delete_func (owner_name in varchar2) return number

         2 as

         3    num_deleted number;

         4 begin

         5    -- this is a demo procedure

         6    -- it does not do anything useful!

         7

         8    DELETE FROM mytab WHERE owner = owner_name;

         9    COMMIT;

        10

        11    num_deleted := SQL%ROWCOUNT;

        12    DBMS_OUTPUT.PUT_LINE('Deleted rows ='|| TO_CHAR(num_deleted));

        13

        14    return num_deleted;

        15 end;

15 rows selected.

You see, line 8 reported above is where the PL/SQL execution currently was (it had invoked the recursive DELETE statement, which was still running).

Simply put, the PL/SQL call stack in the errorstack says exactly where in the PL/SQL code Oracle was executing when this error condition, dump, crash, hang, or whatever the problem was, happened. The top line is where the code execution currently was, and the line(s) below show the "parent" functions which called the child function seen on the line above.

If the stack thing doesn't make sense right now, bear with me, I will write a separate note about it.

I will also explain the "Call Stack Trace" part in another note as it's a longer topic.
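As a side note, extracting the PL/SQL call stack from a tracefile can be scripted. Below is a minimal Python sketch (a hypothetical helper of mine, not any official tooling) which assumes the exact 10g-style layout shown above and prints the stack in call order, entry point first:

```python
import re

def plsql_stack(trace_text):
    """Parse the '----- PL/SQL Call Stack -----' section and return
    (handle, line number, object name) tuples in call order, i.e.
    entry point first. Assumes the 10g-style layout shown above."""
    rows = []
    in_stack = False
    for line in trace_text.splitlines():
        if '----- PL/SQL Call Stack -----' in line:
            in_stack = True
            continue
        if in_stack:
            if line.startswith('-----'):   # next dump section begins
                break
            # data rows start with a hex object handle; the two header
            # lines ("object line object" / "handle number name") don't match
            m = re.match(r'\s*([0-9a-f]+)\s+(\d+)\s+(.+)$', line)
            if m:
                rows.append((m.group(1), int(m.group(2)), m.group(3)))
    return list(reversed(rows))   # bottom of the dump = start of the call chain

sample = """----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
38d172270         8  function SYS.DELETE_FUNC
38d1518f0         1  anonymous block
----- Call Stack Trace -----"""

for handle, lineno, name in plsql_stack(sample):
    print(f'{name} (line {lineno})')
```

This just mechanizes the bottom-up reading order described above; it's a sketch, so a tracefile with a different layout would need the regular expression adjusted.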

Reading the current bind variable values from an errorstack trace

This is a question I get asked very frequently. Let's say a session is executing a SQL statement with bind variables and it's behaving badly - running much longer than usual, for example, apparently only for certain inputs.

So, you would want to know what the bind variable values were for that problematic case. Unfortunately there's no V$ view from which to get the current bind variable values of a session. If you're thinking of V$SQL_BIND_CAPTURE, then you're wrong - it only samples bind variable values occasionally; it doesn't store every single bind variable value used somewhere.

In Oracle 11.2, the Real-Time SQL Monitoring feature makes things easier. There's a new column BINDS_XML in V$SQL_MONITOR, which contains the bind variable values of queries running long enough. Accessing that view, however, requires a Diagnostics and Tuning Pack license.

But let's get back to the original topic - not everyone is running Oracle 11.2 and not everyone has Diagnostics+Tuning Pack licenses.

There's a reason why it's non-trivial to get the bind values: the bind variable values live in the private memory of the process, and you can't easily read the private memory of another process!

That's why we need to use ORADEBUG to attach to the problem process and send it a signal so that it dumps its errorstack, which luckily includes a cursor dump - and that is where the bind variable values are!

The number 8213 in the example below is the OS process ID (V$PROCESS.SPID) of the target (problem) session which I want to dump:

SQL> oradebug setospid 8213

Oracle pid: 15, Unix process pid: 8213, image: oracleSOL102@solaris02

SQL> oradebug dump errorstack 3

Statement processed.

Now the target process should have written the errorstack to UDUMP (or BDUMP if it's a background process). Note that you probably don't want to connect to background processes with ORADEBUG unless you really know what you're doing!

Of course, when you are diagnosing a crash or an ORA-600 (which has already happened), you don't need to use oradebug, as the error handler should already have dumped the errorstack, with the cursor dump, to a tracefile.

So, let's open the same tracefile (only the beginning is shown).

*** 2010-02-14 15:30:32.830

ksedmp: internal or fatal error

Current SQL statement for this session:

DELETE FROM MYTAB WHERE OWNER = :B1

----- PL/SQL Call Stack -----

  object      line  object

  handle    number  name

38d172270         8  function SYS.DELETE_FUNC

38d1518f0         1  anonymous block

----- Call Stack Trace -----

calling              call     entry                argument values in hex

location             type     point                (? means dubious value)

-------------------- -------- -------------------- ----------------------------

ksedst()+23          ?        0000000000000001     0017B342C 000000000 0FFDF2420

                                                   0FFFFFD7F

ksedmp()+636         ?        0000000000000001     0017B1EC1 000000000 00601C7E0

                                                   000000000

ksdxfdmp()+1062      ?        0000000000000001     0018A3F03 000000000 00601C7E0

                                                   000000000

ksdxcb()+1238        ?        0000000000000001     0018A22D3 000000000 0FF2DCC80

                                                   0FFFFFD7F

....

When you start scrolling through all the (binary) data that comes below, you may go crazy, as there's just too much incomprehensible data there. So what's the systematic way to reliably navigate to the actual bind variables used?

What I usually do is search for the first occurrence of "Session Cursor Dump". Note that the case matters here.

When I search for it, I see the following output (again, I only paste part of it here, starting from the Session Cursor Dump line I searched for):

******************** Session Cursor Dump **********************

Current cursor: 6, pgadep: 1

Open cursors(pls, sys, hwm, max): 5(3, 1, 64, 300)

 NULL 1 SYNTAX 0 PARSE 0 BOUND 4 FETCH 0 ROW 0

Cached frame pages(total, free):

 4k(41, 38), 8k(1, 1), 16k(1, 1), 32k(0, 0)

  pgactx: 38cc705e0 ctxcbk: 38cc701f0 ctxqbc: 0 ctxrws: 39392a188

Explain plan:

Dumping stream from 0 to 97

----------------------------------

0000: 143 136 161 146 147   3   0   0   0 137 119   1   0  16   0   0   0   3   1   2    ..........w.........

0020:   0   0   0   0   0  59   7   3   6   3 121 101 115   5   9   1   1   3  83  89    .....;....yes.....SY

0040:  83  12   8 106 108 109 224 152 228 163 113 111   3   2   3   3   2   0   0 143    S..jlm....qo........

0060:  56 164 146 147   3   0   0   0 134 116   2  38  24 133  86 229  24 144 100 133    8........t.&..V...d.

0080:  81 193 219 234 223 154 138   3   1   2 192 214 188   3   2  14 142                Q................

============

Plan Table

============

--------------------------------------+-----------------------------------+

| Id  | Operation           | Name    | Rows  | Bytes | Cost  | Time      |

--------------------------------------+-----------------------------------+

| 0   | DELETE STATEMENT    |         |       |       |  1366 |           |

| 1   |  DELETE             | MYTAB   |       |       |       |           |

| 2   |   TABLE ACCESS FULL | MYTAB   |  119K | 2023K |  1366 |  00:00:17 |

--------------------------------------+-----------------------------------+

Here you go - the Current cursor: 6 line says that Oracle was currently executing cursor #6 in the open cursor array in that session's UGA (the maximum size of that array is set by the open_cursors parameter).

So, in order to find the bind variables of the currently executing query, all I need to do is forward-search for Cursor#6 in this tracefile. Note that the case and the "missing" space between Cursor and #6 are intentional:

----------------------------------------

Cursor#6(fffffd7ffddb1918) state=BOUND curiob=fffffd7ffdd0f380

 curflg=cd fl2=0 par=0 ses=398f72fe8

 sqltxt(38d15d0f8)=DELETE FROM MYTAB WHERE OWNER = :B1

  hash=332a8a504bccd7f1a4a726a2879e71d9

  parent=38ca516d0 maxchild=01 plk=395bf4dd8 ppn=n

cursor instantiation=fffffd7ffdd0f380 used=1266132628

 child#0(38d151620) pcs=38ca512e0

  clk=395b767b8 ci=38ca509c8 pn=38d179e58 ctx=38cc705e0

 kgsccflg=0 llk[fffffd7ffdd0f388,fffffd7ffdd0f388] idx=0

 xscflg=c0110676 fl2=1d120000 fl3=4228218c fl4=100

 Bind bytecodes

  Opcode = 6   Bind Rpi Scalar Sql In(may be out) Nocopy NoSkip

  Offsi = 48, Offsi = 0

kkscoacd

 Bind#0

  oacdty=01 mxl=32(32) mxlc=00 mal=00 scl=00 pre=00

  oacflg=13 fl2=206001 frm=01 csi=31 siz=32 off=0

  kxsbbbfp=fffffd7ffdd31808  bln=32  avl=03  flg=09

  value="SYS"

 Frames pfr fffffd7ffdd3f8c0 siz=4488 efr fffffd7ffdd3f800 siz=4464

 Cursor frame dump

  enxt: 3.0x000004b0  enxt: 2.0x00000020  enxt: 1.0x00000ca0

  pnxt: 2.0x00000008  pnxt: 1.0x00000010

 kxscphp  fffffd7ffddc0388 siz=984 inu=472 nps=344

 kxscbhp  fffffd7ffdcef1a8 siz=1008 inu=120 nps=0

 kxscwhp  fffffd7ffddc0748 siz=10496 inu=6608 nps=6608

And here you go: the Bind#0 section above shows the actual bind value used for :B1 - value="SYS".

And that's it - this is the systematic and reliable way of finding the bind variable values used by some other session.
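The two-step search described above (find "Current cursor: N", then forward-search for "Cursor#N") can itself be scripted. Here's a minimal Python sketch - a hypothetical helper of mine, not any official tooling - assuming the 10g-style trace layout shown in this article:

```python
import re

def current_binds(trace_text):
    """Find 'Current cursor: N' in the Session Cursor Dump, locate the
    matching 'Cursor#N(' section and extract its bind values as
    (bind number, value) tuples. A sketch: a bind dumped without a
    value= field would confuse it."""
    m = re.search(r'Current cursor: (\d+)', trace_text)
    if not m:
        return []
    # grab everything from "Cursor#N(" up to the next cursor (or EOF)
    sect = re.search(r'Cursor#' + m.group(1) + r'\(.*?(?=\nCursor#|\Z)',
                     trace_text, re.DOTALL)
    if not sect:
        return []
    # each bind is dumped as "Bind#N" followed by a value="..." field
    return [(int(b), v) for b, v in
            re.findall(r'Bind#(\d+).*?value="([^"]*)"',
                       sect.group(0), re.DOTALL)]

sample = '''Current cursor: 6, pgadep: 1
Cursor#6(fffffd7ffddb1918) state=BOUND curiob=fffffd7ffdd0f380
 sqltxt(38d15d0f8)=DELETE FROM MYTAB WHERE OWNER = :B1
kkscoacd
 Bind#0
  oacdty=01 mxl=32(32) mxlc=00 mal=00 scl=00 pre=00
  value="SYS"
'''

print(current_binds(sample))   # -> [(0, 'SYS')]
```

This is handy when you dump errorstacks from the same long-running session repeatedly and want to diff the bind values over time.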

Identifying how much private (UGA) memory a cursor is using

And finally, continuing from the previous example, it is possible to measure how much of your private (UGA) memory space a cursor is using (one session may have many cursors open at the same time):

I will paste the last few lines of the previous output here again:

 Cursor frame dump

  enxt: 3.0x000004b0  enxt: 2.0x00000020  enxt: 1.0x00000ca0

  pnxt: 2.0x00000008  pnxt: 1.0x00000010

 kxscphp  fffffd7ffddc0388 siz=984 inu=472 nps=344

 kxscbhp  fffffd7ffdcef1a8 siz=1008 inu=120 nps=0

 kxscwhp  fffffd7ffddc0748 siz=10496 inu=6608 nps=6608

Under the cursor frame dump you see a bunch of lines starting with kxsc, which stands for Kernel eXecution Shared Cursor. In each line you see a field called siz, which tells you the private memory allocation size for that particular allocation type (or, to be precise, the internal allocation size).

You can simply sum all these sizes together to determine a cursor's memory usage - in the example above, 984 + 1008 + 10496 = 12488 bytes.
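To illustrate the arithmetic, here's a tiny Python sketch (hypothetical, assuming the kxsc line format shown above) which picks out the siz fields and sums them:

```python
import re

# sum the siz= fields of the kxsc* heap lines from a cursor frame dump
# to estimate that cursor's private (UGA) memory usage
frame_dump = """
 kxscphp  fffffd7ffddc0388 siz=984 inu=472 nps=344
 kxscbhp  fffffd7ffdcef1a8 siz=1008 inu=120 nps=0
 kxscwhp  fffffd7ffddc0748 siz=10496 inu=6608 nps=6608
"""

total = sum(int(s) for s in
            re.findall(r'^\s*kxsc\w+\s+\w+\s+siz=(\d+)', frame_dump,
                       re.MULTILINE))
print(total)   # 984 + 1008 + 10496 = 12488
```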

Here are the meanings (which I have guessed) for the interested: kxscphp should be the cursor's persistent heap, kxscbhp the bind heap (where the bind values live) and kxscwhp the cursor's work heap.

Finally, even though ORADEBUG is a tool widely used by Oracle Support, be careful with it (for example, search for what Metalink has to say about ORADEBUG DUMP ERRORSTACK). If you are unsure, ask Oracle Support for their "blessing" first.

I never connect to critical background processes such as LGWR or DBWR with oradebug unless I'm prepared for the possibility that the instance may crash. However, these techniques have been very valuable in some cases where other instrumentation is not enough!