How to read ERRORSTACK output
Errorstack tracefiles are very useful for troubleshooting ORA-600's, crashes, hangs and even just bad performance.
Errorstack dumps are dumped automatically by Oracle when critical errors such as ORA-600 happen. This is when you see an error reported in alert.log and the generated tracefile has a "ksedmp: internal or fatal error" string in its beginning, followed by the error code (usually ORA-7445 or ORA-600 with some parameters).
"ksedmp" means Kernel Service Error DuMP, so everything below that line is the errorstack dump.
Errorstack dumps can also be manually invoked by issuing ORADEBUG ERRORSTACK 3 (when being connected to the target process using ORADEBUG SETOSPID). This can be useful when a session seems to be hung (but without showing a reasonable wait event in V$SESSION_WAIT) or is consuming much more resources than normally and you want to know which exact bind variable values are currently used for executing the SQL.
Errorstack tracefile has a lot of information it, some more and some less useful for practical troubleshooting. The vast amount of data and the binary dumps may scare you away, but I hope to show that there's some well understandable structure in there.
In this article we will go through following techniques:
I will paste the relevant parts of a sample errorstack dump wherever relevant.
Reading the current executing SQL statement text from errorstack trace
Reading the current running SQL statement (if any) is easy, the statement text is dumped below the "Current SQL statement for this session" header in the top of the tracefile:
*** 2010-02-14 15:30:32.830
*** SERVICE NAME:(SOL102) 2010-02-14 15:30:32.829
*** SESSION ID:(156.974) 2010-02-14 15:30:32.829
Received ORADEBUG command 'dump errorstack 3' from process Unix process pid: 8556, image:
*** 2010-02-14 15:30:32.830
ksedmp: internal or fatal error
Current SQL statement for this session:
DELETE FROM MYTAB WHERE OWNER = :B1
----- PL/SQL Call Stack -----
object line object
handle number name
38d172270 8 function SYS.DELETE_FUNC
38d1518f0 1 anonymous block
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+23 ? 0000000000000001 0017B342C 000000000 0FFDF2420
0FFFFFD7F
ksedmp()+636 ? 0000000000000001 0017B1EC1 000000000 00601C7E0
000000000
ksdxfdmp()+1062 ? 0000000000000001 0018A3F03 000000000 00601C7E0
000000000
ksdxcb()+1238 ? 0000000000000001 0018A22D3 000000000 0FF2DCC80
....
Reading the current executing PL/SQL package and PL/SQL source code line number from errorstack trace
*** 2010-02-14 15:30:32.830
ksedmp: internal or fatal error
Current SQL statement for this session:
DELETE FROM MYTAB WHERE OWNER = :B1
----- PL/SQL Call Stack -----
object line object
handle number name
38d172270 8 function SYS.DELETE_FUNC
38d1518f0 1 anonymous block
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+23 ? 0000000000000001 0017B342C 000000000 0FFDF2420
0FFFFFD7F
ksedmp()+636 ? 0000000000000001 0017B1EC1 000000000 00601C7E0
000000000
ksdxfdmp()+1062 ? 0000000000000001 0018A3F03 000000000 00601C7E0
If the process which dumped the tracefile was in a PL/SQL call, then the PL/SQL call stack is also dumped (under the PL/SQL Call Stack section). This shows you exactly in which PL/SQL procedure this problem happened and which procedure calls led Oracle to the final procedure where the execution was when the dump was done.
The PL/SQL call stack (bold/italic above) reports 3 columns:
object handle
The object handle is the address in library cache where that object executed is loaded (either PL/SQL stored procedure or anonymous block). You can map that address to X$KGLOB.KGLHDADR to find which exact object we are dealing with. However that object owner and name are reported in column 3 anyway.
line number
This is a goldmine! It will tell you exactly in which PL/SQL source code line the execution was when the errorstack dump was taken (or when the lower function in stack had called upper one). For example, in the above output, the anonymous block had called the SYS.DELETE_FUNC in line 1 (of the anonymous block). The SYS.DELETE_FUNC happened to be at PL/SQL line 8 when the errorstack dump was taken (or crash happened etc).
object name
The PL/SQL stored object name (or "anonymous block" when the object is not a stored procedure). In the anonymous block case (as anonymous block text is also seen from V$SQL), you can map this address to V$SQL.ADDRESS to find the anonymous block text.
To recap, the above little demo PL/SQL call stack consists of 2 lines only:
38d172270 8 function SYS.DELETE_FUNC
38d1518f0 1 anonymous block
You should read a PL/SQL call stack from bottom up:
The bottom line says that there was an anonymous block executed and at its line 1 it called the function just above it (which happens to be SYS.DELETE_FUNC)
The top line says that SYS.DELETE_FUNC was executing and at line 8 inside it, this error condition happened which caused the errorstack to be dumped.
As we found the owner and name of the object from the above dump and the line number info, we can query DBA_SOURCE by this (as DBA_SOURCE has a LINE field showing the PL/SQL source line number):
SQL> select line, text
2 from dba_source
3 where owner = 'SYS'
4 and name = 'DELETE_FUNC'
5 -- and line between X and Y
6 and type = 'FUNCTION' -- PROCEDURE,PACKAGE BODY,TYPE BODY
7 order by line asc
8 /
LINE TEXT
---------- -------------------------------------------------------------------
1 function delete_func (owner_name in varchar2) return number
2 as
3 num_deleted number;
4 begin
5 -- this is a demo procedure
6 -- it does not do anything useful!
7
8 DELETE FROM mytab WHERE owner = owner_name;
9 COMMIT;
10
11 num_deleted := SQL%ROWCOUNT;
12 DBMS_OUTPUT.PUT_LINE('Deleted rows ='|| TO_CHAR(num_deleted));
13
14 return num_deleted;
15 end;
15 rows selected.
You see, the line 8 reported above is where the PL/SQL execution currently was (it invoked the recursive DELETE query which was still running).
Simply, the PL/SQL errorstack says exactly where in the PL/SQL code Oracle was executing when this error condition, dump, crash, hang, or whatever was the problem, happened (top line was the code execution currently was and the line(s) below show the "parent" functions which called their child function seen on the above line).
If the stack thing doesn't make sense right now, bear with me, I will write a separate note about it.
I will also explain the "Call Stack Trace" part in another note as it's a longer topic.
Reading the current bind variable values from errorstack trace
This is a question very frequently asked from me.
Let say this:
a session has got stuck somehow or is burning CPU like crazy without any significant waits (for example the demo in my blog article using bind_peeking_nested_loops.sql script)
So you look into what SQL is being executed and you see some SQL statement with bind variables used
The execution plan is the same as it always is, but performance is much worse than normally
So you suspect there's data skew and this execution plan does not scale well when number of rows coming from some table/rowsource gets large
So, you would want to know what were the bind variable values used for that problematic case. Unfortunately there's no V$ view where to get current bind variable values of a session. If you think V$SQL_BIND_CAPTURE, then you're wrong - it only samples bind variable values occasionally, it doesn't store every single bind variable value used somewhere.
In Oracle 11.2, the Real-Time SQL Monitoring feature makes things easier. There's a new column BIND_XML in V$SQL_MONITOR, which contains the bind variable values of queries running long enough. Accessing that view however requires you to have Diagnostics and Tuning pack license. You can read more about this feature here.
But let's get back to the original article, not everyone is running on Oracle 11.2 and not everyone has Diag+Tuning pack licenses.
There's a reason why it's non-trivial to get the bind values - the bind variable values live in the private memory of the processes. So you can't easily go and read the private memory of another process!
That's why we need to use ORADEBUG to connect to the problem process and send a signal to it so it would dump the errorstack, which luckily includes CURSORDUMP, which includes bind variable values!
The red number 8213 in the below example is the OS process ID (V$PROCESS.SPID) of the target, problem session which I want to dump:
SQL> oradebug setospid 8213
Oracle pid: 15, Unix process pid: 8213, image: oracleSOL102@solaris02
SQL> oradebug dump errorstack 3
Statement processed.
Now that target process should have written the errorstack to UDUMP (or BDUMP if its a background process). Note that you probably don't want to connect to background processes with ORADEBUG unless you really know what you're doing!
Of course, when you are diagnosing a crash or ORA-600 (which has already happened), then you don't need to use oradebug, as the crash handler should have dumped the errorstack with cursordump to a tracefile already.
So, lets open the same tracefile (beginning shown only).
*** 2010-02-14 15:30:32.830
ksedmp: internal or fatal error
Current SQL statement for this session:
DELETE FROM MYTAB WHERE OWNER = :B1
----- PL/SQL Call Stack -----
object line object
handle number name
38d172270 8 function SYS.DELETE_FUNC
38d1518f0 1 anonymous block
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+23 ? 0000000000000001 0017B342C 000000000 0FFDF2420
0FFFFFD7F
ksedmp()+636 ? 0000000000000001 0017B1EC1 000000000 00601C7E0
000000000
ksdxfdmp()+1062 ? 0000000000000001 0018A3F03 000000000 00601C7E0
000000000
ksdxcb()+1238 ? 0000000000000001 0018A22D3 000000000 0FF2DCC80
0FFFFFD7F
....
When you start scrolling through all the (binary) data which comes below, you may go crazy as there's too much (incomprehensible) data there. So what's the systematic way to reliably navigate to the actual bind variables used?
What I usually do, is search for first occurrence of "Session Cursor Dump". Note that the case matters here.
When I search for it, I will see the following output (again I only paste some here, starting from the Session Cursor Dump I searched for):
******************** Session Cursor Dump **********************
Current cursor: 6, pgadep: 1
Open cursors(pls, sys, hwm, max): 5(3, 1, 64, 300)
NULL 1 SYNTAX 0 PARSE 0 BOUND 4 FETCH 0 ROW 0
Cached frame pages(total, free):
4k(41, 38), 8k(1, 1), 16k(1, 1), 32k(0, 0)
pgactx: 38cc705e0 ctxcbk: 38cc701f0 ctxqbc: 0 ctxrws: 39392a188
Explain plan:
Dumping stream from 0 to 97
----------------------------------
0000: 143 136 161 146 147 3 0 0 0 137 119 1 0 16 0 0 0 3 1 2 ..........w.........
0020: 0 0 0 0 0 59 7 3 6 3 121 101 115 5 9 1 1 3 83 89 .....;....yes.....SY
0040: 83 12 8 106 108 109 224 152 228 163 113 111 3 2 3 3 2 0 0 143 S..jlm....qo........
0060: 56 164 146 147 3 0 0 0 134 116 2 38 24 133 86 229 24 144 100 133 8........t.&..V...d.
0080: 81 193 219 234 223 154 138 3 1 2 192 214 188 3 2 14 142 Q................
============
Plan Table
============
--------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
--------------------------------------+-----------------------------------+
| 0 | DELETE STATEMENT | | | | 1366 | |
| 1 | DELETE | MYTAB | | | | |
| 2 | TABLE ACCESS FULL | MYTAB | 119K | 2023K | 1366 | 00:00:17 |
--------------------------------------+-----------------------------------+
Here you go, the Current cursor: 6 says that Oracle was currently executing Cursor#6 in the open cursor array in that session's UGA (the max array size is set by open_cursors parameter).
As a side-comment, the pgadep: 1 variable shows the PGA depth, which shows what's the recursive depth of the query execution. If pgadep is 0, it would mean the query is a top-level query, being executed by the user/application via OPI interface. Pgadep 1 means that it's a recursive query, executed via recursive program interface (RPI), right under the top level query (either data dictionary query or just SQL executed via PL/SQL or vice versa). This pgadep corresponds to the "dep" value in SQL trace.
So, in order to find the bind variables of currently executing query, all I need to do is forward-search for Cursor#6 in this tracefile. Note that the case and the "missing" space between Cursor and #6 is intentional:
----------------------------------------
Cursor#6(fffffd7ffddb1918) state=BOUND curiob=fffffd7ffdd0f380
curflg=cd fl2=0 par=0 ses=398f72fe8
sqltxt(38d15d0f8)=DELETE FROM MYTAB WHERE OWNER = :B1
hash=332a8a504bccd7f1a4a726a2879e71d9
parent=38ca516d0 maxchild=01 plk=395bf4dd8 ppn=n
cursor instantiation=fffffd7ffdd0f380 used=1266132628
child#0(38d151620) pcs=38ca512e0
clk=395b767b8 ci=38ca509c8 pn=38d179e58 ctx=38cc705e0
kgsccflg=0 llk[fffffd7ffdd0f388,fffffd7ffdd0f388] idx=0
xscflg=c0110676 fl2=1d120000 fl3=4228218c fl4=100
Bind bytecodes
Opcode = 6 Bind Rpi Scalar Sql In(may be out) Nocopy NoSkip
Offsi = 48, Offsi = 0
kkscoacd
Bind#0
oacdty=01 mxl=32(32) mxlc=00 mal=00 scl=00 pre=00
oacflg=13 fl2=206001 frm=01 csi=31 siz=32 off=0
kxsbbbfp=fffffd7ffdd31808 bln=32 avl=03 flg=09
value="SYS"
Frames pfr fffffd7ffdd3f8c0 siz=4488 efr fffffd7ffdd3f800 siz=4464
Cursor frame dump
enxt: 3.0x000004b0 enxt: 2.0x00000020 enxt: 1.0x00000ca0
pnxt: 2.0x00000008 pnxt: 1.0x00000010
kxscphp fffffd7ffddc0388 siz=984 inu=472 nps=344
kxscbhp fffffd7ffdcef1a8 siz=1008 inu=120 nps=0
kxscwhp fffffd7ffddc0748 siz=10496 inu=6608 nps=6608
And here you go:
We have found the Cursor#6 from our trace, with state=BOUND which means that the bind variables have been put in place
The SQL text is what we already had identified previously and we see there's only one bind variable in there (it's automatically put there by PL/SQL execution engine).
The bind variable numbering starts from zero, so I need to search for Bind#0 if I want to see the first bind
Under that I'll just look into the value field and see value="SYS". This is the value which I had put into a bind variable for my test query.
And that's it, this is the systematic and reliable way for finding bind variable values used by some other session.
Identifying how much private (UGA) memory a cursor is using
And finally, continuing from previous example, it is possible to measure how much of your UGA private memory space some cursor is using (as one session may have many cursors open the same time):
I will paste here again few last lines of the last output:
Cursor frame dump
enxt: 3.0x000004b0 enxt: 2.0x00000020 enxt: 1.0x00000ca0
pnxt: 2.0x00000008 pnxt: 1.0x00000010
kxscphp fffffd7ffddc0388 siz=984 inu=472 nps=344
kxscbhp fffffd7ffdcef1a8 siz=1008 inu=120 nps=0
kxscwhp fffffd7ffddc0748 siz=10496 inu=6608 nps=6608
Under the cursor frame dump you see a bunch of lines starting with kxsc, which means Kernel eXecution Shared Cursor. In each line you see a field called siz, which tells the private memory allocation size for that particular allocation type (or what's the internal allocation size, to be correct).
You can just sum all these sizes together to determine a cursors memory usage.
Here are the meanings (which I have guessed) for the interested:
Finally, even though ORADEBUG is a widely used tool by Oracle support - be careful with it (for example, search what Metalink has to say about ORADEBUG DUMP ERRORSTACK) etc. If you are unsure, ask Oracle Support for "blessing" first.
I never connect to critical background processes such as LGWR, DBWR with oradebug unless I'm prepared to that the instance may crash. However these techniques have been very valuable in some cases where other instrumentation is not enough!