How to read ERRORSTACK output
Errorstack tracefiles are very useful for troubleshooting ORA-600's, crashes, hangs and even just bad performance.
Errorstack dumps are dumped automatically by Oracle when critical errors such as ORA-600 happen. This is when you see an error reported in alert.log and the generated tracefile has a "ksedmp: internal or fatal error" string in its beginning, followed by the error code (usually ORA-7445 or ORA-600 with some parameters).
"ksedmp" means Kernel Service Error DuMP, so everything below that line is the errorstack dump.
Errorstack dumps can also be manually invoked by issuing ORADEBUG ERRORSTACK 3 (when being connected to the target process using ORADEBUG SETOSPID). This can be useful when a session seems to be hung (but without showing a reasonable wait event in V$SESSION_WAIT) or is consuming much more resources than normally and you want to know which exact bind variable values are currently used for executing the SQL.
Errorstack tracefile has a lot of information it, some more and some less useful for practical troubleshooting. The vast amount of data and the binary dumps may scare you away, but I hope to show that there's some well understandable structure in there.
In this article we will go through following techniques:
I will paste the relevant parts of a sample errorstack dump wherever relevant.
Reading the current executing SQL statement text from errorstack trace
Reading the current running SQL statement (if any) is easy, the statement text is dumped below the "Current SQL statement for this session" header in the top of the tracefile:
Reading the current executing PL/SQL package and PL/SQL source code line number from errorstack trace
If the process which dumped the tracefile was in a PL/SQL call, then the PL/SQL call stack is also dumped (under the PL/SQL Call Stack section). This shows you exactly in which PL/SQL procedure this problem happened and which procedure calls led Oracle to the final procedure where the execution was when the dump was done.
The PL/SQL call stack (bold/italic above) reports 3 columns:
The object handle is the address in library cache where that object executed is loaded (either PL/SQL stored procedure or anonymous block). You can map that address to X$KGLOB.KGLHDADR to find which exact object we are dealing with. However that object owner and name are reported in column 3 anyway.
This is a goldmine! It will tell you exactly in which PL/SQL source code line the execution was when the errorstack dump was taken (or when the lower function in stack had called upper one). For example, in the above output, the anonymous block had called the SYS.DELETE_FUNC in line 1 (of the anonymous block). The SYS.DELETE_FUNC happened to be at PL/SQL line 8 when the errorstack dump was taken (or crash happened etc).
The PL/SQL stored object name (or "anonymous block" when the object is not a stored procedure). In the anonymous block case (as anonymous block text is also seen from V$SQL), you can map this address to V$SQL.ADDRESS to find the anonymous block text.
To recap, the above little demo PL/SQL call stack consists of 2 lines only:
38d172270 8 function SYS.DELETE_FUNC
38d1518f0 1 anonymous block
You should read a PL/SQL call stack from bottom up:
The bottom line says that there was an anonymous block executed and at its line 1 it called the function just above it (which happens to be SYS.DELETE_FUNC)
The top line says that SYS.DELETE_FUNC was executing and at line 8 inside it, this error condition happened which caused the errorstack to be dumped.
As we found the owner and name of the object from the above dump and the line number info, we can query DBA_SOURCE by this (as DBA_SOURCE has a LINE field showing the PL/SQL source line number):
You see, the line 8 reported above is where the PL/SQL execution currently was (it invoked the recursive DELETE query which was still running).
Simply, the PL/SQL errorstack says exactly where in the PL/SQL code Oracle was executing when this error condition, dump, crash, hang, or whatever was the problem, happened (top line was the code execution currently was and the line(s) below show the "parent" functions which called their child function seen on the above line).
If the stack thing doesn't make sense right now, bear with me, I will write a separate note about it.
I will also explain the "Call Stack Trace" part in another note as it's a longer topic.
Reading the current bind variable values from errorstack trace
This is a question very frequently asked from me.
Let say this:
a session has got stuck somehow or is burning CPU like crazy without any significant waits (for example the demo in my blog article using bind_peeking_nested_loops.sql script)
So you look into what SQL is being executed and you see some SQL statement with bind variables used
The execution plan is the same as it always is, but performance is much worse than normally
So you suspect there's data skew and this execution plan does not scale well when number of rows coming from some table/rowsource gets large
So, you would want to know what were the bind variable values used for that problematic case. Unfortunately there's no V$ view where to get current bind variable values of a session. If you think V$SQL_BIND_CAPTURE, then you're wrong - it only samples bind variable values occasionally, it doesn't store every single bind variable value used somewhere.
In Oracle 11.2, the Real-Time SQL Monitoring feature makes things easier. There's a new column BIND_XML in V$SQL_MONITOR, which contains the bind variable values of queries running long enough. Accessing that view however requires you to have Diagnostics and Tuning pack license. You can read more about this feature here.
But let's get back to the original article, not everyone is running on Oracle 11.2 and not everyone has Diag+Tuning pack licenses.
There's a reason why it's non-trivial to get the bind values - the bind variable values live in the private memory of the processes. So you can't easily go and read the private memory of another process!
That's why we need to use ORADEBUG to connect to the problem process and send a signal to it so it would dump the errorstack, which luckily includes CURSORDUMP, which includes bind variable values!
The red number 8213 in the below example is the OS process ID (V$PROCESS.SPID) of the target, problem session which I want to dump:
Now that target process should have written the errorstack to UDUMP (or BDUMP if its a background process). Note that you probably don't want to connect to background processes with ORADEBUG unless you really know what you're doing!
Of course, when you are diagnosing a crash or ORA-600 (which has already happened), then you don't need to use oradebug, as the crash handler should have dumped the errorstack with cursordump to a tracefile already.
So, lets open the same tracefile (beginning shown only).
When you start scrolling through all the (binary) data which comes below, you may go crazy as there's too much (incomprehensible) data there. So what's the systematic way to reliably navigate to the actual bind variables used?
What I usually do, is search for first occurrence of "Session Cursor Dump". Note that the case matters here.
When I search for it, I will see the following output (again I only paste some here, starting from the Session Cursor Dump I searched for):
Here you go, the Current cursor: 6 says that Oracle was currently executing Cursor#6 in the open cursor array in that session's UGA (the max array size is set by open_cursors parameter).
As a side-comment, the pgadep: 1 variable shows the PGA depth, which shows what's the recursive depth of the query execution. If pgadep is 0, it would mean the query is a top-level query, being executed by the user/application via OPI interface. Pgadep 1 means that it's a recursive query, executed via recursive program interface (RPI), right under the top level query (either data dictionary query or just SQL executed via PL/SQL or vice versa). This pgadep corresponds to the "dep" value in SQL trace.
So, in order to find the bind variables of currently executing query, all I need to do is forward-search for Cursor#6 in this tracefile. Note that the case and the "missing" space between Cursor and #6 is intentional:
And here you go:
We have found the Cursor#6 from our trace, with state=BOUND which means that the bind variables have been put in place
The SQL text is what we already had identified previously and we see there's only one bind variable in there (it's automatically put there by PL/SQL execution engine).
The bind variable numbering starts from zero, so I need to search for Bind#0 if I want to see the first bind
Under that I'll just look into the value field and see value="SYS". This is the value which I had put into a bind variable for my test query.
And that's it, this is the systematic and reliable way for finding bind variable values used by some other session.
Identifying how much private (UGA) memory a cursor is using
And finally, continuing from previous example, it is possible to measure how much of your UGA private memory space some cursor is using (as one session may have many cursors open the same time):
I will paste here again few last lines of the last output:
Under the cursor frame dump you see a bunch of lines starting with kxsc, which means Kernel eXecution Shared Cursor. In each line you see a field called siz, which tells the private memory allocation size for that particular allocation type (or what's the internal allocation size, to be correct).
You can just sum all these sizes together to determine a cursors memory usage.
Here are the meanings (which I have guessed) for the interested:
Finally, even though ORADEBUG is a widely used tool by Oracle support - be careful with it (for example, search what Metalink has to say about ORADEBUG DUMP ERRORSTACK) etc. If you are unsure, ask Oracle Support for "blessing" first.
I never connect to critical background processes such as LGWR, DBWR with oradebug unless I'm prepared to that the instance may crash. However these techniques have been very valuable in some cases where other instrumentation is not enough!