How to log on even when SYSDBA can't do so?
Background
When you have a situation where let say the archiver process is stuck, regular users and even users with DBA role may not be able to log on. This usually because new sessions logging on generate audit records or (sometimes) update SEQ$ table to get new values from a sequence for populating AUDSID. If the archiver is stuck, then any DML will get blocked as well as they can't generate any new redo until the archiver stuck issue is fixed.
In these cases the SYSDBA (and SYSOPER) privlege holders will still be able to log on and create their session. This is because SYSDBA connections do not generated any audit records to the database tables (that's why there's the audit_file_dest directory where SYSDBA audit files will be written) and SYSDBA connections also don't get their AUDSID from a sequence but use a hardcoded value (0xFFFFFFFF) instead.
But there are cases when even SYSDBA privilege holders can't log on! One situation what I've seen was due a bug, where a process never released a shared pool latch it had taken (in fact it was stuck spinning in free memory searching code). That instance happened to have only one shared pool subpool in use, protected by a single latch.
That meant that no-one else could allocate/free memory from shared pool, causing most sessions to eventually hang.
So, when the DBA tried to log on using SYSDBA privilege, he got a surprise - even SYSDBA logon got hung!
Why? The reason is that also new sessions need to allocate some memory from shared pool, for example for their V$SESSION parameter values (which are session-specific). And allocating shared pool memory would require taking the shared pool latch. But the latch was already taken and held. Thus even SYSDBA session got hung during session creation.
Luckily since Oracle 10g, there is a way to skip the session creation part (which could block) when logging on as SYSDBA.
When you log on normally (even as SYSDBA), this is what happens:
A new Oracle process is started (either by the listener or by local sqlplus if using the local BEQ connection)
The new process attaches to SGA shared memory segments (so it could access all the needed SGA structures)
The new process allocates process and session state objects and initializes new session structures in SGA
The step number 3, allocating stuff from SGA may obviously block as operations on shared memory structures need to be protected by some sort of locks (usually latches or Oracle KGX mutexes in Oracle's case). And that's exactly what had happened because of the shared pool latch as I had explained above.
Usage
So, to work around this, Oracle's sqlplus supports a new option since version 10g:
$ sqlplus -prelim "/as sysdba"
SQL*Plus: Release 11.2.0.1.0 Production on Mon Feb 15 12:40:50 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.
SQL>
The -prelim option stands for "preliminary connection". What it means is that sqlplus will only complete 2 of the above 3 steps of connection establishment:
A new Oracle process is started (either by the listener or by local sqlplus if using the local BEQ connection)
The new process attaches to SGA shared memory segments (so it could access all the needed SGA structures)
These 2 steps will not block, well at least not because Oracle's latching or locking issues, but if you have serious OS level problems (running out of some kernel resources for example) problems may still happen.
Anyway, we managed to log on using the preliminary connection, let's now see what we can query:
SQL> select * from scott.emp
2 .
SQL>
SQL> select * from scott.emp;
select * from scott.emp
*
ERROR at line 1:
ORA-01012: not logged on
Process ID: 0
Session ID: 0 Serial number: 0
SQL> select * from v$instance;
select * from v$instance
*
ERROR at line 1:
ORA-01012: not logged on
Process ID: 0
Session ID: 0 Serial number: 0
Oops! We can't query regular tables nor even V$ views, because we aren't really logged on! Because of the prelim option we do not have all the structures for query execution set up.
However, we can run any ORADEBUG command and the first one I would usually run in such hang cases is HANGANALYZE at detail level 3:
SQL> oradebug hanganalyze 3
Statement processed.
Now there's a hanganalyze trace in UDUMP generated by my preliminary connection's process.
For other commands, like ORADEBUG DUMP, you need to connect to a process (your own for example), hanganalyze works without explicitly connecting to any process:
SQL> oradebug dump systemstate 10
ORA-00074: no process has been specified
SQL>
SQL> oradebug setmypid
Statement processed.
SQL>
SQL> oradebug dump systemstate 10
Statement processed.
SQL>
I will have separate pages for interpreting hanganalyze and systemstate dump output.
Note that even though this -prelim option appeared in sqlplus in Oracle 10g, you can still take an Oracle 10g+ sqlplus and connect with it to an Oracle 9.2 database, as this option is a sqlplus client side feature rather than server feature.
You can just run sqlplus -prelim "sys/password@dbname as sysdba" to connect to a remote database.