Using Process Memory Matrix script for understanding Oracle process memory usage

This tool currently works only on Solaris. I will write support for (newer) Linux versions soon and possibly also for HP-UX.

While working on a problem I wrote a script which presents the output of Solaris pmap in a more readable way. If you don't know what pmap is, it's a tool available on Solaris, Linux and HP-UX (and on AIX, where it's called procmap) which shows the breakdown of a process's address space, that is, its virtual memory mappings. This is far better than relying only on the SIZE and RSS columns of the ps or top commands.

My script gives a better overview of how much memory an Oracle process is really using. Historically this has been problematic due to differences in how shared memory segments (the Oracle SGA) are accounted for, and due to the large amount of data returned by the pmap command.

My script procmm.sh (Process Memory Matrix) will simply run pmap on the processes specified and aggregate the output into a matrix.

Example output is here:

oracle@solaris02:~/research/memory$ ./procmm.sh 12755

-- procmm.sh: Process Memory Matrix v1.01 by Tanel Poder ( http://tech.e2sn.com )
-- All numbers are shown in kilobytes

PID            SEGMENT_TYPE      VIRTUAL          RSS         ANON       LOCKED    SWAP_RSVD
------ -------------------- ------------ ------------ ------------ ------------ ------------
12755                   lib        22932        22872          900            0         1048
12755                oracle        95824        95820          232            0         2468
12755        ism_shmid=0x1d       409608       409608            0       409608            0
12755                  anon         5728         5512         5508            0         5724
12755                 stack          156          156          156            0          156
12755                  heap         1924          868          868            0         1924
------ -------------------- ------------ ------------ ------------ ------------ ------------
12755             TOTAL(kB)       536172       534836         7664       409608        11320

On the horizontal axis you see the various memory sizes pmap reports (like VIRTUAL size and SWAP_RSVD, the swap space reservation). On the vertical SEGMENT_TYPE axis you see which mapping in the process address space these figures belong to: for example the Oracle binary, heap memory (think malloc()), libraries, and process private memory allocations, which show up as "anon".

I don't do any computation magic of my own here; I just show the output of a couple of pmap commands in a better aggregated and more understandable form. So if you want to read the official documentation about these figures, just run man pmap.
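To make the aggregation idea concrete, here is a minimal Python sketch. It is not the procmm.sh source, only an illustration of summing per-mapping rows into per-segment-type totals. The sample rows are made up, the column layout (address, kB, RSS, anon, locked, mode, mapped file) is assumed to resemble pmap -x output, and the classification rules in segment_type() are my own simplification.

```python
from collections import defaultdict

# Made-up sample rows, assumed to resemble `pmap -x` output:
# address, kB, RSS, anon, locked, mode, mapped file. Not real output.
SAMPLE = """\
00010000   1000   1000      -      - r-x--  oracle
00200000    200    200     16      - rwx--  oracle
00300000    500    300    300      - rwx--    [ heap ]
FE000000    800    700      -      - r-x--  libc.so.1
FE100000     40     40     24      - rwx--  libc.so.1
FFBF0000     64     64     64      - rw---    [ stack ]
"""

def segment_type(mapped):
    """Map a mapping name to a coarse segment type (hypothetical rules)."""
    name = mapped.strip()
    if name.startswith("["):
        # "[ heap ]" -> "heap", "[ ism shmid=0x1d ]" -> "ism_shmid=0x1d"
        return name.strip("[] ").replace(" ", "_")
    if ".so" in name:
        return "lib"
    return name

def aggregate(text):
    """Sum the virtual, RSS, anon and locked columns per segment type."""
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for line in text.splitlines():
        fields = line.split(None, 6)
        if len(fields) < 7:
            continue  # skip anything that is not a full mapping row
        sizes = [0 if f == "-" else int(f) for f in fields[1:5]]
        seg = segment_type(fields[6])
        for i, value in enumerate(sizes):
            totals[seg][i] += value
    return dict(totals)

for seg, (virt, rss, anon, locked) in aggregate(SAMPLE).items():
    print(f"{seg:>20} {virt:>12} {rss:>12} {anon:>12} {locked:>12}")
```

Collapsing every shared library into a single "lib" row is what keeps the matrix short enough to read at a glance.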

I will briefly explain the columns here too:

 SEGMENT_TYPE  Shows what is using the reported amount of memory.
  • lib - both Unix system libraries and application (Oracle) libraries
  • oracle - the oracle executable (or any other executable if examining some other process)
  • ism_shmid=0x1d - shmid shows that we are dealing with a good old System V shared memory segment. The 0x1d here corresponds to the shared memory segment's SHM ID as seen in ipcs -ma output (once you convert the number to decimal).
    The ism prefix shows that we are dealing with Solaris-specific Intimate Shared Memory (ISM). A dism prefix stands for Dynamic Intimate Shared Memory (DISM).
  • anon - anonymous memory deserves its own page, but in short this is process private memory. For example, since version 9i Oracle can allocate PGA/UGA/CGA memory by mapping /dev/zero into its process address space using the mmap() system call, instead of calling malloc()/brk(), which just extends the process data segment (heap). This memory segment type is called anonymous because it doesn't correspond to any real (named) file on disk. When you load an executable, a library or some datafile into memory, the in-memory pages correspond to files on the filesystem and are therefore not anonymous. Process private allocations don't correspond to any file, so they are called anonymous pages.
  • stack - this is the relatively small memory area (a few kilobytes or a few tens of kB) present in each thread, where function-local variables are kept (as determined by the compiler) and where the return path to the parent functions is recorded. This information is used for resuming the parent function once the child functions it calls finish their work and return. Stack tracing is extremely useful for troubleshooting and will have its own page soon.
  • heap - this is the application's memory "scratch area", called the data segment, which is used when you allocate memory with conventional malloc() calls. However, Oracle no longer uses the heap for PGA/UGA memory, thanks to the realfree heap allocation which was introduced in 9i (and enabled by default in 10g).
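The hex-to-decimal conversion mentioned for the ism_shmid label is easy to script. This is a small sketch; the helper name shmid_from_segment is hypothetical and only the label format shown in the example output is assumed.

```python
def shmid_from_segment(segment_type):
    """Return the decimal shmid from a label such as 'ism_shmid=0x1d'."""
    _, _, hex_id = segment_type.partition("shmid=")
    return int(hex_id, 16)  # base 16 accepts the 0x prefix

print(shmid_from_segment("ism_shmid=0x1d"))  # prints 29
```

So the segment labelled 0x1d is the one to look for as ID 29 in the ipcs listing.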
 VIRTUAL  This shows the total virtual address space (reservation) size in kB. Reserving virtual address space does not actually allocate corresponding physical memory from anywhere. OSes are lazy in that sense: they only allocate physical pages of RAM for you when you actually start touching these virtual pages. This is when the page fault mechanism finds a physical page of RAM and maps it to the given process's virtual memory page.

Note that some OSes (Solaris, HP-UX) automatically reserve swap space for virtual memory segments which may need paging out in the future (writable segments). AIX and Linux (by default) do not reserve swap space during virtual memory segment set-up, but this raises the risk that when the system runs out of memory, some victim process gets an error signal or exception at a random location in its code and cannot handle the situation gracefully.

It is important to remember that VIRTUAL memory does not equal physical memory. Physical pages are allocated and mapped to the virtual segment only when the process actually tries to use (touch) these memory pages.
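The difference between reserving and touching can be tried out with an anonymous mapping. This is a rough sketch, not something procmm.sh does: it reserves 64 MB of address space and writes to only 16 pages. The resident_kb() helper is an assumption that works only where /proc/self/statm exists (Linux) and returns None elsewhere, so treat any RSS numbers it prints as indicative only.

```python
import mmap

PAGE = mmap.PAGESIZE
RESERVED_BYTES = 64 * 1024 * 1024  # address space to reserve

def resident_kb():
    """RSS of this process in kB on Linux; None where /proc/self/statm is absent."""
    try:
        with open("/proc/self/statm") as f:
            return int(f.read().split()[1]) * PAGE // 1024
    except (OSError, ValueError, IndexError):
        return None

def demo(pages_to_touch=16):
    """Reserve an anonymous region, touch a few pages, return (reserved_kb, touched_kb)."""
    region = mmap.mmap(-1, RESERVED_BYTES)  # anonymous, not backed by a named file
    before = resident_kb()
    for i in range(pages_to_touch):
        region[i * PAGE] = 1  # one byte written per page is enough to fault it in
    after = resident_kb()
    region.close()
    print(f"reserved {RESERVED_BYTES // 1024} kB, touched {pages_to_touch * PAGE // 1024} kB")
    print(f"RSS before/after (kB): {before} / {after}")
    return RESERVED_BYTES // 1024, pages_to_touch * PAGE // 1024

demo()
```

The reserved size is what a VIRTUAL-style figure would report, while only the touched pages need physical RAM.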

 RSS  This shows the process Resident Set Size in kB. This value includes all virtual memory pages in the process address space which have a valid mapping to a physical page in the kernel VM tracking structures (page tables). In other words, all pages mapped into the process address space which also happen to be in RAM.

This is why with Oracle processes you can sometimes see that when a process starts up and attaches to the SGA, its virtual size is, let's say, 10 GB (assuming a ~10 GB SGA), but its RSS is only a few hundred megabytes. The process has not yet touched all the SGA pages (which are in physical RAM), so not all the virtual-to-physical page mappings for that process have been set up yet and they don't show up in RSS.
This depends on your instance configuration, by the way (large vs small pages, ISM or not, etc).

 ANON  This is the key metric in procmm output. It shows in kB how much actual memory has been allocated for that process (not just reserved, but actually used, allocated). Note that this figure counts two things:
  1. Memory allocated for that process and currently in RAM (RSS)
  2. Memory allocated for that process and currently not in RAM (paged out)
Now you might ask why there are cases in the above output where RSS is much larger than ANON, for example the "oracle" file mapping. The answer is that RSS includes both the resident named-file contents (corresponding to the file "oracle") and the private anonymous memory pages, which don't correspond to any named file (thus are "anonymous") but only to the process.

So, ANON is the figure which lets you answer the question "How much memory are the Oracle processes really using?".

Note that there are many subtleties involved here and neither pmap nor my procmm.sh script is perfect, but on Solaris reading ANON is a quick and decent way of understanding Oracle memory usage.
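As a quick worked example, the rows of the single-process output above can be summed to see how small the ANON part is compared to RSS. The numbers are copied from that output; the script is just arithmetic and makes no claim about how procmm.sh computes its TOTAL row.

```python
# (segment, virtual, rss, anon) in kB, copied from the PID 12755 output above
ROWS = [
    ("lib",            22932,  22872,  900),
    ("oracle",         95824,  95820,  232),
    ("ism_shmid=0x1d", 409608, 409608, 0),
    ("anon",           5728,   5512,   5508),
    ("stack",          156,    156,    156),
    ("heap",           1924,   868,    868),
]

total_rss = sum(row[2] for row in ROWS)
total_anon = sum(row[3] for row in ROWS)
anon_pct = 100.0 * total_anon / total_rss

print(f"RSS {total_rss} kB, ANON {total_anon} kB ({anon_pct:.1f}% of RSS)")
# prints: RSS 534836 kB, ANON 7664 kB (1.4% of RSS)
```

Most of the RSS here is the shared ISM segment and the file-backed oracle binary and libraries, which is why RSS alone overstates what this one process has privately allocated.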

 LOCKED  This shows (in kB) how much memory in the process address space is also locked into physical RAM. Intimate Shared Memory pages are always automatically locked in memory and thus can never be paged out (non-pageability makes the implementation of shared page tables easier). Other types of pages (Dynamic ISM for example) can be locked in memory using mlock() system calls - if the program has root access. This is why the oradism executable exists in $ORACLE_HOME/bin.

 SWAP_RSVD  On Solaris this shows how much memory (in kB) has been reserved in swapfs (which is the on-disk swap area + virtual swap cache in RAM). Reserving swap does not mean it's actually used and filled with data; it merely means that Solaris adjusts its "free swap space left" counters so that it is guaranteed that this memory can be swapped out if the need arises.



In the above example I examined only a single process. Below I pass all processes of an instance as parameters and procmm walks through them. This is not a cheap or fast operation, so you shouldn't run it frequently.

The -t option below means "Total": the script doesn't show the memory breakdown of individual PIDs, only the sum over all PIDs passed to it.

oracle@solaris02:~/research/memory$ ./procmm.sh -t `pgrep -f ora_.*SOL102`

-- procmm.sh: Process Memory Matrix v1.01 by Tanel Poder ( http://tech.e2sn.com )
-- All numbers are shown in kilobytes

Total PIDs 17, working: .................

PID            SEGMENT_TYPE      VIRTUAL          RSS         ANON       LOCKED    SWAP_RSVD
------ -------------------- ------------ ------------ ------------ ------------ ------------
0                       lib       389844       388796        13180            0        17816
0                    oracle      1629064      1628908         3336            0        42012
0            ism_shmid=0x1d      6963336      6963336            0      6963336            0
0             hc_SOL102.dat           48           48            0            0            0
0                      anon        32936        15936        15452            0        32868
0                     stack         1660         1628         1592            0         1660
0                      heap        37004        18016        16844            0        37004
------ -------------------- ------------ ------------ ------------ ------------ ------------
0                 TOTAL(kB)      9053892      9016668        50404      6963336       131360

-- Note that in Total (-t) calculation mode it makes sense to look into ANON and SWAP_RSVD
-- totals only as other numbers may be heavily "doublecounted" due overlaps of shared mappings
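The double counting is easy to see from the numbers themselves. The sketch below only does arithmetic on figures copied from the two outputs above; it assumes that each of the 17 processes maps the same ISM segment with the same size as PID 12755 did in the first example.

```python
PIDS = 17
ISM_PER_PROCESS_KB = 409608   # ism_shmid=0x1d VIRTUAL for PID 12755
ISM_TOTAL_KB = 6963336        # ism_shmid=0x1d VIRTUAL in the -t output

# ANON column of the -t output, in kB
ANON_KB = {
    "lib": 13180,
    "oracle": 3336,
    "ism_shmid=0x1d": 0,
    "hc_SOL102.dat": 0,
    "anon": 15452,
    "stack": 1592,
    "heap": 16844,
}

# The shared segment is counted once per process in the total
counted_per_process = PIDS * ISM_PER_PROCESS_KB == ISM_TOTAL_KB
anon_total = sum(ANON_KB.values())

print(f"ISM counted once per PID: {counted_per_process}")
print(f"ANON total: {anon_total} kB")
```

The ISM total is exactly 17 times the single-process figure even though there is only one segment in RAM, while the ANON column adds up to the 50404 kB total without any such overlap.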

The ANON figure reports roughly 50404 kB as the Oracle instance processes' actual memory usage (I should be more precise and say memory allocation).

However, the total of 50404 kB also includes 13180 kB of ANON memory allocated by various libraries (Oracle libraries as well as multiple OS libraries which are not under Oracle's control). Also, a total of 3336 kB of private (writable) ANON memory has been allocated "in" the oracle binary. This comes from the writable data and BSS sections of the binary, which hold global and static variables (ordinary function-local variables live on the stack, but static variables and globals shared across object modules go to the data/BSS sections).

Every new Oracle process reuses the shared Oracle binary pages (only one copy of Oracle binary is in memory), but when the process tries to write to the variables section, then the OS copies the shared page into a new physical page and maps the new page to process address space as a writable page. That's called copy on write.
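Copy on write can be tried out on a small scale with a private file mapping. This is only an analogy for what happens to the binary's writable pages, not a reproduction of it: the sketch maps a temporary file with Python's mmap.ACCESS_COPY mode, writes to the mapping, and checks that the file itself is unchanged. The function name and file contents are made up for the example.

```python
import mmap
import os
import tempfile

def copy_on_write_demo():
    """Write to a private (copy-on-write) mapping; return (in_memory, on_disk) bytes."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"shared page contents")
        path = f.name
    try:
        with open(path, "r+b") as f:
            # ACCESS_COPY: writes change this process's private copy only
            private = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
            private[0:6] = b"SHARED"
            in_memory = bytes(private[:])
            private.close()
        with open(path, "rb") as f:
            on_disk = f.read()
    finally:
        os.unlink(path)
    return in_memory, on_disk

in_memory, on_disk = copy_on_write_demo()
print(in_memory)  # b'SHARED page contents'
print(on_disk)    # b'shared page contents'
```

The write lands in a private copy of the page, which is the kind of page that shows up as ANON memory "in" a file mapping.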

To Be Continued...
