Query execution plans, hints and the fundamental difference between nested loops and hash joins

(this article is still being written!)

Usually, when asked about what's the main difference between nested loop joins and hash joins, the answer will be that hash join uses a hash-table based lookup mechanism while nested loop doesn't or that the hash join can use cursor work-area memory (allocated in UGA) for buffering rows, while nested loops join can not, etc. These answers are not wrong, but I think there's one less explained, but more important, more substantial difference between these join methods. This article explains it.

I'm using Swingbench order entry schema for the examples.

I'm going to use only thee tables, WAREHOUSES, ORDERS and ORDER_ITEMS and you can see my awesome ASCII-art of the ERD diagram below:

                                +-------------------+            +-------------------+

                                |    ORDERS         |            |   ORDER_ITEMS     |

                                +-------------------+            +-------------------+

                                |   ORDER_ID        |-----------<|   ORDER_ID        |

                                |   ORDER_DATE      |            |   LINE_ITEM_ID    |

                                |   ORDER_MODE      |            |   PRODUCT_ID      |

                                |   CUSTOMER_ID     |            |   UNIT_PRICE      |

+-------------------+           |   ORDER_STATUS    |            |   QUANTITY        |

|   WAREHOUSES      |           |   ORDER_TOTAL     |            +-------------------+

+-------------------+           |   SALES_REP_ID    |

|   WAREHOUSE_ID    |----------<|   WAREHOUSE_ID    |                      

|   WAREHOUSE_NAME  |           |   PROMOTION_ID    | 

|   LOCATION_ID     |           +-------------------+

+-------------------+                                

   

See how the foreign key relationships have been laid out. The PK columns are of course indexed.

So, let's say, you want to write a report, which shows you this below (it's a business statement, basically the query written in english):

So, based on the above data model and the question (and further clarifications with the analysts as the above text isn't fully clear) you will write the following query:

This query should be self-explanatory but I still comment on few things:

Ok, let's run the damn thing. For brevity I have put the above query into a query.sql file and will just run the script (with timing on):

SQL> @query

SUM(OI.QUANTITY*OI.UNIT_PRICE)

------------------------------

                    1629881372

Elapsed: 00:00:04.58

Hmm... 4.58 seconds. Let's say this is too much for our users. Let's see what's going on.

I'm going to run the query again with SQL plan row-source level profiling, to see how many rows are being processed in each step:

SQL> ALTER SESSION SET statistics_level = ALL;

Session altered.

SQL> @query

SUM(OI.QUANTITY*OI.UNIT_PRICE)

------------------------------

                    1629881372

1 row selected.

Elapsed: 00:00:27.39

(Instead of the alter session command I could have used GATHER_PLAN_STATISTICS hint in the query)

Whoa! The query got much slower thanks to this setting - well that's what happens when you want very detailed instrumentation... but anyway, I care about the row-counts right now and won't look into the A-Time numbers as they are exaggerated thanks to the row-source level profiling (nevertheless, the A-Time gives a decent indication of where most of the time would be spent also when running the query without this profiling enabled):

-------------------------------------------------------------------------------------------------------------------------------------------------

| Id  | Operation                      | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |

-------------------------------------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT               |                  |      1 |        |      1 |00:00:27.39 |   95370 |  66331 |       |       |          |

|   1 |  SORT AGGREGATE                |                  |      1 |      1 |      1 |00:00:27.39 |   95370 |  66331 |       |       |          |

|*  2 |   HASH JOIN                    |                  |      1 |  10658 |    118K|00:00:27.31 |   95370 |  66331 |  1767K|  1347K| 2008K (0)|

|   3 |    NESTED LOOPS                |                  |      1 |        |  49217 |00:00:04.04 |   47528 |  18494 |       |       |          |

|   4 |     NESTED LOOPS               |                  |      1 |   4518 |  49217 |00:00:00.09 |     208 |     10 |       |       |          |

|*  5 |      TABLE ACCESS FULL         | WAREHOUSES       |      1 |      1 |      1 |00:00:00.01 |       7 |      0 |       |       |          |

|*  6 |      INDEX RANGE SCAN          | ORD_WAREHOUSE_IX |      1 |   4532 |  49217 |00:00:00.04 |     201 |     10 |       |       |          |

|   7 |     TABLE ACCESS BY INDEX ROWID| ORDERS           |  49217 |   4504 |  49217 |00:00:03.82 |   47320 |  18484 |       |       |          |

|*  8 |    TABLE ACCESS FULL           | ORDER_ITEMS      |      1 |     10M|     10M|00:00:06.50 |   47842 |  47837 |       |       |          |

-------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("O"."ORDER_ID"="OI"."ORDER_ID")

   5 - filter("W"."WAREHOUSE_NAME"='test')

   6 - access("W"."WAREHOUSE_ID"="O"."WAREHOUSE_ID")

   8 - filter("OI"."QUANTITY">1)

When you look into the red "Actual Rows" numbers above, you'll see that we are joining 49k rows with 10M rows in the step number 2 in the execution plan (if you follow the indentation, you'll see that the operation at line number 2 is the parent of the operations 3 and 8).

So, what can we do to make the query run faster? The first thing which catches the eye (especially in a regular OLTP system) is that full table scan in line 8 which reads over 47 thousand blocks via physical reads. As we are looking for a subset of rows only in our query, shouldn't we be able to use some index for accessing it?

So, let's do the wrong thing first and just dump an INDEX hint into the query as "I read from somewhere that index hints make everything faster" :)

  

SQL> ALTER SESSION SET statistics_level = TYPICAL; -- let's remove the row-source level profiling for now

Session altered.

SQL> ed

Wrote file afiedit.sql

  1  SELECT /*+ INDEX(oi) */

  2      SUM(oi.quantity * oi.unit_price)

  3  FROM

  4      warehouses w

  5    , orders o

  6    , order_items oi

  7  WHERE

  8  -- join

  9      w.warehouse_id = o.warehouse_id

 10  AND o.order_id = oi.order_id

 11  -- filter (constants)

 12  AND w.warehouse_name = 'test'

 13* AND oi.quantity > 1

 14  /

SUM(OI.QUANTITY*OI.UNIT_PRICE)

------------------------------

                    1629881372

1 row selected.

Elapsed: 00:00:11.13

Wow, instead of ~4 seconds earlier, this query runs in 11 seconds! So much for INDEX hints making things always faster :)

Let's look into the plan:

---------------------------------------------------------------------

| Id  | Operation                       | Name             | E-Rows |

---------------------------------------------------------------------

|   0 | SELECT STATEMENT                |                  |        |

|   1 |  SORT AGGREGATE                 |                  |      1 |

|   2 |   NESTED LOOPS                  |                  |        |

|   3 |    NESTED LOOPS                 |                  |  10658 |

|   4 |     NESTED LOOPS                |                  |   4518 |

|*  5 |      TABLE ACCESS FULL          | WAREHOUSES       |      1 |

|   6 |      TABLE ACCESS BY INDEX ROWID| ORDERS           |   4504 |

|*  7 |       INDEX RANGE SCAN          | ORD_WAREHOUSE_IX |   4532 |

|*  8 |     INDEX RANGE SCAN            | ITEM_ORDER_IX    |      3 |

|*  9 |    TABLE ACCESS BY INDEX ROWID  | ORDER_ITEMS      |      2 |

---------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   5 - filter("W"."WAREHOUSE_NAME"='test')

   7 - access("W"."WAREHOUSE_ID"="O"."WAREHOUSE_ID")

   8 - access("O"."ORDER_ID"="OI"."ORDER_ID")

   9 - filter("OI"."QUANTITY">1)

Oh, now I see! The join order is stil correct, an index is used on this ORDER_ITEMS with 13M rows, but Oracle has stubbornly switched the biggest join to NESTED LOOPS. That's why everything is slow as everyone knows, NESTED LOOPS joins aren't very good with huge datasets, right :)

So, let's force a hash join then as "hash joins are always fast when joining lots of rows", right :)

SQL> l

  1  SELECT /*+

  2           USE_HASH(oi)

  3           INDEX(oi)

  4         */

  5      SUM(oi.quantity * oi.unit_price)

  6  FROM

  7      warehouses w

  8    , orders o

  9    , order_items oi

 10  WHERE

 11  -- join

 12      w.warehouse_id = o.warehouse_id

 13  AND o.order_id = oi.order_id

 14  -- filter (constants)

 15  AND w.warehouse_name = 'test'

 16* AND oi.quantity > 1

 17  /

SUM(OI.QUANTITY*OI.UNIT_PRICE)

------------------------------

                    1629881372

1 row selected.

Elapsed: 00:08:12.11

What the hell? I just "tuned" my query from 4 seconds to 8 minutes!

Um, I guess that applying random hints from internet discussion boards isn't such a good idea after all!

Let's see what happened to the execution plan:

-----------------------------------------------------------------------------------------------

| Id  | Operation                      | Name             | E-Rows |  OMem |  1Mem | Used-Mem |

-----------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT               |                  |        |       |       |          |

|   1 |  SORT AGGREGATE                |                  |      1 |       |       |          |

|*  2 |   HASH JOIN                    |                  |  10658 |  1767K|  1347K| 1981K (0)|

|   3 |    NESTED LOOPS                |                  |        |       |       |          |

|   4 |     NESTED LOOPS               |                  |   4518 |       |       |          |

|*  5 |      TABLE ACCESS FULL         | WAREHOUSES       |      1 |       |       |          |

|*  6 |      INDEX RANGE SCAN          | ORD_WAREHOUSE_IX |   4532 |       |       |          |

|   7 |     TABLE ACCESS BY INDEX ROWID| ORDERS           |   4504 |       |       |          |

|*  8 |    TABLE ACCESS BY INDEX ROWID | ORDER_ITEMS      |     10M|       |       |          |

|   9 |     INDEX FULL SCAN            | ITEM_ORDER_IX    |     13M|       |       |          |

-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("O"."ORDER_ID"="OI"."ORDER_ID")

   5 - filter("W"."WAREHOUSE_NAME"='test')

   6 - access("W"."WAREHOUSE_ID"="O"."WAREHOUSE_ID")

   8 - filter("OI"."QUANTITY">1)

Thanks to the hints I applied, Oracle is forced to use hash join when joining the ORDER_ITEMS to a driving row source and it's forced to use access the ORDER_ITEMS table through any index, even if it doesn't make sense to do so (with currently available indexes). That's why we end up doing an INDEX FULL SCAN (which is essentially a range scan through all the index leaf blocks), followed by a table block access for each index row. This table has over 13 million rows. The excessive single block read physical IOs killed the performance. Forcing any index to be used this way (without knowing exactly why and how it should help) is not good.

So, despite my attempts to make use of an index and hash join, I didn't make my query any faster (it actually went way slower thanks to my "tuning").

Before we go on, I have to repeat that all the hinting above was the wrong way to add hints. 

First, such hints controlling join methods and access paths should be added only when you know exactly which execution plan you want to achieve and the CBO doesn't automatically generate it for some reason. If you don't know which exact execution plan you want to achieve, then don't bother using these hints, you may make matters worse as seen above. Or even worse, you may accidentally fix one issue with such incomplete hinting, release the code to production, where a few weeks later it seriously breaks as something else has changed (but your hint is still in place, limiting optimizer options).

And second, if you want to really control your execution plans (and you know exactly what you want), then the first thing you should figure out is the join order. If you don't control your join order, you don't control the execution plan. In my above examles, I wanted to access table WAREHOUSES first, then join ORDERS to it and then join ORDER_ITEMS to their result. It was purely by accident that the join order remained so after my incomplete and incorrect hinting above. What I really should have used is a LEADING() hint for specifying the join order (the good old ORDERED hint has some problems, which I'll explain some time in the future).

So, before we go on, I'll recap something about execution plans. These 3 (actually 4) things are the most important properties of an execution plan:

If you want complete control over your execution plan, you will need to fix all of these aspects of the plan. I have left the property 4 (in which step some filter predicates are applied) as it's different from first 3. The first 3 properties decide the structure, layout of the execution plan, the "how the data flows in the execution plan tree", while the 4th just defines how much of the data will flow through that tree, but the tree structure itself remains the same. Also, there's somewhat less control over filtering (via hints) compared to the first 3 properties.

When I optimized the query execution plan in my head, I decided that I want to start from the smallest row source first (the row source returning the least rows as per my estimation):

My problem query with proper hints (assuming that I want to use hash joins and proper indexes all over the place) should look like this:

  1  SELECT /*+

  2           LEADING(w o)

  3           USE_HASH(o) USE_HASH(oi)

  4           INDEX(o o(warehouse_id)) INDEX(oi oi(order_id))

  5         */

  6      SUM(oi.quantity * oi.unit_price)

  7  FROM

  8      warehouses w

  9    , orders o

 10    , order_items oi

 11  WHERE

 12  -- join

 13      w.warehouse_id = o.warehouse_id

 14  AND o.order_id = oi.order_id

 15  -- filter (constants)

 16  AND w.warehouse_name = 'test'

 17* AND oi.quantity > 1

SQL> /

SUM(OI.QUANTITY*OI.UNIT_PRICE)

------------------------------

                    1629881372

1 row selected.

Elapsed: 00:13:47.07

See, how I have organized the hints into 3 lines, each line covers one important property of the execution plan (join order, join methods, access path selection using the new index hint syntax).

The only problem is, that despite proper indexes and hash joins being used, the query now takes over 13 minutes, instead of the 4.5 seconds we saw at first. This time I ran the query also with statistics_level = all so we get an idea where most of the time is spent:

-------------------------------------------------------------------------------------------------------------------------------------------------

| Id  | Operation                      | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |

-------------------------------------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT               |                  |      1 |        |      1 |00:13:47.06 |    8847K|   4329K|       |       |          |

|   1 |  SORT AGGREGATE                |                  |      1 |      1 |      1 |00:13:47.06 |    8847K|   4329K|       |       |          |

|*  2 |   HASH JOIN                    |                  |      1 |  10658 |    118K|00:13:46.93 |    8847K|   4329K|  1767K|  1347K| 1961K (0)|

|*  3 |    HASH JOIN                   |                  |      1 |   4518 |  49217 |00:01:03.03 |    4284K|    170K|  1036K|  1036K|  383K (0)|

|*  4 |     TABLE ACCESS FULL          | WAREHOUSES       |      1 |      1 |      1 |00:00:00.01 |       7 |      6 |       |       |          |

|   5 |     TABLE ACCESS BY INDEX ROWID| ORDERS           |      1 |   4499K|   4499K|00:00:55.97 |    4284K|    170K|       |       |          |

|   6 |      INDEX FULL SCAN           | ORD_WAREHOUSE_IX |      1 |   4527K|   4499K|00:00:05.51 |   14379 |  11440 |       |       |          |

|*  7 |    TABLE ACCESS BY INDEX ROWID | ORDER_ITEMS      |      1 |     10M|     10M|00:12:25.92 |    4563K|   4159K|       |       |          |

|   8 |     INDEX FULL SCAN            | ITEM_ORDER_IX    |      1 |     13M|     13M|00:00:16.92 |   31555 |  31555 |       |       |          |

-------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("O"."ORDER_ID"="OI"."ORDER_ID")

   3 - access("W"."WAREHOUSE_ID"="O"."WAREHOUSE_ID")

   4 - filter("W"."WAREHOUSE_NAME"='test')

   7 - filter("OI"."QUANTITY">1)

Check the above plan. Hash joins are used, indexes are (kind of) used too. Look into the A-Rows column on line 4. We get only one row from the WAREHOUSES table, which means we'll have only one WAREHOUSE_ID to search for - and then the parent hash join (line 3) will get data from ORDERS table, using the index on WAREHOUSE_ID column on it. But we are not really using the index to traverse through only the rows with that we looked for, but instead are doing an index full scan on the ORD_WAREHOUSE_IX index and are returning all 4M+ rows back to the hash join from there!

So, for some reason the hash join was not able to understand that only a single row was returned from the driving table (WAREHOUSES) and use the join columns value (WAREHOUSE_ID) to look up only the matching values from the probe table using an index. And this is exactly what is the difference between a nested loops join and a hash join!

I will come up with a contest of who can phrase this statement the best, but here's my take on this issue:

When joining table A and B (A being driving table and B being the probed table), then hash joins can not perform index lookups into table B based on the values returned from table A !!!

Nested loops can do that - basically the nested loop joins invoke (start) the probed row source (table or index B) every time they get a new row from the driving row source (table A). So basically nested loops will do a million index range scans on table B if the table A returned a million rows. That's just how nested loops work, for every row they get from table A, they perform a lookup operation (whatever it will be, like table or index access) on the table B. 

This may mean lots of logical IOs as some blocks are going to be re-visited again and again and again. This action is also evident from the Starts column in DBMS_XPLAN output (with rowsource profiling enabled). With hash joins you'll see that every hash join calls both of its row sources only once, but a nested loop "Starts" (starts a completely new access, index lookup or scan) the probe row source (table B) once for every row returned from the driving row source (with some exceptions, as usual).

Oh, now it's time to get my beer. I've written so far only half of the planned article, so stay tuned!

....to be contiued... :-)