Nested Loop Joins

In this post i am writing about Nested loop joins based on the blog article from Jonathan Lewis – NLJ
Typical pseudocode of a nested loop is similar to following:

for each row in rowsource A
loop
	for each matching row in rowsource b
	loop
		provide required columns from rowsource A and rowsource B
	end loop;
end loop;

And below i have the setup which will produce a nested loop join

drop table t4 purge;
drop table t3 purge;
create table t3
as
select
	rownum-1				n1
	, dbms_random.string('l', 10)	v1
	, trunc(sysdate) + rownum/24/60 d1
from
	all_objects
where
	rownum <= 1e3
;
alter table t3 add constraint t3_pk primary key(n1);

create table t4
as
select
	rownum				n1
	, mod(rownum, 10)		n1_t1
	, dbms_random.string('l', 10)	v1
	, trunc(sysdate) + rownum/24/60 d1
from
	all_objects
where
	rownum <= 1e3
;

alter table t4 add constraint t4_pk primary key(n1);

alter table t4 add constraint t4_fk foreign key (n1_t1)
references t3(n1);

begin
	dbms_stats.gather_table_stats(
		user
		, 't3'
		, method_opt => 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		user
		, 't4'
		, method_opt => 'for all columns size 1'
	);
end;
/

select
	t3.n1
	, t3.v1
	, t4.d1
from
	t3
	, t4
where
	t3.n1 = t4.n1
and	t4.n1 = 5
;

select * from table(dbms_xplan.display_cursor());

The execution plan from the above is as (i have omitted some information):

------------------------------------------------------
| Id  | Operation                    | Name  | Rows  |
------------------------------------------------------
|   0 | SELECT STATEMENT             |       |       |
|   1 |  NESTED LOOPS                |       |     1 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T4    |     1 |
|*  3 |    INDEX UNIQUE SCAN         | T4_PK |     1 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T3    |     1 |
|*  5 |    INDEX UNIQUE SCAN         | T3_PK |     1 |
------------------------------------------------------

Now as mentioned by Jonathan Lewis there is a start-up cost to be considered in the joins, now what is a start up cost? it is the cost that involves in the work before we can produce the first item in the resulting rowsource. Further we need to consider the wasted efforts, by wasted efforts we mean that we might have been visiting the blocks which are not having the data of our interest and we might also be visiting the same block again and again to retrieve the data due to some reason that we are not able to hold those blocks in memory.

Now here we have to look for some approaches how we can acquire the data we need and have some resources primed, so that nested loops can be more efficient. And here comes the other Joins which we will look in later posts.

But here before that we have an example where we have a hash table and a normal table which are joined through a nested loop but one of them is accessed as hash key instead of an primary(index).

drop cluster hash_cluster including tables;
create cluster hash_cluster(
	hash_key	number(6)
)
single table		--this represents that it will store only a single table
hashkeys 1000		--this represents that it will store 1000 different hash values
size 150		--this represents that size of each hash is 150 bytes
hash is hash_key	--this names the hash key in the table
;

create table hashed_table(
	n1	number(6)
	, v1	varchar2(10)
	, v2	varchar2(100)
)
cluster hash_cluster(n1)
;

alter table hashed_table add constraint ht_pk primary key (n1);

insert into hashed_table
select
	rownum
	, lpad(rownum, 10, '0')
	, rpad('*', 100, '*')
from
	all_objects
where
	rownum <= 1e3
;

commit;

drop table t1 purge;
create table t1
as
select
	rownum		n1
	, dbms_random.value(1, 1000)	n2
	, rpad('*', 50, '*')		v1
from
	all_objects
where
	rownum <= 1e4
;

alter table t1 add constraint t1_pk primary key (n1);

begin
	dbms_stats.gather_table_stats(
		user
		, 't1'
		, method_opt => 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		user
		, 'hashed_table'
		, method_opt => 'for all columns size 1'
	);
end;
/

set autotrace traceonly explain

select
	substr(t1.v1, 1, 10)
	, substr(ht.v2, 1, 10)
from
	t1
	, hashed_table ht
where
	t1.n1 between 1 and 2
and	ht.n1	= t1.n2
and	ht.v1	= 'a'
;

set autotrace off

The plan output from the above is as following:

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |     1 |   191 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |              |     1 |   191 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1           |     2 |   152 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_PK        |     2 |       |     2   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS HASH          | HASHED_TABLE |     1 |   115 |            |          |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T1"."N1">=1 AND "T1"."N1"<=2)
   4 - access("HT"."N1"="T1"."N2")
       filter("HT"."V1"='a')

Thanks to Jonathan Lewis for sharing his knowledge about Oracle.

I am writing this series on my blog to be kept as learning for future and share the same with my friends

Cheers!!!
Jagdeep Sangwan

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s