dbTalk Databases Forums  

Why is this query slow?

comp.databases.oracle.server comp.databases.oracle.server


Discuss Why is this query slow? in the comp.databases.oracle.server forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
vsevolod afanassiev
 
Posts: n/a

Default Why is this query slow? - 03-12-2010 , 04:34 AM






I have been doing SQL query tuning for a while and in most cases this
is relatively straightforward excercise. For example: if a query
selects only a few rows from large table then it should be accessed
through index, if a query selects more than 10% of the rows the table
should be accessed through full table scan. In many cases I was able
to reduce number of buffer gets from something like 200,000,000 to
less than 1000 per execution by creating indexes and adding a few
hints.

The difficulty I have now is that the query is very simple and as far
as I can see there is nothing wrong
with plan. Still it is slow.

The query is

select * from transactions
where tx_date between (date1 and date2) and
status = 'AAA' and
ID not in
(select ID from transaction where tx_date between (date3 and date4)
and status = 'AAA')

So we have a query and a subquery. Subquery is not correlated. Table
has 11 million rows.
Nothing fancy: no LOBS, LONG, etc, all columns are either NUMBER or
VARCHAR2.
Column ID has type VARCHAR2(32)

Executed separately both query and subquery run in less than a
second,
use index on tx_date. Main query returns 250 records while subquery
returns 2500.

I am running it on lightly loaded UNIX server with plenty of free
memory, SGA is 4 GB, PGA is 500 MB, there are no other heavy queries.
Still it takes 2 minutes.

Autotrace shows 2.2 million buffer gets and no disk reads.

There is nothing in wait events: no latch waits, no buffer busy, etc.

My main problem is: there is nothing wrong with execution plan for
complete query.
It shows use of the same index on tx_date by both main query and
subquery and FILTER.

If I save results of the subquery in a temporary table then such
modified query
completes in less than a second:

select * from transactions
where tx_date between (date1 and date2) and
status = 'AAA'
ID not in
(select ID from temp_table)

Surely Oracle should be able to hold 2,500 records from the subquery
in
memory and access it more efficiently than my temporary table? But it
doesn't do it.

I can re-write the query using MINUS and it will run much faster. I am
just puzzled why is it so slow.

This is plan from AUTOTRACE (9.2.0.8 on AIX):

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=45545 Card=81
Bytes=3888)
1 0 FILTER
2 1 FILTER
3 2 ABLE ACCESS (BY INDEX ROWID) OF
'TRANSACTIONS' (Cost=1076 Card=81 Bytes=3888)
4 3 NDEX (RANGE SCAN) OF 'TRANSACTIONS_IND01' (NON-
UNIQUE) (Cost=821 Card=105815)
5 1 FILTER
6 5 TABLE ACCESS (BY INDEX ROWID) OF
'TRANSACTIONS' (Cost=549 Card=4 Bytes=192)
7 6 INDEX (RANGE SCAN) OF 'TRANSACTIONS_IND01' (NON-
UNIQUE) (Cost=419Card=53853)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2326009 consistent gets
0 physical reads
0 redo size
7609 bytes sent via SQL*Net to client
832 bytes received via SQL*Net from client
18 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
253 rows processed

Reply With Quote
  #2  
Old   
ddf
 
Posts: n/a

Default Re: Why is this query slow? - 03-12-2010 , 07:46 AM






On Mar 12, 5:34*am, vsevolod afanassiev
<vsevolod.afanass... (AT) gmail (DOT) com> wrote:
Quote:
I have been doing SQL query tuning for a while and in most cases this
is relatively straightforward excercise. For example: if a query
selects only a few rows from large table then it should be accessed
through index, if a query selects more than 10% of the rows the table
should be accessed through full table scan. In many cases I was able
to reduce number of buffer gets from something like 200,000,000 to
less than 1000 per execution by creating indexes and adding a few
hints.

The difficulty I have now is that the query is very simple and as far
as I can see there is nothing wrong
with plan. Still it is slow.

The query is

select * from transactions
where tx_date between (date1 and date2) and
status = 'AAA' and
ID not in
(select ID from transaction where tx_date between (date3 and date4)
and status = 'AAA')

So we have a query and a subquery. Subquery is not correlated. Table
has 11 million rows.
Nothing fancy: no LOBS, LONG, etc, all columns are either NUMBER or
VARCHAR2.
Column ID has type VARCHAR2(32)

Executed separately both query and subquery run in less than a
second,
use index on tx_date. Main query returns 250 records while subquery
returns 2500.

I am running it on lightly loaded UNIX server with plenty of free
memory, SGA is 4 GB, PGA is 500 MB, there are no other heavy queries.
Still it takes 2 minutes.

Autotrace shows 2.2 million buffer gets and no disk reads.

There is nothing in wait events: no latch waits, no buffer busy, etc.

My main problem is: there is nothing wrong with execution plan for
complete query.
It shows use of the same index on tx_date by both main query and
subquery and FILTER.

If I save results of the subquery in a temporary table then such
modified query
completes in less than a second:

select * from transactions
where tx_date between (date1 and date2) and
status = 'AAA'
ID not in
(select ID from temp_table)

Surely Oracle should be able to hold 2,500 records from the subquery
in
memory and access it more efficiently than my temporary table? But it
doesn't do it.

I can re-write the query using MINUS and it will run much faster. I am
just puzzled why is it so slow.

This is plan from AUTOTRACE (9.2.0.8 on AIX):

Execution Plan
----------------------------------------------------------
* *0 * * *SELECT STATEMENT Optimizer=CHOOSE (Cost=45545 Card=81
Bytes=3888)
* *1 * *0 * FILTER
* *2 * *1 * * FILTER
* *3 * *2 * * * ABLE ACCESS (BY INDEX ROWID) OF
'TRANSACTIONS' (Cost=1076 Card=81 Bytes=3888)
* *4 * *3 * * * * NDEX (RANGE SCAN) OF 'TRANSACTIONS_IND01' (NON-
UNIQUE) (Cost=821 Card=105815)
* *5 * *1 * * FILTER
* *6 * *5 * * *TABLE ACCESS (BY INDEX ROWID) OF
'TRANSACTIONS' (Cost=549 Card=4 Bytes=192)
* *7 * *6 * * * *INDEX (RANGE SCAN) OF 'TRANSACTIONS_IND01' (NON-
UNIQUE) (Cost=419Card=53853)

Statistics
----------------------------------------------------------
* * * * * 0 *recursive calls
* * * * * 0 *db block gets
* * 2326009 *consistent gets
* * * * * 0 *physical reads
* * * * * 0 *redo size
* * * *7609 *bytes sent via SQL*Net to client
* * * * 832 *bytes received via SQL*Net from client
* * * * *18 *SQL*Net roundtrips to/from client
* * * * * 0 *sorts (memory)
* * * * * 0 *sorts (disk)
* * * * 253 *rows processed
How does this rewrite perform:

with tlimit as (
select ID
from transaction
where tx_date between (date3 and date4)
and status = 'AAA'
)
select * from transactions
where tx_date between (date1 and date2) and
status = 'AAA' and
ID not in
(select ID from tlimit);

Yes, it's similar to your temp table version except this uses only one
permanant object. You did not mention how many rows were in this
transaction table nor did you post an event 10046 trace -- possibly
you should set event 10046 and see what the trace file reports. You
also failed to post autotrace results from your temp table query; how
do those differ from the autotrace statistics you reported here? Look
especially at the consistent gets returned as I expect they will be
far fewer in the factored query (and in your temp table version).




David Fitzjarrell

Reply With Quote
  #3  
Old   
Charles Hooper
 
Posts: n/a

Default Re: Why is this query slow? - 03-12-2010 , 07:46 AM



On Mar 12, 5:34*am, vsevolod afanassiev
<vsevolod.afanass... (AT) gmail (DOT) com> wrote:
Quote:
I have been doing SQL query tuning for a while and in most cases this
is relatively straightforward excercise. For example: if a query
selects only a few rows from large table then it should be accessed
through index, if a query selects more than 10% of the rows the table
should be accessed through full table scan. In many cases I was able
to reduce number of buffer gets from something like 200,000,000 to
less than 1000 per execution by creating indexes and adding a few
hints.
The 10% rule is not suitable in all cases. Last year I posted to this
group a test case that showed a full table scan was faster when
selecting 0.06% of the rows in a 100,000,000 row table than was an
index access when the index had a high clustering factor. However,
10% to 20% is probably an OK tipping point in most cases.

Quote:
The difficulty I have now is that the query is very simple and as far
as I can see there is nothing wrong
with plan. Still it is slow.

The query is

select * from transactions
where tx_date between (date1 and date2) and
status = 'AAA' and
ID not in
(select ID from transaction where tx_date between (date3 and date4)
and status = 'AAA')
Columns that are not explicitly defined as NOT NULL might be causing
performance problems for you when the optimizer is not able to
transform the SQL statement into a more efficient form. See this blog
article:
http://hoopercharles.wordpress.com/2...n-in-subquery/
(Notice the change in the execution plan in the above link)

I would be inclined to re-write the SQL statement to look something
like this:
select
T.*
from
transactions T,
(select DISTINCT
ID
from
temp_table) TT
where
T.tx_date between date1 and date2
and T.status = 'AAA'
AND t.ID=TT.ID(+)
AND TT.ID IS NULL;

You might also take a look at this AskTom thread:
http://asktom.oracle.com/pls/asktom/...313 616750808

Charles Hooper
Co-author of "Expert Oracle Practices: Oracle Database Administration
from the Oak Table"
http://hoopercharles.wordpress.com/
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.

Reply With Quote
  #4  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Why is this query slow? - 03-12-2010 , 08:12 AM



On Fri, 12 Mar 2010 02:34:02 -0800, vsevolod afanassiev wrote:

Quote:
select * from transactions
where tx_date between (date1 and date2) and status = 'AAA' and
ID not in
(select ID from transaction where tx_date between (date3 and date4) and
status = 'AAA')
The problem is probably in the subquery. The "ID not in" part is resolved
by executing the subquery and checking that none of the retrieved results
matches the required ID in a gigantic loop. You could try something like
this:

select * from transactions tx1
where tx1.tx_date between (date1 and date2) and
status = 'AAA' and
not exists
(select 1 from transaction tx2 where tx2.tx_date between (date3 and date4)
and tx2.status = 'AAA' and tx2.id = tx1.id )

Logically, that is the same thing. If ID is the primary key column, there
will be a unique key lookup and you're done.


--
http://mgogala.byethost5.com

Reply With Quote
  #5  
Old   
Lothar Armbrüster
 
Posts: n/a

Default Re: Why is this query slow? - 03-12-2010 , 10:40 AM



vsevolod afanassiev <vsevolod.afanassiev (AT) gmail (DOT) com> writes:

[...]
Quote:
If I save results of the subquery in a temporary table then such
modified query
completes in less than a second:

select * from transactions
where tx_date between (date1 and date2) and
status = 'AAA'
ID not in
(select ID from temp_table)

Surely Oracle should be able to hold 2,500 records from the subquery
in
memory and access it more efficiently than my temporary table? But it
doesn't do it.

One way to speed up NOT IN queries is the HASH_AJ hint.
The query would look like the following:

select * from transactions
where
tx_date between (date1 and date2) and
status='AAA' and
id is not null and
id not in (select /*+ HASH_AJ */
id
from
temp_table
where
id is not null)

Maybe you can leave out one or even both of the not null
restrictions. Have a look at the execution plan if a hash anti join is
used. I think there is also an initialization parameter to automatically
enable hash anti joins.

Quote:
I can re-write the query using MINUS and it will run much faster. I am
just puzzled why is it so slow.

This is plan from AUTOTRACE (9.2.0.8 on AIX):

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=45545 Card=81
Bytes=3888)
1 0 FILTER
2 1 FILTER
3 2 ABLE ACCESS (BY INDEX ROWID) OF
'TRANSACTIONS' (Cost=1076 Card=81 Bytes=3888)
4 3 NDEX (RANGE SCAN) OF 'TRANSACTIONS_IND01' (NON-
UNIQUE) (Cost=821 Card=105815)
5 1 FILTER
6 5 TABLE ACCESS (BY INDEX ROWID) OF
'TRANSACTIONS' (Cost=549 Card=4 Bytes=192)
7 6 INDEX (RANGE SCAN) OF 'TRANSACTIONS_IND01' (NON-
UNIQUE) (Cost=419Card=53853)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2326009 consistent gets
0 physical reads
0 redo size
7609 bytes sent via SQL*Net to client
832 bytes received via SQL*Net from client
18 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
253 rows processed



HTH,
Lothar

--
Lothar Armbrüster | lothar.armbruester (AT) t-online (DOT) de
Hauptstr. 26 |
65346 Eltville |

Reply With Quote
  #6  
Old   
joel garry
 
Posts: n/a

Default Re: Why is this query slow? - 03-12-2010 , 12:32 PM



On Mar 12, 5:46*am, Charles Hooper <hooperc2... (AT) yahoo (DOT) com> wrote:
Quote:
On Mar 12, 5:34*am, vsevolod afanassiev

vsevolod.afanass... (AT) gmail (DOT) com> wrote:
I have been doing SQL query tuning for a while and in most cases this
is relatively straightforward excercise. For example: if a query
selects only a few rows from large table then it should be accessed
through index, if a query selects more than 10% of the rows the table
should be accessed through full table scan. In many cases I was able
to reduce number of buffer gets from something like 200,000,000 to
less than 1000 per execution by creating indexes and adding a few
hints.

The 10% rule is not suitable in all cases. *Last year I posted to this
group a test case that showed a full table scan was faster when
selecting 0.06% of the rows in a 100,000,000 row table than was an
index access when the index had a high clustering factor. *However,
10% to 20% is probably an OK tipping point in most cases.
That is stated so much better than some other people who insist you
can't use any rule of thumb at all.

Quote:
The difficulty I have now is that the query is very simple and as far
as I can see there is nothing wrong
with plan. Still it is slow.

The query is

select * from transactions
where tx_date between (date1 and date2) and
status = 'AAA' and
ID not in
(select ID from transaction where tx_date between (date3 and date4)
and status = 'AAA')

Columns that are not explicitly defined as NOT NULL might be causing
performance problems for you when the optimizer is not able to
transform the SQL statement into a more efficient form. *See this blog
article:http://hoopercharles.wordpress.com/2...ts-of-potentia...
(Notice the change in the execution plan in the above link)

I would be inclined to re-write the SQL statement to look something
like this:
select
* T.*
from
* transactions T,
* (select DISTINCT
* * ID
* from
* * temp_table) TT
where
* T.tx_date between date1 and date2
* and T.status = 'AAA'
* AND t.ID=TT.ID(+)
* AND TT.ID IS NULL;

I habitually avoid DISTINCT as it has so often meant unnecessary
sorting; but of course, that is a bad habit from the RBO days, it
should be proven, or at least attempted, for each appropriate
circumstance, as here it would not be unnecessary anyways. Thanks for
making us think Charles! I never would have thought of the outer join
combined with is null, this may have practical implications for me for
things I do similar to the OP. As they are mostly time-pressured one-
offs, I wasn't bothering with hours of performance tuning for tens of
minutes of processing, just living with it. But this gives a glimmer
of hope.

jg
--
@home.com is bogus.
http://www.itworld.com/security/1003...it-cannot-stop

Reply With Quote
  #7  
Old   
Maxim Demenko
 
Posts: n/a

Default Re: Why is this query slow? - 03-12-2010 , 01:26 PM



On 12.03.2010 14:46, Charles Hooper wrote:
Quote:
On Mar 12, 5:34 am, vsevolod afanassiev
vsevolod.afanass... (AT) gmail (DOT) com> wrote:

I would be inclined to re-write the SQL statement to look something
like this:
select
T.*
from
transactions T,
(select DISTINCT
ID
from
temp_table) TT
where
T.tx_date between date1 and date2
and T.status = 'AAA'
AND t.ID=TT.ID(+)
AND TT.ID IS NULL;


Charles Hooper
Co-author of "Expert Oracle Practices: Oracle Database Administration
from the Oak Table"
http://hoopercharles.wordpress.com/
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.
It would be however not equivalent to the original query ( in terms of
returned resultset, that execution plan will differ - it is obvious). By
anti join you would return the rows from TRANSACTIONS table, which *may*
contain ID=NULL. By semijoin (as provided by OP) no NULLs can be
returned. That is why, if both, TRANSACTION.ID and TRANSACTIONS.ID are
known to optimizer as NOT NULL, it may be able to rewrite semijoin to
antijoin on its own (as shown by Lothar). Here is small testcase on
10.2.0.4, but, i think, on 9.2.0.8 it would look similar.
Of course, not null constraints can be used by optimizer equally as
specifying not null conditions...



SQL> set echo on autot off feed off
SQL> create table transactions(id number,tx_date date,status varchar2(6));
SQL> create table transaction(id number,tx_date date,status varchar2(6));
SQL> insert into transaction values(1,trunc(sysdate),'AAA');
SQL> insert into transaction values(2,trunc(sysdate),'AAA');
SQL> insert into transaction values(null,trunc(sysdate),'AAA');
SQL> insert into transactions values(1,trunc(sysdate),'AAA');
SQL> insert into transactions values(3,trunc(sysdate),'AAA');
SQL> insert into transactions values(null,trunc(sysdate),'AAA');
SQL> set feed on
SQL> -- no rows are returned because NULL's in NOT IN()
SQL> select *
2 from transactions
3 where tx_date between sysdate -1 and sysdate +1
4 and status = 'AAA'
5 --and id is not null
6 and id not in (select id
7 from transaction
8 where tx_date between sysdate -1 and sysdate +1
9 --and id is not null
10 and status = 'AAA');

no rows selected

SQL>
SQL> select *
2 from transactions
3 where tx_date between sysdate -1 and sysdate +1
4 and status = 'AAA'
5 --and id is not null --! nulls are not excluded in the outer
query, but no nulls are returned
6 and id not in (select id
7 from transaction
8 where tx_date between sysdate -1 and sysdate +1
9 and id is not null
10 and status = 'AAA');

ID TX_DATE STATUS
---------- ------------------ ------
3 12-MAR-10 AAA

1 row selected.

SQL>
SQL> select
2 t.*
3 from
4 transactions t,
5 (select distinct
6 id
7 from
8 transaction) tt
9 where
10 t.tx_date between sysdate -1 and sysdate +1
11 and t.status = 'AAA'
12 and t.id=tt.id(+) --! here nulls from t are returned!!!
13 and tt.id is null;

ID TX_DATE STATUS
---------- ------------------ ------
12-MAR-10 AAA
3 12-MAR-10 AAA

2 rows selected.

SQL>
SQL> set autotrace traceonly explain
SQL> select *
2 from transactions
3 where tx_date between sysdate -1 and sysdate +1
4 and status = 'AAA'
5 and id is not null
6 and id not in (select id
7 from transaction
8 where tx_date between sysdate -1 and sysdate +1
9 and id is not null
10 and status = 'AAA');

Execution Plan
----------------------------------------------------------
Plan hash value: 2554144648

------------------------------------------------------------------------------------
Quote:
Id | Operation | Name | Rows | Bytes | Cost
(%CPU)| Time |
------------------------------------------------------------------------------------
Quote:
0 | SELECT STATEMENT | | 1 | 54 | 7
(15)| 00:00:01 |
* 1 | FILTER | | | |
|
* 2 | HASH JOIN ANTI | | 1 | 54 | 7
(15)| 00:00:01 |
* 3 | TABLE ACCESS FULL| TRANSACTIONS | 2 | 54 | 3
(0)| 00:00:01 |
* 4 | TABLE ACCESS FULL| TRANSACTION | 2 | 54 | 3
(0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(SYSDATE@!-1<=SYSDATE@!+1)
2 - access("ID"="ID")
3 - filter("STATUS"='AAA' AND "ID" IS NOT NULL AND
"TX_DATE">=SYSDATE@!-1 AND "TX_DATE"<=SYSDATE@!+1)
4 - filter("STATUS"='AAA' AND "ID" IS NOT NULL AND
"TX_DATE">=SYSDATE@!-1 AND "TX_DATE"<=SYSDATE@!+1)



Best regards

Maxim

Reply With Quote
  #8  
Old   
Charles Hooper
 
Posts: n/a

Default Re: Why is this query slow? - 03-12-2010 , 02:54 PM



On Mar 12, 2:26*pm, Maxim Demenko <mdeme... (AT) gmail (DOT) com> wrote:
Quote:
On 12.03.2010 14:46, Charles Hooper wrote:
On Mar 12, 5:34 am, vsevolod afanassiev
vsevolod.afanass... (AT) gmail (DOT) com> *wrote:
I would be inclined to re-write the SQL statement to look something
like this:
select
* *T.*
from
* *transactions T,
* *(select DISTINCT
* * *ID
* *from
* * *temp_table) TT
where
* *T.tx_date between date1 and date2
* *and T.status = 'AAA'
* *AND t.ID=TT.ID(+)
* *AND TT.ID IS NULL;

Charles Hooper
Co-author of "Expert Oracle Practices: Oracle Database Administration
from the Oak Table"
http://hoopercharles.wordpress.com/
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.

It would be however not equivalent to the original query ( in terms of
returned resultset, that execution plan will differ - it is obvious). By
anti join you would return the rows from TRANSACTIONS table, which *may*
contain ID=NULL. By semijoin (as provided by OP) no NULLs can be
returned. That is why, if both, TRANSACTION.ID and TRANSACTIONS.ID are
known to optimizer as NOT NULL, it may be able to rewrite semijoin to
antijoin on its own (as shown by Lothar). Here is small testcase on
10.2.0.4, but, i think, on 9.2.0.8 it would look similar.
Of course, not null constraints can be used by optimizer equally as
specifying not null conditions...

SQL> set echo on autot off feed off
SQL> create table transactions(id number,tx_date date,status varchar2(6));
SQL> create table transaction(id number,tx_date date,status varchar2(6));
SQL> insert into transaction values(1,trunc(sysdate),'AAA');
SQL> insert into transaction values(2,trunc(sysdate),'AAA');
SQL> insert into transaction values(null,trunc(sysdate),'AAA');
SQL> insert into transactions values(1,trunc(sysdate),'AAA');
SQL> insert into transactions values(3,trunc(sysdate),'AAA');
SQL> insert into transactions values(null,trunc(sysdate),'AAA');
Maxim,
As usual, nice demonstration.

This is from 11.2.0.1:
select *
from transactions
where tx_date between sysdate -1 and sysdate +1
and status = 'AAA'
and id is not null
and id not in (select id
from transaction
where tx_date between sysdate -1 and sysdate +1
and id is not null
and status = 'AAA')

SQL_ID curjz9ggzrgvt, child number 0
-------------------------------------
select * from transactions where tx_date between sysdate -1
and sysdate +1 and status = 'AAA' and id is not null
and id not in (select id from transaction where tx_date
between sysdate -1 and sysdate +1 and id is not null and
status = 'AAA');

Plan hash value: 2554144648

-------------------------------------------------------------------------------------------------------------------------
Quote:
Id | Operation | Name | Starts | E-Rows | A-Rows
A-Time | Buffers | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
0 | SELECT STATEMENT | | 1 | | 1 |
00:00:00.01 | 14 | | | |
* 1 | FILTER | | 1 | | 1 |
00:00:00.01 | 14 | | | |
* 2 | HASH JOIN ANTI | | 1 | 1 | 1 |
00:00:00.01 | 14 | 1063K| 1063K| 542K (0)|
* 3 | TABLE ACCESS FULL| TRANSACTIONS | 1 | 2 | 2 |
00:00:00.01 | 7 | | | |
* 4 | TABLE ACCESS FULL| TRANSACTION | 1 | 2 | 2 |
00:00:00.01 | 7 | | | |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SYSDATE@!-1<=SYSDATE@!+1)
2 - access("ID"="ID")
3 - filter(("ID" IS NOT NULL AND "TX_DATE">=SYSDATE@!-1 AND
"TX_DATE"<=SYSDATE@!+1 AND "STATUS"='AAA'))
4 - filter(("ID" IS NOT NULL AND "TX_DATE">=SYSDATE@!-1 AND
"TX_DATE"<=SYSDATE@!+1 AND "STATUS"='AAA'))

--

SELECT
*
FROM
TRANSACTIONS TS,
(SELECT DISTINCT
ID
FROM
TRANSACTION) T
WHERE
TS.ID=T.ID(+)
AND T.ID IS NULL;

SQL_ID 1tpkf3uawhgbb, child number 0
-------------------------------------
SELECT * FROM TRANSACTIONS TS, (SELECT DISTINCT ID
FROM TRANSACTION) T WHERE TS.ID=T.ID(+) AND T.ID IS NULL

Plan hash value: 4011681805

---------------------------------------------------------------------------------------------------------------------------
Quote:
Id | Operation | Name | Starts | E-Rows | A-
Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
Quote:
0 | SELECT STATEMENT | | 1 | |
2 |00:00:00.01 | 14 | | | |
1 | VIEW | VM_NWVW_1 | 1 | 1 |
2 |00:00:00.01 | 14 | | | |
2 | HASH UNIQUE | | 1 | 1 |
2 |00:00:00.01 | 14 | 915K| 915K| 611K (0)|
* 3 | FILTER | | 1 | |
2 |00:00:00.01 | 14 | | | |
* 4 | HASH JOIN OUTER | | 1 | 1 |
3 |00:00:00.01 | 14 | 947K| 947K| 548K (0)|
5 | TABLE ACCESS FULL| TRANSACTIONS | 1 | 3 |
3 |00:00:00.01 | 7 | | | |
6 | TABLE ACCESS FULL| TRANSACTION | 1 | 3 |
3 |00:00:00.01 | 7 | | | |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("ID" IS NULL)
4 - access("TS"."ID"="ID")

--

select *
from transactions
where tx_date between sysdate -1 and sysdate +1
and status = 'AAA'
and id not in (select id
from transaction
where tx_date between sysdate -1 and sysdate +1
and status = 'AAA');

SQL_ID 2ydut2u6dspa0, child number 0
-------------------------------------
select * from transactions where tx_date between sysdate -1
and sysdate +1 and status = 'AAA' and id not in (select
id
from transaction where tx_date between sysdate -1 and
sysdate
+1 and status = 'AAA')

Plan hash value: 1245405710

-------------------------------------------------------------------------------------------------------------------------
Quote:
Id | Operation | Name | Starts | E-Rows | A-Rows
A-Time | Buffers | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
0 | SELECT STATEMENT | | 1 | | 0 |
00:00:00.01 | 12 | | | |
* 1 | FILTER | | 1 | | 0 |
00:00:00.01 | 12 | | | |
* 2 | HASH JOIN ANTI NA | | 1 | 2 | 0 |
00:00:00.01 | 12 | 1079K| 1079K| 553K (0)|
* 3 | TABLE ACCESS FULL| TRANSACTIONS | 1 | 3 | 3 |
00:00:00.01 | 7 | | | |
* 4 | TABLE ACCESS FULL| TRANSACTION | 1 | 3 | 3 |
00:00:00.01 | 5 | | | |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SYSDATE@!-1<=SYSDATE@!+1)
2 - access("ID"="ID")
3 - filter(("TX_DATE">=SYSDATE@!-1 AND "TX_DATE"<=SYSDATE@!+1 AND
"STATUS"='AAA'))
4 - filter(("TX_DATE">=SYSDATE@!-1 AND "TX_DATE"<=SYSDATE@!+1 AND
"STATUS"='AAA'))

True, if NULL values are permitted, the OP could see different results
with the different SQL statements.

One item that stands out in the last of the above execution plans is
the null aware hash join anti operation, which was introduced in
Oracle 11.1.0.6. This makes should make a high performance increase
when for large data sets that permit NULL values.

Joel, I started using the outer join and NULL syntax back when I
started encountering performance problems with NOT IN type queries in
Oracle 8.0.5. I usually do not give much thought to potential NULLs
permitted in the join columns on the left side of a left outer join -
I know that those rows that are NULL will be excluded automatically,
as they would in a typical inner join. I saw an example of this
technique somewhere, but do not recall where (it might have been on
AskTom). Prior to Oracle 9i, Oracle would not automatically transform
the NOT IN syntax into a more efficient form, so the outer join syntax
was almost always 10, 100, possibly 1000 times faster than the NOT IN
syntax when I tested both query forms back then. Since that time the
optimizer has gained the ability to automatically rewrite SQL
statements (as seen in a 10053 trace file) like these into more
efficient equivalent forms, but the potential presence of NULL values
limits the optimization possibilities.

Charles Hooper
Co-author of "Expert Oracle Practices: Oracle Database Administration
from the Oak Table"
http://hoopercharles.wordpress.com/
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.

Reply With Quote
  #9  
Old   
Maxim Demenko
 
Posts: n/a

Default Re: Why is this query slow? - 03-12-2010 , 03:22 PM



On 12.03.2010 21:54, Charles Hooper wrote:

Quote:
Plan hash value: 1245405710

-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows
| A-Time | Buffers | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 0 |
00:00:00.01 | 12 | | | |
|* 1 | FILTER | | 1 | | 0 |
00:00:00.01 | 12 | | | |
|* 2 | HASH JOIN ANTI NA | | 1 | 2 | 0 |
00:00:00.01 | 12 | 1079K| 1079K| 553K (0)|
|* 3 | TABLE ACCESS FULL| TRANSACTIONS | 1 | 3 | 3 |
00:00:00.01 | 7 | | | |
|* 4 | TABLE ACCESS FULL| TRANSACTION | 1 | 3 | 3 |
00:00:00.01 | 5 | | | |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SYSDATE@!-1<=SYSDATE@!+1)
2 - access("ID"="ID")
3 - filter(("TX_DATE">=SYSDATE@!-1 AND "TX_DATE"<=SYSDATE@!+1 AND
"STATUS"='AAA'))
4 - filter(("TX_DATE">=SYSDATE@!-1 AND "TX_DATE"<=SYSDATE@!+1 AND
"STATUS"='AAA'))


One item that stands out in the last of the above execution plans is
the null aware hash join anti operation, which was introduced in
Oracle 11.1.0.6. This makes should make a high performance increase
when for large data sets that permit NULL values.

Excellent point, almost forgot it, thank you for reminder.

Best regards

Maxim

Reply With Quote
  #10  
Old   
The Boss
 
Posts: n/a

Default Re: Why is this query slow? - 03-12-2010 , 04:29 PM



Maxim Demenko wrote:
Quote:
On 12.03.2010 14:46, Charles Hooper wrote:
On Mar 12, 5:34 am, vsevolod afanassiev
vsevolod.afanass... (AT) gmail (DOT) com> wrote:

I would be inclined to re-write the SQL statement to look something
like this:
select
T.*
from
transactions T,
(select DISTINCT
ID
from
temp_table) TT
where
T.tx_date between date1 and date2
and T.status = 'AAA'
AND t.ID=TT.ID(+)
AND TT.ID IS NULL;


Charles Hooper
Co-author of "Expert Oracle Practices: Oracle Database Administration
from the Oak Table"
http://hoopercharles.wordpress.com/
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.

It would be however not equivalent to the original query ( in terms of
returned resultset, that execution plan will differ - it is obvious).
By anti join you would return the rows from TRANSACTIONS table, which
*may* contain ID=NULL. By semijoin (as provided by OP) no NULLs can be
returned. That is why, if both, TRANSACTION.ID and TRANSACTIONS.ID are
known to optimizer as NOT NULL, it may be able to rewrite semijoin to
antijoin on its own (as shown by Lothar). Here is small testcase on
10.2.0.4, but, i think, on 9.2.0.8 it would look similar.
Of course, not null constraints can be used by optimizer equally as
specifying not null conditions...



set echo on autot off feed off
create table transactions(id number,tx_date date,status varchar2(6));
create table transaction(id number,tx_date date,status varchar2(6));
insert into transaction values(1,trunc(sysdate),'AAA');
insert into transaction values(2,trunc(sysdate),'AAA');
insert into transaction values(null,trunc(sysdate),'AAA');
insert into transactions values(1,trunc(sysdate),'AAA');
insert into transactions values(3,trunc(sysdate),'AAA');
insert into transactions values(null,trunc(sysdate),'AAA');
set feed on
-- no rows are returned because NULL's in NOT IN()
select *
2 from transactions
3 where tx_date between sysdate -1 and sysdate +1
4 and status = 'AAA'
5 --and id is not null
6 and id not in (select id
7 from transaction
8 where tx_date between sysdate -1 and sysdate +1
9 --and id is not null
10 and status = 'AAA');

no rows selected


select *
2 from transactions
3 where tx_date between sysdate -1 and sysdate +1
4 and status = 'AAA'
5 --and id is not null --! nulls are not excluded in the outer
query, but no nulls are returned
6 and id not in (select id
7 from transaction
8 where tx_date between sysdate -1 and sysdate +1
9 and id is not null
10 and status = 'AAA');

ID TX_DATE STATUS
---------- ------------------ ------
3 12-MAR-10 AAA

1 row selected.


select
2 t.*
3 from
4 transactions t,
5 (select distinct
6 id
7 from
8 transaction) tt
9 where
10 t.tx_date between sysdate -1 and sysdate +1
11 and t.status = 'AAA'
12 and t.id=tt.id(+) --! here nulls from t are returned!!!
13 and tt.id is null;

ID TX_DATE STATUS
---------- ------------------ ------
12-MAR-10 AAA
3 12-MAR-10 AAA

2 rows selected.


set autotrace traceonly explain
select *
2 from transactions
3 where tx_date between sysdate -1 and sysdate +1
4 and status = 'AAA'
5 and id is not null
6 and id not in (select id
7 from transaction
8 where tx_date between sysdate -1 and sysdate +1
9 and id is not null
10 and status = 'AAA');

Execution Plan
----------------------------------------------------------
Plan hash value: 2554144648

------------------------------------------------------------------------------------
Id | Operation | Name | Rows | Bytes | Cost
(%CPU)| Time |
------------------------------------------------------------------------------------
0 | SELECT STATEMENT | | 1 | 54 | 7
(15)| 00:00:01 |
* 1 | FILTER | | | |
|
* 2 | HASH JOIN ANTI | | 1 | 54 | 7
(15)| 00:00:01 |
* 3 | TABLE ACCESS FULL| TRANSACTIONS | 2 | 54 | 3
(0)| 00:00:01 |
* 4 | TABLE ACCESS FULL| TRANSACTION | 2 | 54 | 3
(0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(SYSDATE@!-1<=SYSDATE@!+1)
2 - access("ID"="ID")
3 - filter("STATUS"='AAA' AND "ID" IS NOT NULL AND
"TX_DATE">=SYSDATE@!-1 AND "TX_DATE"<=SYSDATE@!+1)
4 - filter("STATUS"='AAA' AND "ID" IS NOT NULL AND
"TX_DATE">=SYSDATE@!-1 AND "TX_DATE"<=SYSDATE@!+1)



Best regards

Maxim
I'm a little confused here.
Although the OP references 2 tables in his original query ('transactions'
and 'transaction'), I guess he actually only has 1 table ('transactions'):
1. OP stated: "Table has 11 million rows" which doesn't make much sense if
he has 2 tables.
2. The original Execution Plan only references table 'transactions', not
'transaction'.

Cheers!

--
Jeroen

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.