![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
I have been doing SQL query tuning for a while and in most cases this is relatively straightforward excercise. For example: if a query selects only a few rows from large table then it should be accessed through index, if a query selects more than 10% of the rows the table should be accessed through full table scan. In many cases I was able to reduce number of buffer gets from something like 200,000,000 to less than 1000 per execution by creating indexes and adding a few hints. The difficulty I have now is that the query is very simple and as far as I can see there is nothing wrong with plan. Still it is slow. The query is select * from transactions where tx_date between (date1 and date2) and status = 'AAA' and ID not in (select ID from transaction where tx_date between (date3 and date4) and status = 'AAA') So we have a query and a subquery. Subquery is not correlated. Table has 11 million rows. Nothing fancy: no LOBS, LONG, etc, all columns are either NUMBER or VARCHAR2. Column ID has type VARCHAR2(32) Executed separately both query and subquery run in less than a second, use index on tx_date. Main query returns 250 records while subquery returns 2500. I am running it on lightly loaded UNIX server with plenty of free memory, SGA is 4 GB, PGA is 500 MB, there are no other heavy queries. Still it takes 2 minutes. Autotrace shows 2.2 million buffer gets and no disk reads. There is nothing in wait events: no latch waits, no buffer busy, etc. My main problem is: there is nothing wrong with execution plan for complete query. It shows use of the same index on tx_date by both main query and subquery and FILTER. If I save results of the subquery in a temporary table then such modified query completes in less than a second: select * from transactions where tx_date between (date1 and date2) and status = 'AAA' ID not in (select ID from temp_table) Surely Oracle should be able to hold 2,500 records from the subquery in memory and access it more efficiently than my temporary table? But it doesn't do it. I can re-write the query using MINUS and it will run much faster. I am just puzzled why is it so slow. This is plan from AUTOTRACE (9.2.0.8 on AIX): Execution Plan ---------------------------------------------------------- * *0 * * *SELECT STATEMENT Optimizer=CHOOSE (Cost=45545 Card=81 Bytes=3888) * *1 * *0 * FILTER * *2 * *1 * * FILTER * *3 * *2 * * * ABLE ACCESS (BY INDEX ROWID) OF 'TRANSACTIONS' (Cost=1076 Card=81 Bytes=3888) * *4 * *3 * * * * NDEX (RANGE SCAN) OF 'TRANSACTIONS_IND01' (NON- UNIQUE) (Cost=821 Card=105815) * *5 * *1 * * FILTER * *6 * *5 * * *TABLE ACCESS (BY INDEX ROWID) OF 'TRANSACTIONS' (Cost=549 Card=4 Bytes=192) * *7 * *6 * * * *INDEX (RANGE SCAN) OF 'TRANSACTIONS_IND01' (NON- UNIQUE) (Cost=419Card=53853) Statistics ---------------------------------------------------------- * * * * * 0 *recursive calls * * * * * 0 *db block gets * * 2326009 *consistent gets * * * * * 0 *physical reads * * * * * 0 *redo size * * * *7609 *bytes sent via SQL*Net to client * * * * 832 *bytes received via SQL*Net from client * * * * *18 *SQL*Net roundtrips to/from client * * * * * 0 *sorts (memory) * * * * * 0 *sorts (disk) * * * * 253 *rows processed |
#3
| |||
| |||
|
|
I have been doing SQL query tuning for a while and in most cases this is relatively straightforward excercise. For example: if a query selects only a few rows from large table then it should be accessed through index, if a query selects more than 10% of the rows the table should be accessed through full table scan. In many cases I was able to reduce number of buffer gets from something like 200,000,000 to less than 1000 per execution by creating indexes and adding a few hints. |
|
The difficulty I have now is that the query is very simple and as far as I can see there is nothing wrong with plan. Still it is slow. The query is select * from transactions where tx_date between (date1 and date2) and status = 'AAA' and ID not in (select ID from transaction where tx_date between (date3 and date4) and status = 'AAA') |
#4
| |||
| |||
|
|
select * from transactions where tx_date between (date1 and date2) and status = 'AAA' and ID not in (select ID from transaction where tx_date between (date3 and date4) and status = 'AAA') |
#5
| |||
| |||
|
|
If I save results of the subquery in a temporary table then such modified query completes in less than a second: select * from transactions where tx_date between (date1 and date2) and status = 'AAA' ID not in (select ID from temp_table) Surely Oracle should be able to hold 2,500 records from the subquery in memory and access it more efficiently than my temporary table? But it doesn't do it. One way to speed up NOT IN queries is the HASH_AJ hint. |
|
I can re-write the query using MINUS and it will run much faster. I am just puzzled why is it so slow. This is plan from AUTOTRACE (9.2.0.8 on AIX): Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=45545 Card=81 Bytes=3888) 1 0 FILTER 2 1 FILTER 3 2 ABLE ACCESS (BY INDEX ROWID) OF 'TRANSACTIONS' (Cost=1076 Card=81 Bytes=3888) 4 3 NDEX (RANGE SCAN) OF 'TRANSACTIONS_IND01' (NON- UNIQUE) (Cost=821 Card=105815) 5 1 FILTER 6 5 TABLE ACCESS (BY INDEX ROWID) OF 'TRANSACTIONS' (Cost=549 Card=4 Bytes=192) 7 6 INDEX (RANGE SCAN) OF 'TRANSACTIONS_IND01' (NON- UNIQUE) (Cost=419Card=53853) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 2326009 consistent gets 0 physical reads 0 redo size 7609 bytes sent via SQL*Net to client 832 bytes received via SQL*Net from client 18 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 253 rows processed HTH, |
#6
| |||
| |||
|
|
On Mar 12, 5:34*am, vsevolod afanassiev vsevolod.afanass... (AT) gmail (DOT) com> wrote: I have been doing SQL query tuning for a while and in most cases this is relatively straightforward excercise. For example: if a query selects only a few rows from large table then it should be accessed through index, if a query selects more than 10% of the rows the table should be accessed through full table scan. In many cases I was able to reduce number of buffer gets from something like 200,000,000 to less than 1000 per execution by creating indexes and adding a few hints. The 10% rule is not suitable in all cases. *Last year I posted to this group a test case that showed a full table scan was faster when selecting 0.06% of the rows in a 100,000,000 row table than was an index access when the index had a high clustering factor. *However, 10% to 20% is probably an OK tipping point in most cases. |
|
The difficulty I have now is that the query is very simple and as far as I can see there is nothing wrong with plan. Still it is slow. The query is select * from transactions where tx_date between (date1 and date2) and status = 'AAA' and ID not in (select ID from transaction where tx_date between (date3 and date4) and status = 'AAA') Columns that are not explicitly defined as NOT NULL might be causing performance problems for you when the optimizer is not able to transform the SQL statement into a more efficient form. *See this blog article:http://hoopercharles.wordpress.com/2...ts-of-potentia... (Notice the change in the execution plan in the above link) I would be inclined to re-write the SQL statement to look something like this: select * T.* from * transactions T, * (select DISTINCT * * ID * from * * temp_table) TT where * T.tx_date between date1 and date2 * and T.status = 'AAA' * AND t.ID=TT.ID(+) * AND TT.ID IS NULL; |
#7
| |||
| |||
|
|
On Mar 12, 5:34 am, vsevolod afanassiev vsevolod.afanass... (AT) gmail (DOT) com> wrote: I would be inclined to re-write the SQL statement to look something like this: select T.* from transactions T, (select DISTINCT ID from temp_table) TT where T.tx_date between date1 and date2 and T.status = 'AAA' AND t.ID=TT.ID(+) AND TT.ID IS NULL; Charles Hooper Co-author of "Expert Oracle Practices: Oracle Database Administration from the Oak Table" http://hoopercharles.wordpress.com/ IT Manager/Oracle DBA K&M Machine-Fabricating, Inc. |
|
Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | |
|
0 | SELECT STATEMENT | | 1 | 54 | 7 (15)| 00:00:01 | * 1 | FILTER | | | | | * 2 | HASH JOIN ANTI | | 1 | 54 | 7 (15)| 00:00:01 | * 3 | TABLE ACCESS FULL| TRANSACTIONS | 2 | 54 | 3 (0)| 00:00:01 | * 4 | TABLE ACCESS FULL| TRANSACTION | 2 | 54 | 3 (0)| 00:00:01 | |
#8
| |||||
| |||||
|
|
On 12.03.2010 14:46, Charles Hooper wrote: On Mar 12, 5:34 am, vsevolod afanassiev vsevolod.afanass... (AT) gmail (DOT) com> *wrote: I would be inclined to re-write the SQL statement to look something like this: select * *T.* from * *transactions T, * *(select DISTINCT * * *ID * *from * * *temp_table) TT where * *T.tx_date between date1 and date2 * *and T.status = 'AAA' * *AND t.ID=TT.ID(+) * *AND TT.ID IS NULL; Charles Hooper Co-author of "Expert Oracle Practices: Oracle Database Administration from the Oak Table" http://hoopercharles.wordpress.com/ IT Manager/Oracle DBA K&M Machine-Fabricating, Inc. It would be however not equivalent to the original query ( in terms of returned resultset, that execution plan will differ - it is obvious). By anti join you would return the rows from TRANSACTIONS table, which *may* contain ID=NULL. By semijoin (as provided by OP) no NULLs can be returned. That is why, if both, TRANSACTION.ID and TRANSACTIONS.ID are known to optimizer as NOT NULL, it may be able to rewrite semijoin to antijoin on its own (as shown by Lothar). Here is small testcase on 10.2.0.4, but, i think, on 9.2.0.8 it would look similar. Of course, not null constraints can be used by optimizer equally as specifying not null conditions... SQL> set echo on autot off feed off SQL> create table transactions(id number,tx_date date,status varchar2(6)); SQL> create table transaction(id number,tx_date date,status varchar2(6)); SQL> insert into transaction values(1,trunc(sysdate),'AAA'); SQL> insert into transaction values(2,trunc(sysdate),'AAA'); SQL> insert into transaction values(null,trunc(sysdate),'AAA'); SQL> insert into transactions values(1,trunc(sysdate),'AAA'); SQL> insert into transactions values(3,trunc(sysdate),'AAA'); SQL> insert into transactions values(null,trunc(sysdate),'AAA'); |
|
Id | Operation | Name | Starts | E-Rows | A-Rows A-Time | Buffers | OMem | 1Mem | Used-Mem | ------------------------------------------------------------------------------------------------------------------------- 0 | SELECT STATEMENT | | 1 | | 1 | 00:00:00.01 | 14 | | | | * 1 | FILTER | | 1 | | 1 | 00:00:00.01 | 14 | | | | * 2 | HASH JOIN ANTI | | 1 | 1 | 1 | 00:00:00.01 | 14 | 1063K| 1063K| 542K (0)| * 3 | TABLE ACCESS FULL| TRANSACTIONS | 1 | 2 | 2 | 00:00:00.01 | 7 | | | | * 4 | TABLE ACCESS FULL| TRANSACTION | 1 | 2 | 2 | 00:00:00.01 | 7 | | | | |
|
Id | Operation | Name | Starts | E-Rows | A- Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | |
|
0 | SELECT STATEMENT | | 1 | | 2 |00:00:00.01 | 14 | | | | 1 | VIEW | VM_NWVW_1 | 1 | 1 | 2 |00:00:00.01 | 14 | | | | 2 | HASH UNIQUE | | 1 | 1 | 2 |00:00:00.01 | 14 | 915K| 915K| 611K (0)| * 3 | FILTER | | 1 | | 2 |00:00:00.01 | 14 | | | | * 4 | HASH JOIN OUTER | | 1 | 1 | 3 |00:00:00.01 | 14 | 947K| 947K| 548K (0)| 5 | TABLE ACCESS FULL| TRANSACTIONS | 1 | 3 | 3 |00:00:00.01 | 7 | | | | 6 | TABLE ACCESS FULL| TRANSACTION | 1 | 3 | 3 |00:00:00.01 | 7 | | | | |
|
Id | Operation | Name | Starts | E-Rows | A-Rows A-Time | Buffers | OMem | 1Mem | Used-Mem | ------------------------------------------------------------------------------------------------------------------------- 0 | SELECT STATEMENT | | 1 | | 0 | 00:00:00.01 | 12 | | | | * 1 | FILTER | | 1 | | 0 | 00:00:00.01 | 12 | | | | * 2 | HASH JOIN ANTI NA | | 1 | 2 | 0 | 00:00:00.01 | 12 | 1079K| 1079K| 553K (0)| * 3 | TABLE ACCESS FULL| TRANSACTIONS | 1 | 3 | 3 | 00:00:00.01 | 7 | | | | * 4 | TABLE ACCESS FULL| TRANSACTION | 1 | 3 | 3 | 00:00:00.01 | 5 | | | | |
#9
| |||
| |||
|
|
Plan hash value: 1245405710 ------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 0 | 00:00:00.01 | 12 | | | | |* 1 | FILTER | | 1 | | 0 | 00:00:00.01 | 12 | | | | |* 2 | HASH JOIN ANTI NA | | 1 | 2 | 0 | 00:00:00.01 | 12 | 1079K| 1079K| 553K (0)| |* 3 | TABLE ACCESS FULL| TRANSACTIONS | 1 | 3 | 3 | 00:00:00.01 | 7 | | | | |* 4 | TABLE ACCESS FULL| TRANSACTION | 1 | 3 | 3 | 00:00:00.01 | 5 | | | | ------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(SYSDATE@!-1<=SYSDATE@!+1) 2 - access("ID"="ID") 3 - filter(("TX_DATE">=SYSDATE@!-1 AND "TX_DATE"<=SYSDATE@!+1 AND "STATUS"='AAA')) 4 - filter(("TX_DATE">=SYSDATE@!-1 AND "TX_DATE"<=SYSDATE@!+1 AND "STATUS"='AAA')) One item that stands out in the last of the above execution plans is the null aware hash join anti operation, which was introduced in Oracle 11.1.0.6. This makes should make a high performance increase when for large data sets that permit NULL values. |
#10
| |||
| |||
|
|
On 12.03.2010 14:46, Charles Hooper wrote: On Mar 12, 5:34 am, vsevolod afanassiev vsevolod.afanass... (AT) gmail (DOT) com> wrote: I would be inclined to re-write the SQL statement to look something like this: select T.* from transactions T, (select DISTINCT ID from temp_table) TT where T.tx_date between date1 and date2 and T.status = 'AAA' AND t.ID=TT.ID(+) AND TT.ID IS NULL; Charles Hooper Co-author of "Expert Oracle Practices: Oracle Database Administration from the Oak Table" http://hoopercharles.wordpress.com/ IT Manager/Oracle DBA K&M Machine-Fabricating, Inc. It would be however not equivalent to the original query ( in terms of returned resultset, that execution plan will differ - it is obvious). By anti join you would return the rows from TRANSACTIONS table, which *may* contain ID=NULL. By semijoin (as provided by OP) no NULLs can be returned. That is why, if both, TRANSACTION.ID and TRANSACTIONS.ID are known to optimizer as NOT NULL, it may be able to rewrite semijoin to antijoin on its own (as shown by Lothar). Here is small testcase on 10.2.0.4, but, i think, on 9.2.0.8 it would look similar. Of course, not null constraints can be used by optimizer equally as specifying not null conditions... set echo on autot off feed off create table transactions(id number,tx_date date,status varchar2(6)); create table transaction(id number,tx_date date,status varchar2(6)); insert into transaction values(1,trunc(sysdate),'AAA'); insert into transaction values(2,trunc(sysdate),'AAA'); insert into transaction values(null,trunc(sysdate),'AAA'); insert into transactions values(1,trunc(sysdate),'AAA'); insert into transactions values(3,trunc(sysdate),'AAA'); insert into transactions values(null,trunc(sysdate),'AAA'); set feed on -- no rows are returned because NULL's in NOT IN() select * 2 from transactions 3 where tx_date between sysdate -1 and sysdate +1 4 and status = 'AAA' 5 --and id is not null 6 and id not in (select id 7 from transaction 8 where tx_date between sysdate -1 and sysdate +1 9 --and id is not null 10 and status = 'AAA'); no rows selected select * 2 from transactions 3 where tx_date between sysdate -1 and sysdate +1 4 and status = 'AAA' 5 --and id is not null --! nulls are not excluded in the outer query, but no nulls are returned 6 and id not in (select id 7 from transaction 8 where tx_date between sysdate -1 and sysdate +1 9 and id is not null 10 and status = 'AAA'); ID TX_DATE STATUS ---------- ------------------ ------ 3 12-MAR-10 AAA 1 row selected. select 2 t.* 3 from 4 transactions t, 5 (select distinct 6 id 7 from 8 transaction) tt 9 where 10 t.tx_date between sysdate -1 and sysdate +1 11 and t.status = 'AAA' 12 and t.id=tt.id(+) --! here nulls from t are returned!!! 13 and tt.id is null; ID TX_DATE STATUS ---------- ------------------ ------ 12-MAR-10 AAA 3 12-MAR-10 AAA 2 rows selected. set autotrace traceonly explain select * 2 from transactions 3 where tx_date between sysdate -1 and sysdate +1 4 and status = 'AAA' 5 and id is not null 6 and id not in (select id 7 from transaction 8 where tx_date between sysdate -1 and sysdate +1 9 and id is not null 10 and status = 'AAA'); Execution Plan ---------------------------------------------------------- Plan hash value: 2554144648 ------------------------------------------------------------------------------------ Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------ 0 | SELECT STATEMENT | | 1 | 54 | 7 (15)| 00:00:01 | * 1 | FILTER | | | | | * 2 | HASH JOIN ANTI | | 1 | 54 | 7 (15)| 00:00:01 | * 3 | TABLE ACCESS FULL| TRANSACTIONS | 2 | 54 | 3 (0)| 00:00:01 | * 4 | TABLE ACCESS FULL| TRANSACTION | 2 | 54 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(SYSDATE@!-1<=SYSDATE@!+1) 2 - access("ID"="ID") 3 - filter("STATUS"='AAA' AND "ID" IS NOT NULL AND "TX_DATE">=SYSDATE@!-1 AND "TX_DATE"<=SYSDATE@!+1) 4 - filter("STATUS"='AAA' AND "ID" IS NOT NULL AND "TX_DATE">=SYSDATE@!-1 AND "TX_DATE"<=SYSDATE@!+1) Best regards Maxim |
![]() |
| Thread Tools | |
| Display Modes | |
| |