dbTalk Databases Forums  

Weird access plan behavior

comp.databases.ibm-db2 comp.databases.ibm-db2


Discuss Weird access plan behavior in the comp.databases.ibm-db2 forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Michel Esber
 
Posts: n/a

Default Weird access plan behavior - 11-17-2010 , 07:28 AM






Hi Guys,

Db2 LUW V9.5.

Consider the following query:

SELECT DISTINCT A.SW_ID, B.SW_NAME
FROM ASSET.TBL_ASSET_SW A INNER JOIN ASSET.TBL_ASSET_SW_ID B ON
(A.SW_ID = B.SW_ID)
INNER JOIN ASSET.TBL_ASSET_MACHINE_ID C ON
(A.MACHINE_ID = C.MACHINE_ID)
WHERE C.CUSTOMER_ID = 'CC0D6761CD31F72EA7FCC686437DDA98' AND
UPPER(B.SW_NAME) LIKE '%ACROBAT%'
ORDER BY B.SW_NAME


This query is slow. I know that using UPPER(DB_FIELD) like '%%' is bad
for performance and bad design.
Here is the explain plan for the query (full output may be provided if
needed):

Access Plan:
-----------
Total Cost: 733996
Query Degree: 1

Rows
RETURN
( 1)
Cost
I/O
Quote:
17567.5
TBSCAN
( 2)
733996
29959.5
Quote:
17567.5
SORT
( 3)
733994
29959.5
Quote:
17567.5
^HSJOIN
( 4)
733984
29959.5
/--------+--------\
17567.5 980205
TBSCAN NLJOIN
( 5) ( 6)
2693.42 731235
653 29306.5
Quote:
/------+------\
175675 8676 112.979
TABLE: ASSET IXSCAN IXSCAN
TBL_ASSET_SW_ID ( 7) ( 8)
Q1 1674.76 84.1034
145.253 3.36114
Quote:
|
110839 2.26167e+07
INDEX: DB2INST1 INDEX: DB2INST1
IDX001220131510000 IXPKSW
Q3 Q2




According to this plan, we have IXSCAN on ASSET.TBL_ASSET_SW and
ASSET.TBL_ASSET_MACHINE_ID, then a NLJOIN.
On the other hand, there is a TBSCAN on TBL_ASSET_SW_ID -- maybe
caused by UPPER(SW_NAME) like '%ABC%'.

Compared to the total query cost (733996), the TBSCAN on
TBL_ASSET_SW_ID a Total Cumulative Cost of 2693.42
This represents 0.3% of the returned total cumulative cost.
One might disregard that TBSCAN because is represents virtually
nothing in terms of total access plan cost.

Now let's tune the query:

SELECT DISTINCT A.SW_ID, B.SW_NAME
FROM ASSET.TBL_ASSET_SW A INNER JOIN ASSET.TBL_ASSET_SW_ID B ON
(A.SW_ID = B.SW_ID)
INNER JOIN ASSET.TBL_ASSET_MACHINE_ID C ON
(A.MACHINE_ID = C.MACHINE_ID)
WHERE C.CUSTOMER_ID = 'CC0D6761CD31F72EA7FCC686437DDA98' AND
B.SW_NAME LIKE '%ACROBAT%'
ORDER BY B.SW_NAME

The plan now looks very different:

Access Plan:
-----------
Total Cost: 109695
Query Degree: 1

Rows
RETURN
( 1)
Cost
I/O
Quote:
1396.18
TBSCAN
( 2)
109695
5007.48
Quote:
1396.18
SORT
( 3)
109695
5007.48
Quote:
7790.22
^HSJOIN
( 4)
109690
5007.48
/---------+----------\
179747 8676
NLJOIN IXSCAN
( 5) ( 8)
107996 1674.75
4862.23 145.253
/------+-------\ |
1396.18 128.742 110839
TBSCAN IXSCAN INDEX: DB2INST1
( 6) ( 7) IDX001220131510000
2673.42 75.4497 Q3
653 3.01481
Quote:
|
175675 2.26167e+07
TABLE: ASSET INDEX: DB2INST1
TBL_ASSET_SW_ID IDX002081020460000
Q1 Q2


This query has good response time. We still have the TBSCAN on
TBL_ASSET_SW_ID and an IXSCAN on TBL_ASSET_SW, then they both are
NLJOINed.

What confuses me is that TBSCAN cost on TBL_ASSET_SW_ID is still
virtually the same (cost = 2673) and still represents 2% of overall
total cost.

Nevertheless, the overall access plan was been influenced by a bad
predicate, even though the plan for that table itself is not much
different on both scenarios.

What am I missing here?

Thanks,

Reply With Quote
  #2  
Old   
Frederik Engelen
 
Posts: n/a

Default Re: Weird access plan behavior - 11-17-2010 , 08:56 AM






On Nov 17, 2:28*pm, Michel Esber <smes... (AT) gmail (DOT) com> wrote:
Quote:
Hi Guys,

Db2 LUW V9.5.

Consider the following query:

SELECT DISTINCT A.SW_ID, B.SW_NAME
FROM ASSET.TBL_ASSET_SW A INNER JOIN ASSET.TBL_ASSET_SW_ID B ON
(A.SW_ID = B.SW_ID)
* * * * * * * * * * * * * INNER JOIN ASSET.TBL_ASSET_MACHINE_ID C ON
(A.MACHINE_ID = C.MACHINE_ID)
WHERE C.CUSTOMER_ID = 'CC0D6761CD31F72EA7FCC686437DDA98' AND
* * * UPPER(B.SW_NAME) LIKE '%ACROBAT%'
ORDER BY B.SW_NAME

This query is slow. I know that using UPPER(DB_FIELD) like '%%' is bad
for performance and bad design.
Here is the explain plan for the query (full output may be provided if
needed):

Access Plan:
-----------
* * * * Total Cost: * * * * * * 733996
* * * * Query Degree: * * * * * 1

* * * * * * * * *Rows
* * * * * * * * RETURN
* * * * * * * * ( * 1)
* * * * * * * * *Cost
* * * * * * * * * I/O
* * * * * * * * * |
* * * * * * * * 17567.5
* * * * * * * * TBSCAN
* * * * * * * * ( * 2)
* * * * * * * * 733996
* * * * * * * * 29959.5
* * * * * * * * * |
* * * * * * * * 17567.5
* * * * * * * * SORT
* * * * * * * * ( * 3)
* * * * * * * * 733994
* * * * * * * * 29959.5
* * * * * * * * * |
* * * * * * * * 17567.5
* * * * * * * * ^HSJOIN
* * * * * * * * ( * 4)
* * * * * * * * 733984
* * * * * * * * 29959.5
* * * * */--------+--------\
* * *17567.5 * * * * * * * 980205
* * *TBSCAN * * * * * * * *NLJOIN
* * *( * 5) * * * * * * * *( * 6)
* * *2693.42 * * * * * * * 731235
* * * *653 * * * * * * * * 29306.5
* * * *| * * * * * * */------+------\
* * *175675 * * * *8676 * * * * * * 112.979
*TABLE: ASSET * * IXSCAN * * * * * *IXSCAN
*TBL_ASSET_SW_ID *( * 7) * * * * * *( * 8)
* * * *Q1 * * * * 1674.76 * * * * * 84.1034
* * * * * * * * * 145.253 * * * * * 3.36114
* * * * * * * * * * | * * * * * * * *|
* * * * * * * * * 110839 * * * * *2.26167e+07
* * * * * * * INDEX: DB2INST1 * INDEX: DB2INST1
* * * * * * IDX001220131510000 * * *IXPKSW
* * * * * * * * * * Q3 * * * * * * * *Q2

According to this plan, we have IXSCAN on ASSET.TBL_ASSET_SW and
ASSET.TBL_ASSET_MACHINE_ID, then a NLJOIN.
On the other hand, there is a TBSCAN on TBL_ASSET_SW_ID -- maybe
caused by UPPER(SW_NAME) like '%ABC%'.

Compared to the total query cost (733996), the TBSCAN on
TBL_ASSET_SW_ID a Total Cumulative Cost of 2693.42
This represents 0.3% of the returned total cumulative cost.
One might disregard that TBSCAN because is represents virtually
nothing in terms of total access plan cost.

Now let's tune the query:

SELECT DISTINCT A.SW_ID, B.SW_NAME
FROM ASSET.TBL_ASSET_SW A INNER JOIN ASSET.TBL_ASSET_SW_ID B ON
(A.SW_ID = B.SW_ID)
* * * * * * * * * * * * * INNER JOIN ASSET.TBL_ASSET_MACHINE_ID C ON
(A.MACHINE_ID = C.MACHINE_ID)
WHERE C.CUSTOMER_ID = 'CC0D6761CD31F72EA7FCC686437DDA98' AND
* * * B.SW_NAME LIKE '%ACROBAT%'
ORDER BY B.SW_NAME

The plan now looks very different:

Access Plan:
-----------
* * * * Total Cost: * * * * * * 109695
* * * * Query Degree: * * * * * 1

* * * * * * * * * * * * * *Rows
* * * * * * * * * * * * * RETURN
* * * * * * * * * * * * * ( * 1)
* * * * * * * * * * * * * *Cost
* * * * * * * * * * * * * * I/O
* * * * * * * * * * * * * * |
* * * * * * * * * * * * * 1396.18
* * * * * * * * * * * * * TBSCAN
* * * * * * * * * * * * * ( * 2)
* * * * * * * * * * * * * 109695
* * * * * * * * * * * * * 5007.48
* * * * * * * * * * * * * * |
* * * * * * * * * * * * * 1396.18
* * * * * * * * * * * * * SORT
* * * * * * * * * * * * * ( * 3)
* * * * * * * * * * * * * 109695
* * * * * * * * * * * * * 5007.48
* * * * * * * * * * * * * * |
* * * * * * * * * * * * * 7790.22
* * * * * * * * * * * * * ^HSJOIN
* * * * * * * * * * * * * ( * 4)
* * * * * * * * * * * * * 109690
* * * * * * * * * * * * * 5007.48
* * * * * * * * * /---------+----------\
* * * * * * * 179747 * * * * * * * * * *8676
* * * * * * * NLJOIN * * * * * * * * * IXSCAN
* * * * * * * ( * 5) * * * * * * * * * ( * 8)
* * * * * * * 107996 * * * * * * * * * 1674.75
* * * * * * * 4862.23 * * * * * * * * *145.253
* * * * */------+-------\ * * * * * * * *|
* * *1396.18 * * * * * *128.742 * * * *110839
* * *TBSCAN * * * * * * IXSCAN * * INDEX: DB2INST1
* * *( * 6) * * * * * * ( * 7) * IDX001220131510000
* * *2673.42 * * * * * *75.4497 * * * * *Q3
* * * *653 * * * * * * *3.01481
* * * *| * * * * * * * * *|
* * *175675 * * * * * 2.26167e+07
*TABLE: ASSET * * * INDEX: DB2INST1
*TBL_ASSET_SW_ID *IDX002081020460000
* * * *Q1 * * * * * * * * Q2

This query has good response time. We still have the TBSCAN on
TBL_ASSET_SW_ID and an IXSCAN on TBL_ASSET_SW, then they both are
NLJOINed.

What confuses me is that TBSCAN cost on TBL_ASSET_SW_ID is still
virtually the same (cost = 2673) and still represents 2% of overall
total cost.

Nevertheless, the overall access plan was been influenced by a bad
predicate, even though the plan for that table itself is not much
different on both scenarios.

What am I missing here?

Thanks,
Hello Michel,

I think you should take a closer look on the estimated cardinality of
the TBSCAN operation. While the cost of the tablescan is practically
identical for both situations, the expected result set is much larger
when you use the UPPER-function. Logical, right? The biggest part of
the work is IO, and with the upper-function you add a little CPU to
the comparison, so the cost is close. In this case, DB2 estimates that
with using UPPER, over 10 times more rows will qualify for your
search, and that definitively has an impact on the rest of the access
plan. In your case, the work that needs to be done with the result
from your tablescan.

If you don't agree on the choices DB2 makes, did you compare the
estimated cardinality with the real one?

--
Frederik Engelen

Reply With Quote
  #3  
Old   
Michel Esber
 
Posts: n/a

Default Re: Weird access plan behavior - 11-17-2010 , 09:49 AM



Hi Frederick,

Thanks for your response. Let me share parts of the plan for both
queries.

Slow running query:

5) TBSCAN: (Table Scan)
Cumulative Total Cost: 2693.42
Cumulative CPU Cost: 5.06243e+08
Cumulative I/O Cost: 653
Cumulative Re-Total Cost: 117.402
Cumulative Re-CPU Cost: 4.58865e+08
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 25.1504
Estimated Bufferpool Buffers: 653


Output Streams:
--------------
2) To Operator #4

Estimated number of rows:
17567.5
Number of columns: 2
Subquery predicate ID: Not
Applicable

Column Names:
------------
+Q1.SW_NAME+Q1.SW_ID


Fast Query:

6) TBSCAN: (Table Scan)
Cumulative Total Cost: 2673.42
Cumulative CPU Cost: 4.28068e+08
Cumulative I/O Cost: 653
Cumulative Re-Total Cost: 97.4006
Cumulative Re-CPU Cost: 3.80689e+08
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 26.8954
Estimated Bufferpool Buffers: 653



Output Streams:
--------------
2) To Operator #5

Estimated number of rows:
1396.18
Number of columns: 2
Subquery predicate ID: Not
Applicable

Column Names:
------------
+Q1.SW_NAME+Q1.SW_ID


If we look at the CPU Costs, there is an approx. 20% decreased cost in
the Fast Query.
Also, the I/O Cost is exactly the same on both queries.

Having that in mind, why is the total cost for both plans so
different ?

I understand that having less estimated rows makes DB2 choose other
access plans with the other operators, tables and indexes.

But since this TBSCAN has a direct impact on all the access plan,
shouldn't it be more visible for less experienced DBAs that this
operator has direct impact on the total query performance ?

One may read the total cost of the original query (733996) and then
compare it to this TBSCAN cost (2693).
The first direct (and WRONG !!) conclusion is that a DBA should not
focus his attention on this TBSCAN ... but he should !!!


The real cardinality returned by this TBSCAN is approx 250 rows. Very
different from the estimated rows on the good query (~1400) and bad
query (~17000).

Thanks,

-M


Quote:
Hello Michel,

I think you should take a closer look on the estimated cardinality of
the TBSCAN operation. While the cost of the tablescan is practically
identical for both situations, the expected result set is much larger
when you use the UPPER-function. Logical, right? The biggest part of
the work is IO, and with the upper-function you add a little CPU to
the comparison, so the cost is close. In this case, DB2 estimates that
with using UPPER, over 10 times more rows will qualify for your
search, and that definitively has an impact on the rest of the access
plan. In your case, the work that needs to be done with the result
from your tablescan.

If you don't agree on the choices DB2 makes, did you compare the
estimated cardinality with the real one?

Reply With Quote
  #4  
Old   
Frederik Engelen
 
Posts: n/a

Default Re: Weird access plan behavior - 11-17-2010 , 10:44 AM



On Nov 17, 4:49*pm, Michel Esber <smes... (AT) gmail (DOT) com> wrote:
Quote:
Hi Frederick,

Thanks for your response. Let me share parts of the plan for both
queries.

Slow running query:

*5) TBSCAN: (Table Scan)
* * * * * * * * Cumulative Total Cost: * * * * *2693.42
* * * * * * * * Cumulative CPU Cost: * * * * * *5.06243e+08
* * * * * * * * Cumulative I/O Cost: * * * * * *653
* * * * * * * * Cumulative Re-Total Cost: * * * 117..402
* * * * * * * * Cumulative Re-CPU Cost: * * * * 4..58865e+08
* * * * * * * * Cumulative Re-I/O Cost: * * * * 0
* * * * * * * * Cumulative First Row Cost: * * *25.1504
* * * * * * * * Estimated Bufferpool Buffers: * 653

* * * * * * * * Output Streams:
* * * * * * * * --------------
* * * * * * * * * * * * 2) To Operator #4

* * * * * * * * * * * * * * * * Estimatednumber of rows:
17567.5
* * * * * * * * * * * * * * * * Number ofcolumns: * * * * * * *2
* * * * * * * * * * * * * * * * Subquery predicate ID: * * * * *Not
Applicable

* * * * * * * * * * * * * * * * Column Names:
* * * * * * * * * * * * * * * * ------------
* * * * * * * * * * * * * * * * +Q1.SW_NAME+Q1.SW_ID

Fast Query:

*6) TBSCAN: (Table Scan)
* * * * * * * * Cumulative Total Cost: * * * * *2673.42
* * * * * * * * Cumulative CPU Cost: * * * * * *4.28068e+08
* * * * * * * * Cumulative I/O Cost: * * * * * *653
* * * * * * * * Cumulative Re-Total Cost: * * * 97.4006
* * * * * * * * Cumulative Re-CPU Cost: * * * * 3..80689e+08
* * * * * * * * Cumulative Re-I/O Cost: * * * * 0
* * * * * * * * Cumulative First Row Cost: * * *26.8954
* * * * * * * * Estimated Bufferpool Buffers: * 653

* * * * * * * * Output Streams:
* * * * * * * * --------------
* * * * * * * * * * * * 2) To Operator #5

* * * * * * * * * * * * * * * * Estimatednumber of rows:
1396.18
* * * * * * * * * * * * * * * * Number ofcolumns: * * * * * * *2
* * * * * * * * * * * * * * * * Subquery predicate ID: * * * * *Not
Applicable

* * * * * * * * * * * * * * * * Column Names:
* * * * * * * * * * * * * * * * ------------
* * * * * * * * * * * * * * * * +Q1.SW_NAME+Q1.SW_ID

If we look at the CPU Costs, there is an approx. 20% decreased cost in
the Fast Query.
Also, the I/O Cost is exactly the same on both queries.

Having that in mind, why is the total cost for both plans so
different ?

I understand that having less estimated rows makes DB2 choose other
access plans with the other operators, tables and indexes.

But since this TBSCAN has a direct impact on all the access plan,
shouldn't it be more visible for less experienced DBAs that this
operator has direct impact on the total query performance ?

One may read the total cost of the original query (733996) and then
compare it to this TBSCAN cost (2693).
The first direct (and WRONG !!) conclusion is that a DBA should not
focus his attention on this TBSCAN ... but he should !!!

The real cardinality returned by this TBSCAN is approx 250 rows. Very
different from the estimated rows on the good query (~1400) and bad
query (~17000).

Thanks,

-M



Hello Michel,

I think you should take a closer look on the estimated cardinality of
the TBSCAN operation. While the cost of the tablescan is practically
identical for both situations, the expected result set is much larger
when you use the UPPER-function. Logical, right? The biggest part of
the work is IO, and with the upper-function you add a little CPU to
the comparison, so the cost is close. In this case, DB2 estimates that
with using UPPER, over 10 times more rows will qualify for your
search, and that definitively has an impact on the rest of the access
plan. In your case, the work that needs to be done with the result
from your tablescan.

If you don't agree on the choices DB2 makes, did you compare the
estimated cardinality with the real one?
"shouldn't it be more visible for less experienced DBAs that this
operator has direct impact on the total query performance ?"

No, that's what makes the others keep their jobs ;-)

On a more serious note, I think that would be pretty difficult to
achieve for DB2. In this case, you want him to essentially say: 'I
don't like this query because your predicate returns too much results.
If you'd simplify it, it would be easier to execute". It's designed to
execute what you want from it as efficiently as possible, not to make
judgements about it.

In your first example, how could he make the judgement between:
- I'm not too happy with this LIKE-statement, you really want me to
retrieve 17K rows? Pick something more precise.
- Do you really need this table C joined to it? What's wrong with only
A and B?

I'm mocking it a bit, but as far as I understood, that's really the
choice you expect him to make.

--
Frederik Engelen

Reply With Quote
  #5  
Old   
Michel Esber
 
Posts: n/a

Default Re: Weird access plan behavior - 11-17-2010 , 11:39 AM



Hi Fred,

Here is my final question about this

In my bad query:

The use of UPPER(SW_NAME) like '%ABC%' forced a TBSCAN on B
(TBL_ASSET_SW_ID). The other tables (A and C) were IXSCANned using
predicate (A.MACHINE_ID = C.MACHINE_ID), then this result was
HSJOINed with the TBSCAN on B.

In my good query:

The use of SW_NAME like '%ABC%' forced again a TBSCAN on B
(TBL_ASSET_SW_ID). DB2 now chose to IXSCAN table A on field (A.SW_ID),
then NLJOINed both (A and B). It then IXSCANed table C and HSJoined
with the previous result (A and B).

Notice that DB2 chose different indexes on both plans.
Is this due to different expected number of rows while using
UPPER(SW_NAME) like %% and without the use of UPPER () ?

I wonder if DB2 always forced the use of index A.SW_ID a better result
would be achieved, even with the UPPER(SW_NAME) predicate ?

Thanks again,

-M

Reply With Quote
  #6  
Old   
Frederik Engelen
 
Posts: n/a

Default Re: Weird access plan behavior - 11-17-2010 , 01:27 PM



On 17 nov, 18:39, Michel Esber <smes... (AT) gmail (DOT) com> wrote:
Quote:
Hi Fred,

Here is my final question about this

In my bad query:

The use of UPPER(SW_NAME) like '%ABC%' forced a TBSCAN on B
(TBL_ASSET_SW_ID). The other tables (A and C) were IXSCANned using
predicate (A.MACHINE_ID = C.MACHINE_ID), *then this result was
HSJOINed with the TBSCAN on B.

In my good query:

The use of SW_NAME like '%ABC%' forced again a TBSCAN on B
(TBL_ASSET_SW_ID). DB2 now chose to IXSCAN table A on field (A.SW_ID),
then NLJOINed both (A and B). It then IXSCANed table C and HSJoined
with the previous result (A and B).

Notice that DB2 chose different indexes on both plans.
Is this due to different expected number of rows while using
UPPER(SW_NAME) like %% and without the use of UPPER () ?

I wonder if DB2 always forced the use of index A.SW_ID a better result
would be achieved, even with the UPPER(SW_NAME) predicate ?

Thanks again,

-M
Hello Michel,

Don't be bothered by asking this. Everyone has wondered sometimes why
DB2 chooses a specific plan. And most of the time, DB2 is right ;-)

To be sure that we're talking about the same thing, I'll explain how I
think you should see a NLJOIN. In your non-UPPER query, it's not that
DB2 does:
- first a TBSCAN on B
- then a IXSCAN on A
- and then NLJOINs the two results together

In fact, what happens is that he:
- performs a TBSCAN over B
- for each matching records of B (like '%ABC%'), search all
corresponding records in A (A.SW_ID = B.SW_ID) through an IXSCAN

This 'going over all corresponding records' is the 'loop' in NLJOIN.

A hash join is more like throwing everything together and using hash
algorithms to find the corresponding records.

What happens in your good query is that DB2 uses the selectivity of
the LIKE operation to filter out the corresponding records of your 22M
record table A. This selectivity of 250/170K is pretty strong. This
small result is then thrown together with the whole table C in a
HSJOIN.

In your bad version, DB2 joins your A and C together first. That's at
least 22M records that are scanned and probably hashed afterwards and
that's probably what makes your query so slow.

I'm actually surprised that he still executes a hash join in the last
scenario. Do you have an index on C.MACHINE_ID?

--
Frederik Engelen

Reply With Quote
  #7  
Old   
Ian
 
Posts: n/a

Default Re: Weird access plan behavior - 11-17-2010 , 03:08 PM



On Nov 17, 6:28*am, Michel Esber <smes... (AT) gmail (DOT) com> wrote:


Quote:
What am I missing here?
The slowness is almost certainly coming from the hash join (op 4 in
the first query), not the table scan. Are you getting hash join
overflows in the first query, but not the second query?

In the first query, DB2 is estimating the the build table (the result
of the NLJOIN) will have ~980K rows, and the probe table will have
~17.5K rows. In the second query, the build table will have ~8.6K
rows, and the probe will have ~180K rows.

Are the optimizer's estimates accurate?

Reply With Quote
  #8  
Old   
Willem Fischer
 
Posts: n/a

Default Re: Weird access plan behavior - 11-18-2010 , 01:01 AM



On Nov 17, 10:08*pm, Ian <ianb... (AT) mobileaudio (DOT) com> wrote:
Quote:
On Nov 17, 6:28*am, Michel Esber <smes... (AT) gmail (DOT) com> wrote:

What am I missing here?

The slowness is almost certainly coming from the hash join (op 4 in
the first query), not the table scan. *Are you getting hash join
overflows in the first query, but not the second query?

In the first query, DB2 is estimating the the build table (the result
of the NLJOIN) will have ~980K rows, and the probe table will have
~17.5K rows. *In the second query, the build table will have ~8.6K
rows, and the probe will have ~180K rows.

Are the optimizer's estimates accurate?
Michel, I've seen LIKE and UPPER to be very slow on UTF8 databases.
You still do a LIKE in the second query, but I'd still be interested
in the results of CPU usage of

select count(*) from ASSET_SW_ID where upper(SW_NAME) like '%ACROBAT%'

versus

select count(*) from ASSET_SW_ID where SW_NAME like '%ACROBAT%'

Plus, take the execution plans of those two and look at the
cardinality... DB2 makes assumptions on the filter factor of functions
that are the same for all functions, and if that applies here then the
filter factor is 0.04 for the predicate "upper() like '%...
%'" (operator 5 on the first explain plan).

Reply With Quote
  #9  
Old   
Michel Esber
 
Posts: n/a

Default Re: Weird access plan behavior - 11-18-2010 , 07:20 AM



Quote:
In your bad version, DB2 joins your A and C together first. That's at
least 22M records that are scanned and probably hashed afterwards and
that's probably what makes your query so slow.

I'm actually surprised that he still executes a hash join in the last
scenario. Do you have an index on C.MACHINE_ID?
Hi Fred,

Yeah, I understand the problem now. I was hoping that DB2 decided to
joined B and A even with a UPPER(SW_NAME). Anyways ... I hope it is
right .

Yes, I do have indexes on all fields of the query I posted. And
MACHINE_ID is the PK of table C .

Thanks,

Reply With Quote
  #10  
Old   
Michel Esber
 
Posts: n/a

Default Re: Weird access plan behavior - 11-18-2010 , 07:44 AM



Hi Ian,

Quote:
The slowness is almost certainly coming from the hash join (op 4 in
the first query), not the table scan. *Are you getting hash join
overflows in the first query, but not the second query?
Not sure to check for hash join overflows. Anyway, here is the plan
for the hash join on both queries:


BAD:

4) HSJOIN: (Hash Join)
Cumulative Total Cost: 733984
Cumulative CPU Cost: 2.80605e+09
Cumulative I/O Cost: 29959.5
Cumulative Re-Total Cost: 733984
Cumulative Re-CPU Cost: 2.80605e+09
Cumulative Re-I/O Cost: 29959.5
Cumulative First Row Cost: 733984
Estimated Bufferpool Buffers: 653

Arguments:
---------
BITFLTR : (Hash Join Bit Filter used)
FALSE
EARLYOUT: (Early Out flag)
LEFT
HASHCODE: (Hash Code Size)
24 BIT
HASHTBSZ: (Number of hash table entries)
137759
TEMPSIZE: (Temporary Table Page Size)
8192
TUPBLKSZ: (Tuple Block Size (bytes))
16000

Predicates:
----------
2) Predicate used in Join
Comparison Operator: Equal (=)
Subquery Input Required: No
Filter Factor: 5.69233e-06

Predicate Text:
--------------
(Q2.SW_ID = Q1.SW_ID)


Input Streams:
-------------
2) From Operator #5

Estimated number of rows:
17567.5
Number of columns: 2
Subquery predicate ID: Not
Applicable

Column Names:
------------
+Q1.SW_NAME+Q1.SW_ID

7) From Operator #6

Estimated number of rows: 980205
Number of columns: 2
Subquery predicate ID: Not
Applicable

Column Names:
------------
+Q2.SW_ID+Q3.MACHINE_ID


Output Streams:
--------------
8) To Operator #3

Estimated number of rows:
17567.5
Number of columns: 3
Subquery predicate ID: Not
Applicable

Column Names:
------------
+Q1.SW_NAME+Q1.SW_ID+Q2.SW_ID


GOOD:

4) HSJOIN: (Hash Join)
Cumulative Total Cost: 109690
Cumulative CPU Cost: 8.77798e+08
Cumulative I/O Cost: 5007.48
Cumulative Re-Total Cost: 109690
Cumulative Re-CPU Cost: 8.77798e+08
Cumulative Re-I/O Cost: 5007.48
Cumulative First Row Cost: 109690
Estimated Bufferpool Buffers: 4862.23

Arguments:
---------
BITFLTR : (Hash Join Bit Filter used)
FALSE
EARLYOUT: (Early Out flag)
LEFT
HASHCODE: (Hash Code Size)
24 BIT
HASHTBSZ: (Number of hash table entries)
8676
TEMPSIZE: (Temporary Table Page Size)
8192
TUPBLKSZ: (Tuple Block Size (bytes))
16000

Predicates:
----------
3) Predicate used in Join
Comparison Operator: Equal (=)
Subquery Input Required: No
Filter Factor: 4.99538e-06

Predicate Text:
--------------
(Q2.MACHINE_ID = Q3.MACHINE_ID)


Input Streams:
-------------
5) From Operator #5

Estimated number of rows: 179747
Number of columns: 4
Subquery predicate ID: Not
Applicable

Column Names:
------------
+Q1.SW_NAME+Q1.SW_ID+Q2.MACHINE_ID
+Q2.SW_ID

7) From Operator #8

Estimated number of rows: 8676
Number of columns: 1
Subquery predicate ID: Not
Applicable

Column Names:
------------
+Q3.MACHINE_ID(D)


Output Streams:
--------------
8) To Operator #3

Estimated number of rows:
7790.22
Number of columns: 4
Subquery predicate ID: Not
Applicable

Column Names:
------------
+Q1.SW_NAME+Q2.MACHINE_ID+Q2.SW_ID
+Q3.MACHINE_ID



Quote:
In the first query, DB2 is estimating the the build table (the result
of the NLJOIN) will have ~980K rows, and the probe table will have
~17.5K rows. *In the second query, the build table will have ~8.6K
rows, and the probe will have ~180K rows.

Are the optimizer's estimates accurate?
Table C has 8633 rows that match that criteria. The optimizer is
estimating 8676 in the bad plan and good plans

Table B has in fact 250 rows (either containing ACROBAT, Acrobat or
whatever). The optimizer is estimating 17k rows (bad plan) and 1400
rows (good plan). Even in the good plan, this is still far from the
real number.

Table A (TBL_ASSET_SW) has 974k rows. I am following this query:

select count(*) from ASSET.TBL_ASSET_SW S , ASSET.TBL_ASSET_MACHINE_ID
M where CUSTOMER_ID='CC0D6761CD31F72EA7FCC686437DDA98' and
S.MACHINE_ID = M.MACHINE_ID

Weird enough, DB2 is estimating 112.979 for the bad plan (uses a PK
index on MACHINE_ID) and 128.742 for the good plan (uses a different
index on SW_ID).

Any thought about the optimizer's estimates?

Last statistics were collected in the beginning of this month, and the
table size does not change a lot.

Thanks,

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.