dbTalk Databases Forums  

Db2Advis, index suggestion and Optimizer

comp.databases.ibm-db2 comp.databases.ibm-db2


Discuss Db2Advis, index suggestion and Optimizer in the comp.databases.ibm-db2 forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Michel Esber
 
Posts: n/a

Default Db2Advis, index suggestion and Optimizer - 09-28-2010 , 04:13 PM






Hi,

DB2 LUW 9.5 FP5,

Consider the following DDL:


CREATE TABLE ASSET.TBL_ASSET_NETWORK_HISTORY (
MACHINE_ID VARCHAR(24) NOT NULL ,
LOCALTIME TIMESTAMP NOT NULL ,
OPERATION VARCHAR(1) NOT NULL ,
OBJECT VARCHAR(64) NOT NULL ,
VALUE VARCHAR(255) NOT NULL ,
COLLECT_TIME TIMESTAMP NOT NULL WITH DEFAULT CURRENT
TIMESTAMP ,
NOTIFICATION_ENABLED CHAR(1) NOT NULL WITH DEFAULT
'N' ,
IS_NOTIFIED CHAR(1) NOT NULL WITH DEFAULT 'N' ,
LOGIN_EVENT_ANALYSED CHAR(1)

);

ALTER TABLE ASSET .TBL_ASSET_NETWORK_HISTORY
ADD PRIMARY KEY
(MACHINE_ID,
LOCALTIME,
OPERATION,
OBJECT,
VALUE,
COLLECT_TIME);

CREATE INDEX ASSET .NTRYIDX1 ON ASSET .TBL_ASSET_NETWORK_HISTORY
(MACHINE_ID ASC,
OBJECT ASC,
LOGIN_EVENT_ANALYSED ASC,
OPERATION ASC,
LOCALTIME ASC,
VALUE ASC)
ALLOW REVERSE SCANS;


So the table has two indexes: the one created for the primary key and
NTRYIDX1.

I have a long running query:

select a.MACHINE_ID, a.VALUE LOGIN_NAME, a.LOCALTIME EVENT_TIME,
a.OPERATION
from ASSET.TBL_ASSET_NETWORK_HISTORY a inner join
ASSET.TBL_ASSET_LAST_TIME b on (a.machine_id = b.machine_id)
where a.OBJECT='LOGIN_NAME' and a.LOGIN_EVENT_ANALYSED='N' and
b.MACHINE_LOCALTIME between current timestamp - 3 months and current
timestamp
order by MACHINE_ID, EVENT_TIME, LOGIN_NAME, OPERATION
with ur ;


Running db2advis on it, here is the recommended index:

[96.05%] improvement

CREATE INDEX DB2INST1.IDX009281912100000 ON
ASSET.TBL_ASSET_NETWORK_HISTORY
(MACHINE_ID ASC, OBJECT ASC, LOGIN_EVENT_ANALYSED ASC, LOCALTIME ASC,
VALUE ASC, OPERATION ASC)
ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;


Well, index NTRYIDX1 has exactly all the required columns of the
suggested index, but not exactly in the same order.

What am I missing here? SHouldn't the db2 optimizer choose the
existing index?
Why do I need to create an index that is virtually the same as the
existing one?

Thanks,

-M

Reply With Quote
  #2  
Old   
Frederik Engelen
 
Posts: n/a

Default Re: Db2Advis, index suggestion and Optimizer - 09-29-2010 , 03:58 AM






On Sep 28, 11:13*pm, Michel Esber <mic... (AT) automatos (DOT) com> wrote:
Quote:
Hi,

DB2 LUW 9.5 FP5,

Consider the following DDL:

CREATE TABLE ASSET.TBL_ASSET_NETWORK_HISTORY *(
* * * * * * * * * MACHINE_ID VARCHAR(24) NOT NULL ,
* * * * * * * * * LOCALTIME TIMESTAMP NOT NULL ,
* * * * * * * * * OPERATION VARCHAR(1) NOT NULL ,
* * * * * * * * * OBJECT VARCHAR(64) NOT NULL ,
* * * * * * * * * VALUE VARCHAR(255) NOT NULL ,
* * * * * * * * * COLLECT_TIME TIMESTAMP NOT NULL WITH DEFAULT CURRENT
TIMESTAMP ,
* * * * * * * * * NOTIFICATION_ENABLED CHAR(1) NOT NULLWITH DEFAULT
'N' ,
* * * * * * * * * IS_NOTIFIED CHAR(1) NOT NULL WITH DEFAULT 'N' ,
* * * * * * * * * LOGIN_EVENT_ANALYSED CHAR(1)

* *);

ALTER TABLE ASSET * .TBL_ASSET_NETWORK_HISTORY
* * * * ADD PRIMARY KEY
* * * * * * * * (MACHINE_ID,
* * * * * * * * *LOCALTIME,
* * * * * * * * *OPERATION,
* * * * * * * * *OBJECT,
* * * * * * * * *VALUE,
* * * * * * * * *COLLECT_TIME);

CREATE INDEX ASSET * .NTRYIDX1 ON ASSET * .TBL_ASSET_NETWORK_HISTORY
* * * * * * * * (MACHINE_ID ASC,
* * * * * * * * *OBJECT ASC,
* * * * * * * * *LOGIN_EVENT_ANALYSED ASC,
* * * * * * * * *OPERATION ASC,
* * * * * * * * *LOCALTIME ASC,
* * * * * * * * *VALUE ASC)
* * * * * * * * ALLOW REVERSE SCANS;

So the table has two indexes: the one created for the primary key and
NTRYIDX1.

I have a long running query:

select *a.MACHINE_ID, a.VALUE LOGIN_NAME, a.LOCALTIME EVENT_TIME,
a.OPERATION
from *ASSET.TBL_ASSET_NETWORK_HISTORY *a inner join
ASSET.TBL_ASSET_LAST_TIME b on (a.machine_id = b.machine_id)
where *a.OBJECT='LOGIN_NAME' and a.LOGIN_EVENT_ANALYSED='N' and
b.MACHINE_LOCALTIME between current timestamp - 3 months and current
timestamp
order by MACHINE_ID, EVENT_TIME, LOGIN_NAME, OPERATION
with ur ;

Running db2advis on it, here is the recommended index:

[96.05%] improvement

CREATE INDEX DB2INST1.IDX009281912100000 ON
ASSET.TBL_ASSET_NETWORK_HISTORY
(MACHINE_ID ASC, OBJECT ASC, LOGIN_EVENT_ANALYSED ASC, LOCALTIME ASC,
VALUE ASC, OPERATION ASC)
ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;

Well, index NTRYIDX1 has exactly all the required columns of the
suggested index, but not exactly in the same order.

What am I missing here? SHouldn't the db2 optimizer choose the
existing index?
Why do I need to create an index that is virtually the same as the
existing one?

Thanks,

-M
Hello Michel,

The order of the columns in an index is important, since the sorting
(and the lookup) within the tree structure of an index happens on this
specific order.

Let me try to explain this with an example, you'll quickly understand
why:
table person (
first_name varchar,
last_name varchar,
month int,
year int
)

Let's assume this query: select * from person where month = 1 and year
= 1980

An index on (month, year) or (year, month) will perfectly address this
query.

Now, suppose that you have an index on (year, month, last_name), DB2
will still be able to navigate this index to the exact month-year
combination that you're looking for (first year, then month). The
index will be a bit bigger, so you'll have more IO, but this index is
still very suitable for this query.

An index on (year, first_name, month), however, will not be optimal.
DB2 will be able to navigate to the correct year, but after that, he's
stuck since all first_name values might be valid for you query. From
this point on, he must read all child values (all first_name/month
combinations from 1980) to find what he needs. Note that he is able to
limit the scan to all records from this specific year, so there is
still some advantage, but you'll understand that he easily must read
12 times as much records as he would've otherwise.

In your case, OPERATION is the 'blocking' column. Db2advis put all the
columns with a prerequisite at the front of the index declaration to
quickly traverse the index tree to the exact records you need. The
columns after that are used to further optimize IO. Since they are
part of the index, DB2 can read the columns LOCALTIME, VALUE and
OPERATION directly from the index, without having to do go to the
table.

I hope this all makes some sense to you.

--
Frederik Engelen

Reply With Quote
  #3  
Old   
Michel Esber
 
Posts: n/a

Default Re: Db2Advis, index suggestion and Optimizer - 09-29-2010 , 08:34 AM



Quote:
Hello Michel,

The order of the columns in an index is important, since the sorting
(and the lookup) within the tree structure of an index happens on this
specific order.

Let me try to explain this with an example, you'll quickly understand
why:
table person (
* * * * first_name varchar,
* * * * last_name varchar,
* * * * month int,
* * * * year int
)

Let's assume this query: select * from person where month = 1 and year
= 1980

An index on (month, year) or (year, month) will perfectly address this
query.

Now, suppose that you have an index on (year, month, last_name), DB2
will still be able to navigate this index to the exact month-year
combination that you're looking for (first year, then month). The
index will be a bit bigger, so you'll have more IO, but this index is
still very suitable for this query.

An index on (year, first_name, month), however, will not be optimal.
DB2 will be able to navigate to the correct year, but after that, he's
stuck since all first_name values might be valid for you query. From
this point on, he must read all child values (all first_name/month
combinations from 1980) to find what he needs. Note that he is able to
limit the scan to all records from this specific year, so there is
still some advantage, but you'll understand that he easily must read
12 times as much records as he would've otherwise.

In your case, OPERATION is the 'blocking' column. Db2advis put all the
columns with a prerequisite at the front of the index declaration to
quickly traverse the index tree to the exact records you need. The
columns after that are used to further optimize IO. Since they are
part of the index, DB2 can read the columns LOCALTIME, VALUE and
OPERATION directly from the index, without having to do go to the
table.

I hope this all makes some sense to you.

--
Frederik Engelen
Hi Frederik,

Thanks for your detailed explanation. So, let me see if I got this
right:

According to the long running query, I have the following relevant
predicates:

where
a.OBJECT='LOGIN_NAME' and
a.LOGIN_EVENT_ANALYSED='N' and

The field MACHINE_ID is used in the inner join with other tables. The
index:

CREATE INDEX ASSET .NTRYIDX1 ON ASSET .TBL_ASSET_NETWORK_HISTORY
(MACHINE_ID ASC,
OBJECT ASC,
LOGIN_EVENT_ANALYSED ASC,
OPERATION ASC,
LOCALTIME ASC,
VALUE ASC)
ALLOW REVERSE SCANS;

Contains the first 3 columns that needs to be traversed to match the
predicates.
The fields Operation, LOCALTIME, VALUE are used into the select column
list, and are not used as predicates.

SO it makes sense to traverse and index with these 3 initial columns
(MACHINE_ID, OBJECT, LOGIN_EVENT_ANALYSED).

What db2advis suggested is to create and index with the same exact
initial columns (the ones that will be used to traverse the index
tree),
but with the remaining columns (that will be used only as select
column list for output) in a slightly different order.

As far as I understand, the issue here is not to traverse the index
tree itself. Right ?

Couldn't the optimizer use index NTRYIDX1, traverse MACHINE_ID, OBJECT
and LOGIN_EVENT_ANALYSED, fetch the fields (OPERATION, LOCALTIME,
VALUE)
and then organize the selected columns in a different order ??

What am I missing here?

Reply With Quote
  #4  
Old   
Frederik Engelen
 
Posts: n/a

Default Re: Db2Advis, index suggestion and Optimizer - 09-29-2010 , 09:48 AM



On Sep 29, 3:34*pm, Michel Esber <mic... (AT) automatos (DOT) com> wrote:
Quote:
Hello Michel,

The order of the columns in an index is important, since the sorting
(and the lookup) within the tree structure of an index happens on this
specific order.

Let me try to explain this with an example, you'll quickly understand
why:
table person (
* * * * first_name varchar,
* * * * last_name varchar,
* * * * month int,
* * * * year int
)

Let's assume this query: select * from person where month = 1 and year
= 1980

An index on (month, year) or (year, month) will perfectly address this
query.

Now, suppose that you have an index on (year, month, last_name), DB2
will still be able to navigate this index to the exact month-year
combination that you're looking for (first year, then month). The
index will be a bit bigger, so you'll have more IO, but this index is
still very suitable for this query.

An index on (year, first_name, month), however, will not be optimal.
DB2 will be able to navigate to the correct year, but after that, he's
stuck since all first_name values might be valid for you query. From
this point on, he must read all child values (all first_name/month
combinations from 1980) to find what he needs. Note that he is able to
limit the scan to all records from this specific year, so there is
still some advantage, but you'll understand that he easily must read
12 times as much records as he would've otherwise.

In your case, OPERATION is the 'blocking' column. Db2advis put all the
columns with a prerequisite at the front of the index declaration to
quickly traverse the index tree to the exact records you need. The
columns after that are used to further optimize IO. Since they are
part of the index, DB2 can read the columns LOCALTIME, VALUE and
OPERATION directly from the index, without having to do go to the
table.

I hope this all makes some sense to you.

--
Frederik Engelen

Hi Frederik,

Thanks for your detailed explanation. So, let me see if I got this
right:

According to the long running query, I have the following relevant
predicates:

where
a.OBJECT='LOGIN_NAME' and
a.LOGIN_EVENT_ANALYSED='N' and

The field MACHINE_ID is used in the inner join with other tables. The
index:

CREATE INDEX ASSET * .NTRYIDX1 ON ASSET * .TBL_ASSET_NETWORK_HISTORY
* * * * * * * * (MACHINE_ID ASC,
* * * * * * * * *OBJECT ASC,
* * * * * * * * *LOGIN_EVENT_ANALYSED ASC,
* * * * * * * * *OPERATION ASC,
* * * * * * * * *LOCALTIME ASC,
* * * * * * * * *VALUE ASC)
* * * * * * * * ALLOW REVERSE SCANS;

Contains the first 3 columns that needs to be traversed to match the
predicates.
The fields Operation, LOCALTIME, VALUE are used into the select column
list, and are not used as predicates.

SO it makes sense to traverse and index with these 3 initial columns
(MACHINE_ID, OBJECT, LOGIN_EVENT_ANALYSED).

What db2advis suggested is to create and index with the same exact
initial columns (the ones that will be used to traverse the index
tree),
but with the remaining columns (that will be used only as select
column list for output) in a slightly different order.

As far as I understand, the issue here is not to traverse the index
tree itself. *Right ?

Couldn't the optimizer use index NTRYIDX1, traverse MACHINE_ID, OBJECT
and LOGIN_EVENT_ANALYSED, fetch the fields (OPERATION, LOCALTIME,
VALUE)
and then organize the selected columns in a different order ??

What am I missing here?
Hello Michel,

My apologies, I must've been confused (it was still pretty early
here). You already adressed the things I mentioned in the NTRYIDX1
index.

If I were you, I'd create this index and check the difference in
access plans before and after. Maybe that will give a clue.

--
Frederik

Reply With Quote
  #5  
Old   
Ian
 
Posts: n/a

Default Re: Db2Advis, index suggestion and Optimizer - 09-29-2010 , 03:52 PM



On Sep 29, 6:34*am, Michel Esber <mic... (AT) automatos (DOT) com> wrote:

Quote:
What am I missing here?
Remember that an index can allow the optimizer to avoid a sort.
So the ORDER BY clause has an effect on what db2advis will
suggest.

Your query's ORDER BY is:

MACHINE_ID
EVENT_TIME (i.e., LOCALTIME)
LOGIN_NAME (i.e., VALUE)
OPERATION

When reading scanning the suggested index, the rows will be returned
in sorted order.

With only the existing NTRYIDX1, you will likely see an additional
SORT
and TBSCAN operator, which will be required to sort the data into the
order you asked for.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.