dbTalk Databases Forums  

Question about suggested index

comp.databases.ibm-db2 comp.databases.ibm-db2


Discuss Question about suggested index in the comp.databases.ibm-db2 forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Michel Esber
 
Posts: n/a

Default Question about suggested index - 05-04-2011 , 09:42 AM






Table T
(
MACHINE_ID varchar(24) not null,
COLLECT_TIME TIMESTAMP not null,
DISK_ID varchar(60) not null,
BYTES bigint not null,
DISKIO integer not null
);

alter table T add constraint PK primary key (MACHINE_ID, COLLECT_TIME,
DISK_ID).
There is not other defined index. This table has updated statistics
and contains 800M+ rows.

The following query runs for 11 minutes:

select MACHINE_ID, COLLECT_TIME, DISK_ID, BYTES, TMACT,TPS
from bi.t_minute_diskio
where machine_id = '002481D161ABB73C41BA6302'
and collect_time between '2011-04-01 00:00:00.0' and '2011-04-30
23:59:59.0'
and disk_id = 'hdisk1'
and TMACT between 0 and 60

The plan shows an IXSCAN on the existing index created for the PK.
db2advis recommends:

CREATE INDEX DB2INST1.IDX1105041205410 ON T
(DISK_ID ASC, TMACT DESC)
ALLOW REVERSE SCANS
COLLECT SAMPLED DETAILED STATISTICS;

This index would reduce the query cost in 98%. Index size is 6GB in
space, so I can't easily create it.

The query uses MACHINE_ID, COLLECT_TIME, DISK_ID and TMACT as
predicates. The suggested index contains (DISK_ID, TMACT).

Will DB2 try to combine both indexes here?
MACHINE_ID is our main identifier in the system and the suggested
index will not even use it.
Is there any way to simulate the final access plan without creating
the index?

Thanks,
-M

Reply With Quote
  #2  
Old   
Bruno Almeida
 
Posts: n/a

Default Re: Question about suggested index - 05-04-2011 , 02:36 PM






On May 4, 11:42*am, Michel Esber <smes... (AT) gmail (DOT) com> wrote:
Quote:
Table T
(
MACHINE_ID varchar(24) not null,
COLLECT_TIME TIMESTAMP not null,
DISK_ID *varchar(60) not null,
BYTES bigint not null,
DISKIO integer not null
);

alter table T add constraint PK primary key (MACHINE_ID, COLLECT_TIME,
DISK_ID).
There is not other defined index. This table has updated statistics
and contains 800M+ rows.

The following query runs for 11 minutes:

select MACHINE_ID, COLLECT_TIME, DISK_ID, BYTES, TMACT,TPS
from bi.t_minute_diskio
where machine_id = '002481D161ABB73C41BA6302'
* * * * * and collect_time between '2011-04-01 00:00:00.0' and '2011-04-30
23:59:59.0'
* * * * * and disk_id = 'hdisk1'
* * * and TMACT between 0 and 60

The plan shows an IXSCAN on the existing index created for the PK.
db2advis recommends:

* *CREATE INDEX DB2INST1.IDX1105041205410 ON T
* * * * * (DISK_ID ASC, TMACT DESC)
* *ALLOW REVERSE SCANS
* *COLLECT SAMPLED DETAILED STATISTICS;

This index would reduce the query cost in 98%. *Index size is 6GB in
space, so I can't easily create it.

The query uses MACHINE_ID, COLLECT_TIME, DISK_ID and TMACT as
predicates. The suggested index contains (DISK_ID, TMACT).

Will DB2 try to combine both indexes here?
MACHINE_ID is our main identifier in the system and the suggested
index will not even use it.
Is there any way to simulate the final access plan without creating
the index?

Thanks,
-M
DB2 LUW V9.5 Fixpack 5.

Reply With Quote
  #3  
Old   
Lennart Jonsson
 
Posts: n/a

Default Re: Question about suggested index - 05-04-2011 , 11:45 PM



On 2011-05-04 16:42, Michel Esber wrote:
Quote:
Table T
(
MACHINE_ID varchar(24) not null,
COLLECT_TIME TIMESTAMP not null,
DISK_ID varchar(60) not null,
BYTES bigint not null,
DISKIO integer not null
);

alter table T add constraint PK primary key (MACHINE_ID, COLLECT_TIME,
DISK_ID).
There is not other defined index. This table has updated statistics
and contains 800M+ rows.

The following query runs for 11 minutes:

select MACHINE_ID, COLLECT_TIME, DISK_ID, BYTES, TMACT,TPS
from bi.t_minute_diskio
where machine_id = '002481D161ABB73C41BA6302'
and collect_time between '2011-04-01 00:00:00.0' and '2011-04-30
23:59:59.0'
and disk_id = 'hdisk1'
and TMACT between 0 and 60

The plan shows an IXSCAN on the existing index created for the PK.
db2advis recommends:

CREATE INDEX DB2INST1.IDX1105041205410 ON T
(DISK_ID ASC, TMACT DESC)
ALLOW REVERSE SCANS
COLLECT SAMPLED DETAILED STATISTICS;

This index would reduce the query cost in 98%. Index size is 6GB in
space, so I can't easily create it.

The query uses MACHINE_ID, COLLECT_TIME, DISK_ID and TMACT as
predicates. The suggested index contains (DISK_ID, TMACT).

Will DB2 try to combine both indexes here?
MACHINE_ID is our main identifier in the system and the suggested
index will not even use it.
Just some thinking, what is firstkeycard for the existing index? I.e.
how many distinct machine_id is there? If the number is low the
selectivity for machine_id = '002481D161ABB73C41BA6302' will be poor.

What is:

select count(distinct DISK_ID) from bi.t_minute_diskio?

Given your sample data I guess it is very low, but for some reason it is
preferred over machine_id. If it against odds is much higher than
firstkeycard it is a better candidate than MACHINE_ID for beeing first
column in the index (for this query). If you create an explicit index
before defining the primary key you can choose another order of columns.
If you create an explicit indexes instead, you will also have the
opportunity to specify other physical properties of the index, it is
usually a good idea to do so.

Are you using parameter markers for collect_time in your query, or are
they hard-coded as shown in your query? The between predicate is tricky
when it comes to parameter markers, adding a selectivity clause for that
predicate in the query might help.


Quote:
Is there any way to simulate the final access plan without creating
the index?

Not that I know of.

/Lennart

Reply With Quote
  #4  
Old   
joklassen
 
Posts: n/a

Default Re: Question about suggested index - 05-05-2011 , 03:04 AM



On May 4, 4:42*pm, Michel Esber <smes... (AT) gmail (DOT) com> wrote:
Quote:
Table T
(
MACHINE_ID varchar(24) not null,
COLLECT_TIME TIMESTAMP not null,
DISK_ID *varchar(60) not null,
BYTES bigint not null,
DISKIO integer not null
);

alter table T add constraint PK primary key (MACHINE_ID, COLLECT_TIME,
DISK_ID).
There is not other defined index. This table has updated statistics
and contains 800M+ rows.

The following query runs for 11 minutes:

select MACHINE_ID, COLLECT_TIME, DISK_ID, BYTES, TMACT,TPS
from bi.t_minute_diskio
where machine_id = '002481D161ABB73C41BA6302'
* * * * * and collect_time between '2011-04-01 00:00:00.0' and '2011-04-30
23:59:59.0'
* * * * * and disk_id = 'hdisk1'
* * * and TMACT between 0 and 60

The plan shows an IXSCAN on the existing index created for the PK.
db2advis recommends:

* *CREATE INDEX DB2INST1.IDX1105041205410 ON T
* * * * * (DISK_ID ASC, TMACT DESC)
* *ALLOW REVERSE SCANS
* *COLLECT SAMPLED DETAILED STATISTICS;

This index would reduce the query cost in 98%. *Index size is 6GB in
space, so I can't easily create it.

The query uses MACHINE_ID, COLLECT_TIME, DISK_ID and TMACT as
predicates. The suggested index contains (DISK_ID, TMACT).

Will DB2 try to combine both indexes here?
MACHINE_ID is our main identifier in the system and the suggested
index will not even use it.
Is there any way to simulate the final access plan without creating
the index?

Thanks,
-M

Is there any way to simulate the final access plan without creating
the index?
Try SET CURRENT EXPLAIN MODE EVALUATE INDEXES
http://publib.boulder.ibm.com/infoce.../r0001001.html

Reply With Quote
  #5  
Old   
Michel Esber
 
Posts: n/a

Default Re: Question about suggested index - 05-05-2011 , 01:02 PM



Lennart,

Quote:
Just some thinking, what is firstkeycard for the existing index? I.e.
how many distinct machine_id is there? If the number is low the
selectivity for machine_id = '002481D161ABB73C41BA6302' will be poor.

What is:

* * select count(distinct DISK_ID) from bi.t_minute_diskio?


FirstKeyCard for the existing index is around 1800.
The count of distinct DISK_IDs is aroung 6000.

Now I understand why Db2 suggests DISK_ID as a first column.


Quote:
Are you using parameter markers for collect_time in your query, or are
they hard-coded as shown in your query? The between predicate is tricky
when it comes to parameter markers, adding a selectivity clause for that
predicate in the query might help.
Yes, we are using parameter markers. I am not sure what you mean by
adding a selectivity clause. Show I not use parameter markers in this
case?


joklassen,
I did not know about that option in set current explain. That was
exactly what I needed.

Thanks guys,

-M


I will create the suggested index in production and let you know the
results.

Reply With Quote
  #6  
Old   
Lennart Jonsson
 
Posts: n/a

Default Re: Question about suggested index - 05-05-2011 , 04:08 PM



On 2011-05-05 20:02, Michel Esber wrote:
[...]
Quote:
Are you using parameter markers for collect_time in your query, or are
they hard-coded as shown in your query? The between predicate is tricky
when it comes to parameter markers, adding a selectivity clause for that
predicate in the query might help.

Yes, we are using parameter markers. I am not sure what you mean by
adding a selectivity clause. Show I not use parameter markers in this
case?

As usual it depends :-) The query is probably cheap to compile, and if
you do not run it very frequently or don't vary collect_time a lot the
overhead for the package cache will be low. If on the other hand you run
the query frequently with different values all the time, the cost for
maintaining the package cache will increase.

In a predicate like:

collect_time between ? and ?

db2 has to guess the selectivity (I've heard the number 0.04 somewhere).
In a predicate like:

collect_time between '2011-04-01 00:00:00.0'
and '2011-04-30 23:59:59.0'

db2 will have a much better chance estimating the selectivity


A short example:

create table T (x date not null, y int not null);

insert into t
with tt (d,i) as (
values (current_date - 20 years,
mod(cast(current_date - 20 years as int), 1000))
union all
select d+1 day, mod(cast(d+1 day as int),1000)
from tt where d < current_date + 20 years
) select d,i from tt;

create index x1 on t (x) allow reverse scans collect sampled detailed
statistics;

create index x2 on t (y) allow reverse scans collect sampled detailed
statistics;

explain plan for select * from t where y = ? and x between ? and ?;

Rows
RETURN
( 1)
Cost
I/O
Quote:
58.444
NLJOIN
( 2)
43.2767
5.6
/------+-------\
1 58.444
TBSCAN FETCH
( 3) ( 4)
0.000240187 43.2765
0 5.6
Quote:
/---+----\
1 584.44 14611
TABFNC: SYSIBM IXSCAN TABLE: DB2INST1
GENROW ( 5) T
Q1 10.7378 Q3
1.36
Quote:
14611
INDEX: DB2INST1
X2
Q3



explain plan for select * from t where y = ? and x between '2011-01-01'
and '2011-01-01'

Rows
RETURN
( 1)
Cost
I/O
Quote:
23.3776
FETCH
( 2)
37.3497
4.8
/---+----\
23.3776 14611
RIDSCN TABLE: DB2INST1
( 3) T
29.7624 Q1
3.8
Quote:
23.3776
SORT
( 4)
29.7619
3.8
Quote:
23.3776
IXAND
( 5)
29.7511
3.8
/-----+------\
584.44 584.44
IXSCAN IXSCAN
( 6) ( 7)
18.9107 10.7378
2.44 1.36
Quote:
|
14611 14611
INDEX: DB2INST1 INDEX: DB2INST1
X1 X2
Q1 Q1


Here db2 realises that it can benefit from using X1


When it comes to the selectivity clause I was wrong, it does not work
with between. There's an old article (2003):

http://www.ibm.com/developerworks/da...ps/dm-0312yip/

discussing the selectivity clause.


/Lennart

Reply With Quote
  #7  
Old   
joklassen
 
Posts: n/a

Default Re: Question about suggested index - 05-06-2011 , 01:39 AM



On May 5, 8:02*pm, Michel Esber <smes... (AT) gmail (DOT) com> wrote:
Quote:
Lennart,

Just some thinking, what is firstkeycard for the existing index? I.e.
how many distinct machine_id is there? If the number is low the
selectivity for machine_id = '002481D161ABB73C41BA6302' will be poor.

What is:

* * select count(distinct DISK_ID) from bi.t_minute_diskio?

FirstKeyCard for the existing index is around 1800.
The count of distinct DISK_IDs is aroung 6000.

Now I understand why Db2 suggests DISK_ID as a first column.

Are you using parameter markers for collect_time in your query, or are
they hard-coded as shown in your query? The between predicate is tricky
when it comes to parameter markers, adding a selectivity clause for that
predicate in the query might help.

Yes, we are using parameter markers. I am not sure what you mean by
adding a selectivity clause. Show I not use parameter markers in this
case?

joklassen,
I did not know about that option in set current explain. That was
exactly what I needed.

Thanks guys,

-M

I will create the suggested index in production and let you know the
results.
When it comes to deal with parameter markers you might also want to
take a look at the REOPT feature:
http://publib.boulder.ibm.com/infoce.../c0055082.html

Reply With Quote
  #8  
Old   
Ian
 
Posts: n/a

Default Re: Question about suggested index - 05-07-2011 , 04:45 AM



On May 5, 11:02*am, Michel Esber <smes... (AT) gmail (DOT) com> wrote:
Quote:
Lennart,

Just some thinking, what is firstkeycard for the existing index? I.e.
how many distinct machine_id is there? If the number is low the
selectivity for machine_id = '002481D161ABB73C41BA6302' will be poor.

What is:

* * select count(distinct DISK_ID) from bi.t_minute_diskio?

FirstKeyCard for the existing index is around 1800.
The count of distinct DISK_IDs is aroung 6000.

Now I understand why Db2 suggests DISK_ID as a first column.

Are you using parameter markers for collect_time in your query, or are
they hard-coded as shown in your query? The between predicate is tricky
when it comes to parameter markers, adding a selectivity clause for that
predicate in the query might help.

Yes, we are using parameter markers. I am not sure what you mean by
adding a selectivity clause. Show I not use parameter markers in this
case?

joklassen,
I did not know about that option in set current explain. That was
exactly what I needed.

Thanks guys,

-M

I will create the suggested index in production and let you know the
results.
Michel,

I suggest that you consider changing the order of the primary key
columns to be MACHINE_ID, DISK_ID, COLLECT_TIME. I suspect that
this would greatly increase the speed of the query AND eliminate
the suggestion for this second index.

What percentage of the rows qualify for the predicate on TMACT?
If it's sufficiently low, you *might* consider adding it as an
include column in your Primary Key.


Good luck,

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.