![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Hi, DB2 LUW 9.5 FP5, Consider the following DDL: CREATE TABLE ASSET.TBL_ASSET_NETWORK_HISTORY *( * * * * * * * * * MACHINE_ID VARCHAR(24) NOT NULL , * * * * * * * * * LOCALTIME TIMESTAMP NOT NULL , * * * * * * * * * OPERATION VARCHAR(1) NOT NULL , * * * * * * * * * OBJECT VARCHAR(64) NOT NULL , * * * * * * * * * VALUE VARCHAR(255) NOT NULL , * * * * * * * * * COLLECT_TIME TIMESTAMP NOT NULL WITH DEFAULT CURRENT TIMESTAMP , * * * * * * * * * NOTIFICATION_ENABLED CHAR(1) NOT NULLWITH DEFAULT 'N' , * * * * * * * * * IS_NOTIFIED CHAR(1) NOT NULL WITH DEFAULT 'N' , * * * * * * * * * LOGIN_EVENT_ANALYSED CHAR(1) * *); ALTER TABLE ASSET * .TBL_ASSET_NETWORK_HISTORY * * * * ADD PRIMARY KEY * * * * * * * * (MACHINE_ID, * * * * * * * * *LOCALTIME, * * * * * * * * *OPERATION, * * * * * * * * *OBJECT, * * * * * * * * *VALUE, * * * * * * * * *COLLECT_TIME); CREATE INDEX ASSET * .NTRYIDX1 ON ASSET * .TBL_ASSET_NETWORK_HISTORY * * * * * * * * (MACHINE_ID ASC, * * * * * * * * *OBJECT ASC, * * * * * * * * *LOGIN_EVENT_ANALYSED ASC, * * * * * * * * *OPERATION ASC, * * * * * * * * *LOCALTIME ASC, * * * * * * * * *VALUE ASC) * * * * * * * * ALLOW REVERSE SCANS; So the table has two indexes: the one created for the primary key and NTRYIDX1. I have a long running query: select *a.MACHINE_ID, a.VALUE LOGIN_NAME, a.LOCALTIME EVENT_TIME, a.OPERATION from *ASSET.TBL_ASSET_NETWORK_HISTORY *a inner join ASSET.TBL_ASSET_LAST_TIME b on (a.machine_id = b.machine_id) where *a.OBJECT='LOGIN_NAME' and a.LOGIN_EVENT_ANALYSED='N' and b.MACHINE_LOCALTIME between current timestamp - 3 months and current timestamp order by MACHINE_ID, EVENT_TIME, LOGIN_NAME, OPERATION with ur ; Running db2advis on it, here is the recommended index: [96.05%] improvement CREATE INDEX DB2INST1.IDX009281912100000 ON ASSET.TBL_ASSET_NETWORK_HISTORY (MACHINE_ID ASC, OBJECT ASC, LOGIN_EVENT_ANALYSED ASC, LOCALTIME ASC, VALUE ASC, OPERATION ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS; Well, index NTRYIDX1 has exactly all the required columns of the suggested index, but not exactly in the same order. What am I missing here? SHouldn't the db2 optimizer choose the existing index? Why do I need to create an index that is virtually the same as the existing one? Thanks, -M |
#3
| |||
| |||
|
|
Hello Michel, The order of the columns in an index is important, since the sorting (and the lookup) within the tree structure of an index happens on this specific order. Let me try to explain this with an example, you'll quickly understand why: table person ( * * * * first_name varchar, * * * * last_name varchar, * * * * month int, * * * * year int ) Let's assume this query: select * from person where month = 1 and year = 1980 An index on (month, year) or (year, month) will perfectly address this query. Now, suppose that you have an index on (year, month, last_name), DB2 will still be able to navigate this index to the exact month-year combination that you're looking for (first year, then month). The index will be a bit bigger, so you'll have more IO, but this index is still very suitable for this query. An index on (year, first_name, month), however, will not be optimal. DB2 will be able to navigate to the correct year, but after that, he's stuck since all first_name values might be valid for you query. From this point on, he must read all child values (all first_name/month combinations from 1980) to find what he needs. Note that he is able to limit the scan to all records from this specific year, so there is still some advantage, but you'll understand that he easily must read 12 times as much records as he would've otherwise. In your case, OPERATION is the 'blocking' column. Db2advis put all the columns with a prerequisite at the front of the index declaration to quickly traverse the index tree to the exact records you need. The columns after that are used to further optimize IO. Since they are part of the index, DB2 can read the columns LOCALTIME, VALUE and OPERATION directly from the index, without having to do go to the table. I hope this all makes some sense to you. -- Frederik Engelen |
#4
| |||
| |||
|
|
Hello Michel, The order of the columns in an index is important, since the sorting (and the lookup) within the tree structure of an index happens on this specific order. Let me try to explain this with an example, you'll quickly understand why: table person ( * * * * first_name varchar, * * * * last_name varchar, * * * * month int, * * * * year int ) Let's assume this query: select * from person where month = 1 and year = 1980 An index on (month, year) or (year, month) will perfectly address this query. Now, suppose that you have an index on (year, month, last_name), DB2 will still be able to navigate this index to the exact month-year combination that you're looking for (first year, then month). The index will be a bit bigger, so you'll have more IO, but this index is still very suitable for this query. An index on (year, first_name, month), however, will not be optimal. DB2 will be able to navigate to the correct year, but after that, he's stuck since all first_name values might be valid for you query. From this point on, he must read all child values (all first_name/month combinations from 1980) to find what he needs. Note that he is able to limit the scan to all records from this specific year, so there is still some advantage, but you'll understand that he easily must read 12 times as much records as he would've otherwise. In your case, OPERATION is the 'blocking' column. Db2advis put all the columns with a prerequisite at the front of the index declaration to quickly traverse the index tree to the exact records you need. The columns after that are used to further optimize IO. Since they are part of the index, DB2 can read the columns LOCALTIME, VALUE and OPERATION directly from the index, without having to do go to the table. I hope this all makes some sense to you. -- Frederik Engelen Hi Frederik, Thanks for your detailed explanation. So, let me see if I got this right: According to the long running query, I have the following relevant predicates: where a.OBJECT='LOGIN_NAME' and a.LOGIN_EVENT_ANALYSED='N' and The field MACHINE_ID is used in the inner join with other tables. The index: CREATE INDEX ASSET * .NTRYIDX1 ON ASSET * .TBL_ASSET_NETWORK_HISTORY * * * * * * * * (MACHINE_ID ASC, * * * * * * * * *OBJECT ASC, * * * * * * * * *LOGIN_EVENT_ANALYSED ASC, * * * * * * * * *OPERATION ASC, * * * * * * * * *LOCALTIME ASC, * * * * * * * * *VALUE ASC) * * * * * * * * ALLOW REVERSE SCANS; Contains the first 3 columns that needs to be traversed to match the predicates. The fields Operation, LOCALTIME, VALUE are used into the select column list, and are not used as predicates. SO it makes sense to traverse and index with these 3 initial columns (MACHINE_ID, OBJECT, LOGIN_EVENT_ANALYSED). What db2advis suggested is to create and index with the same exact initial columns (the ones that will be used to traverse the index tree), but with the remaining columns (that will be used only as select column list for output) in a slightly different order. As far as I understand, the issue here is not to traverse the index tree itself. *Right ? Couldn't the optimizer use index NTRYIDX1, traverse MACHINE_ID, OBJECT and LOGIN_EVENT_ANALYSED, fetch the fields (OPERATION, LOCALTIME, VALUE) and then organize the selected columns in a different order ?? What am I missing here? |
#5
| |||
| |||
|
|
What am I missing here? |
![]() |
| Thread Tools | |
| Display Modes | |
| |