dbTalk Databases Forums  

Load, reorg, runstats

comp.databases.ibm-db2 comp.databases.ibm-db2


Discuss Load, reorg, runstats in the comp.databases.ibm-db2 forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Troels Arvin
 
Posts: n/a

Default Load, reorg, runstats - 01-18-2011 , 09:23 AM






Hello,

Let's say I have a large table which needs to be reloaded from scratch
with (lots of) new data. The table has a clustering index and a few
ordinary indexes. The table needs to be put in an optimal state for
subsequent work (which includes minor updates once in a while).

What's the proper way to go?

Sequence a:
0. (Leave the indexes: Don't drop indexes before load.)
1. Load data (in replace mode).
3. RUNSTATS ON TABLE ... AND INDEXES ALL to speed up
the subsequent table re-organization?
4. Reorg table to make the clustering index have its
influence. (Or will LOAD be able to cluster data
while loading, knowing that there is clustering index?)
5. RUNSTATS ON TABLE ... AND INDEXES ALL again to let
the database know about the new situation.
6. Reorg indexes.
7. RUNSTATS ON TABLE ... AND INDEXES ALL again to let
the database know about the new situation.

Regarding sequence a: Maybe some RUNSTATS steps aren't needed?


Sequence b (like a, but drop indexes before load):
0. Drop indexes.
1. Load data (in replace mode).
2. Create indexes.
3-7 as above.

Regarding sequence b:
- According to the documentation, by default, indexing mode is
AUTOSELECT. And given that replace mode is being used,
AUTOSELECT should be clever enough to choose complete
index rebuilds, right?
- One would think that index creation would provide a lot of
statistics information to the database; can the database
use this knowledge, such that the runstats in step 3 is
implied?

Or is there a completely other sequence that I'm overlooking?

--
Troels

Reply With Quote
  #2  
Old   
Ian
 
Posts: n/a

Default Re: Load, reorg, runstats - 01-18-2011 , 11:49 AM






On Jan 18, 8:23*am, Troels Arvin <tro... (AT) arvin (DOT) dk> wrote:
Quote:
Hello,

Let's say I have a large table which needs to be reloaded from scratch
with (lots of) new data. The table has a clustering index and a few
ordinary indexes. The table needs to be put in an optimal state for
subsequent work (which includes minor updates once in a while).

What's the proper way to go?

Sequence a:
0. (Leave the indexes: Don't drop indexes before load.)
1. Load data (in replace mode).
3. RUNSTATS ON TABLE ... AND INDEXES ALL to speed up
the subsequent table re-organization?
4. Reorg table to make the clustering index have its
influence. (Or will LOAD be able to cluster data
while loading, knowing that there is clustering index?)
5. RUNSTATS ON TABLE ... AND INDEXES ALL again to let
the database know about the new situation.
6. Reorg indexes.
7. RUNSTATS ON TABLE ... AND INDEXES ALL again to let
the database know about the new situation.

Regarding sequence a: Maybe some RUNSTATS steps aren't needed?

Sequence b (like a, but drop indexes before load):
0. Drop indexes.
1. Load data (in replace mode).
2. Create indexes.
3-7 as above.

First of all, you don't need to do step 6. If you do a classic
(offline) reorg, it will automatically rebuild all indexes for you.

Second, you only need to do RUNSTATS once, at the end of the process.

LOAD will not sort data for you if you are writing to a normal
(non-MDC) table, even if it has a clustering index.

However, if you can sort your input data so that it's already in the
correct order, you can instruct LOAD not to mess up the ordering, so
that your data will be clustered upon completion. This would allow
you to avoid the (presumably expensive) REORG.

So, if it's practical to get your new data in proper clustered order
before you re-load the table in your database, then you can completely
avoid the REORG.


There is a debate about whether it's better to drop the indexes and
then load the table or keep the indexes in place (and let LOAD rebuild
them). I think that this really depends on your specific environment:
how big is the table (both number of rows and row width), and how many
indexes are there (and how wide is each index key)?

When you build an index on a table, DB2 must scan the table to find
all of the unique key values. DB2 then sorts the keys and writes the
index. So, if you build 5 indexes, DB2 will scan the table 5 times.

If indexes exist on the table when you do a LOAD REPLACE, DB2 will
build temporary tables for each index *in parallel* during the load
phase. When the load phase completes, DB2 uses these temp tables to
build the indexes, one at a time. This eliminates the repetitive
table scans, but there is certainly a cost to the LOAD that is
associated with building these temp tables as data is loaded.

If the table is relatively small (i.e. narrow rows & lots of rows per
page) then building the indexes after the load could be faster. But
if the rows are very wide, the table scans could be pretty expensive.

You should see which option (dropping indexes or not) works best in
your environment.






To answer your other questions:


Quote:
Regarding sequence b:
- According to the documentation, by default, indexing mode is
AUTOSELECT. And given that replace mode is being used,
AUTOSELECT should be clever enough to choose complete
index rebuilds, right?
The indexing mode only applies if you're doing a LOAD INSERT.
LOAD REPLACE necessitates a full rebuild.

Quote:
- One would think that index creation would provide a lot of
statistics information to the database; can the database
use this knowledge, such that the runstats in step 3 is
implied?
This is true; and when you create an index you can specify COLLECT
[DETAILED] STATISTICS. The point of this is to allow you to avoid
doing a full RUNSTATS on the table and all indexes if you are just
creating a new index.




Bottom line:

If you can sort your input data and keeping indexes in place works
well, then you can do the entire operation with a single LOAD command
(LOAD can collect statistics).

Otherwise do the minimum amount of work:

1) Drop indexes (if necessary)
2) Load
3) Recreate indexes (if necessary)
4) Reorg (if necessary)
5) Runstats (if necessary)



Good luck,

Reply With Quote
  #3  
Old   
Troels Arvin
 
Posts: n/a

Default Re: Load, reorg, runstats - 01-18-2011 , 12:29 PM



Hello,

Ian wrote:
Quote:
First of all, you don't need to do step 6. If you do a classic
(offline) reorg, it will automatically rebuild all indexes for you.
That's new to me. By "offline" you mean stating "USE <some tmpspace>",
right?


Quote:
Second, you only need to do RUNSTATS once, at the end of the process.
But will REORG not run faster if there are statistics available to help
the re-organization?


Quote:
If indexes exist on the table when you do a LOAD REPLACE, DB2 will build
temporary tables for each index *in parallel* during the load phase.
Does this mean that the table will be copied in full n times (n being the
number of indexes), i.e. even including columns which are not part of the
index?


Quote:
- One would think that index creation would provide a lot of
statistics information to the database; can the database use this
knowledge, such that the runstats in step 3 is implied?

This is true; and when you create an index you can specify COLLECT
[DETAILED] STATISTICS.
Ah - great.

Quote:
Bottom line:
[...]

Thanks for your input!

--
Troels

Reply With Quote
  #4  
Old   
Mark A
 
Posts: n/a

Default Re: Load, reorg, runstats - 01-18-2011 , 01:07 PM



"Troels Arvin" <troels (AT) arvin (DOT) dk> wrote

Quote:
That's new to me. By "offline" you mean stating "USE <some tmpspace>",
right?
No. he means offline (classic) instead of "inplace." See the Command Refence
Manual.

Quote:
But will REORG not run faster if there are statistics available to help
the re-organization?
No, it will not run faster with statistics.

Quote:
Does this mean that the table will be copied in full n times (n being the
number of indexes), i.e. even including columns which are not part of the
index?
No, only the index columns will be "copied," one for each index.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.