dbTalk Databases Forums  

Table Partitioning the way to go?

comp.databases.ibm-db2 comp.databases.ibm-db2


Discuss Table Partitioning the way to go? in the comp.databases.ibm-db2 forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
shorti
 
Posts: n/a

Default Table Partitioning the way to go? - 07-02-2010 , 12:22 PM






I am working on a new database and am trying to determine how best to
set up a specific table (or set of tables). I did some research on
table level partitioning and was thinking this might be the way to go,
however, I am not entirely convinced and wanted to see if there was a
better solution. Here are the circumstances:

This will be set up on Linux using DB2 V9.7. There is a possibility
that HADR might be used. This particular table could contain 10's of
millions of records. The records could actually be split up into
groups to help split the table up and allow better performance for
searches. Each group could own up to 2 million records each. The
groups would not be time related...in other words, this is not a set
of daily or monthly data that would expire. Each group could start
out small and grow to the 2 million max. Each group could stay around
for an infinite amount of time (typical) or the entire group could be
removed or dropped (not so typical). Groups could also become
combined. A new group could be added at any time or removed at any
time. There will also be a high level of transactions to all the
groups so separating them may help performance. There could be a
large number of groups, however, I expect the typical number would be
around 24. It is expected that inserts will come in bulk to a group.
Also, there will be bulk updates to records within a group. Other
than potentially merging two groups, transactions to multiple groups
would not be common.

In saying all that, the objective is to be a high performance database
by reducing search times. Table partitioning seems to allow the
ability to separate common data into these groups for faster searches
and with the ability to separate bufferpools and indexing. The one
thing that bothers me is it seem table partitioning is used mainly for
time related data were the tables are separated by a 24 hour day and
possibly later merged to form a month then archived or removed
entirely. In my application, removing and merging partitions will not
be so common. Also, I am not sure how to tell DB2 where to store the
data (which partition) other than to say "Group1" or "Group2" since it
is not time related I cannot say put data in that group that is for
May 2010. Is this a problem...maybe I am just not seeing yet how to
set the groups up.

Are there any other suggestions?

Reply With Quote
  #2  
Old   
Serge Rielau
 
Posts: n/a

Default Re: Table Partitioning the way to go? - 07-03-2010 , 07:53 AM






On 7/2/2010 1:22 PM, shorti wrote:
Quote:
I am working on a new database and am trying to determine how best to
set up a specific table (or set of tables). I did some research on
table level partitioning and was thinking this might be the way to go,
however, I am not entirely convinced and wanted to see if there was a
better solution. Here are the circumstances:

This will be set up on Linux using DB2 V9.7. There is a possibility
that HADR might be used. This particular table could contain 10's of
millions of records. The records could actually be split up into
groups to help split the table up and allow better performance for
searches. Each group could own up to 2 million records each. The
groups would not be time related...in other words, this is not a set
of daily or monthly data that would expire. Each group could start
out small and grow to the 2 million max. Each group could stay around
for an infinite amount of time (typical) or the entire group could be
removed or dropped (not so typical). Groups could also become
combined. A new group could be added at any time or removed at any
time. There will also be a high level of transactions to all the
groups so separating them may help performance. There could be a
large number of groups, however, I expect the typical number would be
around 24. It is expected that inserts will come in bulk to a group.
Also, there will be bulk updates to records within a group. Other
than potentially merging two groups, transactions to multiple groups
would not be common.

In saying all that, the objective is to be a high performance database
by reducing search times. Table partitioning seems to allow the
ability to separate common data into these groups for faster searches
and with the ability to separate bufferpools and indexing. The one
thing that bothers me is it seem table partitioning is used mainly for
time related data were the tables are separated by a 24 hour day and
possibly later merged to form a month then archived or removed
entirely. In my application, removing and merging partitions will not
be so common. Also, I am not sure how to tell DB2 where to store the
data (which partition) other than to say "Group1" or "Group2" since it
is not time related I cannot say put data in that group that is for
May 2010. Is this a problem...maybe I am just not seeing yet how to
set the groups up.

Are there any other suggestions?

You may also look at multi-dimentional clustering (MDC).
In our case the dimension would likely be one.
Unliek range partitioning MDC is self maintaining.
That is a new "group" is added when the first row for it is added.
A group disappears when the last row disappears.
Delete's are sped up compared to regular deletes and they take less
logging (one log record per page).
You will get speed up similar to range partitioning for queries.

Teh downside of MDC compared to range partitioning is that you cannot
move a partitition (i.e. detach and archive) or attach a preloaded set
of data. Also indexes are global.

Cheers
Serge

--
Serge Rielau
SQL Architect DB2 for LUW
IBM Toronto Lab

Reply With Quote
  #3  
Old   
shorti
 
Posts: n/a

Default Re: Table Partitioning the way to go? - 07-06-2010 , 10:44 AM



On Jul 3, 5:53*am, Serge Rielau <srie... (AT) ca (DOT) ibm.com> wrote:
Quote:
On 7/2/2010 1:22 PM, shorti wrote:

I am working on a new database and am trying to determine how best to
set up a specific table (or set of tables). *I did some research on
table level partitioning and was thinking this might be the way to go,
however, I am not entirely convinced and wanted to see if there was a
better solution. *Here are the circumstances:

This will be set up on Linux using DB2 V9.7. *There is a possibility
that HADR might be used. *This particular table could contain 10's of
millions of records. *The records could actually be split up into
groups to help split the table up and allow better performance for
searches. *Each group could own up to 2 million records each. *The
groups would not be time related...in other words, this is not a set
of daily or monthly data that would expire. *Each group could start
out small and grow to the 2 million max. *Each group could stay around
for an infinite amount of time (typical) or the entire group could be
removed or dropped (not so typical). *Groups could also become
combined. *A new group could be added at any time or removed at any
time. *There will also be a high level of transactions to all the
groups so separating them may help performance. * There could be a
large number of groups, however, I expect the typical number would be
around 24. *It is expected that inserts will come in bulk to a group.
Also, there will be bulk updates to records within a group. *Other
than potentially merging two groups, transactions to multiple groups
would not be common.

In saying all that, the objective is to be a high performance database
by reducing search times. *Table partitioning seems to allow the
ability to separate common data into these groups for faster searches
and with the ability to separate bufferpools and indexing. *The one
thing that bothers me is it seem table partitioning is used mainly for
time related data were the tables are separated by a 24 hour day and
possibly later merged to form a month then archived or removed
entirely. *In my application, removing and merging partitions will not
be so common. *Also, I am not sure how to tell DB2 where to store the
data (which partition) other than to say "Group1" or "Group2" since it
is not time related I cannot say put data in that group that is for
May 2010. *Is this a problem...maybe I am just not seeing yet how to
set the groups up.

Are there any other suggestions?

You may also look at multi-dimentional clustering (MDC).
In our case the dimension would likely be one.
Unliek range partitioning MDC is self maintaining.
That is a new "group" is added when the first row for it is added.
A group disappears when the last row disappears.
Delete's are sped up compared to regular deletes and they take less
logging (one log record per page).
You will get speed up similar to range partitioning for queries.

Teh downside of MDC compared to range partitioning is that you cannot
move a partitition (i.e. detach and archive) or attach a preloaded set
of data. Also indexes are global.

Cheers
Serge

--
Serge Rielau
SQL Architect DB2 for LUW
IBM Toronto Lab
Thanks Serge, I am amazed you were able to find my post among all the
junk spam posts. I will do some research on MDC and see if its a
better fit!

Reply With Quote
  #4  
Old   
Naresh Chainani
 
Posts: n/a

Default Re: Table Partitioning the way to go? - 07-06-2010 , 11:52 AM



On Jul 2, 10:22*am, shorti <lbrya... (AT) juno (DOT) com> wrote:
Quote:
I am working on a new database and am trying to determine how best to
set up a specific table (or set of tables). *I did some research on
table level partitioning and was thinking this might be the way to go,
however, I am not entirely convinced and wanted to see if there was a
better solution. *Here are the circumstances:

This will be set up on Linux using DB2 V9.7. *There is a possibility
that HADR might be used. *This particular table could contain 10's of
millions of records. *The records could actually be split up into
groups to help split the table up and allow better performance for
searches. *Each group could own up to 2 million records each. *The
groups would not be time related...in other words, this is not a set
of daily or monthly data that would expire. *Each group could start
out small and grow to the 2 million max. *Each group could stay around
for an infinite amount of time (typical) or the entire group could be
removed or dropped (not so typical). *Groups could also become
combined. *A new group could be added at any time or removed at any
time. *There will also be a high level of transactions to all the
groups so separating them may help performance. * There could be a
large number of groups, however, I expect the typical number would be
around 24. *It is expected that inserts will come in bulk to a group.
Also, there will be bulk updates to records within a group. *Other
than potentially merging two groups, transactions to multiple groups
would not be common.

In saying all that, the objective is to be a high performance database
by reducing search times. *Table partitioning seems to allow the
ability to separate common data into these groups for faster searches
and with the ability to separate bufferpools and indexing. *The one
thing that bothers me is it seem table partitioning is used mainly for
time related data were the tables are separated by a 24 hour day and
possibly later merged to form a month then archived or removed
entirely. *In my application, removing and merging partitions will not
be so common. *Also, I am not sure how to tell DB2 where to store the
data (which partition) other than to say "Group1" or "Group2" since it
is not time related I cannot say put data in that group that is for
May 2010. *Is this a problem...maybe I am just not seeing yet how to
set the groups up.

Are there any other suggestions?
Few thoughts:

While table partitioning is commonly used for time-based partitioning,
we have some customers using it to partition various groups (prodID,
transactionID) into different table spaces. In your case, the likely
partitioning key will be "groupid" and at table creation time you will
specify each of the 24 partitions (aka groups) along with data and
index table spaces for each partition. Having a notion of time for
each partition is not required.

Do you need the flexibility of choosing data and/or index table space
for each group? For instance, may be some groups are more important
than others and need to be backed up more frequently. Table
partitioning provides you this flexibility.

MDC accommodates creation of new groups and deletion of old groups
automatically while with table partitioning required adding a new
table partition (ALTER TABLE ADD PARTITION) or detaching an existing
partition that is no longer required using ALTER TABLE DETACH
PARTITION.

Combining groups would be trickier with either option unless you
actually updating the "groupid" to achieve the same.

Finally, either option should provide decent search performance by
elimination unnecessary groups. With table partitioning, partition
elimination will come into play where the optimizer looks at query
predicates to determine what partitions need to be accessed.

Naresh

Reply With Quote
  #5  
Old   
shorti
 
Posts: n/a

Default Re: Table Partitioning the way to go? - 07-06-2010 , 04:00 PM



On Jul 6, 9:52*am, Naresh Chainani <fornar... (AT) gmail (DOT) com> wrote:
Quote:
Few thoughts:

While table partitioning is commonly used for time-based partitioning,
we have some customers using it to partition various groups (prodID,
transactionID) into different table spaces. In your case, the likely
partitioning key will be "groupid" and at table creation time you will
specify each of the 24 partitions (aka groups) along with data and
index table spaces for each partition. Having a notion of time for
each partition is not required.

Do you need the flexibility of choosing data and/or index table space
for each group? For instance, may be some groups are more important
than others and need to be backed up more frequently. Table
partitioning provides you this flexibility.

MDC accommodates creation of new groups and deletion of old groups
automatically while with table partitioning required adding a new
table partition (ALTER TABLE ADD PARTITION) or detaching an existing
partition that is no longer required using ALTER TABLE DETACH
PARTITION.

Combining groups would be trickier with either option unless you
actually updating the "groupid" to achieve the same.

Finally, either option should provide decent search performance by
elimination unnecessary groups. With table partitioning, partition
elimination will come into play where the optimizer looks at query
predicates to determine what partitions need to be accessed.

Naresh
Naresh,

Thanks for the information. It answers a few of my concerns regarding
the use of the table
partitions. Its nice to know I have the choice to backup one
partition more frequently.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.