dbTalk Databases Forums  

MDC Question about Disk Space

comp.databases.ibm-db2 comp.databases.ibm-db2


Discuss MDC Question about Disk Space in the comp.databases.ibm-db2 forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Carlos
 
Posts: n/a

Default MDC Question about Disk Space - 08-14-2003 , 10:35 AM






Hello,

We are trying to use MDC tables in our DataWareHouse, but the tests
shows that MDC use a lot of space in a relation 1/10 or 1/100. (1 byte
of source data use 10 or 100 bytes in the tablespace not including the
space used on indexes).

We did this:

CREATE TABLE SALES(
TIME INTEGER NOT NULL,
STORE INTEGER NOT NULL,
PRODUCT INTEGER NOT NULL,
MEASURE1 BIGINT,
MEASURE2 BIGINT,
MEASURE3 BIGINT
) ORGANIZE BY DIMENSIONS(TIME,STORE,PRODUCT)
IN DWHSPACE1
INDEX IN INDEXSPACE1

ALTER TABLE LOCAL ADD PRIMARY KEY (TIME,STORE,PRODUCT);
CREATE INDEX I_SALES_TIME ON SALES
CREATE INDEX I_SALES_STORE ON SALES
CREATE INDEX I_SALES_PRODUCT ON SALES

For every dimension we have a separate index.

First we tryed a little file with 10 rows. With a normal table it
uses 4K (I am using 4K pages), but with MDC it uses 40K. Maybe this
is not a good example because the sample was not representative, etc,
etc.
So we tried to load a sample from a production source. We selected a
month with 5,382,851 rows. Normally it uses 200MB, so a 10GB
tablespace will be enough.
When we tried to load after a while, an error appears indicating that
there is no more space in the tablespace, in desperation we gave 20GB
to the tablespace, having the same results.

Later we did this change to the table, because of the high cardinality
of time:
.... ORGANIZE BY DIMENSIONS(STORE,PRODUCT)...
Obtaining the same results, finally this:
.... ORGANIZE BY DIMENSIONS(STORE)...

but I am writing to the news group.

¿What are we doing wrong?
¿Is there somebody with a successfull example to follow?


Greetings,

Carlos Farias

Reply With Quote
  #2  
Old   
Fred
 
Posts: n/a

Default Re: MDC Question about Disk Space - 08-14-2003 , 04:09 PM






Carlos,

MDC tables grab an entire extent of disk space for each unique
combination of all of the dimension values. Many people need to use a
smaller tablespace extent size with MDC tables in order to minimize
wasted space when the dimensions contain too much cardinality.

If your dimensions are STORE and PRODUCT, You can estimate the number
of extents required by multiplying COUNT(DISTINCT(PRODUCT)) *
COUNT(DISTINCT(STORE)). Unless you reduce the granularity of the time
column by round it down to the month, you should probably not use TIME
as a dimension, and just leave it in a regular Type-2 RID index.

If you've already made STORE and PRODUCT dimensions for MDC, you may
not be gaining much by building RID indexes on them as well.

Good Luck,

Fred

Reply With Quote
  #3  
Old   
Philip Nelson
 
Posts: n/a

Default Re: MDC Question about Disk Space - 08-15-2003 , 01:49 AM



On Thu, 14 Aug 2003 08:35:23 -0700, Carlos wrote:

Quote:
Hello,

We are trying to use MDC tables in our DataWareHouse, but the tests
shows that MDC use a lot of space in a relation 1/10 or 1/100. (1 byte
of source data use 10 or 100 bytes in the tablespace not including the
space used on indexes).

We did this:

CREATE TABLE SALES(
TIME INTEGER NOT NULL,
STORE INTEGER NOT NULL,
PRODUCT INTEGER NOT NULL,
MEASURE1 BIGINT,
MEASURE2 BIGINT,
MEASURE3 BIGINT
) ORGANIZE BY DIMENSIONS(TIME,STORE,PRODUCT)
IN DWHSPACE1
INDEX IN INDEXSPACE1

ALTER TABLE LOCAL ADD PRIMARY KEY (TIME,STORE,PRODUCT);
CREATE INDEX I_SALES_TIME ON SALES
CREATE INDEX I_SALES_STORE ON SALES
CREATE INDEX I_SALES_PRODUCT ON SALES

For every dimension we have a separate index.

First we tryed a little file with 10 rows. With a normal table it
uses 4K (I am using 4K pages), but with MDC it uses 40K. Maybe this
is not a good example because the sample was not representative, etc,
etc.
So we tried to load a sample from a production source. We selected a
month with 5,382,851 rows. Normally it uses 200MB, so a 10GB
tablespace will be enough.
When we tried to load after a while, an error appears indicating that
there is no more space in the tablespace, in desperation we gave 20GB
to the tablespace, having the same results.

Later we did this change to the table, because of the high cardinality
of time:
... ORGANIZE BY DIMENSIONS(STORE,PRODUCT)...
Obtaining the same results, finally this:
... ORGANIZE BY DIMENSIONS(STORE)...

but I am writing to the news group.

¿What are we doing wrong?
¿Is there somebody with a successfull example to follow?


Greetings,

Carlos Farias
Carlos,

DB2 will allocate an extent of storage for each intersection of your
dimensions. So using all of the keys in your primary key as dimensions
means you will allocate as many extents as you have rows in your table.

So you don't want to use columns with high cardinality (and time and
product are likely to be such).

If you want to include a time dimension (BTW, why are you not using a DB2
supplied TIME or TIMESTAMP for this) then I'd suggest using a derived
value (e.g. YEAR(TIME_COLUMN) + MONTH(TIME_COLUMN)) to group together
everything for a month (you could do similar things with weeks, but beware
the fact that the last few days of one year and the first few of the next
are given separate week numbers by the DB2 week function).

Phil


Reply With Quote
  #4  
Old   
Carlos
 
Posts: n/a

Default Re: MDC Question about Disk Space - 08-18-2003 , 10:45 PM



Fred, Philip thanks for your help.

I'm not using the time provided by DB2 because the advice from Ralph
Kimball:

"Most data warehouses need an explict time dimension even though the
primary time key may be an SQL date-valued object. The explicit time
is needed to describe fiscal periods, seasons, holidays, weekends, and
other calendar calculations that are difficult to get from the SQL
date machinery"

Thanks again,

Carlos

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.