dbTalk Databases Forums  

Architecture

microsoft.public.sqlserver.dts microsoft.public.sqlserver.dts


Discuss Architecture in the microsoft.public.sqlserver.dts forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Michael Vardinghus
 
Posts: n/a

Default Architecture - 05-14-2005 , 04:25 PM






Getting two tables from one base with 4 mio and 11 mio records.

The 11 mio records are dimension specification on the 4 mio records so I
need to put these 11 mio records alongsid the 4 mio in my fact table.

-- Example table 1
create view trans1 as
select 100 as beløb, 1 as linje
union all
select 200 as beløb, 2 as linje

-- Example table 2
create view trans2 as
select 1 as linje, 'a' as dim1, 'dim1' as type
union all
select 1 as linje, 'b' as dim2, 'dim2' as type
union all
select 2 as linje, 'a' as dim1, 'dim1' as typ
union all
select 2 as linje, 'a' as dim2, 'dim2' as type

-- Example of combining
select a.*, (select dim1 from trans2 b where
b.linje = a.linje and b.type = 'dim1') as dim1,
(select dim1 from trans2 c where c.linje = a.linje and
c.type = 'dim2') as dim2
from
trans1 a

My question is how to go about this with the best performance - here's one
shot:

1) Pulling the 2 tables into my relational warehouse base
2) An index combines the 2 tables and this index fills up my fact table

My concern is what to do about indexing ? Should I have indexes when filling
up from base 1 ? Or should the be removed when doing so ? Do I need to index
the two tables and the view ?

Perhaps I don't need a view - perhaps a could just use the sentence above
and let index optimizer give me a hint to what to make indexes for ?

Thanx in advance

/Michael



Reply With Quote
  #2  
Old   
Allan Mitchell
 
Posts: n/a

Default Re: Architecture - 05-15-2005 , 01:46 AM






Michael,

Let me see if I have the question correct. You have 2 source tables(11
million dimension table rows and 4 million fact table rows?). If that
is the case then what dimension has 11 million rows?

What is the source? Some sources need handling differently
Are the two tables in a database on a different server?
Is this a one off feed?

OR

Do you want to merge the two tables together on the destination from
your two sources?

That might become more complicated if your keys clash. Whilst you will
be generating new primary key values (Surrogate Key) you will need a
point of reference when loading the fact table so will need the old key
to refer to.

You could use partitioned views so you would split this 15 million row
table based on an attribute say Date. The view unions the partitions
back together again and the index optimizer is smart enough to know from
where the rows that you want will come.

mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\createdb.chm::/cm_8_des_06_17zr.htm




The "combining" idea presented below will hurt you as not only do you
have a lot of rows to play with anyway but you do 2 extra queries for
every row as well.

The index optimizer will use whatever indexes it chooses unless you
override her with an index hint.

Let me know if I have grabbed the wrong meaning from the post and we can
revisit my comments.

Allan


"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote


Quote:
Getting two tables from one base with 4 mio and 11 mio records.

The 11 mio records are dimension specification on the 4 mio records so I
need to put these 11 mio records alongsid the 4 mio in my fact table.

-- Example table 1
create view trans1 as
select 100 as beløb, 1 as linje
union all
select 200 as beløb, 2 as linje

-- Example table 2
create view trans2 as
select 1 as linje, 'a' as dim1, 'dim1' as type
union all
select 1 as linje, 'b' as dim2, 'dim2' as type
union all
select 2 as linje, 'a' as dim1, 'dim1' as typ
union all
select 2 as linje, 'a' as dim2, 'dim2' as type

-- Example of combining
select a.*, (select dim1 from trans2 b where
b.linje = a.linje and b.type = 'dim1') as dim1,
(select dim1 from trans2 c where c.linje = a.linje and
c.type = 'dim2') as dim2
from
trans1 a

My question is how to go about this with the best performance - here's one
shot:

1) Pulling the 2 tables into my relational warehouse base
2) An index combines the 2 tables and this index fills up my fact table

My concern is what to do about indexing ? Should I have indexes when filling
up from base 1 ? Or should the be removed when doing so ? Do I need to index
the two tables and the view ?

Perhaps I don't need a view - perhaps a could just use the sentence above
and let index optimizer give me a hint to what to make indexes for ?

Thanx in advance

/Michael


Reply With Quote
  #3  
Old   
Michael Vardinghus
 
Posts: n/a

Default Re: Architecture - 05-15-2005 , 03:42 AM



Its not dimension tables - it's a fact table with the transactions in table
1 and in table 2 there is the dimension specifications for each transaction.
The reason for the larger number in table 2 is that
there is used multiple dimensions and for each dimension a new line is made
but it is referring back to table 1 with only 1 field.

So what I would like to do was to take table a which for instance has 7
columns and then for each dimension TYPE in table 2 I want to add a new
column and in these columns I would like
to write the corresponding dimension value from table 2.

The tables are placed on a different server - and I would like to avoid to
do something on this server - would prefer to do it on my "washing"
machine...

About the surrogate keys that is a different matter - I will be doing that
later - after these two tables are merged. My problem is that I don't have
any dimension values in the first table yet - I need to put these on and
then afterwards replacing these with surrogate keys.

Perhaps I could do something with partitions on table 2 based on dimension
type ? Would that make the sentence below better ?

"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote

Quote:
Michael,

Let me see if I have the question correct. You have 2 source tables(11
million dimension table rows and 4 million fact table rows?). If that
is the case then what dimension has 11 million rows?

What is the source? Some sources need handling differently
Are the two tables in a database on a different server?
Is this a one off feed?

OR

Do you want to merge the two tables together on the destination from
your two sources?

That might become more complicated if your keys clash. Whilst you will
be generating new primary key values (Surrogate Key) you will need a
point of reference when loading the fact table so will need the old key
to refer to.

You could use partitioned views so you would split this 15 million row
table based on an attribute say Date. The view unions the partitions
back together again and the index optimizer is smart enough to know from
where the rows that you want will come.


mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre
atedb.chm::/cm_8_des_06_17zr.htm
Quote:



The "combining" idea presented below will hurt you as not only do you
have a lot of rows to play with anyway but you do 2 extra queries for
every row as well.

The index optimizer will use whatever indexes it chooses unless you
override her with an index hint.

Let me know if I have grabbed the wrong meaning from the post and we can
revisit my comments.

Allan


"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message
news:michaelvardinghus (AT) hotmail (DOT) com:

Getting two tables from one base with 4 mio and 11 mio records.

The 11 mio records are dimension specification on the 4 mio records so I
need to put these 11 mio records alongsid the 4 mio in my fact table.

-- Example table 1
create view trans1 as
select 100 as beløb, 1 as linje
union all
select 200 as beløb, 2 as linje

-- Example table 2
create view trans2 as
select 1 as linje, 'a' as dim1, 'dim1' as type
union all
select 1 as linje, 'b' as dim2, 'dim2' as type
union all
select 2 as linje, 'a' as dim1, 'dim1' as typ
union all
select 2 as linje, 'a' as dim2, 'dim2' as type

-- Example of combining
select a.*, (select dim1 from trans2 b where
b.linje = a.linje and b.type = 'dim1') as dim1,
(select dim1 from trans2 c where c.linje = a.linje and
c.type = 'dim2') as dim2
from
trans1 a

My question is how to go about this with the best performance - here's
one
shot:

1) Pulling the 2 tables into my relational warehouse base
2) An index combines the 2 tables and this index fills up my fact table

My concern is what to do about indexing ? Should I have indexes when
filling
up from base 1 ? Or should the be removed when doing so ? Do I need to
index
the two tables and the view ?

Perhaps I don't need a view - perhaps a could just use the sentence
above
and let index optimizer give me a hint to what to make indexes for ?

Thanx in advance

/Michael




Reply With Quote
  #4  
Old   
Michael Vardinghus
 
Posts: n/a

Default Re: Architecture - 05-15-2005 , 04:04 AM



Real data example

Table 1

Entry no Posting date Description
Amount

1 1999-10-06 00:00:00.000 Rammeaft. udb. a/c
-8895277.00000000000000000000
2 1999-10-06 00:00:00.000 500100496 Ulandsekr.
8895277.00000000000000000000
3 1999-10-06 00:00:00.000 1. rate
-1610400.00000000000000000000


Table 2

Table ID Entry no Dimension Code Dimension value code
Amount Posting date

17 1 DELREGNSKAB
-8895277.00000000000000000000 1999-10-06 00:00:00.000
17 2 DELREGNSKAB 700
8895277.00000000000000000000 1999-10-06 00:00:00.000
17 2 STED 6091
8895277.00000000000000000000 1999-10-06 00:00:00.000
17 3 DELREGNSKAB
-1610400.00000000000000000000 1999-10-06 00:00:00.000

I will alwas only need table id 17 end Entry no joins them ...

So this is what i want

New table

Entry no Posting date Description
Amount Delregnskab
Sted

1 1999-10-06 00:00:00.000 Rammeaft. udb. a/c
-8895277.00000000000000000000 700
null
2 1999-10-06 00:00:00.000 500100496 Ulandsekr.
8895277.00000000000000000000 700
6091
3 1999-10-06 00:00:00.000 1. rate
-1610400.00000000000000000000 700
null

If possible i would like to eliminate nulls at once but that could be washed
afterwards

Besides delregnskab and sted there is 3-4 other dimension that should be
handled the same way.

"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote

Quote:
Michael,

Let me see if I have the question correct. You have 2 source tables(11
million dimension table rows and 4 million fact table rows?). If that
is the case then what dimension has 11 million rows?

What is the source? Some sources need handling differently
Are the two tables in a database on a different server?
Is this a one off feed?

OR

Do you want to merge the two tables together on the destination from
your two sources?

That might become more complicated if your keys clash. Whilst you will
be generating new primary key values (Surrogate Key) you will need a
point of reference when loading the fact table so will need the old key
to refer to.

You could use partitioned views so you would split this 15 million row
table based on an attribute say Date. The view unions the partitions
back together again and the index optimizer is smart enough to know from
where the rows that you want will come.


mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre
atedb.chm::/cm_8_des_06_17zr.htm
Quote:



The "combining" idea presented below will hurt you as not only do you
have a lot of rows to play with anyway but you do 2 extra queries for
every row as well.

The index optimizer will use whatever indexes it chooses unless you
override her with an index hint.

Let me know if I have grabbed the wrong meaning from the post and we can
revisit my comments.

Allan


"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message
news:michaelvardinghus (AT) hotmail (DOT) com:

Getting two tables from one base with 4 mio and 11 mio records.

The 11 mio records are dimension specification on the 4 mio records so I
need to put these 11 mio records alongsid the 4 mio in my fact table.

-- Example table 1
create view trans1 as
select 100 as beløb, 1 as linje
union all
select 200 as beløb, 2 as linje

-- Example table 2
create view trans2 as
select 1 as linje, 'a' as dim1, 'dim1' as type
union all
select 1 as linje, 'b' as dim2, 'dim2' as type
union all
select 2 as linje, 'a' as dim1, 'dim1' as typ
union all
select 2 as linje, 'a' as dim2, 'dim2' as type

-- Example of combining
select a.*, (select dim1 from trans2 b where
b.linje = a.linje and b.type = 'dim1') as dim1,
(select dim1 from trans2 c where c.linje = a.linje and
c.type = 'dim2') as dim2
from
trans1 a

My question is how to go about this with the best performance - here's
one
shot:

1) Pulling the 2 tables into my relational warehouse base
2) An index combines the 2 tables and this index fills up my fact table

My concern is what to do about indexing ? Should I have indexes when
filling
up from base 1 ? Or should the be removed when doing so ? Do I need to
index
the two tables and the view ?

Perhaps I don't need a view - perhaps a could just use the sentence
above
and let index optimizer give me a hint to what to make indexes for ?

Thanx in advance

/Michael




Reply With Quote
  #5  
Old   
Michael Vardinghus
 
Posts: n/a

Default Re: Architecture - 05-15-2005 , 04:09 AM



That looked okay when typing it ... but not anymore...
"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote

Quote:
Real data example

Table 1

Entry no Posting date Description
Amount

1 1999-10-06 00:00:00.000 Rammeaft. udb. a/c
-8895277.00000000000000000000
2 1999-10-06 00:00:00.000 500100496 Ulandsekr.
8895277.00000000000000000000
3 1999-10-06 00:00:00.000 1. rate
-1610400.00000000000000000000


Table 2

Table ID Entry no Dimension Code Dimension value code
Amount Posting date

17 1 DELREGNSKAB
-8895277.00000000000000000000 1999-10-06 00:00:00.000
17 2 DELREGNSKAB 700
8895277.00000000000000000000 1999-10-06 00:00:00.000
17 2 STED 6091
8895277.00000000000000000000 1999-10-06 00:00:00.000
17 3 DELREGNSKAB
-1610400.00000000000000000000 1999-10-06 00:00:00.000

I will alwas only need table id 17 end Entry no joins them ...

So this is what i want

New table

Entry no Posting date Description
Amount Delregnskab
Sted

1 1999-10-06 00:00:00.000 Rammeaft. udb. a/c
-8895277.00000000000000000000 700
null
2 1999-10-06 00:00:00.000 500100496 Ulandsekr.
8895277.00000000000000000000 700
6091
3 1999-10-06 00:00:00.000 1. rate
-1610400.00000000000000000000 700
null

If possible i would like to eliminate nulls at once but that could be
washed
afterwards

Besides delregnskab and sted there is 3-4 other dimension that should be
handled the same way.

"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message
news:#Dpm4lRWFHA.616 (AT) TK2MSFTNGP12 (DOT) phx.gbl...
Michael,

Let me see if I have the question correct. You have 2 source tables(11
million dimension table rows and 4 million fact table rows?). If that
is the case then what dimension has 11 million rows?

What is the source? Some sources need handling differently
Are the two tables in a database on a different server?
Is this a one off feed?

OR

Do you want to merge the two tables together on the destination from
your two sources?

That might become more complicated if your keys clash. Whilst you will
be generating new primary key values (Surrogate Key) you will need a
point of reference when loading the fact table so will need the old key
to refer to.

You could use partitioned views so you would split this 15 million row
table based on an attribute say Date. The view unions the partitions
back together again and the index optimizer is smart enough to know from
where the rows that you want will come.



mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre
atedb.chm::/cm_8_des_06_17zr.htm




The "combining" idea presented below will hurt you as not only do you
have a lot of rows to play with anyway but you do 2 extra queries for
every row as well.

The index optimizer will use whatever indexes it chooses unless you
override her with an index hint.

Let me know if I have grabbed the wrong meaning from the post and we can
revisit my comments.

Allan


"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message
news:michaelvardinghus (AT) hotmail (DOT) com:

Getting two tables from one base with 4 mio and 11 mio records.

The 11 mio records are dimension specification on the 4 mio records so
I
need to put these 11 mio records alongsid the 4 mio in my fact table.

-- Example table 1
create view trans1 as
select 100 as beløb, 1 as linje
union all
select 200 as beløb, 2 as linje

-- Example table 2
create view trans2 as
select 1 as linje, 'a' as dim1, 'dim1' as type
union all
select 1 as linje, 'b' as dim2, 'dim2' as type
union all
select 2 as linje, 'a' as dim1, 'dim1' as typ
union all
select 2 as linje, 'a' as dim2, 'dim2' as type

-- Example of combining
select a.*, (select dim1 from trans2 b where
b.linje = a.linje and b.type = 'dim1') as dim1,
(select dim1 from trans2 c where c.linje = a.linje and
c.type = 'dim2') as dim2
from
trans1 a

My question is how to go about this with the best performance - here's
one
shot:

1) Pulling the 2 tables into my relational warehouse base
2) An index combines the 2 tables and this index fills up my fact
table

My concern is what to do about indexing ? Should I have indexes when
filling
up from base 1 ? Or should the be removed when doing so ? Do I need to
index
the two tables and the view ?

Perhaps I don't need a view - perhaps a could just use the sentence
above
and let index optimizer give me a hint to what to make indexes for ?

Thanx in advance

/Michael






Reply With Quote
  #6  
Old   
Allan Mitchell
 
Posts: n/a

Default Re: Architecture - 05-15-2005 , 04:20 AM



Can you post the definitions of the two tables and show how they relate.
Can you also post the definition of the fact table and how this
relates to those tables.

From what I can understand then is that there is a fact table file (4
million )and a dimensions table file (11 million)

The dimensions table file is large because you have multiple rows in
there for each fact table row telling you the dimension key value for
that fact table row.

You must then in the dimensions table file identify the dimension itself
and the value for that dimension

Strange design but without the structure I would have to guess at say
you have a dimensions table file that looks like this

FactRowID, DimensionType, DimensionValue

And your fact table file might look like

FactRowID, val1, Val2....

Your end result is required to be

FactRowID,DimensionName1, DimensionName2.........


You basically need to flatten the dimensions table file.


Am I any nearer?


"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote


Quote:
Its not dimension tables - it's a fact table with the transactions in table
1 and in table 2 there is the dimension specifications for each transaction.
The reason for the larger number in table 2 is that
there is used multiple dimensions and for each dimension a new line is made
but it is referring back to table 1 with only 1 field.

So what I would like to do was to take table a which for instance has 7
columns and then for each dimension TYPE in table 2 I want to add a new
column and in these columns I would like
to write the corresponding dimension value from table 2.

The tables are placed on a different server - and I would like to avoid to
do something on this server - would prefer to do it on my "washing"
machine...

About the surrogate keys that is a different matter - I will be doing that
later - after these two tables are merged. My problem is that I don't have
any dimension values in the first table yet - I need to put these on and
then afterwards replacing these with surrogate keys.

Perhaps I could do something with partitions on table 2 based on dimension
type ? Would that make the sentence below better ?

"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message
news:#Dpm4lRWFHA.616 (AT) TK2MSFTNGP12 (DOT) phx.gbl...
Michael,

Let me see if I have the question correct. You have 2 source tables(11
million dimension table rows and 4 million fact table rows?). If that
is the case then what dimension has 11 million rows?

What is the source? Some sources need handling differently
Are the two tables in a database on a different server?
Is this a one off feed?

OR

Do you want to merge the two tables together on the destination from
your two sources?

That might become more complicated if your keys clash. Whilst you will
be generating new primary key values (Surrogate Key) you will need a
point of reference when loading the fact table so will need the old key
to refer to.

You could use partitioned views so you would split this 15 million row
table based on an attribute say Date. The view unions the partitions
back together again and the index optimizer is smart enough to know from
where the rows that you want will come.


mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre
atedb.chm::/cm_8_des_06_17zr.htm




The "combining" idea presented below will hurt you as not only do you
have a lot of rows to play with anyway but you do 2 extra queries for
every row as well.

The index optimizer will use whatever indexes it chooses unless you
override her with an index hint.

Let me know if I have grabbed the wrong meaning from the post and we can
revisit my comments.

Allan


"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message
news:michaelvardinghus (AT) hotmail (DOT) com:

Getting two tables from one base with 4 mio and 11 mio records.

The 11 mio records are dimension specification on the 4 mio records so I
need to put these 11 mio records alongsid the 4 mio in my fact table.

-- Example table 1
create view trans1 as
select 100 as beløb, 1 as linje
union all
select 200 as beløb, 2 as linje

-- Example table 2
create view trans2 as
select 1 as linje, 'a' as dim1, 'dim1' as type
union all
select 1 as linje, 'b' as dim2, 'dim2' as type
union all
select 2 as linje, 'a' as dim1, 'dim1' as typ
union all
select 2 as linje, 'a' as dim2, 'dim2' as type

-- Example of combining
select a.*, (select dim1 from trans2 b where
b.linje = a.linje and b.type = 'dim1') as dim1,
(select dim1 from trans2 c where c.linje = a.linje and
c.type = 'dim2') as dim2
from
trans1 a

My question is how to go about this with the best performance - here's
one
shot:

1) Pulling the 2 tables into my relational warehouse base
2) An index combines the 2 tables and this index fills up my fact table

My concern is what to do about indexing ? Should I have indexes when
filling
up from base 1 ? Or should the be removed when doing so ? Do I need to
index
the two tables and the view ?

Perhaps I don't need a view - perhaps a could just use the sentence
above
and let index optimizer give me a hint to what to make indexes for ?

Thanx in advance

/Michael



Reply With Quote
  #7  
Old   
Michael Vardinghus
 
Posts: n/a

Default Re: Architecture - 05-15-2005 , 05:26 AM



You hid it right on the knotch

Yeah - it is strange isn't it ? Table will contatin a tremendous amount of
records ....

See my example in another post


"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote

Quote:
Can you post the definitions of the two tables and show how they relate.
Can you also post the definition of the fact table and how this
relates to those tables.

From what I can understand then is that there is a fact table file (4
million )and a dimensions table file (11 million)

The dimensions table file is large because you have multiple rows in
there for each fact table row telling you the dimension key value for
that fact table row.

You must then in the dimensions table file identify the dimension itself
and the value for that dimension

Strange design but without the structure I would have to guess at say
you have a dimensions table file that looks like this

FactRowID, DimensionType, DimensionValue

And your fact table file might look like

FactRowID, val1, Val2....

Your end result is required to be

FactRowID,DimensionName1, DimensionName2.........


You basically need to flatten the dimensions table file.


Am I any nearer?


"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message
news:michaelvardinghus (AT) hotmail (DOT) com:

Its not dimension tables - it's a fact table with the transactions in
table
1 and in table 2 there is the dimension specifications for each
transaction.
The reason for the larger number in table 2 is that
there is used multiple dimensions and for each dimension a new line is
made
but it is referring back to table 1 with only 1 field.

So what I would like to do was to take table a which for instance has 7
columns and then for each dimension TYPE in table 2 I want to add a new
column and in these columns I would like
to write the corresponding dimension value from table 2.

The tables are placed on a different server - and I would like to avoid
to
do something on this server - would prefer to do it on my "washing"
machine...

About the surrogate keys that is a different matter - I will be doing
that
later - after these two tables are merged. My problem is that I don't
have
any dimension values in the first table yet - I need to put these on and
then afterwards replacing these with surrogate keys.

Perhaps I could do something with partitions on table 2 based on
dimension
type ? Would that make the sentence below better ?

"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message
news:#Dpm4lRWFHA.616 (AT) TK2MSFTNGP12 (DOT) phx.gbl...
Michael,

Let me see if I have the question correct. You have 2 source
tables(11
million dimension table rows and 4 million fact table rows?). If that
is the case then what dimension has 11 million rows?

What is the source? Some sources need handling differently
Are the two tables in a database on a different server?
Is this a one off feed?

OR

Do you want to merge the two tables together on the destination from
your two sources?

That might become more complicated if your keys clash. Whilst you
will
be generating new primary key values (Surrogate Key) you will need a
point of reference when loading the fact table so will need the old
key
to refer to.

You could use partitioned views so you would split this 15 million row
table based on an attribute say Date. The view unions the partitions
back together again and the index optimizer is smart enough to know
from
where the rows that you want will come.



mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre
atedb.chm::/cm_8_des_06_17zr.htm




The "combining" idea presented below will hurt you as not only do you
have a lot of rows to play with anyway but you do 2 extra queries for
every row as well.

The index optimizer will use whatever indexes it chooses unless you
override her with an index hint.

Let me know if I have grabbed the wrong meaning from the post and we
can
revisit my comments.

Allan


"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message
news:michaelvardinghus (AT) hotmail (DOT) com:

Getting two tables from one base with 4 mio and 11 mio records.

The 11 mio records are dimension specification on the 4 mio records
so I
need to put these 11 mio records alongsid the 4 mio in my fact
table.

-- Example table 1
create view trans1 as
select 100 as beløb, 1 as linje
union all
select 200 as beløb, 2 as linje

-- Example table 2
create view trans2 as
select 1 as linje, 'a' as dim1, 'dim1' as type
union all
select 1 as linje, 'b' as dim2, 'dim2' as type
union all
select 2 as linje, 'a' as dim1, 'dim1' as typ
union all
select 2 as linje, 'a' as dim2, 'dim2' as type

-- Example of combining
select a.*, (select dim1 from trans2 b where
b.linje = a.linje and b.type = 'dim1') as dim1,
(select dim1 from trans2 c where c.linje = a.linje and
c.type = 'dim2') as dim2
from
trans1 a

My question is how to go about this with the best performance -
here's
one
shot:

1) Pulling the 2 tables into my relational warehouse base
2) An index combines the 2 tables and this index fills up my fact
table

My concern is what to do about indexing ? Should I have indexes when
filling
up from base 1 ? Or should the be removed when doing so ? Do I need
to
index
the two tables and the view ?

Perhaps I don't need a view - perhaps a could just use the sentence
above
and let index optimizer give me a hint to what to make indexes for ?

Thanx in advance

/Michael





Reply With Quote
  #8  
Old   
Michael Vardinghus
 
Posts: n/a

Default Re: Architecture - 05-15-2005 , 05:29 AM



Don't know if you can read the other message but

Table 1 has an entry no, table id which links it to table 2
In table 2 the dimension code values are listed per entry no - with a
corresponding dimension type

And as you say I need to flatten this...

So the dimension type values in table 2 need to be columns in a new table
and in these columns the corresponding dimension code values.

"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote

Quote:
Can you post the definitions of the two tables and show how they relate.
Can you also post the definition of the fact table and how this
relates to those tables.

From what I can understand then is that there is a fact table file (4
million )and a dimensions table file (11 million)

The dimensions table file is large because you have multiple rows in
there for each fact table row telling you the dimension key value for
that fact table row.

You must then in the dimensions table file identify the dimension itself
and the value for that dimension

Strange design but without the structure I would have to guess at say
you have a dimensions table file that looks like this

FactRowID, DimensionType, DimensionValue

And your fact table file might look like

FactRowID, val1, Val2....

Your end result is required to be

FactRowID,DimensionName1, DimensionName2.........


You basically need to flatten the dimensions table file.


Am I any nearer?


"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message
news:michaelvardinghus (AT) hotmail (DOT) com:

Its not dimension tables - it's a fact table with the transactions in
table
1 and in table 2 there is the dimension specifications for each
transaction.
The reason for the larger number in table 2 is that
there is used multiple dimensions and for each dimension a new line is
made
but it is referring back to table 1 with only 1 field.

So what I would like to do was to take table a which for instance has 7
columns and then for each dimension TYPE in table 2 I want to add a new
column and in these columns I would like
to write the corresponding dimension value from table 2.

The tables are placed on a different server - and I would like to avoid
to
do something on this server - would prefer to do it on my "washing"
machine...

About the surrogate keys that is a different matter - I will be doing
that
later - after these two tables are merged. My problem is that I don't
have
any dimension values in the first table yet - I need to put these on and
then afterwards replacing these with surrogate keys.

Perhaps I could do something with partitions on table 2 based on
dimension
type ? Would that make the sentence below better ?

"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message
news:#Dpm4lRWFHA.616 (AT) TK2MSFTNGP12 (DOT) phx.gbl...
Michael,

Let me see if I have the question correct. You have 2 source
tables(11
million dimension table rows and 4 million fact table rows?). If that
is the case then what dimension has 11 million rows?

What is the source? Some sources need handling differently
Are the two tables in a database on a different server?
Is this a one off feed?

OR

Do you want to merge the two tables together on the destination from
your two sources?

That might become more complicated if your keys clash. Whilst you
will
be generating new primary key values (Surrogate Key) you will need a
point of reference when loading the fact table so will need the old
key
to refer to.

You could use partitioned views so you would split this 15 million row
table based on an attribute say Date. The view unions the partitions
back together again and the index optimizer is smart enough to know
from
where the rows that you want will come.



mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre
atedb.chm::/cm_8_des_06_17zr.htm




The "combining" idea presented below will hurt you as not only do you
have a lot of rows to play with anyway but you do 2 extra queries for
every row as well.

The index optimizer will use whatever indexes it chooses unless you
override her with an index hint.

Let me know if I have grabbed the wrong meaning from the post and we
can
revisit my comments.

Allan


"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message
news:michaelvardinghus (AT) hotmail (DOT) com:

Getting two tables from one base with 4 mio and 11 mio records.

The 11 mio records are dimension specification on the 4 mio records
so I
need to put these 11 mio records alongsid the 4 mio in my fact
table.

-- Example table 1
create view trans1 as
select 100 as beløb, 1 as linje
union all
select 200 as beløb, 2 as linje

-- Example table 2
create view trans2 as
select 1 as linje, 'a' as dim1, 'dim1' as type
union all
select 1 as linje, 'b' as dim2, 'dim2' as type
union all
select 2 as linje, 'a' as dim1, 'dim1' as typ
union all
select 2 as linje, 'a' as dim2, 'dim2' as type

-- Example of combining
select a.*, (select dim1 from trans2 b where
b.linje = a.linje and b.type = 'dim1') as dim1,
(select dim1 from trans2 c where c.linje = a.linje and
c.type = 'dim2') as dim2
from
trans1 a

My question is how to go about this with the best performance -
here's
one
shot:

1) Pulling the 2 tables into my relational warehouse base
2) An index combines the 2 tables and this index fills up my fact
table

My concern is what to do about indexing ? Should I have indexes when
filling
up from base 1 ? Or should the be removed when doing so ? Do I need
to
index
the two tables and the view ?

Perhaps I don't need a view - perhaps a could just use the sentence
above
and let index optimizer give me a hint to what to make indexes for ?

Thanx in advance

/Michael





Reply With Quote
  #9  
Old   
Allan Mitchell
 
Posts: n/a

Default Re: Architecture - 05-15-2005 , 06:52 AM



So as Far as I can see what I might do is insert the fact rows into a
dummy fact table as is so NULLs for the dimensionIDs.
You pull all the dimension rows into a working table.
You can now update the dummy fact table with the Dimension IDs by
joining onto the fact table with the ID and the posting date.

You would make multiple passes through the data based on DimensionID in
the dimensions table.

The dimensions table would be indexed so I could retrieve easily and
efficiently the correct rows.

If you do not want to do the processing on the remote server then
bringing all the rows over is certainly an option.

Why are things held like this? Seems strange and inefficient.


Allan



"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote


Quote:
Don't know if you can read the other message but

Table 1 has an entry no, table id which links it to table 2
In table 2 the dimension code values are listed per entry no - with a
corresponding dimension type

And as you say I need to flatten this...

So the dimension type values in table 2 need to be columns in a new table
and in these columns the corresponding dimension code values.

"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message
news:OylKE8SWFHA.2664 (AT) TK2MSFTNGP15 (DOT) phx.gbl...
Can you post the definitions of the two tables and show how they relate.
Can you also post the definition of the fact table and how this
relates to those tables.

From what I can understand then is that there is a fact table file (4
million )and a dimensions table file (11 million)

The dimensions table file is large because you have multiple rows in
there for each fact table row telling you the dimension key value for
that fact table row.

You must then in the dimensions table file identify the dimension itself
and the value for that dimension

Strange design but without the structure I would have to guess at say
you have a dimensions table file that looks like this

FactRowID, DimensionType, DimensionValue

And your fact table file might look like

FactRowID, val1, Val2....

Your end result is required to be

FactRowID,DimensionName1, DimensionName2.........


You basically need to flatten the dimensions table file.


Am I any nearer?


"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message
news:michaelvardinghus (AT) hotmail (DOT) com:

Its not dimension tables - it's a fact table with the transactions in
table
1 and in table 2 there is the dimension specifications for each
transaction.
The reason for the larger number in table 2 is that
there is used multiple dimensions and for each dimension a new line is
made
but it is referring back to table 1 with only 1 field.

So what I would like to do was to take table a which for instance has 7
columns and then for each dimension TYPE in table 2 I want to add a new
column and in these columns I would like
to write the corresponding dimension value from table 2.

The tables are placed on a different server - and I would like to avoid
to
do something on this server - would prefer to do it on my "washing"
machine...

About the surrogate keys that is a different matter - I will be doing
that
later - after these two tables are merged. My problem is that I don't
have
any dimension values in the first table yet - I need to put these on and
then afterwards replacing these with surrogate keys.

Perhaps I could do something with partitions on table 2 based on
dimension
type ? Would that make the sentence below better ?

"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message
news:#Dpm4lRWFHA.616 (AT) TK2MSFTNGP12 (DOT) phx.gbl...
Michael,

Let me see if I have the question correct. You have 2 source
tables(11
million dimension table rows and 4 million fact table rows?). If that
is the case then what dimension has 11 million rows?

What is the source? Some sources need handling differently
Are the two tables in a database on a different server?
Is this a one off feed?

OR

Do you want to merge the two tables together on the destination from
your two sources?

That might become more complicated if your keys clash. Whilst you
will
be generating new primary key values (Surrogate Key) you will need a
point of reference when loading the fact table so will need the old
key
to refer to.

You could use partitioned views so you would split this 15 million row
table based on an attribute say Date. The view unions the partitions
back together again and the index optimizer is smart enough to know
from
where the rows that you want will come.



mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre
atedb.chm::/cm_8_des_06_17zr.htm




The "combining" idea presented below will hurt you as not only do you
have a lot of rows to play with anyway but you do 2 extra queries for
every row as well.

The index optimizer will use whatever indexes it chooses unless you
override her with an index hint.

Let me know if I have grabbed the wrong meaning from the post and we
can
revisit my comments.

Allan


"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message
news:michaelvardinghus (AT) hotmail (DOT) com:

Getting two tables from one base with 4 mio and 11 mio records.

The 11 mio records are dimension specification on the 4 mio records
so I
need to put these 11 mio records alongsid the 4 mio in my fact
table.

-- Example table 1
create view trans1 as
select 100 as beløb, 1 as linje
union all
select 200 as beløb, 2 as linje

-- Example table 2
create view trans2 as
select 1 as linje, 'a' as dim1, 'dim1' as type
union all
select 1 as linje, 'b' as dim2, 'dim2' as type
union all
select 2 as linje, 'a' as dim1, 'dim1' as typ
union all
select 2 as linje, 'a' as dim2, 'dim2' as type

-- Example of combining
select a.*, (select dim1 from trans2 b where
b.linje = a.linje and b.type = 'dim1') as dim1,
(select dim1 from trans2 c where c.linje = a.linje and
c.type = 'dim2') as dim2
from
trans1 a

My question is how to go about this with the best performance -
here's
one
shot:

1) Pulling the 2 tables into my relational warehouse base
2) An index combines the 2 tables and this index fills up my fact
table

My concern is what to do about indexing ? Should I have indexes when
filling
up from base 1 ? Or should the be removed when doing so ? Do I need
to
index
the two tables and the view ?

Perhaps I don't need a view - perhaps a could just use the sentence
above
and let index optimizer give me a hint to what to make indexes for ?

Thanx in advance

/Michael




Reply With Quote
  #10  
Old   
Michael Vardinghus
 
Posts: n/a

Default Re: Architecture - 05-15-2005 , 07:36 AM



thanx again Allan

So this is it

Table 1 on a new base equals table 1 on transaction base
Table 2 on a new base equals table 2 on trasaction base
Table 3 on a new base is a dummy table and this is used like this (in this
table I set 0 instead of null as default for dimension values)
insert records from table 1
update dimension type a from table 2
update dimension type a from table 2

And with indexes on table 1 and 2 ?
Should these indexes be removed when filling up from transaction base and
applied when sending from 1/2 to 3 in new base ?
I'm not that clever at indexing - only used the wizard so far - how would
indexing look like - just index what I'm using in the where clauses ?


"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote

Quote:
So as Far as I can see what I might do is insert the fact rows into a
dummy fact table as is so NULLs for the dimensionIDs.
You pull all the dimension rows into a working table.
You can now update the dummy fact table with the Dimension IDs by
joining onto the fact table with the ID and the posting date.

You would make multiple passes through the data based on DimensionID in
the dimensions table.

The dimensions table would be indexed so I could retrieve easily and
efficiently the correct rows.

If you do not want to do the processing on the remote server then
bringing all the rows over is certainly an option.

Why are things held like this? Seems strange and inefficient.


Allan



"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message
news:michaelvardinghus (AT) hotmail (DOT) com:

Don't know if you can read the other message but

Table 1 has an entry no, table id which links it to table 2
In table 2 the dimension code values are listed per entry no - with a
corresponding dimension type

And as you say I need to flatten this...

So the dimension type values in table 2 need to be columns in a new
table
and in these columns the corresponding dimension code values.

"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message
news:OylKE8SWFHA.2664 (AT) TK2MSFTNGP15 (DOT) phx.gbl...
Can you post the definitions of the two tables and show how they
relate.
Can you also post the definition of the fact table and how this
relates to those tables.

From what I can understand then is that there is a fact table file (4
million )and a dimensions table file (11 million)

The dimensions table file is large because you have multiple rows in
there for each fact table row telling you the dimension key value for
that fact table row.

You must then in the dimensions table file identify the dimension
itself
and the value for that dimension

Strange design but without the structure I would have to guess at say
you have a dimensions table file that looks like this

FactRowID, DimensionType, DimensionValue

And your fact table file might look like

FactRowID, val1, Val2....

Your end result is required to be

FactRowID,DimensionName1, DimensionName2.........


You basically need to flatten the dimensions table file.


Am I any nearer?


"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message
news:michaelvardinghus (AT) hotmail (DOT) com:

Its not dimension tables - it's a fact table with the transactions
in
table
1 and in table 2 there is the dimension specifications for each
transaction.
The reason for the larger number in table 2 is that
there is used multiple dimensions and for each dimension a new line
is
made
but it is referring back to table 1 with only 1 field.

So what I would like to do was to take table a which for instance
has 7
columns and then for each dimension TYPE in table 2 I want to add a
new
column and in these columns I would like
to write the corresponding dimension value from table 2.

The tables are placed on a different server - and I would like to
avoid
to
do something on this server - would prefer to do it on my "washing"
machine...

About the surrogate keys that is a different matter - I will be
doing
that
later - after these two tables are merged. My problem is that I
don't
have
any dimension values in the first table yet - I need to put these on
and
then afterwards replacing these with surrogate keys.

Perhaps I could do something with partitions on table 2 based on
dimension
type ? Would that make the sentence below better ?

"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message
news:#Dpm4lRWFHA.616 (AT) TK2MSFTNGP12 (DOT) phx.gbl...
Michael,

Let me see if I have the question correct. You have 2 source
tables(11
million dimension table rows and 4 million fact table rows?). If
that
is the case then what dimension has 11 million rows?

What is the source? Some sources need handling differently
Are the two tables in a database on a different server?
Is this a one off feed?

OR

Do you want to merge the two tables together on the destination
from
your two sources?

That might become more complicated if your keys clash. Whilst you
will
be generating new primary key values (Surrogate Key) you will need
a
point of reference when loading the fact table so will need the
old
key
to refer to.

You could use partitioned views so you would split this 15 million
row
table based on an attribute say Date. The view unions the
partitions
back together again and the index optimizer is smart enough to
know
from
where the rows that you want will come.




mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre
atedb.chm::/cm_8_des_06_17zr.htm




The "combining" idea presented below will hurt you as not only do
you
have a lot of rows to play with anyway but you do 2 extra queries
for
every row as well.

The index optimizer will use whatever indexes it chooses unless
you
override her with an index hint.

Let me know if I have grabbed the wrong meaning from the post and
we
can
revisit my comments.

Allan


"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in
message
news:michaelvardinghus (AT) hotmail (DOT) com:

Getting two tables from one base with 4 mio and 11 mio records.

The 11 mio records are dimension specification on the 4 mio
records
so I
need to put these 11 mio records alongsid the 4 mio in my fact
table.

-- Example table 1
create view trans1 as
select 100 as beløb, 1 as linje
union all
select 200 as beløb, 2 as linje

-- Example table 2
create view trans2 as
select 1 as linje, 'a' as dim1, 'dim1' as type
union all
select 1 as linje, 'b' as dim2, 'dim2' as type
union all
select 2 as linje, 'a' as dim1, 'dim1' as typ
union all
select 2 as linje, 'a' as dim2, 'dim2' as type

-- Example of combining
select a.*, (select dim1 from trans2 b where
b.linje = a.linje and b.type = 'dim1') as dim1,
(select dim1 from trans2 c where c.linje = a.linje and
c.type = 'dim2') as dim2
from
trans1 a

My question is how to go about this with the best performance -
here's
one
shot:

1) Pulling the 2 tables into my relational warehouse base
2) An index combines the 2 tables and this index fills up my
fact
table

My concern is what to do about indexing ? Should I have indexes
when
filling
up from base 1 ? Or should the be removed when doing so ? Do I
need
to
index
the two tables and the view ?

Perhaps I don't need a view - perhaps a could just use the
sentence
above
and let index optimizer give me a hint to what to make indexes
for ?

Thanx in advance

/Michael






Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.