![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Getting two tables from one base with 4 mio and 11 mio records. The 11 mio records are dimension specification on the 4 mio records so I need to put these 11 mio records alongsid the 4 mio in my fact table. -- Example table 1 create view trans1 as select 100 as beløb, 1 as linje union all select 200 as beløb, 2 as linje -- Example table 2 create view trans2 as select 1 as linje, 'a' as dim1, 'dim1' as type union all select 1 as linje, 'b' as dim2, 'dim2' as type union all select 2 as linje, 'a' as dim1, 'dim1' as typ union all select 2 as linje, 'a' as dim2, 'dim2' as type -- Example of combining select a.*, (select dim1 from trans2 b where b.linje = a.linje and b.type = 'dim1') as dim1, (select dim1 from trans2 c where c.linje = a.linje and c.type = 'dim2') as dim2 from trans1 a My question is how to go about this with the best performance - here's one shot: 1) Pulling the 2 tables into my relational warehouse base 2) An index combines the 2 tables and this index fills up my fact table My concern is what to do about indexing ? Should I have indexes when filling up from base 1 ? Or should the be removed when doing so ? Do I need to index the two tables and the view ? Perhaps I don't need a view - perhaps a could just use the sentence above and let index optimizer give me a hint to what to make indexes for ? Thanx in advance /Michael |
#3
| |||
| |||
|
|
Michael, Let me see if I have the question correct. You have 2 source tables(11 million dimension table rows and 4 million fact table rows?). If that is the case then what dimension has 11 million rows? What is the source? Some sources need handling differently Are the two tables in a database on a different server? Is this a one off feed? OR Do you want to merge the two tables together on the destination from your two sources? That might become more complicated if your keys clash. Whilst you will be generating new primary key values (Surrogate Key) you will need a point of reference when loading the fact table so will need the old key to refer to. You could use partitioned views so you would split this 15 million row table based on an attribute say Date. The view unions the partitions back together again and the index optimizer is smart enough to know from where the rows that you want will come. mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre |
| The "combining" idea presented below will hurt you as not only do you have a lot of rows to play with anyway but you do 2 extra queries for every row as well. The index optimizer will use whatever indexes it chooses unless you override her with an index hint. Let me know if I have grabbed the wrong meaning from the post and we can revisit my comments. Allan "Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message news:michaelvardinghus (AT) hotmail (DOT) com: Getting two tables from one base with 4 mio and 11 mio records. The 11 mio records are dimension specification on the 4 mio records so I need to put these 11 mio records alongsid the 4 mio in my fact table. -- Example table 1 create view trans1 as select 100 as beløb, 1 as linje union all select 200 as beløb, 2 as linje -- Example table 2 create view trans2 as select 1 as linje, 'a' as dim1, 'dim1' as type union all select 1 as linje, 'b' as dim2, 'dim2' as type union all select 2 as linje, 'a' as dim1, 'dim1' as typ union all select 2 as linje, 'a' as dim2, 'dim2' as type -- Example of combining select a.*, (select dim1 from trans2 b where b.linje = a.linje and b.type = 'dim1') as dim1, (select dim1 from trans2 c where c.linje = a.linje and c.type = 'dim2') as dim2 from trans1 a My question is how to go about this with the best performance - here's one shot: 1) Pulling the 2 tables into my relational warehouse base 2) An index combines the 2 tables and this index fills up my fact table My concern is what to do about indexing ? Should I have indexes when filling up from base 1 ? Or should the be removed when doing so ? Do I need to index the two tables and the view ? Perhaps I don't need a view - perhaps a could just use the sentence above and let index optimizer give me a hint to what to make indexes for ? Thanx in advance /Michael |
#4
| |||
| |||
|
|
Michael, Let me see if I have the question correct. You have 2 source tables(11 million dimension table rows and 4 million fact table rows?). If that is the case then what dimension has 11 million rows? What is the source? Some sources need handling differently Are the two tables in a database on a different server? Is this a one off feed? OR Do you want to merge the two tables together on the destination from your two sources? That might become more complicated if your keys clash. Whilst you will be generating new primary key values (Surrogate Key) you will need a point of reference when loading the fact table so will need the old key to refer to. You could use partitioned views so you would split this 15 million row table based on an attribute say Date. The view unions the partitions back together again and the index optimizer is smart enough to know from where the rows that you want will come. mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre |
| The "combining" idea presented below will hurt you as not only do you have a lot of rows to play with anyway but you do 2 extra queries for every row as well. The index optimizer will use whatever indexes it chooses unless you override her with an index hint. Let me know if I have grabbed the wrong meaning from the post and we can revisit my comments. Allan "Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message news:michaelvardinghus (AT) hotmail (DOT) com: Getting two tables from one base with 4 mio and 11 mio records. The 11 mio records are dimension specification on the 4 mio records so I need to put these 11 mio records alongsid the 4 mio in my fact table. -- Example table 1 create view trans1 as select 100 as beløb, 1 as linje union all select 200 as beløb, 2 as linje -- Example table 2 create view trans2 as select 1 as linje, 'a' as dim1, 'dim1' as type union all select 1 as linje, 'b' as dim2, 'dim2' as type union all select 2 as linje, 'a' as dim1, 'dim1' as typ union all select 2 as linje, 'a' as dim2, 'dim2' as type -- Example of combining select a.*, (select dim1 from trans2 b where b.linje = a.linje and b.type = 'dim1') as dim1, (select dim1 from trans2 c where c.linje = a.linje and c.type = 'dim2') as dim2 from trans1 a My question is how to go about this with the best performance - here's one shot: 1) Pulling the 2 tables into my relational warehouse base 2) An index combines the 2 tables and this index fills up my fact table My concern is what to do about indexing ? Should I have indexes when filling up from base 1 ? Or should the be removed when doing so ? Do I need to index the two tables and the view ? Perhaps I don't need a view - perhaps a could just use the sentence above and let index optimizer give me a hint to what to make indexes for ? Thanx in advance /Michael |
#5
| |||
| |||
|
|
Real data example Table 1 Entry no Posting date Description Amount 1 1999-10-06 00:00:00.000 Rammeaft. udb. a/c -8895277.00000000000000000000 2 1999-10-06 00:00:00.000 500100496 Ulandsekr. 8895277.00000000000000000000 3 1999-10-06 00:00:00.000 1. rate -1610400.00000000000000000000 Table 2 Table ID Entry no Dimension Code Dimension value code Amount Posting date 17 1 DELREGNSKAB -8895277.00000000000000000000 1999-10-06 00:00:00.000 17 2 DELREGNSKAB 700 8895277.00000000000000000000 1999-10-06 00:00:00.000 17 2 STED 6091 8895277.00000000000000000000 1999-10-06 00:00:00.000 17 3 DELREGNSKAB -1610400.00000000000000000000 1999-10-06 00:00:00.000 I will alwas only need table id 17 end Entry no joins them ... So this is what i want New table Entry no Posting date Description Amount Delregnskab Sted 1 1999-10-06 00:00:00.000 Rammeaft. udb. a/c -8895277.00000000000000000000 700 null 2 1999-10-06 00:00:00.000 500100496 Ulandsekr. 8895277.00000000000000000000 700 6091 3 1999-10-06 00:00:00.000 1. rate -1610400.00000000000000000000 700 null If possible i would like to eliminate nulls at once but that could be washed afterwards Besides delregnskab and sted there is 3-4 other dimension that should be handled the same way. "Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message news:#Dpm4lRWFHA.616 (AT) TK2MSFTNGP12 (DOT) phx.gbl... Michael, Let me see if I have the question correct. You have 2 source tables(11 million dimension table rows and 4 million fact table rows?). If that is the case then what dimension has 11 million rows? What is the source? Some sources need handling differently Are the two tables in a database on a different server? Is this a one off feed? OR Do you want to merge the two tables together on the destination from your two sources? That might become more complicated if your keys clash. Whilst you will be generating new primary key values (Surrogate Key) you will need a point of reference when loading the fact table so will need the old key to refer to. You could use partitioned views so you would split this 15 million row table based on an attribute say Date. The view unions the partitions back together again and the index optimizer is smart enough to know from where the rows that you want will come. mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre atedb.chm::/cm_8_des_06_17zr.htm The "combining" idea presented below will hurt you as not only do you have a lot of rows to play with anyway but you do 2 extra queries for every row as well. The index optimizer will use whatever indexes it chooses unless you override her with an index hint. Let me know if I have grabbed the wrong meaning from the post and we can revisit my comments. Allan "Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message news:michaelvardinghus (AT) hotmail (DOT) com: Getting two tables from one base with 4 mio and 11 mio records. The 11 mio records are dimension specification on the 4 mio records so I need to put these 11 mio records alongsid the 4 mio in my fact table. -- Example table 1 create view trans1 as select 100 as beløb, 1 as linje union all select 200 as beløb, 2 as linje -- Example table 2 create view trans2 as select 1 as linje, 'a' as dim1, 'dim1' as type union all select 1 as linje, 'b' as dim2, 'dim2' as type union all select 2 as linje, 'a' as dim1, 'dim1' as typ union all select 2 as linje, 'a' as dim2, 'dim2' as type -- Example of combining select a.*, (select dim1 from trans2 b where b.linje = a.linje and b.type = 'dim1') as dim1, (select dim1 from trans2 c where c.linje = a.linje and c.type = 'dim2') as dim2 from trans1 a My question is how to go about this with the best performance - here's one shot: 1) Pulling the 2 tables into my relational warehouse base 2) An index combines the 2 tables and this index fills up my fact table My concern is what to do about indexing ? Should I have indexes when filling up from base 1 ? Or should the be removed when doing so ? Do I need to index the two tables and the view ? Perhaps I don't need a view - perhaps a could just use the sentence above and let index optimizer give me a hint to what to make indexes for ? Thanx in advance /Michael |
#6
| |||
| |||
|
|
Its not dimension tables - it's a fact table with the transactions in table 1 and in table 2 there is the dimension specifications for each transaction. The reason for the larger number in table 2 is that there is used multiple dimensions and for each dimension a new line is made but it is referring back to table 1 with only 1 field. So what I would like to do was to take table a which for instance has 7 columns and then for each dimension TYPE in table 2 I want to add a new column and in these columns I would like to write the corresponding dimension value from table 2. The tables are placed on a different server - and I would like to avoid to do something on this server - would prefer to do it on my "washing" machine... About the surrogate keys that is a different matter - I will be doing that later - after these two tables are merged. My problem is that I don't have any dimension values in the first table yet - I need to put these on and then afterwards replacing these with surrogate keys. Perhaps I could do something with partitions on table 2 based on dimension type ? Would that make the sentence below better ? "Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message news:#Dpm4lRWFHA.616 (AT) TK2MSFTNGP12 (DOT) phx.gbl... Michael, Let me see if I have the question correct. You have 2 source tables(11 million dimension table rows and 4 million fact table rows?). If that is the case then what dimension has 11 million rows? What is the source? Some sources need handling differently Are the two tables in a database on a different server? Is this a one off feed? OR Do you want to merge the two tables together on the destination from your two sources? That might become more complicated if your keys clash. Whilst you will be generating new primary key values (Surrogate Key) you will need a point of reference when loading the fact table so will need the old key to refer to. You could use partitioned views so you would split this 15 million row table based on an attribute say Date. The view unions the partitions back together again and the index optimizer is smart enough to know from where the rows that you want will come. mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre atedb.chm::/cm_8_des_06_17zr.htm The "combining" idea presented below will hurt you as not only do you have a lot of rows to play with anyway but you do 2 extra queries for every row as well. The index optimizer will use whatever indexes it chooses unless you override her with an index hint. Let me know if I have grabbed the wrong meaning from the post and we can revisit my comments. Allan "Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message news:michaelvardinghus (AT) hotmail (DOT) com: Getting two tables from one base with 4 mio and 11 mio records. The 11 mio records are dimension specification on the 4 mio records so I need to put these 11 mio records alongsid the 4 mio in my fact table. -- Example table 1 create view trans1 as select 100 as beløb, 1 as linje union all select 200 as beløb, 2 as linje -- Example table 2 create view trans2 as select 1 as linje, 'a' as dim1, 'dim1' as type union all select 1 as linje, 'b' as dim2, 'dim2' as type union all select 2 as linje, 'a' as dim1, 'dim1' as typ union all select 2 as linje, 'a' as dim2, 'dim2' as type -- Example of combining select a.*, (select dim1 from trans2 b where b.linje = a.linje and b.type = 'dim1') as dim1, (select dim1 from trans2 c where c.linje = a.linje and c.type = 'dim2') as dim2 from trans1 a My question is how to go about this with the best performance - here's one shot: 1) Pulling the 2 tables into my relational warehouse base 2) An index combines the 2 tables and this index fills up my fact table My concern is what to do about indexing ? Should I have indexes when filling up from base 1 ? Or should the be removed when doing so ? Do I need to index the two tables and the view ? Perhaps I don't need a view - perhaps a could just use the sentence above and let index optimizer give me a hint to what to make indexes for ? Thanx in advance /Michael |
#7
| |||
| |||
|
|
Can you post the definitions of the two tables and show how they relate. Can you also post the definition of the fact table and how this relates to those tables. From what I can understand then is that there is a fact table file (4 million )and a dimensions table file (11 million) The dimensions table file is large because you have multiple rows in there for each fact table row telling you the dimension key value for that fact table row. You must then in the dimensions table file identify the dimension itself and the value for that dimension Strange design but without the structure I would have to guess at say you have a dimensions table file that looks like this FactRowID, DimensionType, DimensionValue And your fact table file might look like FactRowID, val1, Val2.... Your end result is required to be FactRowID,DimensionName1, DimensionName2......... You basically need to flatten the dimensions table file. Am I any nearer? "Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message news:michaelvardinghus (AT) hotmail (DOT) com: Its not dimension tables - it's a fact table with the transactions in table 1 and in table 2 there is the dimension specifications for each transaction. The reason for the larger number in table 2 is that there is used multiple dimensions and for each dimension a new line is made but it is referring back to table 1 with only 1 field. So what I would like to do was to take table a which for instance has 7 columns and then for each dimension TYPE in table 2 I want to add a new column and in these columns I would like to write the corresponding dimension value from table 2. The tables are placed on a different server - and I would like to avoid to do something on this server - would prefer to do it on my "washing" machine... About the surrogate keys that is a different matter - I will be doing that later - after these two tables are merged. My problem is that I don't have any dimension values in the first table yet - I need to put these on and then afterwards replacing these with surrogate keys. Perhaps I could do something with partitions on table 2 based on dimension type ? Would that make the sentence below better ? "Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message news:#Dpm4lRWFHA.616 (AT) TK2MSFTNGP12 (DOT) phx.gbl... Michael, Let me see if I have the question correct. You have 2 source tables(11 million dimension table rows and 4 million fact table rows?). If that is the case then what dimension has 11 million rows? What is the source? Some sources need handling differently Are the two tables in a database on a different server? Is this a one off feed? OR Do you want to merge the two tables together on the destination from your two sources? That might become more complicated if your keys clash. Whilst you will be generating new primary key values (Surrogate Key) you will need a point of reference when loading the fact table so will need the old key to refer to. You could use partitioned views so you would split this 15 million row table based on an attribute say Date. The view unions the partitions back together again and the index optimizer is smart enough to know from where the rows that you want will come. mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre atedb.chm::/cm_8_des_06_17zr.htm The "combining" idea presented below will hurt you as not only do you have a lot of rows to play with anyway but you do 2 extra queries for every row as well. The index optimizer will use whatever indexes it chooses unless you override her with an index hint. Let me know if I have grabbed the wrong meaning from the post and we can revisit my comments. Allan "Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message news:michaelvardinghus (AT) hotmail (DOT) com: Getting two tables from one base with 4 mio and 11 mio records. The 11 mio records are dimension specification on the 4 mio records so I need to put these 11 mio records alongsid the 4 mio in my fact table. -- Example table 1 create view trans1 as select 100 as beløb, 1 as linje union all select 200 as beløb, 2 as linje -- Example table 2 create view trans2 as select 1 as linje, 'a' as dim1, 'dim1' as type union all select 1 as linje, 'b' as dim2, 'dim2' as type union all select 2 as linje, 'a' as dim1, 'dim1' as typ union all select 2 as linje, 'a' as dim2, 'dim2' as type -- Example of combining select a.*, (select dim1 from trans2 b where b.linje = a.linje and b.type = 'dim1') as dim1, (select dim1 from trans2 c where c.linje = a.linje and c.type = 'dim2') as dim2 from trans1 a My question is how to go about this with the best performance - here's one shot: 1) Pulling the 2 tables into my relational warehouse base 2) An index combines the 2 tables and this index fills up my fact table My concern is what to do about indexing ? Should I have indexes when filling up from base 1 ? Or should the be removed when doing so ? Do I need to index the two tables and the view ? Perhaps I don't need a view - perhaps a could just use the sentence above and let index optimizer give me a hint to what to make indexes for ? Thanx in advance /Michael |
#8
| |||
| |||
|
|
Can you post the definitions of the two tables and show how they relate. Can you also post the definition of the fact table and how this relates to those tables. From what I can understand then is that there is a fact table file (4 million )and a dimensions table file (11 million) The dimensions table file is large because you have multiple rows in there for each fact table row telling you the dimension key value for that fact table row. You must then in the dimensions table file identify the dimension itself and the value for that dimension Strange design but without the structure I would have to guess at say you have a dimensions table file that looks like this FactRowID, DimensionType, DimensionValue And your fact table file might look like FactRowID, val1, Val2.... Your end result is required to be FactRowID,DimensionName1, DimensionName2......... You basically need to flatten the dimensions table file. Am I any nearer? "Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message news:michaelvardinghus (AT) hotmail (DOT) com: Its not dimension tables - it's a fact table with the transactions in table 1 and in table 2 there is the dimension specifications for each transaction. The reason for the larger number in table 2 is that there is used multiple dimensions and for each dimension a new line is made but it is referring back to table 1 with only 1 field. So what I would like to do was to take table a which for instance has 7 columns and then for each dimension TYPE in table 2 I want to add a new column and in these columns I would like to write the corresponding dimension value from table 2. The tables are placed on a different server - and I would like to avoid to do something on this server - would prefer to do it on my "washing" machine... About the surrogate keys that is a different matter - I will be doing that later - after these two tables are merged. My problem is that I don't have any dimension values in the first table yet - I need to put these on and then afterwards replacing these with surrogate keys. Perhaps I could do something with partitions on table 2 based on dimension type ? Would that make the sentence below better ? "Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message news:#Dpm4lRWFHA.616 (AT) TK2MSFTNGP12 (DOT) phx.gbl... Michael, Let me see if I have the question correct. You have 2 source tables(11 million dimension table rows and 4 million fact table rows?). If that is the case then what dimension has 11 million rows? What is the source? Some sources need handling differently Are the two tables in a database on a different server? Is this a one off feed? OR Do you want to merge the two tables together on the destination from your two sources? That might become more complicated if your keys clash. Whilst you will be generating new primary key values (Surrogate Key) you will need a point of reference when loading the fact table so will need the old key to refer to. You could use partitioned views so you would split this 15 million row table based on an attribute say Date. The view unions the partitions back together again and the index optimizer is smart enough to know from where the rows that you want will come. mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre atedb.chm::/cm_8_des_06_17zr.htm The "combining" idea presented below will hurt you as not only do you have a lot of rows to play with anyway but you do 2 extra queries for every row as well. The index optimizer will use whatever indexes it chooses unless you override her with an index hint. Let me know if I have grabbed the wrong meaning from the post and we can revisit my comments. Allan "Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message news:michaelvardinghus (AT) hotmail (DOT) com: Getting two tables from one base with 4 mio and 11 mio records. The 11 mio records are dimension specification on the 4 mio records so I need to put these 11 mio records alongsid the 4 mio in my fact table. -- Example table 1 create view trans1 as select 100 as beløb, 1 as linje union all select 200 as beløb, 2 as linje -- Example table 2 create view trans2 as select 1 as linje, 'a' as dim1, 'dim1' as type union all select 1 as linje, 'b' as dim2, 'dim2' as type union all select 2 as linje, 'a' as dim1, 'dim1' as typ union all select 2 as linje, 'a' as dim2, 'dim2' as type -- Example of combining select a.*, (select dim1 from trans2 b where b.linje = a.linje and b.type = 'dim1') as dim1, (select dim1 from trans2 c where c.linje = a.linje and c.type = 'dim2') as dim2 from trans1 a My question is how to go about this with the best performance - here's one shot: 1) Pulling the 2 tables into my relational warehouse base 2) An index combines the 2 tables and this index fills up my fact table My concern is what to do about indexing ? Should I have indexes when filling up from base 1 ? Or should the be removed when doing so ? Do I need to index the two tables and the view ? Perhaps I don't need a view - perhaps a could just use the sentence above and let index optimizer give me a hint to what to make indexes for ? Thanx in advance /Michael |
#9
| |||
| |||
|
|
Don't know if you can read the other message but Table 1 has an entry no, table id which links it to table 2 In table 2 the dimension code values are listed per entry no - with a corresponding dimension type And as you say I need to flatten this... So the dimension type values in table 2 need to be columns in a new table and in these columns the corresponding dimension code values. "Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message news:OylKE8SWFHA.2664 (AT) TK2MSFTNGP15 (DOT) phx.gbl... Can you post the definitions of the two tables and show how they relate. Can you also post the definition of the fact table and how this relates to those tables. From what I can understand then is that there is a fact table file (4 million )and a dimensions table file (11 million) The dimensions table file is large because you have multiple rows in there for each fact table row telling you the dimension key value for that fact table row. You must then in the dimensions table file identify the dimension itself and the value for that dimension Strange design but without the structure I would have to guess at say you have a dimensions table file that looks like this FactRowID, DimensionType, DimensionValue And your fact table file might look like FactRowID, val1, Val2.... Your end result is required to be FactRowID,DimensionName1, DimensionName2......... You basically need to flatten the dimensions table file. Am I any nearer? "Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message news:michaelvardinghus (AT) hotmail (DOT) com: Its not dimension tables - it's a fact table with the transactions in table 1 and in table 2 there is the dimension specifications for each transaction. The reason for the larger number in table 2 is that there is used multiple dimensions and for each dimension a new line is made but it is referring back to table 1 with only 1 field. So what I would like to do was to take table a which for instance has 7 columns and then for each dimension TYPE in table 2 I want to add a new column and in these columns I would like to write the corresponding dimension value from table 2. The tables are placed on a different server - and I would like to avoid to do something on this server - would prefer to do it on my "washing" machine... About the surrogate keys that is a different matter - I will be doing that later - after these two tables are merged. My problem is that I don't have any dimension values in the first table yet - I need to put these on and then afterwards replacing these with surrogate keys. Perhaps I could do something with partitions on table 2 based on dimension type ? Would that make the sentence below better ? "Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message news:#Dpm4lRWFHA.616 (AT) TK2MSFTNGP12 (DOT) phx.gbl... Michael, Let me see if I have the question correct. You have 2 source tables(11 million dimension table rows and 4 million fact table rows?). If that is the case then what dimension has 11 million rows? What is the source? Some sources need handling differently Are the two tables in a database on a different server? Is this a one off feed? OR Do you want to merge the two tables together on the destination from your two sources? That might become more complicated if your keys clash. Whilst you will be generating new primary key values (Surrogate Key) you will need a point of reference when loading the fact table so will need the old key to refer to. You could use partitioned views so you would split this 15 million row table based on an attribute say Date. The view unions the partitions back together again and the index optimizer is smart enough to know from where the rows that you want will come. mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre atedb.chm::/cm_8_des_06_17zr.htm The "combining" idea presented below will hurt you as not only do you have a lot of rows to play with anyway but you do 2 extra queries for every row as well. The index optimizer will use whatever indexes it chooses unless you override her with an index hint. Let me know if I have grabbed the wrong meaning from the post and we can revisit my comments. Allan "Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message news:michaelvardinghus (AT) hotmail (DOT) com: Getting two tables from one base with 4 mio and 11 mio records. The 11 mio records are dimension specification on the 4 mio records so I need to put these 11 mio records alongsid the 4 mio in my fact table. -- Example table 1 create view trans1 as select 100 as beløb, 1 as linje union all select 200 as beløb, 2 as linje -- Example table 2 create view trans2 as select 1 as linje, 'a' as dim1, 'dim1' as type union all select 1 as linje, 'b' as dim2, 'dim2' as type union all select 2 as linje, 'a' as dim1, 'dim1' as typ union all select 2 as linje, 'a' as dim2, 'dim2' as type -- Example of combining select a.*, (select dim1 from trans2 b where b.linje = a.linje and b.type = 'dim1') as dim1, (select dim1 from trans2 c where c.linje = a.linje and c.type = 'dim2') as dim2 from trans1 a My question is how to go about this with the best performance - here's one shot: 1) Pulling the 2 tables into my relational warehouse base 2) An index combines the 2 tables and this index fills up my fact table My concern is what to do about indexing ? Should I have indexes when filling up from base 1 ? Or should the be removed when doing so ? Do I need to index the two tables and the view ? Perhaps I don't need a view - perhaps a could just use the sentence above and let index optimizer give me a hint to what to make indexes for ? Thanx in advance /Michael |
#10
| |||
| |||
|
|
So as Far as I can see what I might do is insert the fact rows into a dummy fact table as is so NULLs for the dimensionIDs. You pull all the dimension rows into a working table. You can now update the dummy fact table with the Dimension IDs by joining onto the fact table with the ID and the posting date. You would make multiple passes through the data based on DimensionID in the dimensions table. The dimensions table would be indexed so I could retrieve easily and efficiently the correct rows. If you do not want to do the processing on the remote server then bringing all the rows over is certainly an option. Why are things held like this? Seems strange and inefficient. Allan "Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message news:michaelvardinghus (AT) hotmail (DOT) com: Don't know if you can read the other message but Table 1 has an entry no, table id which links it to table 2 In table 2 the dimension code values are listed per entry no - with a corresponding dimension type And as you say I need to flatten this... So the dimension type values in table 2 need to be columns in a new table and in these columns the corresponding dimension code values. "Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message news:OylKE8SWFHA.2664 (AT) TK2MSFTNGP15 (DOT) phx.gbl... Can you post the definitions of the two tables and show how they relate. Can you also post the definition of the fact table and how this relates to those tables. From what I can understand then is that there is a fact table file (4 million )and a dimensions table file (11 million) The dimensions table file is large because you have multiple rows in there for each fact table row telling you the dimension key value for that fact table row. You must then in the dimensions table file identify the dimension itself and the value for that dimension Strange design but without the structure I would have to guess at say you have a dimensions table file that looks like this FactRowID, DimensionType, DimensionValue And your fact table file might look like FactRowID, val1, Val2.... Your end result is required to be FactRowID,DimensionName1, DimensionName2......... You basically need to flatten the dimensions table file. Am I any nearer? "Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message news:michaelvardinghus (AT) hotmail (DOT) com: Its not dimension tables - it's a fact table with the transactions in table 1 and in table 2 there is the dimension specifications for each transaction. The reason for the larger number in table 2 is that there is used multiple dimensions and for each dimension a new line is made but it is referring back to table 1 with only 1 field. So what I would like to do was to take table a which for instance has 7 columns and then for each dimension TYPE in table 2 I want to add a new column and in these columns I would like to write the corresponding dimension value from table 2. The tables are placed on a different server - and I would like to avoid to do something on this server - would prefer to do it on my "washing" machine... About the surrogate keys that is a different matter - I will be doing that later - after these two tables are merged. My problem is that I don't have any dimension values in the first table yet - I need to put these on and then afterwards replacing these with surrogate keys. Perhaps I could do something with partitions on table 2 based on dimension type ? Would that make the sentence below better ? "Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message news:#Dpm4lRWFHA.616 (AT) TK2MSFTNGP12 (DOT) phx.gbl... Michael, Let me see if I have the question correct. You have 2 source tables(11 million dimension table rows and 4 million fact table rows?). If that is the case then what dimension has 11 million rows? What is the source? Some sources need handling differently Are the two tables in a database on a different server? Is this a one off feed? OR Do you want to merge the two tables together on the destination from your two sources? That might become more complicated if your keys clash. Whilst you will be generating new primary key values (Surrogate Key) you will need a point of reference when loading the fact table so will need the old key to refer to. You could use partitioned views so you would split this 15 million row table based on an attribute say Date. The view unions the partitions back together again and the index optimizer is smart enough to know from where the rows that you want will come. mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%2 0Server\80\Tools\Books\cre atedb.chm::/cm_8_des_06_17zr.htm The "combining" idea presented below will hurt you as not only do you have a lot of rows to play with anyway but you do 2 extra queries for every row as well. The index optimizer will use whatever indexes it chooses unless you override her with an index hint. Let me know if I have grabbed the wrong meaning from the post and we can revisit my comments. Allan "Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> wrote in message news:michaelvardinghus (AT) hotmail (DOT) com: Getting two tables from one base with 4 mio and 11 mio records. The 11 mio records are dimension specification on the 4 mio records so I need to put these 11 mio records alongsid the 4 mio in my fact table. -- Example table 1 create view trans1 as select 100 as beløb, 1 as linje union all select 200 as beløb, 2 as linje -- Example table 2 create view trans2 as select 1 as linje, 'a' as dim1, 'dim1' as type union all select 1 as linje, 'b' as dim2, 'dim2' as type union all select 2 as linje, 'a' as dim1, 'dim1' as typ union all select 2 as linje, 'a' as dim2, 'dim2' as type -- Example of combining select a.*, (select dim1 from trans2 b where b.linje = a.linje and b.type = 'dim1') as dim1, (select dim1 from trans2 c where c.linje = a.linje and c.type = 'dim2') as dim2 from trans1 a My question is how to go about this with the best performance - here's one shot: 1) Pulling the 2 tables into my relational warehouse base 2) An index combines the 2 tables and this index fills up my fact table My concern is what to do about indexing ? Should I have indexes when filling up from base 1 ? Or should the be removed when doing so ? Do I need to index the two tables and the view ? Perhaps I don't need a view - perhaps a could just use the sentence above and let index optimizer give me a hint to what to make indexes for ? Thanx in advance /Michael |
![]() |
| Thread Tools | |
| Display Modes | |
| |