![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Hi experts, I have three tables to track the medical waste container life cycle: container_collected, container_transferred and container_processed. All above three tables are keyed by container_id, and have 50-100 million records. Since those tables contain important measures like weight, etc. I would like to treat them as fact tables and define three measure groups to associate the different measures with the tables, I would also want to model container dimension as degenerated dimension (fact dimension since it is 1-1 to fact). My questions are: 1. Is this a sound approach? 2.Since fact dimension is associated with one fact table (measure group), and all three fact tables should share the same container dimension, what's the best strategy to model this? (My thinking is : model container dimension based on container_collected (superset) table, then make it shared among all three measure groups) 3. What's will be the performance implication of my degenerated container dimension? 4. What's the general recommendation on modelling fact dimension? Sorry that I have to ask so many questions since I have not found many discussion regarding this issue? Thanks, Yongli. |
#3
| |||
| |||
|
|
just to confirm: each table contain only 1 row by container_id? if yes... Focus on the questions your users have to answer. does your users really need to analyze by container id? or does they want to analyze "how many containers" "total weight" etc... so only summary data. create dimensions around the containers (like dates, weight of the container, geography...) and link them to your 3 fact tables. your users will be able to analyze by these dimensions and provides statistics: does the number of containers is the same this month compared to last month? what is the average delay to transefrt by container weight? how many containers are not processed? etc... if your questions are: "Where is the container id XXXX?" then don't use a cube for these questions. "yongli" <yongli (AT) discussions (DOT) microsoft.com> wrote in message news:EFD3AFBD-D6C1-4953-BF0A-87BD129CF5C2 (AT) microsoft (DOT) com... Hi experts, I have three tables to track the medical waste container life cycle: container_collected, container_transferred and container_processed. All above three tables are keyed by container_id, and have 50-100 million records. Since those tables contain important measures like weight, etc. I would like to treat them as fact tables and define three measure groups to associate the different measures with the tables, I would also want to model container dimension as degenerated dimension (fact dimension since it is 1-1 to fact). My questions are: 1. Is this a sound approach? 2.Since fact dimension is associated with one fact table (measure group), and all three fact tables should share the same container dimension, what's the best strategy to model this? (My thinking is : model container dimension based on container_collected (superset) table, then make it shared among all three measure groups) 3. What's will be the performance implication of my degenerated container dimension? 4. What's the general recommendation on modelling fact dimension? Sorry that I have to ask so many questions since I have not found many discussion regarding this issue? Thanks, Yongli. |
#4
| |||
| |||
|
|
Thanks. container_id is the primary key for all three tables. Actually 90% of the reports are for the analysis of total weight of the containers by other dimension like customer, geography, container type and waste type, so I can get away of not creating the container dimension. But there are 10% of the reports are on the container_id level (transaction level). Users want to see detailed list of containers picked up and processed within a month. This is why I bring up the fact dimension idea. Thanks for the feedback. Yongli "Jéjé" wrote: just to confirm: each table contain only 1 row by container_id? if yes... Focus on the questions your users have to answer. does your users really need to analyze by container id? or does they want to analyze "how many containers" "total weight" etc... so only summary data. create dimensions around the containers (like dates, weight of the container, geography...) and link them to your 3 fact tables. your users will be able to analyze by these dimensions and provides statistics: does the number of containers is the same this month compared to last month? what is the average delay to transefrt by container weight? how many containers are not processed? etc... if your questions are: "Where is the container id XXXX?" then don't use a cube for these questions. "yongli" <yongli (AT) discussions (DOT) microsoft.com> wrote in message news:EFD3AFBD-D6C1-4953-BF0A-87BD129CF5C2 (AT) microsoft (DOT) com... Hi experts, I have three tables to track the medical waste container life cycle: container_collected, container_transferred and container_processed. All above three tables are keyed by container_id, and have 50-100 million records. Since those tables contain important measures like weight, etc. I would like to treat them as fact tables and define three measure groups to associate the different measures with the tables, I would also want to model container dimension as degenerated dimension (fact dimension since it is 1-1 to fact). My questions are: 1. Is this a sound approach? 2.Since fact dimension is associated with one fact table (measure group), and all three fact tables should share the same container dimension, what's the best strategy to model this? (My thinking is : model container dimension based on container_collected (superset) table, then make it shared among all three measure groups) 3. What's will be the performance implication of my degenerated container dimension? 4. What's the general recommendation on modelling fact dimension? Sorry that I have to ask so many questions since I have not found many discussion regarding this issue? Thanks, Yongli. |
![]() |
| Thread Tools | |
| Display Modes | |
| |