![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| ||||
| ||||
|
|
* The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? |
|
* I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? |
|
* I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? |
|
(Cross posted from MSDN forums, no response there yet) I'm working on fixing some major query performance problems in an AS2005 cube. My suspicion was that they were caused by the original cube design not specifying attribute relationships in the time dimension. I've read the Project REAL stuff and the P&P BI Guidelines in an effort to fully grok the implications of attribute relationships, natural hierarchies, and aggregations, but there's still some stuff I'm unclear on. Our calendar hierarchy is moderately deep - Year > Month > Day > Hour Minute (where minute is actually a 15 minute time window). The dimension table columns are the key, minute (0,15,30,45), hour of day (0-23), day of month (0-days in month), month (1-12), year. Originally there were attributes defined for each of these columns, with each attribute using its corresponding column as its KeyColumn, and with all attributes related directly to the key. A user hierarchy provided navigation from the year down to the minute. Performance, in a word, sucked (drilling down in the cube browser to a low level in the hierarchy could take as long as 45 minutes!). I changed the KeyColumn for each attribute below year to be a composite of each higher level attribute (so the KeyColumns in Month were Year and Month, for Day they were Year/Month/Day, etc). Then I set the attribute relationships accordingly. I'm not sure if this is the most efficient way to define the attributes, but it seemed to be the only way to do it with our current dimension table schema. Once I made this change and re-processed, drilling through the hierarchy went from 45 minutes to less than a second. Not a bad improvement. ![]() However, there's still some things that I'm unclear on. * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? Thanks, Kevin |
#3
| |||||
| |||||
|
|
So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. |
|
Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. |
|
2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. |
|
It would help to see an example of this problem. |
|
Some answers: * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? The benefits of good attribute relationships are in no way limited purely to disk-based aggregations. Once you have the attribute relationships, in-memory caches of data can also be used for aggregating higher levels of data. So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? In a sense, yes you are losing some flexibility -- now February isn't an entity by itself, but rather each instance of February is strongly associated with a year. However, you can still obtain the results you need with different solutions: 1. You can quite easily write a calculation to aggregate over the 1st child of each year and the 2nd child of each year. 2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. The second solution is the simplest but you risk making your model somewhat more complex by adding more attributes. * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? It would help to see an example of this problem. Generally its okay if the attributes are combined together in different ways -- but you should make sure that they follow the same rules and semantics when creating the relationships. Make sure that the meaning of each attribute is the same in each hierarchy and the functional dependencies hold true and then it should turn out right. In some cases, you may decide to actually split up attributes (like Month and Month Of Year) because they really mean different things in the two hierarchies. But that is a choice and design decision that you need to make based on what kind of behavior you want to obtain. HTH, Akshai -- -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:uB62pAWvGHA.4444 (AT) TK2MSFTNGP05 (DOT) phx.gbl... (Cross posted from MSDN forums, no response there yet) I'm working on fixing some major query performance problems in an AS2005 cube. My suspicion was that they were caused by the original cube design not specifying attribute relationships in the time dimension. I've read the Project REAL stuff and the P&P BI Guidelines in an effort to fully grok the implications of attribute relationships, natural hierarchies, and aggregations, but there's still some stuff I'm unclear on. Our calendar hierarchy is moderately deep - Year > Month > Day > Hour Minute (where minute is actually a 15 minute time window). The dimension table columns are the key, minute (0,15,30,45), hour of day (0-23), day of month (0-days in month), month (1-12), year. Originally there were attributes defined for each of these columns, with each attribute using its corresponding column as its KeyColumn, and with all attributes related directly to the key. A user hierarchy provided navigation from the year down to the minute. Performance, in a word, sucked (drilling down in the cube browser to a low level in the hierarchy could take as long as 45 minutes!). I changed the KeyColumn for each attribute below year to be a composite of each higher level attribute (so the KeyColumns in Month were Year and Month, for Day they were Year/Month/Day, etc). Then I set the attribute relationships accordingly. I'm not sure if this is the most efficient way to define the attributes, but it seemed to be the only way to do it with our current dimension table schema. Once I made this change and re-processed, drilling through the hierarchy went from 45 minutes to less than a second. Not a bad improvement. ![]() However, there's still some things that I'm unclear on. * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? Thanks, Kevin |
#4
| ||||
| ||||
|
|
Is there any more you can say here? This is the situation for me - after making the user hierarchy into a natural hierachy, drilling down sped up so dramatically it was amazing. But it all seems a bit like magic. If disk-based aggregations aren't to thank, then what? |
|
System > Process > Version > Activity Group > Activity This lets you, for example, examine all activities in a particular version of a process. However, for comparitive analysis, it's also useful to have the following hierarchy as well: System > Process > Activity Group > Activity > Activity Version |
|
Process (1,2) Version (1) Activity Group (1,2) Activity (1,2) Activity Version (2) |
|
Thanks Akshai! Great info. So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. In my case, I was navigating down the hierarchy from year to minute (high-to-low-level), so I'm assuming your example doesn't apply here. Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. Is there any more you can say here? This is the situation for me - after making the user hierarchy into a natural hierachy, drilling down sped up so dramatically it was amazing. But it all seems a bit like magic. If disk-based aggregations aren't to thank, then what? 2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. Looking around the AdvWorks cube, I noticed that pattern. I'll keep it in mind. I just wanted to make sure I was understanding things correctly. It would help to see an example of this problem. OK, in this case we have a set of business processes, each of which are made up activities. Processes can be versioned, so each activity has a version associated with it. One user hierarchy looks something this: System > Process > Version > Activity Group > Activity This lets you, for example, examine all activities in a particular version of a process. However, for comparitive analysis, it's also useful to have the following hierarchy as well: System > Process > Activity Group > Activity > Activity Version This lets you easily compare at two different versions of the same activity. So you see this is actually the same set of attributes, stacked in different ways. I'm not sure how I should design the natural hierarchies here (if at all). "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:%23kF2q$9vGHA.3372 (AT) TK2MSFTNGP02 (DOT) phx.gbl... Some answers: * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? The benefits of good attribute relationships are in no way limited purely to disk-based aggregations. Once you have the attribute relationships, in-memory caches of data can also be used for aggregating higher levels of data. So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? In a sense, yes you are losing some flexibility -- now February isn't an entity by itself, but rather each instance of February is strongly associated with a year. However, you can still obtain the results you need with different solutions: 1. You can quite easily write a calculation to aggregate over the 1st child of each year and the 2nd child of each year. 2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. The second solution is the simplest but you risk making your model somewhat more complex by adding more attributes. * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? It would help to see an example of this problem. Generally its okay if the attributes are combined together in different ways -- but you should make sure that they follow the same rules and semantics when creating the relationships. Make sure that the meaning of each attribute is the same in each hierarchy and the functional dependencies hold true and then it should turn out right. In some cases, you may decide to actually split up attributes (like Month and Month Of Year) because they really mean different things in the two hierarchies. But that is a choice and design decision that you need to make based on what kind of behavior you want to obtain. HTH, Akshai -- -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:uB62pAWvGHA.4444 (AT) TK2MSFTNGP05 (DOT) phx.gbl... (Cross posted from MSDN forums, no response there yet) I'm working on fixing some major query performance problems in an AS2005 cube. My suspicion was that they were caused by the original cube design not specifying attribute relationships in the time dimension. I've read the Project REAL stuff and the P&P BI Guidelines in an effort to fully grok the implications of attribute relationships, natural hierarchies, and aggregations, but there's still some stuff I'm unclear on. Our calendar hierarchy is moderately deep - Year > Month > Day > Hour Minute (where minute is actually a 15 minute time window). The dimension table columns are the key, minute (0,15,30,45), hour of day (0-23), day of month (0-days in month), month (1-12), year. Originally there were attributes defined for each of these columns, with each attribute using its corresponding column as its KeyColumn, and with all attributes related directly to the key. A user hierarchy provided navigation from the year down to the minute. Performance, in a word, sucked (drilling down in the cube browser to a low level in the hierarchy could take as long as 45 minutes!). I changed the KeyColumn for each attribute below year to be a composite of each higher level attribute (so the KeyColumns in Month were Year and Month, for Day they were Year/Month/Day, etc). Then I set the attribute relationships accordingly. I'm not sure if this is the most efficient way to define the attributes, but it seemed to be the only way to do it with our current dimension table schema. Once I made this change and re-processed, drilling through the hierarchy went from 45 minutes to less than a second. Not a bad improvement. ![]() However, there's still some things that I'm unclear on. * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? Thanks, Kevin |
#5
| |||
| |||
|
|
A hierarchy built on top of a natural hierarchy can materialize the children of each parent.... |
|
A natural user hierarchy can skip attributes along the way of the relationship chain. |
|
More answers: Is there any more you can say here? This is the situation for me - after making the user hierarchy into a natural hierachy, drilling down sped up so dramatically it was amazing. But it all seems a bit like magic. If disk-based aggregations aren't to thank, then what? A hierarchy built on top of a natural hierarchy can materialize the children of each parent. So a request to the storage engine that asks (for example) for the children of a member can simply ask for an iterator over a pre-materialized sub-tree of the children. But a hierarchy that is built on an unnatural hierarchy has to send queries to the "raw" dimension data, do ordering on top of the result and materialize the sub-tree "on the fly". So unnatural hierarchies are much slower to iterate over for some common set operations. System > Process > Version > Activity Group > Activity This lets you, for example, examine all activities in a particular version of a process. However, for comparitive analysis, it's also useful to have the following hierarchy as well: System > Process > Activity Group > Activity > Activity Version The combination above looks like this in terms of attribute relationships: System (1,2) | Process (1,2) | Version (1) | Activity Group (1,2) | Activity (1,2) | Activity Version (2) A natural user hierarchy can skip attributes along the way of the relationship chain. So hierarchy (1) and (2) can have levels as shown above and still be natural. HTH, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:evcPvPAwGHA.4756 (AT) TK2MSFTNGP02 (DOT) phx.gbl... Thanks Akshai! Great info. So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. In my case, I was navigating down the hierarchy from year to minute (high-to-low-level), so I'm assuming your example doesn't apply here. Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. Is there any more you can say here? This is the situation for me - after making the user hierarchy into a natural hierachy, drilling down sped up so dramatically it was amazing. But it all seems a bit like magic. If disk-based aggregations aren't to thank, then what? 2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. Looking around the AdvWorks cube, I noticed that pattern. I'll keep it in mind. I just wanted to make sure I was understanding things correctly. It would help to see an example of this problem. OK, in this case we have a set of business processes, each of which are made up activities. Processes can be versioned, so each activity has a version associated with it. One user hierarchy looks something this: System > Process > Version > Activity Group > Activity This lets you, for example, examine all activities in a particular version of a process. However, for comparitive analysis, it's also useful to have the following hierarchy as well: System > Process > Activity Group > Activity > Activity Version This lets you easily compare at two different versions of the same activity. So you see this is actually the same set of attributes, stacked in different ways. I'm not sure how I should design the natural hierarchies here (if at all). "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:%23kF2q$9vGHA.3372 (AT) TK2MSFTNGP02 (DOT) phx.gbl... Some answers: * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? The benefits of good attribute relationships are in no way limited purely to disk-based aggregations. Once you have the attribute relationships, in-memory caches of data can also be used for aggregating higher levels of data. So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? In a sense, yes you are losing some flexibility -- now February isn't an entity by itself, but rather each instance of February is strongly associated with a year. However, you can still obtain the results you need with different solutions: 1. You can quite easily write a calculation to aggregate over the 1st child of each year and the 2nd child of each year. 2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. The second solution is the simplest but you risk making your model somewhat more complex by adding more attributes. * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? It would help to see an example of this problem. Generally its okay if the attributes are combined together in different ways -- but you should make sure that they follow the same rules and semantics when creating the relationships. Make sure that the meaning of each attribute is the same in each hierarchy and the functional dependencies hold true and then it should turn out right. In some cases, you may decide to actually split up attributes (like Month and Month Of Year) because they really mean different things in the two hierarchies. But that is a choice and design decision that you need to make based on what kind of behavior you want to obtain. HTH, Akshai -- -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:uB62pAWvGHA.4444 (AT) TK2MSFTNGP05 (DOT) phx.gbl... (Cross posted from MSDN forums, no response there yet) I'm working on fixing some major query performance problems in an AS2005 cube. My suspicion was that they were caused by the original cube design not specifying attribute relationships in the time dimension. I've read the Project REAL stuff and the P&P BI Guidelines in an effort to fully grok the implications of attribute relationships, natural hierarchies, and aggregations, but there's still some stuff I'm unclear on. Our calendar hierarchy is moderately deep - Year > Month > Day > Hour Minute (where minute is actually a 15 minute time window). The dimension table columns are the key, minute (0,15,30,45), hour of day (0-23), day of month (0-days in month), month (1-12), year. Originally there were attributes defined for each of these columns, with each attribute using its corresponding column as its KeyColumn, and with all attributes related directly to the key. A user hierarchy provided navigation from the year down to the minute. Performance, in a word, sucked (drilling down in the cube browser to a low level in the hierarchy could take as long as 45 minutes!). I changed the KeyColumn for each attribute below year to be a composite of each higher level attribute (so the KeyColumns in Month were Year and Month, for Day they were Year/Month/Day, etc). Then I set the attribute relationships accordingly. I'm not sure if this is the most efficient way to define the attributes, but it seemed to be the only way to do it with our current dimension table schema. Once I made this change and re-processed, drilling through the hierarchy went from 45 minutes to less than a second. Not a bad improvement. However, there's still some things that I'm unclear on.* The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? Thanks, Kevin |
#6
| |||
| |||
|
|
figure out which to include, and then do the calculations. I'm sure I'm not getting that exactly right, but is that sort of the general idea? |
|
In other words, the exact same Process Version attribute is used, but in two different places in the hiearchies. We're not "skipping" levels here. |
|
A hierarchy built on top of a natural hierarchy can materialize the children of each parent.... Um..ok, I think I maybe kinda sorta get what you're saying. As an example, when attribute relationships are in place and I'm drilling down into the hour level of July 15, 2006, SSAS knows the 24 specific hours that belong to that day, and only needs to compute the cells for that subset. But when there are no relationships, it needs to scan through all hours of all days, figure out which to include, and then do the calculations. I'm sure I'm not getting that exactly right, but is that sort of the general idea? A natural user hierarchy can skip attributes along the way of the relationship chain. Apologies, Akshai, I was both imprecise and inaccurate in my original description of our hiearchies. Let me try this one more time. They are actually: System > Process > Process Version > Activity Group > Activity and System > Process > Activity Group > Activity > Process Version In other words, the exact same Process Version attribute is used, but in two different places in the hiearchies. We're not "skipping" levels here. "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:OOEuN1AwGHA.3372 (AT) TK2MSFTNGP02 (DOT) phx.gbl... More answers: Is there any more you can say here? This is the situation for me - after making the user hierarchy into a natural hierachy, drilling down sped up so dramatically it was amazing. But it all seems a bit like magic. If disk-based aggregations aren't to thank, then what? A hierarchy built on top of a natural hierarchy can materialize the children of each parent. So a request to the storage engine that asks (for example) for the children of a member can simply ask for an iterator over a pre-materialized sub-tree of the children. But a hierarchy that is built on an unnatural hierarchy has to send queries to the "raw" dimension data, do ordering on top of the result and materialize the sub-tree "on the fly". So unnatural hierarchies are much slower to iterate over for some common set operations. System > Process > Version > Activity Group > Activity This lets you, for example, examine all activities in a particular version of a process. However, for comparitive analysis, it's also useful to have the following hierarchy as well: System > Process > Activity Group > Activity > Activity Version The combination above looks like this in terms of attribute relationships: System (1,2) | Process (1,2) | Version (1) | Activity Group (1,2) | Activity (1,2) | Activity Version (2) A natural user hierarchy can skip attributes along the way of the relationship chain. So hierarchy (1) and (2) can have levels as shown above and still be natural. HTH, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:evcPvPAwGHA.4756 (AT) TK2MSFTNGP02 (DOT) phx.gbl... Thanks Akshai! Great info. So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. In my case, I was navigating down the hierarchy from year to minute (high-to-low-level), so I'm assuming your example doesn't apply here. Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. Is there any more you can say here? This is the situation for me - after making the user hierarchy into a natural hierachy, drilling down sped up so dramatically it was amazing. But it all seems a bit like magic. If disk-based aggregations aren't to thank, then what? 2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. Looking around the AdvWorks cube, I noticed that pattern. I'll keep it in mind. I just wanted to make sure I was understanding things correctly. It would help to see an example of this problem. OK, in this case we have a set of business processes, each of which are made up activities. Processes can be versioned, so each activity has a version associated with it. One user hierarchy looks something this: System > Process > Version > Activity Group > Activity This lets you, for example, examine all activities in a particular version of a process. However, for comparitive analysis, it's also useful to have the following hierarchy as well: System > Process > Activity Group > Activity > Activity Version This lets you easily compare at two different versions of the same activity. So you see this is actually the same set of attributes, stacked in different ways. I'm not sure how I should design the natural hierarchies here (if at all). "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:%23kF2q$9vGHA.3372 (AT) TK2MSFTNGP02 (DOT) phx.gbl... Some answers: * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? The benefits of good attribute relationships are in no way limited purely to disk-based aggregations. Once you have the attribute relationships, in-memory caches of data can also be used for aggregating higher levels of data. So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? In a sense, yes you are losing some flexibility -- now February isn't an entity by itself, but rather each instance of February is strongly associated with a year. However, you can still obtain the results you need with different solutions: 1. You can quite easily write a calculation to aggregate over the 1st child of each year and the 2nd child of each year. 2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. The second solution is the simplest but you risk making your model somewhat more complex by adding more attributes. * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? It would help to see an example of this problem. Generally its okay if the attributes are combined together in different ways -- but you should make sure that they follow the same rules and semantics when creating the relationships. Make sure that the meaning of each attribute is the same in each hierarchy and the functional dependencies hold true and then it should turn out right. In some cases, you may decide to actually split up attributes (like Month and Month Of Year) because they really mean different things in the two hierarchies. But that is a choice and design decision that you need to make based on what kind of behavior you want to obtain. HTH, Akshai -- -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:uB62pAWvGHA.4444 (AT) TK2MSFTNGP05 (DOT) phx.gbl... (Cross posted from MSDN forums, no response there yet) I'm working on fixing some major query performance problems in an AS2005 cube. My suspicion was that they were caused by the original cube design not specifying attribute relationships in the time dimension. I've read the Project REAL stuff and the P&P BI Guidelines in an effort to fully grok the implications of attribute relationships, natural hierarchies, and aggregations, but there's still some stuff I'm unclear on. Our calendar hierarchy is moderately deep - Year > Month > Day > Hour Minute (where minute is actually a 15 minute time window). The dimension table columns are the key, minute (0,15,30,45), hour of day (0-23), day of month (0-days in month), month (1-12), year. Originally there were attributes defined for each of these columns, with each attribute using its corresponding column as its KeyColumn, and with all attributes related directly to the key. A user hierarchy provided navigation from the year down to the minute. Performance, in a word, sucked (drilling down in the cube browser to a low level in the hierarchy could take as long as 45 minutes!). I changed the KeyColumn for each attribute below year to be a composite of each higher level attribute (so the KeyColumns in Month were Year and Month, for Day they were Year/Month/Day, etc). Then I set the attribute relationships accordingly. I'm not sure if this is the most efficient way to define the attributes, but it seemed to be the only way to do it with our current dimension table schema. Once I made this change and re-processed, drilling through the hierarchy went from 45 minutes to less than a second. Not a bad improvement. However, there's still some things that I'm unclearon. * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? Thanks, Kevin |
#7
| |||
| |||
|
|
figure out which to include, and then do the calculations. I'm sure I'm not getting that exactly right, but is that sort of the general idea? Essentially yes. (Although using the word calculations is a little misleading -- its more like doing joins between attributes) In other words, the exact same Process Version attribute is used, but in two different places in the hiearchies. We're not "skipping" levels here. In that case you need to define the semantics of Process Version -- it doesn't usually make sense to have a loop in functional dependencies (one exception may be 1:1 relationships but attribute relationships don't deal with that). You can once again do the same thing as before -- create two attributes for Process Version, one which is a related attribute of Activity Group and the other which is related to Activity. Give the two attributes different semantics though, because their uniqueness and behavior with respect to the other attributes will be quite different. In one case Process Version appears to be the key of the dimension, and in the other it is somewhere in the middle of the hierarchy. Alternatively, you can go with one of the hierarchies being natural and the other being unnatural -- but it really depends on what you are trying to do here... Thanks, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:uXBtNqIwGHA.2204 (AT) TK2MSFTNGP03 (DOT) phx.gbl... A hierarchy built on top of a natural hierarchy can materialize the children of each parent.... Um..ok, I think I maybe kinda sorta get what you're saying. As an example, when attribute relationships are in place and I'm drilling down into the hour level of July 15, 2006, SSAS knows the 24 specific hours that belong to that day, and only needs to compute the cells for that subset. But when there are no relationships, it needs to scan through all hours of all days, figure out which to include, and then do the calculations. I'm sure I'm not getting that exactly right, but is that sort of the general idea? A natural user hierarchy can skip attributes along the way of the relationship chain. Apologies, Akshai, I was both imprecise and inaccurate in my original description of our hiearchies. Let me try this one more time. They are actually: System > Process > Process Version > Activity Group > Activity and System > Process > Activity Group > Activity > Process Version In other words, the exact same Process Version attribute is used, but in two different places in the hiearchies. We're not "skipping" levels here. "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:OOEuN1AwGHA.3372 (AT) TK2MSFTNGP02 (DOT) phx.gbl... More answers: Is there any more you can say here? This is the situation for me - after making the user hierarchy into a natural hierachy, drilling down sped up so dramatically it was amazing. But it all seems a bit like magic. If disk-based aggregations aren't to thank, then what? A hierarchy built on top of a natural hierarchy can materialize the children of each parent. So a request to the storage engine that asks (for example) for the children of a member can simply ask for an iterator over a pre-materialized sub-tree of the children. But a hierarchy that is built on an unnatural hierarchy has to send queries to the "raw" dimension data, do ordering on top of the result and materialize the sub-tree "on the fly". So unnatural hierarchies are much slower to iterate over for some common set operations. System > Process > Version > Activity Group > Activity This lets you, for example, examine all activities in a particular version of a process. However, for comparitive analysis, it's also useful to have the following hierarchy as well: System > Process > Activity Group > Activity > Activity Version The combination above looks like this in terms of attribute relationships: System (1,2) | Process (1,2) | Version (1) | Activity Group (1,2) | Activity (1,2) | Activity Version (2) A natural user hierarchy can skip attributes along the way of the relationship chain. So hierarchy (1) and (2) can have levels as shown above and still be natural. HTH, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:evcPvPAwGHA.4756 (AT) TK2MSFTNGP02 (DOT) phx.gbl... Thanks Akshai! Great info. So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. In my case, I was navigating down the hierarchy from year to minute (high-to-low-level), so I'm assuming your example doesn't apply here. Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. Is there any more you can say here? This is the situation for me - after making the user hierarchy into a natural hierachy, drilling down sped up so dramatically it was amazing. But it all seems a bit like magic. If disk-based aggregations aren't to thank, then what? 2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. Looking around the AdvWorks cube, I noticed that pattern. I'll keep it in mind. I just wanted to make sure I was understanding things correctly. It would help to see an example of this problem. OK, in this case we have a set of business processes, each of which are made up activities. Processes can be versioned, so each activity has a version associated with it. One user hierarchy looks something this: System > Process > Version > Activity Group > Activity This lets you, for example, examine all activities in a particular version of a process. However, for comparitive analysis, it's also useful to have the following hierarchy as well: System > Process > Activity Group > Activity > Activity Version This lets you easily compare at two different versions of the same activity. So you see this is actually the same set of attributes, stacked in different ways. I'm not sure how I should design the natural hierarchies here (if at all). "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:%23kF2q$9vGHA.3372 (AT) TK2MSFTNGP02 (DOT) phx.gbl... Some answers: * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? The benefits of good attribute relationships are in no way limited purely to disk-based aggregations. Once you have the attribute relationships, in-memory caches of data can also be used for aggregating higher levels of data. So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? In a sense, yes you are losing some flexibility -- now February isn't an entity by itself, but rather each instance of February is strongly associated with a year. However, you can still obtain the results you need with different solutions: 1. You can quite easily write a calculation to aggregate over the 1st child of each year and the 2nd child of each year. 2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. The second solution is the simplest but you risk making your model somewhat more complex by adding more attributes. * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? It would help to see an example of this problem. Generally its okay if the attributes are combined together in different ways -- but you should make sure that they follow the same rules and semantics when creating the relationships. Make sure that the meaning of each attribute is the same in each hierarchy and the functional dependencies hold true and then it should turn out right. In some cases, you may decide to actually split up attributes (like Month and Month Of Year) because they really mean different things in the two hierarchies. But that is a choice and design decision that you need to make based on what kind of behavior you want to obtain. HTH, Akshai -- -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:uB62pAWvGHA.4444 (AT) TK2MSFTNGP05 (DOT) phx.gbl... (Cross posted from MSDN forums, no response there yet) I'm working on fixing some major query performance problems in an AS2005 cube. My suspicion was that they were caused by the original cube design not specifying attribute relationships in the time dimension. I've read the Project REAL stuff and the P&P BI Guidelines in an effort to fully grok the implications of attribute relationships, natural hierarchies, and aggregations, but there's still some stuff I'm unclear on. Our calendar hierarchy is moderately deep - Year > Month > Day > Hour Minute (where minute is actually a 15 minute time window). The dimension table columns are the key, minute (0,15,30,45), hour of day (0-23), day of month (0-days in month), month (1-12), year. Originally there were attributes defined for each of these columns, with each attribute using its corresponding column as its KeyColumn, and with all attributes related directly to the key. A user hierarchy provided navigation from the year down to the minute. Performance, in a word, sucked (drilling down in the cube browser to a low level in the hierarchy could take as long as 45 minutes!). I changed the KeyColumn for each attribute below year to be a composite of each higher level attribute (so the KeyColumns in Month were Year and Month, for Day they were Year/Month/Day, etc). Then I set the attribute relationships accordingly. I'm not sure if this is the most efficient way to define the attributes, but it seemed to be the only way to do it with our current dimension table schema. Once I made this change and re-processed, drilling through the hierarchy went from 45 minutes to less than a second. Not a bad improvement. However, there's still some things that I'm unclearon. * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? Thanks, Kevin |
#8
| |||
| |||
|
|
I haven't found much documentation about how those member names are created, and I want to make sure I understand how we should be constructing them for our queries. Anything you can point me to here? |
|
justice. Fortunately the project REAL and P&P BI guide call it out as key. I hope the coverage of this area in the BOL is expanded in the future. Or perhaps a separate MSDN article or white paper - something that accentuates how important they are, and what the implications of using them (and not using them) are. |
|
OK, thanks for the info, Akshai. One follow up question on the topic. I only just realized something - changing the dimension definitions to include attribute relationships ended up changing the unique names of the dimension members. What was: [Calendar].[Calendar].[Year].&[2006].&[8].&[18] without relationships became [Calendar].[Calendar].[Day].&[2006]&[8]&[18] with them. We have some code that constructs dimension members for dynamic queries, and that broke (started returning null values). I haven't found much documentation about how those member names are created, and I want to make sure I understand how we should be constructing them for our queries. Anything you can point me to here? One last comment about attribute relationships - this seems like such an important area of dimension design, yet I feel like the BOL don't do it justice. Fortunately the project REAL and P&P BI guide call it out as key. I hope the coverage of this area in the BOL is expanded in the future. Or perhaps a separate MSDN article or white paper - something that accentuates how important they are, and what the implications of using them (and not using them) are. "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:%23SLe78JwGHA.4460 (AT) TK2MSFTNGP04 (DOT) phx.gbl... figure out which to include, and then do the calculations. I'm sure I'm not getting that exactly right, but is that sort of the general idea? Essentially yes. (Although using the word calculations is a little misleading -- its more like doing joins between attributes) In other words, the exact same Process Version attribute is used, but in two different places in the hiearchies. We're not "skipping" levels here. In that case you need to define the semantics of Process Version -- it doesn't usually make sense to have a loop in functional dependencies (one exception may be 1:1 relationships but attribute relationships don't deal with that). You can once again do the same thing as before -- create two attributes for Process Version, one which is a related attribute of Activity Group and the other which is related to Activity. Give the two attributes different semantics though, because their uniqueness and behavior with respect to the other attributes will be quite different. In one case Process Version appears to be the key of the dimension, and in the other it is somewhere in the middle of the hierarchy. Alternatively, you can go with one of the hierarchies being natural and the other being unnatural -- but it really depends on what you are trying to do here... Thanks, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:uXBtNqIwGHA.2204 (AT) TK2MSFTNGP03 (DOT) phx.gbl... A hierarchy built on top of a natural hierarchy can materialize the children of each parent.... Um..ok, I think I maybe kinda sorta get what you're saying. As an example, when attribute relationships are in place and I'm drilling down into the hour level of July 15, 2006, SSAS knows the 24 specific hours that belong to that day, and only needs to compute the cells for that subset. But when there are no relationships, it needs to scan through all hours of all days, figure out which to include, and then do the calculations. I'm sure I'm not getting that exactly right, but is that sort of the general idea? A natural user hierarchy can skip attributes along the way of the relationship chain. Apologies, Akshai, I was both imprecise and inaccurate in my original description of our hiearchies. Let me try this one more time. They are actually: System > Process > Process Version > Activity Group > Activity and System > Process > Activity Group > Activity > Process Version In other words, the exact same Process Version attribute is used, but in two different places in the hiearchies. We're not "skipping" levels here. "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:OOEuN1AwGHA.3372 (AT) TK2MSFTNGP02 (DOT) phx.gbl... More answers: Is there any more you can say here? This is the situation for me - after making the user hierarchy into a natural hierachy, drilling down sped up so dramatically it was amazing. But it all seems a bit like magic. If disk-based aggregations aren't to thank, then what? A hierarchy built on top of a natural hierarchy can materialize the children of each parent. So a request to the storage engine that asks (for example) for the children of a member can simply ask for an iterator over a pre-materialized sub-tree of the children. But a hierarchy that is built on an unnatural hierarchy has to send queries to the "raw" dimension data, do ordering on top of the result and materialize the sub-tree "on the fly". So unnatural hierarchies are much slower to iterate over for some common set operations. System > Process > Version > Activity Group > Activity This lets you, for example, examine all activities in a particular version of a process. However, for comparitive analysis, it's also useful to have the following hierarchy as well: System > Process > Activity Group > Activity > Activity Version The combination above looks like this in terms of attribute relationships: System (1,2) | Process (1,2) | Version (1) | Activity Group (1,2) | Activity (1,2) | Activity Version (2) A natural user hierarchy can skip attributes along the way of the relationship chain. So hierarchy (1) and (2) can have levels as shown above and still be natural. HTH, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:evcPvPAwGHA.4756 (AT) TK2MSFTNGP02 (DOT) phx.gbl... Thanks Akshai! Great info. So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. In my case, I was navigating down the hierarchy from year to minute (high-to-low-level), so I'm assuming your example doesn't apply here. Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. Is there any more you can say here? This is the situation for me - after making the user hierarchy into a natural hierachy, drilling down sped up so dramatically it was amazing. But it all seems a bit like magic. If disk-based aggregations aren't to thank, then what? 2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. Looking around the AdvWorks cube, I noticed that pattern. I'll keep it in mind. I just wanted to make sure I was understanding things correctly. It would help to see an example of this problem. OK, in this case we have a set of business processes, each of which are made up activities. Processes can be versioned, so each activity has a version associated with it. One user hierarchy looks something this: System > Process > Version > Activity Group > Activity This lets you, for example, examine all activities in a particular version of a process. However, for comparitive analysis, it's also useful to have the following hierarchy as well: System > Process > Activity Group > Activity > Activity Version This lets you easily compare at two different versions of the same activity. So you see this is actually the same set of attributes, stacked in different ways. I'm not sure how I should design the natural hierarchies here (if at all). "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:%23kF2q$9vGHA.3372 (AT) TK2MSFTNGP02 (DOT) phx.gbl... Some answers: * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? The benefits of good attribute relationships are in no way limited purely to disk-based aggregations. Once you have the attribute relationships, in-memory caches of data can also be used for aggregating higher levels of data. So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? In a sense, yes you are losing some flexibility -- now February isn't an entity by itself, but rather each instance of February is strongly associated with a year. However, you can still obtain the results you need with different solutions: 1. You can quite easily write a calculation to aggregate over the 1st child of each year and the 2nd child of each year. 2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. The second solution is the simplest but you risk making your model somewhat more complex by adding more attributes. * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? It would help to see an example of this problem. Generally its okay if the attributes are combined together in different ways -- but you should make sure that they follow the same rules and semantics when creating the relationships. Make sure that the meaning of each attribute is the same in each hierarchy and the functional dependencies hold true and then it should turn out right. In some cases, you may decide to actually split up attributes (like Month and Month Of Year) because they really mean different things in the two hierarchies. But that is a choice and design decision that you need to make based on what kind of behavior you want to obtain. HTH, Akshai -- -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:uB62pAWvGHA.4444 (AT) TK2MSFTNGP05 (DOT) phx.gbl... (Cross posted from MSDN forums, no response there yet) I'm working on fixing some major query performance problems in an AS2005 cube. My suspicion was that they were caused by the original cube design not specifying attribute relationships in the time dimension. I've read the Project REAL stuff and the P&P BI Guidelines in an effort to fully grok the implications of attribute relationships, natural hierarchies, and aggregations, but there's still some stuff I'm unclear on. Our calendar hierarchy is moderately deep - Year > Month > Day Hour > Minute (where minute is actually a 15 minute time window). The dimension table columns are the key, minute (0,15,30,45), hour of day (0-23), day of month (0-days in month), month (1-12), year. Originally there were attributes defined for each of these columns, with each attribute using its corresponding column as its KeyColumn, and with all attributes related directly to the key. A user hierarchy provided navigation from the year down to the minute. Performance, in a word, sucked (drilling down in the cube browser to a low level in the hierarchy could take as long as 45 minutes!). I changed the KeyColumn for each attribute below year to be a composite of each higher level attribute (so the KeyColumns in Month were Year and Month, for Day they were Year/Month/Day, etc). Then I set the attribute relationships accordingly. I'm not sure if this is the most efficient way to define the attributes, but it seemed to be the only way to do it with our current dimension table schema. Once I made this change and re-processed, drilling through the hierarchy went from 45 minutes to less than a second. Not a bad improvement. However, there's still some things that I'm unclearon. * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? Thanks, Kevin |
#9
| ||||||
| ||||||
|
|
It is strongly recommended to avoid constructing unique names yourself. |
|
like using the MDSCHEMA_MEMBERS schema rowset to discover the uniquenames by restricting on their caption/name instead. |
|
MemberUniqueNameStyle |
|
mainly as part of a whitepaper corresponding to the AS 2000 Performance Guide. |
|
The second discusses how attribute relationships affects calculations |
|
I haven't found much documentation about how those member names are created, and I want to make sure I understand how we should be constructing them for our queries. Anything you can point me to here? Yes, the unique names do depend on the uniqueness of attributes in the hierarchy -- so defining attribute relationships will change this. It is strongly recommended to avoid constructing unique names yourself. There are certain applications that do so but they have to be very careful about doing this and I would definitely suggest that you look at other solutions -- like using the MDSCHEMA_MEMBERS schema rowset to discover the uniquenames by restricting on their caption/name instead. There is a very little documentation on this in Books Online here on the main way you can tune this: http://msdn2.microsoft.com/en-us/library/ms174900.aspx MemberUniqueNameStyle Determines how unique names are generated for members of hierarchies contained within the cube dimension. This property can have the following values: Value Description Native Analysis Services automatically determines the unique names of members. This is the default value. NamePath Analysis Services generates a compound name consisting of the name of each level and the caption of the member. Basically, you can switch between letting the server decide versus forcing the server to generate a name-per-level unique name. But once again, I highly recommend not trying to do this yourself. justice. Fortunately the project REAL and P&P BI guide call it out as key. I hope the coverage of this area in the BOL is expanded in the future. Or perhaps a separate MSDN article or white paper - something that accentuates how important they are, and what the implications of using them (and not using them) are. A common complaint and one that is being addressed. It will take a little time for the documentation to become more easily available but it is absolutely in the works -- mainly as part of a whitepaper corresponding to the AS 2000 Performance Guide. In the meantime, here are a couple of other resources: http://msdn2.microsoft.com/en-us/library/ms174557.aspx http://www.sqlserveranalysisservices.com/default.htm The first is a general (simple) description of attribute relationships. The second discusses how attribute relationships affects calculations -- this is a very interesting and subtle area which you may not care about if you are in more of a data warehouse scenario than a calculation scenario. But it is a good area to understand. Additionally the Project REAL docs and various blogs and postings out there have discussions about this topic. Thanks, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:eL$BbjywGHA.4872 (AT) TK2MSFTNGP02 (DOT) phx.gbl... OK, thanks for the info, Akshai. One follow up question on the topic. I only just realized something - changing the dimension definitions to include attribute relationships ended up changing the unique names of the dimension members. What was: [Calendar].[Calendar].[Year].&[2006].&[8].&[18] without relationships became [Calendar].[Calendar].[Day].&[2006]&[8]&[18] with them. We have some code that constructs dimension members for dynamic queries, and that broke (started returning null values). I haven't found much documentation about how those member names are created, and I want to make sure I understand how we should be constructing them for our queries. Anything you can point me to here? One last comment about attribute relationships - this seems like such an important area of dimension design, yet I feel like the BOL don't do it justice. Fortunately the project REAL and P&P BI guide call it out as key. I hope the coverage of this area in the BOL is expanded in the future. Or perhaps a separate MSDN article or white paper - something that accentuates how important they are, and what the implications of using them (and not using them) are. "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:%23SLe78JwGHA.4460 (AT) TK2MSFTNGP04 (DOT) phx.gbl... figure out which to include, and then do the calculations. I'm sure I'm not getting that exactly right, but is that sort of the general idea? Essentially yes. (Although using the word calculations is a little misleading -- its more like doing joins between attributes) In other words, the exact same Process Version attribute is used, but in two different places in the hiearchies. We're not "skipping" levels here. In that case you need to define the semantics of Process Version -- it doesn't usually make sense to have a loop in functional dependencies (one exception may be 1:1 relationships but attribute relationships don't deal with that). You can once again do the same thing as before -- create two attributes for Process Version, one which is a related attribute of Activity Group and the other which is related to Activity. Give the two attributes different semantics though, because their uniqueness and behavior with respect to the other attributes will be quite different. In one case Process Version appears to be the key of the dimension, and in the other it is somewhere in the middle of the hierarchy. Alternatively, you can go with one of the hierarchies being natural and the other being unnatural -- but it really depends on what you are trying to do here... Thanks, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:uXBtNqIwGHA.2204 (AT) TK2MSFTNGP03 (DOT) phx.gbl... A hierarchy built on top of a natural hierarchy can materialize the children of each parent.... Um..ok, I think I maybe kinda sorta get what you're saying. As an example, when attribute relationships are in place and I'm drilling down into the hour level of July 15, 2006, SSAS knows the 24 specific hours that belong to that day, and only needs to compute the cells for that subset. But when there are no relationships, it needs to scan through all hours of all days, figure out which to include, and then do the calculations. I'm sure I'm not getting that exactly right, but is that sort of the general idea? A natural user hierarchy can skip attributes along the way of the relationship chain. Apologies, Akshai, I was both imprecise and inaccurate in my original description of our hiearchies. Let me try this one more time. They are actually: System > Process > Process Version > Activity Group > Activity and System > Process > Activity Group > Activity > Process Version In other words, the exact same Process Version attribute is used, but in two different places in the hiearchies. We're not "skipping" levels here. "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:OOEuN1AwGHA.3372 (AT) TK2MSFTNGP02 (DOT) phx.gbl... More answers: Is there any more you can say here? This is the situation for me - after making the user hierarchy into a natural hierachy, drilling down sped up so dramatically it was amazing. But it all seems a bit like magic. If disk-based aggregations aren't to thank, then what? A hierarchy built on top of a natural hierarchy can materialize the children of each parent. So a request to the storage engine that asks (for example) for the children of a member can simply ask for an iterator over a pre-materialized sub-tree of the children. But a hierarchy that is built on an unnatural hierarchy has to send queries to the "raw" dimension data, do ordering on top of the result and materialize the sub-tree "on the fly". So unnatural hierarchies are much slower to iterate over for some common set operations. System > Process > Version > Activity Group > Activity This lets you, for example, examine all activities in a particular version of a process. However, for comparitive analysis, it's also useful to have the following hierarchy as well: System > Process > Activity Group > Activity > Activity Version The combination above looks like this in terms of attribute relationships: System (1,2) | Process (1,2) | Version (1) | Activity Group (1,2) | Activity (1,2) | Activity Version (2) A natural user hierarchy can skip attributes along the way of the relationship chain. So hierarchy (1) and (2) can have levels as shown above and still be natural. HTH, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:evcPvPAwGHA.4756 (AT) TK2MSFTNGP02 (DOT) phx.gbl... Thanks Akshai! Great info. So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. In my case, I was navigating down the hierarchy from year to minute (high-to-low-level), so I'm assuming your example doesn't apply here. Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. Is there any more you can say here? This is the situation for me - after making the user hierarchy into a natural hierachy, drilling down sped up so dramatically it was amazing. But it all seems a bit like magic. If disk-based aggregations aren't to thank, then what? 2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. Looking around the AdvWorks cube, I noticed that pattern. I'll keep it in mind. I just wanted to make sure I was understanding things correctly. It would help to see an example of this problem. OK, in this case we have a set of business processes, each of which are made up activities. Processes can be versioned, so each activity has a version associated with it. One user hierarchy looks something this: System > Process > Version > Activity Group > Activity This lets you, for example, examine all activities in a particular version of a process. However, for comparitive analysis, it's also useful to have the following hierarchy as well: System > Process > Activity Group > Activity > Activity Version This lets you easily compare at two different versions of the same activity. So you see this is actually the same set of attributes, stacked in different ways. I'm not sure how I should design the natural hierarchies here (if at all). "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:%23kF2q$9vGHA.3372 (AT) TK2MSFTNGP02 (DOT) phx.gbl... Some answers: * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? The benefits of good attribute relationships are in no way limited purely to disk-based aggregations. Once you have the attribute relationships, in-memory caches of data can also be used for aggregating higher levels of data. So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? In a sense, yes you are losing some flexibility -- now February isn't an entity by itself, but rather each instance of February is strongly associated with a year. However, you can still obtain the results you need with different solutions: 1. You can quite easily write a calculation to aggregate over the 1st child of each year and the 2nd child of each year. 2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. The second solution is the simplest but you risk making your model somewhat more complex by adding more attributes. * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? It would help to see an example of this problem. Generally its okay if the attributes are combined together in different ways -- but you should make sure that they follow the same rules and semantics when creating the relationships. Make sure that the meaning of each attribute is the same in each hierarchy and the functional dependencies hold true and then it should turn out right. In some cases, you may decide to actually split up attributes (like Month and Month Of Year) because they really mean different things in the two hierarchies. But that is a choice and design decision that you need to make based on what kind of behavior you want to obtain. HTH, Akshai -- -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:uB62pAWvGHA.4444 (AT) TK2MSFTNGP05 (DOT) phx.gbl... (Cross posted from MSDN forums, no response there yet) I'm working on fixing some major query performance problems in an AS2005 cube. My suspicion was that they were caused by the original cube design not specifying attribute relationships in the time dimension. I've read the Project REAL stuff and the P&P BI Guidelines in an effort to fully grok the implications of attribute relationships, natural hierarchies, and aggregations, but there's still some stuff I'm unclear on. Our calendar hierarchy is moderately deep - Year > Month > Day Hour > Minute (where minute is actually a 15 minute time window). The dimension table columns are the key, minute (0,15,30,45), hour of day (0-23), day of month (0-days in month), month (1-12), year. Originally there were attributes defined for each of these columns, with each attribute using its corresponding column as its KeyColumn, and with all attributes related directly to the key. A user hierarchy provided navigation from the year down to the minute. Performance, in a word, sucked (drilling down in the cube browser to a low level in the hierarchy could take as long as 45 minutes!). I changed the KeyColumn for each attribute below year to be a composite of each higher level attribute (so the KeyColumns in Month were Year and Month, for Day they were Year/Month/Day, etc). Then I set the attribute relationships accordingly. I'm not sure if this is the most efficient way to define the attributes, but it seemed to be the only way to do it with our current dimension table schema. Once I made this change and re-processed, drilling through the hierarchy went from 45 minutes to less than a second. Not a bad improvement. However, there's still some things that I'm unclearon. * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? Thanks, Kevin |
#10
| |||
| |||
|
|
Hmm, I'm not familiar with that technique. If I want to find a unique name for a specific member deep in the hierarchy (at the hour level, say - a specific hour of a specific level of a specific year), do I need to do an MDSCHEMA_MEMBERS query for each level down, or is there a way to do it all in one shot? |
|
I'll check it out. One comment - it seems like almost all the discussions of attribute relationships "out there" use the same, relatively simple example - geography. It would be nice to see some discussion of other types of dimensions where relationships are important, and what the ramifications of them are (like with my calendar questions earlier). |
|
It is strongly recommended to avoid constructing unique names yourself. I didn't realize it was a no-no. like using the MDSCHEMA_MEMBERS schema rowset to discover the uniquenames by restricting on their caption/name instead. Hmm, I'm not familiar with that technique. If I want to find a unique name for a specific member deep in the hierarchy (at the hour level, say - a specific hour of a specific level of a specific year), do I need to do an MDSCHEMA_MEMBERS query for each level down, or is there a way to do it all in one shot? MemberUniqueNameStyle I played around with this option, but it doesn't really help in our case. After setting it to NamePath, the resulting member unique name is [Calendar].[Calendar].[All].[2005].[July].[4].[0].[15] which is still different than that our old format, so either way we have to change this. Is there any performance implication to which name style we choose? Or any other reason to choose one or the other? mainly as part of a whitepaper corresponding to the AS 2000 Performance Guide. I look forward to it with great anticipation. The second discusses how attribute relationships affects calculations I'll check it out. One comment - it seems like almost all the discussions of attribute relationships "out there" use the same, relatively simple example - geography. It would be nice to see some discussion of other types of dimensions where relationships are important, and what the ramifications of them are (like with my calendar questions earlier). "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:O%2349HQVxGHA.4960 (AT) TK2MSFTNGP04 (DOT) phx.gbl... I haven't found much documentation about how those member names are created, and I want to make sure I understand how we should be constructing them for our queries. Anything you can point me to here? Yes, the unique names do depend on the uniqueness of attributes in the hierarchy -- so defining attribute relationships will change this. It is strongly recommended to avoid constructing unique names yourself. There are certain applications that do so but they have to be very careful about doing this and I would definitely suggest that you look at other solutions -- like using the MDSCHEMA_MEMBERS schema rowset to discover the uniquenames by restricting on their caption/name instead. There is a very little documentation on this in Books Online here on the main way you can tune this: http://msdn2.microsoft.com/en-us/library/ms174900.aspx MemberUniqueNameStyle Determines how unique names are generated for members of hierarchies contained within the cube dimension. This property can have the following values: Value Description Native Analysis Services automatically determines the unique names of members. This is the default value. NamePath Analysis Services generates a compound name consisting of the name of each level and the caption of the member. Basically, you can switch between letting the server decide versus forcing the server to generate a name-per-level unique name. But once again, I highly recommend not trying to do this yourself. justice. Fortunately the project REAL and P&P BI guide call it out as key. I hope the coverage of this area in the BOL is expanded in the future. Or perhaps a separate MSDN article or white paper - something that accentuates how important they are, and what the implications of using them (and not using them) are. A common complaint and one that is being addressed. It will take a little time for the documentation to become more easily available but it is absolutely in the works -- mainly as part of a whitepaper corresponding to the AS 2000 Performance Guide. In the meantime, here are a couple of other resources: http://msdn2.microsoft.com/en-us/library/ms174557.aspx http://www.sqlserveranalysisservices.com/default.htm The first is a general (simple) description of attribute relationships. The second discusses how attribute relationships affects calculations -- this is a very interesting and subtle area which you may not care about if you are in more of a data warehouse scenario than a calculation scenario. But it is a good area to understand. Additionally the Project REAL docs and various blogs and postings out there have discussions about this topic. Thanks, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:eL$BbjywGHA.4872 (AT) TK2MSFTNGP02 (DOT) phx.gbl... OK, thanks for the info, Akshai. One follow up question on the topic. I only just realized something - changing the dimension definitions to include attribute relationships ended up changing the unique names of the dimension members. What was: [Calendar].[Calendar].[Year].&[2006].&[8].&[18] without relationships became [Calendar].[Calendar].[Day].&[2006]&[8]&[18] with them. We have some code that constructs dimension members for dynamic queries, and that broke (started returning null values). I haven't found much documentation about how those member names are created, and I want to make sure I understand how we should be constructing them for our queries. Anything you can point me to here? One last comment about attribute relationships - this seems like such an important area of dimension design, yet I feel like the BOL don't do it justice. Fortunately the project REAL and P&P BI guide call it out as key. I hope the coverage of this area in the BOL is expanded in the future. Or perhaps a separate MSDN article or white paper - something that accentuates how important they are, and what the implications of using them (and not using them) are. "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:%23SLe78JwGHA.4460 (AT) TK2MSFTNGP04 (DOT) phx.gbl... figure out which to include, and then do the calculations. I'm sure I'm not getting that exactly right, but is that sort of the general idea? Essentially yes. (Although using the word calculations is a little misleading -- its more like doing joins between attributes) In other words, the exact same Process Version attribute is used, but in two different places in the hiearchies. We're not "skipping" levels here. In that case you need to define the semantics of Process Version -- it doesn't usually make sense to have a loop in functional dependencies (one exception may be 1:1 relationships but attribute relationships don't deal with that). You can once again do the same thing as before -- create two attributes for Process Version, one which is a related attribute of Activity Group and the other which is related to Activity. Give the two attributes different semantics though, because their uniqueness and behavior with respect to the other attributes will be quite different. In one case Process Version appears to be the key of the dimension, and in the other it is somewhere in the middle of the hierarchy. Alternatively, you can go with one of the hierarchies being natural and the other being unnatural -- but it really depends on what you are trying to do here... Thanks, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:uXBtNqIwGHA.2204 (AT) TK2MSFTNGP03 (DOT) phx.gbl... A hierarchy built on top of a natural hierarchy can materialize the children of each parent.... Um..ok, I think I maybe kinda sorta get what you're saying. As an example, when attribute relationships are in place and I'm drilling down into the hour level of July 15, 2006, SSAS knows the 24 specific hours that belong to that day, and only needs to compute the cells for that subset. But when there are no relationships, it needs to scan through all hours of all days, figure out which to include, and then do the calculations. I'm sure I'm not getting that exactly right, but is that sort of the general idea? A natural user hierarchy can skip attributes along the way of the relationship chain. Apologies, Akshai, I was both imprecise and inaccurate in my original description of our hiearchies. Let me try this one more time. They are actually: System > Process > Process Version > Activity Group > Activity and System > Process > Activity Group > Activity > Process Version In other words, the exact same Process Version attribute is used, but in two different places in the hiearchies. We're not "skipping" levels here. "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:OOEuN1AwGHA.3372 (AT) TK2MSFTNGP02 (DOT) phx.gbl... More answers: Is there any more you can say here? This is the situation for me - after making the user hierarchy into a natural hierachy, drilling down sped up so dramatically it was amazing. But it all seems a bit like magic. If disk-based aggregations aren't to thank, then what? A hierarchy built on top of a natural hierarchy can materialize the children of each parent. So a request to the storage engine that asks (for example) for the children of a member can simply ask for an iterator over a pre-materialized sub-tree of the children. But a hierarchy that is built on an unnatural hierarchy has to send queries to the "raw" dimension data, do ordering on top of the result and materialize the sub-tree "on the fly". So unnatural hierarchies are much slower to iterate over for some common set operations. System > Process > Version > Activity Group > Activity This lets you, for example, examine all activities in a particular version of a process. However, for comparitive analysis, it's also useful to have the following hierarchy as well: System > Process > Activity Group > Activity > Activity Version The combination above looks like this in terms of attribute relationships: System (1,2) | Process (1,2) | Version (1) | Activity Group (1,2) | Activity (1,2) | Activity Version (2) A natural user hierarchy can skip attributes along the way of the relationship chain. So hierarchy (1) and (2) can have levels as shown above and still be natural. HTH, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:evcPvPAwGHA.4756 (AT) TK2MSFTNGP02 (DOT) phx.gbl... Thanks Akshai! Great info. So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. In my case, I was navigating down the hierarchy from year to minute (high-to-low-level), so I'm assuming your example doesn't apply here. Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. Is there any more you can say here? This is the situation for me - after making the user hierarchy into a natural hierachy, drilling down sped up so dramatically it was amazing. But it all seems a bit like magic. If disk-based aggregations aren't to thank, then what? 2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. Looking around the AdvWorks cube, I noticed that pattern. I'll keep it in mind. I just wanted to make sure I was understanding things correctly. It would help to see an example of this problem. OK, in this case we have a set of business processes, each of which are made up activities. Processes can be versioned, so each activity has a version associated with it. One user hierarchy looks something this: System > Process > Version > Activity Group > Activity This lets you, for example, examine all activities in a particular version of a process. However, for comparitive analysis, it's also useful to have the following hierarchy as well: System > Process > Activity Group > Activity > Activity Version This lets you easily compare at two different versions of the same activity. So you see this is actually the same set of attributes, stacked in different ways. I'm not sure how I should design the natural hierarchies here (if at all). "Akshai Mirchandani [MS]" <akshaim (AT) online (DOT) microsoft.com> wrote in message news:%23kF2q$9vGHA.3372 (AT) TK2MSFTNGP02 (DOT) phx.gbl... Some answers: * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? The benefits of good attribute relationships are in no way limited purely to disk-based aggregations. Once you have the attribute relationships, in-memory caches of data can also be used for aggregating higher levels of data. So a query that happened to fetch data at the hours level could make a later query at the days level really fast because the cache for the hour-level data can be used to aggregate into the days-level data. Additionally, access to the user hierarchy becomes much faster if it is natural -- and depending on the queries this can sometimes lead to huge improvements in performance. * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? In a sense, yes you are losing some flexibility -- now February isn't an entity by itself, but rather each instance of February is strongly associated with a year. However, you can still obtain the results you need with different solutions: 1. You can quite easily write a calculation to aggregate over the 1st child of each year and the 2nd child of each year. 2. You can have another attribute (Month of Year) which is related to Month (or to Day) and query by that attribute hierarchy instead. The second solution is the simplest but you risk making your model somewhat more complex by adding more attributes. * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? It would help to see an example of this problem. Generally its okay if the attributes are combined together in different ways -- but you should make sure that they follow the same rules and semantics when creating the relationships. Make sure that the meaning of each attribute is the same in each hierarchy and the functional dependencies hold true and then it should turn out right. In some cases, you may decide to actually split up attributes (like Month and Month Of Year) because they really mean different things in the two hierarchies. But that is a choice and design decision that you need to make based on what kind of behavior you want to obtain. HTH, Akshai -- -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Kevin Dente" <kevind (AT) no (DOT) spam.denteworld.com> wrote in message news:uB62pAWvGHA.4444 (AT) TK2MSFTNGP05 (DOT) phx.gbl... (Cross posted from MSDN forums, no response there yet) I'm working on fixing some major query performance problems in an AS2005 cube. My suspicion was that they were caused by the original cube design not specifying attribute relationships in the time dimension. I've read the Project REAL stuff and the P&P BI Guidelines in an effort to fully grok the implications of attribute relationships, natural hierarchies, and aggregations, but there's still some stuff I'm unclear on. Our calendar hierarchy is moderately deep - Year > Month > Day Hour > Minute (where minute is actually a 15 minute time window). The dimension table columns are the key, minute (0,15,30,45), hour of day (0-23), day of month (0-days in month), month (1-12), year. Originally there were attributes defined for each of these columns, with each attribute using its corresponding column as its KeyColumn, and with all attributes related directly to the key. A user hierarchy provided navigation from the year down to the minute. Performance, in a word, sucked (drilling down in the cube browser to a low level in the hierarchy could take as long as 45 minutes!). I changed the KeyColumn for each attribute below year to be a composite of each higher level attribute (so the KeyColumns in Month were Year and Month, for Day they were Year/Month/Day, etc). Then I set the attribute relationships accordingly. I'm not sure if this is the most efficient way to define the attributes, but it seemed to be the only way to do it with our current dimension table schema. Once I made this change and re-processed, drilling through the hierarchy went from 45 minutes to less than a second. Not a bad improvement. However, there's still some things that I'munclear on. * The performance seems to be about the same (that is, pretty fast) whether the aggregations are set to 0% or 70%. That confuses me - I thought that setting up the relationships is what allows the aggregations to be computed. But why would no aggregations perform so well? * I'm assuming that by setting up these relationships, I'm giving up some flexibility in my queries. Is that true? For example, what if I want to compare how January compares to February, across all years. Are those aggregations not longer being performed, because of the attribute relationships? If so, what's the best way to deal with that? Or am I misunderstanding something? * I'm looking at making similar changes to a different dimension, but that dimension has two different user hierarchies that combine the same attributes in different ways. Does defining a natural hierarchy through attribute relationships limit your ability to combine those attributes in different navigational hierarchies? Thanks, Kevin |
![]() |
| Thread Tools | |
| Display Modes | |
| |