![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
In AS2005, when building dimensions, the dimension has a property 'ProcessingGroup' which can be set to 'ByTable' or 'ByAttribute'. It seems that using the 'ByTable' setting causes only 1 SQL statment to be fired against the data-source - from which all attributes are calculated, whilst ByAttribute fires one SQL statement for each attribute in the dimension. Does anyone know why it'd ever make sense to use the 'ByAttribute' setting here ? For me both options seem to bring the same result but ByTable is faster because of the lower load on the rdbms (Sybase). |
#3
| |||
| |||
|
|
to detect a common issue of defining incorrect attribute relationships if you turn on Duplicate errors in the ErrorConfiguration -- you can't use that technique with ByTable" |
|
Performance can vary -- often it is faster for the relational database to return a distinct set of values for each attributes instead of fetching all the data... For example, take the scenario of a Customers dimension with attributes Customer and Marital Status. Marital Status has very few values. If you do ByTable, then AS will have to process Marital Status from perhaps 1 million customer rows. But if it does it by attribute, then a "SELECT DISTINCT [Marital Status] FROM Customers" to the relational table would return just a few rows. So Marital Status can be processed very quickly with minimum load on the AS side. Both options have advantages and disadvantages besides perf. ByAttribute is perhaps more scalable -- ByTable has to cache the result of the SQL query on local disk and do multiple passes over it. ByAttribute should also be able to detect a common issue of defining incorrect attribute relationships if you turn on Duplicate errors in the ErrorConfiguration -- you can't use that technique with ByTable... But ByTable also lets you reduce the load on the relational database. Thanks, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Steve G" <SteveG (AT) discussions (DOT) microsoft.com> wrote in message news:0C6766B6-C0E6-4EF8-9784-964629FA9BAC (AT) microsoft (DOT) com... In AS2005, when building dimensions, the dimension has a property 'ProcessingGroup' which can be set to 'ByTable' or 'ByAttribute'. It seems that using the 'ByTable' setting causes only 1 SQL statment to be fired against the data-source - from which all attributes are calculated, whilst ByAttribute fires one SQL statement for each attribute in the dimension. Does anyone know why it'd ever make sense to use the 'ByAttribute' setting here ? For me both options seem to bring the same result but ByTable is faster because of the lower load on the rdbms (Sybase). |
#4
| |||
| |||
|
|
Thanks for the reply. Yes I can see that there are pros and cons for ByTable and ByAttribute - for me its always better to do ByTable because my AS box is much faster than my RDBMS box - but that won't be the case for everyone. However, you mention ... "ByAttribute should also be able to detect a common issue of defining incorrect attribute relationships if you turn on Duplicate errors in the ErrorConfiguration -- you can't use that technique with ByTable" I've been running some tests with the 'KeyDuplicate' error setting on, using ByAttribute and ByTable. I created a dimension with 3 attributes and a 3 level strong attribute hierarchy (level 3 being leaf level) such that 2 of the members in the middle level have the same member key but roll-up into different parents. This is detected both when using ByAttribute or ByTable. Therefore I can't see your point. Can you give an example of where processing with 'ByTable' setting would miss an error where using the ByAttribute setting would not ? These attributes seem to be functionally equivalent to me and both detect key duplicates. Am I missing something ? "Akshai Mirchandani [MS]" wrote: Performance can vary -- often it is faster for the relational database to return a distinct set of values for each attributes instead of fetching all the data... For example, take the scenario of a Customers dimension with attributes Customer and Marital Status. Marital Status has very few values. If you do ByTable, then AS will have to process Marital Status from perhaps 1 million customer rows. But if it does it by attribute, then a "SELECT DISTINCT [Marital Status] FROM Customers" to the relational table would return just a few rows. So Marital Status can be processed very quickly with minimum load on the AS side. Both options have advantages and disadvantages besides perf. ByAttribute is perhaps more scalable -- ByTable has to cache the result of the SQL query on local disk and do multiple passes over it. ByAttribute should also be able to detect a common issue of defining incorrect attribute relationships if you turn on Duplicate errors in the ErrorConfiguration -- you can't use that technique with ByTable... But ByTable also lets you reduce the load on the relational database. Thanks, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Steve G" <SteveG (AT) discussions (DOT) microsoft.com> wrote in message news:0C6766B6-C0E6-4EF8-9784-964629FA9BAC (AT) microsoft (DOT) com... In AS2005, when building dimensions, the dimension has a property 'ProcessingGroup' which can be set to 'ByTable' or 'ByAttribute'. It seems that using the 'ByTable' setting causes only 1 SQL statment to be fired against the data-source - from which all attributes are calculated, whilst ByAttribute fires one SQL statement for each attribute in the dimension. Does anyone know why it'd ever make sense to use the 'ByAttribute' setting here ? For me both options seem to bring the same result but ByTable is faster because of the lower load on the rdbms (Sybase). |
#5
| |||
| |||
|
|
Let me try to describe this... Take the rows: Cust1 Portland Oregon Cust2 Portland Maine Lets say you (incorrectly) set the key of City to be just the City column and its related attribute is State. E.g.: State (Key=State) \ City (Key=City) \ Customer (Key=Customer) Now you process using ByAttribute. This is going to do: SELECT DISTINCT City, State FROM Customers It would result in two rows: Portland Oregon Portland Maine And "raise errors on duplicates" would raise an error. But if you used the ByTable configuration then there would be one query for the entire table: SELECT DISTINCT Customer, City, State FROM Customers Now this is usually going to return Portland many times -- and turning on the "raise errors on duplicates" would give you a whole lot of false positives. This isn't so easy to explain without better illustrations but hopefully the above example shows how there is a difference in the way duplicates would be encountered and errors raised... Thanks, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Steve G" <SteveG (AT) discussions (DOT) microsoft.com> wrote in message news:CFD59BC7-9940-4388-88CC-20A2CFEE8621 (AT) microsoft (DOT) com... Thanks for the reply. Yes I can see that there are pros and cons for ByTable and ByAttribute - for me its always better to do ByTable because my AS box is much faster than my RDBMS box - but that won't be the case for everyone. However, you mention ... "ByAttribute should also be able to detect a common issue of defining incorrect attribute relationships if you turn on Duplicate errors in the ErrorConfiguration -- you can't use that technique with ByTable" I've been running some tests with the 'KeyDuplicate' error setting on, using ByAttribute and ByTable. I created a dimension with 3 attributes and a 3 level strong attribute hierarchy (level 3 being leaf level) such that 2 of the members in the middle level have the same member key but roll-up into different parents. This is detected both when using ByAttribute or ByTable. Therefore I can't see your point. Can you give an example of where processing with 'ByTable' setting would miss an error where using the ByAttribute setting would not ? These attributes seem to be functionally equivalent to me and both detect key duplicates. Am I missing something ? "Akshai Mirchandani [MS]" wrote: Performance can vary -- often it is faster for the relational database to return a distinct set of values for each attributes instead of fetching all the data... For example, take the scenario of a Customers dimension with attributes Customer and Marital Status. Marital Status has very few values. If you do ByTable, then AS will have to process Marital Status from perhaps 1 million customer rows. But if it does it by attribute, then a "SELECT DISTINCT [Marital Status] FROM Customers" to the relational table would return just a few rows. So Marital Status can be processed very quickly with minimum load on the AS side. Both options have advantages and disadvantages besides perf. ByAttribute is perhaps more scalable -- ByTable has to cache the result of the SQL query on local disk and do multiple passes over it. ByAttribute should also be able to detect a common issue of defining incorrect attribute relationships if you turn on Duplicate errors in the ErrorConfiguration -- you can't use that technique with ByTable... But ByTable also lets you reduce the load on the relational database. Thanks, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Steve G" <SteveG (AT) discussions (DOT) microsoft.com> wrote in message news:0C6766B6-C0E6-4EF8-9784-964629FA9BAC (AT) microsoft (DOT) com... In AS2005, when building dimensions, the dimension has a property 'ProcessingGroup' which can be set to 'ByTable' or 'ByAttribute'. It seems that using the 'ByTable' setting causes only 1 SQL statment to be fired against the data-source - from which all attributes are calculated, whilst ByAttribute fires one SQL statement for each attribute in the dimension. Does anyone know why it'd ever make sense to use the 'ByAttribute' setting here ? For me both options seem to bring the same result but ByTable is faster because of the lower load on the rdbms (Sybase). |
#6
| |||
| |||
|
|
Akshai, I created your example in AS2005. Actually it fails with exactly the same error message whether using ByTable or ByAttribute. I have seen situations where a different error message is returned when using ByTable than when using ByAttribute but the main point is that neither succeed when there is a duplicate key. Therefore there doesn't seem to be any functional difference at all. Would you agree ? Steve "Akshai Mirchandani [MS]" wrote: Let me try to describe this... Take the rows: Cust1 Portland Oregon Cust2 Portland Maine Lets say you (incorrectly) set the key of City to be just the City column and its related attribute is State. E.g.: State (Key=State) \ City (Key=City) \ Customer (Key=Customer) Now you process using ByAttribute. This is going to do: SELECT DISTINCT City, State FROM Customers It would result in two rows: Portland Oregon Portland Maine And "raise errors on duplicates" would raise an error. But if you used the ByTable configuration then there would be one query for the entire table: SELECT DISTINCT Customer, City, State FROM Customers Now this is usually going to return Portland many times -- and turning on the "raise errors on duplicates" would give you a whole lot of false positives. This isn't so easy to explain without better illustrations but hopefully the above example shows how there is a difference in the way duplicates would be encountered and errors raised... Thanks, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Steve G" <SteveG (AT) discussions (DOT) microsoft.com> wrote in message news:CFD59BC7-9940-4388-88CC-20A2CFEE8621 (AT) microsoft (DOT) com... Thanks for the reply. Yes I can see that there are pros and cons for ByTable and ByAttribute - for me its always better to do ByTable because my AS box is much faster than my RDBMS box - but that won't be the case for everyone. However, you mention ... "ByAttribute should also be able to detect a common issue of defining incorrect attribute relationships if you turn on Duplicate errors in the ErrorConfiguration -- you can't use that technique with ByTable" I've been running some tests with the 'KeyDuplicate' error setting on, using ByAttribute and ByTable. I created a dimension with 3 attributes and a 3 level strong attribute hierarchy (level 3 being leaf level) such that 2 of the members in the middle level have the same member key but roll-up into different parents. This is detected both when using ByAttribute or ByTable. Therefore I can't see your point. Can you give an example of where processing with 'ByTable' setting would miss an error where using the ByAttribute setting would not ? These attributes seem to be functionally equivalent to me and both detect key duplicates. Am I missing something ? "Akshai Mirchandani [MS]" wrote: Performance can vary -- often it is faster for the relational database to return a distinct set of values for each attributes instead of fetching all the data... For example, take the scenario of a Customers dimension with attributes Customer and Marital Status. Marital Status has very few values. If you do ByTable, then AS will have to process Marital Status from perhaps 1 million customer rows. But if it does it by attribute, then a "SELECT DISTINCT [Marital Status] FROM Customers" to the relational table would return just a few rows. So Marital Status can be processed very quickly with minimum load on the AS side. Both options have advantages and disadvantages besides perf. ByAttribute is perhaps more scalable -- ByTable has to cache the result of the SQL query on local disk and do multiple passes over it. ByAttribute should also be able to detect a common issue of defining incorrect attribute relationships if you turn on Duplicate errors in the ErrorConfiguration -- you can't use that technique with ByTable... But ByTable also lets you reduce the load on the relational database. Thanks, Akshai -- Try out the MSDN Forums for Analysis Services at: http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1 This posting is provided "AS IS" with no warranties, and confers no rights Please do not send email directly to this alias. This alias is for newsgroup purposes only. "Steve G" <SteveG (AT) discussions (DOT) microsoft.com> wrote in message news:0C6766B6-C0E6-4EF8-9784-964629FA9BAC (AT) microsoft (DOT) com... In AS2005, when building dimensions, the dimension has a property 'ProcessingGroup' which can be set to 'ByTable' or 'ByAttribute'. It seems that using the 'ByTable' setting causes only 1 SQL statment to be fired against the data-source - from which all attributes are calculated, whilst ByAttribute fires one SQL statement for each attribute in the dimension. Does anyone know why it'd ever make sense to use the 'ByAttribute' setting here ? For me both options seem to bring the same result but ByTable is faster because of the lower load on the rdbms (Sybase). |
![]() |
| Thread Tools | |
| Display Modes | |
| |