dbTalk Databases Forums  

AS2005: What does the ProcessingGroup property do ?

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss AS2005: What does the ProcessingGroup property do ? in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Steve G
 
Posts: n/a

Default AS2005: What does the ProcessingGroup property do ? - 06-15-2006 , 03:11 AM






In AS2005, when building dimensions, the dimension has a property
'ProcessingGroup' which can be set to 'ByTable' or 'ByAttribute'. It seems
that using the 'ByTable' setting causes only 1 SQL statment to be fired
against the data-source - from which all attributes are calculated, whilst
ByAttribute fires one SQL statement for each attribute in the dimension. Does
anyone know why it'd ever make sense to use the 'ByAttribute' setting here ?
For me both options seem to bring the same result but ByTable is faster
because of the lower load on the rdbms (Sybase).

Reply With Quote
  #2  
Old   
Akshai Mirchandani [MS]
 
Posts: n/a

Default Re: AS2005: What does the ProcessingGroup property do ? - 06-16-2006 , 08:52 PM






Performance can vary -- often it is faster for the relational database to
return a distinct set of values for each attributes instead of fetching all
the data...

For example, take the scenario of a Customers dimension with attributes
Customer and Marital Status. Marital Status has very few values. If you do
ByTable, then AS will have to process Marital Status from perhaps 1 million
customer rows. But if it does it by attribute, then a "SELECT DISTINCT
[Marital Status] FROM Customers" to the relational table would return just a
few rows. So Marital Status can be processed very quickly with minimum load
on the AS side.

Both options have advantages and disadvantages besides perf. ByAttribute is
perhaps more scalable -- ByTable has to cache the result of the SQL query on
local disk and do multiple passes over it. ByAttribute should also be able
to detect a common issue of defining incorrect attribute relationships if
you turn on Duplicate errors in the ErrorConfiguration -- you can't use that
technique with ByTable... But ByTable also lets you reduce the load on the
relational database.

Thanks,
Akshai
--
Try out the MSDN Forums for Analysis Services at:
http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1

This posting is provided "AS IS" with no warranties, and confers no rights
Please do not send email directly to this alias. This alias is for newsgroup
purposes only.

"Steve G" <SteveG (AT) discussions (DOT) microsoft.com> wrote

Quote:
In AS2005, when building dimensions, the dimension has a property
'ProcessingGroup' which can be set to 'ByTable' or 'ByAttribute'. It seems
that using the 'ByTable' setting causes only 1 SQL statment to be fired
against the data-source - from which all attributes are calculated, whilst
ByAttribute fires one SQL statement for each attribute in the dimension.
Does
anyone know why it'd ever make sense to use the 'ByAttribute' setting here
?
For me both options seem to bring the same result but ByTable is faster
because of the lower load on the rdbms (Sybase).



Reply With Quote
  #3  
Old   
Steve G
 
Posts: n/a

Default Re: AS2005: What does the ProcessingGroup property do ? - 06-19-2006 , 09:35 AM



Thanks for the reply. Yes I can see that there are pros and cons for ByTable
and ByAttribute - for me its always better to do ByTable because my AS box is
much faster than my RDBMS box - but that won't be the case for everyone.

However, you mention ...

"ByAttribute should also be able
Quote:
to detect a common issue of defining incorrect attribute relationships if
you turn on Duplicate errors in the ErrorConfiguration -- you can't use that
technique with ByTable"
I've been running some tests with the 'KeyDuplicate' error setting on, using
ByAttribute and ByTable. I created a dimension with 3 attributes and a 3
level strong attribute hierarchy (level 3 being leaf level) such that 2 of
the members in the middle level have the same member key but roll-up into
different parents. This is detected both when using ByAttribute or ByTable.
Therefore I can't see your point. Can you give an example of where processing
with 'ByTable' setting would miss an error where using the ByAttribute
setting would not ? These attributes seem to be functionally equivalent to me
and both detect key duplicates. Am I missing something ?






"Akshai Mirchandani [MS]" wrote:

Quote:
Performance can vary -- often it is faster for the relational database to
return a distinct set of values for each attributes instead of fetching all
the data...

For example, take the scenario of a Customers dimension with attributes
Customer and Marital Status. Marital Status has very few values. If you do
ByTable, then AS will have to process Marital Status from perhaps 1 million
customer rows. But if it does it by attribute, then a "SELECT DISTINCT
[Marital Status] FROM Customers" to the relational table would return just a
few rows. So Marital Status can be processed very quickly with minimum load
on the AS side.

Both options have advantages and disadvantages besides perf. ByAttribute is
perhaps more scalable -- ByTable has to cache the result of the SQL query on
local disk and do multiple passes over it. ByAttribute should also be able
to detect a common issue of defining incorrect attribute relationships if
you turn on Duplicate errors in the ErrorConfiguration -- you can't use that
technique with ByTable... But ByTable also lets you reduce the load on the
relational database.

Thanks,
Akshai
--
Try out the MSDN Forums for Analysis Services at:
http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1

This posting is provided "AS IS" with no warranties, and confers no rights
Please do not send email directly to this alias. This alias is for newsgroup
purposes only.

"Steve G" <SteveG (AT) discussions (DOT) microsoft.com> wrote in message
news:0C6766B6-C0E6-4EF8-9784-964629FA9BAC (AT) microsoft (DOT) com...
In AS2005, when building dimensions, the dimension has a property
'ProcessingGroup' which can be set to 'ByTable' or 'ByAttribute'. It seems
that using the 'ByTable' setting causes only 1 SQL statment to be fired
against the data-source - from which all attributes are calculated, whilst
ByAttribute fires one SQL statement for each attribute in the dimension.
Does
anyone know why it'd ever make sense to use the 'ByAttribute' setting here
?
For me both options seem to bring the same result but ByTable is faster
because of the lower load on the rdbms (Sybase).




Reply With Quote
  #4  
Old   
Akshai Mirchandani [MS]
 
Posts: n/a

Default Re: AS2005: What does the ProcessingGroup property do ? - 06-19-2006 , 08:52 PM



Let me try to describe this... Take the rows:

Cust1 Portland Oregon
Cust2 Portland Maine

Lets say you (incorrectly) set the key of City to be just the City column
and its related attribute is State. E.g.:

State (Key=State)
\
City (Key=City)
\
Customer (Key=Customer)

Now you process using ByAttribute. This is going to do:
SELECT DISTINCT City, State FROM Customers
It would result in two rows:
Portland Oregon
Portland Maine
And "raise errors on duplicates" would raise an error.

But if you used the ByTable configuration then there would be one query for
the entire table:
SELECT DISTINCT Customer, City, State FROM Customers

Now this is usually going to return Portland many times -- and turning on
the "raise errors on duplicates" would give you a whole lot of false
positives.

This isn't so easy to explain without better illustrations but hopefully the
above example shows how there is a difference in the way duplicates would be
encountered and errors raised...

Thanks,
Akshai
--
Try out the MSDN Forums for Analysis Services at:
http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1

This posting is provided "AS IS" with no warranties, and confers no rights
Please do not send email directly to this alias. This alias is for newsgroup
purposes only.

"Steve G" <SteveG (AT) discussions (DOT) microsoft.com> wrote

Quote:
Thanks for the reply. Yes I can see that there are pros and cons for
ByTable
and ByAttribute - for me its always better to do ByTable because my AS box
is
much faster than my RDBMS box - but that won't be the case for everyone.

However, you mention ...

"ByAttribute should also be able
to detect a common issue of defining incorrect attribute relationships if
you turn on Duplicate errors in the ErrorConfiguration -- you can't use
that
technique with ByTable"

I've been running some tests with the 'KeyDuplicate' error setting on,
using
ByAttribute and ByTable. I created a dimension with 3 attributes and a 3
level strong attribute hierarchy (level 3 being leaf level) such that 2 of
the members in the middle level have the same member key but roll-up into
different parents. This is detected both when using ByAttribute or
ByTable.
Therefore I can't see your point. Can you give an example of where
processing
with 'ByTable' setting would miss an error where using the ByAttribute
setting would not ? These attributes seem to be functionally equivalent to
me
and both detect key duplicates. Am I missing something ?






"Akshai Mirchandani [MS]" wrote:

Performance can vary -- often it is faster for the relational database to
return a distinct set of values for each attributes instead of fetching
all
the data...

For example, take the scenario of a Customers dimension with attributes
Customer and Marital Status. Marital Status has very few values. If you
do
ByTable, then AS will have to process Marital Status from perhaps 1
million
customer rows. But if it does it by attribute, then a "SELECT DISTINCT
[Marital Status] FROM Customers" to the relational table would return
just a
few rows. So Marital Status can be processed very quickly with minimum
load
on the AS side.

Both options have advantages and disadvantages besides perf. ByAttribute
is
perhaps more scalable -- ByTable has to cache the result of the SQL query
on
local disk and do multiple passes over it. ByAttribute should also be
able
to detect a common issue of defining incorrect attribute relationships if
you turn on Duplicate errors in the ErrorConfiguration -- you can't use
that
technique with ByTable... But ByTable also lets you reduce the load on
the
relational database.

Thanks,
Akshai
--
Try out the MSDN Forums for Analysis Services at:
http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1

This posting is provided "AS IS" with no warranties, and confers no
rights
Please do not send email directly to this alias. This alias is for
newsgroup
purposes only.

"Steve G" <SteveG (AT) discussions (DOT) microsoft.com> wrote in message
news:0C6766B6-C0E6-4EF8-9784-964629FA9BAC (AT) microsoft (DOT) com...
In AS2005, when building dimensions, the dimension has a property
'ProcessingGroup' which can be set to 'ByTable' or 'ByAttribute'. It
seems
that using the 'ByTable' setting causes only 1 SQL statment to be fired
against the data-source - from which all attributes are calculated,
whilst
ByAttribute fires one SQL statement for each attribute in the
dimension.
Does
anyone know why it'd ever make sense to use the 'ByAttribute' setting
here
?
For me both options seem to bring the same result but ByTable is faster
because of the lower load on the rdbms (Sybase).






Reply With Quote
  #5  
Old   
Steve G
 
Posts: n/a

Default Re: AS2005: What does the ProcessingGroup property do ? - 06-20-2006 , 08:45 AM



Akshai, I created your example in AS2005. Actually it fails with exactly the
same error message whether using ByTable or ByAttribute. I have seen
situations where a different error message is returned when using ByTable
than when using ByAttribute but the main point is that neither succeed when
there is a duplicate key. Therefore there doesn't seem to be any functional
difference at all. Would you agree ?

Steve



"Akshai Mirchandani [MS]" wrote:

Quote:
Let me try to describe this... Take the rows:

Cust1 Portland Oregon
Cust2 Portland Maine

Lets say you (incorrectly) set the key of City to be just the City column
and its related attribute is State. E.g.:

State (Key=State)
\
City (Key=City)
\
Customer (Key=Customer)

Now you process using ByAttribute. This is going to do:
SELECT DISTINCT City, State FROM Customers
It would result in two rows:
Portland Oregon
Portland Maine
And "raise errors on duplicates" would raise an error.

But if you used the ByTable configuration then there would be one query for
the entire table:
SELECT DISTINCT Customer, City, State FROM Customers

Now this is usually going to return Portland many times -- and turning on
the "raise errors on duplicates" would give you a whole lot of false
positives.

This isn't so easy to explain without better illustrations but hopefully the
above example shows how there is a difference in the way duplicates would be
encountered and errors raised...

Thanks,
Akshai
--
Try out the MSDN Forums for Analysis Services at:
http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1

This posting is provided "AS IS" with no warranties, and confers no rights
Please do not send email directly to this alias. This alias is for newsgroup
purposes only.

"Steve G" <SteveG (AT) discussions (DOT) microsoft.com> wrote in message
news:CFD59BC7-9940-4388-88CC-20A2CFEE8621 (AT) microsoft (DOT) com...
Thanks for the reply. Yes I can see that there are pros and cons for
ByTable
and ByAttribute - for me its always better to do ByTable because my AS box
is
much faster than my RDBMS box - but that won't be the case for everyone.

However, you mention ...

"ByAttribute should also be able
to detect a common issue of defining incorrect attribute relationships if
you turn on Duplicate errors in the ErrorConfiguration -- you can't use
that
technique with ByTable"

I've been running some tests with the 'KeyDuplicate' error setting on,
using
ByAttribute and ByTable. I created a dimension with 3 attributes and a 3
level strong attribute hierarchy (level 3 being leaf level) such that 2 of
the members in the middle level have the same member key but roll-up into
different parents. This is detected both when using ByAttribute or
ByTable.
Therefore I can't see your point. Can you give an example of where
processing
with 'ByTable' setting would miss an error where using the ByAttribute
setting would not ? These attributes seem to be functionally equivalent to
me
and both detect key duplicates. Am I missing something ?






"Akshai Mirchandani [MS]" wrote:

Performance can vary -- often it is faster for the relational database to
return a distinct set of values for each attributes instead of fetching
all
the data...

For example, take the scenario of a Customers dimension with attributes
Customer and Marital Status. Marital Status has very few values. If you
do
ByTable, then AS will have to process Marital Status from perhaps 1
million
customer rows. But if it does it by attribute, then a "SELECT DISTINCT
[Marital Status] FROM Customers" to the relational table would return
just a
few rows. So Marital Status can be processed very quickly with minimum
load
on the AS side.

Both options have advantages and disadvantages besides perf. ByAttribute
is
perhaps more scalable -- ByTable has to cache the result of the SQL query
on
local disk and do multiple passes over it. ByAttribute should also be
able
to detect a common issue of defining incorrect attribute relationships if
you turn on Duplicate errors in the ErrorConfiguration -- you can't use
that
technique with ByTable... But ByTable also lets you reduce the load on
the
relational database.

Thanks,
Akshai
--
Try out the MSDN Forums for Analysis Services at:
http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1

This posting is provided "AS IS" with no warranties, and confers no
rights
Please do not send email directly to this alias. This alias is for
newsgroup
purposes only.

"Steve G" <SteveG (AT) discussions (DOT) microsoft.com> wrote in message
news:0C6766B6-C0E6-4EF8-9784-964629FA9BAC (AT) microsoft (DOT) com...
In AS2005, when building dimensions, the dimension has a property
'ProcessingGroup' which can be set to 'ByTable' or 'ByAttribute'. It
seems
that using the 'ByTable' setting causes only 1 SQL statment to be fired
against the data-source - from which all attributes are calculated,
whilst
ByAttribute fires one SQL statement for each attribute in the
dimension.
Does
anyone know why it'd ever make sense to use the 'ByAttribute' setting
here
?
For me both options seem to bring the same result but ByTable is faster
because of the lower load on the rdbms (Sybase).







Reply With Quote
  #6  
Old   
Akshai Mirchandani [MS]
 
Posts: n/a

Default Re: AS2005: What does the ProcessingGroup property do ? - 06-21-2006 , 01:59 PM



If you have a duplicate key, then yes -- both will fail. But the problem is
that ByTable may fail with false positives even if you *don't* have a
duplicate key.

Take:

Cust1 Portland Oregon
Cust11 Portland Oregon
Cust2 Portland Maine

SELECT DISTINCT Customer, City, State FROM Customers

Now when processing City -- even if the key of City was (City, State),
ByTable is going to see the pair (Portland, Oregon) at least twice.

Thanks,
Akshai

--
Try out the MSDN Forums for Analysis Services at:
http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1

This posting is provided "AS IS" with no warranties, and confers no rights
Please do not send email directly to this alias. This alias is for newsgroup
purposes only.

"Steve G" <SteveG (AT) discussions (DOT) microsoft.com> wrote

Quote:
Akshai, I created your example in AS2005. Actually it fails with exactly
the
same error message whether using ByTable or ByAttribute. I have seen
situations where a different error message is returned when using ByTable
than when using ByAttribute but the main point is that neither succeed
when
there is a duplicate key. Therefore there doesn't seem to be any
functional
difference at all. Would you agree ?

Steve



"Akshai Mirchandani [MS]" wrote:

Let me try to describe this... Take the rows:

Cust1 Portland Oregon
Cust2 Portland Maine

Lets say you (incorrectly) set the key of City to be just the City column
and its related attribute is State. E.g.:

State (Key=State)
\
City (Key=City)
\
Customer (Key=Customer)

Now you process using ByAttribute. This is going to do:
SELECT DISTINCT City, State FROM Customers
It would result in two rows:
Portland Oregon
Portland Maine
And "raise errors on duplicates" would raise an error.

But if you used the ByTable configuration then there would be one query
for
the entire table:
SELECT DISTINCT Customer, City, State FROM Customers

Now this is usually going to return Portland many times -- and turning on
the "raise errors on duplicates" would give you a whole lot of false
positives.

This isn't so easy to explain without better illustrations but hopefully
the
above example shows how there is a difference in the way duplicates would
be
encountered and errors raised...

Thanks,
Akshai
--
Try out the MSDN Forums for Analysis Services at:
http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1

This posting is provided "AS IS" with no warranties, and confers no
rights
Please do not send email directly to this alias. This alias is for
newsgroup
purposes only.

"Steve G" <SteveG (AT) discussions (DOT) microsoft.com> wrote in message
news:CFD59BC7-9940-4388-88CC-20A2CFEE8621 (AT) microsoft (DOT) com...
Thanks for the reply. Yes I can see that there are pros and cons for
ByTable
and ByAttribute - for me its always better to do ByTable because my AS
box
is
much faster than my RDBMS box - but that won't be the case for
everyone.

However, you mention ...

"ByAttribute should also be able
to detect a common issue of defining incorrect attribute relationships
if
you turn on Duplicate errors in the ErrorConfiguration -- you can't
use
that
technique with ByTable"

I've been running some tests with the 'KeyDuplicate' error setting on,
using
ByAttribute and ByTable. I created a dimension with 3 attributes and a
3
level strong attribute hierarchy (level 3 being leaf level) such that 2
of
the members in the middle level have the same member key but roll-up
into
different parents. This is detected both when using ByAttribute or
ByTable.
Therefore I can't see your point. Can you give an example of where
processing
with 'ByTable' setting would miss an error where using the ByAttribute
setting would not ? These attributes seem to be functionally equivalent
to
me
and both detect key duplicates. Am I missing something ?






"Akshai Mirchandani [MS]" wrote:

Performance can vary -- often it is faster for the relational database
to
return a distinct set of values for each attributes instead of
fetching
all
the data...

For example, take the scenario of a Customers dimension with
attributes
Customer and Marital Status. Marital Status has very few values. If
you
do
ByTable, then AS will have to process Marital Status from perhaps 1
million
customer rows. But if it does it by attribute, then a "SELECT DISTINCT
[Marital Status] FROM Customers" to the relational table would return
just a
few rows. So Marital Status can be processed very quickly with minimum
load
on the AS side.

Both options have advantages and disadvantages besides perf.
ByAttribute
is
perhaps more scalable -- ByTable has to cache the result of the SQL
query
on
local disk and do multiple passes over it. ByAttribute should also be
able
to detect a common issue of defining incorrect attribute relationships
if
you turn on Duplicate errors in the ErrorConfiguration -- you can't
use
that
technique with ByTable... But ByTable also lets you reduce the load on
the
relational database.

Thanks,
Akshai
--
Try out the MSDN Forums for Analysis Services at:
http://forums.microsoft.com/MSDN/Sho...ID=83&SiteID=1

This posting is provided "AS IS" with no warranties, and confers no
rights
Please do not send email directly to this alias. This alias is for
newsgroup
purposes only.

"Steve G" <SteveG (AT) discussions (DOT) microsoft.com> wrote in message
news:0C6766B6-C0E6-4EF8-9784-964629FA9BAC (AT) microsoft (DOT) com...
In AS2005, when building dimensions, the dimension has a property
'ProcessingGroup' which can be set to 'ByTable' or 'ByAttribute'. It
seems
that using the 'ByTable' setting causes only 1 SQL statment to be
fired
against the data-source - from which all attributes are calculated,
whilst
ByAttribute fires one SQL statement for each attribute in the
dimension.
Does
anyone know why it'd ever make sense to use the 'ByAttribute'
setting
here
?
For me both options seem to bring the same result but ByTable is
faster
because of the lower load on the rdbms (Sybase).









Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.