dbTalk Databases Forums  

Settle a partition argument

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss Settle a partition argument in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Richard R
 
Posts: n/a

Default Settle a partition argument - 09-27-2004 , 10:00 AM






Can anyone settle an argument about partitions & changing dimensions?

Consider a typical sales cube, with customers, products & time.
The Customers dimension includes the salesman to whom the customer is
assigned. (eg hierarchy is Country/Salesman/Customer) This changes over time
as salesmen arrive, quit, die, etc. We want to be able to view all the data
by the salesman currently responsible for the customer.

The first and most obvious option is to re-process the customer dimension
each evening, and fully process the cube. This works.
However fully processing the whole 5 years' worth of data will take longer
than is available.

If the cube is partitioned by year, and we just re-process the current year,
then the time would come down.
However, will the data in the non-processed partitions now be incorrect? ie
will data from prior years still show under the most recent salesman? I
think it has to be corrupted, but there is no documentation I can find to
say either way.

Alternatively, if we make the Customer dimension a "changing dimension" in
analysis services, then BOL says we able to re-process (refresh?) this
dimension without having to re-process the cube. I have never used this
feature; does it work, and is it still true if we partition the cube?

We're trying to create a test data set to see exactly what happens under
each situation above, but help or advice from anyone who has been here
before would be appreciated.

Regards,


Richard R





Reply With Quote
  #2  
Old   
Dave Wickert [MSFT]
 
Posts: n/a

Default Re: Settle a partition argument - 09-29-2004 , 10:17 PM






First, a simple question: When you say "quit, die, etc.", what do you want
to do with their data? I assume you want the data to be there, but the
customer just goes to "unassigned" or "unknown" node?? Or what should
happen?

Second, when you say "arrive", what kind of special processing needs to
happen? Ordinarily, you would just incrementally process the dimension, the
new sales person appears in the dimension and then you can start to having
fact table records start to appear. Adding new records to a dimension table
is no problem; just add them. In this case, since you've described a regular
hierarchy, what should the hierarchy look like if there are no customers
assigned to a sales person?

As you can see, this isn't a straightforward question, although on the
surface it does sound simple :-)

What you are describing "kind-of-sounds" like a type-2 slowly changing
dimension, but you didn't phrase it that way, so before we answer, I'd like
to understand what you would LIKE to have happen.

BTW: a changing dimension is NOT the same as a slowly-changing dimension, so
please be careful with how you phrase your description of what you would
like to see.
--
Dave Wickert [MSFT]
dwickert (AT) online (DOT) microsoft.com
Program Manager
BI SystemsTeam
SQL BI Product Unit (Analysis Services)
--
This posting is provided "AS IS" with no warranties, and confers no rights.

"Richard R" <rrose (AT) informsnospam (DOT) co.uk> wrote

Quote:
Can anyone settle an argument about partitions & changing dimensions?

Consider a typical sales cube, with customers, products & time.
The Customers dimension includes the salesman to whom the customer is
assigned. (eg hierarchy is Country/Salesman/Customer) This changes over
time
as salesmen arrive, quit, die, etc. We want to be able to view all the
data
by the salesman currently responsible for the customer.

The first and most obvious option is to re-process the customer dimension
each evening, and fully process the cube. This works.
However fully processing the whole 5 years' worth of data will take longer
than is available.

If the cube is partitioned by year, and we just re-process the current
year,
then the time would come down.
However, will the data in the non-processed partitions now be incorrect?
ie
will data from prior years still show under the most recent salesman? I
think it has to be corrupted, but there is no documentation I can find to
say either way.

Alternatively, if we make the Customer dimension a "changing dimension" in
analysis services, then BOL says we able to re-process (refresh?) this
dimension without having to re-process the cube. I have never used this
feature; does it work, and is it still true if we partition the cube?

We're trying to create a test data set to see exactly what happens under
each situation above, but help or advice from anyone who has been here
before would be appreciated.

Regards,


Richard R







Reply With Quote
  #3  
Old   
Richard R
 
Posts: n/a

Default Re: Settle a partition argument - 10-15-2004 , 08:59 AM



Hi Dave,

I'm sorry for not getting back earllier, I've been stuck at a client sites,
and from there I am unable to access newsgroups. Appologies also for the
slightly loose language in the first post.



What we have on this SQL tables is a type 1 slowly-changing dimension. To
give an explicit example, an entry on the customer_Dim table migght look
like this on the first of January:

CustomerID ERPCustomerID CustomerName SalesMan Territory
DateLoadedtoWarehouse
100 W12345-abc Bloggs Inc John Smith
UK 1-Jan-2004

and like this on the 10th October:

CustomerID ERPCustomerID CustomerName SalesMan Territory
DateLoadedtoWarehouse
100 W12345-abc Bloggs Inc Eric Patti
UK 10-oct-2004

John Smith died and was replaced by Eric Patti.

In this case I have not added new rows or keys to the dimension table, but
have overwritten the data with the new values. What I want to see when I
view the sales figures by dimensions based on this table is the most up to
date data, ie Eric Patti's customers include Bloggs Inc., and John Smith
does not exist.

To re-phase my quesion more explicitly therefore:

If my cube is partitioned, say by quarter, what processing do I have to do
to the dimensions and partitions to ensure that all the data in the cube
reflects the current state of the dimension tables in the warehouse?
I want to avoid complete re-processing of the cubes, as this takes time. But
if I reprocess the dimension to correctly bring in the changed rows, then I
have to re-process the cubes.
Can I avoid this by making the dimension a changing dimension in AS, and if
so what are the other impications.

(For the record we also have a classic Type-2 slowly-changing customers
dimension which writes a new record with a new ID for each changed or new
record, and the fact table has the keys for both tables on. This allows us
to see who actually sold what, whereas the type 1 lets us see who is
currently responsible for what.
I'm not worried about the type 2 dimension, because I can just incrementally
process that to add any new rows, and historical data in a partition will
not need processing because none of that data will have the new keys on. )

Thanks,

Richard




"Dave Wickert [MSFT]" <dwickert (AT) online (DOT) microsoft.com> wrote

Quote:
First, a simple question: When you say "quit, die, etc.", what do you want
to do with their data? I assume you want the data to be there, but the
customer just goes to "unassigned" or "unknown" node?? Or what should
happen?

Second, when you say "arrive", what kind of special processing needs to
happen? Ordinarily, you would just incrementally process the dimension,
the
new sales person appears in the dimension and then you can start to having
fact table records start to appear. Adding new records to a dimension
table
is no problem; just add them. In this case, since you've described a
regular
hierarchy, what should the hierarchy look like if there are no customers
assigned to a sales person?

As you can see, this isn't a straightforward question, although on the
surface it does sound simple :-)

What you are describing "kind-of-sounds" like a type-2 slowly changing
dimension, but you didn't phrase it that way, so before we answer, I'd
like
to understand what you would LIKE to have happen.

BTW: a changing dimension is NOT the same as a slowly-changing dimension,
so
please be careful with how you phrase your description of what you would
like to see.
--
Dave Wickert [MSFT]
dwickert (AT) online (DOT) microsoft.com
Program Manager
BI SystemsTeam
SQL BI Product Unit (Analysis Services)
--
This posting is provided "AS IS" with no warranties, and confers no
rights.

"Richard R" <rrose (AT) informsnospam (DOT) co.uk> wrote in message
news:OlnNzLKpEHA.3552 (AT) TK2MSFTNGP15 (DOT) phx.gbl...
Can anyone settle an argument about partitions & changing dimensions?

Consider a typical sales cube, with customers, products & time.
The Customers dimension includes the salesman to whom the customer is
assigned. (eg hierarchy is Country/Salesman/Customer) This changes over
time
as salesmen arrive, quit, die, etc. We want to be able to view all the
data
by the salesman currently responsible for the customer.

The first and most obvious option is to re-process the customer
dimension
each evening, and fully process the cube. This works.
However fully processing the whole 5 years' worth of data will take
longer
than is available.

If the cube is partitioned by year, and we just re-process the current
year,
then the time would come down.
However, will the data in the non-processed partitions now be incorrect?
ie
will data from prior years still show under the most recent salesman? I
think it has to be corrupted, but there is no documentation I can find
to
say either way.

Alternatively, if we make the Customer dimension a "changing dimension"
in
analysis services, then BOL says we able to re-process (refresh?) this
dimension without having to re-process the cube. I have never used this
feature; does it work, and is it still true if we partition the cube?

We're trying to create a test data set to see exactly what happens
under
each situation above, but help or advice from anyone who has been here
before would be appreciated.

Regards,


Richard R









Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.