dbTalk Databases Forums  

CUBE RETURNS DUPLICATE RECORDS

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss CUBE RETURNS DUPLICATE RECORDS in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
fsanchez
 
Posts: n/a

Default CUBE RETURNS DUPLICATE RECORDS - 04-13-2006 , 04:05 PM






Hello everyone,
I have a question concerning a data modeling issue or linking the dimensions
within the Analysis Services. I have created a Shared dimension called Event
Type. When defining the dimension, I link it with the bridge table. The
bridge table has the relationship or the events that the customer attended.
The Event Type dimension looks something like this:

CUSTOMER --< COMMUNICATION EVENT >-- EVENT TYPE

When I add the dimension to the OLAP Cube, it incorporates the structures
automatically, which then I a get duplicate records.

Basically, I am interested in knowing how the customer charges and/or fees
(commission) increased or descreased since a customer attended the company's
conference.

Here are the following dimensions in the cube:
CUSTOMER
PERIOD
CHARGE_TYPE
DEAL_DETAIL
SALES_PERSON
SALES_TRADER
INDUSTRY_SEGMENT
COUNTRY

The EVENT_TYPE consist of various events held through out the year. It also
has a dummy record with a description "No Event
Attendence"

EVENT_TYPE
--EVENT_TYPE_KEY
--EVENT_DESC
- -EVENT_TYPE
--FROM_DATE
--TO_DATE

I have created a bridge table to track the various events that customer
attended:

COMMUNICATION_EVENT
--CUSTOMER_KEY
--EVENT_TYPE_KEY

Here are the following facts
CHARGE_ALLOCATION_FACT
--CUSTOMER_KEY
--END_MONTH_PERIOD_KEY
--CHARGE_TYPE_KEY
--DEAL_DETAIL_KEY
--SALES_PERSON_KEY
--SALES_TRADER_KEY
--INDUSTRY_S EGMENT_KEY
--COUNTRY_KEY
--CHARGE_AMT
--VOLUME_AMT

Although the CHARGE_ALLOCAITON_FACT has a record for every type of charge,
period key is associated with the end of month key or the last business day
in the month. However, I can still extract or identify the actual trade date
to show the daily charges. This is need because the business user wants to
select a range date to analyze the generate revenue from charges/fees daily.

Initially, my thoughts on the design,I will need to create another fact
table that will incorporate the EVENT_TYPE dimension, which would look like
this:

CHARGE_EVENT_FACT
--CUSTOMER_KEY
--FROM_DATE (Based on the trade date)
--TO_DATE (Base on the trade date)
--CHARGE_TYPE_KEY
--DEAL_DETAIL_KEY
--SALES_PERSON_KEY< br>--SALES_TRADER_KEY
--INDUSTRY_SEGMENT_KEY
--COUNTRY_KEY
-- EVENT_TYPE_KEY
--CHARGE_AMT
--VOLUME_AMT

However, I will probably run into problems when a customer attends more than
one event during the year or does not have any charges during that event but
afterward. I initially thought of just assigning the event key to the fact
records that fall under the From Date and To Date.

I realize that MS SQL Server does not like snowflak schemas.

Your feedback and insight is greatly appreciated.

Regards,
Fernando Sanchez


Reply With Quote
  #2  
Old   
Deepak Puri
 
Posts: n/a

Default Re: CUBE RETURNS DUPLICATE RECORDS - 04-13-2006 , 10:39 PM






Hi Fernando,

Don't know whether you've upgraded to AS 2005 yet, but the new many-many
dimension support might work well for you:

http://msdn.microsoft.com/library/de.../en-us/dnsql90
/html/sql2k5snapshotisol.asp
Quote:
Many-to-Many Dimensions in Analysis Services 2005

Richard Tkachuk
Microsoft Corporation

June 2005

Applies to:

Microsoft SQL Server 2005 Analysis Services

Summary: See an example of using the Many-to-Many dimension in SQL
Server 2005 Analysis Services to analyze sales data, and get ideas for
other uses such as treating medical conditions, software testing, and
more.
...
Quote:

- Deepak

Deepak Puri
Microsoft MVP - SQL Server

*** Sent via Developersdex http://www.developersdex.com ***


Reply With Quote
  #3  
Old   
fsanchez
 
Posts: n/a

Default Re: CUBE RETURNS DUPLICATE RECORDS - 04-14-2006 , 08:53 AM



Hi Deepak,
Unfortunately, we have not yet upgraded to 2005. I believe we will in the
future. I have noticed in some of my other OLAP cubes, Analysis Services
does not like snowflake schemas. In the pass, I had to created a view and
flatten it. However, I am having difficulties with the design. Have you had
the opportunity before to work on this type data modeling issue. Any insight
is greatly appreciated.

"Deepak Puri" wrote:

Quote:
Hi Fernando,

Don't know whether you've upgraded to AS 2005 yet, but the new many-many
dimension support might work well for you:

http://msdn.microsoft.com/library/de.../en-us/dnsql90
/html/sql2k5snapshotisol.asp

Many-to-Many Dimensions in Analysis Services 2005

Richard Tkachuk
Microsoft Corporation

June 2005

Applies to:

Microsoft SQL Server 2005 Analysis Services

Summary: See an example of using the Many-to-Many dimension in SQL
Server 2005 Analysis Services to analyze sales data, and get ideas for
other uses such as treating medical conditions, software testing, and
more.
...



- Deepak

Deepak Puri
Microsoft MVP - SQL Server

*** Sent via Developersdex http://www.developersdex.com ***


Reply With Quote
  #4  
Old   
Deepak Puri
 
Posts: n/a

Default Re: CUBE RETURNS DUPLICATE RECORDS - 04-14-2006 , 11:53 AM



Fernando,

Even with AS 2000, snowflaked dimensions should not, by themselves, be
a problem - in fact, snowflaking can sometimes alleviate duplicate
record issues, as this past post discusses. Could you describe a
scenario where you encountered issues with snowflaking?

http://groups.google.com/group/micro...olap/msg/138ca
29d5c263b1f?hl=en&
Quote:
Different granularity for measures

From: George Spofford
Date: Mon, Aug 26 2002 9:41 am
Groups: microsoft.public.sqlserver.olap

You need to either optimize the joins away for the SKU and Category
tables,
or snowflake them so that SKU information is in a separate table from
category information. (You could do both, too.)

In order to optimize the joins, ensure that Category keys are unique
within
the level. Then, AS2K won't attempt to join in the product dimension
when it
processes the cube. The join is what throws off the numbers, as you get
N
fact rows per 1 dimension row.

Your Forecasts will still have SKU disabled and join to Prod_catg_key in
the
category table (manually ensure this is set in the cube editor). You SKU
information will join to SKU key.
...
Quote:

- Deepak

Deepak Puri
Microsoft MVP - SQL Server

*** Sent via Developersdex http://www.developersdex.com ***


Reply With Quote
  #5  
Old   
fsanchez
 
Posts: n/a

Default Re: CUBE RETURNS DUPLICATE RECORDS - 04-14-2006 , 12:37 PM



Deepak,
I am not sure if I understand your attached message regarding the unique
keys. Are you saying if I set the key for the Event Type to unique, it will
not duplicate the records.

I have also sent you a word document of the data model and screen prints of
the Event Type dimension and Cube for your review. As for a scenario, do you
want me to send you the output to show the duplication values?

As you know, your assistant is greatly appreciate it.

"Deepak Puri" wrote:

Quote:
Fernando,

Even with AS 2000, snowflaked dimensions should not, by themselves, be
a problem - in fact, snowflaking can sometimes alleviate duplicate
record issues, as this past post discusses. Could you describe a
scenario where you encountered issues with snowflaking?

http://groups.google.com/group/micro...olap/msg/138ca
29d5c263b1f?hl=en&

Different granularity for measures

From: George Spofford
Date: Mon, Aug 26 2002 9:41 am
Groups: microsoft.public.sqlserver.olap

You need to either optimize the joins away for the SKU and Category
tables,
or snowflake them so that SKU information is in a separate table from
category information. (You could do both, too.)

In order to optimize the joins, ensure that Category keys are unique
within
the level. Then, AS2K won't attempt to join in the product dimension
when it
processes the cube. The join is what throws off the numbers, as you get
N
fact rows per 1 dimension row.

Your Forecasts will still have SKU disabled and join to Prod_catg_key in
the
category table (manually ensure this is set in the cube editor). You SKU
information will join to SKU key.
...



- Deepak

Deepak Puri
Microsoft MVP - SQL Server

*** Sent via Developersdex http://www.developersdex.com ***


Reply With Quote
  #6  
Old   
fsanchez
 
Posts: n/a

Default Re: CUBE RETURNS DUPLICATE RECORDS - 04-14-2006 , 12:47 PM



Deepak,
I just thought of something...Can I be getting duplicate records because I
am missing join in the relationship.

In the Communication Event dimension, there can be one or more events that a
customer attends. I am only joining the Customer Key to the Customer Key in
the fact table.

Although there is a period key which represents the end of month or last
business day in the month, I will be generating another fact table from the
original to create the daily charges. I am able to this because the deal
detail key is in the original fact table. The Deal Detail table captures all
the deal for the day for the customer.

Regards,
Fernando Sanchez
"Deepak Puri" wrote:

Quote:
Fernando,

Even with AS 2000, snowflaked dimensions should not, by themselves, be
a problem - in fact, snowflaking can sometimes alleviate duplicate
record issues, as this past post discusses. Could you describe a
scenario where you encountered issues with snowflaking?

http://groups.google.com/group/micro...olap/msg/138ca
29d5c263b1f?hl=en&

Different granularity for measures

From: George Spofford
Date: Mon, Aug 26 2002 9:41 am
Groups: microsoft.public.sqlserver.olap

You need to either optimize the joins away for the SKU and Category
tables,
or snowflake them so that SKU information is in a separate table from
category information. (You could do both, too.)

In order to optimize the joins, ensure that Category keys are unique
within
the level. Then, AS2K won't attempt to join in the product dimension
when it
processes the cube. The join is what throws off the numbers, as you get
N
fact rows per 1 dimension row.

Your Forecasts will still have SKU disabled and join to Prod_catg_key in
the
category table (manually ensure this is set in the cube editor). You SKU
information will join to SKU key.
...



- Deepak

Deepak Puri
Microsoft MVP - SQL Server

*** Sent via Developersdex http://www.developersdex.com ***


Reply With Quote
  #7  
Old   
Deepak Puri
 
Posts: n/a

Default Re: CUBE RETURNS DUPLICATE RECORDS - 04-19-2006 , 11:17 PM



Fernando,

After reviewing your cube schema, it looks like a classic many-many
dimension scenario, using the COMMUNICATION_EVENT bridge table. This is
supported out-of-the-box in AS 2005, but will cause duplicate rows in AS
2000.

What I've often done in AS 2000 is to create 2 cubes and then combine
them in a virtual cube. In this case, the COMMUNICATION_EVENT and
EVENT_TYPE tables could be removed from the original cube. The fact
table for the 2nd cube could then be a view which joins the original
fact table to the bridge table, resulting in the extra rows; but the
Event dimension would only apply to the 2nd cube.


- Deepak

Deepak Puri
Microsoft MVP - SQL Server

*** Sent via Developersdex http://www.developersdex.com ***

Reply With Quote
  #8  
Old   
fsanchez
 
Posts: n/a

Default Re: CUBE RETURNS DUPLICATE RECORDS - 04-20-2006 , 10:06 AM



Hi Deepak,
Thanks for responding back...Your observation are greatly appreciated.
Your suggestion is a great idea. However,after carefully reviewing it, I
discovered that I would still have the duplicated records. So what I did was
to explode the Communication Event table by populating another field with the
actual Event Date that existed in the range (From Date, To Date) and call it
DW_CUSTOMER_EVENT which looks like this:

CREATE TABLE DW_CUSTOMER_EVENT
(
CUSTOMER_KEY NUMBER(6) not null,
EVENT_TYPE_KEY NUMBER(6) not null,
EVENT_TYPE VARCHAR2(50),
EVENT_DESC VARCHAR2(50),
EVENT_DATE DATE,
FROM_DATE DATE,
TO_DATE DATE,
LOAD_DATE DATE default (SYSDATE)
)

This table contains a record for every customer that attended the event for
the number of days. For example, if customer A attended the 2006 conference
and it was held for days. I have four records but the Event Date would be
the corresponding date that it was held.

CUSTOMER_KEY EVENT_TYPE_KEY EVENT_DATE FROM_DATE TO_DATE
------------------- -------------------- --------------- ---------------
------------
247 4 01/17/2006 01/17/2006
01/20/2006
247 4 01/18/2006 01/17/2006
01/20/2006
247 4 01/19/2006 01/17/2006
01/20/2006
247 4 01/20/2006 01/17/2006
01/20/2006

Using the Customer Key and the Event Date, I can join to the fact table and
remove the duplicate records...

I then created a separate fact table called DW_DAILY_CHARGE_FACT as followed:

CREATE TABLE DW_DAILY_CHARGE_FACT
AS
SELECT S1.CUSTOMER_KEY,
S1.PERIOD_KEY,
S1.CHARGE_TYPE_KEY,
S1.DEAL_DETAIL_KEY,
S1.SALES_PERSON_KEY,
S1.SALES_TRADER_KEY,
S1.INDUSTRY_SEGMENT_KEY,
S1.COUNTRY_KEY,
NVL(S2.EVENT_TYPE_KEY,0) AS EVENT_TYPE_KEY,
S1.MTD_CHARGE_AMT,
S1.VOLUME_AMT
FROM (SELECT T1.CUSTOMER_KEY,
T3.PERIOD_KEY,
T1.CHARGE_TYPE_KEY,
T1.DEAL_DETAIL_KEY,
T1.SALES_PERSON_KEY,
T1.SALES_TRADER_KEY,
T1.INDUSTRY_SEGMENT_KEY,
T1.COUNTRY_KEY,
T1.MTD_CHARGE_AMT,
T1.VOLUME_AMT
FROM DW_CHARGE_ALLOCATION_FACT T1,
DW_DEAL_DETAIL T2,
DW_PERIOD T3
WHERE T1.DEAL_DETAIL_KEY = T2.DEAL_DETAIL_KEY
AND T2.TRADE_DT = T3.FULL_DATE) S1,
(SELECT T4.CUSTOMER_KEY,
T5.PERIOD_KEY,
T4.EVENT_TYPE_KEY
FROM DW_CUSTOMER_EVENT T4,
DW_PERIOD T5
WHERE T4.EVENT_DATE = T5.FULL_DATE) S2
WHERE S1.CUSTOMER_KEY = S2.CUSTOMER_KEY (+)
AND S1.PERIOD_KEY = S2.PERIOD_KEY (+)

Now my star schema looks like this...

SELECT S1.CUSTOMER_KEY,
S1.PERIOD_KEY,
S1.CHARGE_TYPE_KEY,
S1.DEAL_DETAIL_KEY,
S1.SALES_PERSON_KEY,
S1.SALES_TRADER_KEY,
S1.INDUSTRY_SEGMENT_KEY,
S1.COUNTRY_KEY,
S1.EVENT_TYPE_KEY,
S1.MTD_CHARGE_AMT,
S1.VOLUME_AMT
FROM DW_DAILY_CHARGE_FACT S1,
DW_CUSTOMER S2,
DW_PERIOD S3,
DW_CHARGE_TYPE S4,
DW_EVENT_TYPE S5,
DW_DEAL_DETAIL S6,
DW_SALES_PERSON_VW S7,
DW_SALES_TRADER_VW S8,
DW_INDUSTRY_SEGMENT S9
WHERE S1.CUSTOMER_KEY = S2.CUSTOMER_KEY
AND S1.PERIOS_KEY = S3.PERIOD_KEY
AND S1.CHARGE_TYPE_KEY = S4.CHARGE_TYPE_KEY
AND S1.EVENT_TYPE_KEY = S5.EVENT_TYPE_KEY
AND S1.DEAL_DETAIL_KEY = S6.DEAL_DETAIL_KEY
AND S1.SALES_PERSON_KEY = S7.SALES_PERSON_KEY
AND S1.SALES_TRADER_KEY = S8.SALES_TRADER_KEY
AND S1.INDUSTRY_SEGMENT_KEY = S9.INDUSTRY_SEGMENT_KEY

But I just realize that what if a customer does not have any charges for the
days there are attending the event. This customer will not show up in the
fact table. How do I resolve this?

"Deepak Puri" wrote:

Quote:
Fernando,

After reviewing your cube schema, it looks like a classic many-many
dimension scenario, using the COMMUNICATION_EVENT bridge table. This is
supported out-of-the-box in AS 2005, but will cause duplicate rows in AS
2000.

What I've often done in AS 2000 is to create 2 cubes and then combine
them in a virtual cube. In this case, the COMMUNICATION_EVENT and
EVENT_TYPE tables could be removed from the original cube. The fact
table for the 2nd cube could then be a view which joins the original
fact table to the bridge table, resulting in the extra rows; but the
Event dimension would only apply to the 2nd cube.


- Deepak

Deepak Puri
Microsoft MVP - SQL Server

*** Sent via Developersdex http://www.developersdex.com ***


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.