dbTalk Databases Forums  

Cube showing slightly different data than in data source...?

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss Cube showing slightly different data than in data source...? in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Maria
 
Posts: n/a

Default Cube showing slightly different data than in data source...? - 12-17-2004 , 05:45 AM






Hello,

I am very perplexed with this problem I'm having. I work for a website-based
company and my job is to create a data warehouse based on usage of the site.
We have a Central Logging Database that logs every single hit to the site,
then i have a stored procedure that picks up these hits, does a bit of data
scrubbing and transformation, and puts them into a Warehouse database. i then
have an Analysis Services cube that points at this warehouse database. I then
use Reporting Services and MDX to create reports based on this usage info.

Scenario: the total hits in the database can be further grouped into
companies, so reports can be run to show a company's usage of the site.
Companies are then further divided into Users belonging to a company. I.e.
reports are generally run to show a company's usage, broken down into
individual users within that company.

In order to check that my warehouse-load stored procedure works properly I
compared the amount of hits in the Central Logging Database (where all hits
are originally recorded) with hits in the final Warehouse database to see if
they balance. Everything looks fine and hunky-dory.

My problem is this: When I browse the cube in analysis manager, some of the
companies have a higher hit count, i.e. show more hits, than actually exist
in any of the original databases! In the cube one company is even showing
1,000 hits for a user who does not have ANY hits in any of the original
databases!!!
the total amount of hits shown in the cube is equal to the original
databases, so some how it balances. but i don't understand where all this
extra\incorrect data is coming from.

My cube consists of a very simple Star-schema design. one main table called
Hits, and UserAccount, BusinessEntity, and Date dimensions. the Hits table
consists of the following columns:
Hit_ID (int)
DatabaseSource (int)
UserAccountID (int)
BusinessEntityID (int)
LogDateID (int)

One hit is defined by a Hit_ID and DatabaseSource (these two form a
composite primary key).

My Measure in the cube is the Hit_ID and I’m doing a Count on it.

Does anyone know what i could possibly be doing wrong? I would be very
grateful for any bit of help anyone could provide.

Thanks in advance!
Maria


Reply With Quote
  #2  
Old   
Mosha Pasumansky [MS]
 
Posts: n/a

Default Re: Cube showing slightly different data than in data source...? - 12-20-2004 , 02:59 AM






I think the best way to troubleshoot it is to do DrillThorugh on the cell
which shows wrong numbers - and track down the records in the fact table
that contributed to it.

--
==================================================
Mosha Pasumansky - http://www.mosha.com/msolap
Analysis Services blog at http://www.sqljunkies.com/WebLog/mosha
Development Lead in the Analysis Server team
All you need is love (John Lennon)
Disclaimer : This posting is provided "AS IS" with no warranties, and
confers no rights.
==================================================

"Maria" <Maria (AT) discussions (DOT) microsoft.com> wrote

Quote:
Hello,

I am very perplexed with this problem I'm having. I work for a
website-based
company and my job is to create a data warehouse based on usage of the
site.
We have a Central Logging Database that logs every single hit to the site,
then i have a stored procedure that picks up these hits, does a bit of
data
scrubbing and transformation, and puts them into a Warehouse database. i
then
have an Analysis Services cube that points at this warehouse database. I
then
use Reporting Services and MDX to create reports based on this usage info.

Scenario: the total hits in the database can be further grouped into
companies, so reports can be run to show a company's usage of the site.
Companies are then further divided into Users belonging to a company. I.e.
reports are generally run to show a company's usage, broken down into
individual users within that company.

In order to check that my warehouse-load stored procedure works properly I
compared the amount of hits in the Central Logging Database (where all
hits
are originally recorded) with hits in the final Warehouse database to see
if
they balance. Everything looks fine and hunky-dory.

My problem is this: When I browse the cube in analysis manager, some of
the
companies have a higher hit count, i.e. show more hits, than actually
exist
in any of the original databases! In the cube one company is even showing
1,000 hits for a user who does not have ANY hits in any of the original
databases!!!
the total amount of hits shown in the cube is equal to the original
databases, so some how it balances. but i don't understand where all this
extra\incorrect data is coming from.

My cube consists of a very simple Star-schema design. one main table
called
Hits, and UserAccount, BusinessEntity, and Date dimensions. the Hits table
consists of the following columns:
Hit_ID (int)
DatabaseSource (int)
UserAccountID (int)
BusinessEntityID (int)
LogDateID (int)

One hit is defined by a Hit_ID and DatabaseSource (these two form a
composite primary key).

My Measure in the cube is the Hit_ID and I'm doing a Count on it.

Does anyone know what i could possibly be doing wrong? I would be very
grateful for any bit of help anyone could provide.

Thanks in advance!
Maria




Reply With Quote
  #3  
Old   
Maria
 
Posts: n/a

Default Re: Cube showing slightly different data than in data source...? - 12-20-2004 , 07:51 AM



Mosha,

Thank you so so much. This really pointed me in the right direction and
helped me solve this problem!! I would just like to thank all of the
brilliant people who support these newsgroups. I can't count the amount of
times you guys have *really* saved my bacon! And I can add this time to that!
;-)

Basically i followed your advice and i enabled drillthrough on the cube so
that i could browse directly back to the database to see what exactly these
extra rows were. this is what i found out:

every so often we get a request to merge all users from one companyID into
another companyID (e.g. for a company merge). So in our UserAccounts table,
all users who were previously assigned to CompanyID 7 (old company) will now
be assigned to CompanyID 8 (new company). this is just a straight UPDATE on
the table.

BUT, in our Hits database, we don't update the hits to reflect this change
in CompanyID. so to get a meaningful report for a company that has merged,
you will have to run one report each on both CompanyID 7 and CompanyID 8, and
then combine the two.

THIS is exactly what Analysis Services was doing!!! I had forgotten that we
had this data issue so the report i was running off the original Hits
database had less hits than in the Analysis Services cube - because i was not
adding on the extra hits from the old companyID!!!

I am walking on air at the moment! Analysis Services rocks!!

Maria


"Mosha Pasumansky [MS]" wrote:

Quote:
I think the best way to troubleshoot it is to do DrillThorugh on the cell
which shows wrong numbers - and track down the records in the fact table
that contributed to it.

--
==================================================
Mosha Pasumansky - http://www.mosha.com/msolap
Analysis Services blog at http://www.sqljunkies.com/WebLog/mosha
Development Lead in the Analysis Server team
All you need is love (John Lennon)
Disclaimer : This posting is provided "AS IS" with no warranties, and
confers no rights.
==================================================

"Maria" <Maria (AT) discussions (DOT) microsoft.com> wrote in message
news:F2356D40-EA12-4D2F-B60E-F0243E977C22 (AT) microsoft (DOT) com...
Hello,

I am very perplexed with this problem I'm having. I work for a
website-based
company and my job is to create a data warehouse based on usage of the
site.
We have a Central Logging Database that logs every single hit to the site,
then i have a stored procedure that picks up these hits, does a bit of
data
scrubbing and transformation, and puts them into a Warehouse database. i
then
have an Analysis Services cube that points at this warehouse database. I
then
use Reporting Services and MDX to create reports based on this usage info.

Scenario: the total hits in the database can be further grouped into
companies, so reports can be run to show a company's usage of the site.
Companies are then further divided into Users belonging to a company. I.e.
reports are generally run to show a company's usage, broken down into
individual users within that company.

In order to check that my warehouse-load stored procedure works properly I
compared the amount of hits in the Central Logging Database (where all
hits
are originally recorded) with hits in the final Warehouse database to see
if
they balance. Everything looks fine and hunky-dory.

My problem is this: When I browse the cube in analysis manager, some of
the
companies have a higher hit count, i.e. show more hits, than actually
exist
in any of the original databases! In the cube one company is even showing
1,000 hits for a user who does not have ANY hits in any of the original
databases!!!
the total amount of hits shown in the cube is equal to the original
databases, so some how it balances. but i don't understand where all this
extra\incorrect data is coming from.

My cube consists of a very simple Star-schema design. one main table
called
Hits, and UserAccount, BusinessEntity, and Date dimensions. the Hits table
consists of the following columns:
Hit_ID (int)
DatabaseSource (int)
UserAccountID (int)
BusinessEntityID (int)
LogDateID (int)

One hit is defined by a Hit_ID and DatabaseSource (these two form a
composite primary key).

My Measure in the cube is the Hit_ID and I'm doing a Count on it.

Does anyone know what i could possibly be doing wrong? I would be very
grateful for any bit of help anyone could provide.

Thanks in advance!
Maria





Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.