Please help a dummy. -
01-20-2004
, 10:31 PM
I have a data warehouse of sorts, with metadata tables like this, which are populated from a separate OLTP system before data load. I've left out key information and constraints just so you can get a feel for the basic structure.
CREATE TABLE Author
(
AuthorID INT,
AuthorName VARCHAR(32)
)
CREATE TABLE Publisher
(
PublisherID INT,
PublisherName VARCHAR(32)
)
CREATE TABLE Article
(
ArticleID INT,
ArticleDate SMALLDATETIME,
PublisherID INT,
AuthorID INT
)
CREATE TABLE Event
(
EventID TINYINT,
EventName VARCHAR(32)
)
INSERT Event SELECT 1, 'previewed'
INSERT Event SELECT 2, 'purchased'
INSERT Event SELECT 3, 'e-mailed to friend'
INSERT Event SELECT 4, 'subscribed'
INSERT Event SELECT 5, 'gave feedback'
And this is the primary fact table:
CREATE TABLE EventHistory
(
ArticleID INT,
dt SMALLDATETIME,
EventID TINYINT
)
(Which can easily have columns added to include hour of day information, as well as publisherID and other things that might be relevant, either populated by the ETL process or as computed columns.)
This EventHistory table literally gets close to 100 million entries per day. I'd like to use Analysis Services to present customers with the ability to generate the following drilldown queries with ad hoc date ranges (which clearly has no business in an OLTP system, I would think). Counts of each Event summed at each level, and then broken down by Date, and further by Hour. The user only has to be able to pick one date at a time to see an hourly breakdown.
Pick date range, then pick one of the following drilldown paths:
Publisher -> Author -> Article
Publisher -> Author -> Article -> Date
Publisher -> Author -> Article -> Date -> Hour
Author -> Publisher -> Article
Author -> Publisher -> Article -> Date
Author -> Publisher -> Article -> Date -> Hour
Author -> Article
Author -> Article -> Date -> Hour
(Note that an article only appears once, but an author can write articles for more than one publisher.)
So, can you help me get started? Should I create Summary tables that take the raw transactions from EventHistory, and one of them sums them up Events by day, and another that groups by day/hour? (Both grouped, of course, by PublisherID, AuthorID, ArticleID, EventID and summed with an EventCount column?)
How many cubes will I need to present this data? What will each of their dimensions be? Measures (other than EventCount)?
I know this sounds like I'm asking you to do my homework, but I really just need the right push on how to make these decisions, not what the decisions should be.
Thanks in advance!
Clueless in Springfield |