dbTalk Databases Forums  

basics question on star schema design: sum and distinct-count measures

comp.databases.olap comp.databases.olap


Discuss basics question on star schema design: sum and distinct-count measures in the comp.databases.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
elkarel
 
Posts: n/a

Default basics question on star schema design: sum and distinct-count measures - 04-23-2012 , 06:01 AM






I am beginner in Data Warehouse Design. I have red some theory, but
recently met a practical problem with a design of a OLAP cube. I use
star schema.
Lets say I have 2 dimension tables and 1 fact table:

Dimension Gazetteer:
- dimension_id
- country_name
- province_name
- district_name

Dimension Device:
- dimension_id
- device_category
- device_subcategory

Fact table:
- gazetteer_id
- device_dimension_id
- hazard_id (column for measure called "number of hazards")
- area_m2 (column for measure called "total area occupied")

A "business object" (which is a mine field actually) can have multiple
devices, is located in a single location (Gazetteer) and ocuppies X
square meters.
So in order to know which device categories there are, I created a
fact per each device in hazard like this:

+--------------+---------------------+-----------------------
+-----------+
Quote:
gazetteer_id | device_dimension_id | hazard_id |
area_m2 |
+--------------+---------------------+-----------------------
+-----------+
Quote:
123 | 321 | 0a0a-502c-11aa1331e98 |
6000 |
+--------------+---------------------+-----------------------
+-----------+
Quote:
123 | 654 | 0a0a-502c-11aa1331e98 |
6000 |
+--------------+---------------------+-----------------------
+-----------+
Quote:
123 | 987 | 0a0a-502c-11aa1331e98 |
6000 |
+--------------+---------------------+-----------------------
+-----------+
I defined a measure "number of hazards" as distinct-count of
hazard_id.
I also defined a "total area occupied" measure as a sum of area_m2.
Now I can use the dimension gazetteer and device and know how many
hazards there are with given dimension members.
But the problem is the "total area occupied" measure: because it is
defined as a sum, it gives a value n-times higher than the actual
area, where n is th number of devices of the particular hazard object.
For example, with the data above would give 18000m2.
How would you solve this problem?

I am using the Pentaho stack but I believe it is a generic question.

Thanks in advance for any suggestion

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.