dbTalk Databases Forums  

OLAP Design question

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss OLAP Design question in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
kurb
 
Posts: n/a

Default OLAP Design question - 11-25-2004 , 08:30 PM






Hello I am new to OLAP, and trying to architect cubes for the following
application. Are there trade-offs for selecting many cubes vs many
tables vs many columns?

I am still gathering resources. Is there a template available for writng
the requirements?

Thank you for any suggestions or leads.

Kurb



My Application: Time Series Analysis

I have many portfolios each with a set of stock Symbols with price
data every 10 mins (8 hr day), going back 10 years or more.

There are 10 portfolios each with hundreds of Symbols with 48 cells of data per day. For each raw cell there are 100 Stats to derived for each of 100 Time attributes

User likes to drill/aggregate/calculate data within each Symbol, and
across sets of Symbols, relating to various combinations of Time
attribute and Stats attributes.




My Model

I have dense fact tables, tons of Stats calculations to do, perhaps 50% of which may be pre-calculated

I can see having

1. One fact table for each Symbol (SymbolID,DateID,TimeOfDayID, data1,dat2...)for the raw data
2. Dimension for Symbols with Levels for different attributes (SymbolID,PortfolioId..)
3. Dimension for Stats with Levels for different attributes of Stats
4. Dimension for Time with Levels for different attributes of Time

Question: Do I architect for many cubes, many tables, many columns, or does it matter


Fully exploded, this looks like

Many Cubes: Sets of Virtual Cubes each with hundreds of cubes, one for each Symbol: = 2000 cubes, 10 Virtual Cubes

Many Tables: One Stat table for each Stat: (StatID,SymbolId,Date/TimeOfDayId, time1, time2...), one for each Symbol = 100 Stats X 200 Symbols x 10 Portfolios = 200 000 Tables, 10 cubes (one per portfolio)

Many Columns: All the Stats on one Table, one for each Symbol = 2000 Tables. 100 Stats X 100 Time attributes = 10 000 columns per Symbol, 10 cubes



Why(I think) I need separate tables for each Symbol
I want to have one table with raw data for each Symbol because keep all
the data rows of data for that Symbol in strict day/time sequence. This
way I can easily calculate price changes relative to Time (which I have
to do tons of) by simply counting relative row numbers ieg 20 (business)
days ago is 20 rows down from the current row.

Also the daily raw data that gets plugged into the cube is available
from the source on a per Symbol/TimeSeries basis. Also the data may be
charted on a per Symbol/TimeSeries and it would be easier to haul out
the data to an external application, if the Cube is structured for that.





Reply With Quote
  #2  
Old   
Deepak Puri
 
Posts: n/a

Default Re: OLAP Design question - 11-26-2004 , 04:59 PM






Can symbols be shared across multiple portfolios, or are they
exclusively associated with a parent portfolio? Also, are there
quantities and values associated with a portfolio that need to be
analyzed (eg: Portfolio A has N units of Symbol X, valued at D dollars)?

My usual preference is to initially select the number of
columns/tables/cubes based on conceptual requirements. There may be many
occasions to create more objects and artifacts later, to address
tactical or performance needs.


- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Reply With Quote
  #3  
Old   
luminary
 
Posts: n/a

Default RE: OLAP Design question - 11-26-2004 , 05:23 PM



It is customary to model one fact table with one business process. In your
case the only business process I can see is stock price changes. That would
suggest one fact table.

That does not mean that you cannot have multiple source data tables. It does
multiply the ETL necessary to populate the fact table though. Do you truely
have multiple sources?

Cheers

"kurb" wrote:

Quote:
Hello I am new to OLAP, and trying to architect cubes for the following
application. Are there trade-offs for selecting many cubes vs many
tables vs many columns?

I am still gathering resources. Is there a template available for writng
the requirements?

Thank you for any suggestions or leads.

Kurb



My Application: Time Series Analysis

I have many portfolios each with a set of stock Symbols with price
data every 10 mins (8 hr day), going back 10 years or more.

There are 10 portfolios each with hundreds of Symbols with 48 cells of data per day. For each raw cell there are 100 Stats to derived for each of 100 Time attributes

User likes to drill/aggregate/calculate data within each Symbol, and
across sets of Symbols, relating to various combinations of Time
attribute and Stats attributes.




My Model

I have dense fact tables, tons of Stats calculations to do, perhaps 50% of which may be pre-calculated

I can see having

1. One fact table for each Symbol (SymbolID,DateID,TimeOfDayID, data1,dat2...)for the raw data
2. Dimension for Symbols with Levels for different attributes (SymbolID,PortfolioId..)
3. Dimension for Stats with Levels for different attributes of Stats
4. Dimension for Time with Levels for different attributes of Time

Question: Do I architect for many cubes, many tables, many columns, or does it matter


Fully exploded, this looks like

Many Cubes: Sets of Virtual Cubes each with hundreds of cubes, one for each Symbol: = 2000 cubes, 10 Virtual Cubes

Many Tables: One Stat table for each Stat: (StatID,SymbolId,Date/TimeOfDayId, time1, time2...), one for each Symbol = 100 Stats X 200 Symbols x 10 Portfolios = 200 000 Tables, 10 cubes (one per portfolio)

Many Columns: All the Stats on one Table, one for each Symbol = 2000 Tables. 100 Stats X 100 Time attributes = 10 000 columns per Symbol, 10 cubes



Why(I think) I need separate tables for each Symbol
I want to have one table with raw data for each Symbol because keep all
the data rows of data for that Symbol in strict day/time sequence. This
way I can easily calculate price changes relative to Time (which I have
to do tons of) by simply counting relative row numbers ieg 20 (business)
days ago is 20 rows down from the current row.

Also the daily raw data that gets plugged into the cube is available
from the source on a per Symbol/TimeSeries basis. Also the data may be
charted on a per Symbol/TimeSeries and it would be easier to haul out
the data to an external application, if the Cube is structured for that

Reply With Quote
  #4  
Old   
kurb
 
Posts: n/a

Default Re: OLAP Design question - 11-26-2004 , 05:33 PM



Thank you for your response.

Yes, there are higher level account and portfolio level relationships
such as you indicate.
Given a conceptual model connecting cubes/tables/columns, and given some
functional/performance objectives, is there a template (guidelines,
tool, ...) that might be useful.

I am trying to move from Excel to Analysis Services (don't have SQL
server yet). Should I start modelling the tables using Access....

Thanks for any suggestions
K

Deepak Puri wrote:

Quote:
Can symbols be shared across multiple portfolios, or are they
exclusively associated with a parent portfolio? Also, are there
quantities and values associated with a portfolio that need to be
analyzed (eg: Portfolio A has N units of Symbol X, valued at D dollars)?

My usual preference is to initially select the number of
columns/tables/cubes based on conceptual requirements. There may be many
occasions to create more objects and artifacts later, to address
tactical or performance needs.


- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!




Reply With Quote
  #5  
Old   
Deepak Puri
 
Posts: n/a

Default Re: OLAP Design question - 11-26-2004 , 08:29 PM



From a modelling standpoint, familiarity with dimensional modelling is
desirable - eg: some resources from Kimball:

http://www.rkimball.com/


The only modelling template/tool that I'm aware of is ADAPT from
Symmetry Corp, but you may find more options:

http://www.symcorp.com/downloads/ADA...io_stencil.zip


Based on my limited knowledge of portfolio analysis, I can conceive of 2
fact tables, one for the 10-minute symbol price data and another for
portfolio snapshots (though I don't know how frequently portfolios can
change). 2 cubes could then be built and combined in a single virtual
cube. The dimensions would be date, security, portfolio and maybe a
time-series analysis "utility" dimension, to accommodate time
statistical calculations (I'm assuming many of the desired statistical
"attributes" can be derived from the data in the cube). A "time-of-day"
dimension may be needed to study intra-day price action (if that's of
interest).


- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Reply With Quote
  #6  
Old   
kurb
 
Posts: n/a

Default Re: OLAP Design question - 11-30-2004 , 11:00 AM



Thank you very much . This site and the tool look like they're going to
be useful..

Given your suggestions for dimensioning, this is model/concept I'm
thinking about. I am unclear about something very basic. If I want a
cube for each snapshot level (described below) then I need a single Fact
table for each cube. Do I need to ( physically) recreate the Fact table
for level 2 from data derived in level 1?

Thank you for any clarification

This is the raw data:

1. FactTable1(end of day Price data for securities)
2.FactTable2(intrad day Price data for securities)

This is the first level of aggregation: market snapshot

3. Securities(SecurityID, ExchangeID,SectorID, CurrrencyID ....)
4..Date(DateID, HolidayID...._)
5. TimeOfDay(TODId,.....)
6. TimeSeries(TimeSeriesID, the attribute of time slices related to the
Stats)
7. StatsType (StatsTypeID,the different categories of Stats)
8. Stats (StatsID, StatsTypeID, TimeSeriesID - the root formula for
each Stat)

This is the second level of aggregation: portfolio snapshot

9. FactTable3 (Price and Stats data for selected Securities)
10. Portfolio (PortfolioID,SecurityID, EntryID....

This is the third level view: account snapshot

11.FactTable4 (...
12. Account (AccountID....

Deepak Puri wrote:

Quote:
From a modelling standpoint, familiarity with dimensional modelling is
desirable - eg: some resources from Kimball:

http://www.rkimball.com/


The only modelling template/tool that I'm aware of is ADAPT from
Symmetry Corp, but you may find more options:

http://www.symcorp.com/downloads/ADA...io_stencil.zip


Based on my limited knowledge of portfolio analysis, I can conceive of 2
fact tables, one for the 10-minute symbol price data and another for
portfolio snapshots (though I don't know how frequently portfolios can
change). 2 cubes could then be built and combined in a single virtual
cube. The dimensions would be date, security, portfolio and maybe a
time-series analysis "utility" dimension, to accommodate time
statistical calculations (I'm assuming many of the desired statistical
"attributes" can be derived from the data in the cube). A "time-of-day"
dimension may be needed to study intra-day price action (if that's of
interest).


- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!




Reply With Quote
  #7  
Old   
kurb
 
Posts: n/a

Default Re: OLAP Design question - 11-30-2004 , 11:28 AM



Thank you very much for your response. As you can tell from my recent
response to Deepak, I am a little confused about fact tables (part of
the problem is that I am still carrying a spreadsheet mentality, trying
to move wholesale from Excel).

I will largely have one data source for a particular portfolio, and
therefore different data sources for different portfolios. Since this
data source is offline and provided by an extrenal vendor, I am also
thinking of storing the source data in Analysis Services. The easiest
way to do so is to have a table for each stock This facilitates
maintenance (daily or hourly updates, and data validation). There are
other data sources that are created on Excel spreadsheets.

My newbie question is: Can I logically create a Fact table from these
dicrete tables?. Also, these discrete tables for each symbol have rows
of data that are in strict time series sequence. This works well for me
in Excel, as the bulk of my calculations are time-series dependent. If I
move everyhting onto one fact table I seem to think I lose this efficiency.

Appreciate any clarification
Thanks
K

luminary wrote:

Quote:
It is customary to model one fact table with one business process. In your
case the only business process I can see is stock price changes. That would
suggest one fact table.

That does not mean that you cannot have multiple source data tables. It does
multiply the ETL necessary to populate the fact table though. Do you truely
have multiple sources?

Cheers

"kurb" wrote:



Hello I am new to OLAP, and trying to architect cubes for the following
application. Are there trade-offs for selecting many cubes vs many
tables vs many columns?

I am still gathering resources. Is there a template available for writng
the requirements?

Thank you for any suggestions or leads.

Kurb



My Application: Time Series Analysis

I have many portfolios each with a set of stock Symbols with price
data every 10 mins (8 hr day), going back 10 years or more.

There are 10 portfolios each with hundreds of Symbols with 48 cells of data per day. For each raw cell there are 100 Stats to derived for each of 100 Time attributes

User likes to drill/aggregate/calculate data within each Symbol, and
across sets of Symbols, relating to various combinations of Time
attribute and Stats attributes.




My Model

I have dense fact tables, tons of Stats calculations to do, perhaps 50% of which may be pre-calculated

I can see having

1. One fact table for each Symbol (SymbolID,DateID,TimeOfDayID, data1,dat2...)for the raw data
2. Dimension for Symbols with Levels for different attributes (SymbolID,PortfolioId..)
3. Dimension for Stats with Levels for different attributes of Stats
4. Dimension for Time with Levels for different attributes of Time

Question: Do I architect for many cubes, many tables, many columns, or does it matter


Fully exploded, this looks like

Many Cubes: Sets of Virtual Cubes each with hundreds of cubes, one for each Symbol: = 2000 cubes, 10 Virtual Cubes

Many Tables: One Stat table for each Stat: (StatID,SymbolId,Date/TimeOfDayId, time1, time2...), one for each Symbol = 100 Stats X 200 Symbols x 10 Portfolios = 200 000 Tables, 10 cubes (one per portfolio)

Many Columns: All the Stats on one Table, one for each Symbol = 2000 Tables. 100 Stats X 100 Time attributes = 10 000 columns per Symbol, 10 cubes



Why(I think) I need separate tables for each Symbol
I want to have one table with raw data for each Symbol because keep all
the data rows of data for that Symbol in strict day/time sequence. This
way I can easily calculate price changes relative to Time (which I have
to do tons of) by simply counting relative row numbers ieg 20 (business)
days ago is 20 rows down from the current row.

Also the daily raw data that gets plugged into the cube is available
from the source on a per Symbol/TimeSeries basis. Also the data may be
charted on a per Symbol/TimeSeries and it would be easier to haul out
the data to an external application, if the Cube is structured for that




Reply With Quote
  #8  
Old   
Deepak Puri
 
Posts: n/a

Default Re: OLAP Design question - 12-05-2004 , 10:35 PM



Not sure what you mean by the "need to ( physically) recreate the Fact
table for level 2 from data derived in level 1", but the cubes for the 2
fact tables can be combined in a virtual cube. This would enable, for
example, the net worth of a portfolio to be calculated from the number
of shares taken from a portfolio cube and share prices taken from a
securities cube. But the performance of such a solution would have to be
evaluated, since it may entail many "on-the-fly" calculations at query
time.

The order of records in the fact table is not relevant to statistical
computations from a cube, so records for different securities need not
be kept in separate tables. There are some statistical functions
included in MDX, and others can be used from the Excel library, if
registered:

http://msdn.microsoft.com/library/de.../en-us/olapdma
d/agmdxnonosfunct_3bzn.asp


- Deepak

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.