![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
#3
| |||
| |||
|
|
Hello I am new to OLAP, and trying to architect cubes for the following application. Are there trade-offs for selecting many cubes vs many tables vs many columns? I am still gathering resources. Is there a template available for writng the requirements? Thank you for any suggestions or leads. Kurb My Application: Time Series Analysis I have many portfolios each with a set of stock Symbols with price data every 10 mins (8 hr day), going back 10 years or more. There are 10 portfolios each with hundreds of Symbols with 48 cells of data per day. For each raw cell there are 100 Stats to derived for each of 100 Time attributes User likes to drill/aggregate/calculate data within each Symbol, and across sets of Symbols, relating to various combinations of Time attribute and Stats attributes. My Model I have dense fact tables, tons of Stats calculations to do, perhaps 50% of which may be pre-calculated I can see having 1. One fact table for each Symbol (SymbolID,DateID,TimeOfDayID, data1,dat2...)for the raw data 2. Dimension for Symbols with Levels for different attributes (SymbolID,PortfolioId..) 3. Dimension for Stats with Levels for different attributes of Stats 4. Dimension for Time with Levels for different attributes of Time Question: Do I architect for many cubes, many tables, many columns, or does it matter Fully exploded, this looks like Many Cubes: Sets of Virtual Cubes each with hundreds of cubes, one for each Symbol: = 2000 cubes, 10 Virtual Cubes Many Tables: One Stat table for each Stat: (StatID,SymbolId,Date/TimeOfDayId, time1, time2...), one for each Symbol = 100 Stats X 200 Symbols x 10 Portfolios = 200 000 Tables, 10 cubes (one per portfolio) Many Columns: All the Stats on one Table, one for each Symbol = 2000 Tables. 100 Stats X 100 Time attributes = 10 000 columns per Symbol, 10 cubes Why(I think) I need separate tables for each Symbol I want to have one table with raw data for each Symbol because keep all the data rows of data for that Symbol in strict day/time sequence. This way I can easily calculate price changes relative to Time (which I have to do tons of) by simply counting relative row numbers ieg 20 (business) days ago is 20 rows down from the current row. Also the daily raw data that gets plugged into the cube is available from the source on a per Symbol/TimeSeries basis. Also the data may be charted on a per Symbol/TimeSeries and it would be easier to haul out the data to an external application, if the Cube is structured for that |
#4
| |||
| |||
|
|
Can symbols be shared across multiple portfolios, or are they exclusively associated with a parent portfolio? Also, are there quantities and values associated with a portfolio that need to be analyzed (eg: Portfolio A has N units of Symbol X, valued at D dollars)? My usual preference is to initially select the number of columns/tables/cubes based on conceptual requirements. There may be many occasions to create more objects and artifacts later, to address tactical or performance needs. - Deepak *** Sent via Developersdex http://www.developersdex.com *** Don't just participate in USENET...get rewarded for it! |
#5
| |||
| |||
|
#6
| |||
| |||
|
|
From a modelling standpoint, familiarity with dimensional modelling is desirable - eg: some resources from Kimball: http://www.rkimball.com/ The only modelling template/tool that I'm aware of is ADAPT from Symmetry Corp, but you may find more options: http://www.symcorp.com/downloads/ADA...io_stencil.zip Based on my limited knowledge of portfolio analysis, I can conceive of 2 fact tables, one for the 10-minute symbol price data and another for portfolio snapshots (though I don't know how frequently portfolios can change). 2 cubes could then be built and combined in a single virtual cube. The dimensions would be date, security, portfolio and maybe a time-series analysis "utility" dimension, to accommodate time statistical calculations (I'm assuming many of the desired statistical "attributes" can be derived from the data in the cube). A "time-of-day" dimension may be needed to study intra-day price action (if that's of interest). - Deepak *** Sent via Developersdex http://www.developersdex.com *** Don't just participate in USENET...get rewarded for it! |
#7
| |||
| |||
|
|
It is customary to model one fact table with one business process. In your case the only business process I can see is stock price changes. That would suggest one fact table. That does not mean that you cannot have multiple source data tables. It does multiply the ETL necessary to populate the fact table though. Do you truely have multiple sources? Cheers "kurb" wrote: Hello I am new to OLAP, and trying to architect cubes for the following application. Are there trade-offs for selecting many cubes vs many tables vs many columns? I am still gathering resources. Is there a template available for writng the requirements? Thank you for any suggestions or leads. Kurb My Application: Time Series Analysis I have many portfolios each with a set of stock Symbols with price data every 10 mins (8 hr day), going back 10 years or more. There are 10 portfolios each with hundreds of Symbols with 48 cells of data per day. For each raw cell there are 100 Stats to derived for each of 100 Time attributes User likes to drill/aggregate/calculate data within each Symbol, and across sets of Symbols, relating to various combinations of Time attribute and Stats attributes. My Model I have dense fact tables, tons of Stats calculations to do, perhaps 50% of which may be pre-calculated I can see having 1. One fact table for each Symbol (SymbolID,DateID,TimeOfDayID, data1,dat2...)for the raw data 2. Dimension for Symbols with Levels for different attributes (SymbolID,PortfolioId..) 3. Dimension for Stats with Levels for different attributes of Stats 4. Dimension for Time with Levels for different attributes of Time Question: Do I architect for many cubes, many tables, many columns, or does it matter Fully exploded, this looks like Many Cubes: Sets of Virtual Cubes each with hundreds of cubes, one for each Symbol: = 2000 cubes, 10 Virtual Cubes Many Tables: One Stat table for each Stat: (StatID,SymbolId,Date/TimeOfDayId, time1, time2...), one for each Symbol = 100 Stats X 200 Symbols x 10 Portfolios = 200 000 Tables, 10 cubes (one per portfolio) Many Columns: All the Stats on one Table, one for each Symbol = 2000 Tables. 100 Stats X 100 Time attributes = 10 000 columns per Symbol, 10 cubes Why(I think) I need separate tables for each Symbol I want to have one table with raw data for each Symbol because keep all the data rows of data for that Symbol in strict day/time sequence. This way I can easily calculate price changes relative to Time (which I have to do tons of) by simply counting relative row numbers ieg 20 (business) days ago is 20 rows down from the current row. Also the daily raw data that gets plugged into the cube is available from the source on a per Symbol/TimeSeries basis. Also the data may be charted on a per Symbol/TimeSeries and it would be easier to haul out the data to an external application, if the Cube is structured for that |
#8
| |||
| |||
|
![]() |
| Thread Tools | |
| Display Modes | |
| |