dbTalk Databases Forums  

How big is too big?

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss How big is too big? in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Mark
 
Posts: n/a

Default How big is too big? - 05-22-2004 , 07:46 AM






I'm working through a pilot scenario and am wondering how big is a "Big" fact table. I have about 190,000,000 rows in the fact table for just one year's worth of data and wonder if that will be a problem when I get 2 or 3 years (or if it might be a problem now).

Any insight would be great - I suppose I could try to redesign the data or preaggregate the items, but just wondered what people had found

Thanks in advance for any help
Mark

Reply With Quote
  #2  
Old   
Dave Wickert [MSFT]
 
Posts: n/a

Default Re: How big is too big? - 05-22-2004 , 12:06 PM






A couple of thoughts:

1) You should definitely be looking at partitioning a cube at that size. See
the Analysis Services perf guide for why:
http://www.microsoft.com/technet/pro.../ansvcspg.mspx
This requires a SQL Server 2000 Enterprise Edition license.

2) You should be careful in your design around when you will need to do a
full reprocess of the data as it will take a considerable period of time.
For example, if you have a non-changing dimension in the cube design, and
you do a full process of that dimension, then all cubes will knocked
off-line until you reprocess them from scratch. With a cube your size, this
is hours and hours -- so you will need to plan activities such as this.

3) As a very general rule of thumb, I use a million rows per minute as a
good server-class machine throughput for processing a cube partition. Now
you can get better than that using fast I/O devices, processing partitions
in parallel, etc., but 1 million / hr is a good planning number. You will
need to run the Optimize Schema wizard on it to reduce the complexity of the
SQL statement used by process (again, see the perf guide above), so that you
get as close to a table scan as possible on the RDBMS side for the
processing SQL. Using that rule of thumb, if you have 5 years of fact data,
let's say 60 monthly partitions, you are looking at a billion combined
records, or about 1000 mintues, or approx 17 hours of processing to
re-process the entire cube.

4) A good best practice is that you prototype a smaller environment, such as
6 months worth of data and then linearly extrapolate the full dataset
reprocessing. Analysis Services is good at that because partition processing
is pretty much linear.

That being said, with careful planning, I've seen systems with billions and
billions of combined fact table records. So what you describe is very
doable.

--
Dave Wickert [MS]
dwickert (AT) online (DOT) microsoft.com
Program Manager
BI Practices Team
SQL BI Product Unit (Analysis Services)
--
This posting is provided "AS IS" with no warranties, and confers no rights.

"Mark" <anonymous (AT) discussions (DOT) microsoft.com> wrote

Quote:
I'm working through a pilot scenario and am wondering how big is a "Big"
fact table. I have about 190,000,000 rows in the fact table for just one
year's worth of data and wonder if that will be a problem when I get 2 or 3
years (or if it might be a problem now).
Quote:
Any insight would be great - I suppose I could try to redesign the data or
preaggregate the items, but just wondered what people had found.

Thanks in advance for any help.
Mark



Reply With Quote
  #3  
Old   
Mark
 
Posts: n/a

Default Re: How big is too big? - 05-22-2004 , 04:56 PM



Thanks Dave for the info. I have the perf guide and have been looking at how I should partition the data. Most of the users will be limited to a portion of the customer dimension - which has geographical hierarchy up to region - and they will usually want to look at a 12 month rolling. When designing a partition, I wasn't sure how to decide what the best data slice would be - anything older than 12 months could go on it's on, I think, since it would rarely be used. However, date didn't seem like the best choice because they want to see a whole year at a time. So I was considering trying the region in the customer dimension.

Do you know if there are any threads - or documents that might guide me to select the best data slice

Also, I wonder , does Microsoft offer any kind of design review consulting? Where I might get someone to look at the structure I'm building and validate that the design isn't flawed or could be optimized

thanks again for you help

Mar


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.