Re: Design Strategy -
06-12-2005
, 04:40 PM
Hello AM,
6000 million rows? That's a lot. Are you sure that data is valid and
does not have duplicates with in the result set from the fact table?
This happens when the fact table joins to dimensions that have
duplicates in them.
Eg
You have a customer dimension that contains two customers with the same
customer code, and you have a base table that maintains sales data
(InvoiceSales).
DimCustomer
DimCustomerKey CustomerCode CustomerName
1 0012 Bob
2 0012 Kate
Base table InvoiceSales
CustomerCode SalesDate InvoiceNo GrossSalesValue
0012 01/01/2005 10002 $100
The following statement generates the duplicates.
SELECT
DimCustomerKey
, SalesDate
, InvoiceNo
, GrossSalesValue
FROM InvoiceSales
LEFT OUTER JOIN DimCustomer
ON InvoiceSales.dbo.CustomerCode = DimCustomer.dbo. CustomerCode
FactSales
DimCustomerKey SalesDate InvoiceNo GrossSalesValue
1 01/01/2005 10002 $100
2 01/01/2005 10002 $100
The results above are generated because of joining the customer
dimension to the base table. This will happen on any join type.
For design look at the following
1. Views that combine tables in Union all statements require Tempdb
recourses this can be really expensive in performance.
2. Index Views. If you are using views that do not contain Union all
make sure they are index. This will improve performance and query
response time.
3. A view is not as fast as a stored procedure or a select statement.
Check out the following for more tips:
http://www.sql-server-performance.com/
Hope this helps,
Myles Matheson
Data Warehouse Architect |