![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Using SQL Server AS 2000...I have read this document probably 1000 times because I am constantly struggling with performance with counting distinct customers in large data warehouses, especially when the result set is sliced by multiple dimensions. http://msdn.microsoft.com/library/de.../distinct2.asp What normally happens with a distinct cube (the normal solution to the distinct count problem), is the two slices get evaluated separately, and then added together at the end, which just about doubles the result of the distinct count measure, and renders it invalid. In this document it explains how to accomplish this distinct counting using a calculated measure which cross-joins a measure with a set, but the performance hit is unbearable (5-10 minutes on a very beefy server) if you have more than a couple thousand records in your fact table. It gets really bad if you slice by ranges of dimensions (for example, service date range of 6 months and paid date range of a different 6 months, to determine the number of unique customers that were serviced and then paid in those two date ranges). Anyway, at the bottom of the document I reference above, for performance conderations it talks about how DISTINCT COUNT queries should always be snapshot queries. But try to find any other reference to what that means anywhere else in MSDN. What are they talking about? It also talks about server vs. client side query execution. The only reference I can find for this is a property on the provider string. However, when I set this to server-side execution, it didn't make a difference in the response time, and also it didn't affect which process was using the CPU (which indicated to me that the client was still resolving the counts). Does anyone have any more insight into these performance considerations (snapshot, server-side execution) for DISTINCT COUNT calculations ? I would welcome any input to help me resolve this issue which has been vexing me for years. Thanks! |
#3
| |||
| |||
|
#4
| |||
| |||
|
|
Just curious why you wouldn't consider AS 2005, since it natively supports Distinct Count across selections of multiple members (I assume that's what you meant by "..the two slices get evaluated separately..", since the AS 2000 Distinct Count measure can handle single selections on each of multiple dimensions). - Deepak Deepak Puri Microsoft MVP - SQL Server *** Sent via Developersdex http://www.developersdex.com *** |
![]() |
| Thread Tools | |
| Display Modes | |
| |