dbTalk Databases Forums  

Why is this query faster?

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss Why is this query faster? in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Charlie
 
Posts: n/a

Default Why is this query faster? - 04-06-2005 , 02:12 AM






Could somebody explain why the first query below is significantly faster
than the second query. When I say faster, I'm talking 1 second versus a few
minutes. The only differences are the way the WITH MEMBER section is built
up, so I haven't included the full query.

The Source and Destination dimension has 300 members on it (001 to 300). The
slow query below is physically a much longer query when you expand the "etc
etc" parts below so don't be deceived by its short length below. The result
produced for [SOURCE].[Slicer] and [DESTINATION].[Slicer] should be exactly
the same in both queries.

So why does the way it is expressed (or built-up) matter? This should only
make a split second difference.

Fast Version of Query:
WITH MEMBER [SOURCE].[Slicer] AS '[SOURCE].[ALL SOURCE] - ([SOURCE].[298] +
[SOURCE].[299] + [SOURCE].[300])'
MEMBER [DESTINATION].[Slicer] AS '[DESTINATION].[298] + [DESTINATION].[299]
+ [DESTINATION].[300]'

Slow Version of Query:
WITH MEMBER [SOURCE].[Slicer] AS '[SOURCE].[001] + [SOURCE].[002] + {etc
etc} + [SOURCE].[297]'
MEMBER [DESTINATION].[Slicer] AS 'SUM(DESTINATION.DESTINATION.MEMBERS) -
([DESTINATION].[001] + [DESTINATION].[002] + {etc etc} +
[DESTINATION].[297])'



Reply With Quote
  #2  
Old   
Chris Webb
 
Posts: n/a

Default RE: Why is this query faster? - 04-06-2005 , 08:19 AM






Almost certainly because in the first query, when you request the value for
[Source].[All Source] you are hitting an aggregation, so that it doesn't need
to be computed at runtime. That then means that the first query only does a
fraction of the work that the second does.

You might find the following article interesting if you want to confirm this
http://spaces.msn.com/members/cwebbbi/Blog/cns!1pi7ETChsJ1un_2s41jm9Iyg!119.entry

HTH,

Chris

"Charlie" wrote:

Quote:
Could somebody explain why the first query below is significantly faster
than the second query. When I say faster, I'm talking 1 second versus a few
minutes. The only differences are the way the WITH MEMBER section is built
up, so I haven't included the full query.

The Source and Destination dimension has 300 members on it (001 to 300). The
slow query below is physically a much longer query when you expand the "etc
etc" parts below so don't be deceived by its short length below. The result
produced for [SOURCE].[Slicer] and [DESTINATION].[Slicer] should be exactly
the same in both queries.

So why does the way it is expressed (or built-up) matter? This should only
make a split second difference.

Fast Version of Query:
WITH MEMBER [SOURCE].[Slicer] AS '[SOURCE].[ALL SOURCE] - ([SOURCE].[298] +
[SOURCE].[299] + [SOURCE].[300])'
MEMBER [DESTINATION].[Slicer] AS '[DESTINATION].[298] + [DESTINATION].[299]
+ [DESTINATION].[300]'

Slow Version of Query:
WITH MEMBER [SOURCE].[Slicer] AS '[SOURCE].[001] + [SOURCE].[002] + {etc
etc} + [SOURCE].[297]'
MEMBER [DESTINATION].[Slicer] AS 'SUM(DESTINATION.DESTINATION.MEMBERS) -
([DESTINATION].[001] + [DESTINATION].[002] + {etc etc} +
[DESTINATION].[297])'




Reply With Quote
  #3  
Old   
Charlie
 
Posts: n/a

Default Re: Why is this query faster? - 04-07-2005 , 07:29 AM



"Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote

Quote:
Almost certainly because in the first query, when you request the value
for
[Source].[All Source] you are hitting an aggregation, so that it doesn't
need
to be computed at runtime. That then means that the first query only does
a
fraction of the work that the second does.
I didn't expect the cube to evaluate the query this way...but I guess it
makes sense.

The problem I have now is if you allow the user to pick a bunch of source
locations (eg. 100 source locations) and the unselected locations become
destination locations, a massive query is generated. This query takes many
minutes to execute.

SQL Server seems to do a more efficient job at grouping/filtering the data.
If I was to express the problem in SQL, it would be similar to this (the
user selected locations 001 to 297, there are 12 months of data):

SELECT Month, Sum(MyMeasure)
FROM MyTable
WHERE Source IN ('001', '002', '003' .... '297')
AND Destination NOT IN ('001', '002', '003' .... '297')
GROUP BY Month

It doesn't take too long to run in SQL Server. The cube takes forever to do
this job. How can you create a well designed cube or query to optimise this?

Thanks,
Charlie




Reply With Quote
  #4  
Old   
Chris Webb
 
Posts: n/a

Default Re: Why is this query faster? - 04-07-2005 , 10:05 AM



It's true that in the same way that some problems are easy in MDX that are
hard in SQL, some problems are hard in MDX that are easy in SQL. This is
unfortunately one of the latter, at least as far as generating an optimised
query goes.

As you've found, if the user selects many members on Source you can produce
a query that runs fairly quickly by framing it as [Source].[All Source] -
{[not selected members]} and having the correct aggregations built, but that
doesn't work if the user only selects one or two members - summing up the
selected members will work faster in that case. If you don't want to build
that kind of logic into whatever is generating your MDX then we need to look
for a query that is going to perform reasonably well in all scenarios.

Without thinking of any other kinds of optimisation, my first shot at such a
query would look something like this:

WITH
SET MYSOURCES AS '{[SOURCE].[001], [SOURCE].[002],... ETC}'
SET MYDESTS AS 'EXCEPT(
[DESTINATION].[DESTINATION LEVEL].MEMBERS,
GENERATE(MYSOURCES, LINKMEMBER([SOURCE].CURRENTMEMBER, [DESTINATION]))
)'
MEMBER [SOURCE].SLICER AS 'SUM(
NONEMPTYCROSSJOIN(MYSOURCES, MYDESTS)
)'
SELECT MEASURES.MEMBERS ON 0,
[MONTHLYTIME].MEMBERS ON 1
FROM
MYCUBE
WHERE([SOURCE].SLICER)

NB I've assumed that the keys you've used on the Source and Destination
dimensions are the same, and used a combination of GENERATE and LINKMEMBER to
reduce the overall size of the query text generated.

Now I don't think the above query is going to perform much better than your
original query and almost certainly it won't perform as fast as the 'fast'
query, but it should perform the same in all scenarios. Can you try it and
let me know how it does?

The next step would be to try to reduce the amount of summing up the above
query does by leveraging aggregations. What you would need to do is create
two or three extra levels on both Source and Destination to group members
together. Ideally these extra levels would make sense to the user, ie
grouping Source and Destination into something like City, State and Country,
and would group together members that users would pick in the same query, but
it would still be useful even if you created a dimension that looked
something like: Sources 0-100->Sources-50-100->Sources 50-60->Source 056. We
would then be able to employ a technique similar to the one I describe in my
blog for optimising YTD calculations here:
http://spaces.msn.com/members/cwebbbi/Blog/cns!1pi7ETChsJ1un_2s41jm9Iyg!107.entry
http://spaces.msn.com/members/cwebbbi/Blog/cns!1pi7ETChsJ1un_2s41jm9Iyg!111.entry
....which involves resolving a set of leaf level members to common ancestors.

Also, I know that someone else mentioned this earlier, but using
Execution Location=3; Default Isolation Mode=1
in your connection string to try to push query execution to the server would
be a good thing to try too, as would
Cache Policy=7
and
Cache Ratio=0.1; Cache Ratio2=0.1
either individually or in combination.

Regards,

Chris



"Charlie" wrote:

Quote:
"Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote in message
news:7EBAC267-915E-47C7-8D4F-B3E0F7C82902 (AT) microsoft (DOT) com...
Almost certainly because in the first query, when you request the value
for
[Source].[All Source] you are hitting an aggregation, so that it doesn't
need
to be computed at runtime. That then means that the first query only does
a
fraction of the work that the second does.

I didn't expect the cube to evaluate the query this way...but I guess it
makes sense.

The problem I have now is if you allow the user to pick a bunch of source
locations (eg. 100 source locations) and the unselected locations become
destination locations, a massive query is generated. This query takes many
minutes to execute.

SQL Server seems to do a more efficient job at grouping/filtering the data.
If I was to express the problem in SQL, it would be similar to this (the
user selected locations 001 to 297, there are 12 months of data):

SELECT Month, Sum(MyMeasure)
FROM MyTable
WHERE Source IN ('001', '002', '003' .... '297')
AND Destination NOT IN ('001', '002', '003' .... '297')
GROUP BY Month

It doesn't take too long to run in SQL Server. The cube takes forever to do
this job. How can you create a well designed cube or query to optimise this?

Thanks,
Charlie




Reply With Quote
  #5  
Old   
Charlie
 
Posts: n/a

Default Re: Why is this query faster? - 04-07-2005 , 11:46 PM



"Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote

Quote:
Without thinking of any other kinds of optimisation, my first shot at such
a
query would look something like this:

WITH
SET MYSOURCES AS '{[SOURCE].[001], [SOURCE].[002],... ETC}'
SET MYDESTS AS 'EXCEPT(
[DESTINATION].[DESTINATION LEVEL].MEMBERS,
GENERATE(MYSOURCES, LINKMEMBER([SOURCE].CURRENTMEMBER, [DESTINATION]))
)'
MEMBER [SOURCE].SLICER AS 'SUM(
NONEMPTYCROSSJOIN(MYSOURCES, MYDESTS)
)'
SELECT MEASURES.MEMBERS ON 0,
[MONTHLYTIME].MEMBERS ON 1
FROM
MYCUBE
WHERE([SOURCE].SLICER)
This query seems to take about 2-3 seconds to execute regardless of how many
locations are selected. This is very acceptable, so thanks for suggesting
it.

Quote:
Also, I know that someone else mentioned this earlier, but using
Execution Location=3; Default Isolation Mode=1
in your connection string to try to push query execution to the server
would
be a good thing to try too, as would
I've already tried using server-side execution, and it does seem to offer a
modest performance gain.

I will consider your other suggestions too but for now the query above was a
great help and probably all I needed.

Thanks,
Charlie




Reply With Quote
  #6  
Old   
Chris Webb
 
Posts: n/a

Default Re: Why is this query faster? - 04-08-2005 , 04:27 AM



Great. I would imagine the reason why the new query performs better is a
combination of the use of NECJ and the fact we're only doing one SUM - see
http://sqljunkies.com/WebLog/mosha/a...2/13/7784.aspx
for a more detailed discussion on this topic.

Regarding the other optimisation tips, I would guess that Cache Policy=7
would be the one that would offer the most improvement.

Chris

"Charlie" wrote:

Quote:
"Chris Webb" <OnlyForPostingToNewsgroups (AT) hotmail (DOT) com> wrote in message
news:5290F380-4353-4C32-B5B7-C53BB8D62B9F (AT) microsoft (DOT) com...
Without thinking of any other kinds of optimisation, my first shot at such
a
query would look something like this:

WITH
SET MYSOURCES AS '{[SOURCE].[001], [SOURCE].[002],... ETC}'
SET MYDESTS AS 'EXCEPT(
[DESTINATION].[DESTINATION LEVEL].MEMBERS,
GENERATE(MYSOURCES, LINKMEMBER([SOURCE].CURRENTMEMBER, [DESTINATION]))
)'
MEMBER [SOURCE].SLICER AS 'SUM(
NONEMPTYCROSSJOIN(MYSOURCES, MYDESTS)
)'
SELECT MEASURES.MEMBERS ON 0,
[MONTHLYTIME].MEMBERS ON 1
FROM
MYCUBE
WHERE([SOURCE].SLICER)

This query seems to take about 2-3 seconds to execute regardless of how many
locations are selected. This is very acceptable, so thanks for suggesting
it.

Also, I know that someone else mentioned this earlier, but using
Execution Location=3; Default Isolation Mode=1
in your connection string to try to push query execution to the server
would
be a good thing to try too, as would

I've already tried using server-side execution, and it does seem to offer a
modest performance gain.

I will consider your other suggestions too but for now the query above was a
great help and probably all I needed.

Thanks,
Charlie




Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.