dbTalk Databases Forums  

Explicit IN clause vs JOIN. Plan tanks with Join, Why?

comp.databases.ms-sqlserver comp.databases.ms-sqlserver


Discuss Explicit IN clause vs JOIN. Plan tanks with Join, Why? in the comp.databases.ms-sqlserver forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
bill
 
Posts: n/a

Default Explicit IN clause vs JOIN. Plan tanks with Join, Why? - 09-30-2011 , 02:24 AM






I try to generall supply real DDL and sample data, but the problem I
have only shows up in certain cases, with SELECTS against very large
tables and I can't supply enough sample data.

I have a view (some_view) that joins about eight tables. A query
like this:

SELECT
*
FROM
some_view
WHERE some_column IN ('x', 'y', 'z')

Is blazingly fast and yields a very good plan (all SEEKs, no SCANs).

But I don't want to build an IN list, and instead would rather join to
table that has the "some_column" values for which I want to filter the
SELECT on the view. The new query would thus look like this:

SELECT
*
FROM
some_view
INNER JOIN
some_table
ON some_view.some_column = some_table.some_column

some_table has a primary key on some_column. With the same three
values in "some_table" the query goes to hell. The plan ends up
SCANing 7 of the 8 tables and it takes forever to return data.

Does anyone have some general pointers for me? I know it's difficult
without sample data, but if you have some general ideas, I will try
them.

Thanks,

Bill

Reply With Quote
  #2  
Old   
Bob Barrows
 
Posts: n/a

Default Re: Explicit IN clause vs JOIN. Plan tanks with Join, Why? - 09-30-2011 , 06:45 AM






bill wrote:
Quote:
I try to generall supply real DDL and sample data, but the problem I
have only shows up in certain cases, with SELECTS against very large
tables and I can't supply enough sample data.

I have a view (some_view) that joins about eight tables. A query
like this:

SELECT
*
FROM
some_view
WHERE some_column IN ('x', 'y', 'z')

Is blazingly fast and yields a very good plan (all SEEKs, no SCANs).

But I don't want to build an IN list, and instead would rather join to
table that has the "some_column" values for which I want to filter the
SELECT on the view. The new query would thus look like this:

SELECT
*
FROM
some_view
INNER JOIN
some_table
ON some_view.some_column = some_table.some_column

some_table has a primary key on some_column. With the same three
values in "some_table" the query goes to hell. The plan ends up
SCANing 7 of the 8 tables and it takes forever to return data.

Does anyone have some general pointers for me? I know it's difficult
without sample data, but if you have some general ideas, I will try
them.

Have you run it through the tuning advisor? You failed to mention what
version of SS you are using so I wasn't sure what to call it.
If that fails to give you any hints, I think you're going to have to post
the execution plan.

Reply With Quote
  #3  
Old   
Hugo Kornelis
 
Posts: n/a

Default Re: Explicit IN clause vs JOIN. Plan tanks with Join, Why? - 09-30-2011 , 02:27 PM



Hi Bill,

You could try rewriting the second query to

SELECT some columns
FROM some_view
WHERE some_column NOT IN (SELECT some_column FROM some_table);

But frankly, I don't expect it will make much difference.

The important difference between the two queries is that in the first
case the values are hard-coded in the query, so the optimizer can
produce a plan that is optimal for the values 'x', 'y', and 'z'. The
second query is more generic, so the optimizer can only use generic
statistics when compiling the query plan, but not the specific
statistics for the values 'x', 'y', and 'z'. Or even the number of
values in some_table.

You could try if you can get a better plan by using optimizer hints,
but that requires knowledge of the tables and views involved. And it's
a glitchy road to start on; you could easily end up slipping.
--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

Reply With Quote
  #4  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: Explicit IN clause vs JOIN. Plan tanks with Join, Why? - 09-30-2011 , 04:49 PM



bill (billmaclean1 (AT) gmail (DOT) com) writes:
Quote:
SELECT
*
FROM
some_view
WHERE some_column IN ('x', 'y', 'z')

Is blazingly fast and yields a very good plan (all SEEKs, no SCANs).

But I don't want to build an IN list, and instead would rather join to
table that has the "some_column" values for which I want to filter the
SELECT on the view. The new query would thus look like this:

SELECT
*
FROM
some_view
INNER JOIN
some_table
ON some_view.some_column = some_table.some_column

some_table has a primary key on some_column. With the same three
values in "some_table" the query goes to hell. The plan ends up
SCANing 7 of the 8 tables and it takes forever to return data.
As Hugo said, there first query gives the optimizer more information.

That lookup table, it does have a primary key defined, hasn't it?

Else, I can only give generic tips:

o Run UPDATE STATISTICS WITH FULLSCAN on all involved tables.
o Review the query in the view, and check that there are useful
indexes.
o Try hints to persuade the optimizer in the right direction.
o Review whether you really need SELECT * - by reducing the number
of columns, you make make an index covering.


--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx

Reply With Quote
  #5  
Old   
bill
 
Posts: n/a

Default Re: Explicit IN clause vs JOIN. Plan tanks with Join, Why? - 10-04-2011 , 12:02 AM



Hi All,

Thanks for the tips! Sorry that I didn't post sooner, I didn't have
much access the boards over the weekend.

Rebuilding the stats did not work, nor did the NOT IN work. I need
all the columns from the view, but based on Erland's idea, started
commenting out columns to see if a column or set of columns was
killing performance.

Here's the (surprising to me) answer: The view contains ROW_NUMBER()
OVER PARTITION BY (some_column . . . ) where the some_column in the
partition is the same as the some_column in the WHERE clause, which is
the same as the some_column in the JOIN. When I commented out that
ROW_NUMBER(), the plan for the JOINed version became radically
better. The optimizer suggested an index, and the JOIN now performs
very well.

But a reasonable question some may ask me is "why the ROW_NUMBER()
function in the first place?" I realize that order is not
relationally significant. However, I am working with a commercial
system, and unfortunately order IS significant for parts of it, and I
can't avoid the issue. The ROW_NUMBER() takes care of the problem.

I think with a hard coded WHERE clause, the ROW_NUMBER() operator was
limiting itself to just the values listed in the partition. With the
join, the ROW_NUMBER() was operating on the entire result set, even
though it should have limited to jus the values in some_table.

I decided to do that JOINed query without the ROW_NUMBER() in a CTE,
and then do the ROW_NUMBER() on the result of the CTE.

WITH test_cte AS
(
SELECT
x
,y
,z
,aa
,bb
,some_column
FROM
some_view
INNER JOIN
some_table
ON some_view.some_column = some_table.some_column
)
SELECT
*
,row_number() OVER (PARTITION BY some_column ORDER . . . ) AS
ordinal_column
FROM
test_cte


This version returns results very quickly. I am happily surprised,
because i thought that optimizer would treat the version with a CTE
the same as the version with join and the ROW_NUMBER() function
sELECTed straight from the view. Apparently, these are treated
differently.

Thanks,

BIll

Reply With Quote
  #6  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: Explicit IN clause vs JOIN. Plan tanks with Join, Why? - 10-04-2011 , 02:22 AM



bill (billmaclean1 (AT) gmail (DOT) com) writes:
Quote:
But a reasonable question some may ask me is "why the ROW_NUMBER()
function in the first place?" I realize that order is not
relationally significant. However, I am working with a commercial
system, and unfortunately order IS significant for parts of it, and I
can't avoid the issue. The ROW_NUMBER() takes care of the problem.
What problem? You row_number() because you want number rows in your result
set, or you use the number to filter later. If you are using row_number()
with the intention that you will get a certain order in the output, you
have a bug you need to fix.

Quote:
I think with a hard coded WHERE clause, the ROW_NUMBER() operator was
limiting itself to just the values listed in the partition. With the
join, the ROW_NUMBER() was operating on the entire result set, even
though it should have limited to jus the values in some_table.
It is impossible to tell without seeing the query, but adding row_number
in a CTE can prevent the optimizer from recasting computation order,
since that would affect the result from row_number().


--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Reply With Quote
  #7  
Old   
bill
 
Posts: n/a

Default Re: Explicit IN clause vs JOIN. Plan tanks with Join, Why? - 10-05-2011 , 01:23 AM



Hi Erland:

I am not using ROW_NUMBER() in an attempt to order the result set. I
agree that would be bad. The problem to which I referred is that the
commercial system depends on the row including an ordinal number.
They shouldn't do that, but they do, so I need to return an ordinal.
ROW_NUMBER() works great to solve that issue.

I'm actually glad that the optimizer can't recast computation order in
this case, because it works much faster with the CTE and the plan
looks good. WIthout the CTE, the plan is horrible (scans on nearly
every table).

Thanks,

Bill


On Oct 4, 12:22*am, Erland Sommarskog <esq... (AT) sommarskog (DOT) se> wrote:
Quote:
bill (billmacle... (AT) gmail (DOT) com) writes:
But a reasonable question some may ask me is "why the ROW_NUMBER()
function in the first place?" *I realize that order is not
relationally significant. *However, I am working with a commercial
system, and unfortunately order IS significant for parts of it, and I
can't avoid the issue. *The ROW_NUMBER() takes care of the problem.

What problem? You row_number() because you want number rows in your result
set, or you use the number to filter later. If you are using row_number()
with the intention that you will get a certain order in the output, you
have a bug you need to fix.

I think with a hard coded WHEREclause, the ROW_NUMBER() operator was
limiting itself to just the values listed in the partition. *With the
join, the ROW_NUMBER() was operating on the entire result set, even
though it should have limited to jus the values in some_table.

It is impossible to tell without seeing the query, but adding row_number
in a CTE can prevent the optimizer from recasting computation order,
since that would affect the result from row_number().

--
Erland Sommarskog, SQL Server MVP, esq... (AT) sommarskog (DOT) se

Books Online for SQL Server 2005 athttp://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books...
Books Online for SQL Server 2000 athttp://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.