dbTalk Databases Forums  

How to force index usage

sybase.public.sqlanywhere.general sybase.public.sqlanywhere.general


Discuss How to force index usage in the sybase.public.sqlanywhere.general forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Pavel Karady
 
Posts: n/a

Default How to force index usage - 07-29-2009 , 06:55 AM






Greetings,

I've got a large SELECT statement which when run for the 1st time, is using
a particular index on a table (among many others), then some other SELECT
statement is being run, then this is run again and the index is no longer
used, SELECT suffers from a sequential scan and takes hours to finish
instead of < 1 second.

ASA 9.0.2.2551, info above captured from LogExpensiveQueries.

The problematic part of the SELECT is:
WHERE ...
... AND date >= '2005-01-01' AND date <= '2005-02-01'

This only happens when SELECT issued from an application, not from ISQL.

A separate index on column "date" only, exists in the table.

Any insights? Any obvious thing?
Pavel

Reply With Quote
  #2  
Old   
Nick Elson [Sybase iAnywhere]
 
Posts: n/a

Default Re: How to force index usage - 07-29-2009 , 09:41 AM






I've worked through a number of similar issues with
Version 9 before. Typically this kind of report indicates
a usage pattern where the date range ends up being
'mostly clustered' and the index ends up being much
more efficient than can be indicated by the V9 statistics.

If this is you case then read on. If not you will want
to capture plans (cold cache and for both the first
such *fast* run as well as the subsequent runs) and
get a second opinion based upon those.

[ Incidentally V10.0.x and V11 keep additional stats
on the degree of clustered-ness of an index that can
often detect this and favour an unclustered index as
if it were clustered]

Basically my theory is ... the #rows 'known' for this range
indicates there is more index+table random page I/Os
required than would be efficient compared to the case
of doing a full table scan. Table scans tend to
benefit from large I/O, whereas index accessed
rows typically do not.

But that assumption fails when the data is mostly
clustered in date order since the index scan gets
costed as being more expensive.

A safe first stab at this would be to see if changing
the query to order by that field [assuming the index is a
single date column one, and if not order by the columns
of the index in order]. If this works that could resolve
the issue. [bascially this sometimes works because
the index can get costed in because it can be used
to eliminate an expensive sort pass - introduced by
the order-by clause]

If not (stable or reliable), then if your situation is still a
clustered pattern one, with the version 9 software you
should get relief by recreating that index as a clustered one.

CAVEATE #1: Of course this is just a guess. You
should compare plans and measure the amount of
physical page reading (cold cache cases always)
on both that first run as well as later ones.


"Pavel Karady" <firstname.lastname (AT) nospam_kogerusa (DOT) com> wrote

Quote:
Greetings,

I've got a large SELECT statement which when run for the 1st time, is
using a particular index on a table (among many others), then some other
SELECT statement is being run, then this is run again and the index is no
longer used, SELECT suffers from a sequential scan and takes hours to
finish instead of < 1 second.

ASA 9.0.2.2551, info above captured from LogExpensiveQueries.

The problematic part of the SELECT is:
WHERE ...
... AND date >= '2005-01-01' AND date <= '2005-02-01'

This only happens when SELECT issued from an application, not from ISQL.

A separate index on column "date" only, exists in the table.

Any insights? Any obvious thing?
Pavel

Reply With Quote
  #3  
Old   
Nick Elson [Sybase iAnywhere]
 
Posts: n/a

Default Re: How to force index usage - 07-29-2009 , 09:43 AM



P.S. there is syntax to force indexes as well (see index hints)
but it is usually better to understand the true nature
of the beast you are taming before adopting obscure
extensions to the SQL language.

"Nick Elson [Sybase iAnywhere]" <@nick@.@elson@@sybase@.@com@> wrote in
message news:4a705faf$3 (AT) forums-3-dub (DOT) sybase.com...
Quote:
I've worked through a number of similar issues with
Version 9 before. Typically this kind of report indicates
a usage pattern where the date range ends up being
'mostly clustered' and the index ends up being much
more efficient than can be indicated by the V9 statistics.

If this is you case then read on. If not you will want
to capture plans (cold cache and for both the first
such *fast* run as well as the subsequent runs) and
get a second opinion based upon those.

[ Incidentally V10.0.x and V11 keep additional stats
on the degree of clustered-ness of an index that can
often detect this and favour an unclustered index as
if it were clustered]

Basically my theory is ... the #rows 'known' for this range
indicates there is more index+table random page I/Os
required than would be efficient compared to the case
of doing a full table scan. Table scans tend to
benefit from large I/O, whereas index accessed
rows typically do not.

But that assumption fails when the data is mostly
clustered in date order since the index scan gets
costed as being more expensive.

A safe first stab at this would be to see if changing
the query to order by that field [assuming the index is a
single date column one, and if not order by the columns
of the index in order]. If this works that could resolve
the issue. [bascially this sometimes works because
the index can get costed in because it can be used
to eliminate an expensive sort pass - introduced by
the order-by clause]

If not (stable or reliable), then if your situation is still a
clustered pattern one, with the version 9 software you
should get relief by recreating that index as a clustered one.

CAVEATE #1: Of course this is just a guess. You
should compare plans and measure the amount of
physical page reading (cold cache cases always)
on both that first run as well as later ones.


"Pavel Karady" <firstname.lastname (AT) nospam_kogerusa (DOT) com> wrote in message
news:4a7038a1$5 (AT) forums-3-dub (DOT) sybase.com...
Greetings,

I've got a large SELECT statement which when run for the 1st time, is
using a particular index on a table (among many others), then some other
SELECT statement is being run, then this is run again and the index is no
longer used, SELECT suffers from a sequential scan and takes hours to
finish instead of < 1 second.

ASA 9.0.2.2551, info above captured from LogExpensiveQueries.

The problematic part of the SELECT is:
WHERE ...
... AND date >= '2005-01-01' AND date <= '2005-02-01'

This only happens when SELECT issued from an application, not from ISQL.

A separate index on column "date" only, exists in the table.

Any insights? Any obvious thing?
Pavel



Reply With Quote
  #4  
Old   
Pavel Karady
 
Posts: n/a

Default Re: How to force index usage - 07-30-2009 , 09:38 AM



Nick, the problem is the first run on cold cache.

ASA 9 tends to favor index scan a few times, then it changes it's mind to
sequential scan when the cache is no longer cold. This appears to be
happening randomly, e.g. no methodology of running the statement 4 times and
fifth time is guaranteed to be slow. You have to try & try & try and if you
get lucky, on the 17-th try the index is not used and it hangs for hours.
I'm looking for any possible explanations for that.

I've tried adding ordering as you recommended, it actually made the
statement hang in ISQL as well for the 1st time (the rest is from the
application).

I've used FORCE INDEX and the statement runs below 1 seconds from the
application. So the index is confirmed. FORCE INDEX requires DBA authority
so it's not an option.

Still no idea what makes the db server choose seq scan instead of index
usage. Nick, per your theory:

Quote:
Basically my theory is ... the #rows 'known' for this range
indicates there is more index+table random page I/Os
required than would be efficient compared to the case
of doing a full table scan. Table scans tend to
benefit from large I/O, whereas index accessed
rows typically do not.
The date range requests a few hunderd rows from 1.3 mill rows so random I/Os
required for the index will always have lower cost than sequential scan, per
my understanding. How can this be altered. Dropping and recreating
statistics? I guess that's the next step for me, to try this out, including
the clustered index. I don't like the idea of clustered index though, I'm
looking for an exact solution that will make the db server choose the index
every time guaranteed.

Pavel

"Nick Elson [Sybase iAnywhere]" <@nick@.@elson@@sybase@.@com@> wrote in
message news:4a705faf$3 (AT) forums-3-dub (DOT) sybase.com...
Quote:
I've worked through a number of similar issues with
Version 9 before. Typically this kind of report indicates
a usage pattern where the date range ends up being
'mostly clustered' and the index ends up being much
more efficient than can be indicated by the V9 statistics.

If this is you case then read on. If not you will want
to capture plans (cold cache and for both the first
such *fast* run as well as the subsequent runs) and
get a second opinion based upon those.

[ Incidentally V10.0.x and V11 keep additional stats
on the degree of clustered-ness of an index that can
often detect this and favour an unclustered index as
if it were clustered]

Basically my theory is ... the #rows 'known' for this range
indicates there is more index+table random page I/Os
required than would be efficient compared to the case
of doing a full table scan. Table scans tend to
benefit from large I/O, whereas index accessed
rows typically do not.

But that assumption fails when the data is mostly
clustered in date order since the index scan gets
costed as being more expensive.

A safe first stab at this would be to see if changing
the query to order by that field [assuming the index is a
single date column one, and if not order by the columns
of the index in order]. If this works that could resolve
the issue. [bascially this sometimes works because
the index can get costed in because it can be used
to eliminate an expensive sort pass - introduced by
the order-by clause]

If not (stable or reliable), then if your situation is still a
clustered pattern one, with the version 9 software you
should get relief by recreating that index as a clustered one.

CAVEATE #1: Of course this is just a guess. You
should compare plans and measure the amount of
physical page reading (cold cache cases always)
on both that first run as well as later ones.


"Pavel Karady" <firstname.lastname (AT) nospam_kogerusa (DOT) com> wrote in message
news:4a7038a1$5 (AT) forums-3-dub (DOT) sybase.com...
Greetings,

I've got a large SELECT statement which when run for the 1st time, is
using a particular index on a table (among many others), then some other
SELECT statement is being run, then this is run again and the index is no
longer used, SELECT suffers from a sequential scan and takes hours to
finish instead of < 1 second.

ASA 9.0.2.2551, info above captured from LogExpensiveQueries.

The problematic part of the SELECT is:
WHERE ...
... AND date >= '2005-01-01' AND date <= '2005-02-01'

This only happens when SELECT issued from an application, not from ISQL.

A separate index on column "date" only, exists in the table.

Any insights? Any obvious thing?
Pavel



Reply With Quote
  #5  
Old   
Nick Elson [Sybase iAnywhere]
 
Posts: n/a

Default Re: How to force index usage - 07-30-2009 , 12:06 PM



Is it possible that the 'random-ness-ish-ness' comes directly
from the values used to define the range maybe? I'm pretty
sure you don't run the exact same query every time with this
exact same predicates in the range clause
Quote:
... AND date >= '2005-01-01' AND date <= '2005-02-01'
every time.

Once it runs slow the first time, what happens if you run
the same exact query (with the same exact range) the
second time? ? ? It sounds like you are suggesting it
would speed up.

To me it sounds a lot like a case of stale statistics; which
will be confirmed by the test in the last paragraph. You
may find that doing occasional runs of CREATE STATISTICS
may be sufficient to avoid this.

That may not be tactically convenient but my be necessary
until we can compare the actual plans.

Another approach could be to gather a lot more details than
what might be feasible for a newsgroup posting and taking
this question to technical support. There a lot of details
missing between the 2 'date' predicates of your 1st run and
what 'some other' queries might be doing and what other
DML operations there might be going on in between [I'm
pretty confident in suggesting the possibility your live
application is doing INS/UPD/DELs in that period as well.]



"Pavel Karady" <firstname.lastname (AT) nospam_kogerusa (DOT) com> wrote

Quote:
Nick, the problem is the first run on cold cache.

ASA 9 tends to favor index scan a few times, then it changes it's mind to
sequential scan when the cache is no longer cold. This appears to be
happening randomly, e.g. no methodology of running the statement 4 times
and fifth time is guaranteed to be slow. You have to try & try & try and
if you get lucky, on the 17-th try the index is not used and it hangs for
hours. I'm looking for any possible explanations for that.

I've tried adding ordering as you recommended, it actually made the
statement hang in ISQL as well for the 1st time (the rest is from the
application).

I've used FORCE INDEX and the statement runs below 1 seconds from the
application. So the index is confirmed. FORCE INDEX requires DBA authority
so it's not an option.

Still no idea what makes the db server choose seq scan instead of index
usage. Nick, per your theory:

Basically my theory is ... the #rows 'known' for this range
indicates there is more index+table random page I/Os
required than would be efficient compared to the case
of doing a full table scan. Table scans tend to
benefit from large I/O, whereas index accessed
rows typically do not.

The date range requests a few hunderd rows from 1.3 mill rows so random
I/Os required for the index will always have lower cost than sequential
scan, per my understanding. How can this be altered. Dropping and
recreating statistics? I guess that's the next step for me, to try this
out, including the clustered index. I don't like the idea of clustered
index though, I'm looking for an exact solution that will make the db
server choose the index every time guaranteed.

Pavel

"Nick Elson [Sybase iAnywhere]" <@nick@.@elson@@sybase@.@com@> wrote in
message news:4a705faf$3 (AT) forums-3-dub (DOT) sybase.com...
I've worked through a number of similar issues with
Version 9 before. Typically this kind of report indicates
a usage pattern where the date range ends up being
'mostly clustered' and the index ends up being much
more efficient than can be indicated by the V9 statistics.

If this is you case then read on. If not you will want
to capture plans (cold cache and for both the first
such *fast* run as well as the subsequent runs) and
get a second opinion based upon those.

[ Incidentally V10.0.x and V11 keep additional stats
on the degree of clustered-ness of an index that can
often detect this and favour an unclustered index as
if it were clustered]

Basically my theory is ... the #rows 'known' for this range
indicates there is more index+table random page I/Os
required than would be efficient compared to the case
of doing a full table scan. Table scans tend to
benefit from large I/O, whereas index accessed
rows typically do not.

But that assumption fails when the data is mostly
clustered in date order since the index scan gets
costed as being more expensive.

A safe first stab at this would be to see if changing
the query to order by that field [assuming the index is a
single date column one, and if not order by the columns
of the index in order]. If this works that could resolve
the issue. [bascially this sometimes works because
the index can get costed in because it can be used
to eliminate an expensive sort pass - introduced by
the order-by clause]

If not (stable or reliable), then if your situation is still a
clustered pattern one, with the version 9 software you
should get relief by recreating that index as a clustered one.

CAVEATE #1: Of course this is just a guess. You
should compare plans and measure the amount of
physical page reading (cold cache cases always)
on both that first run as well as later ones.


"Pavel Karady" <firstname.lastname (AT) nospam_kogerusa (DOT) com> wrote in message
news:4a7038a1$5 (AT) forums-3-dub (DOT) sybase.com...
Greetings,

I've got a large SELECT statement which when run for the 1st time, is
using a particular index on a table (among many others), then some other
SELECT statement is being run, then this is run again and the index is
no longer used, SELECT suffers from a sequential scan and takes hours to
finish instead of < 1 second.

ASA 9.0.2.2551, info above captured from LogExpensiveQueries.

The problematic part of the SELECT is:
WHERE ...
... AND date >= '2005-01-01' AND date <= '2005-02-01'

This only happens when SELECT issued from an application, not from ISQL.

A separate index on column "date" only, exists in the table.

Any insights? Any obvious thing?
Pavel





Reply With Quote
  #6  
Old   
Nick Elson [Sybase iAnywhere]
 
Posts: n/a

Default Re: How to force index usage - 07-30-2009 , 12:15 PM



Short of:

A - going to a higher level EBF ( 3124 or higher)
and taking advantage of the facility documented
in the tech. doc. http://www.sybase.com/detail?id=1055891
to capture the 'slow' plan
or
B - get lucky and capture the state of the histograms
[via, say, scheduled runs of either dbhist.exe or
sa_get_histogram(); for example]
or
C - get real luck and find a way to mock up a test case
that reproduces the phenomena

we are still just guessing here.


If you show the fast plan maybe there would be more
details to go by in that.


"Nick Elson [Sybase iAnywhere]" <@nick@.@elson@@sybase@.@com@> wrote in
message news:4a71d32b$1 (AT) forums-3-dub (DOT) sybase.com...
Quote:
Is it possible that the 'random-ness-ish-ness' comes directly
from the values used to define the range maybe? I'm pretty
sure you don't run the exact same query every time with this
exact same predicates in the range clause
... AND date >= '2005-01-01' AND date <= '2005-02-01'
every time.

Once it runs slow the first time, what happens if you run
the same exact query (with the same exact range) the
second time? ? ? It sounds like you are suggesting it
would speed up.

To me it sounds a lot like a case of stale statistics; which
will be confirmed by the test in the last paragraph. You
may find that doing occasional runs of CREATE STATISTICS
may be sufficient to avoid this.

That may not be tactically convenient but my be necessary
until we can compare the actual plans.

Another approach could be to gather a lot more details than
what might be feasible for a newsgroup posting and taking
this question to technical support. There a lot of details
missing between the 2 'date' predicates of your 1st run and
what 'some other' queries might be doing and what other
DML operations there might be going on in between [I'm
pretty confident in suggesting the possibility your live
application is doing INS/UPD/DELs in that period as well.]



"Pavel Karady" <firstname.lastname (AT) nospam_kogerusa (DOT) com> wrote in message
news:4a71b073$1 (AT) forums-3-dub (DOT) sybase.com...
Nick, the problem is the first run on cold cache.

ASA 9 tends to favor index scan a few times, then it changes it's mind to
sequential scan when the cache is no longer cold. This appears to be
happening randomly, e.g. no methodology of running the statement 4 times
and fifth time is guaranteed to be slow. You have to try & try & try and
if you get lucky, on the 17-th try the index is not used and it hangs for
hours. I'm looking for any possible explanations for that.

I've tried adding ordering as you recommended, it actually made the
statement hang in ISQL as well for the 1st time (the rest is from the
application).

I've used FORCE INDEX and the statement runs below 1 seconds from the
application. So the index is confirmed. FORCE INDEX requires DBA
authority so it's not an option.

Still no idea what makes the db server choose seq scan instead of index
usage. Nick, per your theory:

Basically my theory is ... the #rows 'known' for this range
indicates there is more index+table random page I/Os
required than would be efficient compared to the case
of doing a full table scan. Table scans tend to
benefit from large I/O, whereas index accessed
rows typically do not.

The date range requests a few hunderd rows from 1.3 mill rows so random
I/Os required for the index will always have lower cost than sequential
scan, per my understanding. How can this be altered. Dropping and
recreating statistics? I guess that's the next step for me, to try this
out, including the clustered index. I don't like the idea of clustered
index though, I'm looking for an exact solution that will make the db
server choose the index every time guaranteed.

Pavel

"Nick Elson [Sybase iAnywhere]" <@nick@.@elson@@sybase@.@com@> wrote in
message news:4a705faf$3 (AT) forums-3-dub (DOT) sybase.com...
I've worked through a number of similar issues with
Version 9 before. Typically this kind of report indicates
a usage pattern where the date range ends up being
'mostly clustered' and the index ends up being much
more efficient than can be indicated by the V9 statistics.

If this is you case then read on. If not you will want
to capture plans (cold cache and for both the first
such *fast* run as well as the subsequent runs) and
get a second opinion based upon those.

[ Incidentally V10.0.x and V11 keep additional stats
on the degree of clustered-ness of an index that can
often detect this and favour an unclustered index as
if it were clustered]

Basically my theory is ... the #rows 'known' for this range
indicates there is more index+table random page I/Os
required than would be efficient compared to the case
of doing a full table scan. Table scans tend to
benefit from large I/O, whereas index accessed
rows typically do not.

But that assumption fails when the data is mostly
clustered in date order since the index scan gets
costed as being more expensive.

A safe first stab at this would be to see if changing
the query to order by that field [assuming the index is a
single date column one, and if not order by the columns
of the index in order]. If this works that could resolve
the issue. [bascially this sometimes works because
the index can get costed in because it can be used
to eliminate an expensive sort pass - introduced by
the order-by clause]

If not (stable or reliable), then if your situation is still a
clustered pattern one, with the version 9 software you
should get relief by recreating that index as a clustered one.

CAVEATE #1: Of course this is just a guess. You
should compare plans and measure the amount of
physical page reading (cold cache cases always)
on both that first run as well as later ones.


"Pavel Karady" <firstname.lastname (AT) nospam_kogerusa (DOT) com> wrote in message
news:4a7038a1$5 (AT) forums-3-dub (DOT) sybase.com...
Greetings,

I've got a large SELECT statement which when run for the 1st time, is
using a particular index on a table (among many others), then some
other SELECT statement is being run, then this is run again and the
index is no longer used, SELECT suffers from a sequential scan and
takes hours to finish instead of < 1 second.

ASA 9.0.2.2551, info above captured from LogExpensiveQueries.

The problematic part of the SELECT is:
WHERE ...
... AND date >= '2005-01-01' AND date <= '2005-02-01'

This only happens when SELECT issued from an application, not from
ISQL.

A separate index on column "date" only, exists in the table.

Any insights? Any obvious thing?
Pavel







Reply With Quote
  #7  
Old   
Pavel Karady
 
Posts: n/a

Default Re: How to force index usage - 07-30-2009 , 12:41 PM



Nick,

thanks for these thoughts, since I've mentioned before that the statements
runs very fast in ISQL and occassionally slow from the application, the
plans confirming it's the 1 index usage difference were from
LogExpensiveQueries already. I'm actually having issues with it as XC plans
return estimates only... plus I see the statistics in XB, while XBs are
having few seconds only, XC having 10k+ seconds.

LEQ logs have been captured on 9.0.2.3632.

The issue has been reproduced in a separated environment with only 1
application running. This application issues the faulting SELECT a few
times, then a different SELECT, then the faulting one a few times, then
another different one, then the faulting one and voila the faulting one is
slow.

I'll respond with the results of tests (recreating histogram stats and
clustered index).

Pavel

"Nick Elson [Sybase iAnywhere]" <@nick@.@elson@@sybase@.@com@> wrote in
message news:4a71d521$5 (AT) forums-3-dub (DOT) sybase.com...
Quote:
Short of:

A - going to a higher level EBF ( 3124 or higher)
and taking advantage of the facility documented
in the tech. doc. http://www.sybase.com/detail?id=1055891
to capture the 'slow' plan
or
B - get lucky and capture the state of the histograms
[via, say, scheduled runs of either dbhist.exe or
sa_get_histogram(); for example]
or
C - get real luck and find a way to mock up a test case
that reproduces the phenomena

we are still just guessing here.


If you show the fast plan maybe there would be more
details to go by in that.


"Nick Elson [Sybase iAnywhere]" <@nick@.@elson@@sybase@.@com@> wrote in
message news:4a71d32b$1 (AT) forums-3-dub (DOT) sybase.com...
Is it possible that the 'random-ness-ish-ness' comes directly
from the values used to define the range maybe? I'm pretty
sure you don't run the exact same query every time with this
exact same predicates in the range clause
... AND date >= '2005-01-01' AND date <= '2005-02-01'
every time.

Once it runs slow the first time, what happens if you run
the same exact query (with the same exact range) the
second time? ? ? It sounds like you are suggesting it
would speed up.

To me it sounds a lot like a case of stale statistics; which
will be confirmed by the test in the last paragraph. You
may find that doing occasional runs of CREATE STATISTICS
may be sufficient to avoid this.

That may not be tactically convenient but my be necessary
until we can compare the actual plans.

Another approach could be to gather a lot more details than
what might be feasible for a newsgroup posting and taking
this question to technical support. There a lot of details
missing between the 2 'date' predicates of your 1st run and
what 'some other' queries might be doing and what other
DML operations there might be going on in between [I'm
pretty confident in suggesting the possibility your live
application is doing INS/UPD/DELs in that period as well.]



"Pavel Karady" <firstname.lastname (AT) nospam_kogerusa (DOT) com> wrote in message
news:4a71b073$1 (AT) forums-3-dub (DOT) sybase.com...
Nick, the problem is the first run on cold cache.

ASA 9 tends to favor index scan a few times, then it changes it's mind
to sequential scan when the cache is no longer cold. This appears to be
happening randomly, e.g. no methodology of running the statement 4 times
and fifth time is guaranteed to be slow. You have to try & try & try and
if you get lucky, on the 17-th try the index is not used and it hangs
for hours. I'm looking for any possible explanations for that.

I've tried adding ordering as you recommended, it actually made the
statement hang in ISQL as well for the 1st time (the rest is from the
application).

I've used FORCE INDEX and the statement runs below 1 seconds from the
application. So the index is confirmed. FORCE INDEX requires DBA
authority so it's not an option.

Still no idea what makes the db server choose seq scan instead of index
usage. Nick, per your theory:

Basically my theory is ... the #rows 'known' for this range
indicates there is more index+table random page I/Os
required than would be efficient compared to the case
of doing a full table scan. Table scans tend to
benefit from large I/O, whereas index accessed
rows typically do not.

The date range requests a few hunderd rows from 1.3 mill rows so random
I/Os required for the index will always have lower cost than sequential
scan, per my understanding. How can this be altered. Dropping and
recreating statistics? I guess that's the next step for me, to try this
out, including the clustered index. I don't like the idea of clustered
index though, I'm looking for an exact solution that will make the db
server choose the index every time guaranteed.

Pavel

"Nick Elson [Sybase iAnywhere]" <@nick@.@elson@@sybase@.@com@> wrote in
message news:4a705faf$3 (AT) forums-3-dub (DOT) sybase.com...
I've worked through a number of similar issues with
Version 9 before. Typically this kind of report indicates
a usage pattern where the date range ends up being
'mostly clustered' and the index ends up being much
more efficient than can be indicated by the V9 statistics.

If this is you case then read on. If not you will want
to capture plans (cold cache and for both the first
such *fast* run as well as the subsequent runs) and
get a second opinion based upon those.

[ Incidentally V10.0.x and V11 keep additional stats
on the degree of clustered-ness of an index that can
often detect this and favour an unclustered index as
if it were clustered]

Basically my theory is ... the #rows 'known' for this range
indicates there is more index+table random page I/Os
required than would be efficient compared to the case
of doing a full table scan. Table scans tend to
benefit from large I/O, whereas index accessed
rows typically do not.

But that assumption fails when the data is mostly
clustered in date order since the index scan gets
costed as being more expensive.

A safe first stab at this would be to see if changing
the query to order by that field [assuming the index is a
single date column one, and if not order by the columns
of the index in order]. If this works that could resolve
the issue. [bascially this sometimes works because
the index can get costed in because it can be used
to eliminate an expensive sort pass - introduced by
the order-by clause]

If not (stable or reliable), then if your situation is still a
clustered pattern one, with the version 9 software you
should get relief by recreating that index as a clustered one.

CAVEATE #1: Of course this is just a guess. You
should compare plans and measure the amount of
physical page reading (cold cache cases always)
on both that first run as well as later ones.


"Pavel Karady" <firstname.lastname (AT) nospam_kogerusa (DOT) com> wrote in
message news:4a7038a1$5 (AT) forums-3-dub (DOT) sybase.com...
Greetings,

I've got a large SELECT statement which when run for the 1st time, is
using a particular index on a table (among many others), then some
other SELECT statement is being run, then this is run again and the
index is no longer used, SELECT suffers from a sequential scan and
takes hours to finish instead of < 1 second.

ASA 9.0.2.2551, info above captured from LogExpensiveQueries.

The problematic part of the SELECT is:
WHERE ...
... AND date >= '2005-01-01' AND date <= '2005-02-01'

This only happens when SELECT issued from an application, not from
ISQL.

A separate index on column "date" only, exists in the table.

Any insights? Any obvious thing?
Pavel









Reply With Quote
  #8  
Old   
Nick Elson [Sybase iAnywhere]
 
Posts: n/a

Default Re: How to force index usage - 07-31-2009 , 11:00 AM



Hmmm....

Interesting bit of info this:

Quote:
The issue has been reproduced in a separated environment
with only 1 application running. This application issues the
faulting SELECT a few times, then a different SELECT,
then the faulting one a few times, then another different one,
then the faulting one and voila the faulting one is slow.
That indicates the stats are getting rewritten between those
stats due to the interleaving selects.
[a test for this possibility could be to turn off
the update_statistics option through a test run
but dumping the histograms at each step will
capture that part in fine detail]
Rewritten one way the estimates are marginally worse for the
plan that picks the index, rewritten another the estimates are
marginally better for picking the index... and the coin turns
solely on that proposition; apparently. {With the marginal
difference one way, you get the performance you want but
when it goes the other way gives you get a table scan.}

Can I also assume these interleaving queries also use
date ranges with different starting and ending sargs?
[That can easiliy rebucket-ize the stats causing statistics
to be grouped together or broken appart.]

But since that seems to likely be the case then the index is
probably giving you UN-reasonably better performance than
can be expected (in the general case). Given the index is
(now obviously) not declared as clustered, the optimizer is
not able to estimate anywhere near the performance you are
actually seeing and it is free to pick alternate plans whenever
the estimate numbers dictate that. {Again my understanding
here is that this is due to the data just happening to be
captured in mostly clustered order, that and the starts are
getting regrouped to not favour the index scanning plan.}


To me it also sounds like the nature of the query is itself
marginally optimizable as well and that is definitely something
worth drilling into more.


As to your X[B|C|S] woes, that confusion is normal.
I believe the X-type is *determined* independantly from
the content of the captured[ | dumped] information.
The XB designation is determined from the estimated
plan cost; but they will often also have actual statistics
if the running cost also goes over. Further XC entries
will not capture actual stats unless the operation completes
(i.e. not cancelled, not closed). At least that was my
experience with this facility. A little confusing but
still quite useful as-is.


.... NOTE THE HEAVY USE OF WORDS LIKE:
marginal[ly], coin flipping references, ...? That is
intentional since this question looks like optimizer is
being asked to dance a jig at the edge of a precipace.
[textual picture intended]
"Pavel Karady" <firstname.lastname (AT) nospam_kogerusa (DOT) com> wrote

Quote:
Nick,

thanks for these thoughts, since I've mentioned before that the statements
runs very fast in ISQL and occassionally slow from the application, the
plans confirming it's the 1 index usage difference were from
LogExpensiveQueries already. I'm actually having issues with it as XC
plans return estimates only... plus I see the statistics in XB, while XBs
are having few seconds only, XC having 10k+ seconds.

LEQ logs have been captured on 9.0.2.3632.

The issue has been reproduced in a separated environment with only 1
application running. This application issues the faulting SELECT a few
times, then a different SELECT, then the faulting one a few times, then
another different one, then the faulting one and voila the faulting one is
slow.

I'll respond with the results of tests (recreating histogram stats and
clustered index).

Pavel

Reply With Quote
  #9  
Old   
Pavel Karady
 
Posts: n/a

Default Re: How to force index usage - 08-03-2009 , 09:58 AM



Nick,

thanks for your joint wit & wisdom.

Yes, the SELECTs issued by the application 'in between' include a condition
for the date range on the same problematic column (where the other, slow
SELECT is given a sequential scan instead of index usage), so are you saying
that histogram stats are rewritten and verge on the edge of an optimizer
formula deciding about seq.scan / index usage? Any chance of converting them
to the permanent-index-usage version? I guess if there would be a way, you'd
mention it already. Can optimizer hints, e.g. (A >= B,0) be used effectively
here?

I've tried recreating statistics (DROP STATISTICS ON table; CREATE
STATISTICS table); tried out the SELECT, frozen on the 1st run, stopped
after 30 minutes.

At these very moments, a test with transferring that one index to CLUSTERED
is in progress.. if that won't work, I'm basically out of options for the
moment.

Pavel

"Nick Elson [Sybase iAnywhere]" <@nick@.@elson@@sybase@.@com@> wrote in
message news:4a731529 (AT) forums-3-dub (DOT) sybase.com...
Quote:
Hmmm....

Interesting bit of info this:

The issue has been reproduced in a separated environment
with only 1 application running. This application issues the
faulting SELECT a few times, then a different SELECT,
then the faulting one a few times, then another different one,
then the faulting one and voila the faulting one is slow.

That indicates the stats are getting rewritten between those
stats due to the interleaving selects.
[a test for this possibility could be to turn off
the update_statistics option through a test run
but dumping the histograms at each step will
capture that part in fine detail]
Rewritten one way the estimates are marginally worse for the
plan that picks the index, rewritten another the estimates are
marginally better for picking the index... and the coin turns
solely on that proposition; apparently. {With the marginal
difference one way, you get the performance you want but
when it goes the other way gives you get a table scan.}

Can I also assume these interleaving queries also use
date ranges with different starting and ending sargs?
[That can easiliy rebucket-ize the stats causing statistics
to be grouped together or broken appart.]

But since that seems to likely be the case then the index is
probably giving you UN-reasonably better performance than
can be expected (in the general case). Given the index is
(now obviously) not declared as clustered, the optimizer is
not able to estimate anywhere near the performance you are
actually seeing and it is free to pick alternate plans whenever
the estimate numbers dictate that. {Again my understanding
here is that this is due to the data just happening to be
captured in mostly clustered order, that and the starts are
getting regrouped to not favour the index scanning plan.}


To me it also sounds like the nature of the query is itself
marginally optimizable as well and that is definitely something
worth drilling into more.


As to your X[B|C|S] woes, that confusion is normal.
I believe the X-type is *determined* independantly from
the content of the captured[ | dumped] information.
The XB designation is determined from the estimated
plan cost; but they will often also have actual statistics
if the running cost also goes over. Further XC entries
will not capture actual stats unless the operation completes
(i.e. not cancelled, not closed). At least that was my
experience with this facility. A little confusing but
still quite useful as-is.


... NOTE THE HEAVY USE OF WORDS LIKE:
marginal[ly], coin flipping references, ...? That is
intentional since this question looks like optimizer is
being asked to dance a jig at the edge of a precipace.
[textual picture intended]
"Pavel Karady" <firstname.lastname (AT) nospam_kogerusa (DOT) com> wrote in message
news:4a71db3b$2 (AT) forums-3-dub (DOT) sybase.com...
Nick,

thanks for these thoughts, since I've mentioned before that the
statements runs very fast in ISQL and occassionally slow from the
application, the plans confirming it's the 1 index usage difference were
from LogExpensiveQueries already. I'm actually having issues with it as
XC plans return estimates only... plus I see the statistics in XB, while
XBs are having few seconds only, XC having 10k+ seconds.

LEQ logs have been captured on 9.0.2.3632.

The issue has been reproduced in a separated environment with only 1
application running. This application issues the faulting SELECT a few
times, then a different SELECT, then the faulting one a few times, then
another different one, then the faulting one and voila the faulting one
is slow.

I'll respond with the results of tests (recreating histogram stats and
clustered index).

Pavel



Reply With Quote
  #10  
Old   
Pavel Karady
 
Posts: n/a

Default Re: How to force index usage - 08-03-2009 , 11:52 AM



Basically, this case nails down to:

How to force index usage without FORCE INDEX? (Why does it require DBA auth
anyway? Is it a try-out tool only?)

Pavel

"Pavel Karady" <firstname.lastname (AT) nospam_kogerusa (DOT) com> wrote

Quote:
Nick,

thanks for your joint wit & wisdom.

Yes, the SELECTs issued by the application 'in between' include a
condition for the date range on the same problematic column (where the
other, slow SELECT is given a sequential scan instead of index usage), so
are you saying that histogram stats are rewritten and verge on the edge of
an optimizer formula deciding about seq.scan / index usage? Any chance of
converting them to the permanent-index-usage version? I guess if there
would be a way, you'd mention it already. Can optimizer hints, e.g. (A >=
B,0) be used effectively here?

I've tried recreating statistics (DROP STATISTICS ON table; CREATE
STATISTICS table); tried out the SELECT, frozen on the 1st run, stopped
after 30 minutes.

At these very moments, a test with transferring that one index to
CLUSTERED is in progress.. if that won't work, I'm basically out of
options for the moment.

Pavel

"Nick Elson [Sybase iAnywhere]" <@nick@.@elson@@sybase@.@com@> wrote in
message news:4a731529 (AT) forums-3-dub (DOT) sybase.com...
Hmmm....

Interesting bit of info this:

The issue has been reproduced in a separated environment
with only 1 application running. This application issues the
faulting SELECT a few times, then a different SELECT,
then the faulting one a few times, then another different one,
then the faulting one and voila the faulting one is slow.

That indicates the stats are getting rewritten between those
stats due to the interleaving selects.
[a test for this possibility could be to turn off
the update_statistics option through a test run
but dumping the histograms at each step will
capture that part in fine detail]
Rewritten one way the estimates are marginally worse for the
plan that picks the index, rewritten another the estimates are
marginally better for picking the index... and the coin turns
solely on that proposition; apparently. {With the marginal
difference one way, you get the performance you want but
when it goes the other way gives you get a table scan.}

Can I also assume these interleaving queries also use
date ranges with different starting and ending sargs?
[That can easiliy rebucket-ize the stats causing statistics
to be grouped together or broken appart.]

But since that seems to likely be the case then the index is
probably giving you UN-reasonably better performance than
can be expected (in the general case). Given the index is
(now obviously) not declared as clustered, the optimizer is
not able to estimate anywhere near the performance you are
actually seeing and it is free to pick alternate plans whenever
the estimate numbers dictate that. {Again my understanding
here is that this is due to the data just happening to be
captured in mostly clustered order, that and the starts are
getting regrouped to not favour the index scanning plan.}


To me it also sounds like the nature of the query is itself
marginally optimizable as well and that is definitely something
worth drilling into more.


As to your X[B|C|S] woes, that confusion is normal.
I believe the X-type is *determined* independantly from
the content of the captured[ | dumped] information.
The XB designation is determined from the estimated
plan cost; but they will often also have actual statistics
if the running cost also goes over. Further XC entries
will not capture actual stats unless the operation completes
(i.e. not cancelled, not closed). At least that was my
experience with this facility. A little confusing but
still quite useful as-is.


... NOTE THE HEAVY USE OF WORDS LIKE:
marginal[ly], coin flipping references, ...? That is
intentional since this question looks like optimizer is
being asked to dance a jig at the edge of a precipace.
[textual picture intended]
"Pavel Karady" <firstname.lastname (AT) nospam_kogerusa (DOT) com> wrote in message
news:4a71db3b$2 (AT) forums-3-dub (DOT) sybase.com...
Nick,

thanks for these thoughts, since I've mentioned before that the
statements runs very fast in ISQL and occassionally slow from the
application, the plans confirming it's the 1 index usage difference were
from LogExpensiveQueries already. I'm actually having issues with it as
XC plans return estimates only... plus I see the statistics in XB, while
XBs are having few seconds only, XC having 10k+ seconds.

LEQ logs have been captured on 9.0.2.3632.

The issue has been reproduced in a separated environment with only 1
application running. This application issues the faulting SELECT a few
times, then a different SELECT, then the faulting one a few times, then
another different one, then the faulting one and voila the faulting one
is slow.

I'll respond with the results of tests (recreating histogram stats and
clustered index).

Pavel





Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.