dbTalk Databases Forums  

[BUGS] improper estimates even with high statistic values

mailing.database.pgsql-bugs mailing.database.pgsql-bugs


Discuss [BUGS] improper estimates even with high statistic values in the mailing.database.pgsql-bugs forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Robert Treat
 
Posts: n/a

Default [BUGS] improper estimates even with high statistic values - 01-17-2006 , 03:59 PM






After some extensive discussion on irc, berkus, myself and a few others
think we have uncovered a possible bug, or at the least some odd
behavior in > 8.1.1. It centers around my recent post to performance
http://archives.postgresql.org/pgsql...1/msg00248.php
and how I could not seem to get some of the estimates to become
reasonable even after bumping up my stats target to 400 which caused
every row to be analyzed. If you look at the left join and hash join
estimates of the third query you'll note they seem to always get
estimated to 1 for no reason that we could come up with.

Someone else on irc seemed to have a similar problem to this, so we are
wondering if there is some problem here. So the question really is if
someone can deduce the behavior from looking at what was provided in the
email? If not and you have questions let me know, otherwise I can send a
chopped up test database which can reproduce the query issues off list
should someone want to walk through the pg code to investigate. TIA


Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Reply With Quote
  #2  
Old   
Bruce Momjian
 
Posts: n/a

Default Re: [BUGS] improper estimates even with high statistic values - 01-18-2006 , 03:11 PM







Magnus reported a similar problem with path names. I looked at his
statistics and found that even at 100 buckets, his LIKE 'f:/.../%" query
would never span more than one bucket, and because all the path names
were unique, there were no most common values.

In the case where the LIKE hits only one bucket, and there are no most
common values, how is the optimzier supposed to estimate the number of
rows, especially for cases where the values in the buckets are unevenly
distributed.

---------------------------------------------------------------------------

Robert Treat wrote:
Quote:
After some extensive discussion on irc, berkus, myself and a few others
think we have uncovered a possible bug, or at the least some odd
behavior in > 8.1.1. It centers around my recent post to performance
http://archives.postgresql.org/pgsql...1/msg00248.php
and how I could not seem to get some of the estimates to become
reasonable even after bumping up my stats target to 400 which caused
every row to be analyzed. If you look at the left join and hash join
estimates of the third query you'll note they seem to always get
estimated to 1 for no reason that we could come up with.

Someone else on irc seemed to have a similar problem to this, so we are
wondering if there is some problem here. So the question really is if
someone can deduce the behavior from looking at what was provided in the
email? If not and you have questions let me know, otherwise I can send a
chopped up test database which can reproduce the query issues off list
should someone want to walk through the pg code to investigate. TIA


Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian | http://candle.pha.pa.us
pgman (AT) candle (DOT) pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq


Reply With Quote
  #3  
Old   
Robert Treat
 
Posts: n/a

Default Re: [BUGS] improper estimates even with high statistic values - 01-20-2006 , 10:49 AM



Magnus's case seemed like a beast of a different animal to me, given it
was a direct index scan using a wildcard based search on a primary key
column; I'd agree I don't know exactly how it would determine a value
different that 1.

But in my example, this misestimation comes between columns that are not
primary keys, contain duplicate values (so they are aren't unique), and
involves left joining subqueries. It doesn't seem to follow that it
would always reduce to 1 row quite so easily.


Robert Treat

On Wed, 2006-01-18 at 16:11, Bruce Momjian wrote:
Quote:
Magnus reported a similar problem with path names. I looked at his
statistics and found that even at 100 buckets, his LIKE 'f:/.../%" query
would never span more than one bucket, and because all the path names
were unique, there were no most common values.

In the case where the LIKE hits only one bucket, and there are no most
common values, how is the optimzier supposed to estimate the number of
rows, especially for cases where the values in the buckets are unevenly
distributed.

---------------------------------------------------------------------------

Robert Treat wrote:
After some extensive discussion on irc, berkus, myself and a few others
think we have uncovered a possible bug, or at the least some odd
behavior in > 8.1.1. It centers around my recent post to performance
http://archives.postgresql.org/pgsql...1/msg00248.php
and how I could not seem to get some of the estimates to become
reasonable even after bumping up my stats target to 400 which caused
every row to be analyzed. If you look at the left join and hash join
estimates of the third query you'll note they seem to always get
estimated to 1 for no reason that we could come up with.

Someone else on irc seemed to have a similar problem to this, so we are
wondering if there is some problem here. So the question really is if
someone can deduce the behavior from looking at what was provided in the
email? If not and you have questions let me know, otherwise I can send a
chopped up test database which can reproduce the query issues off list
should someone want to walk through the pg code to investigate. TIA


Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


--
Bruce Momjian | http://candle.pha.pa.us
pgman (AT) candle (DOT) pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match


Reply With Quote
  #4  
Old   
Bruce Momjian
 
Posts: n/a

Default Re: [BUGS] improper estimates even with high statistic values - 01-26-2006 , 04:42 PM




Has this been researched. Josh posted he thought it was an optimizer
bug, but I haven't seen anyone investigate it:

http://archives.postgresql.org/pgsql...1/msg00248.php

http://archives.postgresql.org/pgsql...1/msg00265.php

---------------------------------------------------------------------------

Robert Treat wrote:
Quote:
Magnus's case seemed like a beast of a different animal to me, given it
was a direct index scan using a wildcard based search on a primary key
column; I'd agree I don't know exactly how it would determine a value
different that 1.

But in my example, this misestimation comes between columns that are not
primary keys, contain duplicate values (so they are aren't unique), and
involves left joining subqueries. It doesn't seem to follow that it
would always reduce to 1 row quite so easily.


Robert Treat

On Wed, 2006-01-18 at 16:11, Bruce Momjian wrote:

Magnus reported a similar problem with path names. I looked at his
statistics and found that even at 100 buckets, his LIKE 'f:/.../%" query
would never span more than one bucket, and because all the path names
were unique, there were no most common values.

In the case where the LIKE hits only one bucket, and there are no most
common values, how is the optimzier supposed to estimate the number of
rows, especially for cases where the values in the buckets are unevenly
distributed.

---------------------------------------------------------------------------

Robert Treat wrote:
After some extensive discussion on irc, berkus, myself and a few others
think we have uncovered a possible bug, or at the least some odd
behavior in > 8.1.1. It centers around my recent post to performance
http://archives.postgresql.org/pgsql...1/msg00248.php
and how I could not seem to get some of the estimates to become
reasonable even after bumping up my stats target to 400 which caused
every row to be analyzed. If you look at the left join and hash join
estimates of the third query you'll note they seem to always get
estimated to 1 for no reason that we could come up with.

Someone else on irc seemed to have a similar problem to this, so we are
wondering if there is some problem here. So the question really is if
someone can deduce the behavior from looking at what was provided in the
email? If not and you have questions let me know, otherwise I can send a
chopped up test database which can reproduce the query issues off list
should someone want to walk through the pg code to investigate. TIA


Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


--
Bruce Momjian | http://candle.pha.pa.us
pgman (AT) candle (DOT) pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

--
Bruce Momjian | http://candle.pha.pa.us
pgman (AT) candle (DOT) pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org


Reply With Quote
  #5  
Old   
Robert Treat
 
Posts: n/a

Default Re: [BUGS] improper estimates even with high statistic values - 01-27-2006 , 09:36 AM



To my knowledge it hasn't, at least no one has asked me for the sample
database.

Robert Treat

On Thursday 26 January 2006 17:42, Bruce Momjian wrote:
Quote:
Has this been researched. Josh posted he thought it was an optimizer
bug, but I haven't seen anyone investigate it:

http://archives.postgresql.org/pgsql...1/msg00248.php

http://archives.postgresql.org/pgsql...1/msg00265.php

---------------------------------------------------------------------------

Robert Treat wrote:
Magnus's case seemed like a beast of a different animal to me, given it
was a direct index scan using a wildcard based search on a primary key
column; I'd agree I don't know exactly how it would determine a value
different that 1.

But in my example, this misestimation comes between columns that are not
primary keys, contain duplicate values (so they are aren't unique), and
involves left joining subqueries. It doesn't seem to follow that it
would always reduce to 1 row quite so easily.


Robert Treat

On Wed, 2006-01-18 at 16:11, Bruce Momjian wrote:
Magnus reported a similar problem with path names. I looked at his
statistics and found that even at 100 buckets, his LIKE 'f:/.../%"
query would never span more than one bucket, and because all the path
names were unique, there were no most common values.

In the case where the LIKE hits only one bucket, and there are no most
common values, how is the optimzier supposed to estimate the number of
rows, especially for cases where the values in the buckets are unevenly
distributed.

-----------------------------------------------------------------------
----

Robert Treat wrote:
After some extensive discussion on irc, berkus, myself and a few
others think we have uncovered a possible bug, or at the least some
odd behavior in > 8.1.1. It centers around my recent post to
performance
http://archives.postgresql.org/pgsql...1/msg00248.php
and how I could not seem to get some of the estimates to become
reasonable even after bumping up my stats target to 400 which caused
every row to be analyzed. If you look at the left join and hash join
estimates of the third query you'll note they seem to always get
estimated to 1 for no reason that we could come up with.

Someone else on irc seemed to have a similar problem to this, so we
are wondering if there is some problem here. So the question really
is if someone can deduce the behavior from looking at what was
provided in the email? If not and you have questions let me know,
otherwise I can send a chopped up test database which can reproduce
the query issues off list should someone want to walk through the pg
code to investigate. TIA


Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---------------------------(end of
broadcast)--------------------------- TIP 2: Don't 'kill -9' the
postmaster

--
Bruce Momjian | http://candle.pha.pa.us
pgman (AT) candle (DOT) pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania
19073

---------------------------(end of
broadcast)--------------------------- TIP 3: Have you checked our
extensive FAQ?

http://www.postgresql.org/docs/faq

--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.