dbTalk Databases Forums  

Partitions and the optimizer

comp.databases.postgresql comp.databases.postgresql


Discuss Partitions and the optimizer in the comp.databases.postgresql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Mladen Gogala
 
Posts: n/a

Default Partitions and the optimizer - 08-30-2010 , 10:17 AM






Optimizer doesn't behave well when the tables are partitioned. Here is
the problem:

news=# explain
select max("document#") from moreover_documents_y2010m08;

QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.15..0.16 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.15 rows=1 width=8)
-> Index Scan Backward using pk_moreover_documents_y2010m08
on moreover_documents_y2010m08 (cost=0.00..2169871.61 rows=14615132
width=8)
Filter: ("document#" IS NOT NULL)
(5 rows)

Time: 31.191 ms
news=# explain
select max("document#") from moreover_documents;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4227019.40..4227019.41 rows=1 width=8)
-> Append (cost=0.00..4145103.32 rows=32766432 width=8)
-> Seq Scan on moreover_documents (cost=0.00..10.20 rows=20
width=8)
-> Seq Scan on moreover_documents_y2010m06 moreover_documents
(cost=0.00..236523.53 rows=1856853 width=8)
-> Seq Scan on moreover_documents_y2010m07 moreover_documents
(cost=0.00..2073778.27 rows=16294327 width=8)
-> Seq Scan on moreover_documents_y2010m08 moreover_documents
(cost=0.00..1834740.32 rows=14615132 width=8)
-> Seq Scan on moreover_documents_y2010m09 moreover_documents
(cost=0.00..10.20 rows=20 width=8)
-> Seq Scan on moreover_documents_y2010m10 moreover_documents
(cost=0.00..10.20 rows=20 width=8)
-> Seq Scan on moreover_documents_y2010m11 moreover_documents
(cost=0.00..10.20 rows=20 width=8)
-> Seq Scan on moreover_documents_y2010m12 moreover_documents
(cost=0.00..10.20 rows=20 width=8)
-> Seq Scan on moreover_documents_y2011m01 moreover_documents
(cost=0.00..10.20 rows=20 width=8)
(11 rows)

Time: 31.961 ms
news=#

I have several partitions for the table, 3 of them are loaded. When I
search for the maximum value of the primary key column on a single
partition, the access path is as it should be, using the PK index. Each
of the partitions has the same PK. When I try selecting the maximum on
the entire table, the indexes on the partitions are not used, Postgres
does the full scan instead. The difference in timing is drastic:
news=# select max("document#") from moreover_documents_y2010m08;
max
------------
1175107508
(1 row)

Time: 56.778 ms
news=# select max("document#") from moreover_documents;
max
------------
1175107508
(1 row)

Time: 200490.228 ms
news=#



--
http://mgogala.byethost5.com

Reply With Quote
  #2  
Old   
Anselmo Canfora
 
Posts: n/a

Default Re: Partitions and the optimizer - 09-01-2010 , 02:30 AM






Hi, how did you define constraints in child tables?

Il 30/08/2010 17.17, Mladen Gogala ha scritto:
Quote:
Optimizer doesn't behave well when the tables are partitioned. Here is
the problem:

news=# explain
select max("document#") from moreover_documents_y2010m08;

QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.15..0.16 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.15 rows=1 width=8)
-> Index Scan Backward using pk_moreover_documents_y2010m08
on moreover_documents_y2010m08 (cost=0.00..2169871.61 rows=14615132
width=8)
Filter: ("document#" IS NOT NULL)
(5 rows)

Time: 31.191 ms
news=# explain
select max("document#") from moreover_documents;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4227019.40..4227019.41 rows=1 width=8)
-> Append (cost=0.00..4145103.32 rows=32766432 width=8)
-> Seq Scan on moreover_documents (cost=0.00..10.20 rows=20
width=8)
-> Seq Scan on moreover_documents_y2010m06 moreover_documents
(cost=0.00..236523.53 rows=1856853 width=8)
-> Seq Scan on moreover_documents_y2010m07 moreover_documents
(cost=0.00..2073778.27 rows=16294327 width=8)
-> Seq Scan on moreover_documents_y2010m08 moreover_documents
(cost=0.00..1834740.32 rows=14615132 width=8)
-> Seq Scan on moreover_documents_y2010m09 moreover_documents
(cost=0.00..10.20 rows=20 width=8)
-> Seq Scan on moreover_documents_y2010m10 moreover_documents
(cost=0.00..10.20 rows=20 width=8)
-> Seq Scan on moreover_documents_y2010m11 moreover_documents
(cost=0.00..10.20 rows=20 width=8)
-> Seq Scan on moreover_documents_y2010m12 moreover_documents
(cost=0.00..10.20 rows=20 width=8)
-> Seq Scan on moreover_documents_y2011m01 moreover_documents
(cost=0.00..10.20 rows=20 width=8)
(11 rows)

Time: 31.961 ms
news=#

I have several partitions for the table, 3 of them are loaded. When I
search for the maximum value of the primary key column on a single
partition, the access path is as it should be, using the PK index. Each
of the partitions has the same PK. When I try selecting the maximum on
the entire table, the indexes on the partitions are not used, Postgres
does the full scan instead. The difference in timing is drastic:
news=# select max("document#") from moreover_documents_y2010m08;
max
------------
1175107508
(1 row)

Time: 56.778 ms
news=# select max("document#") from moreover_documents;
max
------------
1175107508
(1 row)

Time: 200490.228 ms
news=#



Reply With Quote
  #3  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Partitions and the optimizer - 09-01-2010 , 01:14 PM



On Wed, 01 Sep 2010 09:30:49 +0200, Anselmo Canfora wrote:

Quote:
Hi, how did you define constraints in child tables?
Here it is, directly from PgAdmin:

ALTER TABLE moreover.moreover_documents_y2010m09
ADD CONSTRAINT moreover_documents_y2010m09_created_at_check CHECK
(created_at >= '2010-09-01 00:00:00'::timestamp without time zone AND
created_at < '2010-10-01 00:00:00'::timestamp without time zone);

ALTER TABLE moreover.moreover_documents_y2010m09
ADD CONSTRAINT pk_moreover_documents_y2010m09 PRIMARY KEY(document_id);

There are monthly partitions, all identically defined, ranging from June
2010 to January 2011. There is also the primary key constraint on
document_id on the empty master table.

--
http://mgogala.byethost5.com

Reply With Quote
  #4  
Old   
Anselmo Canfora
 
Posts: n/a

Default Re: Partitions and the optimizer - 09-02-2010 , 05:10 AM



You have to insert the column on which is defined the check in the
"where" (or "on" with joins) condition of the select in order to enable
the planner to locate the correct partition on which shrink the search,
for example, assuming that your document_id is always ascending:

select max(document#) from moreover_documents where
date_trunc('month', created_at)>=date_trunc('month', current_timestamp);

Il 01/09/2010 20.14, Mladen Gogala ha scritto:
Quote:
On Wed, 01 Sep 2010 09:30:49 +0200, Anselmo Canfora wrote:

Hi, how did you define constraints in child tables?

Here it is, directly from PgAdmin:

ALTER TABLE moreover.moreover_documents_y2010m09
ADD CONSTRAINT moreover_documents_y2010m09_created_at_check CHECK
(created_at>= '2010-09-01 00:00:00'::timestamp without time zone AND
created_at< '2010-10-01 00:00:00'::timestamp without time zone);

ALTER TABLE moreover.moreover_documents_y2010m09
ADD CONSTRAINT pk_moreover_documents_y2010m09 PRIMARY KEY(document_id);

There are monthly partitions, all identically defined, ranging from June
2010 to January 2011. There is also the primary key constraint on
document_id on the empty master table.

Reply With Quote
  #5  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Partitions and the optimizer - 09-02-2010 , 07:42 AM



On Thu, 02 Sep 2010 12:10:01 +0200, Anselmo Canfora wrote:

Quote:
You have to insert the column on which is defined the check in the
"where" (or "on" with joins) condition of the select in order to enable
the planner to locate the correct partition on which shrink the search,
for example, assuming that your document_id is always ascending:

select max(document#) from moreover_documents where date_trunc('month',
created_at)>=date_trunc('month', current_timestamp);
Anselmo, what I am complaining about is the fact that optimizer doesn't
use available indexes in partitioned tables the same way it uses them in
non-partitioned tables.



--
http://mgogala.byethost5.com

Reply With Quote
  #6  
Old   
Anselmo Canfora
 
Posts: n/a

Default Re: Partitions and the optimizer - 09-02-2010 , 08:18 AM



Il 02/09/2010 14.42, Mladen Gogala ha scritto:
Quote:
On Thu, 02 Sep 2010 12:10:01 +0200, Anselmo Canfora wrote:

You have to insert the column on which is defined the check in the
"where" (or "on" with joins) condition of the select in order to enable
the planner to locate the correct partition on which shrink the search,
for example, assuming that your document_id is always ascending:

select max(document#) from moreover_documents where date_trunc('month',
created_at)>=date_trunc('month', current_timestamp);

Anselmo, what I am complaining about is the fact that optimizer doesn't
use available indexes in partitioned tables the same way it uses them in
non-partitioned tables.
The PG optimizer it is not so smart in this case, you have to locate the
table on which do the index scan (your indexes are after all defined in
child tables, not in master).

Reply With Quote
  #7  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Partitions and the optimizer - 09-02-2010 , 08:20 AM



On Thu, 02 Sep 2010 15:18:03 +0200, Anselmo Canfora wrote:

Quote:
The PG optimizer it is not so smart in this case
Exactly! Voila! That is what I am complaining about. It should be smarter.



--
http://mgogala.byethost5.com

Reply With Quote
  #8  
Old   
Bob Badour
 
Posts: n/a

Default Re: Partitions and the optimizer - 09-02-2010 , 08:38 AM



Mladen Gogala wrote:

Quote:
On Thu, 02 Sep 2010 15:18:03 +0200, Anselmo Canfora wrote:

The PG optimizer it is not so smart in this case

Exactly! Voila! That is what I am complaining about. It should be smarter.
Mladen, I love your posts, and I really respect your ability and your
professionalism as an engineer. I am moved to post because I bristle at
the word "should". I try not to should on anyone or anything.

It suffices to say postgres could easily be smarter and would better
compete with other products like Oracle if it were smarter.

Equally, because postgresql is an open source database, you could easily
make it smarter.

Reply With Quote
  #9  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Partitions and the optimizer - 09-02-2010 , 10:04 AM



On Thu, 02 Sep 2010 10:38:07 -0300, Bob Badour wrote:

Quote:
It suffices to say postgres could easily be smarter and would better
compete with other products like Oracle if it were smarter.

Equally, because postgresql is an open source database, you could easily
make it smarter.
Yes, that is true. I downloaded the Postgresql source and am looking into
it, but it's very hard to follow, especially for someone who is not a C
programmer, like me. I would really be grateful for any commentary/
instructions about the code structure....



--
http://mgogala.byethost5.com

Reply With Quote
  #10  
Old   
Bob Badour
 
Posts: n/a

Default Re: Partitions and the optimizer - 09-02-2010 , 10:25 AM



Mladen Gogala wrote:

Quote:
On Thu, 02 Sep 2010 10:38:07 -0300, Bob Badour wrote:

It suffices to say postgres could easily be smarter and would better
compete with other products like Oracle if it were smarter.

Equally, because postgresql is an open source database, you could easily
make it smarter.

Yes, that is true. I downloaded the Postgresql source and am looking into
it, but it's very hard to follow, especially for someone who is not a C
programmer, like me. I would really be grateful for any commentary/
instructions about the code structure....
You are investigating postgresql for your employer as an alternative to
Oracle, aren't you? How much does your company spend on licensing and
support for Oracle?

Who are the most frequent committers to postgresql? Who are the
committers who maybe don't commit the most changes but regularly tackle
the hardest problems?

Once you know who they are, age, sex, etc. ask yourself: "What might
appeal to that demographic?"

What's the hottest, sexiest technology gadget that would appeal to that
demographic? How much would it cost your company to go down to the
nearest trophy store, buy a trophy and one or two of those gadgets to
offer as a prize?

Too expensive for you? What would you have to do to get someone to pitch
this feature as a project for Google Summer of Code?

Stack Overflow proves that people will do for free what they won't do
for money. What would happen if you asked your question there?

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.