dbTalk Databases Forums  

query planner woes

comp.databases.postgresql.novice comp.databases.postgresql.novice


Discuss query planner woes in the comp.databases.postgresql.novice forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Dmitry Karasik
 
Posts: n/a

Default query planner woes - 06-10-2004 , 08:01 AM






hi all,

I've constructed a simple query which takes too long
to finish. EXPLAIN ANALYZE ( all data shown below )
reports that at one stage there are >40K rows ( on a toy database,
Quote:
2M on the real one), and while I undertstand that this is the
most probable cause of the slowness, I cannot think of any change to
the query to aviod this. Any help?

Thanks,
Dmitry


mts=# \d addresses
Table "public.addresses"
Column | Type |
Modifiers
---------+--------------------------+-----------------------------------------------------------
id | integer | not null default
nextval('public.addresses_id_seq'::text)
address | text | not null
Indexes:
"addresses_pkey" primary key, btree (id)
"addresses_idx_address" btree (address)

mts=# \d recipients
Table "public.recipients"
Column | Type |
Modifiers
---------+--------------------------+------------------------------------------------------------
id | integer | not null default
nextval('public.recipients_id_seq'::text)
msgto | integer | not null
Indexes:
"recipients_pkey" primary key, btree (id)
"recipients_idx_msgto" btree (msgto)

mts=# explain analyze select *
mts-# from recipients,addresses
mts-# where addresses.address ~ '@cat' and
mts-# recipients.msgto = addresses.id;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------
Hash Join (cost=535.70..1603.96 rows=352 width=63)
(actual time=88.387..422.922 rows=576 loops=1)
Hash Cond: ("outer".msgto = "inner".id)
-> Seq Scan on recipients (cost=0.00..737.42 rows=43642 width=24)
(actual time=0.012..178.258 rows=43642
loops=1)
^^^^^
-> Hash (cost=535.34..535.34 rows=147 width=39)
(actual time=87.866..87.866 rows=0 loops=1)
-> Seq Scan on addresses (cost=0.00..535.34 rows=147 width=39)
(actual time=0.958..87.806 rows=3 loops=1)
Filter: (address ~ '@cat'::text)
Total runtime: 425.061 ms
(7 rows)

mts# select count(*) from recipients,addresses where
addresses.address ~ '@cat' and recipients.msgto = addresses.id;
count
-----
576





---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
message can get through to the mailing list cleanly



Reply With Quote
  #2  
Old   
Tom Lane
 
Posts: n/a

Default Re: query planner woes - 06-10-2004 , 10:53 AM






Dmitry Karasik <dmitry (AT) karasik (DOT) eu.org> writes:
Quote:
I've constructed a simple query which takes too long
to finish.

mts=# explain analyze select *
mts-# from recipients,addresses
mts-# where addresses.address ~ '@cat' and
mts-# recipients.msgto = addresses.id;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------
Hash Join (cost=535.70..1603.96 rows=352 width=63)
(actual time=88.387..422.922 rows=576 loops=1)
Hash Cond: ("outer".msgto = "inner".id)
-> Seq Scan on recipients (cost=0.00..737.42 rows=43642 width=24)
(actual time=0.012..178.258 rows=43642
loops=1)
^^^^^
-> Hash (cost=535.34..535.34 rows=147 width=39)
(actual time=87.866..87.866 rows=0 loops=1)
-> Seq Scan on addresses (cost=0.00..535.34 rows=147 width=39)
(actual time=0.958..87.806 rows=3 loops=1)
Filter: (address ~ '@cat'::text)
Total runtime: 425.061 ms
(7 rows)
Given that there are actually only 3 rows in addresses matching '@cat',
it would have been best to use a plan like
Nestloop
Seq Scan on addresses
Index Scan on recipients
Index Cond: msgto = outer.id
ie, use the index to visit each of those three recipients. However,
for the 147 rows that the planner was expecting, it's not clear that
the hash plan choice is wrong; and with an even-less-selective address
filter, the hash plan will considerably beat the nestloop. So ISTM that
the real problem is the inaccuracy of the estimate about how many
rows will be pulled from addresses.

Unfortunately there's not a lot you can do to really solve that problem
for arbitrary pattern-match queries :-(. The planner has no stats that
would let it derive a non-guess estimate.

Depending on what your real workload will be like, you might be able to
formulate things so that common cases go faster. For instance, I gather
that what you're doing here is looking at the site portion of email
addresses? If you do that a lot, it might be worth splitting the site
portion out and storing it as a separate column. Then the search
becomes an anchored-left match:
... where addresses.site ~ '^cat'
which the planner has significantly better chance of handling well.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.