dbTalk Databases Forums  

Query does not use index

comp.databases.postgresql.novice comp.databases.postgresql.novice


Discuss Query does not use index in the comp.databases.postgresql.novice forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Martin Hampl
 
Posts: n/a

Default Query does not use index - 04-29-2004 , 02:34 PM






Hi,

I hope this message won't be sent several times to the list. If so,
please accept my apologies.

It would be very nice, if someone could have a look at the query I'm
trying to optimize. At the moment, I don't understand PostgreSQL's
behaviour and are stuck. Thanks a lot in advance.

I am using PostgreSQL 7.4.1

In my database are the following tables:

Table "public.token"
Column | Type | Modifiers
----------+------------------------+-----------
text_id | integer | not null
position | integer | not null
word | character varying(255) |
Indexes:
"token_pkey" primary key, btree (text_id, "position")
"word_idx" btree (word)

and

Table "public.s"
Column | Type | Modifiers
---------+---------+-----------
text_id | integer | not null
s | integer | not null
start | integer |
stop | integer |
Indexes:
"s_pkey" primary key, btree (text_id, s)
"s_begin_idx" btree (text_id, "start")
"s_end_idx" btree (text_id, stop)
Foreign-key constraints:
"$1" FOREIGN KEY (text_id, "start") REFERENCES token(text_id,
"position")
"$2" FOREIGN KEY (text_id, stop) REFERENCES token(text_id,
"position")

(I hope it's readable)

I have the following query:

select * from
token,
s
where
token.word = 'FACTSHEET' and
s.text_id = token.text_id and
s.start = token.position


and PostgreSQL generates the following query plan

QUERY PLAN
------------------------------------------------------------------------
--------------------------
Nested Loop (cost=0.00..39120.95 rows=1 width=32)
-> Index Scan using s_pkey on s (cost=0.00..24698.44 rows=3367
width=16)
-> Index Scan using token_pkey on token (cost=0.00..4.27 rows=1
width=16)
Index Cond: (("outer".text_id = token.text_id) AND
("outer"."start" = token."position"))
Filter: ((word)::text = 'FACTSHEET'::text)

and the execution takes ages.

Now, 'FACTSHEET' is *very* seldom (5 of 23 million) and I intended it
to first search for 'FACTSHEET' and then use index s_begin_idx (and the
values of token.text_id and token.position) to derive the corresponding
records in s.

I tried to force PostgreSQL in doing it in this order by trying the
following:

select * from (select text_id, position from token where word =
'FACTSHEET') t left join s on (s.text_id = t.text_id and s.start =
t.position )

QUERY PLAN
------------------------------------------------------------------------
-----------------------
Hash Left Join (cost=24715.28..56630.78 rows=7892 width=24)
Hash Cond: (("outer".text_id = "inner".text_id) AND
("outer"."position" = "inner"."start"))
-> Index Scan using word_idx on token (cost=0.00..31402.51
rows=7892 width=8)
Index Cond: ((word)::text = 'FACTSHEET'::text)
-> Hash (cost=24698.44..24698.44 rows=3367 width=16)
-> Index Scan using s_pkey on s (cost=0.00..24698.44
rows=3367 width=16)


Now it *does* first search for 'FACTSHEET' but it still does not use
s_begin_idx and I have no idea why. Any ideas, what I could do?

Thanks,
Martin.



---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
message can get through to the mailing list cleanly


Reply With Quote
  #2  
Old   
Tom Lane
 
Posts: n/a

Default Re: Query does not use index - 04-29-2004 , 06:32 PM






Martin Hampl <Martin.Hampl (AT) gmx (DOT) de> writes:
Quote:
It would be very nice, if someone could have a look at the query I'm
trying to optimize. At the moment, I don't understand PostgreSQL's
behaviour and are stuck. Thanks a lot in advance.
Did you ANALYZE these tables? Also, please post EXPLAIN ANALYZE not
just EXPLAIN when complaining about bad plans. Since the essence of
your complaint is that the planner's estimates are wrong, showing us
only estimates and not reality makes it hard to give constructive
suggestions ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings



Reply With Quote
  #3  
Old   
Martin Hampl
 
Posts: n/a

Default Re: Query does not use index - 04-30-2004 , 01:01 AM



Hi,

Am 30.04.2004 um 01:32 schrieb Tom Lane:

Quote:
Martin Hampl <Martin.Hampl (AT) gmx (DOT) de> writes:
It would be very nice, if someone could have a look at the query I'm
trying to optimize. At the moment, I don't understand PostgreSQL's
behaviour and are stuck. Thanks a lot in advance.

Did you ANALYZE these tables?
I did.

Quote:
Also, please post EXPLAIN ANALYZE not
just EXPLAIN when complaining about bad plans. Since the essence of
your complaint is that the planner's estimates are wrong, showing us
only estimates and not reality makes it hard to give constructive
suggestions ...
OK.

bnc23Mio=# EXPLAIN ANALYZE select * from
bnc23Mio-# token,
bnc23Mio-# s
bnc23Mio-# where
bnc23Mio-# token.word = 'FACTSHEET' and
bnc23Mio-# s.text_id = token.text_id and
bnc23Mio-# s.start = token.position;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------
Nested Loop (cost=0.00..39120.95 rows=1 width=32) (actual
time=102.263..692248.553 rows=3 loops=1)
-> Index Scan using s_pkey on s (cost=0.00..24698.44 rows=3367
width=16) (actual time=75.933..589743.642 rows=1111220 loops=1)
-> Index Scan using token_pkey on token (cost=0.00..4.27 rows=1
width=16) (actual time=0.086..0.086 rows=0 loops=1111220)
Index Cond: (("outer".text_id = token.text_id) AND
("outer"."start" = token."position"))
Filter: ((word)::text = 'FACTSHEET'::text)
Total runtime: 692249.314 ms


bnc23Mio=# EXPLAIN ANALYZE select * from (select text_id, position from
token where word = 'FACTSHEET') t left join s on (s.text_id = t.text_id
and s.start = t.position );
QUERY
PLAN
------------------------------------------------------------------------
-------------------------------------------------------------------
Hash Left Join (cost=24715.28..56630.78 rows=7892 width=24) (actual
time=255329.976..255355.967 rows=5 loops=1)
Hash Cond: (("outer".text_id = "inner".text_id) AND
("outer"."position" = "inner"."start"))
-> Index Scan using word_idx on token (cost=0.00..31402.51
rows=7892 width=8) (actual time=91.010..109.394 rows=5 loops=1)
Index Cond: ((word)::text = 'FACTSHEET'::text)
-> Hash (cost=24698.44..24698.44 rows=3367 width=16) (actual
time=255236.914..255236.914 rows=0 loops=1)
-> Index Scan using s_pkey on s (cost=0.00..24698.44
rows=3367 width=16) (actual time=105.100..247798.661 rows=1111220
loops=1)
Total runtime: 255502.736 ms

Maybe that *is* what i wanted it to do? However, searching just for
'FACTSHEET' is very quick (I rebooted before this query to clear any
cache---is there a better way to do this?):

bnc23Mio=# EXPLAIN ANALYZE select * from token where word = 'FACTSHEET';
QUERY PLAN
------------------------------------------------------------------------
----------------------------------------------------
Index Scan using word_idx on token (cost=0.00..31402.51 rows=7892
width=16) (actual time=102.350..125.032 rows=5 loops=1)
Index Cond: ((word)::text = 'FACTSHEET'::text)
Total runtime: 125.289 ms

and I would have thought that the results of this query could have been
used to search for the respective records in s (using on of the
indexes)?

Regards,
Martin.


---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match



Reply With Quote
  #4  
Old   
Tom Lane
 
Posts: n/a

Default Re: Query does not use index - 04-30-2004 , 10:52 AM



Martin Hampl <Martin.Hampl (AT) gmx (DOT) de> writes:
Quote:
Am 30.04.2004 um 01:32 schrieb Tom Lane:
Did you ANALYZE these tables?

I did.
Hm. I'm wondering why the row estimates for 's' are off by several
orders of magnitude:

Quote:
-> Index Scan using s_pkey on s (cost=0.00..24698.44 rows=3367
width=16) (actual time=75.933..589743.642 rows=1111220 loops=1)
It could be that this table has a lot of empty pages near the front,
which is a condition that's known to lead to underestimated row count
from ANALYZE. (Manfred is working on a better ANALYZE sampling method
that should avoid such errors in future.) Try doing a straight VACUUM
and see if the row count estimate gets better. If so, it might be worth
the trouble to do a VACUUM FULL to get rid of the empty space. (And you
should also think about doing routine vacuums more often, and perhaps
increasing the FSM settings, to ensure you don't get back into this state.)

The other thing I'm wondering about is why the devil it's choosing an
indexscan at all, when it has no indexscan conditions to use. Are you
perhaps forcing that choice via "enable_seqscan = false"? If so, don't.

Quote:
Maybe that *is* what i wanted it to do? However, searching just for
'FACTSHEET' is very quick (I rebooted before this query to clear any
cache---is there a better way to do this?):
and I would have thought that the results of this query could have been
used to search for the respective records in s (using on of the
indexes)?
Undoubtedly it did consider that plan, but rejected it because it looked
more expensive than the alternatives. This is not too surprising given
the overestimate of the number of rows matching 'FACTSHEET' (7892 vs
reality of 5). You might need to increase the statistics target for
token.word (see ALTER TABLE SET STATISTICS) to give the planner more
data to work with about the distribution of words.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo (AT) postgresql (DOT) org)



Reply With Quote
  #5  
Old   
Martin Hampl
 
Posts: n/a

Default Re: Query does not use index - 05-01-2004 , 05:16 AM



Am 30.04.2004 um 17:52 schrieb Tom Lane:

Quote:
It could be that this table has a lot of empty pages near the front,
which is a condition that's known to lead to underestimated row count
from ANALYZE. (Manfred is working on a better ANALYZE sampling method
that should avoid such errors in future.) Try doing a straight VACUUM
and see if the row count estimate gets better.
It did! Thanks! Explain analyze now yields the following result:

Explain analyze now
QUERY PLAN
------------------------------------------------------------------------
----------------------------------------------------------
Nested Loop (cost=0.00..70650.10 rows=12 width=32) (actual
time=170.449..181.663 rows=3 loops=1)
-> Index Scan using word_idx on token (cost=0.00..30403.79
rows=7621 width=16) (actual time=158.237..179.105 rows=5 loops=1)
Index Cond: ((word)::text = 'FACTSHEET'::text)
-> Index Scan using s_begin_idx on s (cost=0.00..5.27 rows=1
width=16) (actual time=0.429..0.434 rows=1 loops=5)
Index Cond: ((s.text_id = "outer".text_id) AND (s."start" =
"outer"."position"))
Total runtime: 182.207 ms


I never deleted anything from this database, but it might be that I
dropped some of the tables a couple of times when populating the
database. Also 'copy from' failed a couple of times. Can that have the
same effect?

Thanks again,
Martin


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo (AT) postgresql (DOT) org



Reply With Quote
  #6  
Old   
Tom Lane
 
Posts: n/a

Default Re: Query does not use index - 05-01-2004 , 10:48 AM



Martin Hampl <Martin.Hampl (AT) gmx (DOT) de> writes:
Quote:
Am 30.04.2004 um 17:52 schrieb Tom Lane:
It could be that this table has a lot of empty pages near the front,

I never deleted anything from this database, but it might be that I
dropped some of the tables a couple of times when populating the
database. Also 'copy from' failed a couple of times. Can that have the
same effect?
A failed copy-in would leave a dead row for each input line that it was
able to process before hitting the error. So potentially that could
account for a lot of dead rows. I think there is a hint on the COPY
reference page suggesting that you VACUUM in such a situation ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html



Reply With Quote
  #7  
Old   
Martin Hampl
 
Posts: n/a

Default Re: Query does not use index - 05-01-2004 , 10:58 AM




Am 01.05.2004 um 17:48 schrieb Tom Lane:

Quote:
Martin Hampl <Martin.Hampl (AT) gmx (DOT) de> writes:
Am 30.04.2004 um 17:52 schrieb Tom Lane:
It could be that this table has a lot of empty pages near the front,

I never deleted anything from this database, but it might be that I
dropped some of the tables a couple of times when populating the
database. Also 'copy from' failed a couple of times. Can that have the
same effect?

A failed copy-in would leave a dead row for each input line that it was
able to process before hitting the error. So potentially that could
account for a lot of dead rows. I think there is a hint on the COPY
reference page suggesting that you VACUUM in such a situation ...
There is. Well...

Regards,
Martin


---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings



Reply With Quote
  #8  
Old   
M. Bastin
 
Posts: n/a

Default How to set up an SSL test environment? - 05-05-2004 , 05:48 AM



Hi,

I would like to set up an SSL test environment to further develop my
frontend-backend implementation and add SSL support to it.

Trouble is, I don't know much about SSL (and command shells)--I'm on
OS X and come from a traditional Mac culture. I've been searching
the docs and archives and it's all very cryptic to me.

I guess this is the best hint to what I'm looking for:
http://archives.postgresql.org/pgsql...9/msg00038.php
But I'm not sure on how to interpret this and it looks like a step
somewhere in the middle and not the whole procedure.

Can anyone direct me to some good entry level documentation or go
through the trouble of explaining me the whole thing if it's not too
much--I bet it is though?

Thanks,

Marc

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match


Reply With Quote
  #9  
Old   
M. Bastin
 
Posts: n/a

Default Re: How to set up an SSL test environment? - 05-05-2004 , 08:13 AM



I found a not too high level explanation here:
http://www.gtlib.cc.gatech.edu/pub/l...tes-HOWTO.html

Now what I'd really like to find next is a OpenSSL GUI for OS X.
Does anyone know of one?
I checked versiontracker, google, Apple, etc. without luck so far. I
hear there's one that comes with OS X Server.

Marc

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.