dbTalk Databases Forums  

Order by and indexes

comp.databases.postgresql comp.databases.postgresql


Discuss Order by and indexes in the comp.databases.postgresql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Mladen Gogala
 
Posts: n/a

Default Order by and indexes - 08-27-2010 , 05:43 PM






It looks like the Postgres optimizer cannot use indexes for "order by"
conditions. The query that made me conclude this, looks like this:

explain analyze
select "document#" from moreover_documents
where created_at<TIMESTAMP '2010-07-01'
order by "document#"
limit 10;

The plan looks like this:
QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=280833.01..280833.03 rows=10 width=8) (actual
time=2335.322..2335.325 rows=10 loops=1)
-> Sort (cost=280833.01..285481.01 rows=1859200 width=8) (actual
time=2335.320..2335.321 rows=10 loops=1)
Sort Key: moreover.moreover_documents."document#"
Sort Method: top-N heapsort Memory: 25kB
-> Result (cost=0.00..240656.36 rows=1859200 width=8) (actual
time=0.022..1980.176 rows=1857510 loops=1)
-> Append (cost=0.00..240656.36 rows=1859200 width=8)
(actual time=0.021..1683.461 rows=1857510 loops=1)
-> Seq Scan on moreover_documents
(cost=0.00..10.25 rows=7 width=8) (actual time=0.001..0.001 rows=0
loops=1)
Filter: (created_at < '2010-07-01
00:00:00'::timestamp without time zone)
-> Seq Scan on moreover_documents_y2010m06
moreover_documents (cost=0.00..240646.11 rows=1859193 width=8) (actual
time=0.020..1436.262 rows=1857510 loops=1)
Filter: (created_at < '2010-07-01
00:00:00'::timestamp without time zone)
Total runtime: 2335.364 ms
(11 rows)


Column "document#" is the primary key and will be renamed to
"document_id" before the project enters the production phase. And there
is the catch: I am accessing only the primary key, which is, of course,
indexed by a B*tree index. Data set in the B*tree index is sorted, by
virtue of the underlying mathematical structure which is, not
surprisingly, known as B*tree. So, this query can be resolved by using
index blocks only and reading them in order. The plan, however, says
something else. The database is doing heap sort and sequential scan. The
source database can use index quite well:

SQL> set autotrace on explain;
SQL> select document# from (
2 select document# from moreover_documents
3 order by document#)
4 where rownum<=10;

DOCUMENT#
----------
927598124
927598126
927598128
927598130
927598132
927598134
927598136
927598138
927598140
927598142

10 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=10 Bytes=130)
1 0 COUNT (STOPKEY)
2 1 VIEW (Cost=2 Card=31829315 Bytes=413781095)
3 2 INDEX (FULL SCAN) OF 'MOREOVER_DOCUMENTS_PK' (UNIQUE)
(Cost=1 Card=31829315 Bytes=190975890)


I am nitpicking because my application developers are used to that
possibility and have created several indexes with the explicit purpose
of helping them with "order by" clause. I know that the version 9 does
resolve "max" and "min" using index. This should not be that hard to do,
either. Would it be presumptuous from me to expect something like this
in Postgres 9.1? What is actually needed is the "index full scan"
access method, which can be used when all of the columns in the select
list are indexed. The difference in speed is quite significant:

news=# set search_path=moreover;
SET
news=# \timing
Timing is on.
news=# select "document#" from moreover_documents
news-# where created_at<TIMESTAMP '2010-07-01'
news-# order by "document#"
news-# limit 10;
document#
-----------
927598124
927598126
927598128
927598130
927598132
927598134
927598136
927598138
927598140
927598142
(10 rows)

Time: 1594.013 ms
news=#

The other database is almost 5 times faster:

SQL> set timing on
SQL> select document# from (
2 select document# from moreover_documents
3 order by document#)
4 where rownum<=10;

DOCUMENT#
----------
927598124
927598126
927598128
927598130
927598132
927598134
927598136
927598138
927598140
927598142

10 rows selected.

Elapsed: 00:00:00.32

The other database is made by the same company which owns MySQL, but is
not open source.

--
http://mgogala.byethost5.com

Reply With Quote
  #2  
Old   
Thomas Kellerer
 
Posts: n/a

Default Re: Order by and indexes - 08-28-2010 , 02:56 AM






Mladen Gogala wrote on 28.08.2010 00:43:
Quote:
It looks like the Postgres optimizer cannot use indexes for "order by"
conditions. The query that made me conclude this, looks like this:
It sure can:

select id, prog_start, prog_end
from definition
order by id

QUERY PLAN
Index Scan using pk_def on definition (cost=0.00..21791.91 rows=372901 width=20)

Id is the PK for that table. If I order by a different column, the index is not used

QUERY PLAN
Sort (cost=48627.58..49559.83 rows=372901 width=20)
Sort Key: prog_end
-> Seq Scan on definition (cost=0.00..6471.01 rows=372901 width=20)

I would suspect the fact that your table is partitioned to be the reason for not using the index.
But I don't know how to overcome that.

Note that PostgreSQL cannot use "index-only" retrieval as Oracle can. Something that might change with 9.1

Thomas

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.