dbTalk Databases Forums  

Slow subquery on large dataset

comp.databases.postgresql.novice


  #1  
Bob
 
Slow subquery on large dataset - 02-19-2004, 04:07 PM

Hi,

I'm having some performance issues when querying a couple of tables
containing a large amount of data.

Here's the schema:

CREATE TABLE capacity_data (
data_id SERIAL,
data TEXT,
modified TIMESTAMPTZ DEFAULT NOW(),
modified_by INTEGER NOT NULL,
CONSTRAINT capacity_data_pk PRIMARY KEY (data_id),
CONSTRAINT capacity_data_modified_by_fk FOREIGN KEY (modified_by)
REFERENCES editors(editor_id)
);

CREATE TABLE capacities (
room_id BIGINT NOT NULL,
capacity_type_id BIGINT NOT NULL,
data_id BIGINT NOT NULL,
modified TIMESTAMPTZ DEFAULT NOW(),
modified_by INTEGER NOT NULL,
CONSTRAINT capacities_pk PRIMARY KEY (room_id, data_id),
CONSTRAINT capacities_room_id_fk FOREIGN KEY (room_id) REFERENCES
meeting_rooms(room_id) ON DELETE CASCADE,
CONSTRAINT capacities_capacity_type_id_fk FOREIGN KEY
(capacity_type_id) REFERENCES capacity_types(capacity_type_id) ON
DELETE CASCADE,
CONSTRAINT capacities_data_id_fk FOREIGN KEY (data_id) REFERENCES
capacity_data(data_id) ON DELETE CASCADE,
CONSTRAINT capacities_modified_by_fk FOREIGN KEY (modified_by)
REFERENCES editors(editor_id)
);

I'm using a subquery to find all the capacity_data.data_id values that
are not in capacities:

foo=# SELECT data_id FROM capacity_data WHERE data_id NOT IN (SELECT
data_id FROM capacities);
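For readers who want to see what this query computes, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The tables are cut down to the columns the query touches, the rows are invented toy data, and the ORDER BY is added only to make the output stable. It shows the result of the anti-join, not PostgreSQL's performance.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL. Only the columns
# the query touches are kept, and all rows are invented toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE capacity_data (data_id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE capacities (
        room_id BIGINT NOT NULL,
        data_id BIGINT NOT NULL REFERENCES capacity_data (data_id),
        PRIMARY KEY (room_id, data_id)
    );
""")
conn.executemany("INSERT INTO capacity_data (data_id, data) VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 6)])
# data_id 2 is used by two rooms and 4 by one; 1, 3 and 5 are unreferenced.
conn.executemany("INSERT INTO capacities (room_id, data_id) VALUES (?, ?)",
                 [(10, 2), (11, 2), (12, 4)])

# The query from the post (ORDER BY added only to make the output stable).
orphans = [row[0] for row in conn.execute(
    "SELECT data_id FROM capacity_data "
    "WHERE data_id NOT IN (SELECT data_id FROM capacities) "
    "ORDER BY data_id")]
print(orphans)  # [1, 3, 5]
```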

The problem is that capacity_data holds over 15,000 records. Here is
the query plan:

                                QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on capacity_data  (cost=0.00..2086295.56 rows=7538 width=4)
   Filter: (subplan)
   SubPlan
     ->  Seq Scan on capacities  (cost=0.00..276.75 rows=15075 width=8)
(4 rows)
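The plan runs the subquery as a SubPlan attached to the filter on capacity_data. In releases before 7.4 that check amounts, roughly, to searching the subquery's rows linearly for each capacity_data row, so the work grows with the product of the two table sizes; 7.4 can instead build an in-memory hash table of the subquery result once (a "hashed subplan"), as long as it is expected to fit in sort_mem. The Python sketch below is only a rough analogy of those two strategies: it counts comparisons on invented integer lists rather than timing anything, the function names are hypothetical, and it is not PostgreSQL's executor code.

```python
def nested_not_in(outer, inner):
    """Linear search of `inner` for every outer value (pre-7.4 analogy).

    Returns the values not found in `inner` and the number of comparisons.
    """
    result, comparisons = [], 0
    for x in outer:
        found = False
        for y in inner:              # walk the subquery result again
            comparisons += 1
            if x == y:
                found = True
                break
        if not found:
            result.append(x)
    return result, comparisons


def hashed_not_in(outer, inner):
    """Build a hash set once, then probe it (analogy for a hashed subplan).

    Returns the values not in `inner` and the number of hash operations:
    one insert per inner value plus one lookup per outer value.
    """
    seen = set(inner)
    result = [x for x in outer if x not in seen]
    return result, len(inner) + len(outer)


outer = list(range(2000))            # plays the role of capacity_data.data_id
inner = list(range(0, 2000, 2))      # even ids are "referenced" by capacities

slow, slow_cost = nested_not_in(outer, inner)
fast, fast_cost = hashed_not_in(outer, inner)
assert slow == fast                  # same answer, very different work
print(slow_cost, fast_cost)          # 1500500 3000
```

Doubling both inputs roughly quadruples the nested count while the hashed count only doubles.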

A little on the slow side! I have indexes on data_id in both tables
(in capacity_data it's the primary key). How can I use them to
achieve what I want quickly?

Thanks in advance,

Bob.

  #2  
Tom Lane
 
Re: Slow subquery on large dataset - 02-21-2004, 12:58 AM

bob_bamber (AT) hotmail (DOT) com (Bob) writes:
> I'm having some performance issues when querying a couple of tables
> containing a large amount of data.
>
> foo=# SELECT data_id FROM capacity_data WHERE data_id NOT IN (SELECT
> data_id FROM capacities);

Try PG 7.4, or if you're on 7.4, try increasing sort_mem. IN and NOT IN
have pretty sucky performance in earlier releases. (If you can't
upgrade for some reason, you could try using EXISTS instead of IN.)
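Below is a small sketch of the EXISTS rewrite, again with SQLite standing in for PostgreSQL and invented toy rows. It checks that NOT EXISTS returns the same rows as NOT IN when capacities.data_id contains no NULLs. The last step shows why that condition matters in general: once the subquery can return a NULL, NOT IN returns no rows at all, while NOT EXISTS still returns the unreferenced ids. Only results are compared here, not speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy tables; capacities.data_id is deliberately left nullable here (the real
# schema declares it NOT NULL) so the NULL case can be shown at the end.
conn.executescript("""
    CREATE TABLE capacity_data (data_id INTEGER PRIMARY KEY);
    CREATE TABLE capacities (room_id INTEGER, data_id INTEGER);
    INSERT INTO capacity_data (data_id) VALUES (1), (2), (3), (4), (5);
    INSERT INTO capacities (room_id, data_id) VALUES (10, 2), (11, 4);
""")

NOT_IN = """
    SELECT data_id FROM capacity_data
    WHERE data_id NOT IN (SELECT data_id FROM capacities)
    ORDER BY data_id"""
NOT_EXISTS = """
    SELECT data_id FROM capacity_data
    WHERE NOT EXISTS (SELECT 1 FROM capacities
                      WHERE capacities.data_id = capacity_data.data_id)
    ORDER BY data_id"""


def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# No NULLs in capacities.data_id: both forms return the same rows.
before = (ids(NOT_IN), ids(NOT_EXISTS))
print(*before)   # [1, 3, 5] [1, 3, 5]

# One NULL makes "x NOT IN (...)" unknown instead of true for every
# unreferenced x, so NOT IN returns nothing while NOT EXISTS is unchanged.
conn.execute("INSERT INTO capacities (room_id, data_id) VALUES (12, NULL)")
after = (ids(NOT_IN), ids(NOT_EXISTS))
print(*after)    # [] [1, 3, 5]
```

Because capacities.data_id is declared NOT NULL in Bob's schema, the rewrite does not change the answer for his query.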

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match



  #3  
daq
 
Fwd: Re: Slow subquery on large dataset - 02-21-2004, 06:42 AM

B> foo=# SELECT data_id FROM capacity_data WHERE data_id NOT IN (SELECT
B> data_id FROM capacities);

Don't use the IN operator if possible! It's too slow.

SELECT data_id FROM capacity_data
WHERE NOT EXISTS (SELECT * FROM capacities
                  WHERE capacity_data.data_id::bigint = capacities.data_id);

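Note the "::bigint" cast. capacity_data.data_id is a SERIAL (that is,
integer) column, but capacities.data_id is BIGINT. If you don't cast
capacity_data.data_id to bigint, Postgres will not use the index on
capacities.data_id. You must either cast, or use the BIGSERIAL type in
capacity_data.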

DAQ

