![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Does anyone have any suggestions on how to make this query run faster? |
#3
| |||
| |||
|
|
explain select notificationID from NOTIFICATION n, ITEM i where n.itemID = i.itemID; QUERY PLAN ------------------------------------------------------------------------ ------ Hash Join (cost=47162.85..76291.32 rows=223672 width=44) Hash Cond: ("outer".itemid = "inner".itemid) -> Seq Scan on notification n (cost=0.00..12023.71 rows=223671 width=48) -> Hash (cost=42415.28..42415.28 rows=741028 width=4) -> Seq Scan on item i (cost=0.00..42415.28 rows=741028 width=4) This query takes about 20 seconds to run. |
#4
| |||
| |||
|
|
Well, you're joining the entire two tables, so yes, the seq scan might be faster. |
#5
| |||
| |||
|
|
explain select notificationID from NOTIFICATION n, ITEM i where n.itemID = i.itemID; QUERY PLAN ------------------------------------------------------------------------ ------ Hash Join (cost=47162.85..76291.32 rows=223672 width=44) Hash Cond: ("outer".itemid = "inner".itemid) -> Seq Scan on notification n (cost=0.00..12023.71 rows=223671 width=48) -> Hash (cost=42415.28..42415.28 rows=741028 width=4) -> Seq Scan on item i (cost=0.00..42415.28 rows=741028 width=4) This query takes about 20 seconds to run. Well, you're joining the entire two tables, so yes, the seq scan might be faster. Try your query with enable_seqscan=0 so it'll use an index scan and compare the times. You may be surprised to find that the planner has indeed made the right choice. This query selects 223672 rows, are you surprised it's slow ? |
#6
| |||
| |||
|
|
I'm not a SQL guru by any stretch but would a constrained sub-select be appropriate here? Well, you're joining the entire two tables, so yes, the seq scan might be faster. |
#7
| |||
| |||
|
|
Here is the corrected version, which still has the sequential scan... explain select notificationID from NOTIFICATION n, ITEM i where n.itemID = i.itemID and i.projectID = 12; QUERY PLAN --------------------------------------------------------------------------- ----------------------- Hash Join (cost=2237.54..15382.32 rows=271 width=44) Hash Cond: ("outer".itemid = "inner".itemid) -> Seq Scan on notification n (cost=0.00..12023.71 rows=223671 width=48) -> Hash (cost=2235.31..2235.31 rows=895 width=4) -> Index Scan using item_ix_item_4_idx on item i (cost=0.00..2235.31 rows=895width=4) Index Cond: (projectid = 12) |
#8
| |||
| |||
|
#9
| |||
| |||
|
|
I have some more info on my indexing situation, and a new question. In my previous email, I told about 2 tables: Notification and Item, which join on a field called ItemID. The joining query didn't execute as quickly as I thought it should. I now notice that I have another table, Folder, which joins with Item in a similar way, and the performance of that join is excellent. So my new questions is... What makes the Folder join faster than the Notification join? Here is some info on the tables, queries, and "explain analyze" output... Item's primary key is ItemID (int4). Folder's primary key is ItemID (int4). Folder also contains 4 varchar columns, 2 text columns, 6 bool columns, 7 datetime columns and 1 int4 column. Notification has an index on its ItemID (int4) field. Notification also contains 7 text columns (1 of them being the primary key), 3 timestamp columns and 4 int4 columns. Folder and Notification have a similar number of rows. "select count(*) from folder" returns 193043. "select count(*) from notification" returns 223689. The first query is: "select count(*) from FOLDER f, ITEM i where f.itemID = i.itemID and i.projectid=7720". This query returns the result "5" and executes in less than 1 second. The second query is: "select count(*) from NOTIFICATION n, ITEM i where n.itemID = i.itemID and i.projectid=7720". This query returns the result "2" and executes in about 40 seconds. Here's the "explain analyze" output... The Folder query uses the indexes: explain analyze select count(*) from FOLDER f, ITEM i where f.itemID = i.itemID and i.projectid=7720; Aggregate (cost=6371.88..6371.88 rows=1 width=0) (actual time=83.557..83.558 rows=1 loops=1) -> Nested Loop (cost=0.00..6371.31 rows=227 width=0) (actual time=17.929..83.502 rows=5 loops=1) -> Index Scan using item_ix_item_4_idx on item i (cost=0.00..2105.51 rows=869 width=4) (actual time=0.098..19.409 rows=51 loops=1) Index Cond: (projectid = 7720) -> Index Scan using folder_pkey on folder f (cost=0.00..4.90 rows=1 width=4) (actual time=1.255..1.255 rows=0 loops=51) Index Cond: (f.itemid = "outer".itemid) Total runtime: 92.185 ms The Notification query does a sequential scan on Notification: explain analyze select count(*) from NOTIFICATION n, ITEM i where n.itemID = i.itemID and i.projectid=7720; Aggregate (cost=38732.31..38732.31 rows=1 width=0) (actual time=40380.497..40380.498 rows=1 loops=1) -> Hash Join (cost=2107.69..38731.65 rows=263 width=0) (actual time=36341.174..40380.447 rows=2 loops=1) Hash Cond: ("outer".itemid = "inner".itemid) -> Seq Scan on notification n (cost=0.00..35502.89 rows=223689 width=4) (actual time=8289.236..40255.341 rows=223689 loops=1) -> Hash (cost=2105.51..2105.51 rows=869 width=4) (actual time=0.177..0.177 rows=0 loops=1) -> Index Scan using item_ix_item_4_idx on item i (cost=0.00..2105.51 rows=869 width=4) (actual time=0.025..0.127 rows=51 loops=1) Index Cond: (projectid = 7720) Total runtime: 40380.657 ms So my question is... What difference do you see between the Folder and Notification tables that would account for such a big difference in query performance? And how can I make the Notification query run about as fast as the Folder query? ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend |
|
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. |
![]() |
| Thread Tools | |
| Display Modes | |
| |