dbTalk Databases Forums  

Select from on table via 4 other tables... painfully slow query.

comp.databases.mysql comp.databases.mysql


Discuss Select from on table via 4 other tables... painfully slow query. in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Simon
 
Posts: n/a

Default Select from on table via 4 other tables... painfully slow query. - 10-28-2011 , 02:31 AM






Hi,

I am looking for certain users that are connected via some other values.

TABLE_A A_ID, type, ... (~124K rows)
TABLE_B B_ID, A_ID, C_ID, ... (~400K rows)
TABLE_C C_ID, D_ID, ... (~250K rows)
TABLE_D D_ID, E_ID, ... (~110K rows)
TABLE_E E_ID, deleted ... (~110K rows)

The query below should return less than 300 rows, (212 to be exact).

SELECT TABLE_E.* FROM TABLE_A, TABLE_B, TABLE_C, TABLE_D, TABLE_E
WHERE
TABLE_A.type IN (2,5,7)
AND TABLE_B.A_ID = TABLE_A.A_ID
AND TABLE_C.C_id = TABLE_B.C_ID
AND TABLE_D.D_id = TABLE_C.D_ID
AND TABLE_E.E_id = TABLE_D.E_ID AND TABLE_E.deleted=1

In TABLE_A, there are only ~10 'types' available.

When I do an 'EXPLAIN', all the primary keys are used, (A_ID, B_ID,
C_ID, E_ID are all unique in their own tables).

The query is so slow that it takes about 60 minutes to run on the
server, on the dev environment, (with a full copy of the db), the query
is better, but it still takes 15 minutes, (I am guessing it is purely
because there is less load on the dev server).

I have tried

SELECT TABLE_C.* FROM TABLE_A, TABLE_B, TABLE_C, TABLE_D, TABLE_E
WHERE
TABLE_A.type =2
AND TABLE_B.A_ID = TABLE_A.A_ID
AND TABLE_C.C_id = TABLE_B.C_ID
AND TABLE_D.D_id = TABLE_C.D_ID
AND TABLE_E.E_id = TABLE_D.E_ID
AND
TABLE_E.deleted=1

Without any real success.

Any suggestions on what I could do or how I could optimise the query?
Any tool I could use that might help me find what/where the bottleneck
might be?

Many thanks

Simon

Reply With Quote
  #2  
Old   
The Natural Philosopher
 
Posts: n/a

Default Re: Select from on table via 4 other tables... painfully slow query. - 10-28-2011 , 11:26 AM






Simon wrote:
Quote:
Hi,

I am looking for certain users that are connected via some other values.

TABLE_A A_ID, type, ... (~124K rows)
TABLE_B B_ID, A_ID, C_ID, ... (~400K rows)
TABLE_C C_ID, D_ID, ... (~250K rows)
TABLE_D D_ID, E_ID, ... (~110K rows)
TABLE_E E_ID, deleted ... (~110K rows)

The query below should return less than 300 rows, (212 to be exact).

SELECT TABLE_E.* FROM TABLE_A, TABLE_B, TABLE_C, TABLE_D, TABLE_E
WHERE
TABLE_A.type IN (2,5,7)
AND TABLE_B.A_ID = TABLE_A.A_ID
AND TABLE_C.C_id = TABLE_B.C_ID
AND TABLE_D.D_id = TABLE_C.D_ID
AND TABLE_E.E_id = TABLE_D.E_ID AND TABLE_E.deleted=1

In TABLE_A, there are only ~10 'types' available.

When I do an 'EXPLAIN', all the primary keys are used, (A_ID, B_ID,
C_ID, E_ID are all unique in their own tables).

The query is so slow that it takes about 60 minutes to run on the
server, on the dev environment, (with a full copy of the db), the query
is better, but it still takes 15 minutes, (I am guessing it is purely
because there is less load on the dev server).

I have tried

SELECT TABLE_C.* FROM TABLE_A, TABLE_B, TABLE_C, TABLE_D, TABLE_E
WHERE
TABLE_A.type =2
AND TABLE_B.A_ID = TABLE_A.A_ID
AND TABLE_C.C_id = TABLE_B.C_ID
AND TABLE_D.D_id = TABLE_C.D_ID
AND TABLE_E.E_id = TABLE_D.E_ID
AND
TABLE_E.deleted=1

Without any real success.

Any suggestions on what I could do or how I could optimise the query?
Any tool I could use that might help me find what/where the bottleneck
might be?

might be worth indexing TABLE_A.type..and TABLE_E.deleted.Oh and since
they are part the cross fields as well like *.C_ID, *.B_ID. etc..

I cant remember what exactly I had, but i do know that careful choice of
indices doubled the insert time but reduced the search on several linked
tables by 60 times or more.


If the subtleties of EXPLAIN are too much, simply index everything
that's used in that query, and then remove the indices one by one till
you see a major leap in select() time, and then put that one back :-)

You should also check INSERT and UPDATE times and see that they are not
hugely adversely affected.


Quote:
Many thanks

Simon

Reply With Quote
  #3  
Old   
Lennart Jonsson
 
Posts: n/a

Default Re: Select from on table via 4 other tables... painfully slow query. - 10-28-2011 , 11:49 AM



On 2011-10-28 09:31, Simon wrote:
Quote:
Hi,

I am looking for certain users that are connected via some other values.

TABLE_A A_ID, type, ... (~124K rows)
TABLE_B B_ID, A_ID, C_ID, ... (~400K rows)
TABLE_C C_ID, D_ID, ... (~250K rows)
TABLE_D D_ID, E_ID, ... (~110K rows)
TABLE_E E_ID, deleted ... (~110K rows)

The query below should return less than 300 rows, (212 to be exact).

SELECT TABLE_E.* FROM TABLE_A, TABLE_B, TABLE_C, TABLE_D, TABLE_E
WHERE
TABLE_A.type IN (2,5,7)
AND TABLE_B.A_ID = TABLE_A.A_ID
AND TABLE_C.C_id = TABLE_B.C_ID
AND TABLE_D.D_id = TABLE_C.D_ID
AND TABLE_E.E_id = TABLE_D.E_ID AND TABLE_E.deleted=1

In TABLE_A, there are only ~10 'types' available.

It's difficult to guess, but indexes like:

create index ... on table_a (type, a_id);
create index ... on table_e (deleted, e_id);

might help. It probably wont help with performance (but it makes the
query easier to understand) using ansi style joins like:

SELECT TABLE_E.*
FROM TABLE_A
JOIN TABLE_B
ON TABLE_B.A_ID = TABLE_A.A_ID
JOIN TABLE_C
ON TABLE_C.C_id = TABLE_B.C_ID
JOIN TABLE_D
ON TABLE_D.D_id = TABLE_C.D_ID
JOIN TABLE_E
TABLE_E.E_id = TABLE_D.E_ID
WHERE TABLE_A.type IN (2,5,7)
AND TABLE_E.deleted=1

[...]

Reply With Quote
  #4  
Old   
Simon
 
Posts: n/a

Default Re: Select from on table via 4 other tables... painfully slow query. - 11-02-2011 , 12:32 AM



Quote:
Any suggestions on what I could do or how I could optimise the query?
Any tool I could use that might help me find what/where the bottleneck
might be?

Many thanks

Thanks for the replies.
I added indexes without much success.

I will try with some code, (get ids from one query, then loop around and
get the ids from another query and so forth).

The other issue is that the data is cached.
So I never really know if the query is cached or if I was really able to
optimise it somehow.

Simon

Reply With Quote
  #5  
Old   
Axel Schwenke
 
Posts: n/a

Default Re: Select from on table via 4 other tables... painfully slow query. - 11-02-2011 , 02:01 AM



Simon <bad (AT) example (DOT) com> wrote:
Quote:
TABLE_A A_ID, type, ... (~124K rows)
TABLE_B B_ID, A_ID, C_ID, ... (~400K rows)
TABLE_C C_ID, D_ID, ... (~250K rows)
TABLE_D D_ID, E_ID, ... (~110K rows)
TABLE_E E_ID, deleted ... (~110K rows)

The query below should return less than 300 rows, (212 to be exact).

SELECT TABLE_E.* FROM TABLE_A, TABLE_B, TABLE_C, TABLE_D, TABLE_E
WHERE
TABLE_A.type IN (2,5,7)
AND TABLE_B.A_ID = TABLE_A.A_ID
AND TABLE_C.C_id = TABLE_B.C_ID
AND TABLE_D.D_id = TABLE_C.D_ID
AND TABLE_E.E_id = TABLE_D.E_ID AND TABLE_E.deleted=1

In TABLE_A, there are only ~10 'types' available.

When I do an 'EXPLAIN', all the primary keys are used, (A_ID, B_ID,
C_ID, E_ID are all unique in their own tables).

The query is so slow that it takes about 60 minutes to run
Yeah. That's not going to get much better without denormalizing the
data. The problem is, that you have filters (WHERE conditions) only
on the first and the last table in the join. And the JOIN order is
effectively fix by the "chain" of join conditions.

You don't show the EXPLAIN result, but I trust that you have an index
on TABLE_A for the `type` column. If we assume an even distribution
of the 10 `type` values, then 30% of the rows in TABLE_A will match.
The executor will then have to trace all those 37K rows through the
other 4 tables until it can finally check the TABLE_E.deleted field.
Depending on the "fanout" in the subsequent tables, this can become
several millions or even trillions of lookups in TABLE_E.

In theory the join could be executed in the other direction too. But
probably the `deleted` field is not very selective, so I guess the
optimizer decides to start with TABLE_A.

Quote:
Any suggestions on what I could do or how I could optimise the query?
Any tool I could use that might help me find what/where the bottleneck
might be?
There is a small chance for improvement by making MySQL use covering
index reads. For that you should have indexes like so:

TABLE_A: (type, A_ID)
TABLE_B: (A_ID, C_ID)
TABLE_C: (C_ID, D_ID)
TABLE_D: (D_ID, E_ID)
TABLE_E: (E_ID, deleted)

The schema for the indexes is (field to reference this table, field
that connects to next table). After that EXPLAIN should show
"using index" for all but the last table.

To leverage the full power of that optimization, indexes should fit
in memory.


XL

Reply With Quote
  #6  
Old   
Willem Bogaerts
 
Posts: n/a

Default Re: Select from on table via 4 other tables... painfully slow query. - 11-02-2011 , 02:12 AM



On 02/11/11 07:32, Simon wrote:
Quote:
Any suggestions on what I could do or how I could optimise the query?
Any tool I could use that might help me find what/where the bottleneck
might be?

Many thanks


Thanks for the replies.
I added indexes without much success.

I will try with some code, (get ids from one query, then loop around and
get the ids from another query and so forth).

The other issue is that the data is cached.
So I never really know if the query is cached or if I was really able to
optimise it somehow.

Simon
Maybe it helps to add a "FORCE INDEX" clause for TABLE_A.type in the
SELECT statement. Anyway, the EXPLAIN command should tell you a lot.
Without knowing the output of the EXPLAIN command and without knowing
the data, my guess is that TABLE_E is handled first because of the
lesser number of rows in it. But probably TABLE_A.type is more specific
(yields fewer rows) that TABLE_E.deleted.

If that does not speed up the query, STRAIGHT_JOIN clauses may help.

Good luck,
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.