dbTalk Databases Forums  

indexes, am I over-simplifying it?

comp.databases.mysql comp.databases.mysql


Discuss indexes, am I over-simplifying it? in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Simon
 
Posts: n/a

Default indexes, am I over-simplifying it? - 10-05-2011 , 03:29 AM






Hi,

If I have a query:

select * from table1 where x=somevalue or x=someothervalue GROUP BY x

it would make sense to have an index on 'x'

and if I have:

select * from table1 t1, table2 t2 where (t1.x=somevalue or
t1.x=someothervalue) and t2.y=t1.x GROUP BY t1.x

it would make sense to have an index on 't1.x' and an index on 't2.y'

Would I need to take anything else into consideration? Or would those
two be the best indexes?

The reason I ask is because, when I do an EXPLAIN on the query I get
'Using temporary; Using filesort'

Simon

Reply With Quote
  #2  
Old   
Axel Schwenke
 
Posts: n/a

Default Re: indexes, am I over-simplifying it? - 10-05-2011 , 06:33 AM






Simon <bad (AT) example (DOT) com> wrote:
Quote:
If I have a query:
select * from table1 where x=somevalue or x=someothervalue GROUP BY x
it would make sense to have an index on 'x'
Yes.

Quote:
and if I have:
select * from table1 t1, table2 t2 where (t1.x=somevalue or
t1.x=someothervalue) and t2.y=t1.x GROUP BY t1.x
it would make sense to have an index on 't1.x' and an index on 't2.y'
Yes.

Quote:
The reason I ask is because, when I do an EXPLAIN on the query I get
'Using temporary; Using filesort'
This is for the GROUP BY. I cannot see why you have such a clause.

In fact any other DBMS than MySQL would refuse to run those queries,
because with GROUP BY all result fields must either be aggregates
or being grouped by. MySQL lifts that restriction, but then returns
an arbitrary row from each group. Probably not what you want.


XL

Reply With Quote
  #3  
Old   
Simon
 
Posts: n/a

Default Re: indexes, am I over-simplifying it? - 10-06-2011 , 08:42 AM



Quote:
The reason I ask is because, when I do an EXPLAIN on the query I get
'Using temporary; Using filesort'

This is for the GROUP BY. I cannot see why you have such a clause.
Thanks,

Maybe I should include my real world example...

EXPLAIN SELECT *
FROM table1 AS t1
WHERE t1.enabled =1
AND t1.deleted =0
AND t1.score >=50
ORDER BY t1.date_created DESC
LIMIT 0 , 15

I get

id |select_type|table|type|possible_keys|key|key_len| ref|rows|Extra
1|SIMPLE|t1|range|idx_trust_score|idx_score|1|NULL |52138|Using where;
Using filesort

The index is "enabled,deleted,score,date_created"

The total number of rows in that table is only 106020

In this case, can I get rid of the "Using where; Using filesort"?
the ref is NULL, is that good/bad?

Thanks

Simon

Reply With Quote
  #4  
Old   
Axel Schwenke
 
Posts: n/a

Default Re: indexes, am I over-simplifying it? - 10-06-2011 , 10:18 AM



Simon <bad (AT) example (DOT) com> wrote:
Quote:
Maybe I should include my real world example...

EXPLAIN SELECT *
FROM table1 AS t1
WHERE t1.enabled =1
AND t1.deleted =0
AND t1.score >=50
ORDER BY t1.date_created DESC
LIMIT 0 , 15

I get

id |select_type|table|type|possible_keys|key|key_len| ref|rows|Extra
1|SIMPLE|t1|range|idx_trust_score|idx_score|1|NULL |52138|Using where;
Using filesort
Hint: this is much better readable in vertical form. End your EXPLAIN
query with \G instead of ; to get it.

Quote:
The index is "enabled,deleted,score,date_created"
The index idx_score (picked for the query) is on those 4 fields?
I bet it's not. Seems like `enabled` and `deleted` are 0|1 fields,
so an index is not selective. The optimizer guesses that `score`
is most selective and that it has to scan 52138 rows.

Quote:
In this case, can I get rid of the "Using where; Using filesort"?
You might get rid of the "using where" which means that additional
filtering is done on the rows read from the index. I.e. you could
force the use of that 4-part index you mention above.

"using filesort" is not necessarily bad. It just means that the
result will be sorted. If it fits into sort_buffer, then no file
will be touched. For small results this sorting pass is no issue.

Quote:
the ref is NULL, is that good/bad?
Since there is no referring table, it must be NULL. ref is only
meaningful for joins or subqueries.


XL

Reply With Quote
  #5  
Old   
Simon
 
Posts: n/a

Default Re: indexes, am I over-simplifying it? - 10-06-2011 , 01:29 PM



Quote:
The index is "enabled,deleted,score,date_created"

The index idx_score (picked for the query) is on those 4 fields?
I bet it's not. Seems like `enabled` and `deleted` are 0|1 fields,
yes, the index is on those 4 fields.
enabled, deleted and score are tinyint(3)
date_created is an int(14).

Quote:
so an index is not selective. The optimizer guesses that `score`
is most selective and that it has to scan 52138 rows.
Note sure I understand by most selective.
I want a score >= 50, not sure I understand what you mean by having to
'guess'.

Quote:
In this case, can I get rid of the "Using where; Using filesort"?

You might get rid of the "using where" which means that additional
filtering is done on the rows read from the index. I.e. you could
force the use of that 4-part index you mention above.

Sorry, but I am afraid I also don't understand what you mean.

Simon

Reply With Quote
  #6  
Old   
Gordon Burditt
 
Posts: n/a

Default Re: indexes, am I over-simplifying it? - 10-06-2011 , 04:54 PM



Quote:
EXPLAIN SELECT *
FROM table1 AS t1
WHERE t1.enabled =1
AND t1.deleted =0
AND t1.score >=50
ORDER BY t1.date_created DESC
LIMIT 0 , 15

I get

id |select_type|table|type|possible_keys|key|key_len| ref|rows|Extra
1|SIMPLE|t1|range|idx_trust_score|idx_score|1|NULL |52138|Using where;
Using filesort

Hint: this is much better readable in vertical form. End your EXPLAIN
query with \G instead of ; to get it.

The index is "enabled,deleted,score,date_created"

The index idx_score (picked for the query) is on those 4 fields?
I bet it's not.
It seems like just the index I'd create for that specific query,
especially if there are other queries involving just enabled=1 and
deleted=0.

Quote:
Seems like `enabled` and `deleted` are 0|1 fields,
so an index is not selective.
MySQL should be able to use the given index on the query above to
select records. If that selects more than some percentage of the
records (like 50%), MySQL decides that doing a full scan is faster.

Perhaps that won't be the case when you've loaded up a full set of
production data. Or not. In the full production data, what
percentage of records do you think will have enabled=1 and deleted=0?
If it's fairly large, like > 50%, perhaps your index should just
be on (score, date_created). But if your query is actually selecting
more than half of the records, perhaps a full scan *is* the best.
Using an index does have some overhead.

Quote:
The optimizer guesses that `score`
is most selective and that it has to scan 52138 rows.
I don't see why it would guess that `score` is more selective than
(enabled, deleted, score), which it can get on a range-select from
the given index.

Quote:
In this case, can I get rid of the "Using where; Using filesort"?

You might get rid of the "using where" which means that additional
filtering is done on the rows read from the index. I.e. you could
force the use of that 4-part index you mention above.

"using filesort" is not necessarily bad. It just means that the
result will be sorted. If it fits into sort_buffer, then no file
will be touched. For small results this sorting pass is no issue.

Reply With Quote
  #7  
Old   
Axel Schwenke
 
Posts: n/a

Default Re: indexes, am I over-simplifying it? - 10-07-2011 , 02:28 AM



Simon <bad (AT) example (DOT) com> wrote:
Quote:
The index is "enabled,deleted,score,date_created"

The index idx_score (picked for the query) is on those 4 fields?
I bet it's not. Seems like `enabled` and `deleted` are 0|1 fields,

yes, the index is on those 4 fields.
enabled, deleted and score are tinyint(3)
date_created is an int(14).
Let me quote the explain result once more:

Quote:
id |select_type|table|type |possible_keys |key |key_len|ref |rows |Extra
1 |SIMPLE |t1 |range|idx_trust_score|idx_score|1 |NULL|52138|Using where; Using filesort
The index in use is `idx_score`, *not* `idx_trust_score`

Quote:
so an index is not selective. The optimizer guesses that `score`
is most selective and that it has to scan 52138 rows.

Note sure I understand by most selective.
I want a score >= 50, not sure I understand what you mean by having to
'guess'.
There are multiple (at least 2) indexes on your table. The optimizer
picks one of them to be used for the query. For that, it looks at each
of the indexes and estimates (guesses) the costs that it would have.
Then it picks the index with the least cost. In most cases it's the
one that yields fewest rows to look at. Such an index is called
"selective"

This estimating (or "guessing") is rather crude. For a good reason:
if one puts too much effort into the decision which index to use,
this could be more costly than the actual query.

Quote:
In this case, can I get rid of the "Using where; Using filesort"?

You might get rid of the "using where" which means that additional
filtering is done on the rows read from the index. I.e. you could
force the use of that 4-part index you mention above.

Sorry, but I am afraid I also don't understand what you mean.
There is nothing wrong with "Using where; Using filefort" as such.

Only if the query is running (too) slow, it could be worth looking
at it. But if the result set is small, especially if it fits into
the sort_buffer, then it's rather unlikely that "Using filesort"
has any impact on performance at all.


XL

Reply With Quote
  #8  
Old   
Axel Schwenke
 
Posts: n/a

Default Re: indexes, am I over-simplifying it? - 10-07-2011 , 04:49 AM



gordonb.77gus (AT) burditt (DOT) org (Gordon Burditt) wrote:

Quote:
The optimizer guesses that `score`
is most selective and that it has to scan 52138 rows.

I don't see why it would guess that `score` is more selective than
(enabled, deleted, score), which it can get on a range-select from
the given index.
I agree. An index on (enabled,deleted,score) should be more useful
for the query than an index on (score) alone. Except when the
distribution of `enabled` and `deleted` values is pathological.
But the MySQL optimizer is known to to have ... deficites ... when
it comes to estimating the selectiveness of a compound index.

There might be another reason why the optimizer picks the index
on (score) - this index is smaller. So even if there are more index
entries to be scanned, it could be fewer index pages being read
from disk.


XL

Reply With Quote
  #9  
Old   
Gordon Burditt
 
Posts: n/a

Default Re: indexes, am I over-simplifying it? - 10-07-2011 , 05:55 AM



Quote:
I don't see why it would guess that `score` is more selective than
(enabled, deleted, score), which it can get on a range-select from
the given index.

I agree. An index on (enabled,deleted,score) should be more useful
for the query than an index on (score) alone. Except when the
distribution of `enabled` and `deleted` values is pathological.
If even one record does NOT have enabled=1 and deleted=0, the
compound index should be more useful. In theory, anyway (wishful
thinking). Even though I'd call that distribution pathological.

Quote:
But the MySQL optimizer is known to to have ... deficites ... when
it comes to estimating the selectiveness of a compound index.
Perhaps running "OPTIMIZE TABLE" would reset the estimates a bit better.
This is in addition to other benefits like squeezing out free space from
MyISAM tables, defragging long strings, and sorting indexes.

Quote:
There might be another reason why the optimizer picks the index
on (score) - this index is smaller. So even if there are more index
entries to be scanned, it could be fewer index pages being read
from disk.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.