Quote:
Thanks very much for your detailed reply.
I was wondering if there is a way to analyze the sql that is being used to access the database and customize the indices based on the sql?
If you need to get hold of the queries being used, you can turn on
the MySQL general query log, or look at the source code of your
applications. Then look at the SQL.
First, if you have auto_increment fields, you need a unique index
on that field (probably a primary key).
Also, if you need to enforce uniqueness on the values in a particular
column or combination of particular columns, you need a unique index
on that (the columns may be in any order). This is a candidate for
a primary key if you don't already have one.
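To see a unique index on a combination of columns doing its job, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (the CREATE UNIQUE INDEX statement is the same in both), and the `booking` table with its `room` and `slot` columns is made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a convenient stand-in for MySQL.
conn = sqlite3.connect(":memory:")

# Hypothetical table: a room may be booked at most once per time slot.
# (MySQL spells the auto-increment keyword AUTO_INCREMENT.)
conn.execute("""
    CREATE TABLE booking (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        room TEXT    NOT NULL,
        slot INTEGER NOT NULL
    )
""")

# The unique index enforces uniqueness of the (room, slot) combination.
conn.execute("CREATE UNIQUE INDEX room_slot ON booking (room, slot)")


def try_insert(room, slot):
    """Return True if the row was stored, False if the unique index rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO booking (room, slot) VALUES (?, ?)", (room, slot)
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert("A101", 9)       # new combination: accepted
duplicate = try_insert("A101", 9)   # same combination again: rejected
other_slot = try_insert("A101", 10) # same room, different slot: accepted
```

Only the exact duplicate is refused; repeating a value in just one of the two columns is fine.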
Now, look at the WHERE and ON clauses used in your queries.
    WHERE field = 'value'
    or WHERE field > 'value'
    or WHERE field < 'value'
suggests an index on (field).
    WHERE field1 = 'value1' AND field2 = 'value2'
suggests an index on (field1, field2) or (field2, field1).
    WHERE field LIKE '%value%'
does not suggest an index, since the pattern starts with a wildcard
and has no fixed leading characters for an index to search on.
    WHERE field LIKE 'value%'
(with no leading % or _ wildcards) suggests an index on (field).
    WHERE last_name = 'value1' AND first_name = 'value2' AND
    middle_name > 'S'
suggests an index on (last_name, first_name, middle_name) or
(first_name, last_name, middle_name). The exact-equality tests go
first, then an inequality test or a LIKE with a non-wildcard prefix.
If you JOIN two tables on one or more fields being equal,
that suggests an index on those fields (in any order).
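The rules of thumb above can be written down as a rough sketch. The function names `suggest_index` and `like_can_use_index` are made up for this post, and the LIKE check ignores escaped wildcards, so treat it as an illustration of the reasoning rather than what MySQL's optimizer actually does.

```python
def suggest_index(equality_cols, range_col=None):
    """Build a suggested column order for an index.

    Columns tested for exact equality go first (their relative order
    does not matter); at most one column tested with <, > or a
    prefix LIKE goes last.
    """
    cols = list(equality_cols)
    if range_col is not None:
        cols.append(range_col)
    return tuple(cols)


def like_can_use_index(pattern):
    """True if a LIKE pattern starts with a literal character.

    Patterns that begin with % or _ give an index nothing to search on.
    (Simplification: escaped wildcards are not handled.)
    """
    return bool(pattern) and pattern[0] not in "%_"


# WHERE last_name = ... AND first_name = ... AND middle_name > 'S'
name_index = suggest_index(["last_name", "first_name"], "middle_name")

prefix_ok = like_can_use_index("value%")     # fixed prefix: index helps
leading_wild = like_can_use_index("%value%") # leading wildcard: no help
```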
Now, try to merge all the suggestions into a reasonable (and small)
set of indexes. Keep in mind that the most important indexes are
those suggested by queries that:
- run most often
- deal with the most data
- are most time-critical
- are the most time-consuming
- are most selective (narrow down the choices the most)
There is no point in having multiple indexes on the same fields in
the same order on the same table. Drop duplicates.
If a suggested index has multiple fields, and one of those fields
already has a unique index or primary key and is compared for exact
equality, you might as well drop the other fields from the
suggestion. The unique field will select at most one record, and
the others do not add much (that one record will either be included
or excluded by the values of the other fields). Since you already
have a unique index for the uniqueness, no additional index is
needed for that suggestion.
Order of fields in an index matters. Phone books are ordered by
(last_name, first_name), and that ordering is pretty useless for
searching for first_name='George' AND street_address='123 Main
Street'. You end up having to scan the whole book.
An index on (a, b, c, d) works as an index on (a), (a, b), and (a, b, c)
as well, but *not* on (b, c, d).
An index on (a, b) works as a *PARTIAL* substitute for an index on
(a, b, c), or (a, b, c, d) (but it's slightly slower).
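The leftmost-prefix behavior can be modeled in a few lines. This is only a sketch of the rule as described here, considering equality tests alone; `usable_prefix` is a made-up helper, not anything MySQL provides.

```python
def usable_prefix(index_cols, equality_cols):
    """Return the leading columns of an index that a query can use.

    Walk the index from the left and stop at the first column the
    query does not test for equality; nothing after that point helps.
    """
    used = []
    for col in index_cols:
        if col not in equality_cols:
            break
        used.append(col)
    return tuple(used)


abcd = ("a", "b", "c", "d")

# Query on a and b: the index on (a, b, c, d) works as an index on (a, b).
ab_query = usable_prefix(abcd, {"a", "b"})

# Query on b, c and d: the leading column a is missing, so nothing is usable.
bcd_query = usable_prefix(abcd, {"b", "c", "d"})

# Query on a, b and c with only an index on (a, b): a partial substitute.
partial = usable_prefix(("a", "b"), {"a", "b", "c"})

# The phone book: ordered by (last_name, first_name), searched by
# first_name and street_address.
phone_book = usable_prefix(
    ("last_name", "first_name"), {"first_name", "street_address"}
)
```

An empty result corresponds to scanning the whole book.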
MySQL generally uses at most one index on the WHERE clause and one
index on each JOIN. An index on (a) and a separate index on (b)
are not a complete substitute for an index on (b, a) or (a, b),
although the index on (a) alone does work as a partial substitute
for one on (a, b).
MySQL can also use an index for the fields in an ORDER BY clause.
An index on a field that has few different values does little to
improve the search. For example, for a customer database for a
lawn-mowing service or other *local* small business in Fort Worth,
Texas, an index on (planet, country, state, city) as opposed to
just (city) may be rather pointless, since 99.9% of the entries are
for ('Earth', 'United States', 'Texas', city). (Hint: Fort Worth is
not near the Texas state boundary.) And, surprise, there are no
entries for Australia, France, Oklahoma, or Mars. For a mail-order
business, this can be very different.
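One rough way to put a number on "few different values" is the fraction of distinct values in a column. The sketch below uses invented customer data (the counts and city names are assumptions for illustration only), and `selectivity` is a made-up helper.

```python
def selectivity(values):
    """Fraction of distinct values in a column.

    Close to 1.0 means nearly every row is different (an index narrows
    the search a lot); close to 0.0 means almost all rows share the same
    few values (an index on that column alone does little).
    """
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Invented data for a local business with 1000 customers.
state_column = ["Texas"] * 999 + ["Oklahoma"]
city_column = (
    ["Fort Worth"] * 600 + ["Arlington"] * 250
    + ["Keller"] * 149 + ["Ardmore"]
)
customer_id_column = range(1000)

state_sel = selectivity(state_column)        # 2 distinct values in 1000 rows
city_sel = selectivity(city_column)          # 4 distinct values in 1000 rows
id_sel = selectivity(customer_id_column)     # every value distinct
```

In this made-up data the state column adds almost nothing on top of the city, which is the point of the lawn-mowing example.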
Example: I have a database of video recordings, which is accessed
by private web pages. The 'recording' table has one entry for each
copy of a particular recording. There are other tables I'm not
going to mention here.
1. The table has an `id` auto_increment field. This is the primary key.
2. A given `track` on a given `disk` may contain no more than one
episode recording, so this suggests a unique index on (disk, track)
or (track, disk) to prevent duplicates.
The web site has several pages that use the following WHERE clauses:
3. WHERE series = 'value'
   (all the recordings for a particular series)
   suggests an index on (series).
4. WHERE series = 'value1' AND prod = 'value2'
   (all the recordings of a particular episode, i.e. series
   and production number; there can be more than one
   such recording)
   suggests an index on (series, prod) or (prod, series).
5. WHERE disk = 'value'
   (all the recordings on a particular disk)
   suggests an index on (disk).
6. Updates to particular records, for example, when the recording is
   moved from a DVR hard disk to a DVD, are done by:
   WHERE id = 'value'.
There are other fields, such as recording length, recording date,
quality, and title, which do not need an index. You might think
that this is not normalized, as title might seem to depend only on
(series, prod), but it turns out that the same episode can have
different titles on different recordings.
Ok, we've got suggestions for indexes:
1. (id) unique
2. (disk, track) or (track, disk) unique
3. (series)
4. (series, prod) or (prod, series)
5. (disk)
6. (id)
I need a primary key of (id) to satisfy #1, which also covers #6.
From #2, I choose a unique index on (disk, track) which also satisfies #5.
If I had chosen (track, disk), that wouldn't satisfy #5.
From #4, I choose a non-unique index on (series, prod) which also
satisfies #3.
That's 3 indexes.
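The merging step can be sketched mechanically once the column orders have been chosen. This is a simplification under stated assumptions: `covers` and `reduce_indexes` are made-up helpers, and uniqueness is ignored (a unique suggestion must stay its own index even if a wider index starts with the same columns, which does not come up in this example).

```python
def covers(index, suggestion):
    """True if `suggestion` is a leading prefix of `index` (or equal to it)."""
    return index[:len(suggestion)] == suggestion


def reduce_indexes(suggestions):
    """Drop every suggestion already served by a wider kept index.

    Exact duplicates are removed first; then the widest suggestions are
    considered first so they can absorb the narrower ones.
    """
    kept = []
    unique_suggestions = list(dict.fromkeys(suggestions))  # keeps order
    for cand in sorted(unique_suggestions, key=len, reverse=True):
        if not any(covers(index, cand) for index in kept):
            kept.append(cand)
    return kept


# The six suggestions, with the column orders chosen in the text.
suggestions = [
    ("id",),              # 1. auto_increment primary key
    ("disk", "track"),    # 2. unique, order chosen so it also serves #5
    ("series",),          # 3.
    ("series", "prod"),   # 4. order chosen so it also serves #3
    ("disk",),            # 5.
    ("id",),              # 6.
]

final_indexes = reduce_indexes(suggestions)
```

Had (track, disk) been chosen for #2, the suggestion (disk) would not be a leading prefix of it and would survive as a fourth index.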