![]() |
|
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
SELECT
s and Other Queries First, one thing that affects all queries: The more complex permission system setup you have, the more overhead you get.
If you do not have any GRANT
statements done, MySQL will
optimise the permission checking somewhat. So if you have a very high
volume it may be worth the time to avoid grants. Otherwise, more
permission check results in a larger overhead.
If your problem is with some explicit MySQL function, you can always time this in the MySQL client:
mysql> SELECT BENCHMARK(1000000,1+1); +------------------------+ | BENCHMARK(1000000,1+1) | +------------------------+ | 0 | +------------------------+ 1 row in set (0.32 sec) |
The above shows that MySQL can execute 1,000,000 +
expressions in 0.32 seconds on a PentiumII 400MHz
.
All MySQL functions should be very optimised, but there may be
some exceptions, and the BENCHMARK(loop_count,expression)
is a
great tool to find out if this is a problem with your query.
EXPLAIN
Syntax (Get Information About a SELECT
) EXPLAIN tbl_name or EXPLAIN SELECT select_options |
EXPLAIN tbl_name
is a synonym for DESCRIBE tbl_name
or
SHOW COLUMNS FROM tbl_name
.
When you precede a SELECT
statement with the keyword EXPLAIN
,
MySQL explains how it would process the SELECT
, providing
information about how tables are joined and in which order.
With the help of EXPLAIN
, you can see when you must add indexes
to tables to get a faster SELECT
that uses indexes to find the
records.
You should frequently run ANALYZE TABLE
to update table statistics
such as cardinality of keys which can affect the choices the optimiser
makes. See section ANALYZE TABLE
Syntax.
You can also see if the optimiser joins the tables in an optimal
order. To force the optimiser to use a specific join order for a
SELECT
statement, add a STRAIGHT_JOIN
clause.
For non-simple joins, EXPLAIN
returns a row of information for each
table used in the SELECT
statement. The tables are listed in the order
they would be read. MySQL resolves all joins using a single-sweep
multi-join method. This means that MySQL reads a row from the first
table, then finds a matching row in the second table, then in the third table
and so on. When all tables are processed, it outputs the selected columns and
backtracks through the table list until a table is found for which there are
more matching rows. The next row is read from this table and the process
continues with the next table.
In MySQL version 4.1 the EXPLAIN
output was changed to work better
with constructs like UNION
s, subqueries and derived tables. Most
notable is the addition of two new columns: id
and select_type
.
Output from EXPLAIN
consists of the following columns:
id
SELECT
identifier, the sequential number of this SELECT
within the query.
select_type
Type of SELECT
clause, which can be any of the following:
SIMPLE
Simple SELECT
(without UNION
s or subqueries).
PRIMARY
Outermost SELECT
.
UNION
Second and further UNION
SELECT
s.
DEPENDENT UNION
Second and further UNION
SELECTS
s, dependent on outer
subquery.
SUBQUERY
First SELECT
in subquery.
DEPENDENT SUBQUERY
First SELECT
, dependent on outer subquery.
DERIVED
Derived table SELECT
.
table
The table to which the row of output refers.
type
The join type. The different join types are listed here, ordered from best to worst type:
system
The table has only one row (= system table). This is a special case of
the const
join type.
const
The table has at most one matching row, which will be read at the start
of the query. Because there is only one row, values from the column in
this row can be regarded as constants by the rest of the
optimiser. const
tables are very fast as they are read only once!
const
is used when you compare all parts of a
PRIMARY
/UNIQUE
key with constants:
SELECT * FROM const_table WHERE primary_key=1; SELECT * FROM const_table WHERE primary_key_part1=1 and primary_key_part2=2; |
eq_ref
One row will be read from this table for each combination of rows from
the previous tables. This is the best possible join type, other than the
const
types. It is used when all parts of an index are used by
the join and the index is UNIQUE
or a PRIMARY KEY
.
eq_ref
can be used for indexed columns that is compared with
=
. The compared item may be a constant or an expression that uses
columns from tables that are read before this table.
In the following examples, ref_table
will be able to use eq_ref
SELECT * FROM ref_table,other_table WHERE ref_table.key_column=other_table.column; SELECT * FROM ref_table,other_table WHERE ref_table.key_column_part1=other_table.column AND ref_table.key_column_part2=1; |
ref
All rows with matching index values will be read from this table for each
combination of rows from the previous tables. ref
is used if the join
uses only a leftmost prefix of the key, or if the key is not UNIQUE
or a PRIMARY KEY
(in other words, if the join cannot select a single
row based on the key value). If the key that is used matches only a few rows,
this join type is good.
ref
can be used for indexed columns that is compared with =
.
In the following examples, ref_table
will be able to use ref
SELECT * FROM ref_table WHERE key_column=expr; SELECT * FROM ref_table,other_table WHERE ref_table.key_column=other_table.column; SELECT * FROM ref_table,other_table WHERE ref_table.key_column_part1=other_table.column AND ref_table.key_column_part2=1; |
ref_or_null
Like ref
, but with the addition that we will do an extra search for
rows with NULL
. See section How MySQL Optimises IS NULL
.
SELECT * FROM ref_table WHERE key_column=expr OR key_column IS NULL; |
This optimisation is new for MySQL 4.1.1 and is mostly used when resolving sub queries.
range
Only rows that are in a given range will be retrieved, using an index to
select the rows. The key
column indicates which index is used.
The key_len
contains the longest key part that was used.
The ref
column will be NULL
for this type.
range
can be used for when an key column is compared to a
constant with =
, <>
, >
, >=
, <
,
<=
, IS NULL
, <=>
, BETWEEN
and IN
.
SELECT * FROM range_table WHERE key_column = 10; SELECT * FROM range_table WHERE key_column BETWEEN 10 and 20; SELECT * FROM range_table WHERE key_column IN (10,20,30); SELECT * FROM range_table WHERE key_part1= 10 and key_part2 IN (10,20,30); |
index
This is the same as ALL
, except that only the index tree is
scanned. This is usually faster than ALL
, as the index file is usually
smaller than the datafile.
This can be used when the query only uses columns that are part of one index.
ALL
A full table scan will be done for each combination of rows from the
previous tables. This is normally not good if the table is the first
table not marked const
, and usually very bad in all other
cases. You normally can avoid ALL
by adding more indexes, so that
the row can be retrieved based on constant values or column values from
earlier tables.
possible_keys
The possible_keys
column indicates which indexes MySQL
could use to find the rows in this table. Note that this column is
totally independent of the order of the tables. That means that some of
the keys in possible_keys
may not be usable in practice with the
generated table order.
If this column is empty, there are no relevant indexes. In this case,
you may be able to improve the performance of your query by examining
the WHERE
clause to see if it refers to some column or columns
that would be suitable for indexing. If so, create an appropriate index
and check the query with EXPLAIN
again. See section ALTER TABLE
Syntax.
To see what indexes a table has, use SHOW INDEX FROM tbl_name
.
key
The key
column indicates the key (index) that MySQL actually
decided to use. The key is NULL
if no index was chosen. To force
MySQL to use an key listed in the possible_keys
column, use
USE KEY/IGNORE KEY
in your query.
See section SELECT
Syntax.
Also, running myisamchk --analyze
(see section myismchk
syntax) or ANALYZE TABLE
(see section ANALYZE TABLE
) on the table will help the
optimiser choose better indexes.
key_len
The key_len
column indicates the length of the key that
MySQL decided to use. The length is NULL
if the
key
is NULL
. Note that this tells us how many parts of a
multi-part key MySQL will actually use.
ref
The ref
column shows which columns or constants are used with the
key
to select rows from the table.
rows
The rows
column indicates the number of rows MySQL
believes it must examine to execute the query.
Extra
This column contains additional information of how MySQL will resolve the query. Here is an explanation of the different text strings that can be found in this column:
Distinct
MySQL will not continue searching for more rows for the current row combination after it has found the first matching row.
Not exists
MySQL was able to do a LEFT JOIN
optimisation on the
query and will not examine more rows in this table for the previous row
combination after it finds one row that matches the LEFT JOIN
criteria.
Here is an example for this:
SELECT * FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.id IS NULL; |
Assume that t2.id
is defined with NOT NULL
. In this case
MySQL will scan t1
and look up the rows in t2
through t1.id
. If MySQL finds a matching row in
t2
, it knows that t2.id
can never be NULL
, and will
not scan through the rest of the rows in t2
that has the same
id
. In other words, for each row in t1
, MySQL
only needs to do a single lookup in t2
, independent of how many
matching rows there are in t2
.
range checked for each record (index map: #)
MySQL didn't find a real good index to use. It will, instead, for each row combination in the preceding tables, do a check on which index to use (if any), and use this index to retrieve the rows from the table. This isn't very fast but is faster than having to do a join without an index.
Using filesort
MySQL will need to do an extra pass to find out how to retrieve
the rows in sorted order. The sort is done by going through all rows
according to the join type
and storing the sort key + pointer to
the row for all rows that match the WHERE
. Then the keys are
sorted. Finally the rows are retrieved in sorted order.
Using index
The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row. This can be done when all the used columns for the table are part of the same index.
Using temporary
To resolve the query MySQL will need to create a
temporary table to hold the result. This typically happens if you do an
ORDER BY
on a different column set than you did a GROUP
BY
on.
Using where
A WHERE
clause will be used to restrict which rows will be
matched against the next table or sent to the client. If you don't have
this information and the table is of type ALL
or index
,
you may have something wrong in your query (if you don't intend to
fetch/examine all rows from the table).
If you want to get your queries as fast as possible, you should look out for
Using filesort
and Using temporary
.
You can get a good indication of how good a join is by multiplying all values
in the rows
column of the EXPLAIN
output. This should tell you
roughly how many rows MySQL must examine to execute the query. This
number is also used when you restrict queries with the max_join_size
variable.
See section Tuning Server Parameters.
The following example shows how a JOIN
can be optimised progressively
using the information provided by EXPLAIN
.
Suppose you have the SELECT
statement shown here, that you examine
using EXPLAIN
:
EXPLAIN SELECT tt.TicketNumber, tt.TimeIn, tt.ProjectReference, tt.EstimatedShipDate, tt.ActualShipDate, tt.ClientID, tt.ServiceCodes, tt.RepetitiveID, tt.CurrentProcess, tt.CurrentDPPerson, tt.RecordVolume, tt.DPPrinted, et.COUNTRY, et_1.COUNTRY, do.CUSTNAME FROM tt, et, et AS et_1, do WHERE tt.SubmitTime IS NULL AND tt.ActualPC = et.EMPLOYID AND tt.AssignedPC = et_1.EMPLOYID AND tt.ClientID = do.CUSTNMBR; |
For this example, assume that:
Table | Column | Column type |
| | |
| | |
| | |
| | |
| | |
Table | Index |
| |
| |
| |
| |
| |
tt.ActualPC
values aren't evenly distributed.
Initially, before any optimisations have been performed, the EXPLAIN
statement produces the following information:
table type possible_keys key key_len ref rows Extra et ALL PRIMARY NULL NULL NULL 74 do ALL PRIMARY NULL NULL NULL 2135 et_1 ALL PRIMARY NULL NULL NULL 74 tt ALL AssignedPC,ClientID,ActualPC NULL NULL NULL 3872 range checked for each record (key map: 35) |
Because type
is ALL
for each table, this output indicates that
MySQL is generating a Cartesian product of all the tables! This will take
quite a long time, as the product of the number of rows in each table must be
examined! For the case at hand, this is 74 * 2135 * 74 * 3872 =
45,268,558,720
rows. If the tables were bigger, you can only imagine how
long it would take.
One problem here is that MySQL can't (yet) use indexes on columns
efficiently if they are declared differently. In this context,
VARCHAR
and CHAR
are the same unless they are declared as
different lengths. Because tt.ActualPC
is declared as CHAR(10)
and et.EMPLOYID
is declared as CHAR(15)
, there is a length
mismatch.
To fix this disparity between column lengths, use ALTER TABLE
to
lengthen ActualPC
from 10 characters to 15 characters:
mysql> ALTER TABLE tt MODIFY ActualPC VARCHAR(15); |
Now tt.ActualPC
and et.EMPLOYID
are both VARCHAR(15)
.
Executing the EXPLAIN
statement again produces this result:
table type possible_keys key key_len ref rows Extra tt ALL AssignedPC,ClientID,ActualPC NULL NULL NULL 3872 Using where do ALL PRIMARY NULL NULL NULL 2135 range checked for each record (key map: 1) et_1 ALL PRIMARY NULL NULL NULL 74 range checked for each record (key map: 1) et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1 |
This is not perfect, but is much better (the product of the rows
values is now less by a factor of 74). This version is executed in a couple
of seconds.
A second alteration can be made to eliminate the column length mismatches
for the tt.AssignedPC = et_1.EMPLOYID
and tt.ClientID =
do.CUSTNMBR
comparisons:
mysql> ALTER TABLE tt MODIFY AssignedPC VARCHAR(15), -> MODIFY ClientID VARCHAR(15); |
Now EXPLAIN
produces the output shown here:
table type possible_keys key key_len ref rows Extra et ALL PRIMARY NULL NULL NULL 74 tt ref AssignedPC, ActualPC 15 et.EMPLOYID 52 Using where ClientID, ActualPC et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1 do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1 |
This is almost as good as it can get.
The remaining problem is that, by default, MySQL assumes that values
in the tt.ActualPC
column are evenly distributed, and that isn't the
case for the tt
table. Fortunately, it is easy to tell MySQL
about this:
shell> myisamchk --analyze PATH_TO_MYSQL_DATABASE/tt shell> mysqladmin refresh |
Now the join is perfect, and EXPLAIN
produces this result:
table type possible_keys key key_len ref rows Extra tt ALL AssignedPC NULL NULL NULL 3872 Using where ClientID, ActualPC et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1 et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1 do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1 |
Note that the rows
column in the output from EXPLAIN
is an
educated guess from the MySQL join optimiser. To optimise a
query, you should check if the numbers are even close to the truth. If not,
you may get better performance by using STRAIGHT_JOIN
in your
SELECT
statement and trying to list the tables in a different order in
the FROM
clause.
In most cases you can estimate the performance by counting disk seeks.
For small tables, you can usually find the row in 1 disk seek (as the
index is probably cached). For bigger tables, you can estimate that
(using B++ tree indexes) you will need: log(row_count) /
log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) +
1
seeks to find a row.
In MySQL an index block is usually 1024 bytes and the data
pointer is usually 4 bytes. A 500,000 row table with an
index length of 3 (medium integer) gives you:
log(500,000)/log(1024/3*2/(3+4)) + 1
= 4 seeks.
As the above index would require about 500,000 * 7 * 3/2 = 5.2M, (assuming that the index buffers are filled to 2/3, which is typical) you will probably have much of the index in memory and you will probably only need 1-2 calls to read data from the OS to find the row.
For writes, however, you will need 4 seek requests (as above) to find where to place the new index and normally 2 seeks to update the index and write the row.
Note that the above doesn't mean that your application will slowly degenerate by log N! As long as everything is cached by the OS or SQL server things will only go marginally slower while the table gets bigger. After the data gets too big to be cached, things will start to go much slower until your applications is only bound by disk-seeks (which increase by log N). To avoid this, increase the index cache as the data grows. See section Tuning Server Parameters.
SELECT
Queries In general, when you want to make a slow SELECT ... WHERE
faster, the
first thing to check is whether you can add an index. See section MySQL indexes. All references between different tables
should usually be done with indexes. You can use the EXPLAIN
command
to determine which indexes are used for a SELECT
.
See section EXPLAIN
.
Some general tips:
myisamchk
--analyze
on a table after it has been loaded with relevant data. This
updates a value for each index part that indicates the average number of
rows that have the same value. (For unique indexes, this is always 1,
of course.) MySQL will use this to decide which index to
choose when you connect two tables with 'a non-constant expression'.
You can check the result from the analyze
run by doing SHOW
INDEX FROM table_name
and examining the Cardinality
column.
myisamchk
--sort-index --sort-records=1
(if you want to sort on index 1). If you
have a unique index from which you want to read all records in order
according to that index, this is a good way to make that faster. Note,
however, that this sorting isn't written optimally and will take a long
time for a large table!
WHERE
Clauses The WHERE
optimisations are put in the SELECT
part here because
they are mostly used with SELECT
, but the same optimisations apply for
WHERE
in DELETE
and UPDATE
statements.
Also note that this section is incomplete. MySQL does many optimisations, and we have not had time to document them all.
Some of the optimisations performed by MySQL are listed here:
((a AND b) AND c OR (((a AND b) AND (c AND d)))) -> (a AND b AND c) OR (a AND b AND c AND d) |
(a<b AND b=c) AND a=5 -> b>5 AND b=c AND a=5 |
(B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6) -> B=5 OR B=6 |
COUNT(*)
on a single table without a WHERE
is retrieved
directly from the table information for MyISAM
and HEAP
tables.
This is also done for any NOT NULL
expression when used with only one
table.
SELECT
statements are impossible and returns no rows.
HAVING
is merged with WHERE
if you don't use GROUP BY
or group functions (COUNT()
, MIN()
...).
WHERE
is constructed to get a fast
WHERE
evaluation for each sub-join and also to skip records as
soon as possible.
WHERE
clause on a UNIQUE
index, or a PRIMARY KEY
, where all index parts are used with constant
expressions and the index parts are defined as NOT NULL
.
All the following tables are used as constant tables:
mysql> SELECT * FROM t WHERE primary_key=1; mysql> SELECT * FROM t1,t2 -> WHERE t1.primary_key=1 AND t2.primary_key=t1.id; |
ORDER BY
and in GROUP
BY
come from the same table, then this table is preferred first when
joining.
ORDER BY
clause and a different GROUP BY
clause, or if the ORDER BY
or GROUP BY
contains columns
from tables other than the first table in the join queue, a temporary
table is created.
SQL_SMALL_RESULT
, MySQL will use an in-memory
temporary table.
HAVING
clause
are skipped.
Some examples of queries that are very fast:
mysql> SELECT COUNT(*) FROM tbl_name; mysql> SELECT MIN(key_part1),MAX(key_part1) FROM tbl_name; mysql> SELECT MAX(key_part2) FROM tbl_name -> WHERE key_part_1=constant; mysql> SELECT ... FROM tbl_name -> ORDER BY key_part1,key_part2,... LIMIT 10; mysql> SELECT ... FROM tbl_name -> ORDER BY key_part1 DESC,key_part2 DESC,... LIMIT 10; |
The following queries are resolved using only the index tree (assuming the indexed columns are numeric):
mysql> SELECT key_part1,key_part2 FROM tbl_name WHERE key_part1=val; mysql> SELECT COUNT(*) FROM tbl_name -> WHERE key_part1=val1 AND key_part2=val2; mysql> SELECT key_part2 FROM tbl_name GROUP BY key_part1; |
The following queries use indexing to retrieve the rows in sorted order without a separate sorting pass:
mysql> SELECT ... FROM tbl_name -> ORDER BY key_part1,key_part2,... ; mysql> SELECT ... FROM tbl_name -> ORDER BY key_part1 DESC,key_part2 DESC,... ; |
IS NULL
MySQL can do the same optimisation on column IS NULL
as it can do
with column = constant_value
. For example, MySQL can use
indexes and ranges to search for NULL
with IS NULL
.
SELECT * FROM table_name WHERE key_col IS NULL; SELECT * FROM table_name WHERE key_col <=> NULL; SELECT * FROM table_name WHERE key_col=# OR key_col=# OR key_col IS NULL |
If you use column_name IS NULL
on a NOT NULL
in a WHERE clause
on table that is not used OUTER JOIN
that expression will be optimised
away.
MySQL 4.1.1 can additionally optimise the combination column =
expr AND column IS NULL
, an form that is common in resolved sub
queries. EXPLAIN
will show ref_or_null
when this
optimisation is used.
This optimisation can handle one IS NULL
for any key part.
Some examples of queries that are optimised (assuming key on t2 (a,b)):
SELECT * FROM t1 WHERE t1.a=expr OR t1.a IS NULL; SELECT * FROM t1,t2 WHERE t1.a=t2.a OR t2.a IS NULL; SELECT * FROM t1,t2 WHERE (t1.a=t2.a OR t2.a IS NULL) AND t2.b=t1.b; SELECT * FROM t1,t2 WHERE t1.a=t2.a AND (t2.b=t1.b OR t2.b IS NULL); SELECT * FROM t1,t2 WHERE (t1.a=t2.a AND t2.a IS NULL AND ...) OR (t1.a=t2.a AND t2.a IS NULL AND ...); |
ref_or_null
works by first doing a read on the reference key
and after that a separate search after rows with NULL key.
Note that the optimisation can only handle one IS NULL
level.
SELECT * FROM t1,t2 where (t1.a=t2.a AND t2.a IS NULL) OR (t1.b=t2.b AND t2.b IS NULL); |
Int the above case MySQL will only use key lookups on the part
(t1.a=t2.a AND t2.a IS NULL)
and not be able to use the key part on
b
.
DISTINCT
DISTINCT
is converted to a GROUP BY
on all columns,
DISTINCT
combined with ORDER BY
will in many cases also
need a temporary table.
Note that as DISTINCT
may use GROUP BY
, you should be aware of
how MySQL works with in fields in ORDER BY
or HAVING
that
are not part of the selected fields. See section GROUP BY
with Hidden Fields.
When combining LIMIT row_count
with DISTINCT
, MySQL will stop
as soon as it finds row_count
unique rows.
If you don't use columns from all used tables, MySQL will stop the scanning of the not used tables as soon as it has found the first match.
SELECT DISTINCT t1.a FROM t1,t2 where t1.a=t2.a; |
In the case, assuming t1
is used before t2
(check with
EXPLAIN
), then MySQL will stop reading from t2
(for that
particular row in t1
) when the first row in t2
is found.
LEFT JOIN
and RIGHT JOIN
A LEFT JOIN B join_condition
in MySQL is implemented as follows:
B
is set to be dependent on table A
and all tables
that A
is dependent on.
A
is set to be dependent on all tables (except B
)
that are used in the LEFT JOIN
condition.
LEFT JOIN
condition is used to decide how we should retrieve rows
from table B. (In other words, any condition in the WHERE
clause
is not used).
WHERE
optimisations are done.
A
that matches the WHERE
clause, but there
wasn't any row in B
that matched the ON
condition,
then an extra B
row is generated with all columns set to NULL
.
LEFT JOIN
to find rows that don't exist in some
table and you have the following test: column_name IS NULL
in the
WHERE
part, where column_name is a column that is declared as
NOT NULL
, then MySQL will stop searching after more rows
(for a particular key combination) after it has found one row that
matches the LEFT JOIN
condition.
RIGHT JOIN
is implemented analogously as LEFT JOIN
.
The table read order forced by LEFT JOIN
and STRAIGHT JOIN
will help the join optimiser (which calculates in which order tables
should be joined) to do its work much more quickly, as there are fewer
table permutations to check.
Note that the above means that if you do a query of type:
SELECT * FROM a,b LEFT JOIN c ON (c.key=a.key) LEFT JOIN d (d.key=a.key) WHERE b.key=d.key |
MySQL will do a full scan on b
as the LEFT JOIN
will force
it to be read before d
.
The fix in this case is to change the query to:
SELECT * FROM b,a LEFT JOIN c ON (c.key=a.key) LEFT JOIN d (d.key=a.key) WHERE b.key=d.key |
Starting from 4.0.14 MySQL does the following left join optimisation:
If the WHERE
condition is always be false for the generated
NULL
row, the LEFT JOIN
is changed to a normal join.
For example, in the following query the WHERE
clause would be
false if t2.column would be NULL
so it's safe to convert to
a normal join.
SELECT * FROM t1 LEFT t2 ON (column) WHERE t2.column2 =5; -> SELECT * FROM t1,t2 WHERE t2.column2=5 AND t1.column=t2.column; |
this can be made faster as MySQL can now use table t2
before
table t1
if this would result in a better query plan. To force a
specific table order one should use STRAIGHT JOIN
.
ORDER BY
In some cases MySQL can uses index to satisfy an ORDER BY
or
GROUP BY
request without doing any extra sorting.
The index can also be used even if the ORDER BY
doesn't match the
index exactly, as long as all the unused index parts and all the extra
are ORDER BY
columns are constants in the WHERE
clause. The following queries will use the index to resolve the
ORDER BY
/ GROUP BY
part:
SELECT * FROM t1 ORDER BY key_part1,key_part2,... SELECT * FROM t1 WHERE key_part1=constant ORDER BY key_part2 SELECT * FROM t1 WHERE key_part1=constant GROUP BY key_part2 SELECT * FROM t1 ORDER BY key_part1 DESC,key_part2 DESC SELECT * FROM t1 WHERE key_part1=1 ORDER BY key_part1 DESC,key_part2 DESC |
Some cases where MySQL can not use indexes to resolve the ORDER
BY
: (Note that MySQL will still use indexes to find the rows that
matches the WHERE
clause):
ORDER BY
on different keys:
SELECT * FROM t1 ORDER BY key1,key2
ORDER BY
using non-consecutive key parts.
SELECT * FROM t1 WHERE key2=constant ORDER BY key_part2
ASC
and DESC
.
SELECT * FROM t1 ORDER BY key_part1 DESC,key_part2 ASC
ORDER BY
:
SELECT * FROM t1 WHERE key2=constant ORDER BY key1
ORDER
BY
on are not all from the first not-const
table that is used to
retrieve rows (This is the first table in the EXPLAIN
output which
doesn't use a const
row fetch method).
ORDER BY
and GROUP BY
expressions.
HASH
index in HEAP
tables).
In the cases where MySQL have to sort the result, it uses the following algorithm:
WHERE
clause are skipped.
sort_buffer
).
MERGEBUFF
(7) regions to one block in
another temporary file. Repeat until all blocks from the first file
are in the second file.
MERGEBUFF2
(15)
blocks left.
read_rnd_buffer_size
) .
You can with EXPLAIN SELECT ... ORDER BY
check if MySQL can use
indexes to resolve the query. If you get Using filesort
in the
extra
column, then MySQL can't use indexes to resolve the
ORDER BY
. See section EXPLAIN
Syntax (Get Information About a SELECT
).
If you want to have a higher ORDER BY
speed, you should first
see if you can get MySQL to use indexes instead of having to do an extra
sorting phase. If this is not possible, then you can do:
sort_buffer_size
variable.
read_rnd_buffer_size
variable.
tmpdir
to point to a dedicated disk with lots of empty space.
If you use MySQL 4.1 or later you can spread load between
several physical disks by setting tmpdir
to a list of paths
separated by colon :
(semicolon ;
on Windows). They
will be used in round-robin fashion.
Note: These paths should end up on different physical disks,
not different partitions of the same disk.
MySQL by default sorts all GROUP BY x,y[,...]
queries as if you
would have specified ORDER BY x,y[,...]
. MySQL will optimise
away any ORDER BY
as above without any speed penalty. If you by
in some cases don't want to have the result sorted you can specify
ORDER BY NULL
:
INSERT INTO foo SELECT a,COUNT(*) FROM bar GROUP BY a ORDER BY NULL; |
LIMIT
In some cases MySQL will handle the query differently when you are
using LIMIT row_count
and not using HAVING
:
LIMIT
, MySQL
will use indexes in some cases when it normally would prefer to do a
full table scan.
LIMIT row_count
with ORDER BY
, MySQL will end the
sorting as soon as it has found the first row_count
lines instead of sorting
the whole table.
LIMIT row_count
with DISTINCT
, MySQL will stop
as soon as it finds row_count
unique rows.
GROUP BY
can be resolved by reading the key in order
(or do a sort on the key) and then calculate summaries until the
key value changes. In this case LIMIT row_count
will not calculate any
unnecessary GROUP BY
s.
#
rows to the client, it
will abort the query (if you are not using SQL_CALC_FOUND_ROWS
).
LIMIT 0
will always quickly return an empty set. This is useful
to check the query and to get the column types of the result columns.
LIMIT row_count
is used to calculate how much space is required.
INSERT
Queries The time to insert a record consists approximately of:
where the numbers are somewhat proportional to the overall time. This does not take into consideration the initial overhead to open tables (which is done once for each concurrently running query).
The size of the table slows down the insertion of indexes by log N (B-trees).
Some ways to speed up inserts:
INSERT
statements. This is much faster (many times
in some cases) than using separate INSERT
statements. If you are adding
data to non-empty table, you may tune up the bulk_insert_buffer_size
variable to make it even faster.
See section bulk_insert_buffer_size
.
INSERT DELAYED
statement. See section INSERT
.
MyISAM
tables you can insert rows at the same time
SELECT
s are running if there are no deleted rows in the tables.
LOAD DATA INFILE
. This
is usually 20 times faster than using a lot of INSERT
statements.
See section LOAD DATA
.
LOAD DATA INFILE
run even
faster when the table has many indexes. Use the following procedure:
CREATE TABLE
. For example, using
mysql
or Perl-DBI.
FLUSH TABLES
statement or the shell command mysqladmin
flush-tables
.
myisamchk --keys-used=0 -rq /path/to/db/tbl_name
. This will
remove all usage of all indexes from the table.
LOAD DATA INFILE
. This will not
update any indexes and will therefore be very fast.
myisampack
on it to make it smaller. See section Compressed Table Characteristics.
myisamchk -r -q
/path/to/db/tbl_name
. This will create the index tree in memory before
writing it to disk, which is much faster because it avoids lots of disk
seeks. The resulting index tree is also perfectly balanced.
FLUSH TABLES
statement or the shell command mysqladmin
flush-tables
.
Note that LOAD DATA INFILE
also does the above optimisation if
you insert into an empty table; the main difference with the above
procedure is that you can let myisamchk
allocate much more temporary
memory for the index creation that you may want MySQL to allocate for
every index recreation.
Since MySQL 4.0 you can also use
ALTER TABLE tbl_name DISABLE KEYS
instead of
myisamchk --keys-used=0 -rq /path/to/db/tbl_name
and
ALTER TABLE tbl_name ENABLE KEYS
instead of
myisamchk -r -q /path/to/db/tbl_name
. This way you can also skip
FLUSH TABLES
steps.
mysql> LOCK TABLES a WRITE; mysql> INSERT INTO a VALUES (1,23),(2,34),(4,33); mysql> INSERT INTO a VALUES (8,26),(6,29); mysql> UNLOCK TABLES; |
The main speed difference is that the index buffer is flushed to disk only
once, after all INSERT
statements have completed. Normally there would
be as many index buffer flushes as there are different INSERT
statements. Locking is not needed if you can insert all rows with a single
statement.
For transactional tables, you should use BEGIN/COMMIT
instead of
LOCK TABLES
to get a speedup.
Locking will also lower the total time of multi-connection tests, but the maximum wait time for some threads will go up (because they wait for locks). For example:
thread 1 does 1000 inserts thread 2, 3, and 4 does 1 insert thread 5 does 1000 inserts |
If you don't use locking, 2, 3, and 4 will finish before 1 and 5. If you use locking, 2, 3, and 4 probably will not finish before 1 or 5, but the total time should be about 40% faster.
As INSERT
, UPDATE
, and DELETE
operations are very
fast in MySQL, you will obtain better overall performance by
adding locks around everything that does more than about 5 inserts or
updates in a row. If you do very many inserts in a row, you could do a
LOCK TABLES
followed by an UNLOCK TABLES
once in a while
(about each 1000 rows) to allow other threads access to the table. This
would still result in a nice performance gain.
Of course, LOAD DATA INFILE
is much faster for loading data.
To get some more speed for both LOAD DATA INFILE
and
INSERT
, enlarge the key buffer. See section Tuning Server Parameters.
UPDATE
Queries Update queries are optimised as a SELECT
query with the additional
overhead of a write. The speed of the write is dependent on the size of
the data that is being updated and the number of indexes that are
updated. Indexes that are not changed will not be updated.
Also, another way to get fast updates is to delay updates and then do many updates in a row later. Doing many updates in a row is much quicker than doing one at a time if you lock the table.
Note that, with dynamic record format, updating a record to
a longer total length may split the record. So if you do this often,
it is very important to OPTIMIZE TABLE
sometimes.
See section OPTIMIZE TABLE
.
DELETE
Queries If you want to delete all rows in the table, you should use
TRUNCATE TABLE table_name
. See section TRUNCATE
Syntax.
The time to delete a record is exactly proportional to the number of indexes. To delete records more quickly, you can increase the size of the index cache. See section Tuning Server Parameters.
Unsorted tips for faster systems:
thread_cache_size
variable. See section Tuning Server Parameters.
EXPLAIN
command. See Explain: (manual)EXPLAIN.
SELECT
queries on MyISAM
tables that are
updated a lot. This is to avoid problems with table locking.
MyISAM
tables can insert rows in a table without deleted
rows at the same time another table is reading from it. If this is important
for you, you should consider methods where you don't have to delete rows
or run OPTIMIZE TABLE
after you have deleted a lot of rows.
ALTER TABLE ... ORDER BY expr1,expr2...
if you mostly
retrieve rows in expr1,expr2...
order. By using this option after big
changes to the table, you may be able to get higher performance.
SELECT * FROM table_name WHERE hash=MD5(CONCAT(col1,col2))
AND col_1='constant' AND col_2='constant'
VARCHAR
or BLOB
columns. You will get dynamic row length as soon as you
are using a single VARCHAR
or BLOB
column. See section MySQL Table Types.
UPDATE table SET count=count+1 WHERE index_column=constant
is very fast!
This is really important when you use MySQL table types like MyISAM and ISAM that only have table locking (multiple readers / single writers). This will also give better performance with most databases, as the row locking manager in this case will have less to do.
INSERT /*! DELAYED */
when you do not need to know when your
data is written. This speeds things up because many records can be written
with a single disk write.
INSERT /*! LOW_PRIORITY */
when you want your selects to be
more important.
SELECT /*! HIGH_PRIORITY */
to get selects that jump the
queue. That is, the select is done even if there is somebody waiting to
do a write.
INSERT
statement to store many rows with one
SQL command (many SQL servers supports this).
LOAD DATA INFILE
to load bigger amounts of data. This is
faster than normal inserts and will be even faster when myisamchk
is integrated in mysqld
.
AUTO_INCREMENT
columns to make unique values.
OPTIMIZE TABLE
once in a while to avoid fragmentation when
using a dynamic table format. See section OPTIMIZE TABLE
.
HEAP
tables to get more speed when possible. See section MySQL Table Types.
Try to keep the names simple (use name
instead of
customer_name
in the customer table). To make your names portable
to other SQL servers you should keep them shorter than 18 characters.
MyISAM
directly, you could
get a speed increase of 2-5 times compared to using the SQL interface.
To be able to do this the data must be on the same server as
the application, and usually it should only be accessed by one process
(because external file locking is really slow). One could eliminate the
above problems by introducing low-level MyISAM
commands in the
MySQL server (this could be one easy way to get more
performance if needed). By carefully designing the database interface,
it should be quite easy to support this types of optimisation.
DELAY_KEY_WRITE=1
will make the updating of
indexes faster, as these are not logged to disk until the file is closed.
The downside is that you should run myisamchk
on these tables before
you start mysqld
to ensure that they are okay if something killed
mysqld
in the middle. As the key information can always be generated
from the data, you should not lose anything by using DELAY_KEY_WRITE
.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] |
Hosting by: Hurra Communications Ltd.
Generated: 2007-01-26 17:58:45