MySQL Reference Manual for version 3.23.58.: 5.2 Optimising SELECTs and Other Queries

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

5.2 Optimising `SELECT`s and Other Queries

First, one thing that affects all queries: The more complex permission system setup you have, the more overhead you get.

If you do not have any GRANT statements done, MySQL will optimise the permission checking somewhat. So if you have a very high volume it may be worth the time to avoid grants. Otherwise, more permission check results in a larger overhead.

If your problem is with some explicit MySQL function, you can always time this in the MySQL client:

mysql> SELECT BENCHMARK(1000000,1+1);
+------------------------+
| BENCHMARK(1000000,1+1) |
+------------------------+
|                      0 |
+------------------------+
1 row in set (0.32 sec)

The above shows that MySQL can execute 1,000,000 + expressions in 0.32 seconds on a PentiumII 400MHz.

All MySQL functions should be very optimised, but there may be some exceptions, and the BENCHMARK(loop_count,expression) is a great tool to find out if this is a problem with your query.

5.2.1 `EXPLAIN` Syntax (Get Information About a `SELECT`)

    EXPLAIN tbl_name
or  EXPLAIN SELECT select_options

EXPLAIN tbl_name is a synonym for DESCRIBE tbl_name or SHOW COLUMNS FROM tbl_name.

When you precede a SELECT statement with the keyword EXPLAIN, MySQL explains how it would process the SELECT, providing information about how tables are joined and in which order.

With the help of EXPLAIN, you can see when you must add indexes to tables to get a faster SELECT that uses indexes to find the records.

You should frequently run ANALYZE TABLE to update table statistics such as cardinality of keys which can affect the choices the optimiser makes. See section ANALYZE TABLE Syntax.

You can also see if the optimiser joins the tables in an optimal order. To force the optimiser to use a specific join order for a SELECT statement, add a STRAIGHT_JOIN clause.

For non-simple joins, EXPLAIN returns a row of information for each table used in the SELECT statement. The tables are listed in the order they would be read. MySQL resolves all joins using a single-sweep multi-join method. This means that MySQL reads a row from the first table, then finds a matching row in the second table, then in the third table and so on. When all tables are processed, it outputs the selected columns and backtracks through the table list until a table is found for which there are more matching rows. The next row is read from this table and the process continues with the next table.

In MySQL version 4.1 the EXPLAIN output was changed to work better with constructs like UNIONs, subqueries and derived tables. Most notable is the addition of two new columns: id and select_type.

Output from EXPLAIN consists of the following columns:

id

SELECT identifier, the sequential number of this SELECT within the query.

select_type

Type of SELECT clause, which can be any of the following:

SIMPLE: Simple SELECT (without UNIONs or subqueries).
PRIMARY: Outermost SELECT.
UNION: Second and further UNION SELECTs.
DEPENDENT UNION: Second and further UNION SELECTSs, dependent on outer subquery.
SUBQUERY: First SELECT in subquery.
DEPENDENT SUBQUERY: First SELECT, dependent on outer subquery.
DERIVED: Derived table SELECT.

table

The table to which the row of output refers.

type

The join type. The different join types are listed here, ordered from best to worst type:

system

The table has only one row (= system table). This is a special case of the const join type.

const

The table has at most one matching row, which will be read at the start of the query. Because there is only one row, values from the column in this row can be regarded as constants by the rest of the optimiser. const tables are very fast as they are read only once!

const is used when you compare all parts of a PRIMARY/UNIQUE key with constants:

SELECT * FROM const_table WHERE primary_key=1;

SELECT * FROM const_table WHERE primary_key_part1=1 and primary_key_part2=2;

eq_ref

One row will be read from this table for each combination of rows from the previous tables. This is the best possible join type, other than the const types. It is used when all parts of an index are used by the join and the index is UNIQUE or a PRIMARY KEY.

eq_ref can be used for indexed columns that is compared with =. The compared item may be a constant or an expression that uses columns from tables that are read before this table.

In the following examples, ref_table will be able to use eq_ref

SELECT * FROM ref_table,other_table WHERE
ref_table.key_column=other_table.column;

SELECT * FROM ref_table,other_table WHERE
ref_table.key_column_part1=other_table.column AND
ref_table.key_column_part2=1;

ref

All rows with matching index values will be read from this table for each combination of rows from the previous tables. ref is used if the join uses only a leftmost prefix of the key, or if the key is not UNIQUE or a PRIMARY KEY (in other words, if the join cannot select a single row based on the key value). If the key that is used matches only a few rows, this join type is good.

ref can be used for indexed columns that is compared with =.

In the following examples, ref_table will be able to use ref

SELECT * FROM ref_table WHERE key_column=expr;

SELECT * FROM ref_table,other_table WHERE
ref_table.key_column=other_table.column;

SELECT * FROM ref_table,other_table WHERE
ref_table.key_column_part1=other_table.column AND
ref_table.key_column_part2=1;

ref_or_null

Like ref, but with the addition that we will do an extra search for rows with NULL. See section How MySQL Optimises IS NULL.

SELECT * FROM ref_table WHERE key_column=expr OR key_column IS NULL;

This optimisation is new for MySQL 4.1.1 and is mostly used when resolving sub queries.

range

Only rows that are in a given range will be retrieved, using an index to select the rows. The key column indicates which index is used. The key_len contains the longest key part that was used. The ref column will be NULL for this type.

range can be used for when an key column is compared to a constant with =, <>, >, >=, <, <=, IS NULL, <=>, BETWEEN and IN.

SELECT * FROM range_table WHERE key_column = 10;

SELECT * FROM range_table WHERE key_column BETWEEN 10 and 20;

SELECT * FROM range_table WHERE key_column IN (10,20,30);

SELECT * FROM range_table WHERE key_part1= 10 and key_part2 IN (10,20,30);

index

This is the same as ALL, except that only the index tree is scanned. This is usually faster than ALL, as the index file is usually smaller than the datafile.

This can be used when the query only uses columns that are part of one index.

ALL

A full table scan will be done for each combination of rows from the previous tables. This is normally not good if the table is the first table not marked const, and usually very bad in all other cases. You normally can avoid ALL by adding more indexes, so that the row can be retrieved based on constant values or column values from earlier tables.

possible_keys

The possible_keys column indicates which indexes MySQL could use to find the rows in this table. Note that this column is totally independent of the order of the tables. That means that some of the keys in possible_keys may not be usable in practice with the generated table order.

If this column is empty, there are no relevant indexes. In this case, you may be able to improve the performance of your query by examining the WHERE clause to see if it refers to some column or columns that would be suitable for indexing. If so, create an appropriate index and check the query with EXPLAIN again. See section ALTER TABLE Syntax.

To see what indexes a table has, use SHOW INDEX FROM tbl_name.

key

The key column indicates the key (index) that MySQL actually decided to use. The key is NULL if no index was chosen. To force MySQL to use an key listed in the possible_keys column, use USE KEY/IGNORE KEY in your query. See section SELECT Syntax.

Also, running myisamchk --analyze (see section myismchk syntax) or ANALYZE TABLE (see section ANALYZE TABLE) on the table will help the optimiser choose better indexes.

key_len

The key_len column indicates the length of the key that MySQL decided to use. The length is NULL if the key is NULL. Note that this tells us how many parts of a multi-part key MySQL will actually use.

ref

The ref column shows which columns or constants are used with the key to select rows from the table.

rows

The rows column indicates the number of rows MySQL believes it must examine to execute the query.

Extra

This column contains additional information of how MySQL will resolve the query. Here is an explanation of the different text strings that can be found in this column:

Distinct

MySQL will not continue searching for more rows for the current row combination after it has found the first matching row.

Not exists

MySQL was able to do a LEFT JOIN optimisation on the query and will not examine more rows in this table for the previous row combination after it finds one row that matches the LEFT JOIN criteria.

Here is an example for this:

SELECT * FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.id IS NULL;

Assume that t2.id is defined with NOT NULL. In this case MySQL will scan t1 and look up the rows in t2 through t1.id. If MySQL finds a matching row in t2, it knows that t2.id can never be NULL, and will not scan through the rest of the rows in t2 that has the same id. In other words, for each row in t1, MySQL only needs to do a single lookup in t2, independent of how many matching rows there are in t2.

range checked for each record (index map: #)

MySQL didn't find a real good index to use. It will, instead, for each row combination in the preceding tables, do a check on which index to use (if any), and use this index to retrieve the rows from the table. This isn't very fast but is faster than having to do a join without an index.

Using filesort

MySQL will need to do an extra pass to find out how to retrieve the rows in sorted order. The sort is done by going through all rows according to the join type and storing the sort key + pointer to the row for all rows that match the WHERE. Then the keys are sorted. Finally the rows are retrieved in sorted order.

Using index

The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row. This can be done when all the used columns for the table are part of the same index.

Using temporary

To resolve the query MySQL will need to create a temporary table to hold the result. This typically happens if you do an ORDER BY on a different column set than you did a GROUP BY on.

Using where

A WHERE clause will be used to restrict which rows will be matched against the next table or sent to the client. If you don't have this information and the table is of type ALL or index, you may have something wrong in your query (if you don't intend to fetch/examine all rows from the table).

If you want to get your queries as fast as possible, you should look out for Using filesort and Using temporary.

You can get a good indication of how good a join is by multiplying all values in the rows column of the EXPLAIN output. This should tell you roughly how many rows MySQL must examine to execute the query. This number is also used when you restrict queries with the max_join_size variable. See section Tuning Server Parameters.

The following example shows how a JOIN can be optimised progressively using the information provided by EXPLAIN.

Suppose you have the SELECT statement shown here, that you examine using EXPLAIN:

EXPLAIN SELECT tt.TicketNumber, tt.TimeIn,
            tt.ProjectReference, tt.EstimatedShipDate,
            tt.ActualShipDate, tt.ClientID,
            tt.ServiceCodes, tt.RepetitiveID,
            tt.CurrentProcess, tt.CurrentDPPerson,
            tt.RecordVolume, tt.DPPrinted, et.COUNTRY,
            et_1.COUNTRY, do.CUSTNAME
        FROM tt, et, et AS et_1, do
        WHERE tt.SubmitTime IS NULL
            AND tt.ActualPC = et.EMPLOYID
            AND tt.AssignedPC = et_1.EMPLOYID
            AND tt.ClientID = do.CUSTNMBR;

For this example, assume that:

The columns being compared have been declared as follows:

Table	Column	Column type
`tt`	`ActualPC`	`CHAR(10)`
`tt`	`AssignedPC`	`CHAR(10)`
`tt`	`ClientID`	`CHAR(10)`
`et`	`EMPLOYID`	`CHAR(15)`
`do`	`CUSTNMBR`	`CHAR(15)`

The tables have the indexes shown here:

Table
Index

tt
ActualPC

tt
AssignedPC

tt
ClientID

et
EMPLOYID (primary key)

do
CUSTNMBR (primary key)
The tt.ActualPC values aren't evenly distributed.

Initially, before any optimisations have been performed, the EXPLAIN statement produces the following information:

table type possible_keys                key  key_len ref  rows  Extra
et    ALL  PRIMARY                      NULL NULL    NULL 74
do    ALL  PRIMARY                      NULL NULL    NULL 2135
et_1  ALL  PRIMARY                      NULL NULL    NULL 74
tt    ALL  AssignedPC,ClientID,ActualPC NULL NULL    NULL 3872
      range checked for each record (key map: 35)

Because type is ALL for each table, this output indicates that MySQL is generating a Cartesian product of all the tables! This will take quite a long time, as the product of the number of rows in each table must be examined! For the case at hand, this is 74 * 2135 * 74 * 3872 = 45,268,558,720 rows. If the tables were bigger, you can only imagine how long it would take.

One problem here is that MySQL can't (yet) use indexes on columns efficiently if they are declared differently. In this context, VARCHAR and CHAR are the same unless they are declared as different lengths. Because tt.ActualPC is declared as CHAR(10) and et.EMPLOYID is declared as CHAR(15), there is a length mismatch.

To fix this disparity between column lengths, use ALTER TABLE to lengthen ActualPC from 10 characters to 15 characters:

mysql> ALTER TABLE tt MODIFY ActualPC VARCHAR(15);

Now tt.ActualPC and et.EMPLOYID are both VARCHAR(15). Executing the EXPLAIN statement again produces this result:

table type   possible_keys   key     key_len ref         rows    Extra
tt    ALL    AssignedPC,ClientID,ActualPC NULL NULL NULL 3872    Using where
do    ALL    PRIMARY         NULL    NULL    NULL        2135
      range checked for each record (key map: 1)
et_1  ALL    PRIMARY         NULL    NULL    NULL        74
      range checked for each record (key map: 1)
et    eq_ref PRIMARY         PRIMARY 15      tt.ActualPC 1

This is not perfect, but is much better (the product of the rows values is now less by a factor of 74). This version is executed in a couple of seconds.

A second alteration can be made to eliminate the column length mismatches for the tt.AssignedPC = et_1.EMPLOYID and tt.ClientID = do.CUSTNMBR comparisons:

mysql> ALTER TABLE tt MODIFY AssignedPC VARCHAR(15),
    ->                MODIFY ClientID   VARCHAR(15);

Now EXPLAIN produces the output shown here:

table type   possible_keys   key      key_len ref           rows Extra
et    ALL    PRIMARY         NULL     NULL    NULL          74
tt    ref    AssignedPC,     ActualPC 15      et.EMPLOYID   52   Using where
             ClientID,
             ActualPC
et_1  eq_ref PRIMARY         PRIMARY  15      tt.AssignedPC 1
do    eq_ref PRIMARY         PRIMARY  15      tt.ClientID   1

This is almost as good as it can get.

The remaining problem is that, by default, MySQL assumes that values in the tt.ActualPC column are evenly distributed, and that isn't the case for the tt table. Fortunately, it is easy to tell MySQL about this:

shell> myisamchk --analyze PATH_TO_MYSQL_DATABASE/tt
shell> mysqladmin refresh

Now the join is perfect, and EXPLAIN produces this result:

table type   possible_keys key     key_len ref           rows Extra
tt    ALL    AssignedPC    NULL    NULL    NULL          3872 Using where
             ClientID,
             ActualPC
et    eq_ref PRIMARY       PRIMARY 15      tt.ActualPC   1
et_1  eq_ref PRIMARY       PRIMARY 15      tt.AssignedPC 1
do    eq_ref PRIMARY       PRIMARY 15      tt.ClientID   1

Note that the rows column in the output from EXPLAIN is an educated guess from the MySQL join optimiser. To optimise a query, you should check if the numbers are even close to the truth. If not, you may get better performance by using STRAIGHT_JOIN in your SELECT statement and trying to list the tables in a different order in the FROM clause.

5.2.2 Estimating Query Performance

In most cases you can estimate the performance by counting disk seeks. For small tables, you can usually find the row in 1 disk seek (as the index is probably cached). For bigger tables, you can estimate that (using B++ tree indexes) you will need: log(row_count) / log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + 1 seeks to find a row.

In MySQL an index block is usually 1024 bytes and the data pointer is usually 4 bytes. A 500,000 row table with an index length of 3 (medium integer) gives you: log(500,000)/log(1024/3*2/(3+4)) + 1 = 4 seeks.

As the above index would require about 500,000 * 7 * 3/2 = 5.2M, (assuming that the index buffers are filled to 2/3, which is typical) you will probably have much of the index in memory and you will probably only need 1-2 calls to read data from the OS to find the row.

For writes, however, you will need 4 seek requests (as above) to find where to place the new index and normally 2 seeks to update the index and write the row.

Note that the above doesn't mean that your application will slowly degenerate by log N! As long as everything is cached by the OS or SQL server things will only go marginally slower while the table gets bigger. After the data gets too big to be cached, things will start to go much slower until your applications is only bound by disk-seeks (which increase by log N). To avoid this, increase the index cache as the data grows. See section Tuning Server Parameters.

5.2.3 Speed of `SELECT` Queries

In general, when you want to make a slow SELECT ... WHERE faster, the first thing to check is whether you can add an index. See section MySQL indexes. All references between different tables should usually be done with indexes. You can use the EXPLAIN command to determine which indexes are used for a SELECT. See section EXPLAIN.

Some general tips:

To help MySQL optimise queries better, run myisamchk --analyze on a table after it has been loaded with relevant data. This updates a value for each index part that indicates the average number of rows that have the same value. (For unique indexes, this is always 1, of course.) MySQL will use this to decide which index to choose when you connect two tables with 'a non-constant expression'. You can check the result from the analyze run by doing SHOW INDEX FROM table_name and examining the Cardinality column.
To sort an index and data according to an index, use myisamchk --sort-index --sort-records=1 (if you want to sort on index 1). If you have a unique index from which you want to read all records in order according to that index, this is a good way to make that faster. Note, however, that this sorting isn't written optimally and will take a long time for a large table!

5.2.4 How MySQL Optimises `WHERE` Clauses

The WHERE optimisations are put in the SELECT part here because they are mostly used with SELECT, but the same optimisations apply for WHERE in DELETE and UPDATE statements.

Also note that this section is incomplete. MySQL does many optimisations, and we have not had time to document them all.

Some of the optimisations performed by MySQL are listed here:

Removal of unnecessary parentheses:

   ((a AND b) AND c OR (((a AND b) AND (c AND d))))
-> (a AND b AND c) OR (a AND b AND c AND d)

Constant folding:

   (a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5

Constant condition removal (needed because of constant folding):

   (B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)
-> B=5 OR B=6

Constant expressions used by indexes are evaluated only once.
COUNT(*) on a single table without a WHERE is retrieved directly from the table information for MyISAM and HEAP tables. This is also done for any NOT NULL expression when used with only one table.
Early detection of invalid constant expressions. MySQL quickly detects that some SELECT statements are impossible and returns no rows.
HAVING is merged with WHERE if you don't use GROUP BY or group functions (COUNT(), MIN()...).
For each sub-join, a simpler WHERE is constructed to get a fast WHERE evaluation for each sub-join and also to skip records as soon as possible.
All constant tables are read first, before any other tables in the query. A constant table is:
- - An empty table or a table with 1 row.
- - A table that is used with a WHERE clause on a UNIQUE index, or a PRIMARY KEY, where all index parts are used with constant expressions and the index parts are defined as NOT NULL.
All the following tables are used as constant tables:
mysql> SELECT * FROM t WHERE primary_key=1; mysql> SELECT * FROM t1,t2 -> WHERE t1.primary_key=1 AND t2.primary_key=t1.id;
The best join combination to join the tables is found by trying all possibilities. If all columns in ORDER BY and in GROUP BY come from the same table, then this table is preferred first when joining.
If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
If you use SQL_SMALL_RESULT, MySQL will use an in-memory temporary table.
Each table index is queried, and the best index that spans fewer than 30% of the rows is used. If no such index can be found, a quick table scan is used.
In some cases, MySQL can read rows from the index without even consulting the datafile. If all columns used from the index are numeric, then only the index tree is used to resolve the query.
Before each record is output, those that do not match the HAVING clause are skipped.

Some examples of queries that are very fast:

mysql> SELECT COUNT(*) FROM tbl_name;
mysql> SELECT MIN(key_part1),MAX(key_part1) FROM tbl_name;
mysql> SELECT MAX(key_part2) FROM tbl_name
    ->        WHERE key_part_1=constant;
mysql> SELECT ... FROM tbl_name
    ->        ORDER BY key_part1,key_part2,... LIMIT 10;
mysql> SELECT ... FROM tbl_name
    ->        ORDER BY key_part1 DESC,key_part2 DESC,... LIMIT 10;

The following queries are resolved using only the index tree (assuming the indexed columns are numeric):

mysql> SELECT key_part1,key_part2 FROM tbl_name WHERE key_part1=val;
mysql> SELECT COUNT(*) FROM tbl_name
    ->        WHERE key_part1=val1 AND key_part2=val2;
mysql> SELECT key_part2 FROM tbl_name GROUP BY key_part1;

The following queries use indexing to retrieve the rows in sorted order without a separate sorting pass:

mysql> SELECT ... FROM tbl_name
    ->            ORDER BY key_part1,key_part2,... ;
mysql> SELECT ... FROM tbl_name
    ->            ORDER BY key_part1 DESC,key_part2 DESC,... ;

5.2.5 How MySQL Optimises `IS NULL`

MySQL can do the same optimisation on column IS NULL as it can do with column = constant_value. For example, MySQL can use indexes and ranges to search for NULL with IS NULL.

SELECT * FROM table_name WHERE key_col IS NULL;

SELECT * FROM table_name WHERE key_col <=> NULL;

SELECT * FROM table_name WHERE key_col=# OR key_col=# OR key_col IS NULL

If you use column_name IS NULL on a NOT NULL in a WHERE clause on table that is not used OUTER JOIN that expression will be optimised away.

MySQL 4.1.1 can additionally optimise the combination column = expr AND column IS NULL, an form that is common in resolved sub queries. EXPLAIN will show ref_or_null when this optimisation is used.

This optimisation can handle one IS NULL for any key part.

Some examples of queries that are optimised (assuming key on t2 (a,b)):

SELECT * FROM t1 WHERE t1.a=expr OR t1.a IS NULL;

SELECT * FROM t1,t2 WHERE t1.a=t2.a OR t2.a IS NULL;

SELECT * FROM t1,t2 WHERE (t1.a=t2.a OR t2.a IS NULL) AND t2.b=t1.b;

SELECT * FROM t1,t2 WHERE t1.a=t2.a AND (t2.b=t1.b OR t2.b IS NULL);

SELECT * FROM t1,t2 WHERE (t1.a=t2.a AND t2.a IS NULL AND ...) OR (t1.a=t2.a AND t2.a IS NULL AND ...);

ref_or_null works by first doing a read on the reference key and after that a separate search after rows with NULL key.

Note that the optimisation can only handle one IS NULL level.

SELECT * FROM t1,t2 where (t1.a=t2.a AND t2.a IS NULL) OR (t1.b=t2.b AND t2.b IS NULL);

Int the above case MySQL will only use key lookups on the part (t1.a=t2.a AND t2.a IS NULL) and not be able to use the key part on b.

5.2.6 How MySQL Optimises `DISTINCT`

DISTINCT is converted to a GROUP BY on all columns, DISTINCT combined with ORDER BY will in many cases also need a temporary table.

Note that as DISTINCT may use GROUP BY, you should be aware of how MySQL works with in fields in ORDER BY or HAVING that are not part of the selected fields. See section GROUP BY with Hidden Fields.

When combining LIMIT row_count with DISTINCT, MySQL will stop as soon as it finds row_count unique rows.

If you don't use columns from all used tables, MySQL will stop the scanning of the not used tables as soon as it has found the first match.

SELECT DISTINCT t1.a FROM t1,t2 where t1.a=t2.a;

In the case, assuming t1 is used before t2 (check with EXPLAIN), then MySQL will stop reading from t2 (for that particular row in t1) when the first row in t2 is found.

5.2.7 How MySQL Optimises `LEFT JOIN` and `RIGHT JOIN`

A LEFT JOIN B join_condition in MySQL is implemented as follows:

The table B is set to be dependent on table A and all tables that A is dependent on.
The table A is set to be dependent on all tables (except B) that are used in the LEFT JOIN condition.
The LEFT JOIN condition is used to decide how we should retrieve rows from table B. (In other words, any condition in the WHERE clause is not used).
All standard join optimisations are done, with the exception that a table is always read after all tables it is dependent on. If there is a circular dependence then MySQL will issue an error.
All standard WHERE optimisations are done.
If there is a row in A that matches the WHERE clause, but there wasn't any row in B that matched the ON condition, then an extra B row is generated with all columns set to NULL.
If you use LEFT JOIN to find rows that don't exist in some table and you have the following test: column_name IS NULL in the WHERE part, where column_name is a column that is declared as NOT NULL, then MySQL will stop searching after more rows (for a particular key combination) after it has found one row that matches the LEFT JOIN condition.

RIGHT JOIN is implemented analogously as LEFT JOIN.

The table read order forced by LEFT JOIN and STRAIGHT JOIN will help the join optimiser (which calculates in which order tables should be joined) to do its work much more quickly, as there are fewer table permutations to check.

Note that the above means that if you do a query of type:

SELECT * FROM a,b LEFT JOIN c ON (c.key=a.key) LEFT JOIN d (d.key=a.key)
         WHERE b.key=d.key

MySQL will do a full scan on b as the LEFT JOIN will force it to be read before d.

The fix in this case is to change the query to:

SELECT * FROM b,a LEFT JOIN c ON (c.key=a.key) LEFT JOIN d (d.key=a.key)
         WHERE b.key=d.key

Starting from 4.0.14 MySQL does the following left join optimisation:

If the WHERE condition is always be false for the generated NULL row, the LEFT JOIN is changed to a normal join.

For example, in the following query the WHERE clause would be false if t2.column would be NULL so it's safe to convert to a normal join.

SELECT * FROM t1 LEFT t2 ON (column) WHERE t2.column2 =5;
->
SELECT * FROM t1,t2 WHERE t2.column2=5 AND t1.column=t2.column;

this can be made faster as MySQL can now use table t2 before table t1 if this would result in a better query plan. To force a specific table order one should use STRAIGHT JOIN.

5.2.8 How MySQL Optimises `ORDER BY`

In some cases MySQL can uses index to satisfy an ORDER BY or GROUP BY request without doing any extra sorting.

The index can also be used even if the ORDER BY doesn't match the index exactly, as long as all the unused index parts and all the extra are ORDER BY columns are constants in the WHERE clause. The following queries will use the index to resolve the ORDER BY / GROUP BY part:

SELECT * FROM t1 ORDER BY key_part1,key_part2,...
SELECT * FROM t1 WHERE key_part1=constant ORDER BY key_part2
SELECT * FROM t1 WHERE key_part1=constant GROUP BY key_part2
SELECT * FROM t1 ORDER BY key_part1 DESC,key_part2 DESC
SELECT * FROM t1 WHERE key_part1=1 ORDER BY key_part1 DESC,key_part2 DESC

Some cases where MySQL can not use indexes to resolve the ORDER BY: (Note that MySQL will still use indexes to find the rows that matches the WHERE clause):

You are doing an ORDER BY on different keys:
SELECT * FROM t1 ORDER BY key1,key2
You are doing an ORDER BY using non-consecutive key parts.
SELECT * FROM t1 WHERE key2=constant ORDER BY key_part2
You are mixing ASC and DESC.
SELECT * FROM t1 ORDER BY key_part1 DESC,key_part2 ASC
The key used to fetch the rows are not the same one that is used to do the ORDER BY:
SELECT * FROM t1 WHERE key2=constant ORDER BY key1
You are joining many tables and the columns you are doing an ORDER BY on are not all from the first not-const table that is used to retrieve rows (This is the first table in the EXPLAIN output which doesn't use a const row fetch method).
You have different ORDER BY and GROUP BY expressions.
The used table index is an index type that doesn't store rows in order. (Like the HASH index in HEAP tables).

In the cases where MySQL have to sort the result, it uses the following algorithm:

Read all rows according to key or by table scanning. Rows that don't match the WHERE clause are skipped.
Store the sort-key in a buffer (of size sort_buffer).
When the buffer gets full, run a qsort on it and store the result in a temporary file. Save a pointer to the sorted block. (In the case where all rows fits into the sort buffer, no temporary file is created)
Repeat the above until all rows have been read.
Do a multi-merge of up to MERGEBUFF (7) regions to one block in another temporary file. Repeat until all blocks from the first file are in the second file.
Repeat the following until there is less than MERGEBUFF2 (15) blocks left.
On the last multi-merge, only the pointer to the row (last part of the sort-key) is written to a result file.
Now the code in `sql/records.cc' will be used to read through them in sorted order by using the row pointers in the result file. To optimise this, we read in a big block of row pointers, sort these and then we read the rows in the sorted order into a row buffer (read_rnd_buffer_size) .

You can with EXPLAIN SELECT ... ORDER BY check if MySQL can use indexes to resolve the query. If you get Using filesort in the extra column, then MySQL can't use indexes to resolve the ORDER BY. See section EXPLAIN Syntax (Get Information About a SELECT).

If you want to have a higher ORDER BY speed, you should first see if you can get MySQL to use indexes instead of having to do an extra sorting phase. If this is not possible, then you can do:

Increase the size of the sort_buffer_size variable.
Increase the size of the read_rnd_buffer_size variable.
Change tmpdir to point to a dedicated disk with lots of empty space. If you use MySQL 4.1 or later you can spread load between several physical disks by setting tmpdir to a list of paths separated by colon : (semicolon ; on Windows). They will be used in round-robin fashion. Note: These paths should end up on different physical disks, not different partitions of the same disk.

MySQL by default sorts all GROUP BY x,y[,...] queries as if you would have specified ORDER BY x,y[,...]. MySQL will optimise away any ORDER BY as above without any speed penalty. If you by in some cases don't want to have the result sorted you can specify ORDER BY NULL:

INSERT INTO foo SELECT a,COUNT(*) FROM bar GROUP BY a ORDER BY NULL;

5.2.9 How MySQL Optimises `LIMIT`

In some cases MySQL will handle the query differently when you are using LIMIT row_count and not using HAVING:

If you are selecting only a few rows with LIMIT, MySQL will use indexes in some cases when it normally would prefer to do a full table scan.
If you use LIMIT row_count with ORDER BY, MySQL will end the sorting as soon as it has found the first row_count lines instead of sorting the whole table.
When combining LIMIT row_count with DISTINCT, MySQL will stop as soon as it finds row_count unique rows.
In some cases a GROUP BY can be resolved by reading the key in order (or do a sort on the key) and then calculate summaries until the key value changes. In this case LIMIT row_count will not calculate any unnecessary GROUP BYs.
As soon as MySQL has sent the first # rows to the client, it will abort the query (if you are not using SQL_CALC_FOUND_ROWS).
LIMIT 0 will always quickly return an empty set. This is useful to check the query and to get the column types of the result columns.
When the server uses temporary tables to resolve the query, the LIMIT row_count is used to calculate how much space is required.

5.2.10 Speed of `INSERT` Queries

The time to insert a record consists approximately of:

Connect: (3)
Sending query to server: (2)
Parsing query: (2)
Inserting record: (1 x size of record)
Inserting indexes: (1 x number of indexes)
Close: (1)

where the numbers are somewhat proportional to the overall time. This does not take into consideration the initial overhead to open tables (which is done once for each concurrently running query).

The size of the table slows down the insertion of indexes by log N (B-trees).

Some ways to speed up inserts:

If you are inserting many rows from the same client at the same time, use multiple value lists INSERT statements. This is much faster (many times in some cases) than using separate INSERT statements. If you are adding data to non-empty table, you may tune up the bulk_insert_buffer_size variable to make it even faster. See section bulk_insert_buffer_size.
If you are inserting a lot of rows from different clients, you can get higher speed by using the INSERT DELAYED statement. See section INSERT.
Note that with MyISAM tables you can insert rows at the same time SELECTs are running if there are no deleted rows in the tables.
When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using a lot of INSERT statements. See section LOAD DATA.
It is possible with some extra work to make LOAD DATA INFILE run even faster when the table has many indexes. Use the following procedure:
1. Optionally create the table with CREATE TABLE. For example, using mysql or Perl-DBI.
2. Execute a FLUSH TABLES statement or the shell command mysqladmin flush-tables.
3. Use myisamchk --keys-used=0 -rq /path/to/db/tbl_name. This will remove all usage of all indexes from the table.
4. Insert data into the table with LOAD DATA INFILE. This will not update any indexes and will therefore be very fast.
5. If you are going to only read the table in the future, run myisampack on it to make it smaller. See section Compressed Table Characteristics.
6. Re-create the indexes with myisamchk -r -q /path/to/db/tbl_name. This will create the index tree in memory before writing it to disk, which is much faster because it avoids lots of disk seeks. The resulting index tree is also perfectly balanced.
7. Execute a FLUSH TABLES statement or the shell command mysqladmin flush-tables.
Note that LOAD DATA INFILE also does the above optimisation if you insert into an empty table; the main difference with the above procedure is that you can let myisamchk allocate much more temporary memory for the index creation that you may want MySQL to allocate for every index recreation.

Since MySQL 4.0 you can also use ALTER TABLE tbl_name DISABLE KEYS instead of myisamchk --keys-used=0 -rq /path/to/db/tbl_name and ALTER TABLE tbl_name ENABLE KEYS instead of myisamchk -r -q /path/to/db/tbl_name. This way you can also skip FLUSH TABLES steps.
You can speed up insertions that is done over multiple statements by locking your tables:
mysql> LOCK TABLES a WRITE; mysql> INSERT INTO a VALUES (1,23),(2,34),(4,33); mysql> INSERT INTO a VALUES (8,26),(6,29); mysql> UNLOCK TABLES;
The main speed difference is that the index buffer is flushed to disk only once, after all INSERT statements have completed. Normally there would be as many index buffer flushes as there are different INSERT statements. Locking is not needed if you can insert all rows with a single statement.

For transactional tables, you should use BEGIN/COMMIT instead of LOCK TABLES to get a speedup.

Locking will also lower the total time of multi-connection tests, but the maximum wait time for some threads will go up (because they wait for locks). For example:
thread 1 does 1000 inserts thread 2, 3, and 4 does 1 insert thread 5 does 1000 inserts
If you don't use locking, 2, 3, and 4 will finish before 1 and 5. If you use locking, 2, 3, and 4 probably will not finish before 1 or 5, but the total time should be about 40% faster.

As INSERT, UPDATE, and DELETE operations are very fast in MySQL, you will obtain better overall performance by adding locks around everything that does more than about 5 inserts or updates in a row. If you do very many inserts in a row, you could do a LOCK TABLES followed by an UNLOCK TABLES once in a while (about each 1000 rows) to allow other threads access to the table. This would still result in a nice performance gain.

Of course, LOAD DATA INFILE is much faster for loading data.

To get some more speed for both LOAD DATA INFILE and INSERT, enlarge the key buffer. See section Tuning Server Parameters.

5.2.11 Speed of `UPDATE` Queries

Update queries are optimised as a SELECT query with the additional overhead of a write. The speed of the write is dependent on the size of the data that is being updated and the number of indexes that are updated. Indexes that are not changed will not be updated.

Also, another way to get fast updates is to delay updates and then do many updates in a row later. Doing many updates in a row is much quicker than doing one at a time if you lock the table.

Note that, with dynamic record format, updating a record to a longer total length may split the record. So if you do this often, it is very important to OPTIMIZE TABLE sometimes. See section OPTIMIZE TABLE.

5.2.12 Speed of `DELETE` Queries

If you want to delete all rows in the table, you should use TRUNCATE TABLE table_name. See section TRUNCATE Syntax.

The time to delete a record is exactly proportional to the number of indexes. To delete records more quickly, you can increase the size of the index cache. See section Tuning Server Parameters.

5.2.13 Other Optimisation Tips

Unsorted tips for faster systems:

Use persistent connections to the database to avoid the connection overhead. If you can't use persistent connections and you are doing a lot of new connections to the database, you may want to change the value of the thread_cache_size variable. See section Tuning Server Parameters.
Always check that all your queries really use the indexes you have created in the tables. In MySQL you can do this with the EXPLAIN command. See Explain: (manual)EXPLAIN.
Try to avoid complex SELECT queries on MyISAM tables that are updated a lot. This is to avoid problems with table locking.
The new MyISAM tables can insert rows in a table without deleted rows at the same time another table is reading from it. If this is important for you, you should consider methods where you don't have to delete rows or run OPTIMIZE TABLE after you have deleted a lot of rows.
Use ALTER TABLE ... ORDER BY expr1,expr2... if you mostly retrieve rows in expr1,expr2... order. By using this option after big changes to the table, you may be able to get higher performance.
In some cases it may make sense to introduce a column that is 'hashed' based on information from other columns. If this column is short and reasonably unique it may be much faster than a big index on many columns. In MySQL it's very easy to use this extra column: SELECT * FROM table_name WHERE hash=MD5(CONCAT(col1,col2)) AND col_1='constant' AND col_2='constant'
For tables that change a lot you should try to avoid all VARCHAR or BLOB columns. You will get dynamic row length as soon as you are using a single VARCHAR or BLOB column. See section MySQL Table Types.
It's not normally useful to split a table into different tables just because the rows gets 'big'. To access a row, the biggest performance hit is the disk seek to find the first byte of the row. After finding the data most new disks can read the whole row fast enough for most applications. The only cases where it really matters to split up a table is if it's a dynamic row size table (see above) that you can change to a fixed row size, or if you very often need to scan the table and don't need most of the columns. See section MySQL Table Types.
If you very often need to calculate things based on information from a lot of rows (like counts of things), it's probably much better to introduce a new table and update the counter in real time. An update of type UPDATE table SET count=count+1 WHERE index_column=constant is very fast!
This is really important when you use MySQL table types like MyISAM and ISAM that only have table locking (multiple readers / single writers). This will also give better performance with most databases, as the row locking manager in this case will have less to do.
If you need to collect statistics from big log tables, use summary tables instead of scanning the whole table. Maintaining the summaries should be much faster than trying to do statistics 'live'. It's much faster to regenerate new summary tables from the logs when things change (depending on business decisions) than to have to change the running application!
If possible, one should classify reports as 'live' or 'statistical', where data needed for statistical reports are only generated based on summary tables that are generated from the actual data.
Take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL need to do and improves the insert speed.
In some cases it's convenient to pack and store data into a blob. In this case you have to add some extra code in your application to pack/unpack things in the blob, but this may save a lot of accesses at some stage. This is practical when you have data that doesn't conform to a static table structure.
Normally you should try to keep all data non-redundant (what is called 3rd normal form in database theory), but you should not be afraid of duplicating things or creating summary tables if you need these to gain more speed.
Stored procedures or UDF (user-defined functions) may be a good way to get more performance. In this case you should, however, always have a way to do this some other (slower) way if you use some database that doesn't support this.
You can always gain something by caching queries/answers in your application and trying to do many inserts/updates at the same time. If your database supports lock tables (like MySQL and Oracle), this should help to ensure that the index cache is only flushed once after all updates.
Use INSERT /*! DELAYED */ when you do not need to know when your data is written. This speeds things up because many records can be written with a single disk write.
Use INSERT /*! LOW_PRIORITY */ when you want your selects to be more important.
Use SELECT /*! HIGH_PRIORITY */ to get selects that jump the queue. That is, the select is done even if there is somebody waiting to do a write.
Use the multi-line INSERT statement to store many rows with one SQL command (many SQL servers supports this).
Use LOAD DATA INFILE to load bigger amounts of data. This is faster than normal inserts and will be even faster when myisamchk is integrated in mysqld.
Use AUTO_INCREMENT columns to make unique values.
Use OPTIMIZE TABLE once in a while to avoid fragmentation when using a dynamic table format. See section OPTIMIZE TABLE.
Use HEAP tables to get more speed when possible. See section MySQL Table Types.
When using a normal web server setup, images should be stored as files. That is, store only a file reference in the database. The main reason for this is that a normal web server is much better at caching files than database contents. So it it's much easier to get a fast system if you are using files.
Use in memory tables for non-critical data that are accessed often (like information about the last shown banner for users that don't have cookies).
Columns with identical information in different tables should be declared identical and have identical names. Before Version 3.23 you got slow joins otherwise.
Try to keep the names simple (use name instead of customer_name in the customer table). To make your names portable to other SQL servers you should keep them shorter than 18 characters.
If you need really high speed, you should take a look at the low-level interfaces for data storage that the different SQL servers support! For example, by accessing the MySQL MyISAM directly, you could get a speed increase of 2-5 times compared to using the SQL interface. To be able to do this the data must be on the same server as the application, and usually it should only be accessed by one process (because external file locking is really slow). One could eliminate the above problems by introducing low-level MyISAM commands in the MySQL server (this could be one easy way to get more performance if needed). By carefully designing the database interface, it should be quite easy to support this types of optimisation.
In many cases it's faster to access data from a database (using a live connection) than accessing a text file, just because the database is likely to be more compact than the text file (if you are using numerical data), and this will involve fewer disk accesses. You will also save code because you don't have to parse your text files to find line and column boundaries.
You can also use replication to speed things up. See section Replication in MySQL.
Declaring a table with DELAY_KEY_WRITE=1 will make the updating of indexes faster, as these are not logged to disk until the file is closed. The downside is that you should run myisamchk on these tables before you start mysqld to ensure that they are okay if something killed mysqld in the middle. As the key information can always be generated from the data, you should not lose anything by using DELAY_KEY_WRITE.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

Hosting by: Hurra Communications Ltd.
Generated: 2007-01-26 17:58:45

5.2.1 `EXPLAIN` Syntax (Get Information About a `SELECT`)
5.2.2 Estimating Query Performance
5.2.3 Speed of `SELECT` Queries
5.2.4 How MySQL Optimises `WHERE` Clauses
5.2.5 How MySQL Optimises `IS NULL`
5.2.6 How MySQL Optimises `DISTINCT`
5.2.7 How MySQL Optimises `LEFT JOIN` and `RIGHT JOIN`
5.2.8 How MySQL Optimises `ORDER BY`
5.2.9 How MySQL Optimises `LIMIT`
5.2.10 Speed of `INSERT` Queries
5.2.11 Speed of `UPDATE` Queries
5.2.12 Speed of `DELETE` Queries
5.2.13 Other Optimisation Tips

Table	Index
`tt`	`ActualPC`
`tt`	`AssignedPC`
`tt`	`ClientID`
`et`	`EMPLOYID` (primary key)
`do`	`CUSTNMBR` (primary key)

5.2 Optimising SELECTs and Other Queries

5.2.1 EXPLAIN Syntax (Get Information About a SELECT)

5.2.2 Estimating Query Performance

5.2.3 Speed of SELECT Queries

5.2.4 How MySQL Optimises WHERE Clauses

5.2.5 How MySQL Optimises IS NULL

5.2.6 How MySQL Optimises DISTINCT

5.2.7 How MySQL Optimises LEFT JOIN and RIGHT JOIN

5.2.8 How MySQL Optimises ORDER BY

5.2.9 How MySQL Optimises LIMIT

5.2.10 Speed of INSERT Queries

5.2.11 Speed of UPDATE Queries

5.2.12 Speed of DELETE Queries