![]() |
|
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
There are default settings for character sets and collations at four levels: server, database, table, connection. The following description may appear complex, but it's been found in practice that multi-level defaulting leads to natural and obvious results.
The MySQL Server has a server character set and a server collation, which may not be null.
MySQL determines the server character set and server collation thus:
At this level, the decision is simple. The server character set
and collation depend on the options that you use when you start
mysqld
. You can use --default-character-set=character_set_name
for the character set, and along with it you can add
--default-collation=collation_name
for the collation. If you
don't specify a character set, that is the same as saying
--default-character-set=latin1
. If you specify only a character
set (for instance, latin1
) but not a collation, that is the same as
saying --default-charset=latin1
--collation=latin1_swedish_ci
because latin1_swedish_ci
is the default collation for latin1
.
Therefore the following three commands all have the same effect:
shell> mysqld shell> mysqld --default-character-set=latin1 shell> mysqld --default-character-set=latin1 --default-collation=latin1_swedish_ci |
One way to change the settings is by recompiling. If you want to
change the default server character set and collation when building
from sources, use: --with-character-set
and --with-collation
as arguments for configure
. For example:
shell> ./configure --with-character-set=latin1 |
or
shell> ./configure --with-character-set=latin1 --with-collation=latin1_german1_ci |
Both mysqld
and configure
check that
the character set/collation combination is valid. Each program displays
an error message and terminates if the combination is not valid.
Every database has a database character set and a database
collation, which may not be null. The CREATE DATABASE
and ALTER
DATABASE
commands now have optional clauses for specifying the
database character set and collation:
CREATE DATABASE db_name [CHARACTER SET character_set_name [COLLATE collation_name]] ALTER DATABASE db_name [CHARACTER SET character_set_name [COLLATE collation_name]] |
Example:
CREATE DATABASE db_name CHARACTER SET latin1 COLLATE latin1_swedish_ci; |
MySQL chooses the database character set and database collation thus:
CHARACTER SET X
and COLLATE Y
were specified, then
character set X
and collation Y
.
CHARACTER SET X
was specified without COLLATE
, then
character set X
and its default collation.
MySQL's CREATE DATABASE ... CHARACTER SET ...
syntax is
analogous to the standard-SQL CREATE SCHEMA ... CHARACTER SET ...
syntax. Because of this, it is possible to create databases with
different character sets and collations, on the same MySQL
server.
The database character set and collation are used as default
values if the table character set and collation are not specified
in CREATE TABLE
statements. They have no other purpose.
Every table has a table character set and a table collation, which
may not be null. The CREATE TABLE
and ALTER TABLE
statements now
have optional clauses for specifying the table character set and
collation:
CREATE TABLE table_name ( column_list ) [CHARACTER SET character_set_name [COLLATE collation_name]] ALTER TABLE table_name [CHARACTER SET character_set_name] [COLLATE collation_name] |
Example:
CREATE TABLE t1 ( ... ) CHARACTER SET latin1 COLLATE latin1_danish_ci; |
MySQL chooses the table character set and collation thus:
CHARACTER SET X
and COLLATE Y
were specified, then
character set X
and collation Y
.
CHARACTER SET X
was specified without COLLATE
, then
character set X
and its default collation.
The table character set and collation are used as default values, if the column character set and collation are not specified in individual column definitions. The table character set and collation are MySQL extensions; there are no such things in standard SQL.
Every "character" column (that is, a column of type CHAR
,
VARCHAR
, or TEXT
) has a column character set and a column
collation, which may not be null. Column definition syntax now has optional
clauses for specifying the column character set and collation:
column_name {CHAR | VARCHAR | TEXT} (column_length) [CHARACTER SET character_set_name [COLLATE collation_name]] |
Example:
CREATE TABLE Table1 ( column1 VARCHAR(5) CHARACTER SET latin1 COLLATE latin1_german1_ci ); |
MySQL chooses the column character set and collation thus:
CHARACTER SET X
and COLLATE Y
were specified, then
character set X
and collation Y
.
CHARACTER SET X
was specified without COLLATE
, then
character set X
and its default collation.
The CHARACTER SET
and COLLATE
clauses are standard SQL.
The following examples show how MySQL determines default character set and collation values.
CREATE TABLE t1 ( c1 CHAR(10) CHARACTER SET latin1 COLLATE latin1_german1_ci ) CHARACTER SET latin2 COLLATE latin2_bin; |
Here you have a column with a latin1
character set
and a latin1_german1_ci
collation. The definition is explicit, so
that's straightforward. Notice that there's no problem storing a
latin1
column in a latin2
table.
CREATE TABLE t1 ( c1 CHAR(10) CHARACTER SET latin1 ) CHARACTER SET latin1 COLLATE latin1_danish_ci; |
This time we have a column with a latin1
character
set and a default collation. Now, although it might seem natural,
the default collation is not taken from the table level. Instead,
because the default collation for latin1
is always
latin1_swedish_ci
,
column c1
will have a collation of latin1_swedish_ci
(not
latin1_danish_ci
).
CREATE TABLE t1 ( c1 CHAR(10) ) CHARACTER SET latin1 COLLATE latin1_danish_ci; |
We have a column with a default character set and
a default collation. In this circumstance, MySQL looks up to the
table level for inspiration in determining the column character set and
collation. So the character set for column c1
is
latin1
and its collation is latin1_danish_ci
.
CREATE DATABASE d1 CHARACTER SET latin2 COLLATE latin2_czech_ci; USE d1; CREATE TABLE t1 ( c1 CHAR(10) ); |
We create a column without specifying its
character set and collation. We're also not specifying a character
set and a collation at the table level. In this circumstance, MySQL
looks up to the database level for inspiration. (The database's
settings become the table's settings, and thereafter become the
column's setting.) So the character set for column c1
is latin2
and its collation is latin2_czech_ci
.
Every connection has connection character sets and connection collations, which may not be null. There are actually two connection character sets, which we will call "connection/literals" and "connection/results" when it is necessary to distinguish them.
Consider what a "connection" is: It's what you make when you connect to the server. The client sends SQL statements, such as queries, over the connection to the server. The server sends responses, such as result sets, over the connection back to the client. This leads to several questions, such as: (a) what character set is the query in when it leaves the client? (b) what character set should the server translate a query to after receiving it? (c) what character set should the server translate to before shipping result sets or error messages back to the client? You can fine-tune the setting for these things, or you can depend on the defaults (in which case, you can skip this section).
There are two statements that affect the connection character sets:
SET NAMES character_set_name SET CHARACTER SET character_set_name |
SET NAMES
indicates what is in the SQL statement that the client
sends. Thus, SET NAMES cp1251
tells the server "future
incoming messages from this client will be in character set cp1251
"
and the server is free to translate to its own character set, if
appropriate.
SET CHARACTER SET
indicates what is in the SQL statement that
the client sends, and also what is in the result set that the
server sends back to the client. Thus, SET CHARACTER SET
includes
SET NAMES
, and also specifies what character set the column values
will have if, for example, you use a SELECT
statement.
EXAMPLE: Suppose that column1
is defined as CHAR(5)
CHARACTER SET latin2
. If you do not say SET CHARACTER SET
,
then for SELECT column1 FROM t
the server will send back all
the values for column1
using character set latin2
. If on
the other hand you say SET CHARACTER SET latin1
then the server
will, just before sending back, convert the latin2
values to
latin1
. Such conversion is slow and may be lossy.
When you execute SET NAMES
or SET CHARACTER SET
, you are also
changing the "connection collation". However, the connection
collation exists for consistency only. Usually its value doesn't
matter.
With the mysql
client, it is not necessary to
execute SET NAMES
every time you start up. You can add the
--default-character-set-name
option
setting to your mysql
statement line, or in your option file.
For example, the following option file setting will change the connection
character set each time you run mysql
:
[mysql] default-character-set-name=character_set_name |
Every character string literal has a character set and a collation, which may not be null.
A character string literal may have an optional character set
introducer and COLLATE
clause:
[_character_set_name]'string' [COLLATE collation_name] |
Examples:
SELECT 'string'; SELECT _latin1'string'; SELECT _latin1'string' COLLATE latin1_danish_ci; |
The simple statement SELECT 'string'
uses the
connection/literal character set.
The _character_set_name
expression is formally called
an introducer. It tells the parser,
"the string that is about to follow is in character set X
."
Because this has confused people in the past, we emphasize
that an introducer does not cause any conversion, it is strictly a
signal that does not change the string's value. An introducer is
also legal before standard hex literal and numeric hex literal notation
(x'literal'
and 0xnnnn
), and before ?
(parameter
substitution when using prepared statements within a programming language
interface).
Examples:
SELECT _latin1 x'AABBCC'; SELECT _latin1 0xAABBCC; SELECT _latin1 ?; |
MySQL determines a literal's character set and collation thus:
_X
and COLLATE Y
were specified then the literal
character set is X
and the literal collation is Y
_X
is specified but COLLATE
is not specified, then the
literal character set is X
and the literal collation is X
's default
collation
Examples:
latin1
character set and latin1_german1_ci
collation:
SELECT _latin1'Müller' COLLATE latin1_german1_ci; |
latin1
character set and its default collation, that is,
latin1_swedish_ci
:
SELECT _latin1'Müller'; |
SELECT 'Müller'; |
Character set introducers and the COLLATE
clause are implemented
according to standard-SQL specifications.
COLLATE
Clause in Various Parts of an SQL Query With the COLLATE
clause you can override whatever the default
collation is for a comparison. COLLATE
may be used in
various parts of SQL queries. Here are some examples:
ORDER BY
:
SELECT k FROM t1 ORDER BY k COLLATE latin1_german2_ci; |
AS
:
SELECT k COLLATE latin1_german2_ci AS k1 FROM t1 ORDER BY k1; |
GROUP BY
:
SELECT k FROM t1 GROUP BY k COLLATE latin1_german2_ci; |
SELECT MAX(k COLLATE latin1_german2_ci) FROM t1; |
DISTINCT
:
SELECT DISTINCT k COLLATE latin1_german2_ci FROM t1; |
WHERE
:
SELECT * FROM t1 WHERE _latin1 'Müller' COLLATE latin1_german2_ci = k; |
HAVING
:
SELECT k FROM t1 GROUP BY k HAVING k = _latin1 'Müller' COLLATE latin1_german2_ci; |
COLLATE
Clause Precedence The COLLATE
clause has high precedence (higher than ||), so the expression
x || y COLLATE z |
is equivalent to:
x || (y COLLATE z) |
BINARY
Operator The BINARY
operator is a shorthand for a COLLATE
clause. For
example, BINARY 'x'
is equivalent to 'x' COLLATE y
, where
y
is the
name of an appropriate binary collation. For example, assuming that
column a
is of character set latin1
, these two queries have the
same effect:
SELECT * FROM t1 ORDER BY BINARY a; SELECT * FROM t1 ORDER BY a COLLATE latin1_bin; |
Note: Every character set has a binary collation.
In the great majority of queries, it is obvious what collation
MySQL uses to resolve a comparison operation. For example, in the
following cases it should be clear that the collation will be "the
column collation of column x
":
SELECT x FROM T ORDER BY x; SELECT x FROM T WHERE x = x; SELECT DISTINCT x FROM T; |
However, when multiple operands are involved, there can be ambiguity. For example:
SELECT x FROM T WHERE x = 'Y'; |
Should this query use the collation of the column x
, or of the
string literal 'Y'
?
Standard SQL resolves such questions using what used to be
called "coercibility" rules. The essence is: Because x
and 'Y'
both have collations, whose collation takes precedence? It's complex,
but these rules would take care of most situations:
COLLATE
clause has precedence 4.
Those rules resolve ambiguities thus:
Examples:
| Use collation of |
| Use collation of |
| Error |
Recall that each character set has one or more
collations, and each collation is associated with one and only one
character set. Therefore, the following statement
causes an error message because the latin2_bin
collation is not
legal with the latin1
character set:
mysql> SELECT _latin1 'x' COLLATE latin2_bin; ERROR 1251: COLLATION 'latin2_bin' is not valid for CHARACTER SET 'latin1' |
Suppose column X
in table T
has these latin1
column values:
Muffler Müller MX Systems MySQL |
And suppose that the column values are retrieved using the following statement:
SELECT X FROM T ORDER BY X COLLATE collation_name; |
The resulting order of the values for different collations is shown in this table:
| | |
Muffler | Muffler | Müller |
MX Systems | Müller | Muffler |
Müller | MX Systems | MX Systems |
MySQL | MySQL | MySQL |
The table is an example that shows what the effect would
be if we used different collations in an ORDER BY
clause. The
character that's causing the trouble in this example is the U with
two dots over it, which the Germans call U-umlaut, but we'll call
it U-diaeresis.
The first column shows the result of the SELECT
using the
Swedish/Finnish collating rule, which says that U-diaeresis sorts
with Y.
The second column shows the result of the SELECT
using the
German DIN-1 rule, which says that U-diaeresis sorts with U.
The third column shows the result of the SELECT
using the German
DIN-2 rule, which says that U-diaeresis sorts with UE.
Three different collations, three different results. That's what MySQL is here to handle. By using the appropriate collation, you can choose the sort order you want.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] |
Hosting by: Hurra Communications Ltd.
Generated: 2007-01-26 17:58:46