This chapter describes the languages MySQL supports, how sorting works in MySQL, and how to add new character sets to MySQL. You will also find information about maximum table sizes in this chapter.
mysqld can issue error messages in the following languages:
Czech, Danish, Dutch, English (the default), Estonian, French, German, Greek,
Hungarian, Italian, Japanese, Korean, Norwegian, Norwegian-ny, Polish,
Portuguese, Romanian, Russian, Slovak, Spanish, and Swedish.
To start mysqld with a particular language, use either the --language=lang or
-L lang option. For example:
shell> mysqld --language=swedish
shell> mysqld --language=/usr/local/share/swedish
Note that all language names are specified in lowercase.
The language files are located (by default) in `mysql_base_dir/share/LANGUAGE/'.
To update the error message file, you should edit the `errmsg.txt' file and execute the following command to generate the `errmsg.sys' file:
shell> comp_err errmsg.txt errmsg.sys
If you upgrade to a newer version of MySQL, remember to repeat your changes with the new `errmsg.txt' file.
By default, MySQL uses the ISO-8859-1 (Latin1) character set. This is the character set used in the USA and western Europe.
All standard MySQL binaries are compiled with
--with-extra-charsets=complex. This will add code to all
standard programs to be able to handle
latin1 and all multi-byte
character sets within the binary. Other character sets will be
loaded from a character-set definition file when needed.
The character set determines what characters are allowed in names and how
things are sorted by the
ORDER BY and
GROUP BY clauses of the SELECT statement.
You can change the character set with the
--default-character-set option when you start the server.
The character sets available depend on the character-set options given to
configure (such as --with-extra-charsets), and on the character set configuration files
listed in `SHAREDIR/charsets/Index'.
See section 4.7.1 Quick Installation Overview.
If you change the character set when running MySQL (which may also change the sort order), you must run myisamchk -r -q on all tables. Otherwise your indexes may not be ordered correctly.
When a client connects to a MySQL server, the server sends the default character set in use to the client. The client will switch to use this character set for this connection.
One should use
mysql_real_escape_string() when escaping strings
for a SQL query.
mysql_real_escape_string() is identical to the
mysql_escape_string() function, except that it takes the MYSQL
connection handle as the first parameter.
If the client is compiled with different paths than those where the server is installed, and the user who configured MySQL didn't include all character sets in the MySQL binary, one must tell the client where it can find the additional character sets it will need if the server runs with a different character set than the client.
One can specify this by putting the character-sets-dir option in a MySQL option file,
where the path given points to where the dynamic MySQL character sets are stored.
One can force the client to use a specific character set by specifying the default-character-set option in the option file,
but normally this is never needed.
To add another character set to MySQL, use the following procedure.
Decide if the set is simple or complex. If the character set does not need to use special string collating routines for sorting and does not need multi-byte character support, it is simple. If it needs either of those features, it is complex.
For example, danish is a simple character set, while
czech is a complex character set.
In the following section, we have assumed that you name your character set MYSET.
For a simple character set do the following:
The ctype array takes up the first 257 words. The
to_lower, to_upper, and sort_order arrays take up 256 words each after that.
For a complex character set do the following:
The arrays must have names like to_lower_MYSET, and so on. These correspond to the arrays in the simple character set. See section 10.1.3 The character definition arrays. For a complex character set, place a special comment like this at the top of the file:
/*
 * This comment is parsed by configure to create ctype.c,
 * so don't change it unless you know what you are doing.
 *
 * .configure. number_MYSET=MYNUMBER
 * .configure. strxfrm_multiply_MYSET=N
 * .configure. mbmaxlen_MYSET=N
 */
The configure program uses this comment to include the character set into the MySQL library automatically. The strxfrm_multiply and mbmaxlen lines will be explained in the following sections. Only include them if you need the string collating functions or the multi-byte character set functions, respectively.
The file `sql/share/charsets/README' includes some more instructions.
If you want to have the character set included in the MySQL distribution, mail a patch to firstname.lastname@example.org.
to_lower and to_upper are simple arrays that hold the
lowercase and uppercase characters corresponding to each member of the
character set. For example:
to_lower['A'] should contain 'a'
to_upper['a'] should contain 'A'
sort_order is a map indicating how characters should be ordered for
comparison and sorting purposes. For many character sets, this is the same as
to_upper (which means sorting will be case insensitive).
MySQL will sort characters based on the value of
sort_order[character]. For more complicated sorting rules, see
the discussion of string collating below. See section 10.1.4 String Collating Support.
ctype is an array of bit values, with one element for each character.
(Note that to_lower, to_upper, and sort_order
are indexed by character value, but ctype is indexed by character
value + 1. This is a legacy convention that makes it possible to handle EOF.)
You can find the following bitmask definitions in `m_ctype.h':
#define _U      01      /* Uppercase */
#define _L      02      /* Lowercase */
#define _N      04      /* Numeral (digit) */
#define _S      010     /* Spacing character */
#define _P      020     /* Punctuation */
#define _C      040     /* Control character */
#define _B      0100    /* Blank */
#define _X      0200    /* heXadecimal digit */
The ctype entry for each character should be the union of the
applicable bitmask values that describe the character. For example,
'A' is an uppercase character (_U) as well as a
hexadecimal digit (_X), so
ctype['A'+1] should contain the value:
_U + _X = 01 + 0200 = 0201
If the sorting rules for your language are too complex to be handled
with the simple
sort_order table, you need to use the string collating functions.
Right now the best documentation on this is the character sets that are already implemented. Look at the big5, czech, gbk, sjis, and tis620 character sets for examples.
You must specify the
strxfrm_multiply_MYSET=N value in the
special comment at the top of the file.
N should be set to
the maximum ratio by which strings may grow during the strxfrm transformation
(it must be a positive integer).
If you want to add support for a new character set that includes multi-byte characters, you need to use the multi-byte character functions.
Right now the best documentation on this is the character sets that are
already implemented. Look at the euc_kr, gb2312, gbk, sjis and ujis
character sets for examples. These are implemented in the
ctype-'charset'.c files in the `strings' directory.
You must specify the
mbmaxlen_MYSET=N value in the special
comment at the top of the source file.
N should be set to the
size in bytes of the largest character in the set.
MySQL Version 3.22 has a 4G limit on table size. With the new
MyISAM in MySQL Version 3.23 the maximum table size is
pushed up to 8 million terabytes (2 ^ 63 bytes).
Note, however, that operating systems have their own file size limits. Here are some examples:
| Operating System        | File Size Limit                          |
| Linux-Intel 32 bit      | 2G, 4G or more, depends on Linux version |
| Solaris 2.5.1           | 2G (possible 4G with patch)              |
| Solaris 2.7 Intel       | 4G                                       |
| Solaris 2.7 ULTRA-SPARC | 8T (?)                                   |
On Linux 2.2 you can get tables bigger than 2G by using the LFS patch for the ext2 file system. On Linux 2.4 there are also patches for ReiserFS that add support for big files.
This means that the table size for MySQL is normally limited by the operating system.
By default, MySQL tables have a maximum size of about 4G. You can
check the maximum table size for a table with the
SHOW TABLE STATUS
command or with
myisamchk -dv table_name.
See section 7.28
If you need bigger tables than 4G (and your operating system supports
this), you should set the AVG_ROW_LENGTH and MAX_ROWS
parameters when you create your table. See section 7.7
CREATE TABLE Syntax. You can
also set these later with
ALTER TABLE. See section 7.8
ALTER TABLE Syntax.
If your big table is going to be read-only, you could use
myisampack to merge and compress many tables to one.
myisampack usually compresses a table by at least 50%, so you can
have, in effect, much bigger tables. See section 15.12 The MySQL Compressed Read-only Table Generator.
You can get around the operating system file size limit for MyISAM data
files by using the
RAID option. See section 7.7
CREATE TABLE Syntax.
Another solution can be the included MERGE library, which allows you to handle a collection of identical tables as one. See section 8.2 MERGE Tables.