| ||
About GEMINI Tables |
|
GEMINI is a transaction-safe table handler for MySQL. It provides row-level locking, robust transaction support and reliable crash recovery. It is targeted for databases that need to handle heavy multi-user updates typical of transaction processing applications while still providing excellent performance for read-intensive operations. The GEMINI table type is developed and supported by NuSphere Corporation (see http://www.nusphere.com/).
GEMINI provides full ACID transaction properties (Atomic, Consistent, Independent, and Durable) with a programming model that includes support for statement atomicity and all four standard isolation levels (Read Uncommitted, Read Committed, Repeatable Read, and Serializable) defined in the SQL standard.
GEMINI tables support row-level and table-level locking to increase concurrency in applications and allow reading of tables without locking for maximum concurrency in a heavy update environment. The transaction, locking, and recovery mechanisms are tightly integrated to eliminate unnecessary administration overhead.
In general, if GEMINI tables are selected for an application, it is recommended that all tables updated in the application be GEMINI tables to provide well-defined system behavior. If non-GEMINI tables are mixed into the application then, ACID transaction properties cannot be maintained. While there are clearly cases where mixing table types is appropriate, it should always be done with careful consideration of the impact on transaction consistency and recoverability needs of the application and underlying database.
The GEMINI table type is derived from a successful commercial database and uses the storage kernel technology tightly integrated with MySQL server. The basic GEMINI technology is in use by millions of users worldwide in production environments today. This maturity allows GEMINI tables to provide a solution for those users who require transaction-based behavior as part of their applications.
The GEMINI table handler supports a configurable data cache that allows a significant portion of any database to be maintained in memory while still allowing durable updates.
The following summarizes the major features provided by GEMINI tables.
Supports all optimization statistics used by the MySQL optimizer including table cardinality, index range estimates and multi-component selectivity to insure optimal query performance.
Maintains exact cardinality information for each table so
SELECT COUNT(*) FROM
table-name always returns an answer immediately.Supports index-only queries; when index data is sufficient to resolve a query no record data is read (for non character types).
GEMINI uses block based I/O for better performance. There is no performance penalty for using
VARCHAR
fields. The maximum record size is currently 32K.The number of rows in a single GEMINI table can be 4 quintillion (full use of 64 bits).
Individual tables can be as large as 16 petabytes.
Locking is done at a record or row level rather than at table level unless table locks are explicitly requested. When a row is inserted into a table, other rows can be updated, inserted or deleted without waiting for the inserted row to be committed.
Provides durable transactions backed by a crash recovery mechanism that returns the database to a known consistent state in the event of an unexpected failure.
Support for all isolation levels and statement atomicity defined in the SQL standard.
Reliable Master Replication; the master database can survive system failure and recover all committed transactions.
This section highlights some of the important concepts behind GEMINI and the GEMINI programming model, including:
- ACID Transactions
- Transaction COMMIT/ROLLBACK
- Statement Atomicity
- Recovery
- Isolation Levels
- Row-Level Locking
These features are described below.
ACID Transactions
ACID in the context of transactions is an acronym which stands for Atomicity, Consistency, Isolation, Durability.
Attribute Description Atomicity A transaction allows for the grouping of one or more changes to tables and rows in the database to form an atomic or indivisible operation. That is, either all of the changes occur or none of them do. If for any reason the transaction cannot be completed, everything this transaction changed can be restored to the state it was in prior to the start of the transaction via a rollback operation. Consistency Transactions always operate on a consistent view of the data and when they end always leave the data in a consistent state. Data may be said to be consistent as long as it conforms to a set of invariants, such as no two rows in the customer table have the same customer ID and all orders have an associated customer row. While a transaction executes, these invariants may be violated, but no other transaction will be allowed to see these inconsistencies, and all such inconsistencies will have been eliminated by the time the transaction ends. Isolation To a given transaction, it should appear as though it is running all by itself on the database. The effects of concurrently running transactions are invisible to this transaction, and the effects of this transaction are invisible to others until the transaction is committed. Durability Once a transaction is committed, its effects are guaranteed to persist even in the event of subsequent system failures. Until the transaction commits, not only are any changes made by that transaction not durable, but are guaranteed not to persist in the face of a system failures, as crash recovery will rollback their effects. Transaction COMMIT/ROLLBACK
As stated above, a transaction is a group of work being done to data. Unless otherwise directed, MySQL considers each statement a transaction in itself. Multiple updates can be accomplished by placing them in a single statement, however they are limited to a single table.
Applications tend to require more robust use of transaction concepts. Take, for example, a system that processes an order: A row may be inserted in an order table, additional rows may be added to an order-line table, updates may be made to inventory tables, etc. It is important that if the order completes, all the changes are made to all the tables involved; likewise if the order fails, none of the changes to the tables must occur. To facilitate this requirement, MySQL has syntax to start a transaction called
BEGIN WORK
. All statements that occur after theBEGIN WORK
statement are grouped into a single transaction. The end of this transaction occurs when aCOMMIT
orROLLBACK
statement is encountered. After theCOMMIT
orROLLBACK
the system returns back to the behavior before theBEGIN WORK
statement was encountered where every statement is a transaction.To permanently turn off the behavior where every statement is a transaction, MySQL added a variable called
AUTOCOMMIT
. TheAUTOCOMMIT
variable can have two values,1
and0
. The mode where every statement is a transaction is whenAUTOCOMMIT
is set to1
(AUTOCOMMIT=1
). WhenAUTOCOMMIT
is set to0
(AUTOCOMMIT=0
), then every statement is part of the same transaction until the transaction end by eitherCOMMIT
orROLLBACK
. Once a transaction completes, a new transaction is immediately started and the process repeats.Here is an example of the SQL statements that you may find in a typical order:
BEGIN WORK;
INSERT INTO order VALUES ...;
INSERT INTO order-lines VALUES ...;
INSERT INTO order-lines VALUES ...;
INSERT INTO order-lines VALUES ...;
UPDATE inventory WHERE ...;
COMMIT;
This example shows how to use the
BEGIN WORK
statement to start a transaction. If the variableAUTOCOMMIT
is set to0
, then a transaction would have been started already. In this case, theBEGIN WORK
commits the current transaction and starts a new one.Statement Atomicity
As mentioned above, when running with
AUTOCOMMIT
set to1
, each statement executes as a single transaction. When a statement has an error, then all changes make by the statement must be undone. Transactions support this behavior. Non-transaction safe table handlers would have a partial statement update where some of the changes from the statement would be contained in the database and other changes from the statement would not. Work would need to be done to manually recover from the error.Recovery
Transactions are the basis for database recovery. Recovery is what supports the Durability attribute of the ACID transaction.
GEMINI uses a separate file called the Recovery Log located in the
$DATADIR
directory namedgemini.rl
. This file maintains the integrity of all the GEMINI tables. GEMINI can not recover any data from non-GEMINI tables. In addition, thegemini.rl
file is used to rollback transactions in support of theROLLBACK
statement.In the event of a system failure, the next time the MySQL server is started, GEMINI will automatically go through its crash recovery process. The result of crash recovery is that all the GEMINI tables will contain the latest changes made to them, and all transactions that were open at the time of the crash will have been rolled back.
The GEMINI Recovery Log reuses space when it can. Space can be reused when information in the Recovery Log is no longer needed for crash recovery or rollback.
Isolation Levels
There are four isolation levels supported by GEMINI:
These isolation levels apply only to shared locks obtained by select statements, excluding select for update. Statements that get exclusive locks always retain those locks until the transaction commits or rolls back.
By default, GEMINI operates at the
READ COMMITTED
level. You can override the default using the following command:
SET [GLOBAL | SESSION] TRANSACTION ISOLATION LEVEL [READ UNCOMMITTED | READ COMMITTED | REPEATABLE READ | SERIALIZABLE ]
If the
SESSION
qualifier used, the specified isolation level persists for the entire session. If theGLOBAL
qualifier is used, the specified isolation level is applied to all new connections from this point forward. Note that the specified isolation level will not change the behavior for existing connections including the connection that executes theSET GLOBAL TRANSACTION ISOLATION LEVEL
statement.The statements that get exclusive locks are
INSERT
,UPDATE
,DELETE
andSELECT ... FOR UPDATE
. Select statements without theFOR UPDATE
qualifier get shared locks which allow other not "for update" select statements to read the same rows but block anyone trying to update the row from accessing it. Rows or tables with exclusive locks block all access to the row from other transactions until the transaction ends.In general terms, the higher the Isolation level the more likelihood of having concurrent locks and therefore lock conflicts. In such cases, adjust the
-O gemini_lock_table_size
accordingly.Row-Level Locking
GEMINI uses row locks, which allows high concurrency for requests on the same table.
In order to avoid lock table overflow, SQL statements that require applying locks to a large number of rows should either be run at the serializable isolation level or should be covered by a lock table statement.
Memory must be pre-allocated for the lock table. The mysqld server startup option
-0 gemini_lock_table_size
can be used to adjust the number of concurrent locks.
The following limitations are in effect for the current version of GEMINI:
DROP DATABASE
does not work with GEMINI tables; instead, drop all the tables in the database first, then drop the database.Maximum number of GEMINI tables is 1012.
Maximum number of GEMINI files a server can manage is 1012. Each table consumes one file; an additional file is consumed if the table has any indexes defined on it.
Maximum size of BLOBs is 16MB.
FULLTEXT
indexes are not supported with GEMINI tables.There is no support for multi-component
AUTO_INCREMENT
fields that provide alternating values at the component level. If you try to create such a field, GEMINI will refuse.
TEMPORARY TABLES
are not supported by GEMINI. The statementCREATE TEMPORARY TABLE ... TYPE=GEMINI
will generate the response:ERROR 1005: Can't create table '/tmp/#sqlxxxxx' (errno: 0)
.
FLUSH TABLES
has not been implemented with GEMINI tables.
| ||
Using GEMINI Tables |
|
This section explains the various startup options you can use with GEMINI tables, how to backup GEMINI tables, some performance considerations and sample configurations, and a brief discussion of when to use GEMINI tables.
Specifically, the topics covered in this section are:
The table below lists options to mysqld that can be used to change the behavior of GEMINI tables.
Option Description --default-table-type=gemini
Sets the default table handler to be GEMINI. All create table statements will create GEMINI tables unless otherwise specified with TYPE=table-type
. As noted above, there is currently a limitation withTEMPORARY
tables using GEMINI.--gemini-flush-log-at-commit
Forces the recovery log buffers to be flushed after every commit. This can have a serious performance penalty, so use with caution. --gemini-recovery=FULL | NONE | FORCE
Sets the recovery mode. Default is FULL
.NONE
is useful for performing repeatable batch operations because the updates are not recorded in the recovery log.FORCE
skips crash recovery upon startup; this corrupts the database, and should be used in emergencies only.--gemini-unbuffered-io
All database writes bypass the OS cache. This can provide a performance boost on heavily updated systems where most of the dataset being worked on is cached in memory with the gemini_buffer_cache
parameter.--O gemini_buffer_cache=size
Amount of memory to allocate for database buffers, including Index and Record information. It is recommended that this number be 10% of the total size of all GEMINI tables. Do not exceed amount of memory on the system! --O gemini_connection_limit=#
Maximum number of connections to GEMINI; default is 100
. Each connection consumes about 1K of memory.--O gemini_io_threads=#
Number of background I/O threads; default is 2
. Increase the number when using--gemini-unbuffered-io
--O gemini_lock_table_size=#
Sets the maximum number of concurrent locks; default is 4096. Using SET [ GLOBAL | SESSION ] TRANSACTION ISOLATION = ...
will determine how long a program will hold row locks.--O gemini_lock_wait_timeout=seconds
Number of seconds to wait for record locks when performing queries; default is 10 seconds. Using SET [ GLOBAL | SESSION ] TRANSACTION ISOLATION = ...
will determine how long a program will hold row locks.--skip-gemini
Do not use GEMINI. If you use --skip-gemini
, MySQL will not initialize the GEMINI table handler, saving memory; you cannot use GEMINI tables if you use--skip-gemini
.--transaction-isolation=READ-UNCOMMITTED | READ-COMMITTED | REPEATABLE-READ | SERIALIZABLE
Sets the GLOBAL transaction isolation level for all users that connect to the server; can be overridden with the SET ISOLATION LEVEL statement.
GEMINI tables can be created by either using the
CREATE TABLE
syntax or theALTER TABLE
syntax.
- The syntax for creating a GEMINI table is:
CREATE TABLE table-name (....) TYPE=GEMINI;
- The syntax to convert a table to GEMINI is:
ALTER TABLE table-name TYPE=GEMINI;
See Chapter 9, "MySQL Tutorial," in the MySQL Reference Manual for more information on how to create and use MySQL tables.
GEMINI supports both
BACKUP TABLE
andRESTORE TABLE
syntax. To learn more about how to useBACKUP
andRESTORE
, see Section 7.3, "BACKUP TABLE Syntax," and Section 7.4, "RESTORE TABLE Syntax," in the MySQL Reference Manual.To backup GEMINI tables outside of the MySQL environment, you must first shut down the MySQL server. Once the server is shut down, you can copy the files associated with GEMINI to a different location. The files that make up the GEMINI table handler are:
All files associated with a table with a
.gmd
extension below the$DATADIR
directory. Such files includetable.gmd
,table.gmi
, andtable.frm
gemini.db
in the$DATADIR
directory
gemini.rl
in the$DATADIR
directory
gemini.lg
in the$DATADIR
directoryAll the GEMINI files must be copied together. You can not copy just the
.gmi
and.gmd
files to a different$DATADIR
and have them become part of a new database. You can copy an entire$DATADIR
directory to another location and start a MySQL server using the new$DATADIR
.
To restore GEMINI tables outside of the MySQL environment, you must first shut down the MySQL server. Once the server is shut down, you can remove all GEMINI files in the target
$DATADIR
and then copy the files previously backed up into the$DATADIR
directory.As mentioned above, the files that make up the GEMINI table handler are:
All files associated with a table with a
.gmd
extention below the$DATADIR
directory. Such files includetable.gmd
,table.gmi
, andtable.frm
gemini.db
in the$DATADIR
directory
gemini.rl
in the$DATADIR
directory
gemini.lg
in the$DATADIR
directoryWhen restoring a table, all the GEMINI files must be copied together. You can not restore just the
.gmi
and.gmd
files.
As mentioned previously, GEMINI tables support row-level and table-level locking to increase concurrency in applications and to allow reading of tables without locking for maximum concurrency in heavy update environments. This feature has several implications when working with
auto_increment
tables.In MySQL, when a column is defined as an
auto_increment
column, and a row is inserted into the table with aNULL
for the column, theauto_increment
column is updated to be 1 higher than the highest value in the column.With
MyISAM
tables, theauto_increment
function is implemented by looking in the index and finding the highest value and adding 1 to it. This is possible because the entireISAM
table is locked during the update period and the increment value is therefore guaranteed to not be changing.With GEMINI tables, the
auto_increment
function is implemented by maintaining a counter in a separate location from the table data. Instead of looking at the highest value in the table index, GEMINI tables look at this separately maintained counter. This means that in a transactional model, unlike the bottleneck inherent in theMyISAM
approach, GEMINI users do not have to wait until the transaction that added the last value either commits or rollbacks before looking at the value.Two side-effects of the GEMINI implementation are:
If an insert is done where the column with the
auto_increment
is specified, and this specified value is the highest value,MyISAM
uses it as itsauto_increment
value, and every subsequent insert is based on this. By contrast, GEMINI does not use this value, but instead uses the value maintained in the separate GEMINI counter location.To set the counter to a specific value, you can use
SET insert_id=#
and insert a new row in the table. However, as a general rule, values should not be inserted into anauto_increment
column; the database manager should be maintaining this field, not the application.SET insert_id
is a recovery mechanism that should be used in case of error only.Note that if you delete the row containing the maximum value for an
auto_increment
column, the value will be reused with a GEMINI table but not with aMyISAM
table.See Section 7.7, "CREATE TABLE Syntax," in the MySQL Reference Manual for more information about creating
auto_increment
columns.
In addition to designing the best possible application, configuration of the data and the server startup parameters need to be considered. How the hardware is being used can have a dramatic affect on how fast the system will respond to queries. Disk Drives and Memory must both be considered.
Disk Drives
For best performance, you want to spread the data out over as many disks as possible. Using RAID 10 stripes work very well. If there are a lot of updates then the recovery log (
gemini.rl
) should be on a relatively quiet disk drive.To spread the data out without using RAID 10, you can do the following:
Group all the tables into three categories: Heavy Use, Moderate Use, Light Use.
Take the number of disk drives available and use a round-robin approach to the three categories grouping the tables on a disk drive. The result will be an equal distribution of Heavy/Moderate/Light tables assigned to each disk drive.
Once the tables have been converted to GEMINI by using the
ALTER TABLE <name> TYPE=GEMINI
statements, move (mv
) the.gmd
and.gmi
files to a different disk drive and link (ln -s
) them back to the original directory where the.frm
file resides.Finally, move the
gemini.rl
file to its quiet disk location and link the file back to the$DATADIR
directory.Memory
The more data that can be placed in memory the faster the access to the data. Figure out how large the GEMINI data is by adding up the
.gmd
and.gmi
file sizes. If you can, put at least 10% of the data into memory. You allocate memory for the rows and indexes by using thegemini_buffer_cache
startup parameter. For example:
mysqld -O gemini_buffer_cache=800M
would allocate 800MB of memory for the GEMINI buffer cache.
Based on the performance considerations above, we can look at some examples for how to get the best performance out of the system when using GEMINI tables.
Hardware Configuration One CPU, 128MB memory, one disk drive Allocate 80MB of memory for reading and updating GEMINI tables by starting the mysqld server with the following option:
-O gemini_buffer_cache=80M
Two CPUs, 512MB memory, four disk drives Use RAID 10 to stripe the data across all available disks, or use the method described in the performance considerations section, above. Allocate 450MB of memory for reading/updating GEMINI tables:
-O gemini_buffer_cache=450M
Because the GEMINI table handler provides crash recovery and transaction support, there is extra overhead that is not found in other non-transaction safe table handlers. Here are some general guidelines for when to employ GEMINI and when to use other non-transaction safe tables (NTST).
Access Trends Table Type Reason Read-only NTST Less overhead and faster Critical data GEMINI Crash recovery protection High concurrency GEMINI Row-level locking Heavy update GEMINI Row-level locking The table below shows how a typical application schema could be defined.
Table Contents Table Type Reason account Customer account data GEMINI Critical data, heavy update order Orders for a customer GEMINI Critical data, heavy update orderline Orderline detail for an order GEMINI Critical data, heavy update invdesc Inventory description NTST Read-only, frequent access salesrep Sales rep information NTST Infrequent update inventory Inventory information GEMINI High concurrency, critical data config System configuration NTST Read-only