Internationalization with Qt

Internationalization of software is the process of allowing the software to be used efficiently by all people of the world. This means adapting to user and locality preferences such as language, input techniques, character encodings, and presentation conventions.

Step by Step

Writing cross-platform international software with Qt is a gentle, incremental process. Your software can become internationalized in the following stages:

Use QString for all user-visible text.
Since QString uses the Unicode encoding internally, all the languages of the world can be processed transparently using familiar text processing operations. Also, since all Qt functions that present text to the user take a QString as a parameter, no time is spent converting char* to QString.
Strings that are in "programmer space", such as QObject names and file format texts, need not use QString; the traditional char* or the QCString class will suffice.
You're unlikely to notice that you are using Unicode: QString and QChar are just like easier versions of the clumsy const char* and char from traditional C.
Use tr() for all literal text.
Where your program uses "quoted text" for text that will be presented to the user, ensure it goes through the QApplication::translate() function. Usually this simply means using tr(). For example, assuming LoginWidget is a subclass of QWidget:
```
        LoginWidget::LoginWidget()
        {
            QLabel *label = new QLabel( tr("Password:"), this );
            ...
        }
```
This is 99% of the strings you're likely to write.
If the quoted text is not in a member function of a QObject/QWidget subclass, use either the tr() function of an appropriate class, or the QApplication::translate() function directly:
```
        void some_global_function(LoginWidget* logwid)
        {
            QLabel *label = new QLabel( LoginWidget::tr("Password:"), logwid );
        }

        void same_global_function(LoginWidget* logwid)
        {
            QLabel *label = new QLabel( qApp->translate("LoginWidget", "Password:"), logwid );
        }
```
Finally, if you need to have translatable text completely outside a function, there are two macros to help: QT_TR_NOOP() and QT_TRANSLATE_NOOP(). They merely mark the text for extraction by the findtr utility described below - the macros expand to just the text (without the scope). Example usages are shown below.
```
        QString FriendlyConversation::greeting(int greet_type)
        {
            static const char* greeting_strings[] = {
                QT_TR_NOOP("Hello"),
                QT_TR_NOOP("Goodbye")
            };
            return tr(greeting_strings[greet_type]);
        }

        static const char* greeting_strings[] = {
            QT_TRANSLATE_NOOP("FriendlyConversation","Hello"),
            QT_TRANSLATE_NOOP("FriendlyConversation","Goodbye")
        };
        QString FriendlyConversation::greeting(int greet_type)
        {
            return tr(greeting_strings[greet_type]);
        }
```
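To make the split between marking and translating concrete, here is a minimal standard C++ sketch that needs no Qt. The names SKETCH_TRANSLATE_NOOP, Catalog, translateSketch and greeting are hypothetical stand-ins invented for this illustration; they are not the Qt macros or functions, and a plain std::map stands in for a loaded translation file.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for QT_TRANSLATE_NOOP(): it expands to just the text,
// so the scope argument exists only for an extraction tool to read.
#define SKETCH_TRANSLATE_NOOP(scope, text) (text)

// Hypothetical catalog: maps (scope, source text) to a translated text.
typedef std::map<std::pair<std::string, std::string>, std::string> Catalog;

// Hypothetical look-up: returns the translation for (scope, text),
// or the untranslated source text when the catalog has no entry.
std::string translateSketch(const Catalog& catalog,
                            const std::string& scope,
                            const std::string& text)
{
    Catalog::const_iterator it = catalog.find(std::make_pair(scope, text));
    return it == catalog.end() ? text : it->second;
}

// The strings are only marked here; nothing is translated at this point.
static const char* greeting_strings[] = {
    SKETCH_TRANSLATE_NOOP("FriendlyConversation", "Hello"),
    SKETCH_TRANSLATE_NOOP("FriendlyConversation", "Goodbye")
};

// The translation happens when the string is actually used.
std::string greeting(const Catalog& catalog, int greet_type)
{
    return translateSketch(catalog, "FriendlyConversation",
                           greeting_strings[greet_type]);
}
```

In this sketch a string with no catalog entry is returned unchanged, so untranslated text still appears in its original language.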
If you disable the const char* to QString automatic conversion by compiling your software with the macro QT_NO_CAST_ASCII defined, you'll be very likely to catch any strings you are missing. See QString::fromLatin1() for more details. Disabling the conversion can make programming cumbersome.

Use QString::arg() for simple arguments.

The printf() style of inserting arguments in strings is a poor choice for internationalized text, as it is sometimes necessary to change the order of arguments when translating. The QString::arg() functions offer a simple means for substituting arguments:

```
        void FileCopier::showProgress(int done, int total,
                                      const QString& current_file )
        {
            label.setText( tr("%1 of %2 files copied.\nCopying: %3")
                            .arg(done)
                            .arg(total)
                            .arg(current_file)
                         );
        }
```
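Why numbered markers help translators can be seen in a small standard C++ sketch. The helper substitute() below is hypothetical and is not how QString::arg() is implemented; it only illustrates that a translated pattern may reorder %1 and %2 while the calling code passes its arguments in the same order.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Qt: replaces each "%N" marker (N = 1..9)
// in pattern with args[N-1]. A marker without a matching argument, and any
// '%' not followed by a digit 1..9, is copied through unchanged.
std::string substitute(const std::string& pattern,
                       const std::vector<std::string>& args)
{
    std::string out;
    for (std::string::size_type i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '%' && i + 1 < pattern.size()
            && pattern[i + 1] >= '1' && pattern[i + 1] <= '9') {
            const std::string::size_type index = pattern[i + 1] - '1';
            if (index < args.size()) {
                out += args[index];
                ++i; // skip the digit that followed '%'
                continue;
            }
        }
        out += c;
    }
    return out;
}
```

With arguments "3" and "10", the pattern "%1 of %2 files copied." and a reordered pattern such as "%2 files in total, %1 copied." both receive the right values, which a printf()-style format string cannot guarantee.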

Produce translations.
Once you are using tr() sufficiently, you can start producing translations of the user-visible text in your program.
Provided with Qt are three utility programs that assist in the management of translations:
findtr
Extracts information about text to be translated. It recognizes the tr() constructs described above and produces a file in ".po" format, a simple text format that your translation team will copy and edit. For example, the base .po file might be myapp.po and translated versions of the file would then be myapp_de.po, myapp_fr.po, and myapp_ja.po for translations in German, French and Japanese respectively.
```
                findtr *.cpp *.h >myapp.po
                copy myapp.po myapp_de.po
                edit myapp_de.po
```
msg2qm
Converts translated .po files to a Qt-specific binary format (".qm" Qt message files). The Qt message files are platform and locale independent, containing translations in Unicode and various hash tables to provide fast look-up.
```
                msg2qm myapp_de.po myapp_de.qm
                msg2qm myapp_fr.po myapp_fr.qm
                msg2qm myapp_ja.po myapp_ja.qm
```
In your application, use QTranslator::load() to load translation files appropriate for the user's language.
mergetr
When the texts in your program change as it is developed, the base .po file can be regenerated using findtr, then mergetr can be used to merge the changes into the other .po files:
```
                mergetr myapp_de.po myapp.po
                mergetr myapp_fr.po myapp.po
                mergetr myapp_ja.po myapp.po
```
The translation team then edits the new .po files to translate the new or changed texts. When texts change, the old text is included in the .po file as a comment to guide the new translation (no "fuzzy" matching is done).
Note that Qt itself contains a small number of strings that will also need to be translated to the languages which you are targeting. In the near future Qt will ship with translations for some languages. If you need to translate the Qt strings now, we recommend that you put the translations in separate .po and .qm files. This will simplify the transition to the official Qt translations.
The findtr utility ships in qt/bin, while the other utilities are under the qt/src/util directory and need to be compiled.
Support encodings.
The QTextCodec class and the facilities in QTextStream make it easy to support many input and output encodings for your users' data. When the application starts, the locale of the machine will determine the 8-bit encoding used when dealing with 8-bit data - such as for font selection, text display, 8-bit text I/O, and character input.
The application may occasionally need encodings other than the default local 8-bit encoding. For example, an application in a Cyrillic KOI8-R locale (the de facto standard locale in Russia) might need to output Cyrillic in the ISO 8859-5 encoding. Code for this would be:
```
        QString string = ...; // Some Unicode text.

        QTextCodec* codec = QTextCodec::codecByName("ISO 8859-5");
        QCString encoded_string = codec->fromUnicode(string);

        ...; // Use encoded_string in 8-bit operations
```
For converting Unicode to local 8-bit encodings, a shortcut is available: the local8Bit() method of QString returns such 8-bit data. Another useful shortcut is the utf8() method, which returns text in the 8-bit UTF-8 encoding - interesting in that it perfectly preserves Unicode information while looking like plain US-ASCII if the Unicode is wholly US-ASCII.
For converting the other way, there are the QString::fromUtf8() and QString::fromLocal8Bit() convenience functions, or the general code, demonstrated by this conversion from ISO 8859-5 Cyrillic to Unicode:
```
        QCString encoded_string = ...; // Some ISO 8859-5 encoded text.

        QTextCodec* codec = QTextCodec::codecByName("ISO 8859-5");
        QString string = codec->toUnicode(encoded_string);

        ...; // Use string in all of Qt's QString operations.
```
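The UTF-8 property mentioned above (US-ASCII text stays byte-for-byte US-ASCII, while other characters become multi-byte sequences) can be checked with a short standard C++ sketch. The helper appendUtf8() is hypothetical and is not a Qt function; in a Qt program the utf8() method does this work.

```cpp
#include <string>

// Hypothetical helper, not a Qt function: appends the UTF-8 encoding of one
// Unicode code point to out. For brevity it does not reject surrogate values
// or code points above U+10FFFF.
void appendUtf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        // US-ASCII range: one byte, identical to the ASCII code.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

Encoding the code points for "Qt" yields exactly the two ASCII bytes, whereas the Cyrillic letter U+0416 yields the two bytes 0xD0 0x96.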
Ideally Unicode I/O should be used as this maximizes the portability of documents between users around the world, but in reality it is useful to support all the appropriate encodings that your users will need to process existing documents. In general, Unicode (UTF-16 or UTF-8) is the best for information transferred between arbitrary people, while within a language or national group, a local standard is often more appropriate. The most important encoding to support is the one returned by QTextCodec::codecForLocale(), as this is the one the user is most likely to need for communicating with other people and applications (this is the codec used by local8Bit()).
Since most Unix systems do not have built-in support for converting between local 8-bit encodings and Unicode, it may be necessary to write your own QTextCodec subclass. Depending on the urgency, it may be useful to contact Troll Tech technical support or ask on the qt-interest mailing list to see if someone else is already working on supporting the encoding. A useful interim measure can be to use the QTextCodec::loadCharmapFile() function to build a data-driven codec; this has a memory and speed penalty, especially with dynamically loaded libraries. For details of writing your own QTextCodec, see the main QTextCodec class documentation.
Localization.
Localization is the process of adapting to local conventions such as date and time presentations. Such localizations can be accomplished using appropriate tr() strings, even "magic" words, as this somewhat contrived example shows:
```
        void Clock::setTime(const QTime& t)
        {
            if ( tr("AMPM") == "AMPM" ) {
                // 12-hour clock
            } else {
                // 24-hour clock
            }
        }
```
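The two branches in the example above are left empty. As a rough standard C++ sketch of what they might do, the hypothetical helper formatClock() below formats a time either way; in a Qt program the flag would come from the tr("AMPM") comparison, and the "AM" and "PM" texts would themselves go through tr().

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not part of Qt: formats hour (0..23) and minute (0..59)
// for a 12-hour or 24-hour clock, depending on the twelveHour flag.
std::string formatClock(int hour, int minute, bool twelveHour)
{
    char buf[32];
    if (twelveHour) {
        // Hours 0 and 12 are both shown as 12 on a 12-hour clock.
        const int h = (hour % 12 == 0) ? 12 : hour % 12;
        std::snprintf(buf, sizeof buf, "%d:%02d %s",
                      h, minute, hour < 12 ? "AM" : "PM");
    } else {
        std::snprintf(buf, sizeof buf, "%02d:%02d", hour, minute);
    }
    return buf;
}
```

A translator who wants a 24-hour clock then only needs to translate the magic word "AMPM" to anything else; no code change is required.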
In general, it is recommended that you do not attempt to localize images - choose clear icons that are appropriate for all localities, rather than relying on local puns or stretched metaphors.

System Support

Operating systems and window systems supporting Unicode are still in the early stages of development. The level of support available in the underlying system influences the support Qt provides on that platform, but applications written with Qt need not generally be too concerned with the actual limitations.

Unix/X11

Locale-oriented fonts and input methods. Qt hides these and provides Unicode input and output.
Filesystem conventions such as UTF-8 are under development in some Unix variants. All Qt file functions allow Unicode, but convert all filenames to the local 8-bit encoding, as this is the Unix convention (see QFile::setEncodingFunction() if you are interested in exploring alternative encodings).
File I/O defaults to the local 8-bit encoding, with Unicode options in QTextStream.

Windows 95/98/NT

Qt provides full Unicode support, including input methods, fonts, clipboard, drag-and-drop, and file names.
File I/O defaults to Latin-1, with Unicode options in QTextStream. Note that some Windows programs do not understand big-endian Unicode text files even though that is the order prescribed by the Unicode Standard in the absence of higher-level protocols.
Note that unlike programs written with MFC or plain winlib, Qt programs are portable between Windows 95/98 and Windows NT - you do not need different binaries to support Unicode.
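
The byte-order remark above concerns how 16-bit Unicode values are laid out in a file. The standard C++ sketch below shows both layouts; the helper toUtf16Bytes() is hypothetical and not a Qt function, and writing a leading byte-order mark (U+FEFF) so that readers can detect the order is an assumption of this sketch, not a statement about what QTextStream does.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Qt: serializes 16-bit code units to bytes,
// preceded by the byte-order mark U+FEFF, in big-endian or little-endian order.
std::string toUtf16Bytes(const std::vector<unsigned short>& units,
                         bool bigEndian)
{
    std::vector<unsigned short> all;
    all.push_back(0xFEFF); // byte-order mark goes first
    all.insert(all.end(), units.begin(), units.end());

    std::string out;
    for (std::vector<unsigned short>::size_type i = 0; i < all.size(); ++i) {
        const char hi = static_cast<char>((all[i] >> 8) & 0xFF);
        const char lo = static_cast<char>(all[i] & 0xFF);
        if (bigEndian) {
            out += hi; // most significant byte first
            out += lo;
        } else {
            out += lo; // least significant byte first
            out += hi;
        }
    }
    return out;
}
```

For the single letter "A" (U+0041), the big-endian file starts with the bytes FE FF 00 41 and the little-endian file with FF FE 41 00, so a reader that checks the first two bytes can tell the two apart.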

Supporting more Input Methods

While Troll Tech doesn't have the resources or expertise in all the languages of the world to immediately include support in Qt, we are very keen to work with people who do have the expertise. Over the next few minor version numbers, we hope to add support for your language of choice, until everyone can use Qt and all the programs developed with Qt, regardless of their language.

Initially, languages with uni-directional single-byte encodings (European Latin1, KOI8-R, etc.) and uni-directional multi-byte encodings (East Asian EUC-JP, etc.) will be supported. Later, support for the "complex" encodings - those requiring right-to-left input or complex character composition (e.g. Arabic, Hebrew, and Thai script) - will be implemented. The current state of activity is:

All encodings on Windows: On Windows, the local encoding is always supported.
ISO standard encodings ISO 8859-1, ISO 8859-2, ISO 8859-3, ISO 8859-4, ISO 8859-5, ISO 8859-7, ISO 8859-9, and ISO 8859-15: Fully supported. The Arabic (ISO 8859-6-I) and Hebrew (ISO 8859-8-I) encodings are not supported, but are under development externally.
KOI8-R: Fully supported.
eucJP, JIS, and ShiftJIS: Fully supported. Uses eucJP with the XIM protocol on X11, and the IME on Japanese Windows NT. Serika Kurusugawa and others are assisting with this effort. kinput2 is the tested input method for X11.
eucKR: Under external development. Mizi Research are assisting with this effort. hanIM is the tested input method.
Big5: Qt contains a Big5 codec developed by Ming Che-Chuang. Testing is underway with the xcin (2.5.x) XIM server.
eucTW: Under external development.

If you are interested in contributing to existing efforts, or supporting new encodings beyond the more standard ones above, your work can be considered for inclusion in the official Qt distribution, or just included with your application.

Eventually, we hope to help Unix become as Unicode-oriented as Windows NT is becoming. This means better font support in the font servers, with new developments like the TrueType font servers xfsft, xfstt, and x-tt, as well as UTF-8 (a Unicode encoding) filenames such as with the Unicode support in Solaris™ 7.

Note about Kinput2 on X11

If using the Kinput2 XIM Server, users may need to apply the bug-fix patch below. Without this patch, Kinput2 can produce invalid XIM protocol data, causing the application to lock up.

For a recent version of Kinput2 that includes everything users need, see ftp://ftp.sra.co.jp/pub/x11/kinput2/

Patch...

```
--- lib/imlib/imrequest.c~      Mon Feb  3 10:21:46 1997
+++ lib/imlib/imrequest.c       Wed Mar 10 19:23:09 1999
@@ -844,6 +844,6 @@
        IMSendBadLength(conn, icp->im->id, icp->id);
        return;
     }
-    IMDestroyIC(icp);
     IMSendRequestWithIC(conn, XIM_DESTROY_IC_REPLY, 0, icp);
+    IMDestroyIC(icp);
 }
```

Qt version 2.0.2