Localization and TDS 7.0

TDS 7.0 uses 2-byte Unicode (technically, UCS-2) to transfer character data between servers and clients. Included in "character data" are query text (i.e., SQL), metadata (table names and such), and bona fide data of datatypes nchar, nvarchar, and ntext. (Background information on Unicode and how it affects FreeTDS can be found in the appendix.)

Because most Unix tools and environments do not support UCS-2, FreeTDS provides for conversion by the client to other character sets. The mechanism used is determined by the configure script, which looks for a iconv(3) function, an implementation of the iconv standard. If no iconv library is found, or if it is explicitly disabled, FreeTDS will use its built-in iconv substitute, and will be capable of converting among only ISO 8859-1, UTF-8, and UCS-2.

To learn what character set the client wants, FreeTDS prefers the applicable freetds.conf client charset property. If that is not set, it parses the LANG environment variable. In either case, the found string is passed to iconv(3) (or its built-in replacment). [1]. If neither is found, UCS-2 data are converted to ISO 8859-1.

To list all supported iconv character sets try iconv(1). GNU's does:

$ iconv --list

For other systems, consult your documentation (most likely man iconv will give you some hints).

In this example a server named mssql will return data encoded in the GREEK character set.

Example 5-2. Configuring for GREEK freetds.conf setting

[mssql]
	host = ntbox.mydomain.com
	port = 1433
	tds version = 7.0
	client charset = GREEK

If FreeTDS runs into a character it can not convert, its behavior varies according to the severity of the problem. On retrieving data from the server, FreeTDS substitutes an ASCII '?' in the character's place, and emits a warning message stating that some characters could not be converted. On sending data to the server, FreeTDS aborts the query and emits an error message. It is well to ensure that the data contained in the database is representable in the client's character set.

If you have a mix of character data that can not be contained in a single-byte character set, you may wish to use UTF-8. UTF-8 is a variable length unicode encoding that is compatible with ASCII in the range 0 to 127. With UTF-8, you are guaranteed to never have an unconvertible character.

Important

FreeTDS is not fully compatible with multi-byte character sets such as UCS-2. You must use an ASCII-extension charset (e.g., UTF-8, ISO-8859-*)[2]. Great care should be taken testing applications using these encodings. Specifically, many applications do not expect the number of characters returned to exceed the column size (in bytes).

In the following example, a server named mssql will return data encoded in the UTF-8 character set.

Example 5-3. Configuring for UTF-8 freetds.conf setting

[mssql]
	host = ntbox.mydomain.com
	port = 1433
	tds version = 7.0
	client charset = UTF-8

It is also worth clarifying that TDS 7.0 and above do not accept any specified character set during login, as 4.2 does. A TDS 7.0 login packet uses UCS-2.

Notes

[1]

The built-in replacement expects GNU iconv names: ISO-8859-1, US-ASCII, or UTF-8.

[2]

not EBCDIC or other weird charsets