3.2.5 Multi-byte String types

For multi-byte string types, the basic character has a size of at least 2. This means it can be used to store a unicode character in UTF16 or UCS2 encoding.


Unicodestrings (used to represent unicode character strings) are implemented in much the same way as ansistrings: reference counted, null-terminated arrays, only they are implemented as arrays of WideChars instead of regular Chars. A WideChar is a two-byte character (an element of a DBCS: Double Byte Character Set). Mostly the same rules apply for UnicodeStrings as for AnsiStrings. The compiler transparently converts UnicodeStrings to AnsiStrings and vice versa.

Similarly to the typecast of an Ansistring to a PChar null-terminated array of characters, a UnicodeString can be converted to a PUnicodeChar null-terminated array of characters. Note that the PUnicodeChar array is terminated by 2 null bytes instead of 1, so a typecast to a pchar is not automatic.

The compiler itself provides no support for any conversion from Unicode to ansistrings or vice versa. The system unit has a unicodestring manager record, which can be initialized with some OS-specific unicode handling routines. For more information, see the system unit reference.

A unicode string literal can be constructed in a similar manner as a widechar:

  ws2: unicodestring = ’phi omega : ’#$03A8’ ’#$03A9;


Widestrings (used to represent unicode character strings in COM applications) are implemented in much the same way as unicodestrings. Unlike the latter, they are not reference counted, and on Windows, they are allocated with a special windows function which allows them to be used for OLE automation. This means they are implemented as null-terminated arrays of WideChars instead of regular Chars. Mostly the same rules apply for WideStrings as for AnsiStrings. Similar to unicodestrings, the compiler transparenty converts WideStrings to AnsiStrings and vice versa.

For typecasting and conversion, the same rules apply as for the unicodestring type.