String Conversion Convenience Functions

The IceUtil namespace contains two helper functions that convert between narrow strings in the native encoding and UTF-8 encoded strings using a string converter:

C++
std::string nativeToUTF8(const std::string&, const StringConverterPtr&);

std::string UTF8ToNative(const std::string&, const StringConverterPtr&);

No conversion is performed when the provided string converter is null.

The IceUtil namespace contains two additional helper functions that allow you to convert between wide strings and narrow strings using the provided string converters:

C++
enum ConversionFlags { strictConversion, lenientConversion };

std::string wstringToString(const std::wstring&, const StringConverterPtr& = 0, 
                const WstringConverterPtr& = 0, ConversionFlags = lenientConversion);
std::wstring stringToWstring(const std::string&, const StringConverterPtr& = 0, 
                const WstringConverterPtr& = 0, ConversionFlags = lenientConversion);

When the narrow string converter given to wstringToString is null, the encoding of the returned narrow string is UTF-8. When the wide string converter given to wstringToString is null, the encoding of the wide string parameter is UTF-16 or UTF-32, depending on the size of wchar_t.

Likewise for stringToWstring, when the wide string converter is null, the encoding of the returned wide string is UTF-16 or UTF-32, depending on the size of wchar_t. When the narrow string converter given to stringToWstring is null, the encoding of the narrow string parameter is UTF-8.
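The practical effect of the wchar_t size can be illustrated with a small sketch. The helper below is hypothetical and not part of IceUtil; it assumes only what the text states, namely that a 2-byte wchar_t implies UTF-16 and a larger wchar_t implies UTF-32, and reports how many wchar_t elements one Unicode code point occupies under that default encoding.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of IceUtil: number of wchar_t elements needed
// to hold one Unicode code point under the default wide encoding.
inline std::size_t wideUnitsFor(char32_t codePoint)
{
    assert(codePoint <= 0x10FFFF); // values above this are outside the Unicode range

    if(sizeof(wchar_t) == 2)
    {
        // UTF-16: code points above the Basic Multilingual Plane need a surrogate pair.
        return codePoint > 0xFFFF ? 2 : 1;
    }

    // UTF-32: every code point fits in a single element.
    return 1;
}
```

One consequence is that the length of a wide string produced with the default converter can differ between platforms for the same input whenever the text contains characters above U+FFFF.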

When using the built-in converters (through null parameters), illegal byte sequences, such as 0xF4908080, result in a UTFConversionException. For other errors, the ConversionFlags parameter determines how rigorously the functions check for errors. When set to lenientConversion (the default), the functions tolerate isolated surrogates and irregular sequences, and substitute the UTF-32 replacement character 0x0000FFFD for character values above 0x10FFFF. When set to strictConversion, the functions do not tolerate such errors and throw a UTFConversionException instead:

C++
enum ConversionError { partialCharacter, badEncoding };

class UTFConversionException : public Exception {
public:
    UTFConversionException(const char* file, int line, ConversionError r);

    ConversionError conversionError() const;
    // ...
};
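To see why a byte sequence such as 0xF4908080 is illegal, it helps to decode it by hand. The following sketch is independent of Ice; decodeFourByteSequence is a hypothetical helper that only combines the payload bits of a four-byte UTF-8 sequence and performs no range validation of its own. Applied to F4 90 80 80 it yields 0x110000, one past the largest Unicode code point 0x10FFFF.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of IceUtil: combines the payload bits of a
// four-byte UTF-8 sequence (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx).
// It deliberately does not check that the result is a valid code point.
inline std::uint32_t decodeFourByteSequence(unsigned char b0, unsigned char b1,
                                            unsigned char b2, unsigned char b3)
{
    assert((b0 & 0xF8) == 0xF0); // lead byte of a four-byte sequence
    assert((b1 & 0xC0) == 0x80); // continuation bytes
    assert((b2 & 0xC0) == 0x80);
    assert((b3 & 0xC0) == 0x80);

    return (static_cast<std::uint32_t>(b0 & 0x07) << 18) |
           (static_cast<std::uint32_t>(b1 & 0x3F) << 12) |
           (static_cast<std::uint32_t>(b2 & 0x3F) << 6) |
           static_cast<std::uint32_t>(b3 & 0x3F);
}
```

For comparison, the sequence F4 8F BF BF decodes to exactly 0x10FFFF, the last value that a converter can accept.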

The conversionError member function returns the reason for the failure:

  • partialCharacter
    The UTF-8 source string contains a trailing incomplete UTF-8 byte sequence.
  • badEncoding
    The UTF-8 source string contains a byte sequence that is not a valid UTF-8 encoded character, or the Unicode source string contains a bit pattern that does not represent a valid Unicode character. 
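The difference between the two reasons can be sketched with a simplified UTF-8 scanner. This is not Ice's implementation: the sketch namespace, the Result enumeration (including its extra none value), and the classify function are assumptions made for illustration, and the scanner checks only lead and continuation bytes, ignoring overlong forms, surrogates, and out-of-range values.

```cpp
#include <cstddef>
#include <string>

namespace sketch
{

// Mirrors the two ConversionError values, plus a hypothetical "none" for input
// that passes this simplified check.
enum Result { none, partialCharacter, badEncoding };

// Sequence length implied by a UTF-8 lead byte, or 0 if the byte cannot start a sequence.
inline std::size_t sequenceLength(unsigned char lead)
{
    if(lead < 0x80) return 1;
    if((lead & 0xE0) == 0xC0) return 2;
    if((lead & 0xF0) == 0xE0) return 3;
    if((lead & 0xF8) == 0xF0) return 4;
    return 0; // stray continuation byte or invalid lead byte
}

inline Result classify(const std::string& s)
{
    std::size_t i = 0;
    while(i < s.size())
    {
        const std::size_t len = sequenceLength(static_cast<unsigned char>(s[i]));
        if(len == 0)
        {
            return badEncoding;
        }
        for(std::size_t k = 1; k < len; ++k)
        {
            if(i + k >= s.size())
            {
                return partialCharacter; // input ends in the middle of a sequence
            }
            if((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
            {
                return badEncoding; // expected a continuation byte
            }
        }
        i += len;
    }
    return none;
}

}
```

For example, the two bytes E2 82 (a euro sign missing its final byte AC) are classified as partialCharacter, whereas E2 28 A1 is classified as badEncoding because 0x28 is not a continuation byte.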

See Also