String Conversion Convenience Functions
The IceUtil namespace contains two helper functions that allow you to convert between narrow strings and UTF-8 encoded strings using a string converter:
std::string nativeToUTF8(const std::string&, const StringConverterPtr&);
std::string UTF8ToNative(const std::string&, const StringConverterPtr&);
No conversion is performed when the provided string converter is null.
The IceUtil namespace contains two additional helper functions that allow you to convert between wide strings and narrow strings using the provided string converters:
enum ConversionFlags { strictConversion, lenientConversion };

std::string wstringToString(const std::wstring&,
                            const StringConverterPtr& = 0,
                            const WstringConverterPtr& = 0,
                            ConversionFlags = lenientConversion);

std::wstring stringToWstring(const std::string&,
                             const StringConverterPtr& = 0,
                             const WstringConverterPtr& = 0,
                             ConversionFlags = lenientConversion);
When the narrow string converter given to wstringToString is null, the encoding of the returned narrow string is UTF-8. When the wide string converter given to wstringToString is null, the encoding of the wide string parameter is UTF-16 or UTF-32, depending on the size of wchar_t.
Likewise for stringToWstring: when the wide string converter is null, the encoding of the returned wide string is UTF-16 or UTF-32, depending on the size of wchar_t. When the narrow string converter given to stringToWstring is null, the encoding of the narrow string parameter is UTF-8.
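The dependence on the size of wchar_t can be made concrete with a small check. The following is a minimal sketch; defaultWideEncoding is a hypothetical helper introduced here for illustration and is not part of IceUtil. It assumes that a 2-byte wchar_t (as on Windows) implies UTF-16 and a 4-byte wchar_t (as on most Unix-like systems) implies UTF-32.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Ice): names the wide string encoding
// that a null wide string converter implies on the current platform.
inline std::string defaultWideEncoding()
{
    // Only 2-byte and 4-byte wchar_t are covered by this sketch.
    assert(sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4);
    return sizeof(wchar_t) == 2 ? "UTF-16" : "UTF-32";
}
```

Code that exchanges wide strings between platforms should not assume either answer, because the same source compiles to a different wide encoding depending on the target.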
When using the built-in converters (through null parameters), illegal byte sequences, such as 0xF4908080, result in a UTFConversionException. For other errors, the ConversionFlags parameter determines how rigorously the functions check for errors. When set to lenientConversion (the default), the functions tolerate isolated surrogates and irregular sequences, and substitute the UTF-32 replacement character 0x0000FFFD for character values above 0x10FFFF. When set to strictConversion, the functions do not tolerate such errors and throw a UTFConversionException instead:
enum ConversionError { partialCharacter, badEncoding };

class UTFConversionException : public Exception
{
public:

    UTFConversionException(const char* file, int line, ConversionError r);

    ConversionError conversionError() const;

    // ...
};
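The difference between the two flag values can be illustrated for a single UTF-32 value. This is a rough sketch of the behavior described above, not the Ice implementation: SketchFlags and encodeUtf8 are hypothetical names, std::invalid_argument stands in for UTFConversionException, and encoding a tolerated surrogate as an ordinary three-byte sequence is an assumption about what "tolerate" means.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the ConversionFlags enumeration.
enum class SketchFlags { strict, lenient };

// Encodes one UTF-32 value as UTF-8 (illustrative sketch only).
inline std::string encodeUtf8(unsigned long cp, SketchFlags flags)
{
    const bool surrogate = cp >= 0xD800 && cp <= 0xDFFF;
    if (cp > 0x10FFFF || surrogate)
    {
        if (flags == SketchFlags::strict)
        {
            // Stand-in for throwing UTFConversionException.
            throw std::invalid_argument("bad encoding");
        }
        if (cp > 0x10FFFF)
        {
            cp = 0xFFFD; // substitute the replacement character
        }
        // Assumption: an isolated surrogate is passed through and encoded as-is.
    }

    std::string out;
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    assert(!out.empty());
    return out;
}
```

With the lenient flag, the out-of-range value 0x110000 comes back as the three bytes EF BF BD (the UTF-8 form of 0xFFFD); with the strict flag the same call throws.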
The conversionError member function returns the reason for the failure:
partialCharacter
The UTF-8 source string contains a trailing incomplete UTF-8 byte sequence.
badEncoding
The UTF-8 source string contains a byte sequence that is not a valid UTF-8 encoded character, or the Unicode source string contains a bit pattern that does not represent a valid Unicode character.
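To show how the two failure reasons differ for UTF-8 input, the following sketch classifies a byte string using strict rules. It is a stand-alone illustration, not the Ice converter: Utf8Check and checkUtf8 are hypothetical names that only mirror the enumerators above. Note that the example sequence 0xF4908080 decodes to 0x110000, one past the largest Unicode code point, which is why it is rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type mirroring the ConversionError enumerators.
enum class Utf8Check { ok, partialCharacter, badEncoding };

// Classifies a UTF-8 byte string using strict rules (illustrative sketch).
inline Utf8Check checkUtf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (lead < 0x80)                       { len = 1; cp = lead; }
        else if (lead >= 0xC2 && lead <= 0xDF) { len = 2; cp = lead & 0x1F; }
        else if (lead >= 0xE0 && lead <= 0xEF) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xF0 && lead <= 0xF4) { len = 4; cp = lead & 0x07; }
        else return Utf8Check::badEncoding; // stray continuation or invalid lead byte

        for (std::size_t k = 1; k < len; ++k)
        {
            if (i + k >= s.size())
            {
                // The string ends in the middle of a multi-byte character.
                return Utf8Check::partialCharacter;
            }
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
            {
                return Utf8Check::badEncoding; // not a continuation byte
            }
            cp = (cp << 6) | (c & 0x3F);
        }

        const bool overlong = (len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000);
        const bool surrogate = cp >= 0xD800 && cp <= 0xDFFF;
        if (overlong || surrogate || cp > 0x10FFFF)
        {
            return Utf8Check::badEncoding;
        }
        i += len;
    }
    assert(i == s.size());
    return Utf8Check::ok;
}
```

For example, the two bytes E2 82 (a euro sign missing its last byte) are classified as partialCharacter, while F4 90 80 80 is classified as badEncoding.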