It is easy to convert between ASCII and Unicode. The term ASCII is sometimes used interchangeably with MBCS, but this is not strictly true: ASCII is the base for MBCS, not the same thing. Some languages, Japanese for example, may use multiple bytes to construct a single character, and Windows has been able to handle this from very early on. Windows also treats the more familiar ASCII as MBCS (multi-byte character set) in which every character happens to be a single byte, hence the prevalence of the word ‘multibyte’ in so many of the character-set-specific operations.

MultiByteToWideChar

This function converts a multi-byte string to a wide character string.

    int MultiByteToWideChar(
        UINT   CodePage,
        DWORD  dwFlags,
        LPCSTR lpMultiByteStr,
        int    cbMultiByte,
        LPWSTR lpWideCharStr,
        int    cchWideChar );

The arguments CodePage, dwFlags, lpMultiByteStr and cbMultiByte are input values that describe the incoming data. The converted string goes into the buffer pointed to by lpWideCharStr, and cchWideChar gives the size of that buffer in wide characters. If cchWideChar is 0, the returned value is the buffer size, in wide characters, required to hold the converted string. If cchWideChar is non-zero, the returned value is the number of wide characters written into the destination buffer.

WideCharToMultiByte

This function converts a wide character string into a multi-byte string.

    int WideCharToMultiByte(
        UINT    CodePage,
        DWORD   dwFlags,
        LPCWSTR lpWideCharStr,
        int     cchWideChar,
        LPSTR   lpMultiByteStr,
        int     cbMultiByte,
        LPCSTR  lpDefaultChar,
        LPBOOL  lpUsedDefaultChar );

The arguments CodePage, dwFlags, lpWideCharStr and cchWideChar are input values that describe how the incoming wide character data is to be translated. lpDefaultChar points to the character to use when a source character has no representation in the destination code page, and *lpUsedDefaultChar is set if the default character had to be used. The converted string goes into the buffer pointed to by lpMultiByteStr, and cbMultiByte gives the size of that buffer in bytes. If cbMultiByte is 0, the returned value is the buffer size, in bytes, required to hold the converted string.
If cbMultiByte is non-zero, the returned value is the number of bytes written into the destination buffer.

USES_CONVERSION

There’s an old standby that everyone familiar with ATL knows about for converting character types. Microsoft provides a number of macros (A2W and W2A, for example) that ease the burden of converting between Unicode and MBCS. USES_CONVERSION is an initializer macro that enables the use of these macros.
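The placement of braces around USES_CONVERSION is important because they scope the helper variables that USES_CONVERSION declares. Each conversion allocates its buffer on the stack (with _alloca), so that memory is released when the enclosing function returns rather than at the closing brace. Copy the result if it must outlive the function, and avoid calling the macros in a loop. Here is an example of how to use the macros.

    LPWSTR ConvertToUnicode( LPCSTR lpsz )

Converting Between UTF-8 and Unicode

UTF-8 is a multi-byte encoding in which a character is represented by 1 to 4 bytes of data. Being a variation on MBCS, it is not surprising that the same means of going between MBCS and Unicode also works for UTF-8. All that is needed to make the conversion work properly is to specify CP_UTF8 as the code page.

MultiByteToWideChar

With CodePage set to CP_UTF8, this function converts a UTF-8 string to a wide character string.

    int MultiByteToWideChar(
        UINT   CodePage,
        DWORD  dwFlags,
        LPCSTR lpMultiByteStr,
        int    cbMultiByte,
        LPWSTR lpWideCharStr,
        int    cchWideChar );

WideCharToMultiByte

With CodePage set to CP_UTF8, this function converts a wide character string into a UTF-8 string. For CP_UTF8, lpDefaultChar and lpUsedDefaultChar must both be NULL.

    int WideCharToMultiByte(
        UINT    CodePage,
        DWORD   dwFlags,
        LPCWSTR lpWideCharStr,
        int     cchWideChar,
        LPSTR   lpMultiByteStr,
        int     cbMultiByte,
        LPCSTR  lpDefaultChar,
        LPBOOL  lpUsedDefaultChar );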