
Converting ASCII and Unicode 

Converting between ASCII and Unicode is straightforward. The term ASCII is often used interchangeably with MBCS (multi-byte character set), but this is not strictly accurate: ASCII is the base on which MBCS code pages are built, not the same thing. Some languages, Japanese for example, need more than one byte to represent a single character, and Windows has handled such character sets from very early on. Windows also treats the more familiar single-byte text as MBCS in which every character happens to occupy exactly one byte, which is why the word ‘multibyte’ appears in the names of so many character-set-specific operations.

MultiByteToWideChar

This function will convert a multi-byte string to a wide character string.

int MultiByteToWideChar(
  UINT CodePage,         // code page
  DWORD dwFlags,         // character-type options
  LPCSTR lpMultiByteStr, // string to map
  int cbMultiByte,       // number of bytes in string
  LPWSTR lpWideCharStr,  // wide-character buffer
  int cchWideChar        // size of buffer
);

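The arguments CodePage, dwFlags, lpMultiByteStr and cbMultiByte are input values that describe the incoming data. cbMultiByte is the length of the source string in bytes; pass -1 if the string is null-terminated, in which case the terminator is converted and counted as well. The converted string goes into the buffer pointed to by lpWideCharStr. The size of that buffer, in wide characters rather than bytes, is supplied by cchWideChar.

If cchWideChar is 0, nothing is written and the returned value is the buffer size, in wide characters, required to hold the converted string. If cchWideChar is non-zero, the returned value is the number of wide characters written into the destination buffer. In both cases a return value of 0 means the call failed, and GetLastError reports the reason.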

WideCharToMultiByte

This function will convert a wide character string into a multi-byte string.

int WideCharToMultiByte(
  UINT CodePage,            // code page
  DWORD dwFlags,            // performance and mapping flags
  LPCWSTR lpWideCharStr,    // wide-character string
  int cchWideChar,          // number of chars in string.
  LPSTR lpMultiByteStr,     // buffer for new string
  int cbMultiByte,          // size of buffer
  LPCSTR lpDefaultChar,     // default for unmappable chars
  LPBOOL lpUsedDefaultChar  // set when default char used
);

The arguments CodePage, dwFlags, lpWideCharStr and cchWideChar are input values that describe how the incoming wide character data is to be translated. lpDefaultChar points to the character to substitute when a wide character has no representation in the target code page; pass NULL to use the system default. If lpUsedDefaultChar is not NULL, *lpUsedDefaultChar is set to TRUE when the default character had to be used. Both of these arguments must be NULL when the code page is CP_UTF8. The converted string goes into the buffer pointed to by lpMultiByteStr. The size of that buffer, in bytes, is supplied by cbMultiByte.

If cbMultiByte is 0, nothing is written and the returned value is the buffer size, in bytes, required to hold the converted string. If cbMultiByte is non-zero, the returned value is the number of bytes written into the destination buffer. In both cases a return value of 0 means the call failed, and GetLastError reports the reason.

USES_CONVERSION

There’s an old standby that everyone familiar with ATL knows about for converting character types. Microsoft provides a number of macros that ease the burden of converting between Unicode and MBCS. USES_CONVERSION is an initializer macro that must appear in a function before any of these macros is used.

  • A2CW – converts from LPCSTR to LPCWSTR
  • A2W – converts from LPSTR to LPWSTR
  • W2CA – converts from LPCWSTR to LPCSTR
  • W2A – converts from LPWSTR to LPSTR
  • T2W – converts from LPTSTR to LPWSTR
  • W2T – converts from LPWSTR to LPTSTR

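The placement of USES_CONVERSION is important: it must appear inside the function, before the first conversion macro. Each conversion allocates a temporary buffer on the stack (with _alloca), and that memory is released when the enclosing function returns, not at the closing brace of an inner block. A converted pointer must therefore not be returned from the function or stored for later use, and the macros should be avoided inside long loops, where every iteration consumes more stack.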

Here is an example of how to use the macros.

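#include <atlbase.h>
#include <atlconv.h>
#include <string>

std::wstring ConvertToUnicode( LPCSTR lpsz )
{
    USES_CONVERSION;

    // A2CW takes one argument and returns a pointer to a temporary stack
    // buffer, so the text is copied before the function returns.
    LPCWSTR lpwsz = A2CW( lpsz );
    return lpwsz ? std::wstring( lpwsz ) : std::wstring();
}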

Converting Between UTF-8 and Unicode

UTF-8 is a multi-byte encoding in which each character is represented by 1 to 4 bytes of data. (Here, as elsewhere in Windows programming, "Unicode" refers to the wide-character UTF-16 form.) Because UTF-8 behaves like any other MBCS code page, the same functions that convert between MBCS and Unicode also work for UTF-8. All that is needed to make the conversion work properly is to specify CP_UTF8 as the code page.
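To make the "1 to 4 bytes" rule concrete, the length of a UTF-8 sequence can be read from its first byte. The helper below is a small portable sketch; the name Utf8SequenceLength is hypothetical and it is not part of the Windows API.

```cpp
#include <cstddef>

// Hypothetical helper: returns the number of bytes in the UTF-8 sequence
// that begins with `lead`, or 0 if `lead` is a continuation byte or is
// otherwise unable to start a sequence. It inspects only the bit pattern
// and does not reject lead bytes that are never valid, such as 0xC0 or 0xF5.
std::size_t Utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80)           return 1;   // 0xxxxxxx: plain ASCII
    if ((lead & 0xE0) == 0xC0) return 2;   // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;   // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;   // 11110xxx
    return 0;                              // 10xxxxxx or 11111xxx
}
```

This is why a byte count and a character count differ for UTF-8 text, and why buffer sizes for the multi-byte side of a conversion are always given in bytes.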

MultiByteToWideChar

This function will convert a UTF-8 string to a wide character string.

int MultiByteToWideChar(
  UINT CP_UTF8,          // code page
  DWORD dwFlags,         // character-type options
  LPCSTR lpMultiByteStr, // string to map
  int cbMultiByte,       // number of bytes in string
  LPWSTR lpWideCharStr,  // wide-character buffer
  int cchWideChar        // size of buffer
);

WideCharToMultiByte

This function will convert a wide character string into a UTF-8 string.

int WideCharToMultiByte(
  UINT CP_UTF8,                // code page
  DWORD 0,                     // flags: must be 0 or WC_ERR_INVALID_CHARS for CP_UTF8
  LPCWSTR lpWideCharStr,       // wide-character string
  int cchWideChar,             // number of chars in string.
  LPSTR lpMultiByteStr,        // buffer for new string
  int cbMultiByte,             // size of buffer
  LPCSTR NULL,                 // lpDefaultChar: must be NULL for CP_UTF8
  LPBOOL NULL                  // lpUsedDefaultChar: must be NULL for CP_UTF8
);
