Re: D3 Accented Characters and Code Pages - 09-14-2006, 03:03 AM
Hello Nick,
Nice to see you.
LCMapString is more useful. To convert between multibyte (MB) strings
and wide-character (Unicode) strings, use the MultiByteToWideChar and
WideCharToMultiByte APIs, or in VB use StrConv(Str, vbUnicode) and
StrConv(Str, vbFromUnicode).
LCMapString
The LCMapString function either maps an input character string to
another using a specified transformation or generates a sort key for
the input string.
int LCMapString(
LCID Locale, // locale identifier
DWORD dwMapFlags, // mapping transformation type
LPCTSTR lpSrcStr, // source string
int cchSrc, // number of characters in source string
LPTSTR lpDestStr, // destination buffer
int cchDest // size of destination buffer
);
Parameters
Locale
[in] Specifies a locale identifier. The locale provides a context for
the string mapping or sort key generation. An application can use the
MAKELCID macro to create a locale identifier.
dwMapFlags
[in] Specifies the type of transformation used during string mapping or
the type of sort key generated. An application can specify one or more
of the following options. Restrictions are noted following the table.
Option                     Meaning
LCMAP_BYTEREV              Windows NT/2000/XP: Use byte reversal. For example,
                           if you pass in 0x3450 0x4822 the result is
                           0x5034 0x2248.
LCMAP_FULLWIDTH            Uses wide characters (where applicable).
LCMAP_HALFWIDTH            Uses narrow characters (where applicable).
LCMAP_HIRAGANA             Hiragana.
LCMAP_KATAKANA             Katakana.
LCMAP_LINGUISTIC_CASING    Uses linguistic rules for casing, rather than file
                           system rules (the default). Valid with
                           LCMAP_LOWERCASE or LCMAP_UPPERCASE only.
LCMAP_LOWERCASE            Uses lowercase.
LCMAP_SIMPLIFIED_CHINESE   Windows NT 4.0 and later: Maps traditional Chinese
                           characters to simplified Chinese characters.
LCMAP_SORTKEY              Produces a normalized wide character-sort key. For
                           more information, see lpDestStr and Remarks.
LCMAP_TRADITIONAL_CHINESE  Windows NT 4.0 and later: Maps simplified Chinese
                           characters to traditional Chinese characters.
LCMAP_UPPERCASE            Uses uppercase.
The following flags are used only with the LCMAP_SORTKEY flag.
Flag                 Meaning
NORM_IGNORECASE      Ignores case.
NORM_IGNOREKANATYPE  Does not differentiate between Hiragana and Katakana
                     characters. Corresponding Hiragana and Katakana will
                     compare as equal.
NORM_IGNORENONSPACE  Ignores nonspacing. This flag also removes Japanese
                     accent characters.
NORM_IGNORESYMBOLS   Ignores symbols.
NORM_IGNOREWIDTH     Does not differentiate between a single-byte character
                     and the same character as a double-byte character.
SORT_STRINGSORT      Treats punctuation the same as symbols.
If the LCMAP_SORTKEY flag is not specified, the LCMapString function
performs string mapping. In this case the following restrictions apply:
LCMAP_LOWERCASE and LCMAP_UPPERCASE are mutually exclusive.
LCMAP_HIRAGANA and LCMAP_KATAKANA are mutually exclusive.
LCMAP_HALFWIDTH and LCMAP_FULLWIDTH are mutually exclusive.
LCMAP_TRADITIONAL_CHINESE and LCMAP_SIMPLIFIED_CHINESE are mutually
exclusive.
LCMAP_LOWERCASE and LCMAP_UPPERCASE are not valid in combination with
any of these flags: LCMAP_HIRAGANA, LCMAP_KATAKANA, LCMAP_HALFWIDTH,
LCMAP_FULLWIDTH.
When the LCMAP_SORTKEY flag is specified, the LCMapString function
generates a sort key. In this case the following restriction applies:
LCMAP_SORTKEY is mutually exclusive with all other LCMAP_* flags, with
the sole exception of LCMAP_BYTEREV.
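The restrictions above can be checked before a call is made. The following is only a sketch of the documented rules, not the validation LCMapString performs internally. The function names lcmap_flags_ok and lcmap_flags_demo are made up for illustration. The constant values are local stand-ins quoted from memory of Winnls.h so the sketch compiles without the Windows SDK; real code should take them from the header.

```c
#include <assert.h>

/* Local stand-ins for the Winnls.h constants (assumed values, quoted
   from memory). With the Windows SDK, include Windows.h instead. */
#ifndef LCMAP_LOWERCASE
#define LCMAP_LOWERCASE           0x00000100UL
#define LCMAP_UPPERCASE           0x00000200UL
#define LCMAP_SORTKEY             0x00000400UL
#define LCMAP_BYTEREV             0x00000800UL
#define LCMAP_HIRAGANA            0x00100000UL
#define LCMAP_KATAKANA            0x00200000UL
#define LCMAP_HALFWIDTH           0x00400000UL
#define LCMAP_FULLWIDTH           0x00800000UL
#define LCMAP_LINGUISTIC_CASING   0x01000000UL
#define LCMAP_SIMPLIFIED_CHINESE  0x02000000UL
#define LCMAP_TRADITIONAL_CHINESE 0x04000000UL
#endif

/* Hypothetical helper: returns 1 if the LCMAP_* bits in f obey the
   restrictions listed above, 0 otherwise. NORM_* and SORT_* bits are
   not inspected. */
int lcmap_flags_ok(unsigned long f)
{
    const unsigned long casing = LCMAP_LOWERCASE | LCMAP_UPPERCASE;
    const unsigned long kana   = LCMAP_HIRAGANA | LCMAP_KATAKANA;
    const unsigned long width  = LCMAP_HALFWIDTH | LCMAP_FULLWIDTH;
    const unsigned long hanzi  = LCMAP_SIMPLIFIED_CHINESE |
                                 LCMAP_TRADITIONAL_CHINESE;

    /* Sort key mode: no other LCMAP_* flag except LCMAP_BYTEREV. */
    if (f & LCMAP_SORTKEY)
        return (f & (casing | kana | width | hanzi |
                     LCMAP_LINGUISTIC_CASING)) == 0;

    /* String mapping mode: each pair is mutually exclusive. */
    if ((f & casing) == casing) return 0;
    if ((f & kana) == kana)     return 0;
    if ((f & width) == width)   return 0;
    if ((f & hanzi) == hanzi)   return 0;

    /* Casing cannot be combined with kana or width mapping. */
    if ((f & casing) && (f & (kana | width))) return 0;

    /* Linguistic casing needs LCMAP_LOWERCASE or LCMAP_UPPERCASE. */
    if ((f & LCMAP_LINGUISTIC_CASING) && !(f & casing)) return 0;

    return 1;
}

/* A few combinations, written as assertions. */
void lcmap_flags_demo(void)
{
    assert(lcmap_flags_ok(LCMAP_UPPERCASE | LCMAP_LINGUISTIC_CASING));
    assert(lcmap_flags_ok(LCMAP_SORTKEY | LCMAP_BYTEREV));
    assert(!lcmap_flags_ok(LCMAP_LOWERCASE | LCMAP_UPPERCASE));
    assert(!lcmap_flags_ok(LCMAP_SORTKEY | LCMAP_UPPERCASE));
}
```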
lpSrcStr
[in] Pointer to a source string that the function maps or uses for sort
key generation.
cchSrc
[in] Specifies the number of TCHARs in the string pointed to by the
lpSrcStr parameter.
This count can include the NULL terminator, or not include it. If the
NULL terminator is included in the character count, it does not greatly
affect the mapping behavior. That is because NULL is considered to be
unsortable, and always maps to itself.
A cchSrc value of -1 specifies that the string pointed to by lpSrcStr
is null-terminated. If this is the case, and LCMapString is being used
in its string-mapping mode, the function calculates the string's length
itself, and null-terminates the mapped string stored into *lpDestStr.
lpDestStr
[out] Pointer to a buffer that receives the mapped string or sort key.
If LCMAP_SORTKEY is specified, LCMapString stores a sort key into the
buffer. The sort key is stored as an array of byte values in the
following format:
[all Unicode sort weights] 0x01 [all Diacritic weights] 0x01 [all Case
weights] 0x01 [all Special weights] 0x00
Note that the sort key is null-terminated. This is true regardless of
the value of cchSrc. Also note that, even if some of the sort weights
are absent from the sort key, due to the presence of one or more ignore
flags in dwMapFlags, the 0x01 separators and the 0x00 terminator are
still present.
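To make the layout concrete, here is a small sketch that walks a sort key and reports how many bytes each weight section holds. The function name sortkey_section_lengths is made up for illustration. It assumes that weight bytes are never 0x00 or 0x01, which the separator scheme implies but which is not stated outright above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: splits a sort key laid out as
       [weights] 0x01 [weights] 0x01 [weights] 0x01 [weights] 0x00
   into section lengths. Stores up to max_sections lengths in lens and
   returns the number of sections seen.
   Assumption: weight bytes are never 0x00 or 0x01. */
size_t sortkey_section_lengths(const unsigned char *key,
                               size_t lens[], size_t max_sections)
{
    size_t count = 0;  /* sections closed so far */
    size_t run = 0;    /* bytes in the current section */

    assert(key != NULL && lens != NULL);

    for (;; ++key) {
        if (*key == 0x00 || *key == 0x01) {
            if (count < max_sections)
                lens[count] = run;
            ++count;
            run = 0;
            if (*key == 0x00)
                break;  /* terminator ends the key */
        } else {
            ++run;
        }
    }
    return count;
}
```

An empty section shows up as length 0, which matches the note that the separators stay in place even when an ignore flag removes the weights between them.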
cchDest
[in] Specifies the size, in TCHARs, of the buffer pointed to by
lpDestStr.
If the function is being used for string mapping, the size is a
character count. If space for a NULL terminator is included in cchSrc,
then cchDest must also include space for a NULL terminator.
If the function is being used to generate a sort key, the size is a
byte count. This byte count must include space for the sort key 0x00
terminator.
If cchDest is zero, the function's return value is the number of
characters, or bytes if LCMAP_SORTKEY is specified, required to hold
the mapped string or sort key. In this case, the buffer pointed to by
lpDestStr is not used.
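The usual way to use the cchDest = 0 rule is a two-call pattern: ask for the size, allocate, then call again. The sketch below shows that pattern against a stand-in function, demo_map_upper, which is NOT the Windows API. It is a made-up ASCII uppercaser that only imitates the cchSrc and cchDest conventions described above, so the example compiles anywhere. With the real API the two calls would go to LCMapString with the same arguments in both.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL stand-in for a string-mapping call. It follows the
   conventions described above:
     cchSrc == -1  -> source is null-terminated, terminator is mapped too
     cchDest == 0  -> return the required size, dst is not used
     returns 0     -> failure (buffer too small) */
int demo_map_upper(const char *src, int cchSrc, char *dst, int cchDest)
{
    int n, i;

    assert(src != NULL);

    n = (cchSrc == -1) ? (int)strlen(src) + 1 : cchSrc;
    if (cchDest == 0)
        return n;                      /* size query only */
    if (dst == NULL || cchDest < n)
        return 0;                      /* insufficient buffer */

    for (i = 0; i < n; ++i)
        dst[i] = (char)toupper((unsigned char)src[i]);
    return n;
}

/* Two-call pattern: query the size, allocate, then map.
   The caller frees the returned buffer. */
char *demo_upper_dup(const char *src)
{
    int need = demo_map_upper(src, -1, NULL, 0);
    char *buf;

    if (need == 0)
        return NULL;
    buf = (char *)malloc((size_t)need);
    if (buf == NULL)
        return NULL;
    if (demo_map_upper(src, -1, buf, need) == 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

Because cchSrc is -1 in both calls, the size returned by the first call already includes the terminator, so no extra byte has to be added before allocating.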
Return Values
If the function succeeds, and the value of cchDest is nonzero, the
return value is the number of characters, or bytes if LCMAP_SORTKEY is
specified, written to the buffer. This count includes room for a NULL
terminator.
If the function succeeds, and the value of cchDest is zero, the return
value is the size of the buffer in characters, or bytes if
LCMAP_SORTKEY is specified, required to receive the translated string
or sort key. This size includes room for a NULL terminator.
If the function fails, the return value is 0. To get extended error
information, call GetLastError. GetLastError may return one of the
following error codes:
ERROR_INSUFFICIENT_BUFFER
ERROR_INVALID_FLAGS
ERROR_INVALID_PARAMETER
Remarks
The mapped string is null terminated if the source string is null
terminated.
The ANSI version of this function maps strings to and from Unicode
based on the specified LCID's default ANSI code page.
For the ANSI version of this function, the LCMAP_UPPERCASE flag
produces the same result as CharUpper in the locale. Likewise, the
LCMAP_LOWERCASE flag produces the same result as CharLower. This
function always maps a single character to a single character. Note
that, in these cases, the function maps the lowercase I to the
uppercase I, even when the current language is Turkish or Azeri. To
change this for Turkish and Azeri, specify LCMAP_LINGUISTIC_CASING.
If LCMAP_UPPERCASE or LCMAP_LOWERCASE is set and if LCMAP_SORTKEY is
not set, the lpSrcStr and lpDestStr pointers can be the same.
Otherwise, the lpSrcStr and lpDestStr pointers must not be the same. If
they are the same, the function fails, and GetLastError returns
ERROR_INVALID_PARAMETER.
If the LCMAP_HIRAGANA flag is specified to map Katakana characters to
Hiragana characters, and LCMAP_FULLWIDTH is not specified, the function
only maps full-width characters to Hiragana. In this case, any
half-width Katakana characters are placed as-is in the output string,
with no mapping to Hiragana. An application must specify
LCMAP_FULLWIDTH if it wants half-width Katakana characters mapped to
Hiragana.
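The effect is easy to see at the code point level. In Unicode, full-width Katakana U+30A1..U+30F6 sit exactly 0x60 above their Hiragana counterparts U+3041..U+3096, while half-width Katakana live in a separate block (U+FF66..U+FF9F). The sketch below imitates the behaviour described for LCMAP_HIRAGANA without LCMAP_FULLWIDTH using that offset. It is an approximation for illustration, not the table LCMapString actually uses, and the function name is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: maps full-width Katakana to Hiragana in place.
   Half-width Katakana (U+FF66..U+FF9F) and everything else are left
   as-is, mirroring the behaviour described above. */
void kata_to_hira_fullwidth_only(unsigned short *s, size_t n)
{
    size_t i;

    assert(s != NULL || n == 0);

    for (i = 0; i < n; ++i) {
        if (s[i] >= 0x30A1 && s[i] <= 0x30F6)
            s[i] = (unsigned short)(s[i] - 0x60);
    }
}
```

For example, U+30A2 (full-width Katakana A) becomes U+3042 (Hiragana A), while U+FF71 (half-width Katakana A) passes through unchanged.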
Even if the Unicode version of this function is called, the output
string is only in WCHAR or CHAR format if the string mapping mode of
LCMapString is used. If the sort key generation mode is used, specified
by LCMAP_SORTKEY, the output is an array of byte values. To compare
sort keys, use a byte-by-byte comparison.
An application can call the function with the NORM_IGNORENONSPACE and
NORM_IGNORESYMBOLS flags set, and all other options flags cleared, in
order to simply strip characters from the input string. If this is done
with an input string that is not null-terminated, it is possible for
LCMapString to return an empty string and not return an error.
The LCMapString function ignores the Arabic Kashida. If an application
calls the function to create a sort key for a string containing an
Arabic Kashida, there will be no sort key value for the Kashida.
The function treats the hyphen and apostrophe a bit differently than
other punctuation symbols, so that words like coop and co-op stay
together in a list. All punctuation symbols other than the hyphen and
apostrophe sort before the alphanumeric characters. An application can
change this behavior by setting the SORT_STRINGSORT flag. See
CompareString for a more detailed discussion of this issue.
When LCMapString is used to generate a sort key, by setting the
LCMAP_SORTKEY flag, the sort key stored into *lpDestStr may contain an
odd number of bytes. The LCMAP_BYTEREV option only reverses an even
number of bytes. If both options are chosen, the last (odd-positioned)
byte in the sort key is not reversed. If the terminating 0x00 byte is
an odd-positioned byte, then it remains the last byte in the sort key.
If the terminating 0x00 byte is an even-positioned byte, it exchanges
positions with the byte that precedes it.
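The pairwise reversal described here can be sketched in a few lines. This is an illustration of the documented behaviour, not the actual implementation, and byterev_pairs is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: swaps the two bytes of every complete pair in
   buf. If n is odd, the final byte has no partner and stays put. */
void byterev_pairs(unsigned char *buf, size_t n)
{
    size_t i;

    assert(buf != NULL || n == 0);

    for (i = 0; i + 1 < n; i += 2) {
        unsigned char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

With the bytes 34 50 48 22 this gives 50 34 22 48, the 0x3450 0x4822 to 0x5034 0x2248 example from the flag table. With a three-byte key such as 0E 01 00 the terminator stays last. With a four-byte key such as 0E 02 01 00 the terminator trades places with the 01 before it.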
When the LCMAP_SORTKEY flag is specified, the function generates a sort
key that, when used in strcmp, produces the same order as when the
original string is used in CompareString. The output is still a
null-terminated string of bytes, but its values are sort weights, not
meaningful display characters.
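A comparison helper for two such keys can be a thin wrapper around strcmp. This sketch assumes both keys were produced without LCMAP_BYTEREV and are still null-terminated; compare_sort_keys is a made-up name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: orders two null-terminated sort keys byte by
   byte. strcmp compares the bytes as unsigned char, which is what a
   sort key comparison needs. Returns <0, 0 or >0. */
int compare_sort_keys(const unsigned char *a, const unsigned char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp((const char *)a, (const char *)b);
}
```

memcmp would work as well, but only if the key lengths are known and the shorter length is used, so strcmp is the simpler choice for null-terminated keys.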
Windows 2000/XP: The ANSI version of this function will fail if it is
used with a Unicode-only locale. See Table of Language Identifiers.
Windows 95/98/Me: LCMapStringW is supported by the Microsoft Layer for
Unicode. To use this, you must add certain files to your application,
as outlined in Microsoft Layer for Unicode on Windows 95/98/Me Systems.
Requirements
Windows NT/2000/XP: Included in Windows NT 3.1 and later.
Windows 95/98/Me: Included in Windows 95 and later.
Header: Declared in Winnls.h; include Windows.h.
Library: Use Kernel32.lib.
Unicode: Implemented as Unicode and ANSI versions on Windows
NT/2000/XP. Also supported by Microsoft Layer for Unicode.