Tatsuo Ishii wrote:
I have looked into my Linux box and found this in /usr/share/i18n/charmaps/BIG5.gz:
% Chinese charmap for BIG5 (CP950)
% version: 0.92
% Contact: Tung-Han Hsieh <thhsieh (AT) linux (DOT) org.tw>
% Yuan-Chung Cheng <platin (AT) ms31 (DOT) hinet.net>
% Distribution and use is free, even for comercial purpose.
% This charmap is converted from:
My characters are in there.
That's Microsoft's definition, not a standard. I suspect there is a
reason why the Unicode organization does not use it.
OK, I do not know the reason. But since glibc also uses it, couldn't you use it too?
I believe the glibc developers have thought about this a lot, and they came to the
conclusion to use this definition. Why not PostgreSQL?
Don't you agree that it is strange that (for EUC_TW) I can "copy to" a file without error,
but I cannot "copy from" that same file without error?
I'm not quite sure what you are saying. Are you complaining that (for
example) 0xe7a281 in UTF-8 does not convert to EUC_TW?
Yes, exactly, since this value comes from a "copy to" with PGCLIENTENCODING=EUC_TW.
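To pin down which character the disputed value is, the three UTF-8 bytes 0xe7a281 can be decoded by hand. A minimal Python sketch follows; the helper name `utf8_code_point` is hypothetical (not part of PostgreSQL), and it only handles 3-byte sequences. It shows the bytes encode the single code point U+7881. Whether that code point has an entry in PostgreSQL's EUC_TW conversion table is the question under discussion and is not checked here.

```python
def utf8_code_point(hex_bytes: str) -> int:
    """Return the code point encoded by a 3-byte UTF-8 sequence given as hex."""
    raw = bytes.fromhex(hex_bytes)
    # A 3-byte UTF-8 sequence has the bit pattern 1110xxxx 10xxxxxx 10xxxxxx.
    if len(raw) != 3 or (raw[0] & 0xF0) != 0xE0 or any((b & 0xC0) != 0x80 for b in raw[1:]):
        raise ValueError("this sketch only handles 3-byte UTF-8 sequences")
    # Concatenate the 4 + 6 + 6 payload bits.
    cp = ((raw[0] & 0x0F) << 12) | ((raw[1] & 0x3F) << 6) | (raw[2] & 0x3F)
    # Cross-check the manual result against Python's own UTF-8 decoder.
    if cp != ord(raw.decode("utf-8")):
        raise AssertionError("manual decode disagrees with the utf-8 codec")
    return cp

print(f"U+{utf8_code_point('e7a281'):04X}")  # → U+7881
```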
BTW, what do you think about the following?
FYI, CNS 11643-1993 is the standard character set and EUC_TW is
one of its encodings. That means your problem below will disappear.
WARNING: copy: line 2, LocalToUtf: could not convert (0x8ea3cfd0) EUC_TW to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0x8ea3c4ce) EUC_TW to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0x8ea3bdfe) EUC_TW to UTF-8. Ignored
Hmm. These seem to be CNS 11643-1993, plane 3. Currently PostgreSQL supports:
CNS 11643-1993, plane 0
CNS 11643-1993, plane 1
CNS 11643-1993, plane 2
CNS 11643-1993, plane 15
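The plane-3 diagnosis can be read straight off the bytes in the warnings. In EUC_TW, a four-byte sequence starts with SS2 (0x8E), the second byte selects the CNS 11643 plane as 0xA0 + plane number (so 0x8EA3... is plane 3), and the last two bytes give row and cell, each offset by 0xA0; the plain two-byte form is always plane 1. The sketch below uses a hypothetical helper `classify_euc_tw` (not a PostgreSQL function) to split the three failing codes; it only decodes byte structure and does not consult PostgreSQL's conversion tables.

```python
from typing import NamedTuple

SS2 = 0x8E  # single-shift 2: introduces a four-byte EUC_TW sequence


class CnsChar(NamedTuple):
    plane: int
    row: int   # 1..94
    cell: int  # 1..94


def classify_euc_tw(code: int) -> CnsChar:
    """Split a 2- or 4-byte EUC_TW code (as an integer) into CNS 11643 plane/row/cell."""
    raw = code.to_bytes((code.bit_length() + 7) // 8, "big")
    if len(raw) == 2 and all(0xA1 <= b <= 0xFE for b in raw):
        # Two-byte form: always CNS 11643 plane 1.
        return CnsChar(1, raw[0] - 0xA0, raw[1] - 0xA0)
    if (len(raw) == 4 and raw[0] == SS2 and 0xA1 <= raw[1] <= 0xB0
            and all(0xA1 <= b <= 0xFE for b in raw[2:])):
        # SS2 form: second byte is 0xA0 + plane (planes 1..16).
        return CnsChar(raw[1] - 0xA0, raw[2] - 0xA0, raw[3] - 0xA0)
    raise ValueError(f"not a multibyte EUC_TW code: 0x{code:x}")


# The three codes from the COPY warnings above.
for code in (0x8EA3CFD0, 0x8EA3C4CE, 0x8EA3BDFE):
    print(hex(code), classify_euc_tw(code))
```

All three come out as plane 3, which is not in the list of planes PostgreSQL currently handles.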
Would you like to have support for the rest of the CNS 11643-1993 planes:
CNS 11643-1993, plane 3
CNS 11643-1993, plane 4
CNS 11643-1993, plane 5
CNS 11643-1993, plane 6
CNS 11643-1993, plane 7
in the upcoming 7.4?