[BUGS] BUG #2685: Wrong charset of server messages on client [PATCH]

Sergiy Vyshnevetskiy
10-10-2006, 09:56 AM

The following bug has been logged online:

Bug reference: 2685
Logged by: Sergiy Vyshnevetskiy
Email address: serg (AT) vostok (DOT) net
PostgreSQL version: 8.1
Operating system: FreeBSD-6 stable
Description: Wrong charset of server messages on client [PATCH]
Details:

DESCRIPTION:

The PostgreSQL backend uses gettext() to localize its messages. The charset of
localized messages is determined by LC_CTYPE by default.

Then the message is processed through an sprintf-like mechanism (with database
data as possible arguments) and fed to send_message_to_frontend(), which
converts the data from _database_charset_(!) to the client charset.

If LC_CTYPE is not the same as (or at least binary compatible with) the
database charset, then the client gets garbage characters in server messages.
If the database charset is UTF-8, then the cluster may recursively generate
"invalid byte sequence for encoding" errors until it fills up
errordata[ERRORDATA_STACK_SIZE], and then it panics.
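
To illustrate why a UTF-8 database rejects such messages, here is a minimal sketch. It assumes a server locale whose charset is KOI8-R and uses the Russian word for "error" as sample message text; both are assumptions made for illustration only. The validator is a simplified stand-in written for this example, not PostgreSQL's own verification code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sample data (assumed for illustration): the same six-letter Russian word
 * encoded in KOI8-R and in UTF-8. */
const unsigned char word_koi8r[] = { 0xCF, 0xDB, 0xC9, 0xC2, 0xCB, 0xC1 };
const unsigned char word_utf8[]  = { 0xD0, 0xBE, 0xD1, 0x88, 0xD0, 0xB8,
                                     0xD0, 0xB1, 0xD0, 0xBA, 0xD0, 0xB0 };

/* Simplified UTF-8 validity check: verifies lead bytes, continuation bytes,
 * shortest-form encoding, and the valid code point range. */
bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp, min;
        size_t need, k;

        if (c < 0x80) {                 /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;               /* not a legal lead byte */
        }

        if (len - i - 1 < need)
            return false;               /* sequence is truncated */

        for (k = 1; k <= need; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;           /* expected a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;               /* overlong, out of range, surrogate */

        i += need + 1;
    }
    return true;
}
```

Calling is_valid_utf8(word_koi8r, sizeof word_koi8r) returns false, because 0xCF looks like a two-byte lead but 0xDB is not a continuation byte. The UTF-8 form of the same word passes.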

SOLUTION:

Convert server messages to database charset.

PATCH:

--- src/backend/utils/mb/mbutils.c.o0 Tue Oct 10 11:51:13 2006
+++ src/backend/utils/mb/mbutils.c Tue Oct 10 11:49:22 2006
@@ -615,6 +615,7 @@
 DatabaseEncoding = &pg_enc2name_tbl[encoding];
 Assert(DatabaseEncoding->encoding == encoding);
 #ifdef USE_ICU
+bind_textdomain_codeset("postgres",(&pg_enc2iananame_tbl[encoding])->name);
 ucnv_setDefaultName((&pg_enc2iananame_tbl[encoding])->name);
 #endif
 }
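
The idea behind the patch can be sketched outside the backend as follows. This is a minimal sketch, not the patch itself: the three-entry name table and the helpers iana_name_for() and set_message_codeset() are hypothetical stand-ins for pg_enc2iananame_tbl, and the listed names are only an illustrative subset. The one real library call is bind_textdomain_codeset() from libintl.h. Whether a given codeset name is actually honored depends on the platform's gettext implementation.

```c
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, deliberately tiny mapping from PostgreSQL encoding names
 * to IANA charset names (illustrative subset, assumed for this example). */
struct enc_map {
    const char *pg_name;
    const char *iana_name;
};

static const struct enc_map enc_table[] = {
    { "UTF8",   "UTF-8" },
    { "LATIN1", "ISO-8859-1" },
    { "KOI8",   "KOI8-R" },
};

/* Look up the IANA name for a PostgreSQL encoding name; NULL if unknown. */
const char *iana_name_for(const char *pg_name)
{
    size_t i;

    for (i = 0; i < sizeof enc_table / sizeof enc_table[0]; i++)
        if (strcmp(enc_table[i].pg_name, pg_name) == 0)
            return enc_table[i].iana_name;
    return NULL;
}

/* Ask gettext to return messages of `domain` in the charset that matches
 * the given PostgreSQL encoding. Returns 0 on success, -1 on failure. */
int set_message_codeset(const char *domain, const char *pg_encoding)
{
    const char *codeset = iana_name_for(pg_encoding);

    if (codeset == NULL)
        return -1;                      /* no known name for this encoding */
    return bind_textdomain_codeset(domain, codeset) != NULL ? 0 : -1;
}
```

With this in place, set_message_codeset("postgres", "UTF8") would make later gettext() lookups in the "postgres" domain return UTF-8 text regardless of LC_CTYPE, provided the implementation recognizes the name. On systems where gettext is not part of libc, the program must be linked with -lintl.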




This, however, uncovers another bug: PostgreSQL dumps the messages into
stderr/syslog as-is, without converting database data from database charset
to charset from LC_MESSAGES. After this patch the same will happen to the
message text too. The fix should be trivial: set up a conversion from the
database charset to the server charset. I will post a patch for it later.

NOTE:

I used pg_enc2iananame_tbl instead of pg_enc2name_tbl, because gettext
doesn't accept many

Possible TODO:
Change PostgreSQL charset names to IANA-standard names.

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Tom Lane
10-10-2006, 10:37 AM

"Sergiy Vyshnevetskiy" <serg (AT) vostok (DOT) net> writes:
> Convert server messages to database charset.

This has been discussed before:
http://archives.postgresql.org/pgsql...8/msg00245.php

The magic pg_enc2iananame_tbl[] you reference in your patch does not exist,
and if it did exist it wouldn't work on all platforms, since encoding
names aren't sufficiently well standardized :-(

> This, however, uncovers another bug: PostgreSQL dumps the messages into
> stderr/syslog as-is, without converting database data from database charset
> to charset from LC_MESSAGES.

I'm quite unconvinced that that's a bug. If we tried to do a conversion
here, it would be trivial to set up denials of service for logging ---
just include a character in a comment in your SQL command that cannot be
converted to the LC_MESSAGES character set.
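
The failure mode described here can be sketched with the standard iconv() interface. This is an illustration under assumptions, not PostgreSQL's conversion code: the helper log_line_convertible() is hypothetical, and the charset pair UTF-8 to ASCII is chosen only because it makes the problem easy to trigger.

```c
#include <iconv.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns true only if every character of `text`
 * (encoded in charset `from`) converts exactly into charset `to`. */
bool log_line_convertible(const char *from, const char *to, const char *text)
{
    iconv_t cd = iconv_open(to, from);
    size_t in_left = strlen(text);
    size_t out_size = in_left * 4 + 16;   /* generous output buffer */
    size_t out_left = out_size;
    char *out_buf, *in_ptr, *out_ptr;
    size_t rc;
    bool ok;

    if (cd == (iconv_t) -1)
        return false;                     /* charset pair not supported */

    out_buf = malloc(out_size);
    if (out_buf == NULL) {
        iconv_close(cd);
        return false;
    }

    in_ptr = (char *) text;               /* iconv() does not modify input */
    out_ptr = out_buf;
    rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);

    /* Success means: no error, no lossy substitutions, all input consumed. */
    ok = (rc == 0 && in_left == 0);

    free(out_buf);
    iconv_close(cd);
    return ok;
}
```

A statement such as "SELECT 1" converts cleanly, while the same statement followed by a comment containing a single Cyrillic letter does not. A logger that insisted on converting each line would have to decide what to do with the second one.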

regards, tom lane

Tom Lane
10-10-2006, 11:19 AM

Sergiy Vyshnevetskiy <serg (AT) vostok (DOT) net> writes:
> It's not magic, it's from ICU patch. Want me to send you a copy?

You're missing my point, which is that non-ICU locale support doesn't
necessarily recognize the same encoding names. We would have done this
years ago if we had a solution to that problem.

regards, tom lane
