The Encoding problem

comp.databases.postgresql

#1
Hadanite Marasek
The Encoding problem - 12-06-2009, 06:25 PM

Hello,

today I ran into the "encoding problem". Migrating from Postgres 8 to
Postgres 8.3 on a new Debian on another server, psql refused to import
my LATIN1 databases, because the server has UTF8 set as locale.

As I wanted to migrate that specific database to UTF8 anyway, I
edited my dump and went on with it, though not before wasting a lot of
time on research (because it was not mentioned, or at least not
highlighted, in the migration guide).

So before I put the PostgreSQL people on the "black list", I'd like to
try to understand first. What is it good for, why is it necessary? I
don't see a "hard" reason for it, as certainly it hasn't been necessary
up to Postgres 8. After all, why should it be a problem for an RDBMS to
handle different encoding standards in different databases, and why
should "conflicting" with the server be a problem? If I couldn't
use UTF-8 in my application, well, I would switch the client encoding to
LATIN1, so it would need to do conversion anyway. I just don't get the
f... necessity of why it should be so inflexible, especially as it has
not been necessary before.

Even so, I would at least wish they made it clearer in their migration
guide, and provided more comfortable tools to migrate.

Maybe someone can help me to understand why the behaviour has been changed.

#2
Laurenz Albe
Re: The Encoding problem - 12-07-2009, 04:26 AM

Hadanite Marasek wrote:
Quote:
today I ran into the "encoding problem". Migrating from Postgres 8 to Postgres 8.3 on a new Debian on another server, psql refused
to import my LATIN1 databases, because the server has UTF8 set as locale.
What exactly do you mean by "Postgres 8"?

If you did everything correctly, that would mean that you had some
non-Latin1 characters in your Latin1 database.

Older versions of PostgreSQL have been less strict about verifying
that the data in the database are properly encoded. This is an
ongoing effort.

Quote:
As I wanted to migrate that specific database to UTF8 anyway, I edited my dump and went on with it, though not before
wasting a lot of time on research (because it was not mentioned, or at least not highlighted, in the migration guide).

So before I put the PostgreSQL people on the "black list", I'd like to try to understand first.
Not that I am particularly afraid of your black list, but I acknowledge
the willingness to try to understand. It is deplorable that your
posting has such an annoyed subtext.

Quote:
What is it good for, why is it necessary? I don't see a "hard" reason for it, as certainly it hasn't been
necessary up to Postgres 8.
What exactly do you mean by "it"? UTF-8? Error messages?

Quote:
After all, why should it be a problem for an RDBMS to handle different encoding standards in different databases,
and why should "conflicting" with the server be a problem? If I couldn't use UTF-8 in my application, well, I would switch
the client encoding to LATIN1, so it would need to do conversion anyway. I just don't get the f... necessity of why it should be
so inflexible, especially as it has not been necessary before.

Even so, I would at least wish they made it clearer in their migration guide, and provided more comfortable tools to migrate.
Ok, I admit that you lost me somewhere in that longer paragraph.

What behaviour of PostgreSQL are you complaining about? That it makes
sure that strings are stored in the database encoding?

Is it that you want to store random byte sequences in the database and
avoid any encoding checks? If that is the case, why don't you either
store these data as binary fields or use SQL_ASCII as database encoding,
which is particularly designed for people who want their database to act
in garbage in - garbage out mode?

If you want your special characters to come out "right" on different
platforms with different encodings, your database has to understand
what each character means. For that you will have to put up with some
fussiness on the database's side.
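To make the point about "understanding what each character means" concrete, here is a small Python sketch (not PostgreSQL code, just an illustration). It assumes a sample string "Müller" and a hypothetical helper is_valid(); it shows that the same character is a different byte sequence in LATIN1 and UTF-8, and that bytes stored under the wrong label are either rejected or silently misread.

```python
# The same text is a different byte sequence in each encoding.
text = "Müller"
latin1_bytes = text.encode("latin-1")   # b'M\xfcller'
utf8_bytes = text.encode("utf-8")       # b'M\xc3\xbcller'


def is_valid(data: bytes, encoding: str) -> bool:
    """Return True if the bytes decode cleanly in the given encoding.

    Hypothetical helper for illustration; a strict server performs a
    comparable check before storing a string.
    """
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# LATIN1 bytes labelled as UTF-8 are rejected: 0xFC cannot start a UTF-8 sequence.
latin1_as_utf8_ok = is_valid(latin1_bytes, "utf-8")   # False

# UTF-8 bytes labelled as LATIN1 decode without error, but to the wrong characters.
mojibake = utf8_bytes.decode("latin-1")               # 'MÃ¼ller'
```

The second case is the "garbage in, garbage out" situation: nothing fails, but every client that trusts the label gets the wrong characters.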

Quote:
Maybe someone can help me to understand why the behaviour has been changed.
If you are referring to stricter encoding checks in recent PostgreSQL
versions, then the answer is that the old, laxer behaviour was a bug
that has been fixed. If you rely on the upward compatibility of buggy
behaviour, you're in for an unpleasant surprise every now and then.

Yours,
Laurenz Albe

#3
Hadanite Marasek
Re: The Encoding problem - 12-07-2009, 01:58 PM

Laurenz Albe wrote:
Quote:
Hadanite Marasek wrote:
today I ran into the "encoding problem". Migrating from Postgres 8
to Postgres 8.3 on a new Debian on another server, psql refused to
import my LATIN1 databases, because the server has UTF8 set as
locale.

What exactly do you mean by "Postgres 8"?
PostgreSQL 8.0.0

Quote:
If you did everything correctly, that would mean that you had some
non-Latin1 characters in your Latin1 database.
Older versions of PostgreSQL have been less strict about verifying
that the data in the database are properly encoded. This is an
ongoing effort.
That was not the cause, I'm talking about this:

http://www.ashtech.net/~syntax/blog/...ng-Issues.html

Quote:
As I wanted to migrate that specific database to UTF8 anyway,
I edited my dump and went on with it, though not before wasting a
lot of time on research (because it was not mentioned, or at least
not highlighted, in the migration guide).

So before I put the PostgreSQL people on the "black list", I'd like
to try to understand first.

Not that I am particularly afraid of your black list, but I
acknowledge the willingness to try to understand. It is deplorable
that your posting has such an annoyed subtext.
Well, I found it deplorable to deal with that problem, and yes, it
annoyed me a LOT.
I strictly followed the manual at
http://www.postgresql.org/docs/8.3/i...P-DUMP-RESTORE

Do you see something like "use vim to change the create statements" in
there?
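For anyone who would rather script that edit than do it by hand, here is a minimal Python sketch. It rests on assumptions that are not stated in this thread: that the dump contains single-line statements of the form CREATE DATABASE ... ENCODING = 'LATIN1'; and that it also sets client_encoding to LATIN1, so the data bytes can stay untouched and the server converts them on restore. The function name retarget_create_statements is made up for this example.

```python
import re

# Matches the ENCODING clause, e.g.
#   CREATE DATABASE app WITH TEMPLATE = template0 ENCODING = 'LATIN1';
# (the exact statement layout in a given dump is an assumption)
_ENCODING_CLAUSE = re.compile(rb"(ENCODING\s*=\s*)'LATIN1'", re.IGNORECASE)


def retarget_create_statements(dump: bytes, new_encoding: bytes = b"UTF8") -> bytes:
    """Rewrite ENCODING = 'LATIN1' on CREATE DATABASE lines only.

    Works on raw bytes so that the LATIN1-encoded data in the rest of
    the dump is passed through unchanged.
    """
    out = []
    for line in dump.splitlines(keepends=True):
        if line.lstrip().upper().startswith(b"CREATE DATABASE"):
            line = _ENCODING_CLAUSE.sub(rb"\g<1>'" + new_encoding + b"'", line)
        out.append(line)
    return b"".join(out)


# Small made-up sample instead of a real dump file.
sample = (
    b"CREATE DATABASE app WITH TEMPLATE = template0 ENCODING = 'LATIN1';\n"
    b"SET client_encoding = 'LATIN1';\n"
)
result = retarget_create_statements(sample)
```

Only the CREATE DATABASE line changes; the SET client_encoding line has to stay as it is, because it tells the server which encoding the dump's data is in.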

Quote:
What behaviour of PostgreSQL are you complaining about? That it makes
sure that strings are stored in the database encoding?
No, that it is not possible to create databases which have a different
encoding than the server's locale.

#4
Laurenz Albe
Re: The Encoding problem - 12-09-2009, 06:37 AM

Hadanite Marasek wrote:
Quote:
What exactly do you mean by "Postgres 8"?

PostgreSQL 8.0.0
Ouch.

Quote:
If you did everything correctly, that would mean that you had some non-Latin1 characters in your Latin1 database.
Older versions of PostgreSQL have been less strict about verifying that the data in the database are properly encoded. This is an
ongoing effort.

That was not the cause, I'm talking about this:

http://www.ashtech.net/~syntax/blog/...ng-Issues.html
Ah, you want to create databases with different encodings in one
database cluster, right?

In PostgreSQL 8.3, you can do that if you create your cluster
with the C locale.

From PostgreSQL 8.4 on, you can set LC_COLLATE and LC_CTYPE per database
in CREATE DATABASE if you use template0 as database template.

Both should solve your problem without trouble.
Just pre-create the database and restore the dump into it.
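As a rough illustration of the 8.4 variant, the following Python sketch only composes the CREATE DATABASE statement as text; it does not connect to a server. The database name legacy_app and the locale name de_DE.ISO-8859-1 are made up for the example (the locale must be one your operating system actually provides), and create_database_sql() is a hypothetical helper.

```python
import re

# Accept only simple unquoted identifiers in this sketch.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def quote_literal(value: str) -> str:
    """Quote a string as an SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_database_sql(name: str, encoding: str, lc_collate: str, lc_ctype: str) -> str:
    """Build a CREATE DATABASE statement with a per-database locale.

    TEMPLATE = template0 is needed because the encoding and locale
    differ from those of the default template.
    """
    if not _IDENT.match(name):
        raise ValueError("unsupported database name: %r" % name)
    return (
        f"CREATE DATABASE {name} WITH TEMPLATE = template0 "
        f"ENCODING = {quote_literal(encoding)} "
        f"LC_COLLATE = {quote_literal(lc_collate)} "
        f"LC_CTYPE = {quote_literal(lc_ctype)};"
    )


# Hypothetical names; adjust to your own database and installed locales.
sql = create_database_sql("legacy_app", "LATIN1", "de_DE.ISO-8859-1", "de_DE.ISO-8859-1")
```

Run the resulting statement in psql first, then restore the dump into the database it created.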

Quote:
Well, I found it deplorable to deal with that problem, and yes, it annoyed me a LOT.
I strictly followed the manual at http://www.postgresql.org/docs/8.3/i...P-DUMP-RESTORE

Do you see something like "use vim to change the create statements" in there?
I understand your gripe in this case.
It is only mentioned in the release notes, but I guess it would have
deserved a mention in chapter 15.4.

Nobody is perfect.

But your problem should be addressed with the above, right?

Yours,
Laurenz Albe

#5
Hadanite Marasek
Re: The Encoding problem - 12-09-2009, 07:19 PM

Laurenz Albe wrote:
Quote:
Hadanite Marasek wrote:
What exactly do you mean by "Postgres 8"?
PostgreSQL 8.0.0

Ouch.
What's Ouch?

Quote:
Ah, you want to create databases with different encodings in one
database cluster, right?

In PostgreSQL 8.3, you can do that if you create your cluster
with the C locale.
Yes, but maybe to give you more details: I run PostgreSQL on a server
with other services, as it serves as storage for two internal
web applications with little concurrent access. So I didn't see that as
an option.
In the end, I changed the dump as well as the one application that still
ran on Latin1 (wanted to do that anyway).

Quote:
I understand your gripe in this case.
It is only mentioned in the release notes, but I guess it would have
deserved a mention in chapter 15.4.

Nobody is perfect.

But your problem should be addressed with the above, right?

Yours,
Laurenz Albe
Well, I solved it the other way, but maybe it helps someone else. In the
end, I wouldn't want to be without UTF-8 anymore.

#6
Laurenz Albe
Re: The Encoding problem - 12-10-2009, 04:41 AM

Hadanite Marasek wrote:
Quote:
PostgreSQL 8.0.0

Ouch.

What's Ouch?
"Ouch" meaning it hurts to hear that somebody runs 8.0.0 when 8.0.23
is in the making. You don't care much about data corruption and
similar problems, right?

Quote:
Ah, you want to create databases with different encodings in one
database cluster, right?

In PostgreSQL 8.3, you can do that if you create your cluster
with the C locale.

Yes, but maybe to give you more details: I run PostgreSQL on a server with other services, as it serves as storage for two
internal web applications with little concurrent access. So I didn't see that as an option.
You don't have to change anything on the server, all you have to do is
use the --locale option to initdb. Easy as that.
Sorry if I was unclear.
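To show what that looks like, here is a short Python sketch that only assembles the initdb command line and does not run it. The data directory /srv/pg83-c-locale is a made-up path and initdb_command() is a hypothetical helper; --locale and -D are the initdb options for the cluster locale and the data directory.

```python
import shlex


def initdb_command(data_dir: str, locale: str = "C") -> list:
    """Build the argument list for initializing a cluster with a given locale.

    The locale applies to the new cluster only; the operating system's
    own locale settings are not touched.
    """
    return ["initdb", f"--locale={locale}", "-D", data_dir]


# Hypothetical data directory for a fresh cluster.
cmd = initdb_command("/srv/pg83-c-locale")
command_line = shlex.join(cmd)   # 'initdb --locale=C -D /srv/pg83-c-locale'

# To actually run it (as the postgres user), something like
# subprocess.run(cmd, check=True) would do.
```

Note that this initializes a new, empty cluster, so the databases still have to be restored into it from the dump afterwards.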

Quote:
Well, I solved it the other way, but maybe it helps someone else. In the end, I wouldn't want to be without UTF-8 anymore.
Nor would I.

Yours,
Laurenz Albe

#7
M. Strobel
Re: The Encoding problem - 12-31-2009, 03:03 PM

Hadanite Marasek wrote:
Quote:
Laurenz Albe wrote:
Hadanite Marasek wrote:
What exactly do you mean by "Postgres 8"?
PostgreSQL 8.0.0

Ouch.

What's Ouch?

Ah, you want to create databases with different encodings in one
database cluster, right?

In PostgreSQL 8.3, you can do that if you create your cluster
with the C locale.

Yes, but maybe to give you more details: I run PostgreSQL on a server
with other services, as it serves as storage for two internal
web applications with little concurrent access. So I didn't see that as
an option.
In the end, I changed the dump as well as the one application that still
ran on Latin1 (wanted to do that anyway).

I understand your gripe in this case.
It is only mentioned in the release notes, but I guess it would have
deserved a mention in chapter 15.4.

Nobody is perfect.

But your problem should be addressed with the above, right?

Yours,
Laurenz Albe

Well, I solved it the other way, but maybe it helps someone else. In the
end, I wouldn't want to be without UTF-8 anymore.
You are SO RIGHT, I am annoyed too that you cannot restore your dump without a very detailed
investigation.

It is a big waste of time for me.

/Str.
