Character encoding in database dumps

comp.databases.postgresql.novice

#1
Lynna Landstreet
Character encoding in database dumps - 06-09-2004, 01:11 PM

Hi there,

My database was created with Unicode encoding, as a fair number of the
artists and exhibitions have French names, and thus include accented
characters. I thought I'd solved all the various problems this generates,
but have just discovered one new one.

When I do a dump of the database to back it up (because I'm paranoid and
don't want my web host to be my only source of security on this matter), the
accented characters don't come through correctly. They all display correctly
in phpPgAdmin, and in the PHP pages I've made to search and display the data
- it's just in dumps that they're garbled. I presume this means the Unicode
encoding is somehow being lost and the data read as standard ASCII.

I'm currently using phpPgAdmin to export the data, and it gives me a choice
of downloading it as a .sql file or displaying it in the browser window, but
either way, the special characters are toasted. Does anyone have any idea
how I can do a database dump that keeps the character encoding intact?

Thanks,

Lynna

--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo (AT) postgresql (DOT) org


#2
Aarni Ruuhimäki
Re: Character encoding in database dumps - 06-10-2004, 07:19 AM

Hi Lynna,

Have you tried restoring your data from the dumps?

I use Latin1 encoding with my databases. All Russian characters are a mess in
the dump files and in the psql terminal as well. However, they restore, display
and distribute OK, even onto Windows. Out of curiosity, I renamed one
dump file to dump.html, opened it in a browser and set the browser
encoding to Cyrillic. The characters displayed OK.

So could this be more an issue with the client application's ability to use
the proper encoding to display the characters correctly? Dunno for sure, just
my thought from some experience with Cyrillic and Pg. Encodings are a bit of
a jungle, it seems.
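
A minimal Python sketch of the effect described above, not taken from the thread: the sample name is made up, and Latin-1 stands in for whatever encoding a viewer might wrongly assume. The bytes never change; only the interpretation does.

```python
# Hypothetical sample value; any text with accented characters would do.
name = "Hélène Côté"

# The bytes a UTF-8 ("Unicode") database would write into a dump file.
dump_bytes = name.encode("utf-8")

# A viewer that assumes Latin-1 turns each accented letter into two characters.
as_latin1 = dump_bytes.decode("latin-1")   # "HÃ©lÃ¨ne CÃ´tÃ©"

# A viewer that assumes UTF-8 recovers the original text.
as_utf8 = dump_bytes.decode("utf-8")       # "Hélène Côté"

# The garbled view is lossless: re-encoding it gives back the same bytes,
# which is why such a dump can still restore correctly.
round_trip = as_latin1.encode("latin-1")
```

Here `round_trip` is equal to `dump_bytes`, so the dump file itself is intact even though it looks wrong on screen.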

BR,

Aarni

--
-------------------------------------------------
Aarni Ruuhimäki
-------------------------------------------------
This is a bugfree broadcast to you from a linux system.




#3
Markus Bertheau
Re: Character encoding in database dumps - 06-10-2004, 12:51 PM

On Wed, 09.06.2004, at 20:11, Lynna Landstreet wrote:
Quote:
Hi there,

My database was created with Unicode encoding, as a fair number of the
artists and exhibitions have French names, and thus include accented
characters. I thought I'd solved all the various problems this generates,
but have just discovered one new one.

When I do a dump of the database to back it up (because I'm paranoid and
don't want my web host to be my only source of security on this matter), the
accented characters don't come through correctly. They all display correctly
in phpPgAdmin, and in the PHP pages I've made to search and display the data
- it's just in dumps that they're garbled. I presume this means the Unicode
encoding is somehow being lost and the data read as standard ASCII.

I'm currently using phpPgAdmin to export the data, and it gives me a choice
of downloading it as a .sql file or displaying it in the browser window, but
either way, the special characters are toasted. Does anyone have any idea
how I can do a database dump that keeps the character encoding intact?

How do you determine that the character encoding is wrong? Maybe
whatever it is you use to look at the dump just doesn't interpret the
data as UTF-8.
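
One way to answer that question without relying on a viewer is to test the bytes directly. The following is a rough Python sketch; the function names and the sample INSERT statement are illustrative assumptions, not anything from the thread.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def non_ascii_lines(data: bytes, limit: int = 5) -> list:
    """Return up to `limit` decoded lines that contain non-ASCII bytes."""
    hits = []
    for raw in data.splitlines():
        if any(b > 0x7F for b in raw):
            # errors="replace" keeps this usable even on damaged input.
            hits.append(raw.decode("utf-8", errors="replace"))
            if len(hits) >= limit:
                break
    return hits


# Hypothetical dump content; a real check would read the dump file
# in binary mode, e.g. open(path, "rb").read().
good = "INSERT INTO artists VALUES ('Hélène');\n".encode("utf-8")
bad = "INSERT INTO artists VALUES ('Hélène');\n".encode("latin-1")

print(is_valid_utf8(good))  # True
print(is_valid_utf8(bad))   # False
```

If the check passes and the accented lines look right when decoded as UTF-8, the dump is fine and the problem lies with the program used to view it.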

--
Markus Bertheau <twanger (AT) bluetwanger (DOT) de>





#4
Lynna Landstreet
Re: Character encoding in database dumps - 06-11-2004, 02:06 PM

on 6/10/04 8:19 AM, Aarni Ruuhimäki at aarni (AT) kymi (DOT) com wrote:

Quote:
Have you tried restoring your data from the dumps?

No - I wanted to, but my web host's control panel appears to be down so I
can't create a new database to restore it into, and I don't want to risk
restoring it into the existing database in case it screws everything up. And
the dump file was still messed up when I tried viewing it in my browser, as
you suggest.

However, I found the answer. I should have actually figured it out sooner,
because I originally had the same problem in reverse when trying to get data
*into* the database: FTP programs tend to send text files in ASCII format,
which doesn't support Unicode. When I uploaded the text files to \copy my
data into the database from, I had to make sure the FTP program used binary
mode, and in order to make it do that, I had to change the file extension to
something my internet preferences did not brand as ASCII automatically.

Turned out to be what was happening here as well. I had to skip using
phpPgAdmin's Export function and just use pg_dump via a shell connection,
and name the dump something that did not end in .txt or .sql or anything
like that, and then download it as binary, and *then* open it in BBEdit with
the "Read as" option set to UTF-8. Once I did that, all my special
characters were OK.
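
The steps above could be scripted roughly as follows. This is only a sketch: the database name and file name are hypothetical, and it assumes a pg_dump release that accepts the `--encoding` and `--file` options (older releases may not).

```python
import subprocess


def build_pg_dump_command(dbname, outfile, encoding="UTF8"):
    """Return a pg_dump argument list that writes a plain-text dump to outfile."""
    return ["pg_dump", "--encoding=" + encoding, "--file=" + outfile, dbname]


def run_dump(dbname, outfile):
    """Run pg_dump; requires pg_dump on PATH and access to the database."""
    subprocess.run(build_pg_dump_command(dbname, outfile), check=True)


def read_dump(path):
    """Read a downloaded dump, decoding it explicitly as UTF-8."""
    # newline="" keeps line endings exactly as they are in the file.
    with open(path, encoding="utf-8", newline="") as fh:
        return fh.read()


# Hypothetical names: a database called "gallery" and an extension-free
# file name, so transfer tools are less likely to treat it as ASCII text.
print(build_pg_dump_command("gallery", "gallery_backup"))
```

`read_dump` plays the role of the "Read as UTF-8" option in the editor: it states the encoding explicitly instead of letting the program guess.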

Major headache, though. I'm hoping as things evolve, Unicode support will be
built into more programs including those where it's currently not thought
necessary like FTP programs. People need to learn that ASCII is *not*
necessarily the correct format for all text files...


Lynna

--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org





#5
Bruno Wolff III
Re: Character encoding in database dumps - 06-11-2004, 02:33 PM

On Fri, Jun 11, 2004 at 15:06:08 -0400,
Lynna Landstreet <lynna (AT) gallery44 (DOT) org> wrote:
Quote:
Major headache, though. I'm hoping as things evolve, Unicode support will be
built into more programs including those where it's currently not thought
necessary like FTP programs. People need to learn that ASCII is *not*
necessarily the correct format for all text files...

ASCII mode in FTP is for text files whose line endings need to be converted
between LF and CRLF. Normally you want to do binary transfers.
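
As an illustration of a binary transfer, here is a minimal sketch using Python's standard `ftplib`. The host, login and file names in the comments are placeholders, not details from this thread.

```python
from ftplib import FTP


def download_binary(ftp: FTP, remote_name: str, local_path: str) -> None:
    """Fetch remote_name byte-for-byte using a binary-mode transfer."""
    # Open the local file in binary mode too, so nothing is translated
    # on the way to disk.
    with open(local_path, "wb") as fh:
        ftp.retrbinary("RETR " + remote_name, fh.write)


# Usage sketch (placeholder host and credentials):
#   ftp = FTP("ftp.example.org")
#   ftp.login("user", "password")
#   download_binary(ftp, "gallery_backup", "gallery_backup")
#   ftp.quit()
```

`retrbinary` transfers the file as an uninterpreted byte stream, so multi-byte UTF-8 characters and line endings arrive unchanged.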




#6
M. Bastin
Re: Character encoding in database dumps - 06-11-2004, 05:32 PM

Quote:
Does anyone have any idea
how I can do a database dump that keeps the character encoding intact?

If you can access your database directly with a TCP/IP PostgreSQL
client, then you can use Eduphant, which supports COPY to STDOUT and from
STDIN. This will allow you to copy directly to your local computer in one step.

<http://aliacta.com/download>

Cheers,

Marc




#7
ghaverla@freenet.edmonton.ab.ca
Re: Character encoding in database dumps - 06-11-2004, 07:12 PM

On Sat, 12 Jun 2004, M. Bastin wrote:

Quote:
Does anyone have any idea
how I can do a database dump that keeps the character encoding intact?

[ Sorry, idle mind. :-) ]

Reading a bunch of these messages over the last while, it appears
that the database will only support a single character encoding
internally/natively. Maybe I am wrong, and maybe this will change
in the future.

But, if you need to interact in multiple character sets now, this
seems to be something which you are now doing with a front-end of
some kind. Perhaps the thing to do is to add a field to your
tables (or make them into 2-column arrays?) involving character
data, where this new field (or 0th array element?) stores the
character set used when the data was originally input. If a
front-end makes a query involving character data, it gets back the
character set involved and the character data. Then the front-end
has to deal with translating from one character set to another.
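
A rough Python sketch of that idea, with every name invented for illustration: each value carries the character set it was entered in, and the front-end decodes or transcodes it on the way out.

```python
from dataclasses import dataclass


@dataclass
class TaggedText:
    """Character data stored together with the charset it was entered in."""
    charset: str  # e.g. "latin-1" or "koi8-r"
    raw: bytes    # the bytes exactly as stored

    def to_unicode(self) -> str:
        """Decode the stored bytes using their recorded charset."""
        return self.raw.decode(self.charset)

    def transcode(self, target: str) -> "TaggedText":
        """Return the same text re-encoded in another charset."""
        return TaggedText(target, self.to_unicode().encode(target))


# Hypothetical rows entered through two different front-ends.
rows = [
    TaggedText("latin-1", b"Andr\xe9"),
    TaggedText("koi8-r", "Привет".encode("koi8-r")),
]

# The front-end converts everything to one charset for display.
for row in rows:
    print(row.transcode("utf-8").raw)
```

The cost of this design is that every consumer has to honour the charset field; text decoded with the wrong charset fails or comes out garbled, exactly as in the dumps discussed earlier in the thread.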

Just my $0.02 (CDN)
Gord






#8
Lynna Landstreet
Re: Character encoding in database dumps - 06-17-2004, 12:32 PM

on 6/11/04 3:33 PM, Bruno Wolff III at bruno (AT) wolff (DOT) to wrote:

Quote:
Major headache, though. I'm hoping as things evolve, Unicode support will be
built into more programs including those where it's currently not thought
necessary like FTP programs. People need to learn that ASCII is *not*
necessarily the correct format for all text files...

ASCII mode in FTP is for text files whose line endings need to be converted
between LF and CRLF. Normally you want to do binary transfers.

I'm used to having to use ASCII on anything remotely CGI-related because
normally if I do binary transfers for any sort of CGI file it breaks the
script. But maybe that is the LF issue - I know MacOS, Windows and UNIX all
use different sorts of line returns and since CGIs are usually running on
UNIX systems, Mac or Windows line returns will mess everything up. I don't
know if PHP scripts are as sensitive...
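
For scripts that do need UNIX line endings, the conversion can be done explicitly before a binary upload rather than left to the FTP client. A small Python sketch (the function name and sample script are made up):

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and classic Mac OS (CR) line endings to LF."""
    # CRLF must be handled first so it does not become two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Hypothetical CGI script saved with Windows line endings.
script = b"#!/usr/bin/perl\r\nprint \"caf\xc3\xa9\";\r\n"
print(to_unix_newlines(script))
```

Working on bytes keeps the text encoding out of the picture: every byte of a multi-byte UTF-8 character is 0x80 or higher, so it can never be mistaken for CR (0x0D) or LF (0x0A).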

But the bigger problem is that it seems like these days most FTP programs -
as well as browsers and text editors that have some FTP functionality -
automatically set the transfer mode based on the file extension (and text
editors like BBEdit *only* have ASCII transfer mode, presumably on the basis
that, hey, it's text, and text *always* means ASCII, right?). Some will
allow you to override that, some won't. That's why I had to change the
extension on my text data files and database dumps in order for a binary
transfer to even be possible.

It's really annoying - I hate software that thinks it knows what I'm trying
to do better than I do.


Lynna

--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org




