dbTalk Databases Forums  

pg_dump problem

comp.databases.postgresql.novice comp.databases.postgresql.novice


Discuss pg_dump problem in the comp.databases.postgresql.novice forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Noel Faux
 
Posts: n/a

Default pg_dump problem - 05-12-2004 , 01:58 AM






Hi all,

We are trying to do a dump of our database using the command line.
However, the dump falls over during a dump of a particular table.

blast=> \d psiblast_round
Table "public.psiblast_round"
Column | Type | Modifiers
----------------------+-------------------+-----------
id | integer | not null
region_db_comparison | integer |
round_number | smallint |
matrix | character varying |
runtime | integer |
Indexes: psiblast_round_pkey primary key btree (id),
psiblast_round_region_db_comp btree (region_db_comparison),
psiblast_round_round_number btree (round_number)
Foreign Key constraints: $1 FOREIGN KEY (region_db_comparison)
REFERENCES region_db_comparison(id) ON UPDATE NO ACTION ON DELETE NO ACTION

The matrix field holds a string which is very long, in some cases ~30K+
char long.

The erorr throw is:
pg_dump: message type 0x44 arrived from server while idle
pg_dump: dumpClasses(): SQL command failed
pg_dump: Error message from server: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor

We are using version: psql (PostgreSQL) 7.3.4-RH

Is the var char the best type to hold such a long string and is this
likely to be the source of the problem? There is another table which
hold similar info, including the matrix field and we are able to dump
that without any problems.

Any help would be much appreciated. If there is any further info
required, I'll do my best to get it


Cheers
Noel





---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo (AT) postgresql (DOT) org


Reply With Quote
  #2  
Old   
M. Bastin
 
Posts: n/a

Default ORDER BY and Unicode - 05-12-2004 , 07:39 AM






There seems to be a big problem with Unicode for
which a solution might already exist. Somebody
had the following problem on another mailing
list. My suggestion is at the bottom of this
message but if another solution already exists
I'd like to hear about it.

The problem is that special characters aren't
treated right under Unicode. Here are a few
examples:

1. "UPPER('é')" doesn't work.
(That's an accented "e" in there if it doesn't
come through in your e-mail application)

The implication of this is that

SELECT ... WHERE UPPER(mycolumn) LIKE UPPER('my search string')

doesn't give the functionality you'd want.
UPPER and LOWER seem to work on ASCII only. The
Greek, French, etc. are out of luck.


2. "ORDER BY mycolumn" gives a wrong sort order.

Uppercase ASCII characters come first, then
lowercase ASCII, then accented characters...
This really isn't what a human would like to see.


I think the two examples above illustrate this
Unicode problem quite well. Is there an existing
solution? If not could we work together on
creating one, as suggested at the very bottom of
this message?

Thanks,

Marc

------------------------------------------------------

You can use the translate function to solve your problem.
<http://www.postgresql.org/docs/7.4/interactive/functions-string.html>

e.g. for the letter "a":

SELECT * FROM mytable ORDER BY
translate(textcolumn, 'àáâäÀÁÂÄa', 'AAAAAAAAA');

Then you build an index like this to speed things up:

CREATE INDEX MyIndex ON MyTable
(translate(textcolumn, 'àáâäÀÁÂÄa', 'AAAAAAAAA'));

And your select queries will also be case and
accent independent from then on e.g:

SELECT * FROM mytable WHERE
translate(textcolumn, 'àáâäÀÁÂÄa', 'AAAAAAAAA')
LIKE translate('my search string', 'àáâäÀÁÂÄa',
'AAAAAAAAA');


Since the alphabet is very long you're better off
creating your own function that does this
translation for the whole alphabet and then you'd
get something like this:

SELECT * FROM mytable WHERE
MySimpleABC(textcolumn) LIKE MySimpleABC('my
search string') ORDER BY MySimpleABC(textcolumn);

Your index would be like this:

CREATE INDEX MyIndex ON MyTable (MySimpleABC(textcolumn));

Cheers,

Marc

PS: Maybe we should work together to create the
mother of all functions that would do this for
all of Unicode? Anybody else up to this?

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match


Reply With Quote
  #3  
Old   
Stephan Szabo
 
Posts: n/a

Default Re: ORDER BY and Unicode - 05-12-2004 , 08:51 AM



On Wed, 12 May 2004, M. Bastin wrote:

Quote:
There seems to be a big problem with Unicode for
which a solution might already exist. Somebody
had the following problem on another mailing
list. My suggestion is at the bottom of this
message but if another solution already exists
I'd like to hear about it.

The problem is that special characters aren't
treated right under Unicode. Here are a few
examples:

1. "UPPER('é')" doesn't work.
IIRC, right now upper and lower only work correctly in
single byte encodings. I think when full sql collation
and character set behavior is done this problem will
go away.

Quote:
2. "ORDER BY mycolumn" gives a wrong sort order.

Uppercase ASCII characters come first, then
lowercase ASCII, then accented characters...
This really isn't what a human would like to see.
This is driven by locale, what LC_COLLATE value
was the database created with (if you don't know then
pg_controldata should give that to you)?

It sounds like the locale is "C" locale which means
sort by byte value or perhaps the locale is one that isn't
for the correct encoding.

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html



Reply With Quote
  #4  
Old   
M. Bastin
 
Posts: n/a

Default Re: ORDER BY and Unicode - 05-12-2004 , 10:30 AM



Quote:
2. "ORDER BY mycolumn" gives a wrong sort order.

Uppercase ASCII characters come first, then
lowercase ASCII, then accented characters...
This really isn't what a human would like to see.

This is driven by locale, what LC_COLLATE value
was the database created with (if you don't know then
pg_controldata should give that to you)?

It sounds like the locale is "C" locale which means
sort by byte value or perhaps the locale is one that isn't
for the correct encoding.
I've found this:
<http://www.postgresql.org/docs/7.4/interactive/charset.html#LOCALE>

"locale -a" isn't recognized on OS X. How else can I find the
possible locales?

And how can I do an initdb so that sorting on Unicode will work for
French, Greek, Japanase, etc. users of a single database?

Thanks,

Marc

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match



Reply With Quote
  #5  
Old   
Stephan Szabo
 
Posts: n/a

Default Re: ORDER BY and Unicode - 05-12-2004 , 11:33 AM



On Wed, 12 May 2004, M. Bastin wrote:

Quote:
2. "ORDER BY mycolumn" gives a wrong sort order.

Uppercase ASCII characters come first, then
lowercase ASCII, then accented characters...
This really isn't what a human would like to see.

This is driven by locale, what LC_COLLATE value
was the database created with (if you don't know then
pg_controldata should give that to you)?

It sounds like the locale is "C" locale which means
sort by byte value or perhaps the locale is one that isn't
for the correct encoding.

I've found this:
http://www.postgresql.org/docs/7.4/i...et.html#LOCALE

"locale -a" isn't recognized on OS X. How else can I find the
possible locales?
Well, you can try looking around the system, maybe /usr/share/locale
or such would have the locale set.

Quote:
And how can I do an initdb so that sorting on Unicode will work for
French, Greek, Japanase, etc. users of a single database?
AFAIK, you can't really at this time. With an appropriately crafted
locale, you could probably get reasonably close, but I've never actually
tried to work with creating one so I don't know what's involved. And, if
two languages had different rules for two characters you'd not be
supporting both.

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
message can get through to the mailing list cleanly



Reply With Quote
  #6  
Old   
Tom Lane
 
Posts: n/a

Default Re: ORDER BY and Unicode - 05-12-2004 , 11:38 AM



Stephan Szabo <sszabo (AT) megazone (DOT) bigpanda.com> writes:
Quote:
IIRC, right now upper and lower only work correctly in
single byte encodings. I think when full sql collation
and character set behavior is done this problem will
go away.
That unfortunately isn't looking like something that will happen soon.
What might be more practical to do in the short run is to make use of
libc's <wctype.h> routines to do upper/lower case conversion, rather
than <ctype.h>. On machines that have mbstowcs() and wcstombs() I think
this might be fairly straightforward, but I don't have the time or
background to code it up and test it. Any volunteers out there?

Note that this would amount to assuming that LC_CTYPE matches the
database encoding, but the truth of the matter is that the existing
code fails if that's not true anyway.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
message can get through to the mailing list cleanly



Reply With Quote
  #7  
Old   
M. Bastin
 
Posts: n/a

Default Re: ORDER BY and Unicode - 05-12-2004 , 11:49 AM



Quote:
And how can I do an initdb so that sorting on Unicode will work for
French, Greek, Japanase, etc. users of a single database?

AFAIK, you can't really at this time. With an appropriately crafted
locale, you could probably get reasonably close, but I've never actually
tried to work with creating one so I don't know what's involved. And, if
two languages had different rules for two characters you'd not be
supporting both.
Thanks Stephan! I've found my list of locales. It's a pity only one
language can be used at a time but as you say there are conflicting
rules anyway.

The docs say there is a speed penalty on using locales. Does anyone
have any idea on how severe this is? I'm wondering wether I should
use the translate() function after all because of this. It would
solve multilingual issues to a certain level and there wouldn't be a
speed penalty since the indexes would be build on the translate()
function too.

Marc

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Reply With Quote
  #8  
Old   
Tom Lane
 
Posts: n/a

Default Re: pg_dump problem - 05-12-2004 , 12:04 PM



Noel Faux <noel.faux (AT) med (DOT) monash.edu.au> writes:
Quote:
The erorr throw is:
pg_dump: message type 0x44 arrived from server while idle
pg_dump: dumpClasses(): SQL command failed
pg_dump: Error message from server: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor
What shows up in the postmaster log when this happens?

If you're not keeping the postmaster's log output, now would be a good
time to start. Personally I'd suggest configuring it to log to syslog;
you'll need to enable this in postgresql.conf and possibly adjust your
syslogd configuration to determine exactly where Postgres messages go.
If that seems like too much trouble, adjust the init.d start script to
send the postmaster's stderr to some file instead of /dev/null.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Reply With Quote
  #9  
Old   
Oliver Fromme
 
Posts: n/a

Default Re: ORDER BY and Unicode - 05-12-2004 , 12:41 PM




I'd like to make a small comment on this one ...

M. Bastin wrote:
Quote:
"locale -a" isn't recognized on OS X. How else can I find the
possible locales?
On most UNIX-like systems they're under /usr/share/locale
or similar. Maybe the setlocale(3) manpage will help you
(at least on BSD it mentions the default location of the
locales).

Quote:
And how can I do an initdb so that sorting on Unicode will work for
French, Greek, Japanase, etc. users of a single database?
You can't do that in general, because different languages
can use different rules for sorting and case conversion of
the same characters.

For example, in the Turkish language there is a character
"i" without a dot, and "I" with a dot (I only have an ISO-
8859-15 set right now, so I can't demonstrate them), where
upper("i") == "I" with dot, and lower("I") == "i" without
dot. Most other languages have upper("i") == "I" and
lower("I") == "i", so it's not possible to have a locale
setting that supports both at the same time.

I guess that PostgreSQL's ORDER BY and case conversions
ignore the client's locale setting completely, and only
respect the locale of the database (when it was created).
Beware, I'm not a PostgreSQL developer, so don't take that
as an authoritative statement. ;-)

Regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co KG, Oettingenstr. 2, 80538 München
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"The scanf() function is a large and complex beast that often does
something almost but not quite entirely unlike what you desired."
-- Chris Torek

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster



Reply With Quote
  #10  
Old   
Stephan Szabo
 
Posts: n/a

Default Re: ORDER BY and Unicode - 05-12-2004 , 04:51 PM



On Wed, 12 May 2004, M. Bastin wrote:

Quote:
And how can I do an initdb so that sorting on Unicode will work for
French, Greek, Japanase, etc. users of a single database?

AFAIK, you can't really at this time. With an appropriately crafted
locale, you could probably get reasonably close, but I've never actually
tried to work with creating one so I don't know what's involved. And, if
two languages had different rules for two characters you'd not be
supporting both.

Thanks Stephan! I've found my list of locales. It's a pity only one
language can be used at a time but as you say there are conflicting
rules anyway.

The docs say there is a speed penalty on using locales. Does anyone
have any idea on how severe this is? I'm wondering wether I should
I'm not an expert really, but since you're already doing unicode I think
it's not going to be major with the one caveat that if you're doing LIKE
queries, you should look at the Operator Classes section of the
documentation about the *_pattern_ops operator classes.

Quote:
use the translate() function after all because of this. It would
solve multilingual issues to a certain level and there wouldn't be a
speed penalty since the indexes would be build on the translate()
function too.
The translate version would presumably work for cases where you want
multiple characters to sort to the same position, but if you want say an
accented A to follow a regular A I think it might be difficult to
formulate.

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.