[BUGS] Sorting Problem in UNICODE/german

mailing.database.pgsql-bugs


Klaus Ita
 
09-01-2005, 06:03 AM

Hi there!

I have a problem with a DB that was created in UNICODE

* createdb -E UNICODE

and actually shows that it _is_ in UNICODE.
I was able to input data and can read it, and everything is fine.
But when I want to "ORDER BY ..." it does not sort the German umlauts at the
correct position.

should be:
m n o ö p

and is:
ö a b c d

I have tried starting postgres with LC_ALL=de_AT.utf8@euro
locale but that did not help.

What could I do?

regs,
klaus

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Tom Lane
 
09-01-2005, 08:31 AM

Klaus Ita <postgres (AT) stro (DOT) at> writes:
Quote:
I have tried starting postgres with LC_ALL=de_AT.utf8@euro
locale but that did not help.

You need to run initdb under that setting. See "Localization" in
the documentation.

regards, tom lane

Klaus Ita
 
09-02-2005, 01:39 AM

On Thu, Sep 01, 2005 at 09:30:15AM -0400, Tom Lane wrote:
Quote:
Klaus Ita <postgres (AT) stro (DOT) at> writes:
I have tried starting postgres with LC_ALL=de_AT.utf8@euro
locale but that did not help.

I did read the docs and am still not quite happy with my sorting results.
OK, initdb has been rerun.

I made sure I had the locale:

locale -a

and created a new DB cluster with

LC_ALL=de_AT.utf8@euro initdb --locale=de_AT.utf8@euro -E UNICODE -D /dev/shm/pgutf8

and the sorting was still not right when I restored another UNICODE db.

another "funny" thing is:

ita@aipc54:~/.mutt$ LC_ALL=de_AT.utf8@euro sort /tmp/testfile
Abend
Oma
Österreich
Überflieger
Unter
Zetrix

This is also wrong (there should be 'Unter' and then 'U:berflieger' [Überflieger]). So is this a libc bug?


thank you for your help so far! I more than appreciate it. Support for this
DB is sooo much better than for oracle!

klaus

Andreas Seltenreich
 
09-02-2005, 02:57 AM

Klaus Ita wrote:

Quote:
On Thu, Sep 01, 2005 at 09:30:15AM -0400, Tom Lane wrote:
Klaus Ita <postgres (AT) stro (DOT) at> writes:
I have tried starting postgres with LC_ALL=de_AT.utf8@euro
locale but that did not help.


i did read the docs and am still not quite happy with my sorting results.
ok initdb has been rerun

made sure, i had the locale:

locale -a

created new db-cluster with
LC_ALL=de_AT.utf8@euro initdb --locale=de_AT.utf8@euro -E UNICODE -D /dev/shm/pgutf8

and then still the sorting was not right when i restored another
UNICODE db.

Well, I used the very same command with 8.0.3 to create a database,
and the sort order was correct:

--8<---------------cut here---------------start------------->8---
scratch=# select w from w order by w;
w
-------------
Abend
Oma
Österreich
Überflieger
Unter
Zetrix
(6 rows)
--8<---------------cut here---------------end--------------->8---

So I guess there was some misconfiguration of your current
client_encoding during import, or maybe the dump of your unicode db
got unexpectedly converted by improper settings during dumping.

Quote:
another "funny" thing is:

ita@aipc54:~/.mutt$ LC_ALL=de_AT.utf8@euro sort /tmp/testfile
Abend
Oma
Ãterreich
Ãerflieger
Unter
Zetrix

this is also wrong (There should be 'Unter' and then 'U:berflieger'
[Überflieger]). so is this a libc bug?

The sort order is correct, so libc did succeed in its part. Maybe your
terminal is having issues with utf-8? If you're using xterm: Did you
run it with -u8 or some utf-8-enabling X-resource? To verify that the
terminal is working properly, typing

echo ö > /tmp/foo
file /tmp/foo

on a shell should tell you that you have a utf-8 text file.
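
The same check can be sketched in Python. This is only an illustration, and the helper names `is_valid_utf8` and `shown_as_latin1` are made up for it. The first function tests whether a byte string decodes as UTF-8, which is roughly what `file` reports. The second shows what correctly encoded UTF-8 turns into when a program misreads it as Latin-1, which resembles the "Ã" in the listing quoted above.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def shown_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then misread those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    # Bytes a UTF-8 terminal would write for "echo ö".
    print(is_valid_utf8("ö\n".encode("utf-8")))
    # Bytes a Latin-1 terminal would write for the same command.
    print(is_valid_utf8("ö\n".encode("latin-1")))
    # ascii() makes both characters that replaced the single "Ö" visible,
    # including the control character that a terminal would not display.
    print(ascii(shown_as_latin1("Österreich")))
```

If the second character of the pair is an invisible control character, a terminal or mail client may swallow it, which would explain a display such as "Ã" followed directly by the rest of the word.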

HTH
Andreas
--

Andreas Seltenreich
 
09-02-2005, 04:59 AM

Sorry, I just reread your mail: Your MUA is declaring it with

Quote:
Content-Type: text/plain; charset=unknown-8bit

This makes it even harder to discuss problems with umlauts :-).

Andreas Seltenreich wrote:

Quote:
Klaus Ita wrote:

another "funny" thing is:

ita@aipc54:~/.mutt$ LC_ALL=de_AT.utf8@euro sort /tmp/testfile
Abend
Oma
Österreich
Überflieger
Unter
Zetrix

this is also wrong (There should be 'Unter' and then 'U:berflieger'
[Überflieger]). so is this a libc bug?

I think I got your point now. Libc appears to be using ISO 14651
sorting for all "de" locales. I'm afraid you will have to compile a
customized locale to depart from that.
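
The difference between the two orderings can be illustrated with a small Python sketch. It is neither the ISO 14651 algorithm nor glibc's locale tables, only two simplified sort keys that reproduce the two results seen in this thread. The names `base_letter_key` and `separate_letter_key` are invented for the example. The first key ignores diacritics on the first pass, so "Überflieger" lands before "Unter". The second treats an umlaut as its own letter placed right after its base vowel, which gives the order Klaus expects.

```python
import unicodedata

WORDS = ["Unter", "Zetrix", "Überflieger", "Oma", "Österreich", "Abend"]


def base_letter_key(word):
    """Compare words with diacritics stripped; the original breaks ties."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (stripped.casefold(), word)


def separate_letter_key(word):
    """Rank each accented letter directly after its unaccented base letter."""
    key = []
    for ch in unicodedata.normalize("NFC", word).casefold():
        decomposed = unicodedata.normalize("NFD", ch)
        has_mark = len(decomposed) > 1
        key.append((decomposed[0], 1 if has_mark else 0))
    return key


if __name__ == "__main__":
    print("diacritics ignored:  ", ", ".join(sorted(WORDS, key=base_letter_key)))
    print("umlaut as own letter:", ", ".join(sorted(WORDS, key=separate_letter_key)))
```

With the first key "Überflieger" is compared as "uberflieger", and "ub" sorts before "un". With the second key any plain "u" sorts before any "ü", so "Unter" comes first.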

regards,
Andreas
--

Posts: n/a

09-02-2005, 09:29 AM

Andreas Seltenreich <andreas+pg (AT) gate450 (DOT) dyndns.org> writes:
Quote:
Klaus Ita wrote:
this is also wrong (There should be 'Unter' and then 'U:berflieger'
[Überflieger]). so is this a libc bug?

I think I got your point now. Libc appears to be using iso-14651
sorting for all "de" locales. I'm afraid you will have to compile a
customized locale to depart from that.

I wouldn't call it a libc bug, but a bug in the locale definition.
In any case it doesn't appear to be Postgres' problem --- if we sort
the same way "sort" does under the same locale setting, then we are
doing what we expect.
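
One way to see what the C library itself does under a given locale, independent of both Postgres and "sort", is sketched below in Python. It assumes, as the thread implies, that both programs defer to the C library's collation. The function name `sort_with_libc` is made up for the sketch. Python's `locale.strxfrm` hands the comparison to the C library, and the function returns None when the requested locale is not installed.

```python
import locale

WORDS = ["Unter", "Zetrix", "Überflieger", "Oma", "Österreich", "Abend"]


def sort_with_libc(words, locale_name):
    """Sort words with the C library's collation for locale_name.

    Returns None if the locale cannot be selected on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        # Restore the collation setting that was active before the call.
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    # The result depends on the locale definitions installed locally.
    print(sort_with_libc(WORDS, "de_AT.utf8"))
```

If this prints the same order as `sort` and as `ORDER BY`, the ordering comes from the locale definition rather than from either program.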

I think at this point Klaus needs to find some people who know about
hacking locale definitions. I sure don't know enough about them to
help further. Is there a libc mailing list anywhere?

One thing I do know --- if you install a new version of the locale
Postgres is using, you'd better re-initdb, or at least REINDEX all
your indexes on textual columns. Changing sort order is equivalent
to making such indexes corrupt.
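
Why a changed sort order breaks an existing index can be shown with a toy model in Python. This is not how a Postgres B-tree is implemented. A sorted list searched with binary search merely stands in for the index, and `old_key`, `code_point_key` and `lookup` are names invented for the sketch. The list is built under one ordering and then searched under another.

```python
import bisect
import unicodedata

WORDS = ["Unter", "Zetrix", "Überflieger", "Oma", "Österreich", "Abend"]


def old_key(word):
    """Ordering in effect when the index was built: diacritics ignored."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def code_point_key(word):
    """Stand-in for a changed locale definition: plain code point order."""
    return word


def lookup(index, word, key):
    """Binary search that assumes index is sorted according to key."""
    keys = [key(entry) for entry in index]
    pos = bisect.bisect_left(keys, key(word))
    return pos < len(index) and index[pos] == word


if __name__ == "__main__":
    index = sorted(WORDS, key=old_key)            # built under the old order
    print(lookup(index, "Unter", old_key))         # search with the old order
    print(lookup(index, "Unter", code_point_key))  # search with the new order
    rebuilt = sorted(index, key=code_point_key)    # the "REINDEX" step
    print(lookup(rebuilt, "Unter", code_point_key))
```

Under the new ordering the search goes down the wrong half of the list and reports that "Unter" is missing although it is stored. Rebuilding the list under the new ordering fixes the lookup, which is the role REINDEX plays here.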

regards, tom lane
