dbTalk Databases Forums  

[BUGS] BUG #1775: UTF8 to ISO does not convert the german "sharp s" (ß)

mailing.database.pgsql-bugs mailing.database.pgsql-bugs


Discuss [BUGS] BUG #1775: UTF8 to ISO does not convert the german "sharp s" (ß) in the mailing.database.pgsql-bugs forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Marcus Raphelt
 
Posts: n/a

Default [BUGS] BUG #1775: UTF8 to ISO does not convert the german "sharp s" (ß) - 07-19-2005 , 09:28 AM







The following bug has been logged online:

Bug reference: 1775
Logged by: Marcus Raphelt
Email address: marcus (AT) raphelt (DOT) de
PostgreSQL version: latest
Operating system: WinXP (dev), Win2000 (server)
Description: UTF8 to ISO does not convert the german "sharp s"
(&szlig
Details:

Hi,

I currently have a iso_8859_1 app and included the opengeodb tables to have
a set of german cities and zipcodes.

http://opengeodb.sourceforge.net/index.php?FrontPage_en
I downloaded this dump:

http://prdownloads.sourceforge.net/o...-UTF8-postgres.
zip?download

They offer a Postgresql dump that can be imported without hassle, but is in
UTF8. I did several attempts using convert() to convert the city names to
iso. It works, and it also works fine on german umlauts (äöü), but it
seems like convert() does NOT convert the german "ß" / sharp s /
ß...

cheers,
marcus

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Reply With Quote
  #2  
Old   
Peter Eisentraut
 
Posts: n/a

Default Re: [BUGS] BUG #1775: UTF8 to ISO does not convert the german "sharp s" (ß) - 07-19-2005 , 03:23 PM






Magnus Hagander wrote:
Quote:
PostgreSQL 8.0 does not support UTF8 on Win32. You will need to
convert the file before you load it into the database - look at for
example the GNU iconv program to do this.
PostgreSQL does not support UTF8 locales on Windows, but the character
conversion should work the same on all platforms, shouldn't it?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq


Reply With Quote
  #3  
Old   
Magnus Hagander
 
Posts: n/a

Default Re: [BUGS] BUG #1775: UTF8 to ISO does not convert the german "sharp s" (ß) - 07-19-2005 , 03:42 PM



Quote:
PostgreSQL 8.0 does not support UTF8 on Win32. You will need to=20
convert the file before you load it into the database - look at for=20
example the GNU iconv program to do this.
=20
PostgreSQL does not support UTF8 locales on Windows, but the=20
character conversion should work the same on all platforms,=20
shouldn't it?
I thought that was also handled in the locale code. But now that you say
it, it's probably not - it should be the same.
I'll withdraw my comment then, clearly wrong. Thanks for clearing that
up.


//Magnus

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend


Reply With Quote
  #4  
Old   
John Hansen
 
Posts: n/a

Default Re: [BUGS] BUG #1775: UTF8 to ISO does not convert the german "sharp s" (ß) - 07-19-2005 , 05:36 PM



Magnus Hagander Wrote:
Quote:
PostgreSQL 8.0 does not support UTF8 on Win32. You will need to=20
convert the file before you load it into the database -=20
look at for=20
example the GNU iconv program to do this.
=20
PostgreSQL does not support UTF8 locales on Windows, but=20
the character=20
conversion should work the same on all platforms, shouldn't it?
=20
I thought that was also handled in the locale code. But now=20
that you say it, it's probably not - it should be the same.
I'll withdraw my comment then, clearly wrong. Thanks for=20
clearing that up.
Currently, it is very much so handled by the systems locale support,
hence it is handled differently depending on platform.

ICU support should make it to 8.1 I hear, which will make the bahaviour
uniform.

.... John

Quote:
=20
=20
//Magnus
=20
---------------------------(end of=20
broadcast)---------------------------
TIP 6: explain analyze is your friend
=20
=20
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings


Reply With Quote
  #5  
Old   
Marcus Raphelt
 
Posts: n/a

Default Re: [BUGS] BUG #1775: UTF8 to ISO does not convert the german "sharp s" (ß) - 07-20-2005 , 08:21 AM



Hi,

Am Mittwoch, 20. Juli 2005 um 10:27 schrieben Sie:

TI> Problem here is the result of convert(ort using utf_8_to_iso_8859_1)
TI> is ISO-8859-1 but your database encoding is UTF-8,

Not exactly, my database is in iso_8859_1, only the opengeodb-dump is
solely available in utf-8.
The whole thing is no problem so far, as I can convert the values in
the application that uses the data - I just wanted to let you know.

TI> I guess what you want to do is:
TI> SELECT plz, loc_id, ort from orte_de
TI> where plz between 20000 and 30000
TI> order by convert(ort using utf_8_to_iso_8859_1)

Well, that was a fantasy-query just to show some "odd" values, I do
not need that much data in the application. Currently, I cannot
retrieve ready-converted iso data from the orte_de view, as converted
names that contain "ß" are not shown...


--
Mit freundlichen Grüssen
Marcus Raphelt mailto:marcus (AT) raphelt (DOT) de


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
message can get through to the mailing list cleanly

Reply With Quote
  #6  
Old   
Marcus Raphelt
 
Posts: n/a

Default Re: [BUGS] BUG #1775: UTF8 to ISO does not convert the german "sharp s" (ß) - 07-20-2005 , 08:23 AM



Hi Tatsuo,

Am Mittwoch, 20. Juli 2005 um 01:00 schrieben Sie:

TI> conversion tables. So if german umlauts are converted fine, there's no
TI> reason the conversion for german sharp s does not work.

TI> Marcus,

TI> Can you give me the exact error message from PostgreSQL when the
TI> conversio failed?

Well - actually, there IS no error message, convert() just returns an
empty sting. In detail, here's what I did:

-Set up PostgreSQL (Ascii)

-Imported the opengeodb
http://sourceforge.net/project/showf...roup_id=132421
As they offer a PostgreSQL dump, I chose this one.

-Imported the dump using pgAdminIII

-Created a view that returns all german cities with ZIP and
opengeodb-locationID:

CREATE OR REPLACE VIEW orte_de AS
SELECT code.text_val AS plz, code.loc_id, town.text_val AS ort
FROM geodb_hierarchies hi, geodb_textdata state, geodb_textdata town, geodb_textdata code
WHERE hi.id_lvl2 = state.loc_id AND state.text_val = 'DE'::text
AND state.text_type = 500100001 AND town.loc_id = hi.loc_id
AND town.text_type = 500100000 AND code.loc_id = town.loc_id
AND code.text_type = 500300000;

So now I've got three columns: "plz" (zip), "ort" (city) and loc_id.
Assuming I want to retrieve cites in the Hamburg / Hannover area...
(perfect for this task, as they have pretty strange city names there


SELECT plz, loc_id, ort, convert(ort using utf_8_to_iso_8859_1) as
ort_conv
from orte_de
where plz between 20000 and 30000
order by ort_conv

This query returns empty values for "ort_conv" if "ort" contains a
sharp s.
Btw, it seems like it is the same for "Ä" (&Auml, have a look at
loc_id 25182.

--
Mit freundlichen Grüssen
Marcus Raphelt mailto:marcus (AT) raphelt (DOT) de


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Reply With Quote
  #7  
Old   
Marcus Raphelt
 
Posts: n/a

Default Re: [BUGS] BUG #1775: UTF8 to ISO does not convert the german "sharp s" (ß) - 07-20-2005 , 08:25 AM



Hi,

PE> PostgreSQL does not support UTF8 locales on Windows, but the character
PE> conversion should work the same on all platforms, shouldn't it?

It works pretty fine even under win32, all umlauts are coverted as
expexted, except for the "ß" (&szlig.
For example, "Köln" converts to "Köln", but ""Groß Kreutz" results
in an empty string.

--
Mit freundlichen Grüssen
Marcus Raphelt mailto:marcus (AT) raphelt (DOT) de


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Reply With Quote
  #8  
Old   
Bruno Wolff III
 
Posts: n/a

Default Re: [BUGS] BUG #1775: UTF8 to ISO does not convert the german "sharp s" (ß) - 07-20-2005 , 10:44 AM



On Wed, Jul 20, 2005 at 08:35:33 +1000,
John Hansen <john (AT) geeknet (DOT) com.au> wrote:
Quote:
ICU support should make it to 8.1 I hear, which will make the bahaviour
uniform.
There was some discussion about that, but my impression was that that
wasn't going to happen.

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match


Reply With Quote
  #9  
Old   
Magnus Hagander
 
Posts: n/a

Default Re: [BUGS] BUG #1775: UTF8 to ISO does not convert the german "sharp s" (&szlig;) - 07-20-2005 , 10:47 AM



Quote:
ICU support should make it to 8.1 I hear, which will make the=20
bahaviour uniform.
=20
There was some discussion about that, but my impression was=20
that that wasn't going to happen.
AFAIK the result was that it would go in as an alternative, but not to
*replace* what's there now.

//Magnus

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match


Reply With Quote
  #10  
Old   
Tom Lane
 
Posts: n/a

Default Re: [BUGS] BUG #1775: UTF8 to ISO does not convert the german "sharp s" (&szlig;) - 07-20-2005 , 12:39 PM



"Magnus Hagander" <mha (AT) sollentuna (DOT) net> writes:
Quote:
ICU support should make it to 8.1 I hear, which will make the
bahaviour uniform.

There was some discussion about that, but my impression was
that that wasn't going to happen.

AFAIK the result was that it would go in as an alternative, but not to
*replace* what's there now.
The whole issue is still TBD as far as I know --- there were arguments
on both sides but not consensus, and if any committer has reviewed the
ICU patch in any detail, I didn't hear about it.

It'd probably be easier to get consensus for a "make it a compile-time
option" proposal than for a "wholesale replacement" proposal.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.