dbTalk Databases Forums  

Question about non-English characters

comp.databases comp.databases


Discuss Question about non-English characters in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Snoopy Dogg
 
Posts: n/a

Default Question about non-English characters - 08-11-2003 , 05:43 AM






Hi,
I've created a database of composers and their works on Postgres 7.3
(on FreeBSD 4.7). It was an import of a spreadsheet done by a friend and
contaions about 15,000 entries. He was very meticulous about entering
the data and used all of the correct characters in the composers' names
and the names of their works, for example, the last 'a' in 'Antonin
Dvorak' was entered with the Czech 'a' with an accent above it as well
as the umlauts in the German titles and names and the accents in French
words. I've converted all of the names to lower case, as I'll deal with
case conversions in display.
I want to retain the correct characters in the data. However, I want
to be able to use the Western 'a' character in queries, for example:
SELECT c.last_name, w.work_name FROM composers c, works w WHERE
c.composer_id = w.composer_id AND c.last_name LIKE 'dvorak';

The above query does not return the desired result if I use an English
'a'. I'm working on a web frontend and don't want the user to have to
paste in all sorts of characters. Can I create a many-to-one mapping of
accented or marked up characters to alphabetic characters? Any other
ideas? I'm sure I'm not the first person to think of this...

Paul Marquardt


Reply With Quote
  #2  
Old   
Jarl Hermansson
 
Posts: n/a

Default Re: Question about non-English characters - 08-11-2003 , 08:47 AM






Snoopy Dogg <snoopy (AT) dogg (DOT) net> wrote

[...]
Quote:
I want to retain the correct characters in the data. However, I want
to be able to use the Western 'a' character in queries, for example:
SELECT c.last_name, w.work_name FROM composers c, works w WHERE
c.composer_id = w.composer_id AND c.last_name LIKE 'dvorak';

The above query does not return the desired result if I use an English
'a'. I'm working on a web frontend and don't want the user to have to
paste in all sorts of characters. Can I create a many-to-one mapping of
accented or marked up characters to alphabetic characters? Any other
ideas? I'm sure I'm not the first person to think of this...

Paul Marquardt

Use an English collation that treats all different A's as equals.

SELECT c.last_name, w.work_name FROM composers c, works w WHERE
c.composer_id = w.composer_id
AND c.last_name COLLATE some_english_collation LIKE 'dvorak';

Collation names are DBMS specific, don't know about Postgres...

Try english_ci_ai (MS SQL Server?) for an English case insensitive,
accent insensitive collation.

Or try english_1 (Mimer SQL) for a level 1 comparison. (Level 1 is
pretty much the same as case and accent insensitive, at least for
English.)


/Jarl


Reply With Quote
  #3  
Old   
Gene Wirchenko
 
Posts: n/a

Default Re: Question about non-English characters - 08-11-2003 , 12:41 PM



jarl (AT) mimer (DOT) com (Jarl Hermansson) wrote:

Quote:
Snoopy Dogg <snoopy (AT) dogg (DOT) net> wrote

[...]
I want to retain the correct characters in the data. However, I want
to be able to use the Western 'a' character in queries, for example:
SELECT c.last_name, w.work_name FROM composers c, works w WHERE
c.composer_id = w.composer_id AND c.last_name LIKE 'dvorak';

The above query does not return the desired result if I use an English
'a'. I'm working on a web frontend and don't want the user to have to
paste in all sorts of characters. Can I create a many-to-one mapping of
accented or marked up characters to alphabetic characters? Any other
ideas? I'm sure I'm not the first person to think of this...

Use an English collation that treats all different A's as equals.

SELECT c.last_name, w.work_name FROM composers c, works w WHERE
c.composer_id = w.composer_id
AND c.last_name COLLATE some_english_collation LIKE 'dvorak';

Collation names are DBMS specific, don't know about Postgres...

Try english_ci_ai (MS SQL Server?) for an English case insensitive,
accent insensitive collation.

Or try english_1 (Mimer SQL) for a level 1 comparison. (Level 1 is
pretty much the same as case and accent insensitive, at least for
English.)
Another approach is to create a column that has the name stored
without accents and search on it. The column would be updated each
time the record is written. The way to do it is likely DBMS-specific
as Mr. Hermansson notes. If you can spare the storage, this extra
column could save time since you probably are going to do more reading
from than writing to the table.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.