database internationalization (i18n)

comp.databases

#1 metaperl - 03-31-2010, 10:09 AM

Let's say you have a database of movies. The MySQL sakila database is
such a publicly available sample schema.

In that database, there is a film table
(http://dev.mysql.com/doc/sakila/en/s...re-tables-film).

Two of the columns in that table are title and description, defined as
VARCHAR(255) and VARCHAR(65535) respectively.

But obviously that only allows for title and description in one
language.

What approach would you use to 'scale' these fields to any number of
arbitrary languages?

#2 Thomas Kellerer - 03-31-2010, 10:14 AM

metaperl, 31.03.2010 17:09:
> What approach would you use to 'scale' these fields to any number of
> arbitrary languages?

Remove those columns from the base table and create a new table

film_title (film_id, language_id, localized_title, localized_description)

You might want to keep the title column in the film table to record the original title under which it was released in the originating country.
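A minimal sketch of the table Thomas describes, using Python's built-in sqlite3 module instead of MySQL so it runs anywhere. The trimmed-down film and language tables, the sample rows, and the title_for() helper are assumptions for illustration, not part of sakila; the lookup falls back to the original title kept in film when no localized row exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE language (
    language_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- Hypothetical, trimmed-down film table; title keeps the original title.
CREATE TABLE film (
    film_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
-- One row per film per language, as suggested in the thread.
CREATE TABLE film_title (
    film_id               INTEGER NOT NULL REFERENCES film(film_id),
    language_id           INTEGER NOT NULL REFERENCES language(language_id),
    localized_title       TEXT NOT NULL,
    localized_description TEXT,
    PRIMARY KEY (film_id, language_id)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO language VALUES (?, ?)",
                 [(1, "English"), (2, "Italian"), (3, "German")])
conn.execute("INSERT INTO film VALUES (?, ?)",
             (1, "Il buono, il brutto, il cattivo"))
conn.execute("INSERT INTO film_title VALUES (?, ?, ?, ?)",
             (1, 1, "The Good, the Bad, and the Ugly", None))


def title_for(film_id, language_id):
    """Localized title if one exists, otherwise the original title (None if no such film)."""
    row = conn.execute("""
        SELECT COALESCE(ft.localized_title, f.title)
        FROM film AS f
        LEFT JOIN film_title AS ft
               ON ft.film_id = f.film_id AND ft.language_id = ?
        WHERE f.film_id = ?
    """, (language_id, film_id)).fetchone()
    return row[0] if row else None
```

Here title_for(1, 1) returns the English title, while title_for(1, 3) finds no German row and falls back to the original Italian title. The composite primary key (film_id, language_id) is what restricts each film to one localized row per language.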

Regards
Thomas

#3 Gene Wirchenko - 03-31-2010, 02:15 PM

On Wed, 31 Mar 2010 17:14:27 +0200, Thomas Kellerer
<OTPXDAJCSJVU (AT) spammotel (DOT) com> wrote:

> metaperl, 31.03.2010 17:09:
>> What approach would you use to 'scale' these fields to any number of
>> arbitrary languages?
>
> Remove those columns from the base table and create a new table
>
> film_title (film_id, language_id, localized_title, localized_description)

I would go with Thomas's approach. Some additional
complications:

Consider character sets. Consider having to represent more than
one character set in your database.

There could be more than one title in a language.

At least one of the Harry Potter books has more than one title.
(UK and US versions)

"Il buono, il brutto, il cattivo" is usually known in English as
"The Good, the Bad, and the Ugly", but it is also known by the literal
translation of "The Good, the Ugly, the Bad".
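Gene's point means (film_id, language_id) cannot be the key if a film can carry several titles in one language. One possible variant, sketched below under the same assumptions (SQLite, hypothetical table and column names), puts the title itself in the key and uses a partial unique index so that at most one title per film and language is flagged as preferred.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film (
    film_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
-- Hypothetical variant: the title is part of the key, so one film may have
-- several titles in the same language; is_preferred marks the usual one.
CREATE TABLE film_title (
    film_id      INTEGER NOT NULL REFERENCES film(film_id),
    language_id  INTEGER NOT NULL,
    title        TEXT    NOT NULL,
    is_preferred INTEGER NOT NULL DEFAULT 0 CHECK (is_preferred IN (0, 1)),
    PRIMARY KEY (film_id, language_id, title)
);
-- At most one preferred title per film and language (SQLite partial index).
CREATE UNIQUE INDEX one_preferred_title
    ON film_title (film_id, language_id) WHERE is_preferred = 1;
""")

ENGLISH = 1  # hypothetical language_id

conn.execute("INSERT INTO film VALUES (?, ?)",
             (1, "Il buono, il brutto, il cattivo"))
conn.executemany(
    "INSERT INTO film_title VALUES (?, ?, ?, ?)",
    [
        (1, ENGLISH, "The Good, the Bad, and the Ugly", 1),
        (1, ENGLISH, "The Good, the Ugly, the Bad", 0),
    ],
)


def titles(film_id, language_id):
    """All titles for a film in one language, preferred title first."""
    return [row[0] for row in conn.execute(
        "SELECT title FROM film_title WHERE film_id = ? AND language_id = ? "
        "ORDER BY is_preferred DESC, title",
        (film_id, language_id),
    )]
```

With this data, titles(1, ENGLISH) lists the usual English title first and the literal translation second; an attempt to flag a second English title as preferred is rejected by the partial index with sqlite3.IntegrityError.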

> You might want to keep the title column in the film table to record the original title under which it was released in the originating country.

That assumes that it was released there first. That is probably
the case but need not be so.

Sincerely,

Gene Wirchenko

#4 metaperl.santmat@gmail.com - 03-31-2010, 02:22 PM

On Mar 31, 3:15 pm, Gene Wirchenko <ge... (AT) ocis (DOT) net> wrote:

> Consider character sets. Consider having to represent more than
> one character set in your database.

Well, I would imagine you would just go with a universal character
encoding, like utf8 or something.

#5 Gene Wirchenko - 03-31-2010, 04:31 PM

On Wed, 31 Mar 2010 12:22:27 -0700 (PDT), "metaperl.santmat (AT) gmail (DOT) com"
<metaperl.santmat (AT) gmail (DOT) com> wrote:

> Well, I would imagine you would just go with a universal character
> encoding, like utf8 or something.

Some DBMSs do not support it.

Sincerely,

Gene Wirchenko

#6 Thomas Kellerer - 03-31-2010, 05:10 PM

Gene Wirchenko wrote on 31.03.2010 23:31:
>> Well, I would imagine you would just go with a universal character
>> encoding, like utf8 or something.
>
> Some DBMSs do not support it.

Seriously? Which one?

#7 Gene Wirchenko - 03-31-2010, 05:54 PM

On Thu, 01 Apr 2010 00:10:44 +0200, Thomas Kellerer
<OTPXDAJCSJVU (AT) spammotel (DOT) com> wrote:

> Gene Wirchenko wrote on 31.03.2010 23:31:
>> Some DBMSs do not support it.
>
> Seriously? Which one?

I do not know the full list. Microsoft Visual FoxPro is one.
Microsoft strangled VFP. AIUI, VFP can handle different character
sets but has limited support for non-8-bit character sets.

VFP is a good mid-level DBMS, but this character set limitation
might be a dealbreaker for some. I have never had to work with
other character sets myself; people have worked around some of the
limitations, but I do not know the details.

Similarly, VB6 has limitations with its built-in controls not
handling wider character sets.
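To illustrate the character set issue: an 8-bit character set such as Latin-1 can represent Western European letters but not, for example, Japanese, while UTF-8 can encode any Unicode character at the cost of multi-byte sequences. The sample strings and the fits_charset() helper below are illustrative assumptions only; whether a length limit such as VARCHAR(255) is counted in characters or in bytes depends on the DBMS and the column definition.

```python
# Sample strings only (not film data): two Western European words and the
# Japanese word for "movie".
samples = ["Bokmål", "Québécois", "映画"]


def fits_charset(text, charset):
    """True if every character of text can be encoded in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


for s in samples:
    # Character count vs. UTF-8 byte count, and whether Latin-1 can hold it.
    print(s, len(s), "chars,", len(s.encode("utf-8")), "UTF-8 bytes,",
          "fits latin-1" if fits_charset(s, "latin-1") else "not in latin-1")
```

The accented words fit in Latin-1 but need extra bytes in UTF-8; the Japanese word does not fit in Latin-1 at all, which is the kind of gap a DBMS limited to 8-bit character sets runs into.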

Sincerely,

Gene Wirchenko

#8 Tom Anderson - 04-30-2010, 12:40 PM

On Wed, 31 Mar 2010, Gene Wirchenko wrote:

> There could be more than one title in a language.

You'd usually use locale rather than language, a locale being a language
qualified with a country. For instance, English is a language, but British
English is a locale. It can be further qualified with a variant, but the
only example I have come across is Norwegian, which is written (in
Norway) in two orthographies, Bokmal and Nynorsk. I have been told,
though (by a Dane, so probably unreliably), that nobody actually uses Nynorsk.

The (de facto, I think) standard encoding for locales combines ISO 639 for
the languages and ISO 3166 for the countries, joined with an underscore:

en_GB proper English
en_US colonial English
fr_CA Quebecois
no_NO Bokmal Norwegian
no_NO_NY Nynorsk Norwegian

And just to mix it up a bit:

nb_NO also Bokmal Norwegian!
nn_NO also Nynorsk Norwegian!

I don't know if nb and nn are properly standard, though. They are
definitely a mistake.
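The underscore-joined locale strings above can be split back into their parts with a few lines of Python. This is a sketch that assumes exactly that form (language, optional country, optional variant); the Locale type and the fallback_chain() helper, which lists progressively less specific tags to try when looking up a localized title, are hypothetical additions rather than anything proposed in the thread.

```python
from typing import NamedTuple, Optional


class Locale(NamedTuple):
    language: str           # ISO 639 code, e.g. "en"
    country: Optional[str]  # ISO 3166 code, e.g. "GB"
    variant: Optional[str]  # e.g. "NY" in no_NO_NY


def parse_locale(tag: str) -> Locale:
    """Split an underscore-joined tag such as 'en_GB' or 'no_NO_NY'."""
    parts = tag.split("_", 2)
    if not parts[0]:
        raise ValueError(f"missing language in {tag!r}")
    language = parts[0].lower()
    country = parts[1].upper() if len(parts) > 1 and parts[1] else None
    variant = parts[2] if len(parts) > 2 and parts[2] else None
    return Locale(language, country, variant)


def fallback_chain(tag: str) -> list:
    """Hypothetical lookup order: the full tag first, then drop one part at a time."""
    parts = tag.split("_")
    return ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]
```

For example, parse_locale("no_NO_NY") gives language "no", country "NO" and variant "NY", and fallback_chain("no_NO_NY") yields no_NO_NY, then no_NO, then no.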

Anyway, those strings are unique, and small enough that they make
practical keys. Encode them as a CHAR(4), skipping the underscore, and
they're no bigger than an int. If you can forget about variants, anyway.
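A sketch of the CHAR(4) key: drop the underscore from a two-letter language plus two-letter country tag. Assuming the key column uses a single-byte character set, the four characters occupy four bytes, the same as a 32-bit integer; key_as_int() merely reinterprets those bytes to make the size comparison concrete. Both function names are hypothetical, and tags carrying a variant, such as no_NO_NY, are rejected because they do not fit in four characters.

```python
def locale_key(tag: str) -> str:
    """Pack 'en_GB' into the 4-character key 'enGB' (language + country only)."""
    language, _, country = tag.partition("_")
    letters = language + country
    if not (len(language) == 2 and len(country) == 2
            and letters.isascii() and letters.isalpha()):
        raise ValueError(f"expected a two-letter language_COUNTRY tag, got {tag!r}")
    return language.lower() + country.upper()


def key_as_int(key: str) -> int:
    """The same 4 ASCII bytes read as one unsigned 32-bit integer."""
    data = key.encode("ascii")
    if len(data) != 4:
        raise ValueError(f"expected 4 bytes, got {len(data)}")
    return int.from_bytes(data, "big")
```

locale_key("fr_CA") gives "frCA", and locale_key("no_NO_NY") raises ValueError, which is the "if you can forget about variants" caveat in code form.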

tom

--
The idiots are winning.
