Question about terminology

comp.databases

#11 Tom Anderson - Re: Question about terminology - 06-22-2010, 02:37 PM

On Tue, 22 Jun 2010, David Kerber wrote:

Quote:
In article <87k4prnq1c.fsf (AT) benfinney (DOT) id.au>,
bignose+hates-spam (AT) benfinney (DOT) id.au says...

Jasen Betts <jasen (AT) xnet (DOT) co.nz> writes:

On 2010-06-22, Ben Finney <bignose+hates-spam (AT) benfinney (DOT) id.au> wrote:

A better example is street addresses (though, of course, there are
some locations that make this not as simple as it might first
appear). Country, state or province, postal code, city or town,
street, property number, unit number, addressee's full name. That's
much less ambiguous and more reliably queried and re-composed.

Not really. There are national variations in addressing too,

Yes, and if you read my message again you'll see that I allowed that. I
maintain that, compared to people's names, addresses are much less
ambiguous in their components and much more easily re-composed from
components by simple mechanical process, and thus a better candidate for
functional decomposition.

I disagree. Basically every culture has "surname" and "given" personal
name components (or in some cases, just a single name). It's just the
order that they are written (or spoken) will vary.

It's a bit worse than that, because you have things like Iceland, where
people have a given name and a patronymic (for example, former president
Vigdis Finnbogadottir is the daughter of Finnbogi Thorvaldsson). Then
there is Burma, where some people have no surnames at all (for example,
former UN secretary-general U Thant, who was really just called Thant -
'U' is Burmese for 'Mr').

Then there is Spain (and some other Spanish-speaking countries), where
everyone has two family names, one inherited from the first family name
of each parent (so Sr Alvarez Blanco and Sra Calderon Diaz would have
children whose family names were Alvarez Calderon - unless they elected to
put the mother's contribution first, as some do). Except that sometimes
they choose to take both of one parent's names, so ending up with three,
or rather two, where the first one is double-barrelled (as in the case of
Mexican ambassador Agustin Barrios-Gomez Mendez, or perhaps one of his
paternal ancestors), or to take just one parent's name, if it's
double-barrelled, and expand it to use as both surnames (as in the case of
his son, journalist Agustin Barrios Gomez, who inherited a Segues from his
mother, but dropped it). Nobody there has a middle name, but their first
name may have a space in it (eg Juan Pablo).

And then there is the USA, where people have particles like 'Jr' and 'III'
after their names which are derived from their name and family relations,
but are nominally significant.

That said, i think you can pretty easily deal with all that within the
framework of family name + sequence of given names + order flag. It's just
that sometimes the family name is inherited from a parent's given name,
rather than their family name, sometimes it has a space in it, and
sometimes it's NULL.

Oh, and you have to ignore Americans.
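
A minimal sketch of that "family name + sequence of given names + order
flag" framework, in Python. The class name PersonName, its field names and
the display method are assumptions made up for illustration, not anything
from the thread, and suffixes like 'Jr' are deliberately left out:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PersonName:
    # Each given name is one element, so "Juan Pablo" can keep its space.
    given_names: Tuple[str, ...]
    # May contain spaces (two Spanish family names), may be a patronymic,
    # and may be None for people who have no family name at all.
    family_name: Optional[str]
    # Order flag: True when the family name is written first.
    family_first: bool = False

    def display(self) -> str:
        """Re-compose the full name in its customary order."""
        given = " ".join(self.given_names)
        if self.family_name is None:
            return given
        if self.family_first:
            return f"{self.family_name} {given}"
        return f"{given} {self.family_name}"


thant = PersonName(("Thant",), None)
child = PersonName(("Juan Pablo",), "Alvarez Calderon")
print(thant.display())
print(child.display())
```

The point of the sketch is only that the three cases above (patronymic,
no family name, family name with a space) need no extra fields.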

Quote:
Addresses have a much wider range from what I've seen.

Oh, don't even get me started on addresses. Go and read the Wikipedia
article on Japanese address formats if you want to see how bad it gets.

On the plus side, there is at least a liferaft on the stormy sea of
addresses: the Universal Postal Union maintains a standard called S42
which is a completely comprehensive manual for address formats, including
how to take them apart and put them back together, how different national
formats correspond semantically, what bits you can drop if you want to
abbreviate them, etc.

Someone really ought to do the same for names.

tom

--
What we learn about is not nature itself, but nature exposed to our
methods of questioning. -- Werner Heisenberg

#12 Gene Wirchenko - Re: Question about terminology - 06-22-2010, 04:45 PM

On Tue, 22 Jun 2010 14:49:48 -0400, David Kerber
<dkerber (AT) WarrenRogersAssociates (DOT) invalid> wrote:

[snip]

Quote:
Normalization and this breaking down of fields into component parts (I
hadn't heard the term "functional decomposition" before) aren't the same
thing, though they are related. WRT the street address thing, I
personally think breaking down the street address into multiple pieces
is a mistake, but the city, state and postal code should be their own
fields.

For Canadian Postal Codes, the position along the street (does that part
of the address have a name?) does matter. Different positions on the same
street can have different Postal Codes.

Sincerely,

Gene Wirchenko

#13 Tom Anderson - Re: Question about terminology - 06-23-2010, 06:06 AM

On Tue, 22 Jun 2010, Gene Wirchenko wrote:

Quote:
On Tue, 22 Jun 2010 14:49:48 -0400, David Kerber
<dkerber (AT) WarrenRogersAssociates (DOT) invalid> wrote:

Normalization and this breaking down of fields into component parts (I
hadn't heard the term "functional decomposition" before) aren't the
same thing, though they are related. WRT the street address thing, I
personally think breaking down the street address into multiple pieces
is a mistake, but the city, state and postal code should be their own
fields.

For Canadian Postal Codes, the position on the street part of the
address -- Does it have a name? -- does matter. Different values can
have different Postal Codes.

Same in the UK. For example, at the far ends of the rather long but
entirely continuous Holloway Road:

91 Holloway Road is N7 8LT (and is a nice Georgian restaurant)
746 Holloway Road is N19 3JF (and is a very good bakery)

But i don't see why that means breaking down the address as David suggests
is a problem. There is redundancy, because the postcode implies the street
name even if not vice versa, and so this data wouldn't be normalised, but
it would be decomposed.
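
A small sketch of that redundancy, in Python, using only the two Holloway
Road addresses above. The helper name determines and the dictionary keys
are assumptions for illustration; it simply checks whether one column
determines another on the rows it is given, which says nothing about
addresses outside the sample:

```python
from collections import defaultdict

# Decomposed addresses: one field per component.
rows = [
    {"number": 91, "street": "Holloway Road", "postcode": "N7 8LT"},
    {"number": 746, "street": "Holloway Road", "postcode": "N19 3JF"},
]


def determines(rows, lhs, rhs):
    """Return True if every value of column lhs maps to one rhs value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return all(len(values) == 1 for values in seen.values())


# On this sample the postcode implies the street, but not vice versa.
print(determines(rows, "postcode", "street"))
print(determines(rows, "street", "postcode"))
```

Where the first check holds, storing both columns in one table is the
redundancy described above, even though every field is atomic.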

tom

--
Voltan tires of life upon Super Secret Sea-Base Beta. Perhaps this
Holloway Road of which you speak is the solution. Voltan shall investigate
it during Voltans campaign to overrun London. (This is but a part of
Voltans plan for world domination.) -- Voltan

#14 David Kerber - Re: Question about terminology - 06-23-2010, 07:32 AM

In article <alpine.DEB.1.10.1006231159300.28525 (AT) urchin (DOT) earth.li>,
twic (AT) urchin (DOT) earth.li says...
Quote:
On Tue, 22 Jun 2010, Gene Wirchenko wrote:

On Tue, 22 Jun 2010 14:49:48 -0400, David Kerber
<dkerber (AT) WarrenRogersAssociates (DOT) invalid> wrote:

Normalization and this breaking down of fields into component parts (I
hadn't heard the term "functional decomposition" before) aren't the
same thing, though they are related. WRT the street address thing, I
personally think breaking down the street address into multiple pieces
is a mistake, but the city, state and postal code should be their own
fields.

For Canadian Postal Codes, the position on the street part of the
address -- Does it have a name? -- does matter. Different values can
have different Postal Codes.

Same in the UK. For example at the far ends of the rather long but
entirely continuous Holloway Road:

91 Holloway Road is N7 8LT (and is a nice Georgian restaurant)
746 Holloway Road is N19 3JF (and is a very good bakery)

That's not uncommon in the US either; there are some streets in the
Kansas City area that are continuous, with the same name and contiguous
house numbers, that cross 3 different cities/towns and multiple ZIP
codes (which is what we call our postal codes). In the US, the 5-digit
ZIP code tells you which post office handles the mail, and the last 4
(optional) digits narrow it down to a range of a block or a few blocks.

My own ZIP code covers *parts* of 3 different towns (and all those towns
have other ZIP codes as well), all handled from one post office.
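
As a rough sketch of that two-part structure, in Python: the function name
split_zip is an assumption for illustration, and the digits in the usage
lines are placeholders rather than real codes. It assumes the usual
written form of five digits, optionally followed by a hyphen and four more:

```python
import re

# Five digits, optionally followed by "-" and the four extra digits.
ZIP_RE = re.compile(r"(\d{5})(?:-(\d{4}))?")


def split_zip(code: str):
    """Split a ZIP code into (post_office_part, plus4_part_or_None)."""
    match = ZIP_RE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a ZIP code: {code!r}")
    return match.group(1), match.group(2)


print(split_zip("12345"))
print(split_zip("12345-6789"))
```

Keeping the two parts separate makes it possible to group by post office
without caring whether the optional part was supplied.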

D

#15 LC's No-Spam Newsreading account - Re: Question about terminology - 06-23-2010, 10:51 AM

Quote:
91 Holloway Road is N7 8LT (and is a nice Georgian restaurant)
746 Holloway Road is N19 3JF (and is a very good bakery)

codes (which is what we call our postal codes). In the US, the 5-digit
zip code tells you what post office handles the mail, and the last 4
(optional) digits narrow it down to a range of a block or a few blocks.

Italian CAP post codes are just 5 digits. In principle each corresponds
to a "comune" (municipality), assuming it has just one post office. But
large cities have many offices and therefore more codes, and small
villages can share a post office (and a CAP code).

The first two digits used to have a relationship with provinces (the
first digit could not have a direct relationship with regions, because
we have more than 10 regions, so they are grouped geographically in
some way).

E.g.

24058 is the comune of Romano in the province of Bergamo, region
      Lombardy (third digit 0 is NOT a province capital)

24100 is Bergamo itself (the numbers ending in 100 are all
      province capitals; some places however became province
      capitals after the CAP was assigned, so they have retained
      the original code)

24020 is a catch-all code for many small places in the mountains
      above Bergamo (ending in 0 is a collective CAP)

20100 is a generic code for Milano (region capital in Lombardy)
      which is however normally not used (second digit 0 is region
      capital, or most important region capital if the first digit
      covers 2 regions ... for instance Turin and Genoa are both
      1x100 but Turin is 10100)

20133 is the code for a particular area in Milan (these are the
      codes actually used). Of course a long street can have several
      CAPs along its course, or sometimes different ones on the left
      and right sides if it is on a boundary.
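
The digit rules above can be sketched as a rough classifier in Python.
The function name classify_cap and the category labels are assumptions
for illustration, and the rules are only the heuristics listed in this
post, so the exceptions mentioned (places that became province capitals
after their CAP was assigned) will be misclassified:

```python
def classify_cap(cap: str) -> str:
    """Classify an Italian CAP using the digit heuristics listed above."""
    if len(cap) != 5 or not (cap.isascii() and cap.isdigit()):
        raise ValueError(f"not a 5-digit CAP: {cap!r}")
    if cap.endswith("100"):
        # Codes ending in 100 are generic codes for province capitals;
        # a second digit of 0 marks the region capital.
        if cap[1] == "0":
            return "region capital (generic code)"
        return "province capital (generic code)"
    if cap.endswith("0"):
        # Other codes ending in 0 are collective (catch-all) codes.
        return "collective code"
    return "specific comune or city area"


for cap in ("24058", "24100", "24020", "20100", "20133"):
    print(cap, classify_cap(cap))
```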

--
----------------------------------------------------------------------
nospam (AT) mi (DOT) iasf.cnr.it is a newsreading account used by more persons to
avoid unwanted spam. Any mail returning to this address will be rejected.
Users can disclose their e-mail address in the article if they wish so.

#16 Gene Wirchenko - Re: Question about terminology - 06-23-2010, 11:49 AM

On Wed, 23 Jun 2010 12:06:19 +0100, Tom Anderson
<twic (AT) urchin (DOT) earth.li> wrote:

Quote:
On Tue, 22 Jun 2010, Gene Wirchenko wrote:

On Tue, 22 Jun 2010 14:49:48 -0400, David Kerber
<dkerber (AT) WarrenRogersAssociates (DOT) invalid> wrote:

Normalization and this breaking down of fields into component parts (I
hadn't heard the term "functional decomposition" before) aren't the
same thing, though they are related. WRT the street address thing, I
personally think breaking down the street address into multiple pieces
is a mistake, but the city, state and postal code should be their own
fields.

For Canadian Postal Codes, the position on the street part of the
address -- Does it have a name? -- does matter. Different values can
have different Postal Codes.

Same in the UK. For example at the far ends of the rather long but
entirely continuous Holloway Road:

91 Holloway Road is N7 8LT (and is a nice Georgian restaurant)
746 Holloway Road is N19 3JF (and is a very good bakery)

But i don't see why that means breaking down the address as David suggests
is a problem. There is redundancy, because the postcode implies the street
name even if not vice versa, and so this data wouldn't be normalised, but
it would be decomposed.

If you are trying to look up Postal Codes, it is easier to have
the address parsed.

There are some unusual road names. Consider
North Road in Coquitlam, BC
Avenue Road in Toronto, ON
Kent Avenue North and Kent Avenue South in Vancouver, BC.
Since they are on both sides of the east-west division, there are East
Kent Avenue North, West Kent Avenue North, East Kent Avenue South, and
West Kent Avenue South.

There are some unusual addresses. Consider
North Foot of Main Street, Vancouver, BC
and
South Foot of Main Street, Vancouver, BC

The Postal Code does not imply the street name. It often is the
case, but not always. Rural routes are a trivial example.

Going outside of Canada, consider
Yew Street in Bellingham, WA
and
Yew Street Road in Bellingham, WA
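
To illustrate why names like these defeat simple parsing, here is a
deliberately naive sketch in Python. The function naive_base_name and the
STREET_TYPES set are assumptions made up for this example, not a
recommended approach; it strips trailing street-type words and so
collapses two distinct Bellingham streets into one:

```python
# Words a naive parser might treat as "just the street type".
STREET_TYPES = {"street", "road", "avenue"}


def naive_base_name(street: str) -> str:
    """Strip trailing street-type words, keeping at least one word."""
    words = street.split()
    while len(words) > 1 and words[-1].lower() in STREET_TYPES:
        words.pop()
    return " ".join(words)


# Both distinct streets are reduced to the same base name.
print(naive_base_name("Yew Street"))
print(naive_base_name("Yew Street Road"))
```

Storing the street name exactly as given, rather than a derived base
name, avoids this kind of collision.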

Sincerely,

Gene Wirchenko

#17 Philipp Post - Re: Question about terminology - 06-25-2010, 07:17 AM

Quote:
If you are trying to look up Postal Codes, it is easier to have
the address parsed.

That is a valid reason for decomposition, if you need to do in-depth
validation of the postal code.

For Germany you can buy a list of all streets in the country with the
corresponding house number ranges and postal codes.

There are also some complications here:
* Streets may span several postal codes.
* A street with the same name can exist in different cities. Sometimes
a street name even exists twice within one city, but the administration
aims to clean such things up by renaming one of them.
* One postal code can be valid for several smaller towns.
* Town names are not unique, e.g. "Neustadt" exists several times at
different points in the country with different postal codes.
* The first digit of the postal code leads you roughly to the state
("Bundesland"), but the postal code sub-areas have nothing to do with
the administrative/political sub-areas. The same goes for the phone area
code, which does not always stay within the borders of the political
areas.
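
A minimal sketch of validating against such a street list, in Python. The
StreetRange record, the lookup_postal_code function and every value in the
sample data (city, street, ranges, codes) are hypothetical placeholders,
not taken from any real directory:

```python
from typing import List, NamedTuple, Optional


class StreetRange(NamedTuple):
    city: str
    street: str
    low: int          # first house number in the range (inclusive)
    high: int         # last house number in the range (inclusive)
    postal_code: str


# Hypothetical data: one street spanning two postal codes.
RANGES: List[StreetRange] = [
    StreetRange("Musterstadt", "Hauptstrasse", 1, 99, "00001"),
    StreetRange("Musterstadt", "Hauptstrasse", 100, 250, "00002"),
]


def lookup_postal_code(ranges: List[StreetRange], city: str,
                       street: str, number: int) -> Optional[str]:
    """Return the postal code for a parsed address, or None if unknown."""
    for entry in ranges:
        if (entry.city == city and entry.street == street
                and entry.low <= number <= entry.high):
            return entry.postal_code
    return None


print(lookup_postal_code(RANGES, "Musterstadt", "Hauptstrasse", 42))
print(lookup_postal_code(RANGES, "Musterstadt", "Hauptstrasse", 120))
```

The lookup needs the city, street and house number as separate values,
which is the practical case for decomposing the address.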

brgds

Philipp Post
