dbTalk Databases Forums  

Searching for a database "translation" tool

comp.databases comp.databases


Discuss Searching for a database "translation" tool in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
David
 
Posts: n/a

Default Searching for a database "translation" tool - 05-09-2006 , 10:00 PM






Hello,

I'm searching for a tool to help translate databases
or keep two databases in sync. I'd like for this to
be an existing tool that can be bought off the shelf.
I can create something like this, but I have need for
a solution today.

The primary use of this tool will be to interface
a customer's database system to our product's
database. Besides database connectivity (ODBC
may be sufficient for most customers) the customer
database will likely have little resemblance to
the data that my product requires. An intermediate
set of data would be needed to perform the translation.
I'd also like this tool to have the hooks or
ability to notify the user when the sync step fails.

For example, consider a product that needs the name
and address information extracted from a customers
system. There are any number of ways to split up
the name and address fields in both databases.
One-to-one, one-to-many, many-to-one, and
many-to-many relationships should be supported.
Some translations might be simple string operations.
My side of the product database will likely stay
fairly fixed over the long term. Each customers
database may have totally unique relationships
for storing the data that we'd like to have.

Consider the following customer data:
First Name
Last Name
Address line 1
Address line 2
City
State
ZIP Code

Consider the following desired product data:
Name
House Number
Street Name
Location Code

I would like to "teach" the tool that
the desired Name can be created by appending
the First and Last Names. The Address lines
would be looked at and split apart to get a
general address format. The city, state, and
ZIP Code would be translated to a location code.

Once the desired mapping relationships were
established, there would likely be some intermediate
tables built that had the expected-desired
relationships. From there the tool would start the
translation and an error report or API would allow
a human to be asked for the desired relationships.
As the system learns all the rules the data is
translated automatically. Once a full set of data
is successfully mapped, successive translations
are likely to be satisfied by the current data. As
new relationships develop when a customer enters new
or unexpected data, the tool would again query for the
correct output. In theory this process could run in
both directions, though I don't expect that need for
a couple years. My presumption, of course, is that
this problem has been solved by other database tool
makers and I've just not run across a good one yet.

The translation will be run once a day or more but
need not be near real time. I expect less than
a million records to translate for a large customer.
The amount of mapping data will likely stabilize
quickly and could be built into our cost of setting
up such a system. After that the customer would be
responsible for new data. This would hopefully be
presented as part of our product. The overall
translation and integration of the tool with all
databases should be very reliable. Price is an
issue, but I'm not looking to spend a fortune
for each customer. Obviously there are many
trade offs to consider.

Have you used or seen such a product? What would
you recommend for a problem like this?

Thank you,

David

Reply With Quote
  #2  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Searching for a database "translation" tool - 05-10-2006 , 03:47 PM







David wrote:
Quote:
Hello,

I'm searching for a tool to help translate databases
or keep two databases in sync. I'd like for this to
be an existing tool that can be bought off the shelf.
I can create something like this, but I have need for
a solution today.

The primary use of this tool will be to interface
a customer's database system to our product's
database. Besides database connectivity (ODBC
may be sufficient for most customers) the customer
database will likely have little resemblance to
the data that my product requires. An intermediate
set of data would be needed to perform the translation.
I'd also like this tool to have the hooks or
ability to notify the user when the sync step fails.

For example, consider a product that needs the name
and address information extracted from a customers
system. There are any number of ways to split up
the name and address fields in both databases.
One-to-one, one-to-many, many-to-one, and
many-to-many relationships should be supported.
Some translations might be simple string operations.
My side of the product database will likely stay
fairly fixed over the long term. Each customers
database may have totally unique relationships
for storing the data that we'd like to have.

Consider the following customer data:
First Name
Last Name
Address line 1
Address line 2
City
State
ZIP Code

Consider the following desired product data:
Name
House Number
Street Name
Location Code

I would like to "teach" the tool that
the desired Name can be created by appending
the First and Last Names. The Address lines
would be looked at and split apart to get a
general address format. The city, state, and
ZIP Code would be translated to a location code.

Once the desired mapping relationships were
established, there would likely be some intermediate
tables built that had the expected-desired
relationships. From there the tool would start the
translation and an error report or API would allow
a human to be asked for the desired relationships.
As the system learns all the rules the data is
translated automatically. Once a full set of data
is successfully mapped, successive translations
are likely to be satisfied by the current data. As
new relationships develop when a customer enters new
or unexpected data, the tool would again query for the
correct output. In theory this process could run in
both directions, though I don't expect that need for
a couple years. My presumption, of course, is that
this problem has been solved by other database tool
makers and I've just not run across a good one yet.

The translation will be run once a day or more but
need not be near real time. I expect less than
a million records to translate for a large customer.
The amount of mapping data will likely stabilize
quickly and could be built into our cost of setting
up such a system. After that the customer would be
responsible for new data. This would hopefully be
presented as part of our product. The overall
translation and integration of the tool with all
databases should be very reliable. Price is an
issue, but I'm not looking to spend a fortune
for each customer. Obviously there are many
trade offs to consider.

Have you used or seen such a product? What would
you recommend for a problem like this?

Thank you,

David
I know of no such OTS tool. your address example is a good example of
why. Any such tool would have to handle the mapping and data
translations. Configuring such a tool is going to be very close to
programming your own interface, especially for things like "spliting"
Address_line_1 into House_Number and Street_Name. Other issues
include:
how "realtime" this interface must be.
how failures are handled.
what triggers the sync process.
what happens of the customer system crashes and must be restored from
backup?
is the interface one-way or bydirectional?
What customer data are you allowed to change and what data must you
not change?
what of your data is the customer allowed to change and what data must
they not change?

You might "sync" the databases by creating an intermediate "standard"
format for the data. XML or EDI file transfers might be safer.


Finally a word about your address example. Hopefully you were just
posing two different datamodels to show your desired feature. However
it is a bad idea to split postal information into separate fields like
House_Number and Street_Name. Why? because addresses don't work that
way.

just consider:
house numbers are not always numbers (128B Taylor Road)
address is a post office box (e.g. PO Box41)
address is a roural route (e.g. RR 5 Box 27)
address is an apartment building (e.g. Gaudy Towers Suite 201)
And we haven't left the USA yet!


HTH,
ed



Reply With Quote
  #3  
Old   
Michael Zedeler
 
Posts: n/a

Default Re: Searching for a database "translation" tool - 05-10-2006 , 04:50 PM



David wrote:
Quote:
For example, consider a product that needs the name
and address information extracted from a customers
system. [cut]

Have you used or seen such a product? What would
you recommend for a problem like this?
The issues related to merging addresses are so vast that you will
require a product dedicated to do that alone. Besides the many different
available formats, the fact that addresses can be written very
differently but still be pointing to the same place as well as spellings
errors, leaves a very complicated problem to solve.

I believe that Experian has some sort of online data washing solution
for enterprise use. Take a look here:

http://www.experian.com/direct_marke...integrity.html

Bu I am not sure if they will actually be able to provide what you need.
Maybe it is just all hot air.

I think that a data merge between two customer databases should be done
in two steps: (1) each database is washed and to each customer (record),
some global identifier is attached and (2) the two databases are merged
using the global identifiers.

Trying to do direct matching (of inconsistent databases) leaves too many
open ends.

Regards,

Michael.
--
Which is more dangerous? TV guided missiles or TV guided families?
Visit my home page at http://michael.zedeler.dk/
Get my vcard at http://michael.zedeler.dk/vcard.vcf


Reply With Quote
  #4  
Old   
David
 
Posts: n/a

Default Re: Searching for a database "translation" tool - 05-11-2006 , 12:51 AM



Hello Ed,

Thank you for the response. Sadly the examples given are possible
translations that could be needed. Some customers would have the
equivalent of hardcoded values and other customer might actually
have some portion of the desired data. My product uses a standard
(though not common to the public) form of address and other
information.

I agree that the problem is not easily solved. However, any system
that may be developed could improve the usefulness of my product
to the customers seeking the solution. I'm just painfully aware
that I don't have the bandwith to customize every installation
that goes out the door and would like to find some reasonable
middle ground where the customer can help supervise the translation
process and use their data. Duplicate entry in both databases
is extremely problematic as the customer shouldn't need to maintain
both systems. However, standardization is a hard thing to agree
upon, especially when simple concepts like "addresses" have such
wide variations.

I've also answered a couple of your questions below. We're still
giving lots of thought as to how to address these customers needs.
They do pay the bills after all.

<my examples snipped>
Quote:
I know of no such OTS tool. your address example is a good example of
why. Any such tool would have to handle the mapping and data
translations. Configuring such a tool is going to be very close to
programming your own interface, especially for things like "spliting"
Address_line_1 into House_Number and Street_Name. Other issues
include:
how "realtime" this interface must be.
Not very. Right now the backend results may take days to recognize.
My product easily handles changes in seconds, but downstream products
are unstandardized, legacy bahemuths. A centralized authority would
be better, but that is outside anyones control.

Quote:
how failures are handled.
I mainly want the customer to be aware of the problem. The typical
use should stabalize once the data is massaged. My product provides
its own feedback. The customer really just needs to know when the
translation fails and address the problem in their database or in
the translation tool.

Thus the address problem could have a one to one or many to one
solution.

Quote:
what triggers the sync process.
It would essentially be continuous or a periodic scan. For databases
that have triggers changes are easy to get. I already handle the problem
of extacting just the changes from a full reload of the database.

Quote:
what happens of the customer system crashes and must be restored from
backup?
Customer, translator, or target system dies. The process is restarted
and eventually stabalizes. It can go down for maintenance, though
one customer wants high reliability. That may take a bit longer.<grin>

Quote:
is the interface one-way or bydirectional?
I'd say one way for now. The error feedback still makes it back to
the source, but may or may not be automated. Some customers already
use my feedback API. Others choose to just monitor the daily status
for problems and resolve them when they have free time.

Quote:
What customer data are you allowed to change and what data must you
not change?
Most of the data must achieve a standardized form. The Name and Address
logic must match downstream or we could be reading about the details of
the problem in comp.risks. Many other fields are just free form and
have to fit in the space that the downstream systems accomodate.

Quote:
what of your data is the customer allowed to change and what data must
they not change?
They can change anything. The end result must still verify with a
central database. Often the customer doesn't even want to maintain
their own data and other companies are stepping up to solve that
problem.

Quote:
You might "sync" the databases by creating an intermediate "standard"
format for the data. XML or EDI file transfers might be safer.
That is essentially what I'm thinking. Some transfers could be
database, LDAP, XML, or whatever. Many of the downstream systems
are still stuck in 1960s flat file technology. Hopefully a central
translation system with standard hooks and APIs will ease the task
of adapting to all the various customer demands.

Quote:
Finally a word about your address example. Hopefully you were just
posing two different datamodels to show your desired feature. However
it is a bad idea to split postal information into separate fields like
House_Number and Street_Name. Why? because addresses don't work that
way.
Our downstream databases have just that type of address mess.
Fortunately it isn't often that the address even appears in the customer
database and is something that gets hardcoded or associated from
something else like a department or postal code.

Quote:
just consider:
house numbers are not always numbers (128B Taylor Road)
address is a post office box (e.g. PO Box41)
address is a roural route (e.g. RR 5 Box 27)
address is an apartment building (e.g. Gaudy Towers Suite 201)
And we haven't left the USA yet!
Yes, the international variations are horrendous. Some of the
customers that want our products are not in the USA. While they
don't care about the address mess our downstream systems have,
there are many more useful information the want to get from their
own databases into our products.

Thank you,

David


Reply With Quote
  #5  
Old   
David
 
Posts: n/a

Default Re: Searching for a database "translation" tool - 05-11-2006 , 12:53 AM



Thank you Michael.

I'll take a look at Experian.

David

On Wed, 10 May 2006 21:50:34 UTC, Michael Zedeler <michael (AT) zedeler (DOT) dk> wrote:

Quote:
David wrote:
For example, consider a product that needs the name
and address information extracted from a customers
system. [cut]

Have you used or seen such a product? What would
you recommend for a problem like this?

The issues related to merging addresses are so vast that you will
require a product dedicated to do that alone. Besides the many different
available formats, the fact that addresses can be written very
differently but still be pointing to the same place as well as spellings
errors, leaves a very complicated problem to solve.

I believe that Experian has some sort of online data washing solution
for enterprise use. Take a look here:

http://www.experian.com/direct_marke...integrity.html

Bu I am not sure if they will actually be able to provide what you need.
Maybe it is just all hot air.

I think that a data merge between two customer databases should be done
in two steps: (1) each database is washed and to each customer (record),
some global identifier is attached and (2) the two databases are merged
using the global identifiers.

Trying to do direct matching (of inconsistent databases) leaves too many
open ends.

Regards,

Michael.



Reply With Quote
  #6  
Old   
David Segall
 
Posts: n/a

Default Re: Searching for a database "translation" tool - 05-11-2006 , 08:43 AM



"David" <FlyLikeAnEagle (AT) United (DOT) Com> wrote:
Quote:
I'm searching for a tool to help translate databases
or keep two databases in sync. I'd like for this to
be an existing tool that can be bought off the shelf.
I can create something like this, but I have need for
a solution today.
Some years ago I was involved with an application that used
Silverstream for this purpose. That product is now Novell exteNd
<http://www.novell.com/products/extend/>. It is a crippling price but
that has the advantage, in your circumstances, that you will have a
salesman on your doorstep within minutes of your phone call.


Reply With Quote
  #7  
Old   
David
 
Posts: n/a

Default Re: Searching for a database "translation" tool - 05-11-2006 , 09:38 PM



Thank you David. I'll set my stopwatch and give them a call.

I would never have thought of Novell having a product like that.
Then again, I expect that several groups have something they've
used in house and never really tried to market.

On Thu, 11 May 2006 13:43:53 UTC, David Segall <david (AT) nowhere (DOT) net> wrote:

Quote:
"David" <FlyLikeAnEagle (AT) United (DOT) Com> wrote:
I'm searching for a tool to help translate databases
or keep two databases in sync. I'd like for this to
be an existing tool that can be bought off the shelf.
I can create something like this, but I have need for
a solution today.
Some years ago I was involved with an application that used
Silverstream for this purpose. That product is now Novell exteNd
http://www.novell.com/products/extend/>. It is a crippling price but
that has the advantage, in your circumstances, that you will have a
salesman on your doorstep within minutes of your phone call.



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.