dbTalk Databases Forums  

Re: Help with Database Solution

comp.databases comp.databases


Discuss Re: Help with Database Solution in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Jeff L
 
Posts: n/a

Default Re: Help with Database Solution - 08-04-2006 , 09:03 AM






Stephen,
I have done something similar where I work. The difficult part is
putting the multiple databases together into one. What I did is I took
one database as my starting point. Then I took a second database and
designed a routine that looped though each record of the second
database to see if that individual was already in my first db. I check
for the person's first name and last name. If it was not found, I then
check to see if there were any records in the first db that had a
variation of the person's first name and matching last name. For
example, Bill and William, John and Jonathan, Sue and Susan, etc. If a
duplicate was found, I saved a record to a table called Duplicates that
contained the data from the record I was checking as well as the
possible duplicate from the database. I then went back and checked the
possible duplicates. If I found a match, I deleted it. If the
possible duplicate was not a match, I then inserted it into the
database.

I hope that gives you a little direction at least.


Reply With Quote
  #2  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Help with Database Solution - 08-04-2006 , 01:18 PM







Jeff L wrote:
Quote:
Stephen,
I have done something similar where I work. The difficult part is
putting the multiple databases together into one. What I did is I took
one database as my starting point. Then I took a second database and
designed a routine that looped though each record of the second
database to see if that individual was already in my first db. I check
for the person's first name and last name. If it was not found, I then
check to see if there were any records in the first db that had a
variation of the person's first name and matching last name. For
example, Bill and William, John and Jonathan, Sue and Susan, etc. If a
duplicate was found, I saved a record to a table called Duplicates that
contained the data from the record I was checking as well as the
possible duplicate from the database. I then went back and checked the
possible duplicates. If I found a match, I deleted it. If the
possible duplicate was not a match, I then inserted it into the
database.

I hope that gives you a little direction at least.
I've done something similar for a customer DB on a project a few years
ago. A couple things to consider:
* validating the addresses
* matching name AND address.

There are some software products you can buy (not cheap) that validate
your addresses against the US Postal Service address database. They
used to have a place on their website (usps.com) where you could
manually validate an address against thier DB. Looks like you need
special permission to use that service now. You'll find the link for it
under the Zipcode look up.

When matching names and addresses, it is sometimes easier to match last
first. If you have a potential duplicate based on name, then try
matching on the address. It might be that Bill Jones is the son of
William Jones. And while they live in the same city, they on different
parts of town (different street addresses) Or Bill Jones moved when he
was promoted to VP and now lists his name more formally to go along
with his new job.

For a club you might make a report of potential dups and get some human
input into what is the real situation.

HTH,
Ed



Reply With Quote
  #3  
Old   
dataman
 
Posts: n/a

Default Re: Help with Database Solution - 08-04-2006 , 04:33 PM



Quote:
Jeff L wrote:
Stephen,
I have done something similar where I work. The
difficult part is putting the multiple databases
together into one. What I did is I took one database as
my starting point. Then I took a second database and
designed a routine that looped though each record of the
second database to see if that individual was already in
my first db. I check for the person's first name and
last name. If it was not found, I then check to see if
there were any records in the first db that had a
variation of the person's first name and matching last
name. For example, Bill and William, John and Jonathan,
Sue and Susan, etc. If a duplicate was found, I saved a
record to a table called Duplicates that contained the
data from the record I was checking as well as the
possible duplicate from the database. I then went back
and checked the possible duplicates. If I found a match
, I deleted it. If the possible duplicate was not a
match, I then inserted it into the database.

I hope that gives you a little direction at least.

I've done something similar for a customer DB on a project
a few years ago. A couple things to consider:
* validating the addresses
* matching name AND address.

There are some software products you can buy (not cheap)
that validate your addresses against the US Postal Service
address database. They used to have a place on their
website (usps.com) where you could manually validate an
address against thier DB. Looks like you need special
permission to use that service now. You'll find the link
for it under the Zipcode look up.

When matching names and addresses, it is sometimes easier
to match last first. If you have a potential duplicate
based on name, then try matching on the address. It might
be that Bill Jones is the son of William Jones. And while
they live in the same city, they on different parts of
town (different street addresses) Or Bill Jones moved
when he was promoted to VP and now lists his name more
formally to go along with his new job.

For a club you might make a report of potential dups and
get some human input into what is the real situation.

HTH,
Ed

I agree with this approach. Cleanse the data, standardize
the data structure, de-dup the records from all databases,
load into uniform structure.


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.