![]() | |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Stephen, I have done something similar where I work. The difficult part is putting the multiple databases together into one. What I did is I took one database as my starting point. Then I took a second database and designed a routine that looped though each record of the second database to see if that individual was already in my first db. I check for the person's first name and last name. If it was not found, I then check to see if there were any records in the first db that had a variation of the person's first name and matching last name. For example, Bill and William, John and Jonathan, Sue and Susan, etc. If a duplicate was found, I saved a record to a table called Duplicates that contained the data from the record I was checking as well as the possible duplicate from the database. I then went back and checked the possible duplicates. If I found a match, I deleted it. If the possible duplicate was not a match, I then inserted it into the database. I hope that gives you a little direction at least. |
#3
| |||
| |||
|
|
Jeff L wrote: Stephen, I have done something similar where I work. The difficult part is putting the multiple databases together into one. What I did is I took one database as my starting point. Then I took a second database and designed a routine that looped though each record of the second database to see if that individual was already in my first db. I check for the person's first name and last name. If it was not found, I then check to see if there were any records in the first db that had a variation of the person's first name and matching last name. For example, Bill and William, John and Jonathan, Sue and Susan, etc. If a duplicate was found, I saved a record to a table called Duplicates that contained the data from the record I was checking as well as the possible duplicate from the database. I then went back and checked the possible duplicates. If I found a match , I deleted it. If the possible duplicate was not a match, I then inserted it into the database. I hope that gives you a little direction at least. I've done something similar for a customer DB on a project a few years ago. A couple things to consider: * validating the addresses * matching name AND address. There are some software products you can buy (not cheap) that validate your addresses against the US Postal Service address database. They used to have a place on their website (usps.com) where you could manually validate an address against thier DB. Looks like you need special permission to use that service now. You'll find the link for it under the Zipcode look up. When matching names and addresses, it is sometimes easier to match last first. If you have a potential duplicate based on name, then try matching on the address. It might be that Bill Jones is the son of William Jones. And while they live in the same city, they on different parts of town (different street addresses) Or Bill Jones moved when he was promoted to VP and now lists his name more formally to go along with his new job. For a club you might make a report of potential dups and get some human input into what is the real situation. HTH, Ed I agree with this approach. Cleanse the data, standardize |
![]() |
| Thread Tools | |
| Display Modes | |
| |