Fuzzy matching of names/addresses for fraud prevention

comp.databases

#1
nowhere@home.com

Fuzzy matching of names/addresses for fraud prevention - 07-10-2003, 10:51 AM






Hi,

We need to 'filter' real-time transactions that can contain names and
addresses against a blacklist of names and addresses held in a database.

Is there a good 'standard' way of doing this given that there may be
spelling or format differences between the original and the blacklist?
The addresses would probably not be limited to one single country so
I doubt we could make many assumptions about the address formats.

Ideally we want to calculate a number which gives a 'closeness' to
each name/address on the blacklist. If the maximum value calculated
is above some threshold we can assume that the person is blacklisted.
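
The scoring scheme described above could be sketched roughly as follows. This is only an illustration, assuming plain Levenshtein edit distance as the 'closeness' measure; the `normalize` helper, the `THRESHOLD` value and the sample blacklist entries are all made up for the example and would need tuning against real data.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Hypothetical clean-up: lowercase, drop punctuation, collapse spaces."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " "
                   for ch in text.lower())
    return " ".join(kept.split())


def closeness(a: str, b: str) -> float:
    """Score in [0, 1]; 1.0 means identical after normalization."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def best_match(candidate: str, blacklist: list) -> tuple:
    """Return (score, entry) for the closest blacklist entry."""
    return max(((closeness(candidate, entry), entry) for entry in blacklist),
               key=lambda pair: pair[0])


THRESHOLD = 0.75  # assumed cut-off, not a recommended value

# Invented sample data, for illustration only.
BLACKLIST = ["John Smith, 12 High Street", "Maria Gonzalez, 4 Rue de Lyon"]

score, entry = best_match("Jon Smith, 12 High St.", BLACKLIST)
if score >= THRESHOLD:
    print("possible match:", entry, round(score, 2))
```

Note that this compares the transaction against every blacklist entry, so the cost grows with the size of the list. For a real-time filter some cheap pre-selection of candidates would probably be needed before scoring.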

Also if anyone knows of a reasonably priced library which could do
this then we would also be interested.

The code to implement this would probably be written in C or Java, if
that makes a difference. As this would be a real-time filter then
speed would be a major factor in deciding what solution to pick. As
yet I have no accuracy requirements for this project.

If anyone has any useful suggestions I would be pleased to read them.

Thanks, Mark


#2
Paul E. Black

Re: Fuzzy matching of names/addresses for fraud prevention - 07-10-2003, 11:14 AM






nowhere (AT) home (DOT) com wrote:
Quote:
We need to 'filter' real-time transactions that can contain names and
addresses against a blacklist of names and addresses held in a database.

Is there a good 'standard' way of doing this given that there may be
spelling or format differences between the original and the blacklist?

Matching names (people's names, as well as street or city names) might
be done with a heuristic like double metaphone
http://www.nist.gov/dads/HTML/doubleMetaphone.html
or soundex
http://www.nist.gov/dads/HTML/soundex.html

The U.S. Census Bureau runs into this kind of problem all the time and
may be willing to share results or direct you to packages.

-paul-
--
Paul E. Black (p.black (AT) acm (DOT) org)


#3
Ben Pfaff

Re: Fuzzy matching of names/addresses for fraud prevention - 07-10-2003, 01:20 PM



"Paul E. Black" <p.black (AT) acm (DOT) org> writes:

Quote:
Matching names (people's names, as well as street or city names) might
be done with a heuristic like double metaphone
http://www.nist.gov/dads/HTML/doubleMetaphone.html
or soundex
http://www.nist.gov/dads/HTML/soundex.html

It's amusing that you suggest Soundex for this purpose, seeing
how well it has worked out for keeping terrorists off airplanes.


#4
osmium

Re: Fuzzy matching of names/addresses for fraud prevention - 07-10-2003, 02:27 PM



<nowhere (AT) home (DOT) com> wrote:

Quote:
We need to 'filter' real-time transactions that can contain names and
addresses against a blacklist of names and addresses held in a database.

Is there a good 'standard' way of doing this given that there may be
spelling or format differences between the original and the blacklist?

Knuth describes a method called 'soundex' in Vol 3 p. 391. Googling on
soundex might be worthwhile.




#5
Ed Guy

Re: Fuzzy matching of names/addresses for fraud prevention - 07-11-2003, 01:48 AM



osmium wrote:

Quote:
Knuth describes a method called 'soundex' in Vol 3 p. 391. Googling on
soundex might be worthwhile.

It's a lot older than Knuth. It predates computers. I built it into
ParseRat (http://www.parserat.com) as an option to assist in de-duplicating
lists.

The algorithm is simple, but breaks down with sound-alike INITIAL letters,
because the first letter is kept as-is rather than encoded.
For example, it won't match "phone" and "fone".
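
To make the initial-letter problem concrete, here is a small sketch of the commonly described American Soundex rules (first letter kept, consonants mapped to digits, adjacent duplicates collapsed, H and W transparent, padded to four characters). It assumes plain A-Z input and is not taken from ParseRat or any other product mentioned in this thread.

```python
# Consonant groups and their Soundex digits. Vowels, H, W and Y have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Four-character Soundex code; non A-Z characters are ignored."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]                    # the first letter is kept verbatim
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != previous:
            encoded += code
        if ch not in "HW":                  # H and W do not separate duplicates
            previous = code
        if len(encoded) == 4:
            break
    return encoded.ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"))  # R163 R163 (same code)
print(soundex("phone"), soundex("fone"))     # P500 F500 (codes differ)
```

Because only the letters after the first are turned into digits, two spellings that differ in their first letter can never produce the same code, however alike they sound.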

--
Ed Guy P.Eng,CDP,MIEE
Information Technology Consultant
Internet: ed (AT) guysoftware (DOT) com
http://www.guysoftware.com
"Check out HELLLP!, WinHelp author tool for WinWord 2.0 through 8.0,
PlanBee Project Management Planning System
and ParseRat, the File Parser, Converter and Reorganizer"




#6
--CELKO--

Re: Fuzzy matching of names/addresses for fraud prevention - 07-11-2003, 04:18 PM



Quote:
We need to 'filter' real-time transactions that can contain names
and addresses against a blacklist of names and addresses held in a
database.

Search Software America
1445 East Putnam Ave
Old Greenwich, CT 06870

Ask for their booklet "The Math, Myth & Magic of Name Searching and
Matching". It is quite informative.


#7
nowhere@home.com

Re: Fuzzy matching of names/addresses for fraud prevention - 07-14-2003, 04:12 AM



On Fri, 11 Jul 2003 23:58:21 +0800, Andy Dent <dent (AT) oofile (DOT) com.au>
wrote:

Quote:
In article <qa2rgvoo0m8r7fj4epus11khcqsk10gse3 (AT) 4ax (DOT) com>,
nowhere (AT) home (DOT) com wrote:

We need to 'filter' real-time transactions that can contain names and
addresses against a blacklist of names and addresses held in a database.

Is there a good 'standard' way of doing this given that there may be
spelling or format differences between the original and the blacklist?

I have code which is an extension of code I found on the net (so it's
yours for free but if you want help with it I'll have to charge as I'm
very busy)

The original didn't say anything about the algorithm but based on
reading since, I think it's Metaphone.

I enhanced it considerably to cope with Latin names of plants (e.g.
eucalypt found by yookalipd).

It was used very successfully for matching street names years ago here
in Perth for a bike hazard reporting system which had to work out
which reports referred to the same hazard.


The code to implement this would probably be written in C or Java

The code is available in 4th Dimension (proprietary 4GL language) or C++.

This sounds ideal.

I would be very grateful if you could send me any code in C++. Please
use the email address q2qp3l502 (AT) sneakemail (DOT) com

Regards, Mark



#8
Wyatt Matthews

Re: Fuzzy matching of names/addresses for fraud prevention - 08-11-2003, 05:01 PM




"Ed Guy" <ed_guy (AT) shaw (DOT) ca> wrote

Quote:
The algorithm is simple, but breaks down with sound-alike INITIAL letters.
e.g. it won't match "phone" and "fone".


Try predicating the cleaned list with a specific letter and then
re-soundexing.
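
One way to read this suggestion is to prepend the same letter to every name before encoding, so that the real initial is turned into a digit like any other consonant. The sketch below follows that interpretation, which is an assumption on my part. The prefix "A" and the `prefixed_soundex` name are hypothetical, and the compact Soundex is included only so the snippet runs on its own.

```python
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Four-character Soundex code; non A-Z characters are ignored."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != previous:
            encoded += code
        if ch not in "HW":
            previous = code
        if len(encoded) == 4:
            break
    return encoded.ljust(4, "0")


# Assumed prefix: a vowel, so it has no digit of its own and cannot merge
# with the code of the real first letter.
PREFIX = "A"


def prefixed_soundex(name: str) -> str:
    """Soundex of the name with a fixed letter placed in front."""
    return soundex(PREFIX + name)


print(soundex("phone"), soundex("fone"))                      # P500 F500
print(prefixed_soundex("phone"), prefixed_soundex("fone"))    # A150 A150
print(prefixed_soundex("Katherine"),
      prefixed_soundex("Catherine"))                          # A236 A236
```

The price is one digit of precision, since the first code position is now spent on the real initial. Silent initial letters are still not handled: with this prefix "knight" and "night" come out as A252 and A523.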




#9
William Brogden

Re: Fuzzy matching of names/addresses for fraud prevention - 08-11-2003, 08:00 PM




"Wyatt Matthews" <agnau (AT) sbcglobaldot (DOT) net> wrote

Quote:
"Ed Guy" <ed_guy (AT) shaw (DOT) ca> wrote

The algorithm is simple, but breaks down with sound-alike INITIAL letters.
e.g. it won't match "phone" and "fone".

Try predicating the cleaned list with a specific letter and then
re-soundexing.

Use Metaphone instead of soundex to get around that initial letter
limitation



