dbTalk Databases Forums  

Real-Time Fuzzy Lookup

microsoft.public.sqlserver.dts microsoft.public.sqlserver.dts


Discuss Real-Time Fuzzy Lookup in the microsoft.public.sqlserver.dts forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Brian Cowhig
 
Posts: n/a

Default Real-Time Fuzzy Lookup - 11-08-2005 , 08:25 PM






I am implementing Web site functionality whereby a user who enters a string
(which will usually not contain spaces/words) will be presented with the
nearest pre-existing string that matches their entry. Ideally, from what
I've seen of SQL Server 2005's functionality, I would like to apply Fuzzy
Lookup to the user's entry; however, that functionality appears to be
limited to ETL (i.e. batched external data loading) processing, not
real-time single-value lookup. A data mining query might meet my
requirements, but my understanding of the available data mining models (such
as text mining) is that they are word-based or otherwise not appropriate for
this application.

Is there some version of Fuzzy Lookup functionality in SQL Server 2005 that
would be appropriate for this real-time loose-string lookup?

Thanks in advance,

-- Brian Cowhig



Reply With Quote
  #2  
Old   
Allan Mitchell
 
Posts: n/a

Default Re: Real-Time Fuzzy Lookup - 11-09-2005 , 12:14 AM






Would something like Full Text search be more applicable?

ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/996c72fc-b1ab-4c96-bd12-946be9c18f84.htm

Allan

"Brian Cowhig" <BCowhig (AT) yahoo (DOT) com> wrote


Quote:
I am implementing Web site functionality whereby a user who enters a
string
(which will usually not contain spaces/words) will be presented with the
nearest pre-existing string that matches their entry. Ideally, from
what
I've seen of SQL Server 2005's functionality, I would like to apply
Fuzzy
Lookup to the user's entry; however, that functionality appears to be
limited to ETL (i.e. batched external data loading) processing, not
real-time single-value lookup. A data mining query might meet my
requirements, but my understanding of the available data mining models
(such
as text mining) is that they are word-based or otherwise not appropriate
for
this application.

Is there some version of Fuzzy Lookup functionality in SQL Server 2005
that
would be appropriate for this real-time loose-string lookup?

Thanks in advance,

-- Brian Cowhig


Reply With Quote
  #3  
Old   
Darren Green
 
Posts: n/a

Default Re: Real-Time Fuzzy Lookup - 11-09-2005 , 03:26 AM



The data mining APIs are exposed for this type of scenario, the fuzzy lookup
is not, it is only available through a package.

Data mining is perhaps more appropriate to validate input, how likely is it
that your input fields are correct, the 13 year old boy with 12 kids is
unlikely, well it would be against my data in a trained model.

Have a search for soundex and related algorithms. There are several out
there.

Perhaps some domain validation may be better, such as using a postcode field
to help populate an address.

A simple drop down of values, rather than free text. If you allow free text
as well as drop-down you could then text mine this later to determine new
values for the drop-down.

--
Darren Green
http://www.sqldts.com
http://www.sqlis.com


"Brian Cowhig" <BCowhig (AT) yahoo (DOT) com> wrote

Quote:
I am implementing Web site functionality whereby a user who enters a string
(which will usually not contain spaces/words) will be presented with the
nearest pre-existing string that matches their entry. Ideally, from what
I've seen of SQL Server 2005's functionality, I would like to apply Fuzzy
Lookup to the user's entry; however, that functionality appears to be
limited to ETL (i.e. batched external data loading) processing, not
real-time single-value lookup. A data mining query might meet my
requirements, but my understanding of the available data mining models
(such as text mining) is that they are word-based or otherwise not
appropriate for this application.

Is there some version of Fuzzy Lookup functionality in SQL Server 2005
that would be appropriate for this real-time loose-string lookup?

Thanks in advance,

-- Brian Cowhig




Reply With Quote
  #4  
Old   
Brian Cowhig
 
Posts: n/a

Default Re: Real-Time Fuzzy Lookup - 11-14-2005 , 02:44 PM



Thanks for your reply, Allan. However, my experience with full-text indexing is that it only operates on whole words, separated by spaces, hyphens, or other delimiters. In this case, I want to be able to match on similar strings that are not broken into words. For example, 'ABC123' would have a stronger match with '123ABC' than with 'ABC'. From what I've seen (and researched) of Integration Services' Fuzzy Lookup and Fuzzy Grouping processing, it has the capability of doing loose (or fuzzy) matching such as that, but only during ETL activities, not as part of typical transactional database activities.

-- Brian
"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote

Would something like Full Text search be more applicable?

ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/996c72fc-b1ab-4c96-bd12-946be9c18f84.htm

Allan

"Brian Cowhig" <BCowhig (AT) yahoo (DOT) com> wrote


Quote:
I am implementing Web site functionality whereby a user who enters a
string
(which will usually not contain spaces/words) will be presented with the
nearest pre-existing string that matches their entry. Ideally, from
what
I've seen of SQL Server 2005's functionality, I would like to apply
Fuzzy
Lookup to the user's entry; however, that functionality appears to be
limited to ETL (i.e. batched external data loading) processing, not
real-time single-value lookup. A data mining query might meet my
requirements, but my understanding of the available data mining models
(such
as text mining) is that they are word-based or otherwise not appropriate
for
this application.

Is there some version of Fuzzy Lookup functionality in SQL Server 2005
that
would be appropriate for this real-time loose-string lookup?

Thanks in advance,

-- Brian Cowhig


Reply With Quote
  #5  
Old   
Allan Mitchell
 
Posts: n/a

Default Re: Real-Time Fuzzy Lookup - 11-15-2005 , 02:44 PM



Yep. Transactional SSIS is probably not your bag then. As for FTS
Hilary Cotter is the man. If he doesn't know it then it can't be done.
I'll send him your way.

Allan

"Brian Cowhig" <BCowhig (AT) yahoo (DOT) com> wrote


Quote:
Thanks for your reply, Allan. However, my experience with full-text
indexing is that it only operates on whole words, separated by spaces,
hyphens, or other delimiters. In this case, I want to be able to match
on similar strings that are not broken into words. For example,
'ABC123' would have a stronger match with '123ABC' than with 'ABC'.
From what I've seen (and researched) of Integration Services' Fuzzy
Lookup and Fuzzy Grouping processing, it has the capability of doing
loose (or fuzzy) matching such as that, but only during ETL activities,
not as part of typical transactional database activities.

-- Brian

"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote in message
news:OWvfOTP5FHA.1276 (AT) TK2MSFTNGP09 (DOT) phx.gbl...
Would something like Full Text search be more applicable?


ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/996c72fc-b1ab-4c96-b
d12-946be9c18f84.htm

Allan

"Brian Cowhig" <BCowhig (AT) yahoo (DOT) com> wrote in message
news:eFnVkTN5FHA.1248 (AT) TK2MSFTNGP14 (DOT) phx.gbl:

> I am implementing Web site functionality whereby a user who
enters a
> string
> (which will usually not contain spaces/words) will be
presented with the
> nearest pre-existing string that matches their entry.
Ideally, from
> what
> I've seen of SQL Server 2005's functionality, I would like to
apply
> Fuzzy
> Lookup to the user's entry; however, that functionality
appears to be
> limited to ETL (i.e. batched external data loading)
processing, not
> real-time single-value lookup. A data mining query might meet
my
> requirements, but my understanding of the available data
mining models
> (such
> as text mining) is that they are word-based or otherwise not
appropriate
> for
> this application.

> Is there some version of Fuzzy Lookup functionality in SQL
Server 2005
> that
> would be appropriate for this real-time loose-string lookup?

> Thanks in advance,

> -- Brian Cowhig



Reply With Quote
  #6  
Old   
Hilary Cotter
 
Posts: n/a

Default Re: Real-Time Fuzzy Lookup - 11-15-2005 , 02:50 PM



Basically what you want to do is build an inverted file index and then do a
like to get a list of rows with the match.

If you know for a fact that you would only be looking at the first 4 or 7
lets say letters you would index only the first 4 or 7 letters and then not
have to do the relatively expensive like. Note your like here will perform
better than a full column like.

Hilary
"Brian Cowhig" <BCowhig (AT) yahoo (DOT) com> wrote

Thanks for your reply, Allan. However, my experience with full-text
indexing is that it only operates on whole words, separated by spaces,
hyphens, or other delimiters. In this case, I want to be able to match on
similar strings that are not broken into words. For example, 'ABC123' would
have a stronger match with '123ABC' than with 'ABC'. From what I've seen
(and researched) of Integration Services' Fuzzy Lookup and Fuzzy Grouping
processing, it has the capability of doing loose (or fuzzy) matching such as
that, but only during ETL activities, not as part of typical transactional
database activities.

-- Brian
"Allan Mitchell" <allan (AT) no-spam (DOT) sqldts.com> wrote

Would something like Full Text search be more applicable?

ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/996c72fc-b1ab-4c96-bd12-946be9c18f84.htm

Allan

"Brian Cowhig" <BCowhig (AT) yahoo (DOT) com> wrote


Quote:
I am implementing Web site functionality whereby a user who enters a
string
(which will usually not contain spaces/words) will be presented with the
nearest pre-existing string that matches their entry. Ideally, from
what
I've seen of SQL Server 2005's functionality, I would like to apply
Fuzzy
Lookup to the user's entry; however, that functionality appears to be
limited to ETL (i.e. batched external data loading) processing, not
real-time single-value lookup. A data mining query might meet my
requirements, but my understanding of the available data mining models
(such
as text mining) is that they are word-based or otherwise not appropriate
for
this application.

Is there some version of Fuzzy Lookup functionality in SQL Server 2005
that
would be appropriate for this real-time loose-string lookup?

Thanks in advance,

-- Brian Cowhig



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.