dbTalk Databases Forums  

Approximate/Fuzzy Search for strings keys

comp.databases.theory comp.databases.theory


Discuss Approximate/Fuzzy Search for strings keys in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
walid.saba@gmail.com
 
Posts: n/a

Default Approximate/Fuzzy Search for strings keys - 06-19-2008 , 11:36 AM






Dear all,

I have read the literature on fuzzy/approximate search for strings
keys in large databases, but I could not find a clear answer on what
is the best that has been achieved thus far in terms of the cost of
search (regardless of cost of building any index).

I understand this depends on the application/domain/type of data, but,
to keep things simple, supoose the target data is some string (names,
addresses, or any sequence of characters), and assume any distance
function. I am simply asking this: what is the best that has been
achieved? Is there a logarithmic solution (i.e., that can retrieve a
set of "similar" objects in some logarithmic order), or are they all,
in the worst case, linear?

I thank you in advance...


Reply With Quote
  #2  
Old   
user923005
 
Posts: n/a

Default Re: Approximate/Fuzzy Search for strings keys - 06-19-2008 , 03:50 PM






On Jun 19, 9:36*am, walid.s... (AT) gmail (DOT) com wrote:
Quote:
Dear all,

I have read the literature on fuzzy/approximate search for strings
keys in large databases, but I could not find a clear answer on what
is the best that has been achieved thus far in terms of the cost of
search (regardless of cost of building any index).

I understand this depends on the application/domain/type of data, but,
to keep things simple, supoose the target data is some string (names,
addresses, or any sequence of characters), and assume any distance
function. I am simply asking this: what is the best that has been
achieved? Is there a logarithmic solution (i.e., that can retrieve a
set of "similar" objects in some logarithmic order), or are they all,
in the worst case, linear?
Without a clear definition of exactly what you are trying to
accomplish, your question cannot be answered.

If I have a hash table of strings, with no duplicates allowed, and my
hash table is large enough, then insert, update, and delete are all
O(1) for exact strings.

Are the strings short or long?
What are the rules for similarity?
Are you looking at soundex sort of similarity or Levenstein edit
distance or something else?
Can we have exact duplicate strings stored in our repository?
What is the domain data? Personal names? DNA sequences?
Do you expect to return the entire class in the case of a fuzzy match?
I do not think you can expect an exact answer with a fuzzy question
(even when the topic of the question is fuzzy).
I guess if you specify the problem exactly you will get a better
answer.


Reply With Quote
  #3  
Old   
user923005
 
Posts: n/a

Default Re: Approximate/Fuzzy Search for strings keys - 06-19-2008 , 03:50 PM



On Jun 19, 9:36*am, walid.s... (AT) gmail (DOT) com wrote:
Quote:
Dear all,

I have read the literature on fuzzy/approximate search for strings
keys in large databases, but I could not find a clear answer on what
is the best that has been achieved thus far in terms of the cost of
search (regardless of cost of building any index).

I understand this depends on the application/domain/type of data, but,
to keep things simple, supoose the target data is some string (names,
addresses, or any sequence of characters), and assume any distance
function. I am simply asking this: what is the best that has been
achieved? Is there a logarithmic solution (i.e., that can retrieve a
set of "similar" objects in some logarithmic order), or are they all,
in the worst case, linear?
Without a clear definition of exactly what you are trying to
accomplish, your question cannot be answered.

If I have a hash table of strings, with no duplicates allowed, and my
hash table is large enough, then insert, update, and delete are all
O(1) for exact strings.

Are the strings short or long?
What are the rules for similarity?
Are you looking at soundex sort of similarity or Levenstein edit
distance or something else?
Can we have exact duplicate strings stored in our repository?
What is the domain data? Personal names? DNA sequences?
Do you expect to return the entire class in the case of a fuzzy match?
I do not think you can expect an exact answer with a fuzzy question
(even when the topic of the question is fuzzy).
I guess if you specify the problem exactly you will get a better
answer.


Reply With Quote
  #4  
Old   
user923005
 
Posts: n/a

Default Re: Approximate/Fuzzy Search for strings keys - 06-19-2008 , 03:50 PM



On Jun 19, 9:36*am, walid.s... (AT) gmail (DOT) com wrote:
Quote:
Dear all,

I have read the literature on fuzzy/approximate search for strings
keys in large databases, but I could not find a clear answer on what
is the best that has been achieved thus far in terms of the cost of
search (regardless of cost of building any index).

I understand this depends on the application/domain/type of data, but,
to keep things simple, supoose the target data is some string (names,
addresses, or any sequence of characters), and assume any distance
function. I am simply asking this: what is the best that has been
achieved? Is there a logarithmic solution (i.e., that can retrieve a
set of "similar" objects in some logarithmic order), or are they all,
in the worst case, linear?
Without a clear definition of exactly what you are trying to
accomplish, your question cannot be answered.

If I have a hash table of strings, with no duplicates allowed, and my
hash table is large enough, then insert, update, and delete are all
O(1) for exact strings.

Are the strings short or long?
What are the rules for similarity?
Are you looking at soundex sort of similarity or Levenstein edit
distance or something else?
Can we have exact duplicate strings stored in our repository?
What is the domain data? Personal names? DNA sequences?
Do you expect to return the entire class in the case of a fuzzy match?
I do not think you can expect an exact answer with a fuzzy question
(even when the topic of the question is fuzzy).
I guess if you specify the problem exactly you will get a better
answer.


Reply With Quote
  #5  
Old   
user923005
 
Posts: n/a

Default Re: Approximate/Fuzzy Search for strings keys - 06-19-2008 , 03:50 PM



On Jun 19, 9:36*am, walid.s... (AT) gmail (DOT) com wrote:
Quote:
Dear all,

I have read the literature on fuzzy/approximate search for strings
keys in large databases, but I could not find a clear answer on what
is the best that has been achieved thus far in terms of the cost of
search (regardless of cost of building any index).

I understand this depends on the application/domain/type of data, but,
to keep things simple, supoose the target data is some string (names,
addresses, or any sequence of characters), and assume any distance
function. I am simply asking this: what is the best that has been
achieved? Is there a logarithmic solution (i.e., that can retrieve a
set of "similar" objects in some logarithmic order), or are they all,
in the worst case, linear?
Without a clear definition of exactly what you are trying to
accomplish, your question cannot be answered.

If I have a hash table of strings, with no duplicates allowed, and my
hash table is large enough, then insert, update, and delete are all
O(1) for exact strings.

Are the strings short or long?
What are the rules for similarity?
Are you looking at soundex sort of similarity or Levenstein edit
distance or something else?
Can we have exact duplicate strings stored in our repository?
What is the domain data? Personal names? DNA sequences?
Do you expect to return the entire class in the case of a fuzzy match?
I do not think you can expect an exact answer with a fuzzy question
(even when the topic of the question is fuzzy).
I guess if you specify the problem exactly you will get a better
answer.


Reply With Quote
  #6  
Old   
user923005
 
Posts: n/a

Default Re: Approximate/Fuzzy Search for strings keys - 06-19-2008 , 03:50 PM



On Jun 19, 9:36*am, walid.s... (AT) gmail (DOT) com wrote:
Quote:
Dear all,

I have read the literature on fuzzy/approximate search for strings
keys in large databases, but I could not find a clear answer on what
is the best that has been achieved thus far in terms of the cost of
search (regardless of cost of building any index).

I understand this depends on the application/domain/type of data, but,
to keep things simple, supoose the target data is some string (names,
addresses, or any sequence of characters), and assume any distance
function. I am simply asking this: what is the best that has been
achieved? Is there a logarithmic solution (i.e., that can retrieve a
set of "similar" objects in some logarithmic order), or are they all,
in the worst case, linear?
Without a clear definition of exactly what you are trying to
accomplish, your question cannot be answered.

If I have a hash table of strings, with no duplicates allowed, and my
hash table is large enough, then insert, update, and delete are all
O(1) for exact strings.

Are the strings short or long?
What are the rules for similarity?
Are you looking at soundex sort of similarity or Levenstein edit
distance or something else?
Can we have exact duplicate strings stored in our repository?
What is the domain data? Personal names? DNA sequences?
Do you expect to return the entire class in the case of a fuzzy match?
I do not think you can expect an exact answer with a fuzzy question
(even when the topic of the question is fuzzy).
I guess if you specify the problem exactly you will get a better
answer.


Reply With Quote
  #7  
Old   
user923005
 
Posts: n/a

Default Re: Approximate/Fuzzy Search for strings keys - 06-19-2008 , 03:50 PM



On Jun 19, 9:36*am, walid.s... (AT) gmail (DOT) com wrote:
Quote:
Dear all,

I have read the literature on fuzzy/approximate search for strings
keys in large databases, but I could not find a clear answer on what
is the best that has been achieved thus far in terms of the cost of
search (regardless of cost of building any index).

I understand this depends on the application/domain/type of data, but,
to keep things simple, supoose the target data is some string (names,
addresses, or any sequence of characters), and assume any distance
function. I am simply asking this: what is the best that has been
achieved? Is there a logarithmic solution (i.e., that can retrieve a
set of "similar" objects in some logarithmic order), or are they all,
in the worst case, linear?
Without a clear definition of exactly what you are trying to
accomplish, your question cannot be answered.

If I have a hash table of strings, with no duplicates allowed, and my
hash table is large enough, then insert, update, and delete are all
O(1) for exact strings.

Are the strings short or long?
What are the rules for similarity?
Are you looking at soundex sort of similarity or Levenstein edit
distance or something else?
Can we have exact duplicate strings stored in our repository?
What is the domain data? Personal names? DNA sequences?
Do you expect to return the entire class in the case of a fuzzy match?
I do not think you can expect an exact answer with a fuzzy question
(even when the topic of the question is fuzzy).
I guess if you specify the problem exactly you will get a better
answer.


Reply With Quote
  #8  
Old   
user923005
 
Posts: n/a

Default Re: Approximate/Fuzzy Search for strings keys - 06-19-2008 , 03:50 PM



On Jun 19, 9:36*am, walid.s... (AT) gmail (DOT) com wrote:
Quote:
Dear all,

I have read the literature on fuzzy/approximate search for strings
keys in large databases, but I could not find a clear answer on what
is the best that has been achieved thus far in terms of the cost of
search (regardless of cost of building any index).

I understand this depends on the application/domain/type of data, but,
to keep things simple, supoose the target data is some string (names,
addresses, or any sequence of characters), and assume any distance
function. I am simply asking this: what is the best that has been
achieved? Is there a logarithmic solution (i.e., that can retrieve a
set of "similar" objects in some logarithmic order), or are they all,
in the worst case, linear?
Without a clear definition of exactly what you are trying to
accomplish, your question cannot be answered.

If I have a hash table of strings, with no duplicates allowed, and my
hash table is large enough, then insert, update, and delete are all
O(1) for exact strings.

Are the strings short or long?
What are the rules for similarity?
Are you looking at soundex sort of similarity or Levenstein edit
distance or something else?
Can we have exact duplicate strings stored in our repository?
What is the domain data? Personal names? DNA sequences?
Do you expect to return the entire class in the case of a fuzzy match?
I do not think you can expect an exact answer with a fuzzy question
(even when the topic of the question is fuzzy).
I guess if you specify the problem exactly you will get a better
answer.


Reply With Quote
  #9  
Old   
user923005
 
Posts: n/a

Default Re: Approximate/Fuzzy Search for strings keys - 06-19-2008 , 03:50 PM



On Jun 19, 9:36*am, walid.s... (AT) gmail (DOT) com wrote:
Quote:
Dear all,

I have read the literature on fuzzy/approximate search for strings
keys in large databases, but I could not find a clear answer on what
is the best that has been achieved thus far in terms of the cost of
search (regardless of cost of building any index).

I understand this depends on the application/domain/type of data, but,
to keep things simple, supoose the target data is some string (names,
addresses, or any sequence of characters), and assume any distance
function. I am simply asking this: what is the best that has been
achieved? Is there a logarithmic solution (i.e., that can retrieve a
set of "similar" objects in some logarithmic order), or are they all,
in the worst case, linear?
Without a clear definition of exactly what you are trying to
accomplish, your question cannot be answered.

If I have a hash table of strings, with no duplicates allowed, and my
hash table is large enough, then insert, update, and delete are all
O(1) for exact strings.

Are the strings short or long?
What are the rules for similarity?
Are you looking at soundex sort of similarity or Levenstein edit
distance or something else?
Can we have exact duplicate strings stored in our repository?
What is the domain data? Personal names? DNA sequences?
Do you expect to return the entire class in the case of a fuzzy match?
I do not think you can expect an exact answer with a fuzzy question
(even when the topic of the question is fuzzy).
I guess if you specify the problem exactly you will get a better
answer.


Reply With Quote
  #10  
Old   
user923005
 
Posts: n/a

Default Re: Approximate/Fuzzy Search for strings keys - 06-19-2008 , 03:50 PM



On Jun 19, 9:36*am, walid.s... (AT) gmail (DOT) com wrote:
Quote:
Dear all,

I have read the literature on fuzzy/approximate search for strings
keys in large databases, but I could not find a clear answer on what
is the best that has been achieved thus far in terms of the cost of
search (regardless of cost of building any index).

I understand this depends on the application/domain/type of data, but,
to keep things simple, supoose the target data is some string (names,
addresses, or any sequence of characters), and assume any distance
function. I am simply asking this: what is the best that has been
achieved? Is there a logarithmic solution (i.e., that can retrieve a
set of "similar" objects in some logarithmic order), or are they all,
in the worst case, linear?
Without a clear definition of exactly what you are trying to
accomplish, your question cannot be answered.

If I have a hash table of strings, with no duplicates allowed, and my
hash table is large enough, then insert, update, and delete are all
O(1) for exact strings.

Are the strings short or long?
What are the rules for similarity?
Are you looking at soundex sort of similarity or Levenstein edit
distance or something else?
Can we have exact duplicate strings stored in our repository?
What is the domain data? Personal names? DNA sequences?
Do you expect to return the entire class in the case of a fuzzy match?
I do not think you can expect an exact answer with a fuzzy question
(even when the topic of the question is fuzzy).
I guess if you specify the problem exactly you will get a better
answer.


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.