dbTalk Databases Forums  

Return most likely results from MATCH - AGAINST

comp.databases.mysql comp.databases.mysql


Discuss Return most likely results from MATCH - AGAINST in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
jwcarlton
 
Posts: n/a

Default Return most likely results from MATCH - AGAINST - 08-09-2011 , 09:46 PM






Let's say that someone is searching for, say, "billybobs" or
"billybobsslop", and the field contains "Billy Bob's Homemade Slop
House & Bar" (spaces, non-alphanumeric characters, etc).

How can I write the query so that, if there's not an exact result,
then return the most likely?

Here's what I'm currently using:

SELECT SQL_CACHE DISTINCT id, MATCH(name) AGAINST ('billybobsslop' IN
BOOLEAN MODE) AS occurrences FROM restaurants WHERE MATCH(name)
AGAINST ('billybobsslop' IN BOOLEAN MODE)


TIA,

Jason

Reply With Quote
  #2  
Old   
onedbguru
 
Posts: n/a

Default Re: Return most likely results from MATCH - AGAINST - 08-10-2011 , 05:33 PM






On Aug 9, 10:46*pm, jwcarlton <jwcarl... (AT) gmail (DOT) com> wrote:
Quote:
Let's say that someone is searching for, say, "billybobs" or
"billybobsslop", and the field contains "Billy Bob's Homemade Slop
House & Bar" (spaces, non-alphanumeric characters, etc).

How can I write the query so that, if there's not an exact result,
then return the most likely?

Here's what I'm currently using:

SELECT SQL_CACHE DISTINCT id, MATCH(name) AGAINST ('billybobsslop' IN
BOOLEAN MODE) AS occurrences FROM restaurants WHERE MATCH(name)
AGAINST ('billybobsslop' IN BOOLEAN MODE)

TIA,

Jason

So, does it work? What do you get when you execute it? Is it what you
expect? How does it stack up against the examples in the docs?

http://dev.mysql.com/doc/refman/5.5/...t-boolean.html

Reply With Quote
  #3  
Old   
jwcarlton
 
Posts: n/a

Default Re: Return most likely results from MATCH - AGAINST - 08-11-2011 , 03:49 AM



Quote:
I had originally just changed any space or non-alphanumeric character
to a -, then de-duped the - (so I wouldn't have 2 - in a row), then
instead of MATCH AGAINST I had replaced the - with %, and used LIKE.
Which works great if the address given is something like:

http://www.example.com/billy-bobs-slophouse/

Given *BY WHOM*? *The person conducting the search? *The
person who wants his info entered into your database so people
can find him? *
Thanks for the well-thought-out reply, Gordon. I realize, though, that
I need to make something a tad more clear.

I'm not referring to people typing this in to a search engine. In this
case, Billy Bob himself would have given his customer the website
address, probably on his menu or a business card. But, assuming that
it's on his menu, then people aren't going to remember an exact URL
when they get home, which is why I'm hoping to make it a bit more
intuitive.

This, of course, also means that he can't very well be expected to
give out an address like:

http://www.example.com/billy%20bob%27s%20slophouse/

I had considered another column that was loaded up with potential
search phrases, but I realize that it could become a bit large, have a
lot of room for programming error, and worse, run the risk of
duplicating another listing. That's why I was hoping there was a query
modification to just find the most likely result.

Reply With Quote
  #4  
Old   
Gordon Burditt
 
Posts: n/a

Default Re: Return most likely results from MATCH - AGAINST - 08-11-2011 , 11:55 PM



Quote:
case, Billy Bob himself would have given his customer the website
address, probably on his menu or a business card. But, assuming that
it's on his menu, then people aren't going to remember an exact URL
when they get home, which is why I'm hoping to make it a bit more
intuitive.
The most intuitive URL is probably:

http://www.billybobs.com/

However, I still think they'll type it into a Google or Bing search
box rather than the address bar, because they don't know what an
address bar is, and it's not necessarily visible.

Quote:
This, of course, also means that he can't very well be expected to
give out an address like:

http://www.example.com/billy%20bob%27s%20slophouse/
Ok, use underscores and/or dashes to separate words, and translate
both to spaces for the query. That's easier to remember than
squished-together words. And sometimes the way I parse run-together
URLS into words is *NOT* the way the owner intended, and some of
them are rather obscene.

If you don't get ANY spaces from the user (and no matches), you
can try hunting for relevant words in the user's input. You don't
care if these words are in the right order, or they overlap, or
whatever.

I assume you've already got a table with the customer's business
name in it (business_name), and that it has some kind of primary key
(which I'll call "id"). Now make a wordlist table which contains
the customer "id" and a word in his business name, one per record.
Mr. "Scrap It and Redesign It before changing the schema" will object
that this is redundant data. It is. So are indexes.

You might have in your customer table:

id business_name
37 "Billy Bob's Homemade Slophouse and Bar"
42 "Bobo's House of Whores"

and in your word table:

id word
37 billy
37 bob
37 homemade
37 slop
37 house
37 bar
42 bobo
42 house
42 whore

Now do a query like this:

select cust.id, cust.business_name, count(word.id) as score
from customer cust
left join wordtable word on cust.id = word.id
where '${userinputquery}' like concat('%', word.word, '%')
group by cust.id
order by score desc;

You get a list of possible matches, with the one with the best match
score first (ties are, of course, possible). You get one point for
each word matched. Some tips for the word table:

Avoid really short noise words ("the" and "of", for example). Avoid
using plurals of words. If you've got one long word that might be
two (slophouse), they might be remembered separately, so break them
up. Or you might give one point for "slop", one for "house" and
another for getting them next to each other in the right order
("slophouse"). You can bump up the priority of a word by putting
it in twice for a given customer.

A query for "bobos whore house" gets 3 points for customer #42
(bobo, whore, house) and 2 points for customer #37 (bob, house).

A one-word business name might have trouble competing with coincidental
matches with other businesses with longer names (e.g. if the customer
remembers "Exxon Oil", Exxon now gets tied with any business having
"oil" in its name, including "Bubble, Bubble, Toil and Trouble", a
local witches supply house.


Quote:
I had considered another column that was loaded up with potential
search phrases, but I realize that it could become a bit large, have a
lot of room for programming error, and worse, run the risk of
duplicating another listing. That's why I was hoping there was a query
modification to just find the most likely result.
I still wonder if the best approach is to put up a page with links
to all your customers (in randomized order), and have them click
on the one they wanted. How many different customers do you need
to choose between with this keyword mess?

Incidentally, does anyone really consider "slophouse" a non-derogatory
term to apply to a food business?

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.