dbTalk Databases Forums  

Search query, adding weight to exact result

comp.databases.mysql comp.databases.mysql


Discuss Search query, adding weight to exact result in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
jwcarlton
 
Posts: n/a

Default Re: Search query, adding weight to exact result - 07-20-2011 , 09:35 PM






On Jul 17, 2:21*pm, onedbguru <onedbg... (AT) yahoo (DOT) com> wrote:
Quote:
On Jul 16, 3:29*am, Luuk <L... (AT) invalid (DOT) lan> wrote:



On 16-07-2011 01:42, onedbguru wrote:

On Jul 9, 9:15 am, Luuk <L... (AT) invalid (DOT) lan> wrote:
On 09-07-2011 05:07, jwcarlton wrote:

For a simple message board search engine, I'm using the following
query (in PHP, although I don't think the language matters):

$query = sprintf("
SELECT DISTINCT id, MATCH(subject, username, comment)
AGAINST ('%s' IN BOOLEAN MODE)
AS occurrences
FROM forum_posts WHERE
MATCH(subject, username, comment)
AGAINST ('%s' IN BOOLEAN MODE)
ORDER BY occurrences DESC",

$_GET['search'],
$_GET['search']);

(Note that the breaks are added here in an attempt to make that easier
for you to read.)

What I'm finding, though, is that I can have a thread with a lot of
posts that have the individual keywords (but not as an actual phrase)
that will have a higher result than a thread with less posts, but the
exact phrase.

For example, let's say that the search is for "Jason is cool". A post
that says "Jason went outside and said 'the weather today is pretty
cool'" is going to have the same weight as one that actually says
"Jason is cool", even though the second would be more likely to be
what the searcher is wanting.

So the question is, how do I modify the query to give a higher weight
to an exact result?

Your current search is:
'Jason is cool'

Try:
'>"Jason is cool" Jason is cool'

This should give 'Jason is cool' extra weight

see:http://dev.mysql.com/doc/refman/5.5/...t-boolean.html

--
Luuk

why would you not use
:
'>"Jason is cool" +Jason +is +cool'

As I read the docs, This says use give extra weight to "Jason is cool"
searching for 'Jason AND is AND cool' *although "is" is a noise word
that would get discarded anyway. Using 'Jason is cool' by itself
infers search for 'Jason OR is OR cool' with no specific weight or
ranking.

From the docs
"Note
In implementing this feature, MySQL uses what is sometimes referred to
as implied Boolean logic, in which
+ stands for AND
- stands for NOT
[no operator] implies OR
"
How you implement that in your search feature is going to be fun...

i did a simple test, and the '+' does not look like you want to have (in
this case)

mysql (test)> select i,t, match(t) AGAINST('>"aaaaaaaa bbbbbbbb"
aaaaaaaa bbbbbbbb' IN BOOLEAN MODE) f from search where match(t)
AGAINST('>"aaaaaaaa bbbbbbbb" aaaaaaaa bbbbbbbb' IN BOOLEAN MODE);
+----+-------------------+------+
| i *| t * * * * * * * * | f * *|
+----+-------------------+------+
| *6 | aaaaaaaa * * * * *| * *1 |
| *7 | bbbbbbbb * * * * *| * *1 |
| *8 | aaaaaaaa bbbbbbbb | *3.5 |
| *9 | cccccccc bbbbbbbb | * *1 |
| 10 | cccccccc aaaaaaaa | * *1 |
+----+-------------------+------+
5 rows in set (0.00 sec)

mysql (test)> select i,t, match(t) AGAINST('>"aaaaaaaa bbbbbbbb"
+aaaaaaaa bbbbbbbb' IN BOOLEAN MODE) f from search where match(t)
AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa bbbbbbbb' IN BOOLEAN MODE);
+----+-------------------+-----------------+
| i *| t * * * * * * * * | f * * * * * * * |
+----+-------------------+-----------------+
| *6 | aaaaaaaa * * * * *| * * * * * * * 1 |
| *8 | aaaaaaaa bbbbbbbb | 1.8333333730698 |
| 10 | cccccccc aaaaaaaa | * * * * * * * 1 |
+----+-------------------+-----------------+
3 rows in set (0.00 sec)

mysql (test)> select i,t, match(t) AGAINST('>"aaaaaaaa bbbbbbbb"
+aaaaaaaa +bbbbbbbb' IN BOOLEAN MODE) f from search where match(t)
AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa +bbbbbbbb' IN BOOLEAN MODE);
+---+-------------------+------+
| i | t * * * * * * * * | f * *|
+---+-------------------+------+
| 8 | aaaaaaaa bbbbbbbb | *1.5 |
+---+-------------------+------+
1 row in set (0.00 sec)

mysql (test)

--
Luuk

I dunno - it might be EXACTLY what he is looking for
From the OP: " says "Jason is cool", even though the second ( "Jason
is cool") would be more likely to be
what the searcher is wanting."

Developing a search feature is always interesting, because you have to
make some assumptions and program according to those assumptions and
database engine defaults.

I would say that the typical "search" is going to be an "AND" search -
not an OR search. Mysql defaults to "OR". *Just my 2cents worth. *If I
am looking for "Jason is cool", I do not want to see a post on how the
coolness of the weather - even though it finds one of the words I am
looking for. *Have you seen the 'BING' commercials on search overload?
From a programming standpoint, would this be a simple matter of
converting all " " in the search query to " +"?

Eg:

$search = strtolower(mysql_real_escape_string($_GET['search']));
$search = str_replace(" ", " +", $search);

$query = sprintf("
SELECT DISTINCT id, MATCH(subject, username, comment)
AGAINST ('%s' IN BOOLEAN MODE)
AS occurrences
FROM forum_posts WHERE
MATCH(subject, username, comment)
AGAINST ('%s' IN BOOLEAN MODE)
ORDER BY occurrences DESC",

$search,
$search);

Reply With Quote
  #12  
Old   
Luuk
 
Posts: n/a

Default Re: Search query, adding weight to exact result - 07-21-2011 , 12:11 PM






On 21-07-2011 04:35, jwcarlton wrote:
Quote:
On Jul 17, 2:21 pm, onedbguru <onedbg... (AT) yahoo (DOT) com> wrote:
On Jul 16, 3:29 am, Luuk <L... (AT) invalid (DOT) lan> wrote:



On 16-07-2011 01:42, onedbguru wrote:

On Jul 9, 9:15 am, Luuk <L... (AT) invalid (DOT) lan> wrote:
On 09-07-2011 05:07, jwcarlton wrote:

For a simple message board search engine, I'm using the following
query (in PHP, although I don't think the language matters):

$query = sprintf("
SELECT DISTINCT id, MATCH(subject, username, comment)
AGAINST ('%s' IN BOOLEAN MODE)
AS occurrences
FROM forum_posts WHERE
MATCH(subject, username, comment)
AGAINST ('%s' IN BOOLEAN MODE)
ORDER BY occurrences DESC",

$_GET['search'],
$_GET['search']);

(Note that the breaks are added here in an attempt to make that easier
for you to read.)

What I'm finding, though, is that I can have a thread with a lot of
posts that have the individual keywords (but not as an actual phrase)
that will have a higher result than a thread with less posts, but the
exact phrase.

For example, let's say that the search is for "Jason is cool". A post
that says "Jason went outside and said 'the weather today is pretty
cool'" is going to have the same weight as one that actually says
"Jason is cool", even though the second would be more likely to be
what the searcher is wanting.

So the question is, how do I modify the query to give a higher weight
to an exact result?

Your current search is:
'Jason is cool'

Try:
'>"Jason is cool" Jason is cool'

This should give 'Jason is cool' extra weight

see:http://dev.mysql.com/doc/refman/5.5/...t-boolean.html

--
Luuk

why would you not use
:
'>"Jason is cool" +Jason +is +cool'

As I read the docs, This says use give extra weight to "Jason is cool"
searching for 'Jason AND is AND cool' although "is" is a noise word
that would get discarded anyway. Using 'Jason is cool' by itself
infers search for 'Jason OR is OR cool' with no specific weight or
ranking.

From the docs
"Note
In implementing this feature, MySQL uses what is sometimes referred to
as implied Boolean logic, in which
+ stands for AND
- stands for NOT
[no operator] implies OR
"
How you implement that in your search feature is going to be fun...

i did a simple test, and the '+' does not look like you want to have (in
this case)

mysql (test)> select i,t, match(t) AGAINST('>"aaaaaaaa bbbbbbbb"
aaaaaaaa bbbbbbbb' IN BOOLEAN MODE) f from search where match(t)
AGAINST('>"aaaaaaaa bbbbbbbb" aaaaaaaa bbbbbbbb' IN BOOLEAN MODE);
+----+-------------------+------+
| i | t | f |
+----+-------------------+------+
| 6 | aaaaaaaa | 1 |
| 7 | bbbbbbbb | 1 |
| 8 | aaaaaaaa bbbbbbbb | 3.5 |
| 9 | cccccccc bbbbbbbb | 1 |
| 10 | cccccccc aaaaaaaa | 1 |
+----+-------------------+------+
5 rows in set (0.00 sec)

mysql (test)> select i,t, match(t) AGAINST('>"aaaaaaaa bbbbbbbb"
+aaaaaaaa bbbbbbbb' IN BOOLEAN MODE) f from search where match(t)
AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa bbbbbbbb' IN BOOLEAN MODE);
+----+-------------------+-----------------+
| i | t | f |
+----+-------------------+-----------------+
| 6 | aaaaaaaa | 1 |
| 8 | aaaaaaaa bbbbbbbb | 1.8333333730698 |
| 10 | cccccccc aaaaaaaa | 1 |
+----+-------------------+-----------------+
3 rows in set (0.00 sec)

mysql (test)> select i,t, match(t) AGAINST('>"aaaaaaaa bbbbbbbb"
+aaaaaaaa +bbbbbbbb' IN BOOLEAN MODE) f from search where match(t)
AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa +bbbbbbbb' IN BOOLEAN MODE);
+---+-------------------+------+
| i | t | f |
+---+-------------------+------+
| 8 | aaaaaaaa bbbbbbbb | 1.5 |
+---+-------------------+------+
1 row in set (0.00 sec)

mysql (test)

--
Luuk

I dunno - it might be EXACTLY what he is looking for
From the OP: " says "Jason is cool", even though the second ( "Jason
is cool") would be more likely to be
what the searcher is wanting."

Developing a search feature is always interesting, because you have to
make some assumptions and program according to those assumptions and
database engine defaults.

I would say that the typical "search" is going to be an "AND" search -
not an OR search. Mysql defaults to "OR". Just my 2cents worth. If I
am looking for "Jason is cool", I do not want to see a post on how the
coolness of the weather - even though it finds one of the words I am
looking for. Have you seen the 'BING' commercials on search overload?

From a programming standpoint, would this be a simple matter of
converting all " " in the search query to " +"?

Eg:

basically, yes,
not if the search contains characters that influence the way full-text
behaves

i.e.
you should not change: '"Jason is cool"'
to: '+"Jason +is +cool"'
i think it should be change to: '+"Jason is cool"'

see:
http://dev.mysql.com/doc/refman/5.5/...t-boolean.html



--
Luuk

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.