![]() | |
![]() |
| | Thread Tools | Display Modes |
#11
| |||
| |||
|
|
On Jul 16, 3:29*am, Luuk <L... (AT) invalid (DOT) lan> wrote: On 16-07-2011 01:42, onedbguru wrote: On Jul 9, 9:15 am, Luuk <L... (AT) invalid (DOT) lan> wrote: On 09-07-2011 05:07, jwcarlton wrote: For a simple message board search engine, I'm using the following query (in PHP, although I don't think the language matters): $query = sprintf(" SELECT DISTINCT id, MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) AS occurrences FROM forum_posts WHERE MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) ORDER BY occurrences DESC", $_GET['search'], $_GET['search']); (Note that the breaks are added here in an attempt to make that easier for you to read.) What I'm finding, though, is that I can have a thread with a lot of posts that have the individual keywords (but not as an actual phrase) that will have a higher result than a thread with less posts, but the exact phrase. For example, let's say that the search is for "Jason is cool". A post that says "Jason went outside and said 'the weather today is pretty cool'" is going to have the same weight as one that actually says "Jason is cool", even though the second would be more likely to be what the searcher is wanting. So the question is, how do I modify the query to give a higher weight to an exact result? Your current search is: 'Jason is cool' Try: '>"Jason is cool" Jason is cool' This should give 'Jason is cool' extra weight see:http://dev.mysql.com/doc/refman/5.5/...t-boolean.html -- Luuk why would you not use : '>"Jason is cool" +Jason +is +cool' As I read the docs, This says use give extra weight to "Jason is cool" searching for 'Jason AND is AND cool' *although "is" is a noise word that would get discarded anyway. Using 'Jason is cool' by itself infers search for 'Jason OR is OR cool' with no specific weight or ranking. From the docs "Note In implementing this feature, MySQL uses what is sometimes referred to as implied Boolean logic, in which + stands for AND - stands for NOT [no operator] implies OR " How you implement that in your search feature is going to be fun... i did a simple test, and the '+' does not look like you want to have (in this case) mysql (test)> select i,t, match(t) AGAINST('>"aaaaaaaa bbbbbbbb" aaaaaaaa bbbbbbbb' IN BOOLEAN MODE) f from search where match(t) AGAINST('>"aaaaaaaa bbbbbbbb" aaaaaaaa bbbbbbbb' IN BOOLEAN MODE); +----+-------------------+------+ | i *| t * * * * * * * * | f * *| +----+-------------------+------+ | *6 | aaaaaaaa * * * * *| * *1 | | *7 | bbbbbbbb * * * * *| * *1 | | *8 | aaaaaaaa bbbbbbbb | *3.5 | | *9 | cccccccc bbbbbbbb | * *1 | | 10 | cccccccc aaaaaaaa | * *1 | +----+-------------------+------+ 5 rows in set (0.00 sec) mysql (test)> select i,t, match(t) AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa bbbbbbbb' IN BOOLEAN MODE) f from search where match(t) AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa bbbbbbbb' IN BOOLEAN MODE); +----+-------------------+-----------------+ | i *| t * * * * * * * * | f * * * * * * * | +----+-------------------+-----------------+ | *6 | aaaaaaaa * * * * *| * * * * * * * 1 | | *8 | aaaaaaaa bbbbbbbb | 1.8333333730698 | | 10 | cccccccc aaaaaaaa | * * * * * * * 1 | +----+-------------------+-----------------+ 3 rows in set (0.00 sec) mysql (test)> select i,t, match(t) AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa +bbbbbbbb' IN BOOLEAN MODE) f from search where match(t) AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa +bbbbbbbb' IN BOOLEAN MODE); +---+-------------------+------+ | i | t * * * * * * * * | f * *| +---+-------------------+------+ | 8 | aaaaaaaa bbbbbbbb | *1.5 | +---+-------------------+------+ 1 row in set (0.00 sec) mysql (test) -- Luuk I dunno - it might be EXACTLY what he is looking for From the OP: " says "Jason is cool", even though the second ( "Jason is cool") would be more likely to be what the searcher is wanting." Developing a search feature is always interesting, because you have to make some assumptions and program according to those assumptions and database engine defaults. I would say that the typical "search" is going to be an "AND" search - not an OR search. Mysql defaults to "OR". *Just my 2cents worth. *If I am looking for "Jason is cool", I do not want to see a post on how the coolness of the weather - even though it finds one of the words I am looking for. *Have you seen the 'BING' commercials on search overload? |
#12
| |||
| |||
|
|
On Jul 17, 2:21 pm, onedbguru <onedbg... (AT) yahoo (DOT) com> wrote: On Jul 16, 3:29 am, Luuk <L... (AT) invalid (DOT) lan> wrote: On 16-07-2011 01:42, onedbguru wrote: On Jul 9, 9:15 am, Luuk <L... (AT) invalid (DOT) lan> wrote: On 09-07-2011 05:07, jwcarlton wrote: For a simple message board search engine, I'm using the following query (in PHP, although I don't think the language matters): $query = sprintf(" SELECT DISTINCT id, MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) AS occurrences FROM forum_posts WHERE MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) ORDER BY occurrences DESC", $_GET['search'], $_GET['search']); (Note that the breaks are added here in an attempt to make that easier for you to read.) What I'm finding, though, is that I can have a thread with a lot of posts that have the individual keywords (but not as an actual phrase) that will have a higher result than a thread with less posts, but the exact phrase. For example, let's say that the search is for "Jason is cool". A post that says "Jason went outside and said 'the weather today is pretty cool'" is going to have the same weight as one that actually says "Jason is cool", even though the second would be more likely to be what the searcher is wanting. So the question is, how do I modify the query to give a higher weight to an exact result? Your current search is: 'Jason is cool' Try: '>"Jason is cool" Jason is cool' This should give 'Jason is cool' extra weight see:http://dev.mysql.com/doc/refman/5.5/...t-boolean.html -- Luuk why would you not use : '>"Jason is cool" +Jason +is +cool' As I read the docs, This says use give extra weight to "Jason is cool" searching for 'Jason AND is AND cool' although "is" is a noise word that would get discarded anyway. Using 'Jason is cool' by itself infers search for 'Jason OR is OR cool' with no specific weight or ranking. From the docs "Note In implementing this feature, MySQL uses what is sometimes referred to as implied Boolean logic, in which + stands for AND - stands for NOT [no operator] implies OR " How you implement that in your search feature is going to be fun... i did a simple test, and the '+' does not look like you want to have (in this case) mysql (test)> select i,t, match(t) AGAINST('>"aaaaaaaa bbbbbbbb" aaaaaaaa bbbbbbbb' IN BOOLEAN MODE) f from search where match(t) AGAINST('>"aaaaaaaa bbbbbbbb" aaaaaaaa bbbbbbbb' IN BOOLEAN MODE); +----+-------------------+------+ | i | t | f | +----+-------------------+------+ | 6 | aaaaaaaa | 1 | | 7 | bbbbbbbb | 1 | | 8 | aaaaaaaa bbbbbbbb | 3.5 | | 9 | cccccccc bbbbbbbb | 1 | | 10 | cccccccc aaaaaaaa | 1 | +----+-------------------+------+ 5 rows in set (0.00 sec) mysql (test)> select i,t, match(t) AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa bbbbbbbb' IN BOOLEAN MODE) f from search where match(t) AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa bbbbbbbb' IN BOOLEAN MODE); +----+-------------------+-----------------+ | i | t | f | +----+-------------------+-----------------+ | 6 | aaaaaaaa | 1 | | 8 | aaaaaaaa bbbbbbbb | 1.8333333730698 | | 10 | cccccccc aaaaaaaa | 1 | +----+-------------------+-----------------+ 3 rows in set (0.00 sec) mysql (test)> select i,t, match(t) AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa +bbbbbbbb' IN BOOLEAN MODE) f from search where match(t) AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa +bbbbbbbb' IN BOOLEAN MODE); +---+-------------------+------+ | i | t | f | +---+-------------------+------+ | 8 | aaaaaaaa bbbbbbbb | 1.5 | +---+-------------------+------+ 1 row in set (0.00 sec) mysql (test) -- Luuk I dunno - it might be EXACTLY what he is looking for From the OP: " says "Jason is cool", even though the second ( "Jason is cool") would be more likely to be what the searcher is wanting." Developing a search feature is always interesting, because you have to make some assumptions and program according to those assumptions and database engine defaults. I would say that the typical "search" is going to be an "AND" search - not an OR search. Mysql defaults to "OR". Just my 2cents worth. If I am looking for "Jason is cool", I do not want to see a post on how the coolness of the weather - even though it finds one of the words I am looking for. Have you seen the 'BING' commercials on search overload? From a programming standpoint, would this be a simple matter of converting all " " in the search query to " +"? Eg: |
![]() |
| Thread Tools | |
| Display Modes | |
| |