![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
For a simple message board search engine, I'm using the following query (in PHP, although I don't think the language matters): $query = sprintf(" SELECT DISTINCT id, MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) AS occurrences FROM forum_posts WHERE MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) ORDER BY occurrences DESC", $_GET['search'], $_GET['search']); (Note that the breaks are added here in an attempt to make that easier for you to read.) What I'm finding, though, is that I can have a thread with a lot of posts that have the individual keywords (but not as an actual phrase) that will have a higher result than a thread with less posts, but the exact phrase. For example, let's say that the search is for "Jason is cool". A post that says "Jason went outside and said 'the weather today is pretty cool'" is going to have the same weight as one that actually says "Jason is cool", even though the second would be more likely to be what the searcher is wanting. So the question is, how do I modify the query to give a higher weight to an exact result? |
#3
| |||
| |||
|
|
For a simple message board search engine, I'm using the following query (in PHP, although I don't think the language matters): $query = sprintf(" SELECT DISTINCT id, MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) AS occurrences FROM forum_posts WHERE MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) ORDER BY occurrences DESC", $_GET['search'], $_GET['search']); |
|
(Note that the breaks are added here in an attempt to make that easier for you to read.) |

#4
| |||
| |||
|
|
For a simple message board search engine, I'm using the following query (in PHP, although I don't think the language matters): $query = sprintf(" SELECT DISTINCT id, MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) AS occurrences FROM forum_posts WHERE MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) ORDER BY occurrences DESC", $_GET['search'], $_GET['search']); *Sigh* I've already given this advice three or four times today and it's only 10:00 a.m. here: http://en.wikipedia.org/wiki/SQL_injection |
|
Well, nothing prevents you from adding them in your real code and make it easier to read *for you* as well ![]() |
#5
| |||
| |||
|
|
Well, nothing prevents you from adding them in your real code and make it easier to read *for you* as well ![]() Sure, but I don't really have a problem reading it without breaks; in fact, I find it a little easier without them. I just added breaks here because some readers and Google Groups tend to wrap them in odd ways. |
#6
| |||
| |||
|
|
For a simple message board search engine, I'm using the following query (in PHP, although I don't think the language matters): $query = sprintf(" SELECT DISTINCT id, MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) AS occurrences FROM forum_posts WHERE MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) ORDER BY occurrences DESC", $_GET['search'], $_GET['search']); *Sigh* I've already given this advice three or four times today and it's only 10:00 a.m. here: http://en.wikipedia.org/wiki/SQL_injection While I thank you for replying, I'm not sure how this page gives any insight toward the question? |
|
Well, nothing prevents you from adding them in your real code and make it easier to read *for you* as well ![]() Sure, but I don't really have a problem reading it without breaks; in fact, I find it a little easier without them. |
#7
| |||
| |||
|
|
On 09-07-2011 05:07, jwcarlton wrote: For a simple message board search engine, I'm using the following query (in PHP, although I don't think the language matters): $query = sprintf(" SELECT DISTINCT id, MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) AS occurrences FROM forum_posts WHERE MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) ORDER BY occurrences DESC", $_GET['search'], $_GET['search']); (Note that the breaks are added here in an attempt to make that easier for you to read.) What I'm finding, though, is that I can have a thread with a lot of posts that have the individual keywords (but not as an actual phrase) that will have a higher result than a thread with less posts, but the exact phrase. For example, let's say that the search is for "Jason is cool". A post that says "Jason went outside and said 'the weather today is pretty cool'" is going to have the same weight as one that actually says "Jason is cool", even though the second would be more likely to be what the searcher is wanting. So the question is, how do I modify the query to give a higher weight to an exact result? Your current search is: 'Jason is cool' Try: '>"Jason is cool" Jason is cool' This should give 'Jason is cool' extra weight see:http://dev.mysql.com/doc/refman/5.5/...t-boolean.html -- Luuk |
#8
| ||||
| ||||
|
|
On Jul 9, 9:15 am, Luuk <L... (AT) invalid (DOT) lan> wrote: On 09-07-2011 05:07, jwcarlton wrote: For a simple message board search engine, I'm using the following query (in PHP, although I don't think the language matters): $query = sprintf(" SELECT DISTINCT id, MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) AS occurrences FROM forum_posts WHERE MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) ORDER BY occurrences DESC", $_GET['search'], $_GET['search']); (Note that the breaks are added here in an attempt to make that easier for you to read.) What I'm finding, though, is that I can have a thread with a lot of posts that have the individual keywords (but not as an actual phrase) that will have a higher result than a thread with less posts, but the exact phrase. For example, let's say that the search is for "Jason is cool". A post that says "Jason went outside and said 'the weather today is pretty cool'" is going to have the same weight as one that actually says "Jason is cool", even though the second would be more likely to be what the searcher is wanting. So the question is, how do I modify the query to give a higher weight to an exact result? Your current search is: 'Jason is cool' Try: '>"Jason is cool" Jason is cool' This should give 'Jason is cool' extra weight see:http://dev.mysql.com/doc/refman/5.5/...t-boolean.html -- Luuk why would you not use : '>"Jason is cool" +Jason +is +cool' As I read the docs, This says use give extra weight to "Jason is cool" searching for 'Jason AND is AND cool' although "is" is a noise word that would get discarded anyway. Using 'Jason is cool' by itself infers search for 'Jason OR is OR cool' with no specific weight or ranking. From the docs "Note In implementing this feature, MySQL uses what is sometimes referred to as implied Boolean logic, in which + stands for AND - stands for NOT [no operator] implies OR " How you implement that in your search feature is going to be fun... |
|
i | t | f | +----+-------------------+------+ 6 | aaaaaaaa | 1 | 7 | bbbbbbbb | 1 | 8 | aaaaaaaa bbbbbbbb | 3.5 | 9 | cccccccc bbbbbbbb | 1 | 10 | cccccccc aaaaaaaa | 1 | +----+-------------------+------+ |
|
i | t | f | +----+-------------------+-----------------+ 6 | aaaaaaaa | 1 | 8 | aaaaaaaa bbbbbbbb | 1.8333333730698 | 10 | cccccccc aaaaaaaa | 1 | +----+-------------------+-----------------+ |
|
i | t | f | +---+-------------------+------+ 8 | aaaaaaaa bbbbbbbb | 1.5 | +---+-------------------+------+ |
#9
| |||
| |||
|
|
On 16-07-2011 01:42, onedbguru wrote: On Jul 9, 9:15 am, Luuk <L... (AT) invalid (DOT) lan> wrote: On 09-07-2011 05:07, jwcarlton wrote: For a simple message board search engine, I'm using the following query (in PHP, although I don't think the language matters): $query = sprintf(" SELECT DISTINCT id, MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) AS occurrences FROM forum_posts WHERE MATCH(subject, username, comment) AGAINST ('%s' IN BOOLEAN MODE) ORDER BY occurrences DESC", $_GET['search'], $_GET['search']); (Note that the breaks are added here in an attempt to make that easier for you to read.) What I'm finding, though, is that I can have a thread with a lot of posts that have the individual keywords (but not as an actual phrase) that will have a higher result than a thread with less posts, but the exact phrase. For example, let's say that the search is for "Jason is cool". A post that says "Jason went outside and said 'the weather today is pretty cool'" is going to have the same weight as one that actually says "Jason is cool", even though the second would be more likely to be what the searcher is wanting. So the question is, how do I modify the query to give a higher weight to an exact result? Your current search is: 'Jason is cool' Try: '>"Jason is cool" Jason is cool' This should give 'Jason is cool' extra weight see:http://dev.mysql.com/doc/refman/5.5/...t-boolean.html -- Luuk why would you not use : '>"Jason is cool" +Jason +is +cool' As I read the docs, This says use give extra weight to "Jason is cool" searching for 'Jason AND is AND cool' *although "is" is a noise word that would get discarded anyway. Using 'Jason is cool' by itself infers search for 'Jason OR is OR cool' with no specific weight or ranking. From the docs "Note In implementing this feature, MySQL uses what is sometimes referred to as implied Boolean logic, in which + stands for AND - stands for NOT [no operator] implies OR " How you implement that in your search feature is going to be fun... i did a simple test, and the '+' does not look like you want to have (in this case) mysql (test)> select i,t, match(t) AGAINST('>"aaaaaaaa bbbbbbbb" aaaaaaaa bbbbbbbb' IN BOOLEAN MODE) f from search where match(t) AGAINST('>"aaaaaaaa bbbbbbbb" aaaaaaaa bbbbbbbb' IN BOOLEAN MODE); +----+-------------------+------+ | i *| t * * * * * * * * | f * *| +----+-------------------+------+ | *6 | aaaaaaaa * * * * *| * *1 | | *7 | bbbbbbbb * * * * *| * *1 | | *8 | aaaaaaaa bbbbbbbb | *3.5 | | *9 | cccccccc bbbbbbbb | * *1 | | 10 | cccccccc aaaaaaaa | * *1 | +----+-------------------+------+ 5 rows in set (0.00 sec) mysql (test)> select i,t, match(t) AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa bbbbbbbb' IN BOOLEAN MODE) f from search where match(t) AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa bbbbbbbb' IN BOOLEAN MODE); +----+-------------------+-----------------+ | i *| t * * * * * * * * | f * * * * * * * | +----+-------------------+-----------------+ | *6 | aaaaaaaa * * * * *| * * * * * * * 1 | | *8 | aaaaaaaa bbbbbbbb | 1.8333333730698 | | 10 | cccccccc aaaaaaaa | * * * * * * * 1 | +----+-------------------+-----------------+ 3 rows in set (0.00 sec) mysql (test)> select i,t, match(t) AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa +bbbbbbbb' IN BOOLEAN MODE) f from search where match(t) AGAINST('>"aaaaaaaa bbbbbbbb" +aaaaaaaa +bbbbbbbb' IN BOOLEAN MODE); +---+-------------------+------+ | i | t * * * * * * * * | f * *| +---+-------------------+------+ | 8 | aaaaaaaa bbbbbbbb | *1.5 | +---+-------------------+------+ 1 row in set (0.00 sec) mysql (test) -- Luuk |
#10
| |||
| |||
|
|
Have you seen the 'BING' commercials on search overload? |
![]() |
| Thread Tools | |
| Display Modes | |
| |