On Wed, 10 Feb 2010 18:04:08 +0000, Mladen Gogala wrote:
Where can I find the list of separator characters for the configuration
named "english"? In other words, I need the list of characters which
delimit words.
Also, is there a way to use text search to search for phrases? For example,
a search for "chicken salad" should not return "fried chicken with potato salad".
About separator characters: it is more complicated than that.
First the parser is invoked to split the text into tokens, which are
then run through the dictionary.
Consider this example:
test=> SELECT alias, '>' || token || '<' AS token, lexemes
test-> FROM ts_debug('english', 'Examples: for "various" nilly-willy tökens');
      alias      |     token     |    lexemes
-----------------+---------------+---------------
 asciiword       | >Examples<    | {exampl}
 blank           | >: <          |
 asciiword       | >for<         | {}
 blank           | > "<          |
 asciiword       | >various<     | {various}
 blank           | >" <          |
 asciihword      | >nilly-willy< | {nilly-willi}
 hword_asciipart | >nilly<       | {nilli}
 blank           | >-<           |
 hword_asciipart | >willy<       | {willi}
 blank           | > <           |
 word            | >tökens<      | {töken}
(12 rows)
"token" is the output of the parser; "lexemes" is what the dictionary
makes of it.
Would you say that "-" delimits words or not?
Maybe it's better to say that certain characters delimit certain token types.
I guess you want to know which characters make "blank" tokens.
Unless you want to dig into the code, I'd say, experiment with
queries like
SELECT alias FROM ts_debug('@');
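To run that experiment over many characters at once, something like the
following might do. It is only a sketch: it assumes a server new enough to
support LATERAL (9.3 or later), it only looks at printable ASCII, and a
character tested on its own may be classified differently than the same
character inside a longer string (compare "-" in the example above).

```sql
-- Feed each printable ASCII character to the parser on its own
-- and keep those that come back as a "blank" token.
SELECT c AS code, chr(c) AS ch, d.alias
FROM generate_series(33, 126) AS c,
     LATERAL ts_debug('english', chr(c)) AS d
WHERE d.alias = 'blank'
ORDER BY c;
```

Dropping the WHERE clause shows the token type for every character, not just the blank ones.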
About searching for a phrase: you can use the ranking function "ts_rank_cd"
with normalization 4, which divides the rank by the mean harmonic distance
between the matches.
The higher the rank, the closer together the matches are. Order is irrelevant,
so it is not exactly what you want, but it should help.
Compare
test=> WITH vals AS (
test->    SELECT to_tsvector('english', 'fried chicken with potato salad') AS searchvector,
test->           to_tsquery('english', 'chicken & salad') AS query
test-> ) SELECT searchvector @@ query,
test->          ts_rank_cd(searchvector, query, 4)
test-> FROM vals;
 ?column? | ts_rank_cd
----------+------------
 t        |  0.0333333
(1 row)
and
test=> WITH vals AS (
test->    SELECT to_tsvector('english', 'fried chicken with potato salad') AS searchvector,
test->           to_tsquery('english', 'chicken & fried') AS query
test-> ) SELECT searchvector @@ query,
test->          ts_rank_cd(searchvector, query, 4)
test-> FROM vals;
 ?column? | ts_rank_cd
----------+------------
 t        |        0.1
(1 row)
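In practice you would probably sort by that rank. A minimal sketch, using
a made-up two-row "docs" list in place of a real table (the names "docs",
"body" and "q" are just for illustration):

```sql
-- Rank two sample documents against the same query and
-- list the one whose matches are closest together first.
WITH docs(body) AS (
   VALUES ('chicken salad'),
          ('fried chicken with potato salad')
)
SELECT body,
       ts_rank_cd(to_tsvector('english', body), q, 4) AS rank
FROM docs,
     to_tsquery('english', 'chicken & salad') AS q
WHERE to_tsvector('english', body) @@ q
ORDER BY rank DESC;
```

Both rows match the query, but I would expect 'chicken salad' to come out on top because its two words are adjacent. A cutoff on the rank could then approximate a phrase search, keeping in mind that word order is still ignored.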
Yours,
Laurenz Albe