dbTalk Databases Forums  

tsearch2: List of lexemes and their count

comp.databases.postgresql comp.databases.postgresql


Discuss tsearch2: List of lexemes and their count in the comp.databases.postgresql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Thomas Guettler
 
Posts: n/a

Default tsearch2: List of lexemes and their count - 03-25-2008 , 09:36 AM






Hi,

How can you get all lexemes of an indexed column and how
often they occur?

This would help me to find words which should
be added to the stop word list.


Thomas


--
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de

Reply With Quote
  #2  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: tsearch2: List of lexemes and their count - 03-26-2008 , 03:26 AM






Thomas Guettler <hv (AT) tbz-pariv (DOT) de> wrote:
Quote:
How can you get all lexemes of an indexed column and how
often they occur?

This would help me to find words which should
be added to the stop word list.
This is explained in
http://www.postgresql.org/docs/curre...RCH-STATISTICS

An example:

CREATE TABLE fts(id integer PRIMARY KEY, val text);
CREATE INDEX fts_index ON fts USING gin(to_tsvector('german', val));

INSERT INTO fts VALUES (1,
'dies ist ein deutscher Text mit mehreren Worten; '
Quote:
| 'etliche Worte doppelt.');
INSERT INTO fts VALUES (2,
'ein weiterer deutscher Text enthält andere Worte, '
Quote:
| 'aber natürlich auch die selben.');
INSERT INTO fts VALUES (3,
'Ein dritter deutscher Text rundet die Sache ab.');

Now, for the statistics, I use the same function as I used to define
the index:

SELECT * FROM ts_stat('SELECT to_tsvector(''german'', val) FROM fts');

word | ndoc | nentry
---------+------+--------
ab | 1 | 1
mehr | 1 | 1
sach | 1 | 1
selb | 1 | 1
text | 3 | 3
weit | 1 | 1
wort | 2 | 3
dritt | 1 | 1
natur | 1 | 1
etlich | 1 | 1
rundet | 1 | 1
deutsch | 3 | 3
doppelt | 1 | 1
enthalt | 1 | 1
(14 rows)

Yours,
Laurenz Albe


Reply With Quote
  #3  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: tsearch2: List of lexemes and their count - 03-26-2008 , 03:26 AM



Thomas Guettler <hv (AT) tbz-pariv (DOT) de> wrote:
Quote:
How can you get all lexemes of an indexed column and how
often they occur?

This would help me to find words which should
be added to the stop word list.
This is explained in
http://www.postgresql.org/docs/curre...RCH-STATISTICS

An example:

CREATE TABLE fts(id integer PRIMARY KEY, val text);
CREATE INDEX fts_index ON fts USING gin(to_tsvector('german', val));

INSERT INTO fts VALUES (1,
'dies ist ein deutscher Text mit mehreren Worten; '
Quote:
| 'etliche Worte doppelt.');
INSERT INTO fts VALUES (2,
'ein weiterer deutscher Text enthält andere Worte, '
Quote:
| 'aber natürlich auch die selben.');
INSERT INTO fts VALUES (3,
'Ein dritter deutscher Text rundet die Sache ab.');

Now, for the statistics, I use the same function as I used to define
the index:

SELECT * FROM ts_stat('SELECT to_tsvector(''german'', val) FROM fts');

word | ndoc | nentry
---------+------+--------
ab | 1 | 1
mehr | 1 | 1
sach | 1 | 1
selb | 1 | 1
text | 3 | 3
weit | 1 | 1
wort | 2 | 3
dritt | 1 | 1
natur | 1 | 1
etlich | 1 | 1
rundet | 1 | 1
deutsch | 3 | 3
doppelt | 1 | 1
enthalt | 1 | 1
(14 rows)

Yours,
Laurenz Albe


Reply With Quote
  #4  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: tsearch2: List of lexemes and their count - 03-26-2008 , 03:26 AM



Thomas Guettler <hv (AT) tbz-pariv (DOT) de> wrote:
Quote:
How can you get all lexemes of an indexed column and how
often they occur?

This would help me to find words which should
be added to the stop word list.
This is explained in
http://www.postgresql.org/docs/curre...RCH-STATISTICS

An example:

CREATE TABLE fts(id integer PRIMARY KEY, val text);
CREATE INDEX fts_index ON fts USING gin(to_tsvector('german', val));

INSERT INTO fts VALUES (1,
'dies ist ein deutscher Text mit mehreren Worten; '
Quote:
| 'etliche Worte doppelt.');
INSERT INTO fts VALUES (2,
'ein weiterer deutscher Text enthält andere Worte, '
Quote:
| 'aber natürlich auch die selben.');
INSERT INTO fts VALUES (3,
'Ein dritter deutscher Text rundet die Sache ab.');

Now, for the statistics, I use the same function as I used to define
the index:

SELECT * FROM ts_stat('SELECT to_tsvector(''german'', val) FROM fts');

word | ndoc | nentry
---------+------+--------
ab | 1 | 1
mehr | 1 | 1
sach | 1 | 1
selb | 1 | 1
text | 3 | 3
weit | 1 | 1
wort | 2 | 3
dritt | 1 | 1
natur | 1 | 1
etlich | 1 | 1
rundet | 1 | 1
deutsch | 3 | 3
doppelt | 1 | 1
enthalt | 1 | 1
(14 rows)

Yours,
Laurenz Albe


Reply With Quote
  #5  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: tsearch2: List of lexemes and their count - 03-26-2008 , 03:26 AM



Thomas Guettler <hv (AT) tbz-pariv (DOT) de> wrote:
Quote:
How can you get all lexemes of an indexed column and how
often they occur?

This would help me to find words which should
be added to the stop word list.
This is explained in
http://www.postgresql.org/docs/curre...RCH-STATISTICS

An example:

CREATE TABLE fts(id integer PRIMARY KEY, val text);
CREATE INDEX fts_index ON fts USING gin(to_tsvector('german', val));

INSERT INTO fts VALUES (1,
'dies ist ein deutscher Text mit mehreren Worten; '
Quote:
| 'etliche Worte doppelt.');
INSERT INTO fts VALUES (2,
'ein weiterer deutscher Text enthält andere Worte, '
Quote:
| 'aber natürlich auch die selben.');
INSERT INTO fts VALUES (3,
'Ein dritter deutscher Text rundet die Sache ab.');

Now, for the statistics, I use the same function as I used to define
the index:

SELECT * FROM ts_stat('SELECT to_tsvector(''german'', val) FROM fts');

word | ndoc | nentry
---------+------+--------
ab | 1 | 1
mehr | 1 | 1
sach | 1 | 1
selb | 1 | 1
text | 3 | 3
weit | 1 | 1
wort | 2 | 3
dritt | 1 | 1
natur | 1 | 1
etlich | 1 | 1
rundet | 1 | 1
deutsch | 3 | 3
doppelt | 1 | 1
enthalt | 1 | 1
(14 rows)

Yours,
Laurenz Albe


Reply With Quote
  #6  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: tsearch2: List of lexemes and their count - 03-26-2008 , 03:26 AM



Thomas Guettler <hv (AT) tbz-pariv (DOT) de> wrote:
Quote:
How can you get all lexemes of an indexed column and how
often they occur?

This would help me to find words which should
be added to the stop word list.
This is explained in
http://www.postgresql.org/docs/curre...RCH-STATISTICS

An example:

CREATE TABLE fts(id integer PRIMARY KEY, val text);
CREATE INDEX fts_index ON fts USING gin(to_tsvector('german', val));

INSERT INTO fts VALUES (1,
'dies ist ein deutscher Text mit mehreren Worten; '
Quote:
| 'etliche Worte doppelt.');
INSERT INTO fts VALUES (2,
'ein weiterer deutscher Text enthält andere Worte, '
Quote:
| 'aber natürlich auch die selben.');
INSERT INTO fts VALUES (3,
'Ein dritter deutscher Text rundet die Sache ab.');

Now, for the statistics, I use the same function as I used to define
the index:

SELECT * FROM ts_stat('SELECT to_tsvector(''german'', val) FROM fts');

word | ndoc | nentry
---------+------+--------
ab | 1 | 1
mehr | 1 | 1
sach | 1 | 1
selb | 1 | 1
text | 3 | 3
weit | 1 | 1
wort | 2 | 3
dritt | 1 | 1
natur | 1 | 1
etlich | 1 | 1
rundet | 1 | 1
deutsch | 3 | 3
doppelt | 1 | 1
enthalt | 1 | 1
(14 rows)

Yours,
Laurenz Albe


Reply With Quote
  #7  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: tsearch2: List of lexemes and their count - 03-26-2008 , 03:26 AM



Thomas Guettler <hv (AT) tbz-pariv (DOT) de> wrote:
Quote:
How can you get all lexemes of an indexed column and how
often they occur?

This would help me to find words which should
be added to the stop word list.
This is explained in
http://www.postgresql.org/docs/curre...RCH-STATISTICS

An example:

CREATE TABLE fts(id integer PRIMARY KEY, val text);
CREATE INDEX fts_index ON fts USING gin(to_tsvector('german', val));

INSERT INTO fts VALUES (1,
'dies ist ein deutscher Text mit mehreren Worten; '
Quote:
| 'etliche Worte doppelt.');
INSERT INTO fts VALUES (2,
'ein weiterer deutscher Text enthält andere Worte, '
Quote:
| 'aber natürlich auch die selben.');
INSERT INTO fts VALUES (3,
'Ein dritter deutscher Text rundet die Sache ab.');

Now, for the statistics, I use the same function as I used to define
the index:

SELECT * FROM ts_stat('SELECT to_tsvector(''german'', val) FROM fts');

word | ndoc | nentry
---------+------+--------
ab | 1 | 1
mehr | 1 | 1
sach | 1 | 1
selb | 1 | 1
text | 3 | 3
weit | 1 | 1
wort | 2 | 3
dritt | 1 | 1
natur | 1 | 1
etlich | 1 | 1
rundet | 1 | 1
deutsch | 3 | 3
doppelt | 1 | 1
enthalt | 1 | 1
(14 rows)

Yours,
Laurenz Albe


Reply With Quote
  #8  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: tsearch2: List of lexemes and their count - 03-26-2008 , 03:26 AM



Thomas Guettler <hv (AT) tbz-pariv (DOT) de> wrote:
Quote:
How can you get all lexemes of an indexed column and how
often they occur?

This would help me to find words which should
be added to the stop word list.
This is explained in
http://www.postgresql.org/docs/curre...RCH-STATISTICS

An example:

CREATE TABLE fts(id integer PRIMARY KEY, val text);
CREATE INDEX fts_index ON fts USING gin(to_tsvector('german', val));

INSERT INTO fts VALUES (1,
'dies ist ein deutscher Text mit mehreren Worten; '
Quote:
| 'etliche Worte doppelt.');
INSERT INTO fts VALUES (2,
'ein weiterer deutscher Text enthält andere Worte, '
Quote:
| 'aber natürlich auch die selben.');
INSERT INTO fts VALUES (3,
'Ein dritter deutscher Text rundet die Sache ab.');

Now, for the statistics, I use the same function as I used to define
the index:

SELECT * FROM ts_stat('SELECT to_tsvector(''german'', val) FROM fts');

word | ndoc | nentry
---------+------+--------
ab | 1 | 1
mehr | 1 | 1
sach | 1 | 1
selb | 1 | 1
text | 3 | 3
weit | 1 | 1
wort | 2 | 3
dritt | 1 | 1
natur | 1 | 1
etlich | 1 | 1
rundet | 1 | 1
deutsch | 3 | 3
doppelt | 1 | 1
enthalt | 1 | 1
(14 rows)

Yours,
Laurenz Albe


Reply With Quote
  #9  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: tsearch2: List of lexemes and their count - 03-26-2008 , 03:26 AM



Thomas Guettler <hv (AT) tbz-pariv (DOT) de> wrote:
Quote:
How can you get all lexemes of an indexed column and how
often they occur?

This would help me to find words which should
be added to the stop word list.
This is explained in
http://www.postgresql.org/docs/curre...RCH-STATISTICS

An example:

CREATE TABLE fts(id integer PRIMARY KEY, val text);
CREATE INDEX fts_index ON fts USING gin(to_tsvector('german', val));

INSERT INTO fts VALUES (1,
'dies ist ein deutscher Text mit mehreren Worten; '
Quote:
| 'etliche Worte doppelt.');
INSERT INTO fts VALUES (2,
'ein weiterer deutscher Text enthält andere Worte, '
Quote:
| 'aber natürlich auch die selben.');
INSERT INTO fts VALUES (3,
'Ein dritter deutscher Text rundet die Sache ab.');

Now, for the statistics, I use the same function as I used to define
the index:

SELECT * FROM ts_stat('SELECT to_tsvector(''german'', val) FROM fts');

word | ndoc | nentry
---------+------+--------
ab | 1 | 1
mehr | 1 | 1
sach | 1 | 1
selb | 1 | 1
text | 3 | 3
weit | 1 | 1
wort | 2 | 3
dritt | 1 | 1
natur | 1 | 1
etlich | 1 | 1
rundet | 1 | 1
deutsch | 3 | 3
doppelt | 1 | 1
enthalt | 1 | 1
(14 rows)

Yours,
Laurenz Albe


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.