dbTalk Databases Forums  

Re: SQL and info in text fields

comp.databases comp.databases


Discuss Re: SQL and info in text fields in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Bob Badour
 
Posts: n/a

Default Re: SQL and info in text fields - 07-04-2003 , 11:05 AM






"Tim" <tim (AT) NOSPAM (DOT) invalid> wrote

Quote:
Hello,

I'm struggling with text searches, both in functionality and speed.

I have a table with an ID and a value, the value being textual. The
problem is that some words can have different meanings (Casino: the
movie, gambling hall, supermarket chain,...). To distinguish between
these meanings, they used to clarify them in brackets.

This obviously isn't the best way to do something like that, but the
data is there and I have to work with it. At least, that's what I
should do, but I don't know how to go about it.

Some of the entries could be
1 dog
2 Rock festival (14/07)
3 bowl
4 blue (color)
5 magazine
6 Casino (gambling)
7 Casino (supermarket)
8 food
9 blue (feeling)
10 bark
11 Casino (movie)
...

What I would like to know:
- how many terms are there with multiple meanings?
- which terms have how many different meanings?
- what are those different meanings?
- ...

I don't know how to go about it, since I can only be sure by looking
for the part of the value up until the opening bracket and compare
that to every other entry. I don't know how to do this in one SQL
statement, I think I'd have to loop over the total table, which would
create a huge load.

Does anybody know of a way to do such thing?
What dbms are you using? You may not be able to do what you want with SQL
unless the dbms you use has sufficient string manipulation operations or
allows one to create user-defined functions.

By syntactically encoding information in a substring instead of representing
it as a separate value, someone has taken simple queries and turned them
into parsing exercises. There are real consequences to violating the
information principle.

I suggest you correct the logical schema first, and then your queries will
become very easy.




Reply With Quote
  #2  
Old   
Bob Badour
 
Posts: n/a

Default Re: SQL and info in text fields - 07-04-2003 , 12:31 PM






"Tim" <tim (AT) NOSPAM (DOT) invalid> wrote

Quote:
What dbms are you using? You may not be able to do what you want with SQL
unless the dbms you use has sufficient string manipulation operations or
allows one to create user-defined functions.

I'm using MySQL at the moment, but something else might be used in the
future (which needn't be a problem).

By syntactically encoding information in a substring instead of
representing
it as a separate value, someone has taken simple queries and turned them
into parsing exercises. There are real consequences to violating the
information principle.

I suggest you correct the logical schema first, and then your queries
will
become very easy.

The problem is that I have an enormous amount of data like this (at
least when it should be edited by hand), so I was really hoping for a
more automated way of doing this (with still some performance in
mind).
From a quick perusal of the mysql documentation at
http://www.mysql.com/documentation/m...rence.html#Str
ing_functions, it appears you should find the following string functions
useful: LOCATE, POSITION, INSTR, LEFT, RIGHT, SUBSTRING, MID, SUBSTRING,
TRIM.

You may also find IF and CASE/WHEN very useful. Consider automating the task
with something like:

UPDATE table
SET baseword =
CASE WHEN LOCATE( word, '(' ) < 1
THEN
word
ELSE
RTRIM( LEFT( word, LOCATE( word, '(' ) - 1 ) )
END
, qualifier =
CASE WHEN LOCATE( word, '(' ) < 1
THEN
NULL
ELSE
CASE WHEN LOCATE( word, ')', LOCATE( word, '(' ) ) < 1
THEN
TRIM( SUBSTRING( word, LOCATE( word, '(' ) + 1 ) )
ELSE
TRIM( SUBSTRING( word
, LOCATE( word, '(' ) + 1
, LOCATE( word, ')', LOCATE( word, '(' ) ) - LOCATE( word,
'(' ) - 1
) )
END
END
;




Reply With Quote
  #3  
Old   
Ed Prochak
 
Posts: n/a

Default Re: SQL and info in text fields - 07-04-2003 , 04:19 PM



Bob Badour wrote:
Quote:
"Tim" <tim (AT) NOSPAM (DOT) invalid> wrote in message
news:tnbbgvcg57rvattpj1fafj24cb38mprlbr (AT) 4ax (DOT) com...

What dbms are you using? You may not be able to do what you want with SQL
unless the dbms you use has sufficient string manipulation operations or
allows one to create user-defined functions.

I'm using MySQL at the moment, but something else might be used in the
future (which needn't be a problem).


By syntactically encoding information in a substring instead of

representing

it as a separate value, someone has taken simple queries and turned them
into parsing exercises. There are real consequences to violating the
information principle.

I suggest you correct the logical schema first, and then your queries

will

become very easy.

The problem is that I have an enormous amount of data like this (at
least when it should be edited by hand), so I was really hoping for a
more automated way of doing this (with still some performance in
mind).


From a quick perusal of the mysql documentation at
http://www.mysql.com/documentation/m...rence.html#Str
ing_functions, it appears you should find the following string functions
useful: LOCATE, POSITION, INSTR, LEFT, RIGHT, SUBSTRING, MID, SUBSTRING,
TRIM.

You may also find IF and CASE/WHEN very useful. Consider automating the task
with something like:

UPDATE table
SET baseword =
CASE WHEN LOCATE( word, '(' ) < 1
THEN
word
ELSE
RTRIM( LEFT( word, LOCATE( word, '(' ) - 1 ) )
END
, qualifier =
CASE WHEN LOCATE( word, '(' ) < 1
THEN
NULL
ELSE
CASE WHEN LOCATE( word, ')', LOCATE( word, '(' ) ) < 1
THEN
TRIM( SUBSTRING( word, LOCATE( word, '(' ) + 1 ) )
ELSE
TRIM( SUBSTRING( word
, LOCATE( word, '(' ) + 1
, LOCATE( word, ')', LOCATE( word, '(' ) ) - LOCATE( word,
'(' ) - 1
) )
END
END
;


But I think, Bob, that you first suggestion is best. He'll really be better
off parsing the fields once, converting them to appropriately designed tables.
Especially if there really is "an enormous amount" of data, whatever that
means to him. (I once thought a Megabyte was enormous.)



--
Ed Prochak
running http://www.faqs.org/faqs/running-faq/
netiquette http://www.psg.com/emily.html
--
"Two roads diverged in a wood and I
I took the one less travelled by
and that has made all the difference."
robert frost



Reply With Quote
  #4  
Old   
Bob Badour
 
Posts: n/a

Default Re: SQL and info in text fields - 07-04-2003 , 07:21 PM



"Ed Prochak" <edprochak (AT) adelphia (DOT) net> wrote

Quote:
Bob Badour wrote:
"Tim" <tim (AT) NOSPAM (DOT) invalid> wrote in message
news:tnbbgvcg57rvattpj1fafj24cb38mprlbr (AT) 4ax (DOT) com...

What dbms are you using? You may not be able to do what you want with
SQL
unless the dbms you use has sufficient string manipulation operations
or
allows one to create user-defined functions.

I'm using MySQL at the moment, but something else might be used in the
future (which needn't be a problem).


By syntactically encoding information in a substring instead of

representing

it as a separate value, someone has taken simple queries and turned
them
into parsing exercises. There are real consequences to violating the
information principle.

I suggest you correct the logical schema first, and then your queries

will

become very easy.

The problem is that I have an enormous amount of data like this (at
least when it should be edited by hand), so I was really hoping for a
more automated way of doing this (with still some performance in
mind).


From a quick perusal of the mysql documentation at

http://www.mysql.com/documentation/m...rence.html#Str
ing_functions, it appears you should find the following string functions
useful: LOCATE, POSITION, INSTR, LEFT, RIGHT, SUBSTRING, MID, SUBSTRING,
TRIM.

You may also find IF and CASE/WHEN very useful. Consider automating the
task
with something like:

UPDATE table
SET baseword =
CASE WHEN LOCATE( word, '(' ) < 1
THEN
word
ELSE
RTRIM( LEFT( word, LOCATE( word, '(' ) - 1 ) )
END
, qualifier =
CASE WHEN LOCATE( word, '(' ) < 1
THEN
NULL
ELSE
CASE WHEN LOCATE( word, ')', LOCATE( word, '(' ) ) < 1
THEN
TRIM( SUBSTRING( word, LOCATE( word, '(' ) + 1 ) )
ELSE
TRIM( SUBSTRING( word
, LOCATE( word, '(' ) + 1
, LOCATE( word, ')', LOCATE( word, '(' ) ) - LOCATE( word,
'(' ) - 1
) )
END
END
;



But I think, Bob, that you first suggestion is best. He'll really be
better
off parsing the fields once, converting them to appropriately designed
tables.
Especially if there really is "an enormous amount" of data, whatever that
means to him. (I once thought a Megabyte was enormous.)
The update statement was intended to demonstrate one way of doing that. Of
course, I dislike the use of NULL in the qualifier column, but I think if
Tim does properly redesign the database schema, the above should give him
enough ideas to work with.

Instead of adding two new columns to the existing table, he could invent two
new tables: one with base words and the other with qualifiers where
qualifiers exist. Instead of an update statement, he would use a pair of
INSERT/SELECT or SELECT/INTO statements. (Hint: MIN or MAX will help
identify a unique identifier for each baseword.)

Or he could move the qualifiers into a different table using an
INSERT/SELECT and remove the qualifiers from the original using an UPDATE
followed by a DELETE to remove duplicate basewords.

Or he could just use something other than a NULL qualifier for unqualified
basewords.

In any case, Tim will have to adjust the above and verify it does what he
wants it to do. It's not like I ran the above through a mysql parser let
alone tested it on data.

(In 1981, I thought 32k was "Da Bomb!". I once squeezed a self-modifying
program for pixel-plots of conic sections and sinusoids into 3k on my best
friend's Commodore VIC 20. In 1983, I stood in awe in front of the brand new
5 gigabyte winchester drives the University of Toronto had purchased for
keeping student records. The platters were 6' in diameter--my
high-definition widescreen tv takes up about an eighth the volume being
about half the length in each linear dimension. I don't think anybody makes
a hard-drive that small anymore even in a 2" compact format.)




Reply With Quote
  #5  
Old   
Mike Preece
 
Posts: n/a

Default Re: SQL and info in text fields - 07-05-2003 , 12:02 AM



Quote:
"Tim" <tim (AT) NOSPAM (DOT) invalid> wrote in message
news:u30bgvklgbt4tv8u309vu558t25kp66mcn (AT) 4ax (DOT) com...
Hello,

I'm struggling with text searches, both in functionality and speed.

I have a table with an ID and a value, the value being textual. The
problem is that some words can have different meanings (Casino: the
movie, gambling hall, supermarket chain,...). To distinguish between
these meanings, they used to clarify them in brackets.

This obviously isn't the best way to do something like that, but the
data is there and I have to work with it. At least, that's what I
should do, but I don't know how to go about it.

Some of the entries could be
1 dog
2 Rock festival (14/07)
3 bowl
4 blue (color)
5 magazine
6 Casino (gambling)
7 Casino (supermarket)
8 food
9 blue (feeling)
10 bark
11 Casino (movie)
...
If I was faced with this task on a Pick system I would do the
following...

Let's call the incoming table 'Text' and assume it's in a *nix or dos
folder or directory called 'dirfolder'.

CREATE-FILE LEXICON 1 101
CREATE-INDEX LEXICON A1

ED BP GETTEXT
..I
001 OPEN '{dos:\ or *nix:/}dirfolder' TO OSFILE ELSE DEBUG
002 READ TEXTSTR FROM OSFILE,'Text' ELSE DEBUG
003 OPEN 'LEXICON' TO LEXICON ELSE DEBUG
004 ROOT 'LEXICON','A1' TO PHRASEINDEX ELSE DEBUG
005 NEXTITEMID=1
006 FOR X=DCOUNT(TEXTSTR,@AM) TO 1 STEP -1
007 LN=TEXSTR<X> ;* Get a line of text, something like "2 Rock
festival (14/07)"
008 ID=FIELD(LN,' ',1) ;* Extract the ID ('2')
009 WORDS=LN[COL1()+1,99] ;* Get the text without the ID (assume max
len = 99)
010 OPENBRS=COUNT(WORDS,'(') ;* How many open brackets are there?
011 IF OPENBRS AND WORDS[LEN(WORDS),1]=')' THEN ;* If there is a
'qualifier'...
012 OPENBR=INDEX(WORDS,'(',OPENBRS) ;* Get the position of the
last open bracket
013 KEYWORDS=WORDS[1,OPENBR-2] ;* In the example this would be the
phrase 'Rock festival'
014 QUALIFIER=LN[OPENBR+1,99] ;* In the example this would be
'14/07)'
015 QUALIFIER=QUALIFIER[1,LEN(QUALIFIER)-1] ;* '14/07'
016 END ELSE
017 KEYWORDS=WORDS
018 QUALIFIER=''
019 END
020 KEY('R',PHRASEINDEX,KEYWORDS,ITEMID) THEN ;* Get the key for the
item containing the phrase ('Rock festival')
021 READU REC FROM LEXICON,ITEMID ELSE DEBUG ;* Read the item
022 LOCATE QUALIFIER IN REC<2>,1 BY 'AL' SETTING POS THEN ;* Look
for '14/07' in the list of qualifiers in attribute 2
023 REC<3,POS,-1>=ID ;* Add the ID ('2') to the list of IDs for
the combination of phrase ('Rock festival') & qualifier ('14/07')
024 END ELSE
025 INS QUALIFIER BEFORE REC<2,POS> ;* Add the 'qualifier'
('14/07') to the list of qualifiers for the phrase
026 INS ID BEFORE REC<3,POS> ;* Record the ID ('2')
027 END
028 END ELSE
029 ITEMID=NEXTITEMID
030 NEXTITEMID+=1
031 REC=KEYWORDS
032 REC<2>=QUALIFIER
033 REC<3>=ID
034 END
035 WRITE REC ON LEXICON,ITEMID
036 NEXT X
037 STOP

After running the above program (untested as it is), we would have a
file like this:

1
001 dog
002
003 1

2
001 Rock festival
002 14/07
003 2

3
001 bowl
002
003 3

4
001 blue
002 colour]feeling
003 4]9

5
001 magazine
002
003 5

6
001 Casino
002 gambling]supermarket]movie
003 6]7]11

7
001 food
002
003 8

8
001 bark
002
003 10

So that if we were to simply list the file we would see something like
this:

File name: LEXICON
ItemID...Phrase (A1).....Qualifier (A2)...Text ID (A3)
1........dog..............................1
2........Rock festival...14/07............2
3........bowl.............................3
4........blue............color............4
..........................feeling..........9
5........magazine.........................5
6........Casino..........gambling.........6
..........................supermarket......7
..........................movie............11
7........food.............................8
8........bark.............................10
....

Quote:
What I would like to know:
- how many terms are there with multiple meanings?
COUNT LEXICON WITH A2
- which terms have how many different meanings?
LIST LEXICON CNT2
where CNT2 is a dictionary item with "F;2;(S;'1';'0');S" in attribute
8
Quote:
- what are those different meanings?
see listing above
- ...

I don't know how to go about it, since I can only be sure by looking
for the part of the value up until the opening bracket and compare
that to every other entry. I don't know how to do this in one SQL
statement, I think I'd have to loop over the total table, which would
create a huge load.

Does anybody know of a way to do such thing?
If you'd like to give Pick a try I can supply links to free downloads
for evaluation purposes.

Regards
Mike.


Reply With Quote
  #6  
Old   
Jerry Gitomer
 
Posts: n/a

Default Re: SQL and info in text fields - 07-05-2003 , 10:26 AM



Mike Preece wrote:
Quote:
"Tim" <tim (AT) NOSPAM (DOT) invalid> wrote in message
news:u30bgvklgbt4tv8u309vu558t25kp66mcn (AT) 4ax (DOT) com...

Hello,

I'm struggling with text searches, both in functionality and speed.

I have a table with an ID and a value, the value being textual. The
problem is that some words can have different meanings (Casino: the
movie, gambling hall, supermarket chain,...). To distinguish between
these meanings, they used to clarify them in brackets.

This obviously isn't the best way to do something like that, but the
data is there and I have to work with it. At least, that's what I
should do, but I don't know how to go about it.

Some of the entries could be
1 dog
2 Rock festival (14/07)
3 bowl
4 blue (color)
5 magazine
6 Casino (gambling)
7 Casino (supermarket)
8 food
9 blue (feeling)
10 bark
11 Casino (movie)
...


If I was faced with this task on a Pick system I would do the
following...

Let's call the incoming table 'Text' and assume it's in a *nix or dos
folder or directory called 'dirfolder'.

CREATE-FILE LEXICON 1 101
CREATE-INDEX LEXICON A1

ED BP GETTEXT
.I
001 OPEN '{dos:\ or *nix:/}dirfolder' TO OSFILE ELSE DEBUG
002 READ TEXTSTR FROM OSFILE,'Text' ELSE DEBUG
003 OPEN 'LEXICON' TO LEXICON ELSE DEBUG
004 ROOT 'LEXICON','A1' TO PHRASEINDEX ELSE DEBUG
005 NEXTITEMID=1
006 FOR X=DCOUNT(TEXTSTR,@AM) TO 1 STEP -1
007 LN=TEXSTR<X> ;* Get a line of text, something like "2 Rock
festival (14/07)"
008 ID=FIELD(LN,' ',1) ;* Extract the ID ('2')
009 WORDS=LN[COL1()+1,99] ;* Get the text without the ID (assume max
len = 99)
010 OPENBRS=COUNT(WORDS,'(') ;* How many open brackets are there?
011 IF OPENBRS AND WORDS[LEN(WORDS),1]=')' THEN ;* If there is a
'qualifier'...
012 OPENBR=INDEX(WORDS,'(',OPENBRS) ;* Get the position of the
last open bracket
013 KEYWORDS=WORDS[1,OPENBR-2] ;* In the example this would be the
phrase 'Rock festival'
014 QUALIFIER=LN[OPENBR+1,99] ;* In the example this would be
'14/07)'
015 QUALIFIER=QUALIFIER[1,LEN(QUALIFIER)-1] ;* '14/07'
016 END ELSE
017 KEYWORDS=WORDS
018 QUALIFIER=''
019 END
020 KEY('R',PHRASEINDEX,KEYWORDS,ITEMID) THEN ;* Get the key for the
item containing the phrase ('Rock festival')
021 READU REC FROM LEXICON,ITEMID ELSE DEBUG ;* Read the item
022 LOCATE QUALIFIER IN REC<2>,1 BY 'AL' SETTING POS THEN ;* Look
for '14/07' in the list of qualifiers in attribute 2
023 REC<3,POS,-1>=ID ;* Add the ID ('2') to the list of IDs for
the combination of phrase ('Rock festival') & qualifier ('14/07')
024 END ELSE
025 INS QUALIFIER BEFORE REC<2,POS> ;* Add the 'qualifier'
('14/07') to the list of qualifiers for the phrase
026 INS ID BEFORE REC<3,POS> ;* Record the ID ('2')
027 END
028 END ELSE
029 ITEMID=NEXTITEMID
030 NEXTITEMID+=1
031 REC=KEYWORDS
032 REC<2>=QUALIFIER
033 REC<3>=ID
034 END
035 WRITE REC ON LEXICON,ITEMID
036 NEXT X
037 STOP

After running the above program (untested as it is), we would have a
file like this:

1
001 dog
002
003 1

2
001 Rock festival
002 14/07
003 2

3
001 bowl
002
003 3

4
001 blue
002 colour]feeling
003 4]9

5
001 magazine
002
003 5

6
001 Casino
002 gambling]supermarket]movie
003 6]7]11

7
001 food
002
003 8

8
001 bark
002
003 10

So that if we were to simply list the file we would see something like
this:

File name: LEXICON
ItemID...Phrase (A1).....Qualifier (A2)...Text ID (A3)
1........dog..............................1
2........Rock festival...14/07............2
3........bowl.............................3
4........blue............color............4
.........................feeling..........9
5........magazine.........................5
6........Casino..........gambling.........6
.........................supermarket......7
.........................movie............11
7........food.............................8
8........bark.............................10
...


What I would like to know:
- how many terms are there with multiple meanings?

COUNT LEXICON WITH A2

- which terms have how many different meanings?

LIST LEXICON CNT2

where CNT2 is a dictionary item with "F;2;(S;'1';'0');S" in attribute
8

- what are those different meanings?

see listing above

- ...

I don't know how to go about it, since I can only be sure by looking
for the part of the value up until the opening bracket and compare
that to every other entry. I don't know how to do this in one SQL
statement, I think I'd have to loop over the total table, which would
create a huge load.

Does anybody know of a way to do such thing?


If you'd like to give Pick a try I can supply links to free downloads
for evaluation purposes.

Regards
Mike.
If you prefer sticking with a traditional RDBMS....

The quickest retrieval on text data is based on the use of fully
inverted data, that is, indexes on each word that appears in the
text being indexed. (With an exception being made for the most
common words in the language. For example words such as "the",
"a", "is" are not indexed.)



Rather than do this yourself you should investigate the text
searching capabilities of the readily available database systems.

Having worked with Oracle's text searching package I can attest
to the performance -- incredibly quick! Given the competitive
nature of the Relational Data Base industry I presume that
Oracle's competitors have comparable facilities available.
Oracle (and it's competitors -- I think) can search using AND
and OR conditions combined with NOT conditions, etc. You can
also search for combinations of words that are near each other,
for example words that appear within ten words of each other in
a text entry.

The primary benefit is that all you have to worry about is the
content of the database. The code required to build and
maintain the database and its indexes is built and maintained by
someone else.





Reply With Quote
  #7  
Old   
Bob Badour
 
Posts: n/a

Default Re: SQL and info in text fields - 07-05-2003 , 11:13 AM



"Jerry Gitomer" <jgitomer (AT) erols (DOT) com> wrote

Quote:
Mike Preece wrote:
"Tim" <tim (AT) NOSPAM (DOT) invalid> wrote in message
news:u30bgvklgbt4tv8u309vu558t25kp66mcn (AT) 4ax (DOT) com...
Hello,

I'm struggling with text searches, both in functionality and speed.

I have a table with an ID and a value, the value being textual. The
problem is that some words can have different meanings (Casino: the
movie, gambling hall, supermarket chain,...). To distinguish between
these meanings, they used to clarify them in brackets.

This obviously isn't the best way to do something like that, but the
data is there and I have to work with it. At least, that's what I
should do, but I don't know how to go about it.

[ridiculously complex pick example snipped]


Quote:
If you prefer sticking with a traditional RDBMS....

The quickest retrieval on text data is based on the use of fully
inverted data, that is, indexes on each word that appears in the
text being indexed. (With an exception being made for the most
common words in the language. For example words such as "the",
"a", "is" are not indexed.)
Did it ever occur to you that Tim's problem relates to how to express what
he wants? Until he can express what he wants, the availability of indexing
strategies are useless to him.




Reply With Quote
  #8  
Old   
Mike Preece
 
Posts: n/a

Default Re: SQL and info in text fields - 07-05-2003 , 10:36 PM



"Bob Badour" <bbadour (AT) golden (DOT) net> wrote

Quote:
"Jerry Gitomer" <jgitomer (AT) erols (DOT) com> wrote in message
news:be6qng$mau$1 (AT) bob (DOT) news.rcn.net...
Mike Preece wrote:
"Tim" <tim (AT) NOSPAM (DOT) invalid> wrote in message
news:u30bgvklgbt4tv8u309vu558t25kp66mcn (AT) 4ax (DOT) com...
Hello,

I'm struggling with text searches, both in functionality and speed.

I have a table with an ID and a value, the value being textual. The
problem is that some words can have different meanings (Casino: the
movie, gambling hall, supermarket chain,...). To distinguish between
these meanings, they used to clarify them in brackets.

This obviously isn't the best way to do something like that, but the
data is there and I have to work with it. At least, that's what I
should do, but I don't know how to go about it.


[ridiculously complex pick example snipped]
Or - to put it another way - full parsing and data transformation code
supplied.

Quote:

If you prefer sticking with a traditional RDBMS....

The quickest retrieval on text data is based on the use of fully
inverted data, that is, indexes on each word that appears in the
text being indexed. (With an exception being made for the most
common words in the language. For example words such as "the",
"a", "is" are not indexed.)

Did it ever occur to you that Tim's problem relates to how to express what
he wants? Until he can express what he wants, the availability of indexing
strategies are useless to him.
So supply a solution Bob. Your inability to .... proves .... complex
.... weak... blah ... blah... insult... arrogant statement... drivel...
etc.

Now Bob is going to provide a simpler way to express what Tim wants.
Simpler than:

COUNT LEXICON WITH A2
(for "how many terms are there with multiple meanings?")

and

LIST LEXICON CNT2
(for "which terms have how many different meanings?")
Oops. That should have been "LIST LEXICON A1 CNT2". Added complexity
there huh?

and

LIST LEXICON A1 A2 A3
(for "what are those different meanings?").

Oh yes. The solution he's about to provide will be not only provide
all of the functionality Tim's looking for but will also be quicker
and cost less. I'm looking forward to this. Bob?

Oh and while we're at it - is there anything else you'd like to be
able to do with the data Tim? Would you like to know if there are any
duplicates (for both the "phrase" and "qualifier")? Maybe string
searches using "wild-cards", sub-strings, indexing or "soundex" - or
whatever?

Mike.


Reply With Quote
  #9  
Old   
Bob Badour
 
Posts: n/a

Default Re: SQL and info in text fields - 07-06-2003 , 12:04 AM



"Jerry Gitomer" <jgitomer (AT) erols (DOT) com> wrote

Quote:
Bob Badour wrote:
"Jerry Gitomer" <jgitomer (AT) erols (DOT) com> wrote in message
news:be6qng$mau$1 (AT) bob (DOT) news.rcn.net...

Mike Preece wrote:

"Tim" <tim (AT) NOSPAM (DOT) invalid> wrote in message
news:u30bgvklgbt4tv8u309vu558t25kp66mcn (AT) 4ax (DOT) com...
Hello,

I'm struggling with text searches, both in functionality and speed.

I have a table with an ID and a value, the value being textual. The
problem is that some words can have different meanings (Casino: the
movie, gambling hall, supermarket chain,...). To distinguish between
these meanings, they used to clarify them in brackets.

This obviously isn't the best way to do something like that, but the
data is there and I have to work with it. At least, that's what I
should do, but I don't know how to go about it.



[ridiculously complex pick example snipped]



If you prefer sticking with a traditional RDBMS....

The quickest retrieval on text data is based on the use of fully
inverted data, that is, indexes on each word that appears in the
text being indexed. (With an exception being made for the most
common words in the language. For example words such as "the",
"a", "is" are not indexed.)


Did it ever occur to you that Tim's problem relates to how to express
what
he wants? Until he can express what he wants, the availability of
indexing
strategies are useless to him.


Either you failed to read the paragraph in my response where I
suggested the use of a commercially available RDBMS with a text
search option that includes conditional search operations that
will easily satisfy Tim's stated requirments or I failed to make
myself clear.
I read the whole thing. AND, OR, NOT and NEAR will not satisfy Tim's stated
requirements. Did you read Tim's stated requirements? I am curious why you
think AND, OR, NOT and NEAR will solve the specific problems Tim identified.
Am I missing something?




Reply With Quote
  #10  
Old   
Bob Badour
 
Posts: n/a

Default Re: SQL and info in text fields - 07-06-2003 , 12:11 AM



"Mike Preece" <michael (AT) preece (DOT) net> wrote

Quote:
"Bob Badour" <bbadour (AT) golden (DOT) net> wrote

"Jerry Gitomer" <jgitomer (AT) erols (DOT) com> wrote in message
news:be6qng$mau$1 (AT) bob (DOT) news.rcn.net...
Mike Preece wrote:
"Tim" <tim (AT) NOSPAM (DOT) invalid> wrote in message
news:u30bgvklgbt4tv8u309vu558t25kp66mcn (AT) 4ax (DOT) com...
Hello,

I'm struggling with text searches, both in functionality and speed.

I have a table with an ID and a value, the value being textual. The
problem is that some words can have different meanings (Casino: the
movie, gambling hall, supermarket chain,...). To distinguish between
these meanings, they used to clarify them in brackets.

This obviously isn't the best way to do something like that, but the
data is there and I have to work with it. At least, that's what I
should do, but I don't know how to go about it.


[ridiculously complex pick example snipped]

Or - to put it another way - full parsing and data transformation code
supplied.
As I asked previously, why would he want a complex 40-step procedure when he
already has a single statement that performs the same with a tool he already
has? ie. full parsing and data transformation.


Quote:
If you prefer sticking with a traditional RDBMS....

The quickest retrieval on text data is based on the use of fully
inverted data, that is, indexes on each word that appears in the
text being indexed. (With an exception being made for the most
common words in the language. For example words such as "the",
"a", "is" are not indexed.)

Did it ever occur to you that Tim's problem relates to how to express
what
he wants? Until he can express what he wants, the availability of
indexing
strategies are useless to him.

So supply a solution Bob. Your inability to .... proves .... complex
... weak... blah ... blah... insult... arrogant statement... drivel...
etc.
I already did. Were you not paying attention? See:
http://groups.google.com/groups?dq=&...LjNa.280%24Rp1.
35746873%40mantis.golden.net




Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.