#1
#2
What would be a neat (i.e. reasonably efficient) way of splitting paragraphs of text?

E.g. splitting

    VALUES ('I would, if possible, like to see a sentence (such as this one) be split. Hopefully into words, punctuation and tokens.')

into

    SEQ         2
    ----------- -------------------
    1           I
    2           would
    3           ,
    4           if
    5           possible
    6           ,
    7           like
    8           to
    9           see
    10          a
    11          sentence
    12          (
    13          such
    14          as
    15          this
    16          one
    17          )
    18          be
    19          split
    20          .
    21          Hopefully
    22          into
    23          words
    24          ,
    25          punctuation
    26          and
    27          tokens
    28          .
#3
news:bhu3jp$13ga$1 (AT) gazette (DOT) almaden.ibm.com...
What would be a neat (i.e. reasonably efficient) way of splitting paragraphs of text? E.g splitting ........

If you don't mind a small amount of programming, Java has a StringTokenizer class that would do the job very nicely in only a few lines of code. JDBC allows your Java program to access DB2 data quite simply.
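For what it's worth, here is a minimal sketch of the StringTokenizer idea. The class name `SplitSentence` and the method `split` are made up for illustration, and it assumes that only a blank separates words and that only `.`, `,`, `(` and `)` count as punctuation (the same set the example sentence uses). Passing `true` as the third constructor argument makes the tokenizer return each delimiter as its own one-character token, so the punctuation is kept and the blanks can be filtered out. Reading the text from DB2 through JDBC is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class SplitSentence {
    // Assumption: these four characters are the only punctuation of interest.
    static final String PUNCTUATION = ".,()";

    // Splits text into words and single punctuation characters, in order.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true: every delimiter comes back as a one-character token.
        StringTokenizer st = new StringTokenizer(text, " " + PUNCTUATION, true);
        while (st.hasMoreTokens()) {
            String t = st.nextToken();
            if (!t.equals(" ")) {   // keep punctuation, drop the blanks
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "I would, if possible, like to see a sentence "
                    + "(such as this one) be split. Hopefully into words, "
                    + "punctuation and tokens.";
        List<String> tokens = split(text);
        // Print a sequence number next to each token, starting at 1.
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println((i + 1) + " " + tokens.get(i));
        }
    }
}
```

Tabs and line breaks are not treated as separators here; they would have to be added to the delimiter string and to the filter if the paragraphs contain them.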
#4
Or, you can write a SQL Query, like this:

    WITH Source(text) AS (
      VALUES ('I would, if possible, like to see a sentence (such as this one) be split. Hopefully into words, punctuation and tokens.')
    )
    , Splitting (Seq, token, rest) AS (
      SELECT 0, VARCHAR('', 50), LTRIM(text)
        FROM Source
      UNION ALL
      SELECT pre.Seq + 1
           , VARCHAR(SUBSTR(pre.rest, 1, next_pos - 1), 50)
           , LTRIM(SUBSTR(pre.rest || ' ', next_pos))
        FROM (SELECT pre.Seq
                   , CASE
                       WHEN SUBSTR(pre.rest, 1, 1) IN ( '.' , ',' , '(' , ')' ) THEN 2
                       ELSE POSSTR(TRANSLATE(pre.rest, ' ', '.,()'), ' ')
                     END AS next_pos
                   , pre.rest
                FROM Splitting pre
               WHERE pre.Seq < 1000
                 AND pre.rest <> ''
             ) AS pre
    )
    SELECT Seq, token
      FROM splitting
     WHERE Seq > 0
     ORDER BY Seq
    ;

[snip]

I don't know about efficient. Anyway, it would not be too difficult to make a UDF (SQL Table) based on this example.
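In case the recursion is hard to follow, the same step can be written as a plain loop. This is only a rough Java sketch of the logic, not a replacement for the query: `RecursiveSplit`, `split` and `ltrim` are names invented here, and strings are 0-based where SQL is 1-based. Each pass takes one token off the front of `rest` (a single punctuation character, or everything up to the next blank or punctuation character), then left-trims what remains. One deliberate difference: the search runs on the blank-padded string, so a final word with nothing after it still ends cleanly.

```java
import java.util.ArrayList;
import java.util.List;

class RecursiveSplit {
    static final String PUNCTUATION = ".,()";

    // Removes leading blanks only, like LTRIM in the query.
    static String ltrim(String s) {
        int i = 0;
        while (i < s.length() && s.charAt(i) == ' ') {
            i++;
        }
        return s.substring(i);
    }

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        String rest = ltrim(text);                 // seed row: LTRIM(text)
        int seq = 0;
        while (seq < 1000 && !rest.isEmpty()) {    // same guards as the WHERE clause
            String padded = rest + " ";            // pre.rest || ' '
            int cut;                               // length of the next token
            if (PUNCTUATION.indexOf(rest.charAt(0)) >= 0) {
                cut = 1;                           // punctuation is a token by itself
            } else {
                cut = 0;                           // scan to the first blank or punctuation
                while ((" " + PUNCTUATION).indexOf(padded.charAt(cut)) < 0) {
                    cut++;
                }
            }
            tokens.add(rest.substring(0, cut));
            rest = ltrim(padded.substring(cut));   // remainder for the next pass
            seq++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = split("see a sentence (such as this one) be split.");
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println((i + 1) + " " + tokens.get(i));
        }
    }
}
```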