![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||||
| |||||
|
|
... for example the pattern 'a|z' (which should match single 'a' or 'z' characters only, according to SQL spec) is converted into POSIX regular expression in the form of '^a|z$' which matches all strings beginning with 'a' ('abcdef' for example) and all strings ending with 'z' ('xyz' for example). So the meaning of the pattern is changed, which is not good. |
|
The behaviour above is also caused by similar_escape(), which converts '[_]' to '^[.]$' and '[%]' to '^[.*]$', not noticing the simple fact that these characters are inside brackets. |
|
Talking about square brackets, it should be noticed that there is a slight difference between SIMILAR TO and POSIX way of describing named character classes. |
|
This at least could be avoided simply by prepending regular expression returned by similar_escape() with a magic sequence '***:' which switches regexp engine into ARE mode. |
|
I think I am able to write such a patch in my spare time, |
#2
| |||
| |||
|
|
As near as I can tell, the SQL spec requires special characters to be escaped when they are inside a bracket construct. So indeed the above are invalid SQL regexes. |
|
Good point. Actually, do we want to force ARE mode, or something simpler? Perhaps ERE or even BRE would be a better match to the SQL spec. |
#3
| ||||||
| ||||||
|
|
('a' similar to 'a' escape null) is true (should be unknown!) |
|
I think that either PostgreSQL should check for nulls in SIMILAR TO construct before calling similar_escape(), or there should be two versions of similar_escape() function: one getting only one argument (for SIMILAR TO without ESCAPE) and second, getting two arguments (a pattern and an escape char). Which solution is better? |
|
As near as I can tell, the SQL spec requires special characters to be escaped when they are inside a bracket construct. So indeed the above are invalid SQL regexes. How the function should behave when such an invalid pattern is passed as its argument? Should it throw an error (this is what SQL spec says) or tolerate as much mistakes as possible, generating some warnings only? |
|
Good point. Actually, do we want to force ARE mode, or something simpler? Perhaps ERE or even BRE would be a better match to the SQL spec. I think that there is no difference which regexp dialect is choosen, only the speed matters. Function translating SIMILAR TO patterns into POSIX regular expressions will be more or less the same. What should I choose then? |
|
BTW, should I write some regression tests for SIMILAR TO? |
|
Should the changes be written for CVS HEAD only or 7.4/7.3 branches either? |
![]() |
| Thread Tools | |
| Display Modes | |
| |