dbTalk Databases Forums  

Regular expression help required please

comp.databases.oracle.misc comp.databases.oracle.misc


Discuss Regular expression help required please in the comp.databases.oracle.misc forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Kevin S
 
Posts: n/a

Default Regular expression help required please - 09-21-2010 , 06:31 AM






Hi All,

From what I understand the sql below should return 'Hello World',
however it actually returns 'HelloWorld' i.e. the whitespace is not
ignored.

Can anyone tell me what I am doing wrong?

Thanks

SELECT regexp_replace('H\/e l£lo World','[^a-zA-Z0-9]', '') FROM dual;

Reply With Quote
  #2  
Old   
Kay Kanekowski
 
Posts: n/a

Default Re: Regular expression help required please - 09-21-2010 , 07:55 AM






Hi,
a simple solution is this
SELECT regexp_replace('H\/e l£lo World','[^ a-zA-Z0-9]', '') FROM
dual;

but it produces this "He llo World"

The questions is what is the different between the space in "/e l" and
the other space in "o W" ?

If you have an answer there might be a rule to discribe it.

regards
Kay

Reply With Quote
  #3  
Old   
Kevin S
 
Posts: n/a

Default Re: Regular expression help required please - 09-21-2010 , 08:18 AM



Firstly I realise that I typed the wrong regular expression in my
question which won't help so apologies for that.

Perhaps it would be easier if I say that my aim is to strip out
everything except alphanumeric characters and whitespaces from strings
to get over an error we are getting in an Oracle 8 database.

We can reproduce the error as follows where step 4 results in the
'ORA-01401: inserted value too large for column' error even though
it's the right length.

1 DROP TABLE FIFTYCHAR;
2 CREATE TABLE FIFTYCHAR(FIFTY_CHAR_COL VARCHAR2(50));
3 SELECT length('• 123456789012345678901234567890123456789012345678')
FROM dual;
4 INSERT INTO FIFTYCHAR(FIFTY_CHAR_COL)
VALUES('• 123456789012345678901234567890123456789012345678') ;
5 DROP TABLE FIFTYCHAR;

I think I have found an initial solution to my proposed solution.

SELECT regexp_replace('H\el£lo World','[^[:alnum:][:space:]]', '')
FROM dual;

will strip out the alpha numerics while leaving white spaces.

It's not ideal as it strips out £ signs and all punctuation, however
it's characters that fall under the punctuation category (Bullet
points, £ signs etc) which are causing the problem.

Would anyone have any suggestions as to how to only strip out
troublesome characters?

Reply With Quote
  #4  
Old   
Mark D Powell
 
Posts: n/a

Default Re: Regular expression help required please - 09-21-2010 , 09:38 AM



On Sep 21, 9:18*am, Kevin S <sear... (AT) googlemail (DOT) com> wrote:
Quote:
Firstly I realise that I typed the wrong regular expression in my
question which won't help so apologies for that.

Perhaps it would be easier if I say that my aim is to strip out
everything except alphanumeric characters and whitespaces from strings
to get over an error we are getting in an Oracle 8 database.

We can reproduce the error as follows where step 4 results in the
'ORA-01401: inserted value too large for column' error even though
it's the right length.

1 DROP TABLE FIFTYCHAR;
2 CREATE TABLE FIFTYCHAR(FIFTY_CHAR_COL VARCHAR2(50));
3 SELECT length('• * *123456789012345678901234567890123456789012345678' )
FROM dual;
4 INSERT INTO FIFTYCHAR(FIFTY_CHAR_COL)
* *VALUES('• *123456789012345678901234567890123456789012345678' );
5 DROP TABLE FIFTYCHAR;

I think I have found an initial solution to my proposed solution.

SELECT *regexp_replace('H\el£lo World','[^[:alnum:][:space:]]', '')
FROM dual;

will strip out the alpha numerics while leaving white spaces.

It's not ideal as it strips out £ signs and all punctuation, however
it's characters that fall under the punctuation category (Bullet
points, £ signs etc) which are causing the problem.

Would anyone have any suggestions as to how to only strip out
troublesome characters?
If you cannot figure it out using regular expressions alone then you
might consider falling back to using the traditional replace function
to locate and eliminate specific characters (replace Euro symbol with
null, etc...).

HTH -- Mark D Powell --

Reply With Quote
  #5  
Old   
Gerard H. Pille
 
Posts: n/a

Default Re: Regular expression help required please - 09-21-2010 , 12:16 PM



Kevin S wrote:
Quote:
I think I have found an initial solution to my proposed solution.

SELECT regexp_replace('H\el£lo World','[^[:alnum:][:space:]]', '')
FROM dual;

will strip out the alpha numerics while leaving white spaces.

It's not ideal as it strips out £ signs and all punctuation, however
it's characters that fall under the punctuation category (Bullet
points, £ signs etc) which are causing the problem.

Would anyone have any suggestions as to how to only strip out
troublesome characters?
Would this do?

select regexp_replace('H\eL€lo World', '[^[rint:]]','') from dual

Reply With Quote
  #6  
Old   
Tim X
 
Posts: n/a

Default Re: Regular expression help required please - 09-28-2010 , 06:35 PM



Kevin S <searlek (AT) googlemail (DOT) com> writes:

Quote:
Firstly I realise that I typed the wrong regular expression in my
question which won't help so apologies for that.

Perhaps it would be easier if I say that my aim is to strip out
everything except alphanumeric characters and whitespaces from strings
to get over an error we are getting in an Oracle 8 database.

We can reproduce the error as follows where step 4 results in the
'ORA-01401: inserted value too large for column' error even though
it's the right length.

1 DROP TABLE FIFTYCHAR;
2 CREATE TABLE FIFTYCHAR(FIFTY_CHAR_COL VARCHAR2(50));
3 SELECT length('• 123456789012345678901234567890123456789012345678')
FROM dual;
4 INSERT INTO FIFTYCHAR(FIFTY_CHAR_COL)
VALUES('• 123456789012345678901234567890123456789012345678') ;
5 DROP TABLE FIFTYCHAR;

I think I have found an initial solution to my proposed solution.

SELECT regexp_replace('H\el£lo World','[^[:alnum:][:space:]]', '')
FROM dual;

will strip out the alpha numerics while leaving white spaces.

It's not ideal as it strips out £ signs and all punctuation, however
it's characters that fall under the punctuation category (Bullet
points, £ signs etc) which are causing the problem.

Would anyone have any suggestions as to how to only strip out
troublesome characters?
Just a side note. I suspect you may be solving the wrong problem.

If I understand your issue, the problem is that your getting errors that
the data is too long to fit in the varchar2 column, but in the variable
has the correct number of characters.

This is almost certainly due to you having mixed character encodings in
your data. Most likely, a different database character encoding from
your clients. If you look at the docs for some of the sql functions
dealing with things like calculating the length of data, you will se
that the varchar2 related versions report the number of characters, not
the number of bytes. This is why it looks like your string is the right
length., but generates the exception you are seeing.

Trying to 'fix' this with regular expressions is unmaintainable as you
would need to filter out all multibyte characters, not just the
punctuation ones you have noticed. Also note that some regexpression
implementations don't handle mixed encodings well without adding
additional switches (I'm not sure about Oracle's implementation).

Read up about Oracle's support for internationalisation, look at
nvarchr2 et.al. and check what the default encodiing is for both your
database and clients. Note also that there are some sql functions that
you can pass a string to which will return more informative/accurate
information regarding the actaul characters and their 'real' length.

Tim

--
tcross (at) rapttech dot com dot au

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.