dbTalk Databases Forums  

Regular Expressions in SELECT statement?

microsoft.public.sqlserver.clients microsoft.public.sqlserver.clients


Discuss Regular Expressions in SELECT statement? in the microsoft.public.sqlserver.clients forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Alex M
 
Posts: n/a

Default Regular Expressions in SELECT statement? - 06-30-2003 , 11:39 AM






Hi,

I need to separate certain data in a large text field into distinct columns,
however I need to use regular expression to extract the data, as it is field
delimited, as below:

"Value=123|Description=hello, world!|Data=123456ABCDE|"
"Description=goodbye, world!|Value=345|MoreData=ABCDE123456|"

The only information I have about the data in this text field, is that it is
all in a "<fieldname>=<value><delimiter>" format, and the order of the
tag-value pairs is not necessarily the same for all rows. Also, some rows
have more fields than others.

I'd like to be able to extract the data into columns (e.g. the above row
would have 3 columns: Value, Description, Data). For any particular SELECT
statement, extra tag-value columns would be ignored, and non-existent ones
would have a value of NULL.

So something like (pseudocode follows):
SELECT regexp("Value=*|") AS Value, regexp("Description=*|") AS Description,
regexp("Data=*|") AS Data
FROM TableName

Should return
"123", "hello, world!", "123456ABCDE"
"345", "goodbye, world!", NULL

Any ideas?

Thanks
Alex






Reply With Quote
  #2  
Old   
Aaron Bertrand - MVP
 
Posts: n/a

Default Re: Regular Expressions in SELECT statement? - 06-30-2003 , 12:03 PM






Ugh, string parsing


CREATE TABLE blat(txt VARCHAR(8000))

INSERT blat VALUES('Value=123|Description=hello, world!|Data=123456ABCDE|')

INSERT blat VALUES('Description=goodbye,
world!|Value=345|MoreStuff=ABCDE123456|')




SELECT
Value = CASE WHEN CHARINDEX('Value=', txt) > 0 THEN SUBSTRING(
txt,
CHARINDEX('Value=', txt) + 6,
CHARINDEX('|', SUBSTRING(txt, CHARINDEX('Value=', txt) + 6, LEN(txt))) - 1
) END,
Description = CASE WHEN CHARINDEX('Description=', txt) > 0 THEN SUBSTRING(
txt,
CHARINDEX('Description=', txt) + 12,
CHARINDEX('|', SUBSTRING(txt, CHARINDEX('Description=', txt) + 12,
LEN(txt))) - 1
) END,
Data = CASE WHEN CHARINDEX('Data=', txt) > 0 THEN SUBSTRING(
txt,
COALESCE(CHARINDEX('Data=', txt) + 5, LEN(txt)),
CHARINDEX('|', SUBSTRING(txt, CHARINDEX('Data=', txt) + 5, LEN(txt))) - 1
) END,
MoreStuff = CASE WHEN CHARINDEX('MoreStuff=', txt) > 0 THEN SUBSTRING(
txt,
CHARINDEX('MoreStuff=', txt) + 10,
CHARINDEX('|', SUBSTRING(txt, CHARINDEX('MoreStuff=', txt) + 10,
LEN(txt))) - 1
) END
FROM blat

DROP TABLE blat


Two problems... (1) you need an expression for each "field" you have in you
pipe-delimited text, and (2) to do this with one single pass of the table,
you can't have "fields" that are partial names of other fields... this is
why I changed MoreData to MoreStuff.

In other words, maybe you should consider normalization.





"Alex M" <nospam (AT) hotmail (DOT) com> wrote

Quote:
Hi,

I need to separate certain data in a large text field into distinct
columns,
however I need to use regular expression to extract the data, as it is
field
delimited, as below:

"Value=123|Description=hello, world!|Data=123456ABCDE|"
"Description=goodbye, world!|Value=345|MoreData=ABCDE123456|"

The only information I have about the data in this text field, is that it
is
all in a "<fieldname>=<value><delimiter>" format, and the order of the
tag-value pairs is not necessarily the same for all rows. Also, some rows
have more fields than others.

I'd like to be able to extract the data into columns (e.g. the above row
would have 3 columns: Value, Description, Data). For any particular SELECT
statement, extra tag-value columns would be ignored, and non-existent ones
would have a value of NULL.

So something like (pseudocode follows):
SELECT regexp("Value=*|") AS Value, regexp("Description=*|") AS
Description,
regexp("Data=*|") AS Data
FROM TableName

Should return
"123", "hello, world!", "123456ABCDE"
"345", "goodbye, world!", NULL

Any ideas?

Thanks
Alex








Reply With Quote
  #3  
Old   
bruce barker
 
Posts: n/a

Default Re: Regular Expressions in SELECT statement? - 06-30-2003 , 12:52 PM



pretty trival, but you will be restricted to a 4k value for any tag.


create function dbo.GetValue(
@s text,
@fn varchar(4000))
returns varchar(4000)
as
begin
declare @i int,
@i2 int,
@v varchar(4000)

set @i = patindex('%|' + @fn + '=%',@s)
if @i = 0
begin
if left(cast(@s as varchar(4000)),len(@fn)) <> @fn
return null
end
begin
set @i = @i + len(@fn) + 1
set @i2 = charindex('|',@s,@i)
set @v = substring(@s,@i+1,@i2-@i-1)
end
return @v
end


SELECT dbo.GetValue('Value',TextField) AS Value,
dbo.GetValue('Description',TextField) AS Description,
dbo.GetValue('Data',TextField) AS Data
FROM TableName


-- bruce (sqlwork.com)

"Alex M" <nospam (AT) hotmail (DOT) com> wrote

Quote:
Hi,

I need to separate certain data in a large text field into distinct
columns,
however I need to use regular expression to extract the data, as it is
field
delimited, as below:

"Value=123|Description=hello, world!|Data=123456ABCDE|"
"Description=goodbye, world!|Value=345|MoreData=ABCDE123456|"

The only information I have about the data in this text field, is that it
is
all in a "<fieldname>=<value><delimiter>" format, and the order of the
tag-value pairs is not necessarily the same for all rows. Also, some rows
have more fields than others.

I'd like to be able to extract the data into columns (e.g. the above row
would have 3 columns: Value, Description, Data). For any particular SELECT
statement, extra tag-value columns would be ignored, and non-existent ones
would have a value of NULL.

So something like (pseudocode follows):
SELECT regexp("Value=*|") AS Value, regexp("Description=*|") AS
Description,
regexp("Data=*|") AS Data
FROM TableName

Should return
"123", "hello, world!", "123456ABCDE"
"345", "goodbye, world!", NULL

Any ideas?

Thanks
Alex








Reply With Quote
  #4  
Old   
bruce barker
 
Posts: n/a

Default Re: Regular Expressions in SELECT statement? - 06-30-2003 , 12:59 PM



minor correction, you can use varchar(8000), and double field size. if using
ntext, then you are restricted to nvarchar(4000)

-- bruce (sqlwork.com)


"bruce barker" <nospam_brubar (AT) safeco (DOT) com> wrote

Quote:
pretty trival, but you will be restricted to a 4k value for any tag.


create function dbo.GetValue(
@s text,
@fn varchar(4000))
returns varchar(4000)
as
begin
declare @i int,
@i2 int,
@v varchar(4000)

set @i = patindex('%|' + @fn + '=%',@s)
if @i = 0
begin
if left(cast(@s as varchar(4000)),len(@fn)) <> @fn
return null
end
begin
set @i = @i + len(@fn) + 1
set @i2 = charindex('|',@s,@i)
set @v = substring(@s,@i+1,@i2-@i-1)
end
return @v
end


SELECT dbo.GetValue('Value',TextField) AS Value,
dbo.GetValue('Description',TextField) AS Description,
dbo.GetValue('Data',TextField) AS Data
FROM TableName


-- bruce (sqlwork.com)

"Alex M" <nospam (AT) hotmail (DOT) com> wrote in message
news:KIZLa.34$6W.19 (AT) newreader (DOT) ukcore.bt.net...
Hi,

I need to separate certain data in a large text field into distinct
columns,
however I need to use regular expression to extract the data, as it is
field
delimited, as below:

"Value=123|Description=hello, world!|Data=123456ABCDE|"
"Description=goodbye, world!|Value=345|MoreData=ABCDE123456|"

The only information I have about the data in this text field, is that
it
is
all in a "<fieldname>=<value><delimiter>" format, and the order of the
tag-value pairs is not necessarily the same for all rows. Also, some
rows
have more fields than others.

I'd like to be able to extract the data into columns (e.g. the above row
would have 3 columns: Value, Description, Data). For any particular
SELECT
statement, extra tag-value columns would be ignored, and non-existent
ones
would have a value of NULL.

So something like (pseudocode follows):
SELECT regexp("Value=*|") AS Value, regexp("Description=*|") AS
Description,
regexp("Data=*|") AS Data
FROM TableName

Should return
"123", "hello, world!", "123456ABCDE"
"345", "goodbye, world!", NULL

Any ideas?

Thanks
Alex










Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.