dbTalk Databases Forums  

SSIS remove duplicate rows

microsoft.public.sqlserver.dts microsoft.public.sqlserver.dts


Discuss SSIS remove duplicate rows in the microsoft.public.sqlserver.dts forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Ken McClaren
 
Posts: n/a

Default SSIS remove duplicate rows - 10-12-2006 , 02:31 PM






I'm trying to put together an SSIS package to remove duplicate rows from a
flat file.
Source: Flat file (EDI)
Target: Flat file (EDI)

The aggregate transformation seems to work, however, the output lines are in
a random order. I need retain the line order and just remove the duplicate
rows.

Thanks for your help,
Ken.



Reply With Quote
  #2  
Old   
Darren Green
 
Posts: n/a

Default Re: SSIS remove duplicate rows - 10-13-2006 , 11:41 AM






The Sort Transform has an option to remove duplicates. You will of course
need to be able to select the correct sort order.


--
Darren
http://wiki.sqlis.com | http://www.sqlis.com | http://www.sqldts.com |
http://www.konesans.com
"Ken McClaren" <kmcclare (AT) hotmail (DOT) com> wrote

Quote:
I'm trying to put together an SSIS package to remove duplicate rows from a
flat file.
Source: Flat file (EDI)
Target: Flat file (EDI)

The aggregate transformation seems to work, however, the output lines are
in a random order. I need retain the line order and just remove the
duplicate rows.

Thanks for your help,
Ken.




Reply With Quote
  #3  
Old   
Ken McClaren
 
Posts: n/a

Default Re: SSIS remove duplicate rows - 10-16-2006 , 01:45 PM



Again, this almost works, however the file is sorted incorrectly. I need a
way to remove duplicate lines without changing the sort order fo the file.

A simple sort and remove duplicates is not going to work.


"Darren Green" <darren.green (AT) reply-to-newsgroup (DOT) sqldts.com> wrote

Quote:
The Sort Transform has an option to remove duplicates. You will of course
need to be able to select the correct sort order.


--
Darren
http://wiki.sqlis.com | http://www.sqlis.com | http://www.sqldts.com |
http://www.konesans.com
"Ken McClaren" <kmcclare (AT) hotmail (DOT) com> wrote in message
news:%23Bo%23GVj7GHA.3644 (AT) TK2MSFTNGP03 (DOT) phx.gbl...
I'm trying to put together an SSIS package to remove duplicate rows from
a flat file.
Source: Flat file (EDI)
Target: Flat file (EDI)

The aggregate transformation seems to work, however, the output lines are
in a random order. I need retain the line order and just remove the
duplicate rows.

Thanks for your help,
Ken.






Reply With Quote
  #4  
Old   
Darren Green
 
Posts: n/a

Default Re: SSIS remove duplicate rows - 10-16-2006 , 04:03 PM



So you cannot re-sort ?

Do this in two stages, and add a row number to start with, then sort for
duplicates, then sort by row number.


--
Darren
http://wiki.sqlis.com | http://www.sqlis.com | http://www.sqldts.com |
http://www.konesans.com

"Ken McClaren" <kmcclare (AT) hotmail (DOT) com> wrote

Quote:
Again, this almost works, however the file is sorted incorrectly. I need a
way to remove duplicate lines without changing the sort order fo the file.

A simple sort and remove duplicates is not going to work.


"Darren Green" <darren.green (AT) reply-to-newsgroup (DOT) sqldts.com> wrote in
message news:uZJTIau7GHA.568 (AT) TK2MSFTNGP05 (DOT) phx.gbl...
The Sort Transform has an option to remove duplicates. You will of course
need to be able to select the correct sort order.


--
Darren
http://wiki.sqlis.com | http://www.sqlis.com | http://www.sqldts.com |
http://www.konesans.com
"Ken McClaren" <kmcclare (AT) hotmail (DOT) com> wrote in message
news:%23Bo%23GVj7GHA.3644 (AT) TK2MSFTNGP03 (DOT) phx.gbl...
I'm trying to put together an SSIS package to remove duplicate rows from
a flat file.
Source: Flat file (EDI)
Target: Flat file (EDI)

The aggregate transformation seems to work, however, the output lines
are in a random order. I need retain the line order and just remove the
duplicate rows.

Thanks for your help,
Ken.








Reply With Quote
  #5  
Old   
Ken McClaren
 
Posts: n/a

Default Re: SSIS remove duplicate rows - 10-17-2006 , 02:55 PM



It may help if I supplied some sample data.

Here is the original file.
P1016530 S123456 1234 0000 20060927
946 6661.5SCNN
00123123123123123 020060927
P O123456 0000123
ORD002222222222222 2006092720060921123456
P P123456 000000085 CUSTN
SOME_TEXT_HERE! WI53051 00000111111111111111 13.5 23
23 1200085 CUSTN
SOME_TEXT_HERE! WI53051 20060927
P D123456 000000000000010001 1 020060927
289EACH 123456 7897897897897 0045645645645
P P123456 000000085 CUSTN
SOME_TEXT_HERE! WI53051 00000111111111111111 13.5 23
23 1200085 CUSTN
SOME_TEXT_HERE! WI53051 20060927
P D123456 000000000000010001 1 020060927
289EACH 123456 7897897897897 0045645645645


Here is how it looks after the last two lines are removed which are
duplicates.
P1016530 S123456 1234 0000 20060927
946 6661.5SCNN
00123123123123123 020060927
P O123456 0000123
ORD002222222222222 2006092720060921123456
P P123456 000000085 CUSTN
SOME_TEXT_HERE! WI53051 00000111111111111111 13.5 23
23 1200085 CUSTN
SOME_TEXT_HERE! WI53051 20060927
P D123456 000000000000010001 1 020060927
289EACH 123456 7897897897897 0045645645645

The data has been changed to protect the innocent and my job.
Data varies from line to line, so this line just makes up on record.

I've thought of doing a brach import into three or four different tables.
Removing the dupes and putting it back together into an output file, but
there has to be a better way of doing this without scripting or awk vbscript
etc...

Thanks for your help.
"Darren Green" <darren.green (AT) reply-to-newsgroup (DOT) sqldts.com> wrote

Quote:
So you cannot re-sort ?

Do this in two stages, and add a row number to start with, then sort for
duplicates, then sort by row number.


--
Darren
http://wiki.sqlis.com | http://www.sqlis.com | http://www.sqldts.com |
http://www.konesans.com

"Ken McClaren" <kmcclare (AT) hotmail (DOT) com> wrote in message
news:uAaOnNV8GHA.3644 (AT) TK2MSFTNGP03 (DOT) phx.gbl...
Again, this almost works, however the file is sorted incorrectly. I need
a way to remove duplicate lines without changing the sort order fo the
file.

A simple sort and remove duplicates is not going to work.


"Darren Green" <darren.green (AT) reply-to-newsgroup (DOT) sqldts.com> wrote in
message news:uZJTIau7GHA.568 (AT) TK2MSFTNGP05 (DOT) phx.gbl...
The Sort Transform has an option to remove duplicates. You will of
course need to be able to select the correct sort order.


--
Darren
http://wiki.sqlis.com | http://www.sqlis.com | http://www.sqldts.com |
http://www.konesans.com
"Ken McClaren" <kmcclare (AT) hotmail (DOT) com> wrote in message
news:%23Bo%23GVj7GHA.3644 (AT) TK2MSFTNGP03 (DOT) phx.gbl...
I'm trying to put together an SSIS package to remove duplicate rows
from a flat file.
Source: Flat file (EDI)
Target: Flat file (EDI)

The aggregate transformation seems to work, however, the output lines
are in a random order. I need retain the line order and just remove the
duplicate rows.

Thanks for your help,
Ken.










Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.