dbTalk Databases Forums  

Duplicate records in source file

microsoft.public.sqlserver.dts microsoft.public.sqlserver.dts


Discuss Duplicate records in source file in the microsoft.public.sqlserver.dts forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Peter Feakins
 
Posts: n/a

Default Duplicate records in source file - 03-10-2005 , 01:49 PM






One of our source files has duplicate records which generate a primary key
constraint violation during the insert. We tried performing a lookup as part
of the transformation, specifying the same connection as the destination.

If the primary key exists in the table prior to the import, the import works
correctly.

If there are two new records, A & B with the same primary key, the batch
fails on B because the lookup doesn't appear to detect that A has already
been inserted in this batch.

Shoul this idea work, or is the lookup really happening on a separate
connection?

Function Main()
Dim id
id = DTSLookups("GetDesc").Execute(DTSSource("col001"))
If Not IsEmpty(id) Then
Main = DTSTransformStat_SkipRow
Else
DTSDestination("numfield1") = DTSSource("col001")
DTSDestination("varfield2") = DTSSource("col002")
DTSDestination("numfield3") = DTSSource("col003")
DTSDestination("varfield4") = DTSSource("col004")
Main = DTSTransformStat_OK
End If
End Function


SELECT numfield1
FROM DTSTEST
WHERE (numfield1 = ?)

Reply With Quote
  #2  
Old   
Allan Mitchell
 
Posts: n/a

Default Re: Duplicate records in source file - 03-10-2005 , 02:10 PM






It will not see the value because it does not actually exist in the
destination yet. Alone when the batch is committed will it be there.

In any event I would personally import the text file to a working table
and do the duplicate elimination there before inserting to the real
table.

Allan


"Peter Feakins" <PeterFeakins (AT) discussions (DOT) microsoft.com> wrote in
message news:PeterFeakins (AT) discussions (DOT) microsoft.com:

Quote:
One of our source files has duplicate records which generate a primary key
constraint violation during the insert. We tried performing a lookup as part
of the transformation, specifying the same connection as the destination.

If the primary key exists in the table prior to the import, the import works
correctly.

If there are two new records, A & B with the same primary key, the batch
fails on B because the lookup doesn't appear to detect that A has already
been inserted in this batch.

Shoul this idea work, or is the lookup really happening on a separate
connection?

Function Main()
Dim id
id = DTSLookups("GetDesc").Execute(DTSSource("col001"))
If Not IsEmpty(id) Then
Main = DTSTransformStat_SkipRow
Else
DTSDestination("numfield1") = DTSSource("col001")
DTSDestination("varfield2") = DTSSource("col002")
DTSDestination("numfield3") = DTSSource("col003")
DTSDestination("varfield4") = DTSSource("col004")
Main = DTSTransformStat_OK
End If
End Function


SELECT numfield1
FROM DTSTEST
WHERE (numfield1 = ?)


Reply With Quote
  #3  
Old   
GhostInTheMachine
 
Posts: n/a

Default RE: Duplicate records in source file - 03-11-2005 , 10:55 AM



Hi

I'm guessing the data source has no pk constraint, or at least a unique
constraint.

I would clean the data source first rather than put any intelligence at all
into your package/task. As they say, "garbage in, garbage out."

When I come across these design imperfections, I make it a habit to discuss
a data clean up with the business owner. The alternative is to risk all
manner of hard to spot errors later on in execution. For example, what if
this column is referenced by a fk constraint elsewhere? Which record does the
referring record ACTUALLY point to?

Get the business owner to clean up the source data. Otherwise, the script
you indicated seems fundamentally sound.




"Peter Feakins" wrote:

Quote:
One of our source files has duplicate records which generate a primary key
constraint violation during the insert. We tried performing a lookup as part
of the transformation, specifying the same connection as the destination.

If the primary key exists in the table prior to the import, the import works
correctly.

If there are two new records, A & B with the same primary key, the batch
fails on B because the lookup doesn't appear to detect that A has already
been inserted in this batch.

Shoul this idea work, or is the lookup really happening on a separate
connection?

Function Main()
Dim id
id = DTSLookups("GetDesc").Execute(DTSSource("col001"))
If Not IsEmpty(id) Then
Main = DTSTransformStat_SkipRow
Else
DTSDestination("numfield1") = DTSSource("col001")
DTSDestination("varfield2") = DTSSource("col002")
DTSDestination("numfield3") = DTSSource("col003")
DTSDestination("varfield4") = DTSSource("col004")
Main = DTSTransformStat_OK
End If
End Function


SELECT numfield1
FROM DTSTEST
WHERE (numfield1 = ?)

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.