dbTalk Databases Forums  

Differential Data Import

microsoft.public.sqlserver.dts microsoft.public.sqlserver.dts


Discuss Differential Data Import in the microsoft.public.sqlserver.dts forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Bidyut Bhattacharjee
 
Posts: n/a

Default Differential Data Import - 11-22-2006 , 02:37 AM






The scenario : I import a medical history table every day. If I delete the
destination data table every day and import the complete table, it takes a
hell lot of time (about 2 hours). I just want to import the refreshed(or new
data) and not the old data. Let me explain with an example, say if I have
imported the data till october 30th, now next time when I run the package I
want the package to import data after October 30th (starting from Nov) and
append the imported(november) data to the existing data. Also let me know if
I can save time using this method? thank you for your time to read and
respond to the question.

Reply With Quote
  #2  
Old   
Allan Mitchell
 
Posts: n/a

Default Re: Differential Data Import - 11-22-2006 , 04:29 PM






Hello Bidyut,

This is certainly possible. What you need to be able to do though is identify
the new rows on the source. Can you? And is it by date?


If you can identify new rows by date what about amended rows? How are you
dealing with those?


The said. If you simply want new rows you can do this.


Have a log table that stores the date and time of the import start

You have an ExecuteSQL task which reads this date into a variable.

Your SourceSQLStatement does this

SELECT <col list> FROM TABLE WHERE DateColumnOnSource > ?

Map the ? to the variable holding the date of the start of the last load.





Regards

Allan Mitchell
Konesans Ltd
T +44 7966 476 572
F +44 2071 008 479
http://www.konesans.com

Quote:
The scenario : I import a medical history table every day. If I delete
the destination data table every day and import the complete table, it
takes a hell lot of time (about 2 hours). I just want to import the
refreshed(or new data) and not the old data. Let me explain with an
example, say if I have imported the data till october 30th, now next
time when I run the package I want the package to import data after
October 30th (starting from Nov) and append the imported(november)
data to the existing data. Also let me know if I can save time using
this method? thank you for your time to read and respond to the
question.




Reply With Quote
  #3  
Old   
Bidyut Bhattacharjee
 
Posts: n/a

Default Re: Differential Data Import - 11-23-2006 , 01:30 AM



Hi Allan,

Thank you for responding to the question. I can certainly
identify the new rows on the table; but what I am planning to do, is to
delete the rows on the destination table to a certain back date. Say, I have
imported the data till 5th of November,before I import new data, I will
delete the data, on the destination table, for 5th of Nov and then I will
import the data from 5th Nov to the lastest. Let me know if this sounds like
a good or crude idea.

Also please let me know how I can create a log table that stores the date
and time of the import start.

Thanks for your time.

Bidyut


"Allan Mitchell" wrote:

Quote:
Hello Bidyut,

This is certainly possible. What you need to be able to do though is identify
the new rows on the source. Can you? And is it by date?


If you can identify new rows by date what about amended rows? How are you
dealing with those?


The said. If you simply want new rows you can do this.


Have a log table that stores the date and time of the import start

You have an ExecuteSQL task which reads this date into a variable.

Your SourceSQLStatement does this

SELECT <col list> FROM TABLE WHERE DateColumnOnSource > ?

Map the ? to the variable holding the date of the start of the last load.





Regards

Allan Mitchell
Konesans Ltd
T +44 7966 476 572
F +44 2071 008 479
http://www.konesans.com

The scenario : I import a medical history table every day. If I delete
the destination data table every day and import the complete table, it
takes a hell lot of time (about 2 hours). I just want to import the
refreshed(or new data) and not the old data. Let me explain with an
example, say if I have imported the data till october 30th, now next
time when I run the package I want the package to import data after
October 30th (starting from Nov) and append the imported(november)
data to the existing data. Also let me know if I can save time using
this method? thank you for your time to read and respond to the
question.





Reply With Quote
  #4  
Old   
Allan Mitchell
 
Posts: n/a

Default Re: Differential Data Import - 11-23-2006 , 01:57 PM



Hello Bidyut,

Why do you want to remove the rows until a given date and then reimport them?
Is this to capture changed rows?

If yes then this is IMHO a bad design. If you delete 5,000,000 rows then
reload them just so you can say you have the latest rows then this will cost
whereas if you did a query which said "For those rows that exist in the destination
that are on the source has anything changed and if it has update" this would
probably be more cost effective.


You may create a log table consisting of things like

UserName
JobID
DateOfImport
ServerName
<more>


at the start of the package you do something like

INSERT INTO dbo.LogTable(UserName, JobID, DateOfImport,ServerName,<more>)
VALUES(suser_sname(),@jobID,Getdate(),@@Servername ,<more>)






Regards

Allan Mitchell
Konesans Ltd
T +44 7966 476 572
F +44 2071 008 479
http://www.konesans.com

Quote:
Hi Allan,

Thank you for responding to the question. I can certainly
identify the new rows on the table; but what I am planning to do, is
to delete the rows on the destination table to a certain back date.
Say, I have imported the data till 5th of November,before I import new
data, I will delete the data, on the destination table, for 5th of Nov
and then I will import the data from 5th Nov to the lastest. Let me
know if this sounds like a good or crude idea.

Also please let me know how I can create a log table that stores the
date and time of the import start.

Thanks for your time.

Bidyut

"Allan Mitchell" wrote:

Hello Bidyut,

This is certainly possible. What you need to be able to do though is
identify the new rows on the source. Can you? And is it by date?

If you can identify new rows by date what about amended rows? How
are you dealing with those?

The said. If you simply want new rows you can do this.

Have a log table that stores the date and time of the import start

You have an ExecuteSQL task which reads this date into a variable.

Your SourceSQLStatement does this

SELECT <col list> FROM TABLE WHERE DateColumnOnSource > ?

Map the ? to the variable holding the date of the start of the last
load.

Regards

Allan Mitchell
Konesans Ltd
T +44 7966 476 572
F +44 2071 008 479
http://www.konesans.com
The scenario : I import a medical history table every day. If I
delete the destination data table every day and import the complete
table, it takes a hell lot of time (about 2 hours). I just want to
import the refreshed(or new data) and not the old data. Let me
explain with an example, say if I have imported the data till
october 30th, now next time when I run the package I want the
package to import data after October 30th (starting from Nov) and
append the imported(november) data to the existing data. Also let me
know if I can save time using this method? thank you for your time
to read and respond to the question.




Reply With Quote
  #5  
Old   
Bidyut Bhattacharjee
 
Posts: n/a

Default Re: Differential Data Import - 11-24-2006 , 06:16 AM



Hello Allan,

Thank you again for your suggestions. I will try it out
and get back to you.

Regards
Bidyut

"Allan Mitchell" wrote:

Quote:
Hello Bidyut,

Why do you want to remove the rows until a given date and then reimport them?
Is this to capture changed rows?

If yes then this is IMHO a bad design. If you delete 5,000,000 rows then
reload them just so you can say you have the latest rows then this will cost
whereas if you did a query which said "For those rows that exist in the destination
that are on the source has anything changed and if it has update" this would
probably be more cost effective.


You may create a log table consisting of things like

UserName
JobID
DateOfImport
ServerName
more


at the start of the package you do something like

INSERT INTO dbo.LogTable(UserName, JobID, DateOfImport,ServerName,<more>)
VALUES(suser_sname(),@jobID,Getdate(),@@Servername ,<more>)






Regards

Allan Mitchell
Konesans Ltd
T +44 7966 476 572
F +44 2071 008 479
http://www.konesans.com

Hi Allan,

Thank you for responding to the question. I can certainly
identify the new rows on the table; but what I am planning to do, is
to delete the rows on the destination table to a certain back date.
Say, I have imported the data till 5th of November,before I import new
data, I will delete the data, on the destination table, for 5th of Nov
and then I will import the data from 5th Nov to the lastest. Let me
know if this sounds like a good or crude idea.

Also please let me know how I can create a log table that stores the
date and time of the import start.

Thanks for your time.

Bidyut

"Allan Mitchell" wrote:

Hello Bidyut,

This is certainly possible. What you need to be able to do though is
identify the new rows on the source. Can you? And is it by date?

If you can identify new rows by date what about amended rows? How
are you dealing with those?

The said. If you simply want new rows you can do this.

Have a log table that stores the date and time of the import start

You have an ExecuteSQL task which reads this date into a variable.

Your SourceSQLStatement does this

SELECT <col list> FROM TABLE WHERE DateColumnOnSource > ?

Map the ? to the variable holding the date of the start of the last
load.

Regards

Allan Mitchell
Konesans Ltd
T +44 7966 476 572
F +44 2071 008 479
http://www.konesans.com
The scenario : I import a medical history table every day. If I
delete the destination data table every day and import the complete
table, it takes a hell lot of time (about 2 hours). I just want to
import the refreshed(or new data) and not the old data. Let me
explain with an example, say if I have imported the data till
october 30th, now next time when I run the package I want the
package to import data after October 30th (starting from Nov) and
append the imported(november) data to the existing data. Also let me
know if I can save time using this method? thank you for your time
to read and respond to the question.





Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.