dbTalk Databases Forums  

best practises ETL DTS sample wanted

microsoft.public.sqlserver.dts microsoft.public.sqlserver.dts


Discuss best practises ETL DTS sample wanted in the microsoft.public.sqlserver.dts forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
David Bulog
 
Posts: n/a

Default best practises ETL DTS sample wanted - 08-06-2004 , 08:24 AM






Hi all,
Im a bit rusty on DTS after working mostly on .Net and other Server
products for the past two years

I have a large amount of data to load (millons of rows from a
mainframe) and have picked the Data Pump task and possibly the Dynamic
Properties task or Execute SQL task --problem is I cant find a good
dts sample to get me started
on a robust best practises Flatfile Load.

I have 39 Flatfiles to load into SQL Server 2000 SP3A that contain the
same schema and need to know how to dymanicly set the connection
properties of the source connection (text)

Source FlatFileName = d:\ftproot\BLDTSE01
Source FlatFileName = d:\ftproot\BLDTSE02
.....etc etc...........
Source FlatFileName = d:\ftproot\BLDTSE39

Q1: How do I dymanicly set the connection properties of the source
connection (text) so I can use the one DTS package for this


Q2: The last two characters of the flat File Name tell me the Cycle,
(possible range 1 to 39)

'SQL Server destination
CREATE TABLE [S_BLDTSE] (
[DTLMEI] [int] NOT NULL ,
[DTLMLN] [int] NOT NULL ,
[DTLRSN] [int] NOT NULL ,
[DTLIDS] [varchar] (25) COLLATE Latin1_General_CI_AS NOT NULL ,
[CYCLE] [tinyint] NULL --extra column not in source
) ON [PRIMARY]
GO


example: Source FlatFileName = d:\ftproot\BLDTSE39--we can calulate
the value for the extra column "cycle" in our destination table
In this case the last two chars of the FlatFileName are 39 which
equals
CYCLE 39, what is the quickest way to insert the CYCLE value during
the load?

Thanks in advance
Dave

Reply With Quote
  #2  
Old   
Darren Green
 
Posts: n/a

Default Re: best practises ETL DTS sample wanted - 08-06-2004 , 03:17 PM






In message <fbc2ae1e.0408060524.2723b8d6 (AT) posting (DOT) google.com>, David
Bulog <d2ba (AT) xtra (DOT) co.nz> writes
Quote:
Hi all,
Im a bit rusty on DTS after working mostly on .Net and other Server
products for the past two years

I have a large amount of data to load (millons of rows from a
mainframe) and have picked the Data Pump task and possibly the Dynamic
Properties task or Execute SQL task --problem is I cant find a good
dts sample to get me started
on a robust best practises Flatfile Load.

I have 39 Flatfiles to load into SQL Server 2000 SP3A that contain the
same schema and need to know how to dymanicly set the connection
properties of the source connection (text)

Source FlatFileName = d:\ftproot\BLDTSE01
Source FlatFileName = d:\ftproot\BLDTSE02
....etc etc...........
Source FlatFileName = d:\ftproot\BLDTSE39

Q1: How do I dymanicly set the connection properties of the source
connection (text) so I can use the one DTS package for this

Try the Dynamic Properties Task or for more control/logic use a ActiveX
Script Task-

How can I change the filename for a text file connection?
(http://www.sqldts.com/default.aspx?200)

Quote:
Q2: The last two characters of the flat File Name tell me the Cycle,
(possible range 1 to 39)

'SQL Server destination
CREATE TABLE [S_BLDTSE] (
[DTLMEI] [int] NOT NULL ,
[DTLMLN] [int] NOT NULL ,
[DTLRSN] [int] NOT NULL ,
[DTLIDS] [varchar] (25) COLLATE Latin1_General_CI_AS NOT NULL ,
[CYCLE] [tinyint] NULL --extra column not in source
) ON [PRIMARY]
GO


example: Source FlatFileName = d:\ftproot\BLDTSE39--we can calulate
the value for the extra column "cycle" in our destination table
In this case the last two chars of the FlatFileName are 39 which
equals
CYCLE 39, what is the quickest way to insert the CYCLE value during
the load?

When setting the file name in an ActiveX Script Task as above you could
store the cycle characters in a global variable. When setting up the
DataPump task use a normal copy column transform for all your source
columns, then select the destination column cycle, and add a new ActiveX
Script Transform. Transforms can be 0 to n or n to 0 etc, so whilst you
have no source you have a destination. In the script assign the global
variable to the cycle destination column, e.g.

DTSDestination("CYCLE").Value = DTSGlobalVariables("CycleValue").Value


A faster way may be to ignore cycle during the DataPump and then just
use an Exec SQL TAsk that runs an UPDATE using the global variable value
of cycle, but only realistic if these are the only rows in the table or
you can derive a suitable WHERE clause.

I suggest this as script transforms are slower than the other types.
Copy column is the faster transform and using a n to n transform is
faster than 1 to 1, which is the default!

Cheers
--
Darren Green (SQL Server MVP)
DTS - http://www.sqldts.com

PASS - the definitive, global community for SQL Server professionals
http://www.sqlpass.org



Reply With Quote
  #3  
Old   
David Bulog
 
Posts: n/a

Default Re: best practises ETL DTS sample wanted - 08-06-2004 , 06:24 PM




Thanks Darran,
Expanding on what I want to do is Truncate the SQL server staging
tableand after I have finished the flat file load then append the data
in the stage to a monthly holding table which mirrors the stage schema
but with an autogenerated ID

ie Stage table
TRUNCATE TABLE TABLE S_BLDTSE-->Then load Flat file

CREATE TABLE [S_BLDTSE] (
[DTLMEI] [int] NOT NULL ,
[DTLMLN] [int] NOT NULL ,
[DTLRSN] [int] NOT NULL ,
[DTLIDS] [varchar] (25) COLLATE Latin1_General_CI_AS NOT NULL ,
[CYCLE] [tinyint] NULL --extra column not in source
) ON [PRIMARY]
GO

'SQL Server destination monthly holding
CREATE TABLE [BLDTSE] (
ID--Autogenerated on insertextra column not in source
[DTLMEI] [int] NOT NULL ,
[DTLMLN] [int] NOT NULL ,
[DTLRSN] [int] NOT NULL ,
[DTLIDS] [varchar] (25) COLLATE Latin1_General_CI_AS NOT NULL ,
[CYCLE] [tinyint] NULL --holds all billing cycles for the month ie 1-39
) ON [PRIMARY]
GO



techique 1:
Im not quite sure of exactly how to set up the active X task correctly
in the data pump?I assume I will be doing
a row by row transform this way.
I may have 1-10 million records in some files with low cycle numbers and
trailing off to a few hunfred thousand
as we near cycle 39 at month end

techique 2
Assuming I update the global as I read each filename via
Active X,how do I send that to the Execute SQL task

Would it be possible to see a quick DTS sample of each of the two
techiques you described.
send to
d2ba (AT) xtra (DOT) co.nz

Thanks in advance
David




*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Reply With Quote
  #4  
Old   
Darren Green
 
Posts: n/a

Default Re: best practises ETL DTS sample wanted - 08-09-2004 , 02:01 PM



In message <OKZW3xAfEHA.3556 (AT) TK2MSFTNGP12 (DOT) phx.gbl>, David Bulog
<d2ba (AT) xtra (DOT) co.nz> writes
Quote:
Thanks Darran,
Expanding on what I want to do is Truncate the SQL server staging
tableand after I have finished the flat file load then append the data
in the stage to a monthly holding table which mirrors the stage schema
but with an autogenerated ID

ie Stage table
TRUNCATE TABLE TABLE S_BLDTSE-->Then load Flat file

CREATE TABLE [S_BLDTSE] (
[DTLMEI] [int] NOT NULL ,
[DTLMLN] [int] NOT NULL ,
[DTLRSN] [int] NOT NULL ,
[DTLIDS] [varchar] (25) COLLATE Latin1_General_CI_AS NOT NULL ,
[CYCLE] [tinyint] NULL --extra column not in source
) ON [PRIMARY]
GO

'SQL Server destination monthly holding
CREATE TABLE [BLDTSE] (
ID--Autogenerated on insertextra column not in source
[DTLMEI] [int] NOT NULL ,
[DTLMLN] [int] NOT NULL ,
[DTLRSN] [int] NOT NULL ,
[DTLIDS] [varchar] (25) COLLATE Latin1_General_CI_AS NOT NULL ,
[CYCLE] [tinyint] NULL --holds all billing cycles for the month ie 1-39
) ON [PRIMARY]
GO



techique 1:
Im not quite sure of exactly how to set up the active X task correctly
in the data pump?I assume I will be doing
a row by row transform this way.
I may have 1-10 million records in some files with low cycle numbers and
trailing off to a few hunfred thousand
as we near cycle 39 at month end

Open the DataPump task and select the Transforms tab. Click the Cycle
column in the desination list adn ensure no other columns are selected.
Then click New, ActiveX Script, Properties, then add code into the Sub
Main stub -

DTSDestination("CYCLE").Value = DTSGlobalVariables("CycleValue").Value

Quote:
techique 2
Assuming I update the global as I read each filename via
Active X,how do I send that to the Execute SQL task

Probably better since you can do this on your staging table before
moving it to the final table.

Add an Exec SQL Task with an UPDATE statement using ? as the variable
placeholder.

UPDATE StagingTable
SET Cycle = ?

Then click Parameters and on the input tab, select your global variable
to map it to the parameter.


Quote:
Would it be possible to see a quick DTS sample of each of the two
techiques you described.
Nope, partly because I'm lazy, but also because I don't have 2000
installed right now.


--
Darren Green (SQL Server MVP)
DTS - http://www.sqldts.com

PASS - the definitive, global community for SQL Server professionals
http://www.sqlpass.org



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.