scrubber?

microsoft.public.sqlserver.dts


Joe
 
scrubber? - 10-18-2005, 09:41 PM

I've got some big flat files that need to go into a relational database.
They need heavy scrubbing first: removing duplicates, generating keys, and
transforming data values. It looks like I can process some of the data in
parallel to speed things up a bit.

For the scrubbing, I'm weighing two options. One is to use DTS parallel
processing, calling stored procs to do the dirty work. The other is to write
a multi-threaded C# app that reads the data into datasets at one end, does
the scrubbing, and writes the clean data out at the other end. I'm not sure
about the C# approach, though: given the size of the files, reading the data
into memory might prove too costly, and I've never done scrubbing in C#
before.

Does anyone have any thoughts on which approach I should take, or any
experience to share?

Sue Hoegemeier
 
Re: scrubber? - 10-18-2005, 10:37 PM

Another commonly used option is to load the data into
staging tables, which act as holding tanks for the raw
data. Scrub the data in the staging tables, then load
from the staging tables into the destination tables.

-Sue
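
The staging-table approach described above might look roughly like the following. This is only a minimal sketch, not anything posted in the thread: it uses Python with an in-memory SQLite database as a stand-in for SQL Server so that it runs anywhere, and the table names (`staging_customer`, `customer`), the column names, and the sample data are all hypothetical. On SQL Server the load step would more likely be a bulk load (DTS or `BULK INSERT`) and the key would probably come from an `IDENTITY` column, but the shape is the same: load raw rows, scrub them with set-based SQL, then insert into the destination.

```python
import csv
import io
import sqlite3
from itertools import islice

# Hypothetical flat-file content: stray whitespace, mixed case, duplicates.
RAW_FILE = "name,state\n Ann Lee ,wa\nBob Ray,OR\nAnn Lee,WA\nBob Ray, or\n"


def create_tables(conn):
    # Staging table: no keys or constraints, it only holds raw rows.
    conn.execute("CREATE TABLE staging_customer (name TEXT, state TEXT)")
    # Destination table: the integer primary key is generated on insert.
    conn.execute(
        "CREATE TABLE customer ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, state TEXT, "
        "UNIQUE (name, state))"
    )


def load_staging(conn, lines, batch_size=1000):
    """Stream rows into staging in fixed-size batches, so the whole
    file is never held in memory at once."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    rows = (row for row in reader if row)  # ignore blank lines
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        conn.executemany(
            "INSERT INTO staging_customer (name, state) VALUES (?, ?)", batch
        )
    conn.commit()


def scrub_and_publish(conn):
    """Scrub inside the database, then move clean rows to the destination."""
    # Transform data values: trim whitespace, normalize state codes.
    conn.execute(
        "UPDATE staging_customer "
        "SET name = TRIM(name), state = UPPER(TRIM(state))"
    )
    # Remove duplicates with DISTINCT; keys are generated by the insert.
    conn.execute(
        "INSERT INTO customer (name, state) "
        "SELECT DISTINCT name, state FROM staging_customer "
        "ORDER BY name, state"
    )
    # Empty the holding tank for the next load.
    conn.execute("DELETE FROM staging_customer")
    conn.commit()


def run_demo():
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    load_staging(conn, io.StringIO(RAW_FILE), batch_size=2)
    scrub_and_publish(conn)
    rows = conn.execute(
        "SELECT customer_id, name, state FROM customer ORDER BY customer_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_demo())
```

Because the scrubbing happens as set-based SQL inside the database, the client only ever holds one batch of rows at a time, which sidesteps the memory concern raised in the original question. Giving each input file its own staging table may also make it easier to load files in parallel, though that depends on the data.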
