dbTalk Databases Forums  

An ETL Strategy - Data Transformation

comp.databases.olap comp.databases.olap


Discuss An ETL Strategy - Data Transformation in the comp.databases.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Domenico Discepola
 
Posts: n/a

Default An ETL Strategy - Data Transformation - 07-08-2004 , 09:31 AM






I find it conceptually easier to physically transform incoming data into a
relational format (in rows and columns). It allows for easier processing
and subsequent loading into a relational database (if a relational database
is one you're using). For example, one of my data sources comes from Lotus
Notes. This particular application does not store data in a purely
relational format (those multi value fields are messy to deal with). In
this case, fields are separated with a CRLF (\r\n) and records are separated
with a form feed character. In any case, I try to transform each incoming
record into a relational format and load it into an array of some sort.
Once it is in an array, you can manipulate or reference the data easier.



An important concept to realize is that data transformation is not data
cleansing. Think of data transformation as the preparation of data for
cleansing.



Comments are welcomed.



Reply With Quote
  #2  
Old   
Joerg Narr
 
Posts: n/a

Default Re: An ETL Strategy - Data Transformation - 07-08-2004 , 01:09 PM






Sorry Domenico, but I need to...

Quote:
An important concept to realize is that data transformation is not data
cleansing. Think of data transformation as the preparation of data for
cleansing.
Transformation tasks can cleanse data - if the admin decides to rather do it
the ETL-way than to clean the operational system. The way you recommend ETL
is ELT, an aproach usually done by IBM where the transformations are mainly
being done in the data base.




Reply With Quote
  #3  
Old   
Domenico Discepola
 
Posts: n/a

Default Re: An ETL Strategy - Data Transformation - 07-08-2004 , 02:31 PM




"Joerg Narr" <n_o_spa_mjoerg_narr (AT) hotmail (DOT) com> wrote

Quote:
Sorry Domenico, but I need to...

An important concept to realize is that data transformation is not data
cleansing. Think of data transformation as the preparation of data for
cleansing.

Transformation tasks can cleanse data - if the admin decides to rather do
it
the ETL-way than to clean the operational system. The way you recommend
ETL
is ELT, an aproach usually done by IBM where the transformations are
mainly
being done in the data base.


No need to be sorry - one of the reasons I posted this is to get feedback
;-)

I may humbly disagree with you here. Although there is more than one way to
skin a cat, handling the transformation and cleansing of data separately has
benefits. This method allows for a logical progression of data through the
ETL. You begin at a very low level (concentrating more on the physical
aspects of the data) and gradually progress to a higher level (concentrating
more on the logical aspects of the data; or applying more and more business
logic to the data). So, as each step is performed, you are able to trap
exceptions, deal with them separately, and decide what the next step should
be. I should also mention that I find that, all things being equal,
performing the transformations outside of the database is much quicker (I
may start a heated debate here ;-) ). Your choice may depend on the variety
of ETL tools though. Not all tools are created equally...

You also mention ELT as opposed to ETL. I suppose that your choice of
methodology depends on the business needs... Handling terrabytes of data
will prompt the use of certain methods whereas those same methods may be
overkill for handling GBs of data.





Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.