"Joerg Narr" <n_o_spa_mjoerg_narr (AT) hotmail (DOT) com> wrote
Quote:
Sorry Domenico, but I need to...
An important concept to realize is that data transformation is not data
cleansing. Think of data transformation as the preparation of data for
cleansing.
Transformation tasks can cleanse data - if the admin decides to do it the
ETL way rather than clean the operational system. The way you recommend
doing ETL is really ELT, an approach usually taken by IBM, where the
transformations are mainly done in the database.
No need to be sorry - one of the reasons I posted this is to get feedback ;-)
I must humbly disagree with you here. Although there is more than one way to
skin a cat, handling the transformation and cleansing of data separately has
benefits. It allows for a logical progression of data through the ETL: you
begin at a very low level (concentrating on the physical aspects of the
data) and gradually progress to a higher level (concentrating on the logical
aspects of the data, applying more and more business logic to it). As each
step is performed, you are able to trap exceptions, deal with them
separately, and decide what the next step should be. I should also mention
that, in my experience and all things being equal, performing the
transformations outside of the database is much quicker (I may start a
heated debate here ;-) ). Your choice may depend on the ETL tools available
to you, though. Not all tools are created equal...
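
To make the staged idea a bit more concrete, here is a minimal Python sketch of what I mean. It is only an illustration, not any particular tool's API: the names (run_stage, transform, cleanse, run_pipeline) and the "id"/"amount" fields are made up for the example. The first stage deals with the physical side (trimming, type casting), the second applies a business rule, and each stage traps its own exceptions so rejected rows can be handled separately.

```python
from dataclasses import dataclass, field


@dataclass
class StageResult:
    passed: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (stage name, row, reason)


def run_stage(name, func, rows):
    """Apply one stage to every row, trapping failures instead of aborting."""
    result = StageResult()
    for row in rows:
        try:
            result.passed.append(func(row))
        except (KeyError, ValueError) as exc:
            result.rejected.append((name, row, str(exc)))
    return result


def transform(row):
    # Low level, physical aspects: trim whitespace and cast types.
    # Assumes the raw values arrive as strings (hypothetical input shape).
    return {"id": int(row["id"]), "amount": float(row["amount"].strip())}


def cleanse(row):
    # Higher level, logical aspects: a made-up business rule for illustration.
    if row["amount"] < 0:
        raise ValueError("negative amount")
    return row


def run_pipeline(rows):
    """Run the stages in order; only rows that pass a stage reach the next."""
    rejects = []
    for name, func in (("transform", transform), ("cleanse", cleanse)):
        result = run_stage(name, func, rows)
        rejects.extend(result.rejected)
        rows = result.passed
    return rows, rejects


raw_rows = [
    {"id": "1", "amount": " 10.5 "},
    {"id": "x", "amount": "3"},   # fails in transform (bad id)
    {"id": "2", "amount": "-4"},  # fails in cleanse (business rule)
]
loaded, rejects = run_pipeline(raw_rows)
```

Because every reject carries the name of the stage that raised it, you can decide per stage what the next step should be (fix and retry, route to an error table, or stop the load).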
You also mention ELT as opposed to ETL. I suppose that your choice of
methodology depends on the business needs... Handling terabytes of data
will prompt the use of certain methods, whereas those same methods may be
overkill for handling gigabytes of data.