"shaab mohagir" <shaab.mohagir (AT) bonbon (DOT) net> schrieb im Newsbeitrag
news:e1b3da1b.0308180544.af4ae22 (AT) posting (DOT) google.com...
Quote:
Hello,
The sys admin of the AS/400 (our online system) suggested to prepare
file(s) that contains all the data required by the data warehouse, the
file(s) is then downloaded nightly instead of downloading from
individual tables like the customer, product tables etc..
I personally feel reluctant to go with the prepared file(s) approach,
I think it is redundant plus it adds one extra step that needs to
happen before the downloiad and also makes me so dependent on people
whom aren't in the same office. I also didn't see this approch in any
of the text books about data warehousing which makes me more reluctant
to go with it. |
Hello Shaab,
in practice this method is quite often used to extract data from a source
system. In fact a lot of SAP R/3-Sites use this method to extract data using
ABAP-Code. The decision whether this method is reasonable or not can be made
from the following questions:
1. How many data sources (systems and tables) have to be used?
The more systems/tables you connect the higher the indication towards an ETL
tool.
2. Which transformations need to be applied to the data?
If you have to create a lot of transformations you might be better off with
an ETL tool since these tools provide sophisticated and comfortable tools to
manage this task.
3. Are there sophisticated processes or process chains to be created to
transform and load the data?
Again: If there are a lot of data sources to integrate in a predefined
manner and you want to have full control over the process flow sophisticated
ETL tools can help you manage this.
4. Do you want to have a consistent meta data management?
This is again a question of the overall complexity of your ETL process. If
you need functionality like data lineage (data flow) or impact analysis you
can use an ETL tool. They also have standardized interfaces to data
modelling tools like ERWin or Rational and can exchange meta data.
5. Data Quality management
Some ETL tools provide functionalities to at least check address data.
5. Price
ETL tools usually have very high license cost. You need to check and
calculate for a certain time period if the efforts to maintain a data
warehouse without an ETL tool and the (maybe lower) quality reachable by
this method are higher than the license fees you have to pay.
Quote:
Your comments and suggestions on this matter are very important.
Another question:
Are there best practices for the Data Extracton process, is it simply
initial download of the required tables/fields to the staging are from
the source system, then the rest is nighltly downloads to trace
changes
and do the transformation and loading to the actual DW ?
Thanks, |
This highly depends on the scenario. The ETL process for a multinational,
decentral, real-time data warehouse is different from the process for a data
warehouse of a small to medium sized data warehouse. I saw warehouses where
the employees of the company had excellent knowledge of the data structures
in the source systems and the were easily able to maintain the process
without an ETL tool.
Kind regards,
Joerg