dbTalk Databases Forums  

An ETL Strategy - Data Cleansing

comp.databases.olap comp.databases.olap


Discuss An ETL Strategy - Data Cleansing in the comp.databases.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Domenico Discepola
 
Posts: n/a

Default An ETL Strategy - Data Cleansing - 07-08-2004 , 09:39 AM






Please refer to my other post entitled An ETL Strategy - Data
Transformation. Once data has been transformed, it is passed on to the
cleansing routine. The following data cleansing methodology seems to work
for me. Points to note are that: 1. cleansing starts off at a very low
level which deals with the more physical aspects of the data, and 2. having
data in a relational format certainly helps. From there, cleansing proceeds
through 4 additional steps which gradually deal with the more logical
aspects of the data.



Step 1: basic record formation includes checking for the correct # of
fields, that each field is of the proper data type (integer, alphabetic,
decimal, etc.)



Step 2: constraint checking (including nulls and default values)



Step 3: data masking (data should "look" proper - e.g. zip code, telephone
numbers, etc.)



Step 4: simple / single field lookups (e.g. product code exists, customer
number exists, etc.)



Step 5: complex / multi field lookups (e.g. if zip code = xxx then state
must = yyy)



Steps 3, 4, and 5 are considered advanced data cleansing. Steps 1 and 2 are
considered basic data cleansing. Although you can code around this, all
cleansing steps should include lower steps. For instance, if you decide to
perform cleansing at level 3, then you should automatically cleanse at
levels 1 and 2. Exceptions produced during each cleansing stage must be
handled in other routines.



Comments are welcomed.



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.