dbTalk Databases Forums  

Data profiling - the big picture...

comp.databases.olap


vrushali@tapadiya.net
 

Data profiling - the big picture... - 02-03-2005, 01:46 AM






Hello,

Can someone please help me understand how data profiling helps? Okay.
So it helps the ETL process. But how? I have been googling data
profiling, but all the information I find is vague. Take, for instance,
http://it-director.com/article.php?articleid=3492. The article is about
"the importance of understanding your data". But I don't
get it. There is no specific example of data profiling.

Ok. The dude mentions there are three problems that data profiling
tries to solve.

1. A lot of data is stored in legacy systems and is poorly documented.

SO HOW DOES DATA PROFILING HELP? DO YOU ANALYZE A TABLE JUST TO
UNDERSTAND THE DATA TYPES OF EACH COLUMN? IS THAT WHAT DATA PROFILING
IS ALL ABOUT?

2. The data does not match the metadata that describes it.

OKAY. THE LEGACY SYSTEM I AM WORKING WITH IS AN ACCESS DATABASE. HOW
CAN A COLUMN THAT IS OF TYPE NUMERIC EVER STORE ALPHANUMERIC CHARACTERS
IN IT? ANY DATABASE SYSTEM, LEGACY OR NOT, WOULD NOT SIMPLY LET YOU
VIOLATE THE DATA TYPE CONSTRAINTS. WHAT AM I MISSING?

3. The data itself contains errors and is inconsistent.

THIS MUST BE HANDLED BY DATA "CLEANSING" AND NOT DATA "PROFILING."

I am sorry for using upper case. But I wanted to draw your attention.
My manager wants me to learn all about data profiling but I just don't
get it. I would appreciate it if someone could give me some concrete
examples of data profiling (and how they help) so that we can
choose the right ETL tool.
Thank you in advance for your help.

Vrushali


Kevin Lancaster
 

Re: Data profiling - the big picture... - 02-04-2005, 07:06 PM






Vrushali

This blog post on the data profiling features of the upcoming release of
Oracle Warehouse Builder (OWB) may interest you:
http://www.bayontechnologies.com/bt/...0g_paris_d.php

This release of OWB is currently in Beta.

Hope the article helps.

arun.varadarajan@gmail.com
 

Re: Data profiling - the big picture... - 02-10-2005, 11:14 PM



Vrushali,
You might also look at Trillium (www.trilliumsoftware.com), which is one
of the best data profiling tools on the market.
Some of the reasons why you need data profiling are:
1. Match and merge functionality - the same user registers with Hotmail
under 5 different IDs, and you can run a match/merge utility to find the
related entries.
2. Data harmonization - being able to apply common data standards across
the enterprise, especially when it comes to ETL between your OLTP
systems and OLAP systems.

There are many more ... please look up some of the white papers on the
Trillium website.

Arun Varadarajan
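
To make the match and merge idea above a bit more concrete, here is a minimal Python sketch. It is only an illustration, not how Trillium or any other product does it: the records, the field names and the `match_key` rule (name tokens sorted, phone reduced to digits) are all made up, and commercial tools typically use much fuzzier matching than this exact comparison of normalized keys.

```python
import re
from collections import defaultdict

# Hypothetical registrations; ids, names and phone numbers are invented.
registrations = [
    {"id": "jsmith01", "name": "John Smith", "phone": "415-555-0100"},
    {"id": "john.smith", "name": "JOHN  SMITH", "phone": "(415) 555-0100"},
    {"id": "smithj", "name": "Smith, John", "phone": "4155550100"},
    {"id": "mjones", "name": "Mary Jones", "phone": "212-555-0199"},
]


def match_key(record):
    """Build a comparison key: sorted lowercase name tokens plus phone digits."""
    tokens = sorted(re.findall(r"[a-z]+", record["name"].lower()))
    digits = re.sub(r"\D", "", record["phone"])
    return (" ".join(tokens), digits)


def find_matches(records):
    """Group record ids that share a key and return groups with 2+ members."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


matches = find_matches(registrations)
print(matches)
```

The three "John Smith" registrations end up in one group because they differ only in case, punctuation and word order. Deciding which of them survives the merge is a separate (cleansing) step.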


bucknuggets@yahoo.com
 

Re: Data profiling - the big picture... - 02-15-2005, 01:09 PM



Quote:
1. A lot of data is stored in legacy systems and is poorly documented.

SO HOW DOES DATA PROFILING HELP? DO YOU ANALYZE A TABLE JUST TO
UNDERSTAND THE DATA TYPES OF EACH COLUMN? IS THAT WHAT DATA
PROFILING IS ALL ABOUT?

Not just types - formats, valid values, frequency distributions of
valid values, relationships between values in one column and values in
another, implied business rules, etc.
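
As a rough illustration of that list, here is a small Python sketch that profiles a column for inferred types, distinct values and value frequencies, and checks one cross-column relationship. The rows, the column names and the assumed rule (a zip code should map to a single state value) are invented for the example; a real profile would run over the actual source extract.

```python
from collections import Counter

# Hypothetical sample rows; column names and values are made up.
rows = [
    {"state": "CA", "zip": "94105", "status": "A"},
    {"state": "ca", "zip": "94105", "status": "A"},
    {"state": "NY", "zip": "10001", "status": "I"},
    {"state": "", "zip": "1000", "status": "A"},
    {"state": "NY", "zip": "10001", "status": "X"},
]


def infer_type(value):
    """Classify a raw string as 'empty', 'integer', 'decimal' or 'text'."""
    if value is None or value.strip() == "":
        return "empty"
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        return "text"


def profile_column(rows, column):
    """Summarize one column: inferred types, distinct count, frequencies."""
    values = [row[column] for row in rows]
    return {
        "types": Counter(infer_type(v) for v in values),
        "distinct": len(set(values)),
        "frequencies": Counter(values),
    }


def dependency_violations(rows, key, dependent):
    """Return key values that map to more than one dependent value."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], set()).add(row[dependent])
    return {k: v for k, v in seen.items() if len(v) > 1}


state_profile = profile_column(rows, "state")
zip_to_state = dependency_violations(rows, "zip", "state")
print(state_profile["frequencies"].most_common())
print(zip_to_state)
```

Even on five rows this surfaces the kind of thing the documentation rarely mentions: an empty state, mixed case ("CA" vs "ca"), a status code "X" that may or may not be valid, and a zip code that maps to two different state spellings.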

Quote:
2. The data does not match the metadata that describes it.

OKAY. THE LEGACY SYSTEM I AM WORKING WITH IS AN ACCESS
DATABASE. HOW CAN A COLUMN THAT IS OF TYPE NUMERIC EVER
STORE ALPHANUMERIC CHARACTERS IN IT? ANY DATABASE SYSTEM,
LEGACY OR NOT, WOULD NOT SIMPLY LET YOU VIOLATE THE DATA TYPE
CONSTRAINTS. WHAT AM I MISSING?

First off, not all data lives in a DBMS that can enforce types.
Secondly, metadata covers quite a lot more than just type. It covers
formats (think about telephone numbers), the length of varchars,
null/not null, case, range, etc.
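
A hedged Python sketch of what checking data against that kind of metadata can look like. The phone numbers, the ages and the declared rules (maximum length 3, not null, range 0 to 120) are assumptions made up for the example. `pattern_mask` reduces each value to its shape so that the competing formats in a column become visible at a glance.

```python
import re
from collections import Counter


def pattern_mask(value):
    """Replace every digit with 9 and every letter with A; keep punctuation."""
    masked = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", masked)


def check_column(values, max_length=None, nullable=True, allowed_range=None):
    """Count values that break the declared length, null or range rules."""
    problems = Counter()
    for value in values:
        if value is None or value == "":
            if not nullable:
                problems["null"] += 1
            continue
        if max_length is not None and len(value) > max_length:
            problems["too_long"] += 1
        if allowed_range is not None:
            low, high = allowed_range
            try:
                number = float(value)
            except ValueError:
                problems["not_numeric"] += 1
                continue
            if not low <= number <= high:
                problems["out_of_range"] += 1
    return problems


# Hypothetical column extracts.
phones = ["415-555-0100", "(415) 555-0101", "4155550102",
          "555-0103", "n/a", "212-555-0104"]
ages = ["34", "", "210", "abc", "57"]

mask_counts = Counter(pattern_mask(p) for p in phones)
age_problems = check_column(ages, max_length=3, nullable=False,
                            allowed_range=(0, 120))
print(mask_counts.most_common())
print(dict(age_problems))
```

The mask counts show five different phone formats in six rows, and the age check finds one missing value, one non-numeric value and one value outside the declared range, none of which a plain text column would have rejected.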

Quote:
3. The data itself contains errors and is inconsistent.

THIS MUST BE HANDLED BY DATA "CLEANSING" AND NOT DATA
"PROFILING."

I am sorry for using upper case. But I wanted to draw your attention.
My manager wants me to learn all about data profiling but I just don't
get it. I would appreciate it if someone could give me some concrete
examples of data profiling (and how they help) so that we can
choose the right ETL tool.

Odds are the data has errors and is inconsistent: most data comes from
OLTP systems that are far more forgiving of errors and temporal
inconsistencies than DSS/OLAP systems. And profiling can help you
identify these problems, and find especially weak spots, before diving
into the ETL. But you're right: it's the ETL system that will have to
fix it.

In my opinion the use of profiling tools is strictly optional: I do all
profiling manually using sql and very rapid warehouse/mart iterations.
A simple product might be nice, but if I've got to spend 40-80 hours
evaluating tools, justifying the cost, working it through procurement,
installing the tool, working with their helpdesk, reading the tutorial
and/or documentation, etc, etc - then it buys me nothing.
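
For anyone wondering what manual profiling with SQL might look like, here is a minimal sketch using Python's built-in sqlite3 module so it can be run as-is. The `customer` table, its columns and its rows are invented for illustration; the same kind of queries (counts, frequency distributions, duplicate keys, date sanity checks) could be pointed at a real source table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(id INTEGER, country TEXT, signup_date TEXT, close_date TEXT)"
)
# Hypothetical rows with a few deliberate problems.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        (1, "US", "2004-01-15", None),
        (2, "us", "2004-02-01", "2004-06-30"),
        (3, None, "2004-03-10", "2004-01-01"),
        (4, "DE", "2004-03-11", None),
        (4, "DE", "2004-03-11", None),
    ],
)

# Row count, null count and distinct count for one column.
summary = conn.execute(
    "SELECT COUNT(*), SUM(country IS NULL), COUNT(DISTINCT country) "
    "FROM customer"
).fetchone()

# Frequency distribution of the column's values.
frequencies = conn.execute(
    "SELECT country, COUNT(*) AS n FROM customer "
    "GROUP BY country ORDER BY n DESC, country"
).fetchall()

# Ids that should be unique but are not.
duplicate_ids = conn.execute(
    "SELECT id, COUNT(*) FROM customer GROUP BY id HAVING COUNT(*) > 1"
).fetchall()

# Temporal inconsistency: accounts closed before they were opened.
closed_before_signup = conn.execute(
    "SELECT id FROM customer WHERE close_date < signup_date"
).fetchall()

print(summary, frequencies, duplicate_ids, closed_before_signup)
```

Four short queries already show a missing country, "US" and "us" counted as different values, a duplicated id and a close date that precedes the signup date, which is the sort of weak spot worth knowing about before designing the ETL.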

I suppose I might reconsider if I had a team of unskilled analysts
responsible for requirements gathering, a vast number of attributes
across dozens of systems, etc, etc. Then it might be worth it.

buck



Stephan Eggermont
 

Re: Data profiling - the big picture... - 02-16-2005, 04:31 AM



vrushali (AT) tapadiya (DOT) net wrote:
Quote:
OKAY. THE LEGACY SYSTEM I AM WORKING WITH IS AN ACCESS DATABASE. HOW
CAN A COLUMN THAT IS OF TYPE NUMERIC EVER STORE ALPHANUMERIC CHARACTERS
IN IT? ANY DATABASE SYSTEM, LEGACY OR NOT, WOULD NOT SIMPLY LET YOU
VIOLATE THE DATA TYPE CONSTRAINTS. WHAT AM I MISSING?
Data type constraints in an RDBMS are very weak, and are rarely enforced.

Stephan
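
How strictly types are enforced depends on the engine. As one concrete case, SQLite (used here through Python's built-in sqlite3 module) treats a declared column type as an affinity rather than a hard constraint, so an ordinary table will accept text in an INTEGER column. Other engines, Access included, may reject the same insert, and SQLite 3.37 and later can enforce types with STRICT tables. The `orders` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, 5),
        (2, "12"),      # numeric-looking text is converted to an integer
        (3, "twelve"),  # non-numeric text is stored as text, with no error
        (4, None),
    ],
)

# typeof() reports how each value was actually stored.
storage = conn.execute(
    "SELECT order_id, quantity, typeof(quantity) "
    "FROM orders ORDER BY order_id"
).fetchall()

# A one-line profile of the column by stored type.
type_counts = conn.execute(
    "SELECT typeof(quantity), COUNT(*) FROM orders "
    "GROUP BY typeof(quantity) ORDER BY 1"
).fetchall()

print(storage)
print(type_counts)
```

Grouping by `typeof()` is itself a small profiling query: it shows at once that a column declared INTEGER holds one text value and one null.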

