dbTalk Databases Forums  

Data Transformation

comp.databases.ms-access comp.databases.ms-access


Discuss Data Transformation in the comp.databases.ms-access forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Gilgamesh
 
Posts: n/a

Default Data Transformation - 03-24-2011 , 11:40 AM






All,
Periodically, I have to import some large excel spreadsheets into an
Access Db (I'm using Access 2007).
The excel spreadsheets are of this form:

ID, Q1, Q2, Q3, Q4, ...

I need to import them into a database table tblAwards that has these
two columns:

ID, Award

ID is a unique person identifier. Q1, Q2, Q3, etc are text strings
that are answers to a survey. Each answer Q1, Q2, etc from the
spreadsheet should generate a row in the table. For example, the
spreadsheet row

ID, Q1, Q2, Q3, ...

1001, "Men's Downhill Gold Medal", "Nobel Prize Chemistry", "Dean's
List", etc

should appear as multiple rows in tblAwards, like this:

ID, Award
1001, "Men's Downhill Gold Medal"
1001, "Nobel Prize Chemistry"
1001, "Dean's List"
1001, etc

The problem is, these spreadsheets have 200 or so columns each, with
three hundred or so rows. I can create a link table that links to the
spreadsheet, then write a line of SQL for each of the 200 hundred
columns that inserts (appends) to tblAwards;but that will take
forever. I was wondering if there's a smart way to do it.

Manipulating the spreadsheet into that form before bringing it into
the db is also quite laborious, unless there's a quick way to do that
that I don't know of.

Any help gratefully appreciated.

Mike

Reply With Quote
  #2  
Old   
Bob Barrows
 
Posts: n/a

Default Re: Data Transformation - 03-24-2011 , 12:41 PM






Gilgamesh wrote:
Quote:
All,
Periodically, I have to import some large excel spreadsheets into an
Access Db (I'm using Access 2007).
The excel spreadsheets are of this form:

ID, Q1, Q2, Q3, Q4, ...

I need to import them into a database table tblAwards that has these
two columns:

ID, Award
Wouldn't you want a column to identify the question number as well?

Quote:
ID is a unique person identifier. Q1, Q2, Q3, etc are text strings
that are answers to a survey. Each answer Q1, Q2, etc from the
spreadsheet should generate a row in the table. For example, the
spreadsheet row

ID, Q1, Q2, Q3, ...

1001, "Men's Downhill Gold Medal", "Nobel Prize Chemistry", "Dean's
List", etc

should appear as multiple rows in tblAwards, like this:

ID, Award
1001, "Men's Downhill Gold Medal"
1001, "Nobel Prize Chemistry"
1001, "Dean's List"
1001, etc

The problem is, these spreadsheets have 200 or so columns each, with
three hundred or so rows. I can create a link table that links to the
spreadsheet, then write a line of SQL for each of the 200 hundred
columns that inserts (appends) to tblAwards;but that will take
forever. I was wondering if there's a smart way to do it.

Nope

I would probably write a VBA sub to open a recordset on the linked
spreadsheet and loop through its Fields collection and insert the data into
the destination table. Something like this (untested air code - assumes a
column to identify the question):

Sub TransformSpreadsheet ()
dim rsAwards as dao.recordset,rsSprd as dao.recordset
dim db as database
dim lngID as long, qNum as Integer
dim sqlAwards as string
set db=currentdb

set rsSprd=db.openrecordset("select * from linkedspreadsheed")
if not rsSprd.eof then
sqlAwards="SELECT ID,QNum,Award " & _
" FROM tblAwards WHERE 1=2"
set rsAwards=db.openrecordset(sqlAwards)
Do Until rsSprd .eof
lngID = rsSprd(0)
For qNum = 1 to rsSprd .Fields.Count-1
With rsAwards
.AddNew
.Fields(0)=lngID
.Fields(1)=qNum
.Fields(qNum).value=rsSprd.Fields(qNum).value
.Update
End With
Next
rs.movenext
loop
end if
rsSprd.close

End Sub

Reply With Quote
  #3  
Old   
The Frog
 
Posts: n/a

Default Re: Data Transformation - 03-25-2011 , 07:58 AM



Hi Mike,

The problem you are looking at is a classic ETL problem and is best
handled (IMO) with a tool suited to the task. I use Pentaho Data
Integration for this exact problem as it is both free and hass the
native capability to do exactly what you need. It can read a
spreadsheet, transform the data to a 'normalised' form, and then pump
it into a table of a db of your choosing. You could probably write
some VBA and maybe some clever SQL to achieve this same result, but I
am not ssure it would be worth the effort. You can find info at
www.pentaho.com , the tool with either be under Pentaho Data
Integration or simply PDI. It has been a while since I visited the
website, but its always been pretty easy to navigate and the tool as
well as the online forum / community are pretty easy to use and quite
helpful.

Cheers

The Frog

Reply With Quote
  #4  
Old   
Gilgamesh
 
Posts: n/a

Default Re: Data Transformation - 03-26-2011 , 09:25 AM



On Mar 24, 11:41*am, "Bob Barrows" <reb01... (AT) NOSPAMyahoo (DOT) com> wrote:
Quote:
Gilgamesh wrote:
All,
Periodically, I have to import some large excel spreadsheets into an
Access Db (I'm using Access 2007).
The excel spreadsheets are of this form:

ID, Q1, Q2, Q3, Q4, ...

I need to import them into a database table tblAwards that has these
two columns:

ID, Award

Wouldn't you want a column to identify the question number as well?







ID is a unique person identifier. *Q1, Q2, Q3, etc are text strings
that are answers to a survey. *Each answer Q1, Q2, etc from the
spreadsheet should generate a row in the table. *For example, the
spreadsheet row

ID, Q1, Q2, Q3, ...

1001, "Men's Downhill Gold Medal", "Nobel Prize Chemistry", "Dean's
List", etc

should appear as multiple rows in tblAwards, like this:

ID, Award
1001, "Men's Downhill Gold Medal"
1001, "Nobel Prize Chemistry"
1001, "Dean's List"
1001, etc

The problem is, these spreadsheets have 200 or so columns each, with
three hundred or so rows. *I can create a link table that links to the
spreadsheet, then write a line of SQL for each of the 200 hundred
columns that inserts (appends) to tblAwards;but that will take
forever. *I was wondering if there's a smart way to do it.

Nope

I would probably write a VBA sub to open a recordset on the linked
spreadsheet and loop through its Fields collection and insert the data into
the destination table. Something like this (untested air code - assumes a
column to identify the question):

Sub TransformSpreadsheet ()
dim rsAwards as dao.recordset,rsSprd as dao.recordset
dim db as database
dim lngID as long, qNum as Integer
dim sqlAwards as string
set db=currentdb

set rsSprd=db.openrecordset("select * from linkedspreadsheed")
if not rsSprd.eof then
* * sqlAwards="SELECT ID,QNum,Award " & _
* * " FROM tblAwards WHERE 1=2"
* * set rsAwards=db.openrecordset(sqlAwards)
* * Do Until rsSprd .eof
* * * * lngID = rsSprd(0)
* * * * For qNum = 1 to rsSprd .Fields.Count-1
* * * * * * With rsAwards
* * * * * * * * .AddNew
* * * * * * * * .Fields(0)=lngID
* * * * * * * * .Fields(1)=qNum
* * * * * * * * .Fields(qNum).value=rsSprd.Fields(qNum)..value
* * * * * * * * .Update
* * * * * * End With
* * * * Next
* * * * rs.movenext
* * loop
end if
rsSprd.close

End Sub
Thanks Bob.

Reply With Quote
  #5  
Old   
Gilgamesh
 
Posts: n/a

Default Re: Data Transformation - 03-26-2011 , 09:29 AM



On Mar 25, 6:58*am, The Frog <mr.frog.to.... (AT) googlemail (DOT) com> wrote:
Quote:
Hi Mike,

The problem you are looking at is a classic ETL problem and is best
handled (IMO) with a tool suited to the task. I use Pentaho Data
Integration for this exact problem as it is both free and hass the
native capability to do exactly what you need. It can read a
spreadsheet, transform the data to a 'normalised' form, and then pump
it into a table of a db of your choosing. You could probably write
some VBA and maybe some clever SQL to achieve this same result, but I
am not ssure it would be worth the effort. You can find info atwww.pentaho.com, the tool with either be under Pentaho Data
Integration or simply PDI. It has been a while since I visited the
website, but its always been pretty easy to navigate and the tool as
well as the online forum / community are pretty easy to use and quite
helpful.

Cheers

The Frog
Thanks Frog. I'll definitely check this out. It's not that I can't
think of dozens of ways to do it, it's just that all involve writing
and properly testing and debugging many lines of code. Of course, if
I have to write lots of code I will. The tool you indicate might make
things easier.
Thanks again.

Mike

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.