dbTalk Databases Forums  

Re: SSIS: Updating a DataWarhouse with Modified Rows

microsoft.public.sqlserver.dts microsoft.public.sqlserver.dts


Discuss Re: SSIS: Updating a DataWarhouse with Modified Rows in the microsoft.public.sqlserver.dts forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Allan Mitchell
 
Posts: n/a

Default Re: SSIS: Updating a DataWarhouse with Modified Rows - 12-08-2006 , 05:26 PM






Hello Todd,

Modifying fact table rows can become a religious debate. Most shops I have
worked in simply enter a delta row to cancel out the "Error"

Can you identify changed rows in the source so that you know their fact table
row will be out of date? If you can and you must update the fact table then
yes the first approach is the better of the two.


Regards

Allan Mitchell
Konesans Ltd
T +44 7966 476 572
F +44 2071 008 479
http://www.konesans.com

Quote:
Looking for opinions on this:

In designing a DataWarehouse in SQL 2005, I am developing the SSIS
Package that will MODIFY changed rows in a FACT table. As far as I can
see, there are two general options:

Option 1 is to EXTRACT all modified rows (since the last extract) and
dump them into a staging table in the Warehouse. Then use a stored
procedure to update the FACT Table with data from the staging table
based on a join between the two (joining KEY fields, of course).

Option 2 is to use an OLE DB Command in the SSIS data flow to execute
a stored procedure for each row in the stream.

Is there an Option 3 involving a Script Transform? Not sure how that
would work.

My guess is that the set based operation of the first option would be
faster than a row based operation of any others.

Any opinions out there? Is there some 'best practice' when updating
rows in the Destination in SSIS (updating, not adding new rows)

Thanks.




Reply With Quote
  #2  
Old   
Allan Mitchell
 
Posts: n/a

Default Re: SSIS: Updating a DataWarhouse with Modified Rows - 12-11-2006 , 02:59 PM






Hello Todd,

"so what I say goes, even if it's wrong!" - :-)


The way i see it is, it is a toss up between the OLE DB cmd transform to
do the updates and the staged rows and set based update post data flow task.
For me the answer is in how many updates you are likely to do. If it is
10 or so then the perf hit of the OLE DB transform will be negligible. If
it is 100K then up are going to know about it and should choose the staging
of the rows with SET based update. That said I would probably stage in any
event and do the SET based update. Cleaner and should you get wild fluctuations
in counts to be updated then you are prepared.


How many rows do you have as an input dataset and a lookup dataset. This
needs thinnking abount as well because there are things we can do here.




Regards

Allan Mitchell
Konesans Ltd
T +44 7966 476 572
F +44 2071 008 479
http://www.konesans.com

Quote:
Allan,

This particular Fact table is not a "pure" transactional system. It is
for a Help-Desk ticketing system and has properties such as "Status"
and "Assignee" which change. Updating with a staging table and one sp
will not pose any problems.

I do have other Fact tables that are strictly appends (no
modifications) so they don't have this dilema.

As far as a religious debate, I am the only one designing the DW, so
what I say goes, even if it's wrong!

Thanks for your input. Always appreciated.

"Allan Mitchell" wrote:

Hello Todd,

Modifying fact table rows can become a religious debate. Most shops
I have worked in simply enter a delta row to cancel out the "Error"

Can you identify changed rows in the source so that you know their
fact table row will be out of date? If you can and you must update
the fact table then yes the first approach is the better of the two.

Regards

Allan Mitchell
Konesans Ltd
T +44 7966 476 572
F +44 2071 008 479
http://www.konesans.com
Looking for opinions on this:

In designing a DataWarehouse in SQL 2005, I am developing the SSIS
Package that will MODIFY changed rows in a FACT table. As far as I
can see, there are two general options:

Option 1 is to EXTRACT all modified rows (since the last extract)
and dump them into a staging table in the Warehouse. Then use a
stored procedure to update the FACT Table with data from the staging
table based on a join between the two (joining KEY fields, of
course).

Option 2 is to use an OLE DB Command in the SSIS data flow to
execute a stored procedure for each row in the stream.

Is there an Option 3 involving a Script Transform? Not sure how that
would work.

My guess is that the set based operation of the first option would
be faster than a row based operation of any others.

Any opinions out there? Is there some 'best practice' when updating
rows in the Destination in SSIS (updating, not adding new rows)

Thanks.




Reply With Quote
  #3  
Old   
Allan Mitchell
 
Posts: n/a

Default Re: SSIS: Updating a DataWarhouse with Modified Rows - 12-11-2006 , 05:05 PM



Hello Todd,

You wouldn't. That would be even less performant than the OLE DB transform
as you would have to call out to an external library to do it.

I would stay with the staging route you are investigating.

Regards

Allan Mitchell
Konesans Ltd
T +44 7966 476 572
F +44 2071 008 479
http://www.konesans.com

Quote:
I'm still new at designing Fact table loads for a DW so I started with
the smallest I could find in my environment: 1000 rows existing and 10
rows updated (OK, you can stop laughing now!)

The next Fact table will be a lot bigger and I just wanted to get an
idea / set some guidelines for future design work.

I went with the staging table and SP and it works fine.

Although, always eager to learn something new, how would I set up a
Script Transform to modify rows?

Thanks.

"Allan Mitchell" wrote:

Hello Todd,

"so what I say goes, even if it's wrong!" - :-)

The way i see it is, it is a toss up between the OLE DB cmd transform
to do the updates and the staged rows and set based update post data
flow task. For me the answer is in how many updates you are likely to
do. If it is 10 or so then the perf hit of the OLE DB transform will
be negligible. If it is 100K then up are going to know about it and
should choose the staging of the rows with SET based update. That
said I would probably stage in any event and do the SET based update.
Cleaner and should you get wild fluctuations in counts to be updated
then you are prepared.

How many rows do you have as an input dataset and a lookup dataset.
This needs thinnking abount as well because there are things we can
do here.

Regards

Allan Mitchell
Konesans Ltd
T +44 7966 476 572
F +44 2071 008 479
http://www.konesans.com



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.