dbTalk Databases Forums  

Reducing the execution time of my DTS (Bring in only new records)?

microsoft.public.sqlserver.olap microsoft.public.sqlserver.olap


Discuss Reducing the execution time of my DTS (Bring in only new records)? in the microsoft.public.sqlserver.olap forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Learner
 
Posts: n/a

Default Reducing the execution time of my DTS (Bring in only new records)? - 05-09-2004 , 02:01 AM






Hi,

Via a DTS, I am bringing in data from my OLTP system into my Staging
area.

The problem is that since I am not aware of any alternative solution
therefore I am currently bringing ALL the to-date records from my SALES
table (from my OLTP system) into one of the tables in the Staging area
(let's call it table stg_Sales). There are over 300,000 that are brought
in every time I run this DTS. And the result is that it takes a lot of
time... Currently my DTS is doing the following 3 things:

1) Drops the table stg_Sales
2) Creates the table stg_Sales
2) Inserts ALL the record from SALES into stg_Sales

I'm looking for a better solution... Is there some solution where only
those records (from the SALES table) are APPENDED to my stg_Sales table
which were entered AFTER the DTS last ran? Please assume both
possibilities while you assist/reply:

A) The SaleDate field in Sales might only be storing the DATE (not the
time part)

B) The SaleDate field in Sales may have complete Date information (i.e.
Date as well as time)

Or if there is an even better solution,please recommend that too.

Many TIA.


--
Thanks.

Reply With Quote
  #2  
Old   
Michael Vardinghus
 
Posts: n/a

Default Re: Reducing the execution time of my DTS (Bring in only new records)? - 05-09-2004 , 03:15 AM






I actually have a similar post....however - i want to avoid fething only new
records for the fact table - if it isn't bad olap practice.

I don't have a performance issue - 300.000 records as you mention below
doesn't sound as much...but I don't know your
hardware specifications. But then again I don't know your speed requirements
... 10 minutes to slow ?

I can accept half an hour to an hour because I expect this process to be
done during night.

The dimensions in my environment aren't that complicated (not much member
properties from host) so that is pretty straight forward
solved by using a select ... where not exists (select...) to only add new
dimension values.

The problem lies in the fact table records - if you here only want to add
new I see two ways to solve it

a) getting the host to transfer only new records (or you see to that you
only receive new records..)...in some oltp - systems there is a timestamp
that tells you that this record is made today ...
b) in your staging environment you see to that only new records are being
added....think this can be done by using some kind of checksum column .. not
totally sure however...

a + b ads some complexity that I wan't to avoid if possible - but if I in my
post gets a reply that says that OLAP can't handle dropping of the fact
table without making a full process every time - I will be looking at
something similar.

If I were to compare a + b I would think that b would be a little bit safer
because it will always transfer records you don't have - however this can be
demanding in performance as well. a is perhaps the quicker but there lies a
problem with the days where something goes wrong...then you should be able
to run the dts again

I actually think that the accelerator for bi is based on b - but I'm not
totally sure.....

http://www.microsoft.com/sql/ssabi/howtobuy/default.asp

....choose register to download at the bottom..

By the way there actually is a specific DTS newsgroup where you (as in this
one..) get great help.

\Michael Vardinghus

"Learner" <wantnospam (AT) email (DOT) com> skrev i en meddelelse
news:MPG.1b082dd591111120989721 (AT) msnews (DOT) microsoft.com...
Quote:
Hi,

Via a DTS, I am bringing in data from my OLTP system into my Staging
area.

The problem is that since I am not aware of any alternative solution
therefore I am currently bringing ALL the to-date records from my SALES
table (from my OLTP system) into one of the tables in the Staging area
(let's call it table stg_Sales). There are over 300,000 that are brought
in every time I run this DTS. And the result is that it takes a lot of
time... Currently my DTS is doing the following 3 things:

1) Drops the table stg_Sales
2) Creates the table stg_Sales
2) Inserts ALL the record from SALES into stg_Sales

I'm looking for a better solution... Is there some solution where only
those records (from the SALES table) are APPENDED to my stg_Sales table
which were entered AFTER the DTS last ran? Please assume both
possibilities while you assist/reply:

A) The SaleDate field in Sales might only be storing the DATE (not the
time part)

B) The SaleDate field in Sales may have complete Date information (i.e.
Date as well as time)

Or if there is an even better solution,please recommend that too.

Many TIA.


--
Thanks.



Reply With Quote
  #3  
Old   
Michael Vardinghus
 
Posts: n/a

Default Re: Reducing the execution time of my DTS (Bring in only new records)? - 05-09-2004 , 11:05 AM



Think I was wrong about accelerator - there is nothing in these dts packages
to prevent the same fact record from being imported twice - so they must
expect the host to do the filtrering when outputting - or you to alter the
staging import dts scripts so you only receive new.

Not totally sure....however....lots of parameters to mess with...


"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> skrev i en meddelelse
news:uLOt60ZNEHA.3016 (AT) tk2msftngp13 (DOT) phx.gbl...
Quote:
I actually have a similar post....however - i want to avoid fething only
new
records for the fact table - if it isn't bad olap practice.

I don't have a performance issue - 300.000 records as you mention below
doesn't sound as much...but I don't know your
hardware specifications. But then again I don't know your speed
requirements
.. 10 minutes to slow ?

I can accept half an hour to an hour because I expect this process to be
done during night.

The dimensions in my environment aren't that complicated (not much member
properties from host) so that is pretty straight forward
solved by using a select ... where not exists (select...) to only add new
dimension values.

The problem lies in the fact table records - if you here only want to add
new I see two ways to solve it

a) getting the host to transfer only new records (or you see to that you
only receive new records..)...in some oltp - systems there is a timestamp
that tells you that this record is made today ...
b) in your staging environment you see to that only new records are being
added....think this can be done by using some kind of checksum column ..
not
totally sure however...

a + b ads some complexity that I wan't to avoid if possible - but if I in
my
post gets a reply that says that OLAP can't handle dropping of the fact
table without making a full process every time - I will be looking at
something similar.

If I were to compare a + b I would think that b would be a little bit
safer
because it will always transfer records you don't have - however this can
be
demanding in performance as well. a is perhaps the quicker but there lies
a
problem with the days where something goes wrong...then you should be able
to run the dts again

I actually think that the accelerator for bi is based on b - but I'm not
totally sure.....

http://www.microsoft.com/sql/ssabi/howtobuy/default.asp

...choose register to download at the bottom..

By the way there actually is a specific DTS newsgroup where you (as in
this
one..) get great help.

\Michael Vardinghus

"Learner" <wantnospam (AT) email (DOT) com> skrev i en meddelelse
news:MPG.1b082dd591111120989721 (AT) msnews (DOT) microsoft.com...
Hi,

Via a DTS, I am bringing in data from my OLTP system into my Staging
area.

The problem is that since I am not aware of any alternative solution
therefore I am currently bringing ALL the to-date records from my SALES
table (from my OLTP system) into one of the tables in the Staging area
(let's call it table stg_Sales). There are over 300,000 that are brought
in every time I run this DTS. And the result is that it takes a lot of
time... Currently my DTS is doing the following 3 things:

1) Drops the table stg_Sales
2) Creates the table stg_Sales
2) Inserts ALL the record from SALES into stg_Sales

I'm looking for a better solution... Is there some solution where only
those records (from the SALES table) are APPENDED to my stg_Sales table
which were entered AFTER the DTS last ran? Please assume both
possibilities while you assist/reply:

A) The SaleDate field in Sales might only be storing the DATE (not the
time part)

B) The SaleDate field in Sales may have complete Date information (i.e.
Date as well as time)

Or if there is an even better solution,please recommend that too.

Many TIA.


--
Thanks.





Reply With Quote
  #4  
Old   
Learner
 
Posts: n/a

Default Re: Reducing the execution time of my DTS (Bring in only new records)? - 05-09-2004 , 11:32 AM



Hello Michael,

Well putting a check on the ETL side (I.e. "throw out" only the "newer"
records) is a possibility but that will make me run into another problem
I.e. How can I pass a parameter to a view (which BTW is the basis from
where/how I get data from from my ETL) so that the job is done
automatically.

For example, suppose on 09-May-2004, I have all the to-date SalesOrders
into my staging area (or for that matter into my data mart).

Now we are assuming that I populate my staging area table only once a
day. Okay then suppose on any date AFTER 09-May-2004, I want to be able
to somehow automatically get only those sales orders that were processed
after a 'certain date' and I do not wish to 'hard code' this 'certain
date' but rather want it to be 'picked' automatically...

Honestly, I'm not sure if I've been able to make any sense or or you can
get which I am trying to say???

This one probably is not as easy as I thought?

Regards.


Reply With Quote
  #5  
Old   
Michael Vardinghus
 
Posts: n/a

Default Re: Reducing the execution time of my DTS (Bring in only new records)? - 05-09-2004 , 03:00 PM



It makes sense - but it isn't actually that difficult .. in your sql view
where you filter the data you need you put a where clause:

where salesorderdate = getdate()

This will see to it that you only get salesorders from today (subtract one
if you want them from yesterday...).

As I said before you must consider how to handle when this doesn't work...if
you use the sentence above you will get
records from today .. but what if the package fails and you don't see it
until the next day.

Then you will perhaps need a way to execute the package with a day
parameter.

The last one I haven't figured out yet ... have some packages with
parameters but haven't gotten date parameters to
work.

Another question is the nature of your salesorders...you mention a date
field - is that the day the Sales Order
are registered ? Do you pull out information that can be due to change after
this date ? You must be sure to have defined
when to pull out the record from the host...perhaps the date is too soon -
perhaps you need to combine fields like the
status of the sales order plus a date and so forth...

\Michael Vardinghus


"Learner" <wantnospam (AT) email (DOT) com> skrev i en meddelelse
news:MPG.1b08b3e3b1f36db89896f1 (AT) msnews (DOT) microsoft.com...
Quote:
Hello Michael,

Well putting a check on the ETL side (I.e. "throw out" only the "newer"
records) is a possibility but that will make me run into another problem
I.e. How can I pass a parameter to a view (which BTW is the basis from
where/how I get data from from my ETL) so that the job is done
automatically.

For example, suppose on 09-May-2004, I have all the to-date SalesOrders
into my staging area (or for that matter into my data mart).

Now we are assuming that I populate my staging area table only once a
day. Okay then suppose on any date AFTER 09-May-2004, I want to be able
to somehow automatically get only those sales orders that were processed
after a 'certain date' and I do not wish to 'hard code' this 'certain
date' but rather want it to be 'picked' automatically...

Honestly, I'm not sure if I've been able to make any sense or or you can
get which I am trying to say???

This one probably is not as easy as I thought?

Regards.




Reply With Quote
  #6  
Old   
Michael Vardinghus
 
Posts: n/a

Default Re: Reducing the execution time of my DTS (Bring in only new records)? - 05-09-2004 , 03:09 PM



....come to think ... in Accelerator they actually use stored procedures a
lot...seems as if it is possible to throw
a parameter to a stored procedure...i'm actually doing that in one of my
packages - didn't get the exact date
to work though .. but for me a year function around a date was sufficient.

In accelerator it's a long way before these variables are used in the
dts-package (from sql-table to activex-script in the package...)

\Michael Vardinghus

"Michael Vardinghus" <michaelvardinghus (AT) hotmail (DOT) com> skrev i en meddelelse
news:uC8Qm%23fNEHA.2704 (AT) TK2MSFTNGP10 (DOT) phx.gbl...
Quote:
It makes sense - but it isn't actually that difficult .. in your sql view
where you filter the data you need you put a where clause:

where salesorderdate = getdate()

This will see to it that you only get salesorders from today (subtract one
if you want them from yesterday...).

As I said before you must consider how to handle when this doesn't
work...if
you use the sentence above you will get
records from today .. but what if the package fails and you don't see it
until the next day.

Then you will perhaps need a way to execute the package with a day
parameter.

The last one I haven't figured out yet ... have some packages with
parameters but haven't gotten date parameters to
work.

Another question is the nature of your salesorders...you mention a date
field - is that the day the Sales Order
are registered ? Do you pull out information that can be due to change
after
this date ? You must be sure to have defined
when to pull out the record from the host...perhaps the date is too soon -
perhaps you need to combine fields like the
status of the sales order plus a date and so forth...

\Michael Vardinghus


"Learner" <wantnospam (AT) email (DOT) com> skrev i en meddelelse
news:MPG.1b08b3e3b1f36db89896f1 (AT) msnews (DOT) microsoft.com...
Hello Michael,

Well putting a check on the ETL side (I.e. "throw out" only the "newer"
records) is a possibility but that will make me run into another problem
I.e. How can I pass a parameter to a view (which BTW is the basis from
where/how I get data from from my ETL) so that the job is done
automatically.

For example, suppose on 09-May-2004, I have all the to-date SalesOrders
into my staging area (or for that matter into my data mart).

Now we are assuming that I populate my staging area table only once a
day. Okay then suppose on any date AFTER 09-May-2004, I want to be able
to somehow automatically get only those sales orders that were processed
after a 'certain date' and I do not wish to 'hard code' this 'certain
date' but rather want it to be 'picked' automatically...

Honestly, I'm not sure if I've been able to make any sense or or you can
get which I am trying to say???

This one probably is not as easy as I thought?

Regards.






Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.