![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Hi, Via a DTS, I am bringing in data from my OLTP system into my Staging area. The problem is that since I am not aware of any alternative solution therefore I am currently bringing ALL the to-date records from my SALES table (from my OLTP system) into one of the tables in the Staging area (let's call it table stg_Sales). There are over 300,000 that are brought in every time I run this DTS. And the result is that it takes a lot of time... Currently my DTS is doing the following 3 things: 1) Drops the table stg_Sales 2) Creates the table stg_Sales 2) Inserts ALL the record from SALES into stg_Sales I'm looking for a better solution... Is there some solution where only those records (from the SALES table) are APPENDED to my stg_Sales table which were entered AFTER the DTS last ran? Please assume both possibilities while you assist/reply: A) The SaleDate field in Sales might only be storing the DATE (not the time part) B) The SaleDate field in Sales may have complete Date information (i.e. Date as well as time) Or if there is an even better solution,please recommend that too. Many TIA. -- Thanks. |
#3
| |||
| |||
|
|
I actually have a similar post....however - i want to avoid fething only new records for the fact table - if it isn't bad olap practice. I don't have a performance issue - 300.000 records as you mention below doesn't sound as much...but I don't know your hardware specifications. But then again I don't know your speed requirements .. 10 minutes to slow ? I can accept half an hour to an hour because I expect this process to be done during night. The dimensions in my environment aren't that complicated (not much member properties from host) so that is pretty straight forward solved by using a select ... where not exists (select...) to only add new dimension values. The problem lies in the fact table records - if you here only want to add new I see two ways to solve it a) getting the host to transfer only new records (or you see to that you only receive new records..)...in some oltp - systems there is a timestamp that tells you that this record is made today ... b) in your staging environment you see to that only new records are being added....think this can be done by using some kind of checksum column .. not totally sure however... a + b ads some complexity that I wan't to avoid if possible - but if I in my post gets a reply that says that OLAP can't handle dropping of the fact table without making a full process every time - I will be looking at something similar. If I were to compare a + b I would think that b would be a little bit safer because it will always transfer records you don't have - however this can be demanding in performance as well. a is perhaps the quicker but there lies a problem with the days where something goes wrong...then you should be able to run the dts again I actually think that the accelerator for bi is based on b - but I'm not totally sure..... http://www.microsoft.com/sql/ssabi/howtobuy/default.asp ...choose register to download at the bottom.. By the way there actually is a specific DTS newsgroup where you (as in this one..) get great help. \Michael Vardinghus "Learner" <wantnospam (AT) email (DOT) com> skrev i en meddelelse news:MPG.1b082dd591111120989721 (AT) msnews (DOT) microsoft.com... Hi, Via a DTS, I am bringing in data from my OLTP system into my Staging area. The problem is that since I am not aware of any alternative solution therefore I am currently bringing ALL the to-date records from my SALES table (from my OLTP system) into one of the tables in the Staging area (let's call it table stg_Sales). There are over 300,000 that are brought in every time I run this DTS. And the result is that it takes a lot of time... Currently my DTS is doing the following 3 things: 1) Drops the table stg_Sales 2) Creates the table stg_Sales 2) Inserts ALL the record from SALES into stg_Sales I'm looking for a better solution... Is there some solution where only those records (from the SALES table) are APPENDED to my stg_Sales table which were entered AFTER the DTS last ran? Please assume both possibilities while you assist/reply: A) The SaleDate field in Sales might only be storing the DATE (not the time part) B) The SaleDate field in Sales may have complete Date information (i.e. Date as well as time) Or if there is an even better solution,please recommend that too. Many TIA. -- Thanks. |
#4
| |||
| |||
|
#5
| |||
| |||
|
|
Hello Michael, Well putting a check on the ETL side (I.e. "throw out" only the "newer" records) is a possibility but that will make me run into another problem I.e. How can I pass a parameter to a view (which BTW is the basis from where/how I get data from from my ETL) so that the job is done automatically. For example, suppose on 09-May-2004, I have all the to-date SalesOrders into my staging area (or for that matter into my data mart). Now we are assuming that I populate my staging area table only once a day. Okay then suppose on any date AFTER 09-May-2004, I want to be able to somehow automatically get only those sales orders that were processed after a 'certain date' and I do not wish to 'hard code' this 'certain date' but rather want it to be 'picked' automatically... Honestly, I'm not sure if I've been able to make any sense or or you can get which I am trying to say??? This one probably is not as easy as I thought? Regards. |
#6
| |||
| |||
|
|
It makes sense - but it isn't actually that difficult .. in your sql view where you filter the data you need you put a where clause: where salesorderdate = getdate() This will see to it that you only get salesorders from today (subtract one if you want them from yesterday...). As I said before you must consider how to handle when this doesn't work...if you use the sentence above you will get records from today .. but what if the package fails and you don't see it until the next day. Then you will perhaps need a way to execute the package with a day parameter. The last one I haven't figured out yet ... have some packages with parameters but haven't gotten date parameters to work. Another question is the nature of your salesorders...you mention a date field - is that the day the Sales Order are registered ? Do you pull out information that can be due to change after this date ? You must be sure to have defined when to pull out the record from the host...perhaps the date is too soon - perhaps you need to combine fields like the status of the sales order plus a date and so forth... \Michael Vardinghus "Learner" <wantnospam (AT) email (DOT) com> skrev i en meddelelse news:MPG.1b08b3e3b1f36db89896f1 (AT) msnews (DOT) microsoft.com... Hello Michael, Well putting a check on the ETL side (I.e. "throw out" only the "newer" records) is a possibility but that will make me run into another problem I.e. How can I pass a parameter to a view (which BTW is the basis from where/how I get data from from my ETL) so that the job is done automatically. For example, suppose on 09-May-2004, I have all the to-date SalesOrders into my staging area (or for that matter into my data mart). Now we are assuming that I populate my staging area table only once a day. Okay then suppose on any date AFTER 09-May-2004, I want to be able to somehow automatically get only those sales orders that were processed after a 'certain date' and I do not wish to 'hard code' this 'certain date' but rather want it to be 'picked' automatically... Honestly, I'm not sure if I've been able to make any sense or or you can get which I am trying to say??? This one probably is not as easy as I thought? Regards. |
![]() |
| Thread Tools | |
| Display Modes | |
| |