JN,
Have you look at the execution profile for the views? Are their any table
scans (bad!!) or index scans versus clustered index seeks (good!!) or index
seeks? Is there a way you can load up profiler on the source
server/database and capture the execution plan, and perhaps post it here?
It would help to see how the server is attempting to put the view together.
There could be some basic things going wrong that are easy to fix they could
greatly increase your performance.
Raymond Lewallen
"JN" <anonymous (AT) discussions (DOT) microsoft.com> wrote
Quote:
Need your expert or experienced opinion and/or suggestions regarding
performance tuning an import package. What's the best sourcing option? CSV
|
or SQL Server views with INNER JOINs? If SQL Server views used with multiple
inner joins, what's the best way to structure or design to get the best
performance and minimize execution time other than indexing?
Quote:
We have an ETL process in a health care data environment. User extracts
data quarterly (monthly sometimes) from two main claim systems (I think it's
|
SAS or using SAS, at least). Scrubs, formats, and tags the data, as well as
breaking it down by BA rules and sends it for production reports in CSV
format. My job is to take the CSV and import them into the production SQL
Server 2000 database, while tagging the data by looking up values in
dimension tables and assigning primary keys to input files. This has been
working okay.
Quote:
Now we are changing the changing things around and sourcing the data from
SQL Server instead of SAS or whatever other legacy system we have been using
|
'till now. CSV files always came in static format and were imported directly
as they were. SQL Server sourcing is through SQL VIEWs, which use large
underlying tables and several INNER JOINs to dimension tables in the views
to extract the needing quarterly or monthly data still tagging and scrubbing
the data as needed. This part about creating views is fine. Now, when I go
to import these data sourcing from SQL VIEW to destination SQL table, I get
into problems. First, it takes about 2 hours just to see the transformation
tab of Transform Data Task Properties. Then if I need to change
transformation, I do that, which takes additional 2 hours, then I hit Okay
to save the changes made to transformation, which again takes about 2-3
hours if it doesn't freeze up, that is! Just to speed up the transformation
changes, I told the source DBA to change the view to include only TOP 1000
rows and interestingly, the whole process of making transformation changes
finished in seconds and I was able to even save it in seconds. But of
course, the view had to be changed back to its normal structure so that I
could proceed with full import. After all these, I started the import and my
import finished with the following stats:
Quote:
X File: (Rows: 287893) 3 hours, 10 minutes, 55 seconds to import from the
view.
Start: 3:45:35 PM End: 6:56:29 PM.
Y File: (Rows: 865834) 1 hours 5, minutes, 41 seconds to import from the
view.
Start: 6:56:29 PM End: 8:02:10.
Upon research I learned that the base source table (92768855 rows) for X
File is much larger than the base table (48782203 rows) for Y File, which is
|
why X File import took three times more to import even with less data as
compared to Y File import. X File has a field included in the view that is
not included in the Y File view so that, too, could add to longer import
time.
Quote:
This is just a beginning so now I am wondering if we should go back to
using CSV files as the format of the source input files?! We thought SQL
|
Server would make things straightforward, since the destination is SQL
Server, but it seems SQL Server performs better with CSV as the source file
format than SQL Server views. Of course, CSV doesn't use any joins where as
SQL Server views do so CSV will have exactly the number of rows being
imported where as views will also have exactly the rows being imported, but
views' underlying tables are much larger in size.
Quote:
So now my question is using CSV as the source format better than SQL
Server views? How can we performance tune these views to minimize import
|
time? Any suggestions or opinions? Is there a better sourcing solution
within SQL Server than using views?
Quote:
Thanks for any advise you could provide.
JN |