dbTalk Databases Forums  

proper way to dts 400 million record table?

microsoft.public.sqlserver.dts microsoft.public.sqlserver.dts


Discuss proper way to dts 400 million record table? in the microsoft.public.sqlserver.dts forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
JJ Wang
 
Posts: n/a

Default proper way to dts 400 million record table? - 12-20-2003 , 02:00 PM






Hi,

we have a table 'tableA' in 'databaseA' that has 400
million rows and 60 gig big, without any index on it.

We need to build a cluster index on tableA, and a non-
clustered index on 2 of its columns.

Hence we created tableB which is identical to tableA only
tableB will with index built on it. TableB will also be
in a different database databaseB on the same server,
because we also need to reorganize the filegroup for
databaseA.

after I done all this, I simply use DTS wizard created a
dts package, in the saved package, I modified the 'select'
query to have 'order by' clause on the clustered index
column, and then run the dts package.

this package have been running 40 hours now, it's writting
data to the new table because the file size of the new
table has been increasing. my estimation of this packege
will need another 10 hr to complete.

did I do something wrong here? is there a better way to
build indexes on such a big table? will building the
clustered indexes directly on the original table faster?
if so, it will create more tran-log, isn't it?

desperatly need your advice!! thanks a ton!!

JJ

Reply With Quote
  #2  
Old   
Hermit Dave
 
Posts: n/a

Default Re: proper way to dts 400 million record table? - 12-20-2003 , 03:50 PM






Well JJ,

You do have a lot of data in your table...
I think you have done the right thing cause it would have taken a long time
re-arranging the pages while creating the clustered index.
40 hours and 10 more to go would can probably be explained by a combination
of things
* you original table not having any indexes as such (that would make it very
long to sort the results and actually fetch the data considering it will
probably belong to a lot of data pages)
* if you have the new database on the same physical drive. (different
partition doesnt make a difference)

I guess DBAs might have a better answer
Hope this helps

HD

"JJ Wang" <anonymous (AT) discussions (DOT) microsoft.com> wrote

Quote:
Hi,

we have a table 'tableA' in 'databaseA' that has 400
million rows and 60 gig big, without any index on it.

We need to build a cluster index on tableA, and a non-
clustered index on 2 of its columns.

Hence we created tableB which is identical to tableA only
tableB will with index built on it. TableB will also be
in a different database databaseB on the same server,
because we also need to reorganize the filegroup for
databaseA.

after I done all this, I simply use DTS wizard created a
dts package, in the saved package, I modified the 'select'
query to have 'order by' clause on the clustered index
column, and then run the dts package.

this package have been running 40 hours now, it's writting
data to the new table because the file size of the new
table has been increasing. my estimation of this packege
will need another 10 hr to complete.

did I do something wrong here? is there a better way to
build indexes on such a big table? will building the
clustered indexes directly on the original table faster?
if so, it will create more tran-log, isn't it?

desperatly need your advice!! thanks a ton!!

JJ



Reply With Quote
  #3  
Old   
oj
 
Posts: n/a

Default Re: proper way to dts 400 million record table? - 12-20-2003 , 04:40 PM



JJ,

Here is a good guideline excerpt from BOL:

As a general guide, the following table shows suggested figures for the
amount of data to be added to a table for various types of indexes. If you
exceed these percentages, you may find it faster to drop and re-create the
indexes.

Indexes Amount of data added
Clustered index only 30%
Clustered and one nonclustered index 25%
Clustered and two nonclustered indexes 25%
Single nonclustered index only 100%
Two nonclustered indexes 60%



--
-oj
RAC v2.2 & QALite!
http://www.rac4sql.net



"JJ Wang" <anonymous (AT) discussions (DOT) microsoft.com> wrote

Quote:
Hi,

we have a table 'tableA' in 'databaseA' that has 400
million rows and 60 gig big, without any index on it.

We need to build a cluster index on tableA, and a non-
clustered index on 2 of its columns.

Hence we created tableB which is identical to tableA only
tableB will with index built on it. TableB will also be
in a different database databaseB on the same server,
because we also need to reorganize the filegroup for
databaseA.

after I done all this, I simply use DTS wizard created a
dts package, in the saved package, I modified the 'select'
query to have 'order by' clause on the clustered index
column, and then run the dts package.

this package have been running 40 hours now, it's writting
data to the new table because the file size of the new
table has been increasing. my estimation of this packege
will need another 10 hr to complete.

did I do something wrong here? is there a better way to
build indexes on such a big table? will building the
clustered indexes directly on the original table faster?
if so, it will create more tran-log, isn't it?

desperatly need your advice!! thanks a ton!!

JJ



Reply With Quote
  #4  
Old   
JJ Wang
 
Posts: n/a

Default Re: proper way to dts 400 million record table? - 12-20-2003 , 08:11 PM



thanks, oj. cool tips!

JJ
Quote:
-----Original Message-----
JJ,

Here is a good guideline excerpt from BOL:

As a general guide, the following table shows suggested
figures for the
amount of data to be added to a table for various types
of indexes. If you
exceed these percentages, you may find it faster to drop
and re-create the
indexes.

Indexes Amount of data added
Clustered index only 30%
Clustered and one nonclustered index 25%
Clustered and two nonclustered indexes 25%
Single nonclustered index only 100%
Two nonclustered indexes 60%



--
-oj
RAC v2.2 & QALite!
http://www.rac4sql.net



"JJ Wang" <anonymous (AT) discussions (DOT) microsoft.com> wrote in
message
news:02d801c3c733$fc929430$a101280a (AT) phx (DOT) gbl...
Hi,

we have a table 'tableA' in 'databaseA' that has 400
million rows and 60 gig big, without any index on it.

We need to build a cluster index on tableA, and a non-
clustered index on 2 of its columns.

Hence we created tableB which is identical to tableA
only
tableB will with index built on it. TableB will also be
in a different database databaseB on the same server,
because we also need to reorganize the filegroup for
databaseA.

after I done all this, I simply use DTS wizard created a
dts package, in the saved package, I modified
the 'select'
query to have 'order by' clause on the clustered index
column, and then run the dts package.

this package have been running 40 hours now, it's
writting
data to the new table because the file size of the new
table has been increasing. my estimation of this
packege
will need another 10 hr to complete.

did I do something wrong here? is there a better way to
build indexes on such a big table? will building the
clustered indexes directly on the original table faster?
if so, it will create more tran-log, isn't it?

desperatly need your advice!! thanks a ton!!

JJ


.


Reply With Quote
  #5  
Old   
JJ Wang
 
Posts: n/a

Default Re: proper way to dts 400 million record table? - 12-20-2003 , 08:13 PM



thanks, Hermit, you are right on the target!
you 'guessed' everything right! they are exactly what I
am doing.

thanks for the reassurance.

JJ
Quote:
-----Original Message-----
Well JJ,

You do have a lot of data in your table...
I think you have done the right thing cause it would have
taken a long time
re-arranging the pages while creating the clustered index.
40 hours and 10 more to go would can probably be
explained by a combination
of things
* you original table not having any indexes as such (that
would make it very
long to sort the results and actually fetch the data
considering it will
probably belong to a lot of data pages)
* if you have the new database on the same physical
drive. (different
partition doesnt make a difference)

I guess DBAs might have a better answer
Hope this helps

HD

"JJ Wang" <anonymous (AT) discussions (DOT) microsoft.com> wrote in
message
news:02d801c3c733$fc929430$a101280a (AT) phx (DOT) gbl...
Hi,

we have a table 'tableA' in 'databaseA' that has 400
million rows and 60 gig big, without any index on it.

We need to build a cluster index on tableA, and a non-
clustered index on 2 of its columns.

Hence we created tableB which is identical to tableA
only
tableB will with index built on it. TableB will also be
in a different database databaseB on the same server,
because we also need to reorganize the filegroup for
databaseA.

after I done all this, I simply use DTS wizard created a
dts package, in the saved package, I modified
the 'select'
query to have 'order by' clause on the clustered index
column, and then run the dts package.

this package have been running 40 hours now, it's
writting
data to the new table because the file size of the new
table has been increasing. my estimation of this
packege
will need another 10 hr to complete.

did I do something wrong here? is there a better way to
build indexes on such a big table? will building the
clustered indexes directly on the original table faster?
if so, it will create more tran-log, isn't it?

desperatly need your advice!! thanks a ton!!

JJ


.


Reply With Quote
  #6  
Old   
Ross McKay
 
Posts: n/a

Default Re: proper way to dts 400 million record table? - 12-20-2003 , 08:45 PM



On Sat, 20 Dec 2003 12:00:59 -0800, "JJ Wang" wrote:

Quote:
we have a table 'tableA' in 'databaseA' that has 400
million rows and 60 gig big, without any index on it.
Yikes, big. And no index you say? I'll throw in some comments, but bear
in mind that my biggest table so far was 100M rows and 20GB - but
similar issues, of course.

Quote:
We need to build a cluster index on tableA, and a non-
clustered index on 2 of its columns.

Hence we created tableB which is identical to tableA only
tableB will with index built on it. TableB will also be
in a different database databaseB on the same server,
because we also need to reorganize the filegroup for
databaseA.

after I done all this, I simply use DTS wizard created a
dts package, in the saved package, I modified the 'select'
query to have 'order by' clause on the clustered index
column, and then run the dts package.
In general, that's good practice, but I reckon most of your 40 hours was
probably spent on sorting tableA. However, if databaseB.tableB is
growing now, the sort on your source table has completed and data is now
transferring OK. It should build up pretty quickly from there.

Quote:
this package have been running 40 hours now, it's writting
data to the new table because the file size of the new
table has been increasing. my estimation of this packege
will need another 10 hr to complete.
Maybe not - the data load part might whizz along nicely now that the
sort is out of the way.

Quote:
did I do something wrong here? is there a better way to
build indexes on such a big table? will building the
clustered indexes directly on the original table faster?
if so, it will create more tran-log, isn't it?
If you have all of this on a machine with multiple processors and SQL
Server 2000 Enterprise, I reckon (read: guess) that building the
clustered index first, or even a non-clustered index, would have greatly
reduced your time. That sort of 400M rows would be a *big* hit, and I
don't know if it can take advantage of multiple threads (unless the
table is on a multiple-file filegroup).

Also, if these databases are on the same server, why use DTS? DTS means
passing data through many hands, and small overheads become grossly
inflated when you are dealing with many millions of rows. You could have
simply used:

Insert Into databaseB..tableB With(TABLOCKX) (...)
Select ... From DatabaseA..tableA With(TABLOCK)

w.r.t. transaction logs, sure you will get a transaction log hit, but I
think you can minimise that by setting your recovery model to Simple or
Bulk-Logged. Look up "Switching Recovery Models" in BOL. CAUTION: only
do this if you can guarantee that it will not harm your data recovery
position! (i.e. backups taken and verified, no OLTP users adding /
modifying rows, etc)

Another thing to consider is database growth. If your target database or
the log files don't already have enough space to accommodate this
transaction, then files will need to grow. The defaults will almost
certainly be too small; either set a big enough chunk to start with (my
preference), or set the factor of growth to something more sensible.
What sensible is depends a lot on your data, but I sometimes specify
growth for logs as 1GB increments, and for data as either (n)GB or 20%
increments where n is between 0.5 and 2. YMMV.

One thing I notice is that you don't mention anything about your SQL
Server environment: version, processors, RAM, disc subsystem, RAID level
for data and log files, etc. These can all have a major impact on the
whole deal also. Specifically, more processors is good when building an
index on SQL Server 200 Enterprise, more RAM is always good when dealing
with large quantities of data, log files should not generally be on
RAID5 drive arrays, etc.

And watch out for silly things like 3D Open-GL screensavers that chew
100% processor on non-AGP equipped computers! I'd hope that isn't true
in your case, but having seen it once, I have to mention it

I hope that by the time you read this, all is over and you don't need to
repeat the activity - but if not, I hope this helps somehow.

cheers,
Ross.
--
Ross McKay, WebAware Pty Ltd
"Words can only hurt if you try to read them. Don't play their game" - Zoolander


Reply With Quote
  #7  
Old   
James Hokes
 
Posts: n/a

Default Re: proper way to dts 400 million record table? - 12-20-2003 , 09:17 PM



Quote:
You could have
simply used:

Insert Into databaseB..tableB With(TABLOCKX) (...)
Select ... From DatabaseA..tableA With(TABLOCK)

I wouldn't do that with 40 million rows, much less 400 million.

This is exactly why they created BCP.

James Hokes





Reply With Quote
  #8  
Old   
Ross McKay
 
Posts: n/a

Default Re: proper way to dts 400 million record table? - 12-20-2003 , 09:58 PM



On Sat, 20 Dec 2003 22:17:27 -0500, "James Hokes" wrote:

Quote:
Insert Into databaseB..tableB With(TABLOCKX) (...)
Select ... From DatabaseA..tableA With(TABLOCK)

I wouldn't do that with 40 million rows, much less 400 million.

This is exactly why they created BCP.
Hmmm, good point - I was thinking that the recovery model would
alleviate the problem of transaction logs, but thinking now, that only
works for SELECT INTO ... and then you still have the clustered index to
create.

So, just BCP the source table out sorted, and BULK INSERT it into the
destination. DTS is kind of doing this on-the-fly, so maybe it is the
better solution.

Either way, it's that 'order by' on the source table that probably
contributed most to the 40 hours. An index would greatly improve the
performance.

Since it sounded like it was a once-off activity, it's all probably moot
now.

cheers,
Ross.
--
Ross McKay, WebAware Pty Ltd
"The lawn could stand another mowing; funny, I don't even care"
- Elvis Costello


Reply With Quote
  #9  
Old   
JJ Wang
 
Posts: n/a

Default Re: proper way to dts 400 million record table? - 12-21-2003 , 01:45 AM



thank you both, Ross and James, for being so kind taking
your time and helping me out!

you all exhibited great depth of sql knowlege. I sure
learned a lot from you.

in my situation, BCP is better or DTS packege is better in
terms of speed and tran-log growth management?

thank you so much!

JJ
Quote:
-----Original Message-----
On Sat, 20 Dec 2003 22:17:27 -0500, "James Hokes" wrote:

Insert Into databaseB..tableB With(TABLOCKX) (...)
Select ... From DatabaseA..tableA With(TABLOCK)

I wouldn't do that with 40 million rows, much less 400
million.

This is exactly why they created BCP.

Hmmm, good point - I was thinking that the recovery model
would
alleviate the problem of transaction logs, but thinking
now, that only
works for SELECT INTO ... and then you still have the
clustered index to
create.

So, just BCP the source table out sorted, and BULK INSERT
it into the
destination. DTS is kind of doing this on-the-fly, so
maybe it is the
better solution.

Either way, it's that 'order by' on the source table that
probably
contributed most to the 40 hours. An index would greatly
improve the
performance.

Since it sounded like it was a once-off activity, it's
all probably moot
now.

cheers,
Ross.
--
Ross McKay, WebAware Pty Ltd
"The lawn could stand another mowing; funny, I don't even
care"
- Elvis Costello
.


Reply With Quote
  #10  
Old   
Ross McKay
 
Posts: n/a

Default Re: proper way to dts 400 million record table? - 12-21-2003 , 02:47 AM



On Sat, 20 Dec 2003 23:45:48 -0800, "JJ Wang" wrote:

Quote:
in my situation, BCP is better or DTS packege is better in
terms of speed and tran-log growth management?
Both should be equivalent - the important thing with BCP and DTS (when
Fast Load option is selected) is that both use a bulk copy interface to
load data into SQL Server. BCP requires a file output followed by a file
input, whereas DTS works on the SQL cursor (i.e. no intermediate file).

However, I note in your other message that your transaction log is
growing fast. I don't have an answer to your actual question (I'm a
developer, not a DBA - but I suspect that you can't shrink the log
during a transaction; at least, doing so will have no effect). I doubt
that it is doing much in tempdb, BTW - so go ahead and shrink that if it
will help.

Generally, setting your recovery model to Simple or Bulk-Logged first is
the way to go when loading the amount of data you are handling. I'm
guessing that you currently have it set to Full.

I also notice that you have a non-clustered index on the destination
table. Drop it now, and add it back afterwards. It is slowing down your
load.

For some more information about using DTS, you should check out this
site:

http://www.sqldts.com/

Good luck, hope you are enjoying your weekend

cheers,
Ross.
--
Ross McKay, WebAware Pty Ltd
"The lawn could stand another mowing; funny, I don't even care"
- Elvis Costello


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.