dbTalk Databases Forums  

Transform/transfer 50Gb - how to do it fast?

comp.databases.ms-sqlserver comp.databases.ms-sqlserver


Discuss Transform/transfer 50Gb - how to do it fast? in the comp.databases.ms-sqlserver forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
B D Jensen
 
Posts: n/a

Default Transform/transfer 50Gb - how to do it fast? - 04-28-2007 , 12:54 AM






Hello!
I have an big table with 50Gb data, written som functions
that cleanup data and want to do something like this

insert into newtable
select id, func1(col1), func2(col2)
from oldtable;

I'm also plan to make newtable partioned (before insert).

But how could i get the insert as fast as possible?
Greetings
Bjorn D. Jensen


Reply With Quote
  #2  
Old   
Mork69
 
Posts: n/a

Default Re: Transform/transfer 50Gb - how to do it fast? - 04-28-2007 , 03:53 AM






Hi,

The quickest way to do this is to use INSERT INTO...SELECT FROM as it
is a non-logged operation

regards,

Malc

B D Jensen wrote:
Quote:
Hello!
I have an big table with 50Gb data, written som functions
that cleanup data and want to do something like this

insert into newtable
select id, func1(col1), func2(col2)
from oldtable;

I'm also plan to make newtable partioned (before insert).

But how could i get the insert as fast as possible?
Greetings
Bjorn D. Jensen


Reply With Quote
  #3  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: Transform/transfer 50Gb - how to do it fast? - 04-28-2007 , 04:04 AM



B D Jensen (bjorn.d.jensen (AT) gmail (DOT) com) writes:
Quote:
I have an big table with 50Gb data, written som functions
that cleanup data and want to do something like this

insert into newtable
select id, func1(col1), func2(col2)
from oldtable;

I'm also plan to make newtable partioned (before insert).

But how could i get the insert as fast as possible?
Exactly what is in those functions? Do they perform data access? Are
they written in T-SQL or in the CLR? I ask, because they could have a
great impact on performance.

Apart from that, there are a couple of possible strategies for this
situation. One is SELECT INTO, but since you plan to make the new
table partitioned, I don't think SELECT INTO is good for this. (SELECT
INTO creates a new table.)

Another is to use BCP to first unload the table to a file. You would
then use queryout or a view with your functions, so what get on file
is the cleaned-up version. Then you use BCP to load the data into the
new table. The key here is that there should be no indexes on the table
and it should be empty. In this case the bulk-load is minimally logged.
Of course, you also need to account for the time it takes to create
the indexes.

And the final option is to use a plain INSERT. But a single INSERT
statement will not be good for your transaction log. It's better to
batch and insert, say, 100000 rows at a time, preferrably with the database
set to simple recovery. You should batch on the clustered index of
the old table:

SELECT @start = 1
WHILE EXISTS (SELECT * FROM oldtable WHERE clustercol >= @start)
BEGIN
INSERT newtable (...)
SELECT ...
FROM oldtable
WHERE clustercol >= @start AND clustercol < @start - 100000
SELECT @start = @start + 100000
END

Here the actual increment would depend on the nature of your clustered
key. If it's a date, maybe taking one month at a time is a good idea.

If new the table will have the same clustered index as the old table,
have the clustered index in place when run the above, but wait with
adding non-clustered indexes until you have the data in place.

--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx


Reply With Quote
  #4  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: Transform/transfer 50Gb - how to do it fast? - 04-28-2007 , 10:55 AM



Mork69 (mleach (AT) bigfoot (DOT) com) writes:
Quote:
The quickest way to do this is to use INSERT INTO...SELECT FROM as it
is a non-logged operation
This is not correct. INSERT SELECT FROM is a fully-logged operation. You are
thinking of SELECT INTO which is a minimally logged operation. That is,
all that is logged are the extent allocations. There are no write operations
in SQL Server that are entirely non-logged.


--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx


Reply With Quote
  #5  
Old   
B D Jensen
 
Posts: n/a

Default Re: Transform/transfer 50Gb - how to do it fast? - 04-28-2007 , 11:01 PM



Quote:
Exactly what is in those functions? Do they perform data access? Are
they written in T-SQL or in the CLR? I ask, because they could have a
great impact on performance.

Hello Erland!
Functions are written in T-SQL (i also wrote them in CLR, but in this
case they were slower). The original columns have incorrect datatypes,
that uses too much storage,
so the functions check that values are in correct domain and if not
they return null
- what is a correct result, because the values then are physical
impossible.

I wondered why you only wrote that I can't use "select into" for
patitioned tables.
I also expected I must create table first (and then not being able to
use 'select into')
because of the new datatypes - but of course I could write:
select id, cast(func1(col1) as <datatype>) into newtbl from oldtbl

Doing this multiple times with the appropiate where-clause
followed by partion switches maybe will be the solution; I'll
investigate...

.... but if you have some comments let me hear ;-)


Greetings
Bjorn D. Jensen
bjorn.d.jensen (AT) gmail (DOT) com


P.S. I already know your website, thanks for all that good
information!



Reply With Quote
  #6  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: Transform/transfer 50Gb - how to do it fast? - 04-29-2007 , 03:52 AM



B D Jensen (bjorn.d.jensen (AT) gmail (DOT) com) writes:
Quote:
Functions are written in T-SQL (i also wrote them in CLR, but in this
case they were slower). The original columns have incorrect datatypes,
that uses too much storage, so the functions check that values are in
correct domain and if not they return null - what is a correct result,
because the values then are physical impossible.
I would recommend that you have the expressions inline, at least if
you desire to cut down execution time.

Quote:
I wondered why you only wrote that I can't use "select into" for
patitioned tables.
I assumed that it is not possible to create a partitioned tables from
existing ones. But I have not worked much with partitioned tables, so
I could be wrong.


--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx


Reply With Quote
  #7  
Old   
B D Jensen
 
Posts: n/a

Default Re: Transform/transfer 50Gb - how to do it fast? - 04-29-2007 , 07:20 AM



Hi again!
As I see the cast is not needed, because the functions return correct
datatype.
Is it that what you mean with "inline"??

I think there is another problem: the original table is not in the
right filegroup
and if I understand it right, It must be or must be to make switch
ultra fast;
I'll investigate...
Greetings
Bjorn


Reply With Quote
  #8  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: Transform/transfer 50Gb - how to do it fast? - 04-29-2007 , 10:43 AM



B D Jensen (bjorn.d.jensen (AT) gmail (DOT) com) writes:
Quote:
As I see the cast is not needed, because the functions return correct
datatype.
Is it that what you mean with "inline"??
I don't know what your functions do, but it seemed from your description
that I could expect something like:

CREATE FUNCTION makesmaller(@x bigint) RETURNS tinyint AS
BEGIN
RETURN (CASE WHEN @x BETWEEN 0 AND 255 THEN @x ELSE NULL END)
END

Then in your INSERT operation you would rather write:

SELECT CASE WHEN bigcol BETWEEN 0 AND 255 THEN bigcol ELSE NULL END,
...

then

SELECT dbo.makesmaller(bigcol), ...

there is an overhead for the call, although it seems to be a lot less
in SQL 2005 than in SQL 2000.

Quote:
I think there is another problem: the original table is not in the right
filegroup and if I understand it right, It must be or must be to make
switch ultra fast;
Yes, but as you are about to ditch the original table that is not much
of an issue, or?


--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx


Reply With Quote
  #9  
Old   
B D Jensen
 
Posts: n/a

Default Re: Transform/transfer 50Gb - how to do it fast? - 04-29-2007 , 11:26 AM



Hi!
You nearly guessed on of my functions;
but I use <= and >= in stead of 'between'.

I didn't understand the last part about "ditch" (what means that?).
Will the use of functions make the select into very slow?

Greetings
Bjorn


Reply With Quote
  #10  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: Transform/transfer 50Gb - how to do it fast? - 04-29-2007 , 11:44 AM



B D Jensen (bjorn.d.jensen (AT) gmail (DOT) com) writes:
Quote:
I didn't understand the last part about "ditch" (what means that?).
To ditch = slänga, kasta, göra sig av med.

Quote:
Will the use of functions make the select into very slow?
Slower. I cannot say how much slower, but I would never use functions for
this situation. Since this appears to be a one-off, code maintainability
does not seem to be important.


--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.