dbTalk Databases Forums  

IYHO: fastest way to copy a multi-million rows table?

comp.databases.sybase comp.databases.sybase


Discuss IYHO: fastest way to copy a multi-million rows table? in the comp.databases.sybase forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Tester
 
Posts: n/a

Default IYHO: fastest way to copy a multi-million rows table? - 11-10-2009 , 03:11 PM






Bypassing the disk.

Something like a batch select into dest_db..dest_table from
src_db..src_table or insert select.
Anything that does not involve excessive logging and storing bcp output on
disk.

Reply With Quote
  #2  
Old   
Leonid Gvirtz
 
Posts: n/a

Default Re: IYHO: fastest way to copy a multi-million rows table? - 11-11-2009 , 05:46 AM






On Nov 10, 10:11*pm, "Tester" <teste... (AT) tester (DOT) cc> wrote:
Quote:
Bypassing the disk.

Something like a batch select into dest_db..dest_table from
src_db..src_table or insert select.
Anything that does not involve excessive logging and storing bcp output on
disk.
Hi

Assuming that you need to move the data between different dataservers,
you can use CIS for this purpose. Create a proxy table in the
destination database that points to your source table in the source
dataserver. Then, you will be able to run SELECT INTO statement into a
new table in the destination dataserver from the proxy table with
minimal logging. You may want to tune some CIS-related parameters on
source and destination dataservers for better performance, such as
"cis packet size", "cis bulk insert array size", "cis cursor rows"
etc.

Hope it helps
Leonid Gvirtz
http://www.gvirtz-consulting.com

Reply With Quote
  #3  
Old   
Keith
 
Posts: n/a

Default Re: IYHO: fastest way to copy a multi-million rows table? - 11-12-2009 , 11:53 AM



On Nov 10, 2:11*pm, "Tester" <teste... (AT) tester (DOT) cc> wrote:
Quote:
Bypassing the disk.

Something like a batch select into dest_db..dest_table from
src_db..src_table or insert select.
Anything that does not involve excessive logging and storing bcp output on
disk.
Or if you can't bypass the disk completely and don't have the CIS
plugin, I've had good results on some O/S's by doing this:

1) Create a FIFO (named pipe).
2) Start up a bcp in from the FIFO.
3) Run a bcp out to the FIFO.

Keith

Reply With Quote
  #4  
Old   
joop
 
Posts: n/a

Default Re: IYHO: fastest way to copy a multi-million rows table? - 11-13-2009 , 03:05 PM



On 11/12/09 17:53 , Keith wrote:
Quote:
On Nov 10, 2:11 pm, "Tester"<teste... (AT) tester (DOT) cc> wrote:
Bypassing the disk.

Something like a batch select into dest_db..dest_table from
src_db..src_table or insert select.
Anything that does not involve excessive logging and storing bcp output on
disk.

Or if you can't bypass the disk completely and don't have the CIS
plugin, I've had good results on some O/S's by doing this:

1) Create a FIFO (named pipe).
2) Start up a bcp in from the FIFO.
3) Run a bcp out to the FIFO.

Keith
and if you go across macnines,

pipe the output from the prev post into ssh
and that into a fifo again
and a bcp out of that pipe

Reply With Quote
  #5  
Old   
Axel
 
Posts: n/a

Default Re: IYHO: fastest way to copy a multi-million rows table? - 11-17-2009 , 07:45 PM



SELECT INTO does look like the fastest method.
But in our case, for instance, we have a table with approx. 150M rows. While
SELECT INTO took roughly 3 hours, it's the subsequent build of clustered
index that times out.

Is there a way to boost that, maybe run something in parallel? Partitioning
this table is not an open option at this time, since we are prohibited to
change the model by the vendor.


"Leonid Gvirtz" <lgvirtz (AT) yahoo (DOT) com> wrote

On Nov 10, 10:11 pm, "Tester" <teste... (AT) tester (DOT) cc> wrote:
Quote:
Bypassing the disk.

Something like a batch select into dest_db..dest_table from
src_db..src_table or insert select.
Anything that does not involve excessive logging and storing bcp output on
disk.
Hi

Assuming that you need to move the data between different dataservers,
you can use CIS for this purpose. Create a proxy table in the
destination database that points to your source table in the source
dataserver. Then, you will be able to run SELECT INTO statement into a
new table in the destination dataserver from the proxy table with
minimal logging. You may want to tune some CIS-related parameters on
source and destination dataservers for better performance, such as
"cis packet size", "cis bulk insert array size", "cis cursor rows"
etc.

Hope it helps
Leonid Gvirtz
http://www.gvirtz-consulting.com

Reply With Quote
  #6  
Old   
Leonid Gvirtz
 
Posts: n/a

Default Re: IYHO: fastest way to copy a multi-million rows table? - 11-18-2009 , 03:00 PM



Hi Axel

Did you try WITH SORTED_DATA option for creation of your clustered
index? If not - then it may be the way. Add ORDER BY clause according
to the clustered index fields to your SELECT INTO statement. Then, you
will be able to skip the data copying during the clustered index
creation by using WITH SORTED_DATA option since the data in the
resulting table is already sorted. See this link for more details:
http://infocenter.sybase.com/help/to...868.htm#X24774

Hope it helps
Leonid Gvirtz
http://www.gvirtz-consulting.com

On Nov 18, 2:45*am, "Axel" <tes... (AT) tester (DOT) cc> wrote:
Quote:
SELECT INTO does look like the fastest method.
But in our case, for instance, we have a table with approx. 150M rows. While
SELECT INTO took roughly 3 hours, it's the subsequent build of clustered
index that times out.

Is there a way to boost that, maybe run something in parallel? Partitioning
this table is not an open option at this time, since we are prohibited to
change the model by the vendor.

"Leonid Gvirtz" <lgvi... (AT) yahoo (DOT) com> wrote in message

news:09a014e8-87d6-4421-98d2-e3c043ec4195 (AT) t2g2000yqn (DOT) googlegroups.com...
On Nov 10, 10:11 pm, "Tester" <teste... (AT) tester (DOT) cc> wrote:

Bypassing the disk.

Something like a batch select into dest_db..dest_table from
src_db..src_table or insert select.
Anything that does not involve excessive logging and storing bcp outputon
disk.

Hi

Assuming that you need to move the data between different dataservers,
you can use CIS for this purpose. Create a proxy table in the
destination database that points to your source table in the source
dataserver. Then, you will be able to run SELECT INTO statement into a
new table in the destination dataserver from the proxy table with
minimal logging. You may want to tune some CIS-related parameters on
source and destination dataservers for better performance, such as
"cis packet size", "cis bulk insert array size", "cis cursor rows"
etc.

Hope it helps
Leonid Gvirtzhttp://www.gvirtz-consulting.com

Reply With Quote
  #7  
Old   
Axel
 
Posts: n/a

Default Re: IYHO: fastest way to copy a multi-million rows table? - 11-18-2009 , 05:02 PM



That's what I was thinking exactly, however, this seem to completely fill
the logsegment in tempdb unless we make it the size of this enormous table
we got. This happens during the SELECT INTO FROM src_table ORDER BY step.


"Leonid Gvirtz" <lgvirtz (AT) yahoo (DOT) com> wrote

Hi Axel

Did you try WITH SORTED_DATA option for creation of your clustered
index? If not - then it may be the way. Add ORDER BY clause according
to the clustered index fields to your SELECT INTO statement. Then, you
will be able to skip the data copying during the clustered index
creation by using WITH SORTED_DATA option since the data in the
resulting table is already sorted. See this link for more details:
http://infocenter.sybase.com/help/to...868.htm#X24774

Hope it helps
Leonid Gvirtz
http://www.gvirtz-consulting.com

On Nov 18, 2:45 am, "Axel" <tes... (AT) tester (DOT) cc> wrote:
Quote:
SELECT INTO does look like the fastest method.
But in our case, for instance, we have a table with approx. 150M rows.
While
SELECT INTO took roughly 3 hours, it's the subsequent build of clustered
index that times out.

Is there a way to boost that, maybe run something in parallel?
Partitioning
this table is not an open option at this time, since we are prohibited to
change the model by the vendor.

"Leonid Gvirtz" <lgvi... (AT) yahoo (DOT) com> wrote in message

news:09a014e8-87d6-4421-98d2-e3c043ec4195 (AT) t2g2000yqn (DOT) googlegroups.com...
On Nov 10, 10:11 pm, "Tester" <teste... (AT) tester (DOT) cc> wrote:

Bypassing the disk.

Something like a batch select into dest_db..dest_table from
src_db..src_table or insert select.
Anything that does not involve excessive logging and storing bcp output
on
disk.

Hi

Assuming that you need to move the data between different dataservers,
you can use CIS for this purpose. Create a proxy table in the
destination database that points to your source table in the source
dataserver. Then, you will be able to run SELECT INTO statement into a
new table in the destination dataserver from the proxy table with
minimal logging. You may want to tune some CIS-related parameters on
source and destination dataservers for better performance, such as
"cis packet size", "cis bulk insert array size", "cis cursor rows"
etc.

Hope it helps
Leonid Gvirtzhttp://www.gvirtz-consulting.com

Reply With Quote
  #8  
Old   
Mark A. Parsons
 
Posts: n/a

Default Re: IYHO: fastest way to copy a multi-million rows table? - 11-18-2009 , 05:20 PM



If the source table has an index identical to the desired clustered index for the new table, consider having the
SELECT/INTO use said index (force it if necessary) from the source table.

Then use the 'with sorted_data' option when creating the clustered index on the new table.

Upside: should eliminate the overhead of sorting the table (either via select/into/order-by or create/index)

Downside: SELECT could take longer to run and require more physical/logical IOs, especially if having to make 150
million hops from index leaf pages to data pages

Obviously (?):

- if the source table is APL with a clustered index, then insuring the SELECT uses said clustered index should be
(relatively) quick *and* allow for the use of the 'with sorted_data' option

- use of parallel/worker threads for the SELECT could cause the data to be scrambled durint the INSERT step, thus
negating the ability to use the 'with sorted_data' option


Axel wrote:
Quote:
That's what I was thinking exactly, however, this seem to completely fill
the logsegment in tempdb unless we make it the size of this enormous table
we got. This happens during the SELECT INTO FROM src_table ORDER BY step.


"Leonid Gvirtz" <lgvirtz (AT) yahoo (DOT) com> wrote in message
news:670903cc-2e4d-47ae-85ff-8ed548a98113 (AT) d5g2000yqm (DOT) googlegroups.com...
Hi Axel

Did you try WITH SORTED_DATA option for creation of your clustered
index? If not - then it may be the way. Add ORDER BY clause according
to the clustered index fields to your SELECT INTO statement. Then, you
will be able to skip the data copying during the clustered index
creation by using WITH SORTED_DATA option since the data in the
resulting table is already sorted. See this link for more details:
http://infocenter.sybase.com/help/to...868.htm#X24774

Hope it helps
Leonid Gvirtz
http://www.gvirtz-consulting.com

On Nov 18, 2:45 am, "Axel" <tes... (AT) tester (DOT) cc> wrote:
SELECT INTO does look like the fastest method.
But in our case, for instance, we have a table with approx. 150M rows.
While
SELECT INTO took roughly 3 hours, it's the subsequent build of clustered
index that times out.

Is there a way to boost that, maybe run something in parallel?
Partitioning
this table is not an open option at this time, since we are prohibited to
change the model by the vendor.

"Leonid Gvirtz" <lgvi... (AT) yahoo (DOT) com> wrote in message

news:09a014e8-87d6-4421-98d2-e3c043ec4195 (AT) t2g2000yqn (DOT) googlegroups.com...
On Nov 10, 10:11 pm, "Tester" <teste... (AT) tester (DOT) cc> wrote:

Bypassing the disk.
Something like a batch select into dest_db..dest_table from
src_db..src_table or insert select.
Anything that does not involve excessive logging and storing bcp output
on
disk.
Hi

Assuming that you need to move the data between different dataservers,
you can use CIS for this purpose. Create a proxy table in the
destination database that points to your source table in the source
dataserver. Then, you will be able to run SELECT INTO statement into a
new table in the destination dataserver from the proxy table with
minimal logging. You may want to tune some CIS-related parameters on
source and destination dataservers for better performance, such as
"cis packet size", "cis bulk insert array size", "cis cursor rows"
etc.

Hope it helps
Leonid Gvirtzhttp://www.gvirtz-consulting.com


Reply With Quote
  #9  
Old   
Mark A. Parsons
 
Posts: n/a

Default Re: IYHO: fastest way to copy a multi-million rows table? - 11-18-2009 , 05:36 PM



Instead of:

if the source table is APL with a clustered index

Should have been:

if the source table is APL with the same desired clustered index


Mark A. Parsons wrote:
Quote:
If the source table has an index identical to the desired clustered
index for the new table, consider having the SELECT/INTO use said index
(force it if necessary) from the source table.

Then use the 'with sorted_data' option when creating the clustered index
on the new table.

Upside: should eliminate the overhead of sorting the table (either via
select/into/order-by or create/index)

Downside: SELECT could take longer to run and require more
physical/logical IOs, especially if having to make 150 million hops from
index leaf pages to data pages

Obviously (?):

- if the source table is APL with a clustered index, then insuring the
SELECT uses said clustered index should be (relatively) quick *and*
allow for the use of the 'with sorted_data' option

- use of parallel/worker threads for the SELECT could cause the data to
be scrambled durint the INSERT step, thus negating the ability to use
the 'with sorted_data' option


Axel wrote:
That's what I was thinking exactly, however, this seem to completely
fill the logsegment in tempdb unless we make it the size of this
enormous table we got. This happens during the SELECT INTO FROM
src_table ORDER BY step.


"Leonid Gvirtz" <lgvirtz (AT) yahoo (DOT) com> wrote in message
news:670903cc-2e4d-47ae-85ff-8ed548a98113 (AT) d5g2000yqm (DOT) googlegroups.com...
Hi Axel

Did you try WITH SORTED_DATA option for creation of your clustered
index? If not - then it may be the way. Add ORDER BY clause according
to the clustered index fields to your SELECT INTO statement. Then, you
will be able to skip the data copying during the clustered index
creation by using WITH SORTED_DATA option since the data in the
resulting table is already sorted. See this link for more details:
http://infocenter.sybase.com/help/to...868.htm#X24774


Hope it helps
Leonid Gvirtz
http://www.gvirtz-consulting.com

On Nov 18, 2:45 am, "Axel" <tes... (AT) tester (DOT) cc> wrote:
SELECT INTO does look like the fastest method.
But in our case, for instance, we have a table with approx. 150M
rows. While
SELECT INTO took roughly 3 hours, it's the subsequent build of clustered
index that times out.

Is there a way to boost that, maybe run something in parallel?
Partitioning
this table is not an open option at this time, since we are
prohibited to
change the model by the vendor.

"Leonid Gvirtz" <lgvi... (AT) yahoo (DOT) com> wrote in message

news:09a014e8-87d6-4421-98d2-e3c043ec4195 (AT) t2g2000yqn (DOT) googlegroups.com...
On Nov 10, 10:11 pm, "Tester" <teste... (AT) tester (DOT) cc> wrote:

Bypassing the disk.
Something like a batch select into dest_db..dest_table from
src_db..src_table or insert select.
Anything that does not involve excessive logging and storing bcp
output on
disk.
Hi

Assuming that you need to move the data between different dataservers,
you can use CIS for this purpose. Create a proxy table in the
destination database that points to your source table in the source
dataserver. Then, you will be able to run SELECT INTO statement into a
new table in the destination dataserver from the proxy table with
minimal logging. You may want to tune some CIS-related parameters on
source and destination dataservers for better performance, such as
"cis packet size", "cis bulk insert array size", "cis cursor rows"
etc.

Hope it helps
Leonid Gvirtzhttp://www.gvirtz-consulting.com


Reply With Quote
  #10  
Old   
Leonid Gvirtz
 
Posts: n/a

Default Re: IYHO: fastest way to copy a multi-million rows table? - 11-19-2009 , 06:51 AM



Hi

You can use CIS trace flags (11201-11207, server-wide) in order to see
the actual query text which is sent to the source dataserver. Then,
you can check the execution plan on the source dataserver, watch for
Worktable(s) creation. I suspect that Worktables used by the query
fills up your tempdb on the source dataserver. If you have the same
clustered index on the source table then ORDER BY by index fields
should not cause Worktable creation.

Hope it helps
Leonid Gvirtz
http://www.gvirtz-consulting.com


On Nov 19, 12:36*am, "Mark A. Parsons"
<iron_horse (AT) no_spamola (DOT) compuserve.com> wrote:
Quote:
Instead of:

* * *if the source table is APL with a clustered index

Should have been:

* * *if the source table is APL with the same desired clustered index

Mark A. Parsons wrote:
If the source table has an index identical to the desired clustered
index for the new table, consider having the SELECT/INTO use said index
(force it if necessary) from the source table.

Then use the 'with sorted_data' option when creating the clustered index
on the new table.

Upside: *should eliminate the overhead of sorting the table (either via
select/into/order-by or create/index)

Downside: *SELECT could take longer to run and require more
physical/logical IOs, especially if having to make 150 million hops from
index leaf pages to data pages

Obviously (?):

- if the source table is APL with a clustered index, then insuring the
SELECT uses said clustered index should be (relatively) quick *and*
allow for the use of the 'with sorted_data' option

- use of parallel/worker threads for the SELECT could cause the data to
be scrambled durint the INSERT step, thus negating the ability to use
the 'with sorted_data' option

Axel wrote:
That's what I was thinking exactly, however, this seem to completely
fill the logsegment in tempdb unless we make it the size of this
enormous table we got. This happens during the SELECT INTO FROM
src_table ORDER BY step.

"Leonid Gvirtz" <lgvi... (AT) yahoo (DOT) com> wrote in message
news:670903cc-2e4d-47ae-85ff-8ed548a98113 (AT) d5g2000yqm (DOT) googlegroups.com....
Hi Axel

Did you try WITH SORTED_DATA option for creation of your clustered
index? If not - then it may be the way. Add ORDER BY clause according
to the clustered index fields to your SELECT INTO statement. Then, you
will be able to skip the data copying during the clustered index
creation by using WITH SORTED_DATA option since the data in the
resulting table is already sorted. See this link for more details:
http://infocenter.sybase.com/help/to...center.dc36272....

Hope it helps
Leonid Gvirtz
http://www.gvirtz-consulting.com

On Nov 18, 2:45 am, "Axel" <tes... (AT) tester (DOT) cc> wrote:
SELECT INTO does look like the fastest method.
But in our case, for instance, we have a table with approx. 150M
rows. While
SELECT INTO took roughly 3 hours, it's the subsequent build of clustered
index that times out.

Is there a way to boost that, maybe run something in parallel?
Partitioning
this table is not an open option at this time, since we are
prohibited to
change the model by the vendor.

"Leonid Gvirtz" <lgvi... (AT) yahoo (DOT) com> wrote in message

news:09a014e8-87d6-4421-98d2-e3c043ec4195 (AT) t2g2000yqn (DOT) googlegroups.com....
On Nov 10, 10:11 pm, "Tester" <teste... (AT) tester (DOT) cc> wrote:

Bypassing the disk.
Something like a batch select into dest_db..dest_table from
src_db..src_table or insert select.
Anything that does not involve excessive logging and storing bcp
output on
disk.
Hi

Assuming that you need to move the data between different dataservers,
you can use CIS for this purpose. Create a proxy table in the
destination database that points to your source table in the source
dataserver. Then, you will be able to run SELECT INTO statement into a
new table in the destination dataserver from the proxy table with
minimal logging. You may want to tune some CIS-related parameters on
source and destination dataservers for better performance, such as
"cis packet size", "cis bulk insert array size", "cis cursor rows"
etc.

Hope it helps
Leonid Gvirtzhttp://www.gvirtz-consulting.com


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.