Query for only specific rows in a table? Need Help with Query

microsoft.public.sqlserver.mseq

Gordon
 
Query for only specific rows in a table? Need Help with Query - 10-09-2007, 10:39 AM

We have a stored procedure that removes old data from a very large table
(millions of rows) and puts it into an archive table. It grabs the top 5000
rows and moves them.

Because the database is in production and heavily used, the job runs every
5 minutes to avoid performance issues. When first started, the query ran
quickly because it would find 5000 rows fast. Now that it has been running
for a week, the query takes longer to execute and locks the tables for too
long.

Is there a way to restrict the query to a specific set of rows? We could
store the last row that was deleted in a temp table and restart the query
from that row. The problem is that we don't know how to code this in the
stored procedure. Our code is below. Can anyone help us with this or point
us in the right direction?


INSERT INTO gbdb_arch..tests_to_archive
SELECT TOP 5000 p.test_id
FROM gbdb..tests p (NOLOCK)
LEFT OUTER JOIN gbdb_arch..tests_to_archive a (NOLOCK)
    ON p.test_id = a.test_id
WHERE var_id IN (SELECT var_id FROM gbdb..variables (NOLOCK) WHERE pu_id <> 0)
  AND Result_On < DATEADD(year, -2, GETDATE())
  AND a.test_id IS NULL
go

gbdb_arch..bow_ArchiveHistoricalTestData 5000

Russell Fields
 
Re: Query for only specific rows in a table? Need Help with Query - 10-09-2007, 03:41 PM

Gordon,

A few questions that could affect your performance:
1 - Do you have indexes to support the TOP 5000 select? If not, create an
index to support the select and the delete.
2 - Are your statistics up to date? If not, you should update them
(sp_updatestats).
3 - Is your data becoming fragmented as you delete rows? If so, you should
defragment the table (DBCC DBREINDEX on SQL Server 2000 and 2005, or
ALTER INDEX on SQL Server 2005).
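The three checks above could look roughly like the following. This is only a sketch: the index name, the dbo schema, and the choice of key columns are assumptions (the thread does not show the table definitions), and Result_On and var_id are assumed to be columns of tests.

```sql
USE gbdb
GO

-- 1. Index to support the archive SELECT.
--    IX_tests_Result_On and its column order are hypothetical.
CREATE INDEX IX_tests_Result_On ON dbo.tests (Result_On, var_id)
GO

-- 2. Refresh statistics for the tables in the current database.
EXEC sp_updatestats
GO

-- 3a. SQL Server 2000 and 2005: rebuild all indexes on the table.
DBCC DBREINDEX ('dbo.tests')
GO

-- 3b. SQL Server 2005 only: defragment all indexes on the table.
ALTER INDEX ALL ON dbo.tests REORGANIZE
GO
```

On a busy production table, note that DBCC DBREINDEX holds locks while it rebuilds, whereas ALTER INDEX ... REORGANIZE runs online, so the choice between 3a and 3b matters for the locking problem described in the question.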

Another observation is that your TOP 5000 (in the sample) does not have an
ORDER BY. Therefore, the 5000 rows being deleted are an undefined selection
from the qualifying rows. TOP makes much more sense with an ORDER BY.
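One way to combine an ORDER BY with the "remember the last row" idea from the question is sketched below. It rests on assumptions that the thread does not confirm: that test_id is an integer that grows roughly in Result_On order (otherwise rows below the stored value that only become old enough later would never be picked up), that var_id and Result_On are columns of tests, and that a one-row helper table, here given the hypothetical name archive_watermark, has been created beforehand.

```sql
-- Hypothetical helper table, created and seeded once:
--   CREATE TABLE gbdb_arch..archive_watermark (last_test_id int NOT NULL)
--   INSERT INTO gbdb_arch..archive_watermark VALUES (0)

DECLARE @last_id int   -- assumes test_id is an int; adjust to the real type

SELECT @last_id = last_test_id
FROM gbdb_arch..archive_watermark

-- Pick the next 5000 qualifying rows after the stored position.
-- The ORDER BY makes "top 5000" well defined and lets an index on
-- test_id start the scan at the stored position.
INSERT INTO gbdb_arch..tests_to_archive (test_id)
SELECT TOP 5000 p.test_id
FROM gbdb..tests p
WHERE p.test_id > @last_id
  AND p.Result_On < DATEADD(year, -2, GETDATE())
  AND p.var_id IN (SELECT var_id FROM gbdb..variables WHERE pu_id <> 0)
ORDER BY p.test_id

-- Advance the stored position to the highest id just queued.
-- If nothing was queued, the position stays where it was.
UPDATE gbdb_arch..archive_watermark
SET last_test_id = ISNULL((SELECT MAX(a.test_id)
                           FROM gbdb_arch..tests_to_archive a
                           WHERE a.test_id > @last_id), @last_id)
```

Because the search starts at the stored position, the LEFT OUTER JOIN against tests_to_archive from the original query is no longer needed to skip rows that were already queued.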

How are you deleting the rows? If you are deleting by joining
gbdb_arch..tests_to_archive to gbdb..tests, your join set is getting bigger
and bigger as you build up the archive table. For example:
    DELETE ts
    FROM gbdb..tests ts JOIN gbdb_arch..tests_to_archive ar
    ON ts.test_id = ar.test_id
If this is the problem, then you might need an index on tests_to_archive to
support the join.
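A minimal sketch of such an index follows. The index name and the dbo schema are assumptions; only the test_id column is known from the query in the question.

```sql
USE gbdb_arch
GO

-- Hypothetical index so the DELETE join can seek on test_id
-- instead of scanning the ever-growing tests_to_archive table.
CREATE INDEX IX_tests_to_archive_test_id
    ON dbo.tests_to_archive (test_id)
GO
```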

RLF

Hugo Kornelis
 
Re: Query for only specific rows in a table? Need Help with Query - 10-09-2007, 03:58 PM

On Tue, 9 Oct 2007 08:39:00 -0700, Gordon wrote:

(snip)

Hi Gordon,

In addition to Russell's reply, some more points.

1. What version of SQL Server are you using? SQL 2005 has a new option
(the OUTPUT option) that you can leverage for a tremendous performance
boost.

2. Why are you using (NOLOCK)? Are you aware of the risks of reading
dirty data, missing rows, or reading rows twice? Will you really risk
archiving dirty data for a performance gain?

3. I assume that the stored proc bow_ArchiveHistoricalTestData does the
actual delete. That means that copying to archive and purging the
original are not only in separate transactions; they are even in
separate batches. You run the risk that the insert succeeds but the
delete fails, and you even run the risk that the insert fails and the
delete succeeds, causing you to lose data permanently!

4. I agree with Russell that the real problem is probably in the
bow_ArchiveHistoricalTestData procedure. Can you please post that code?
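To illustrate points 1 and 3, here is a rough sketch of a single statement that deletes a batch and copies the deleted rows to the archive at the same time, using the OUTPUT clause of SQL Server 2005. The archive table name tests_archive and its column list are hypothetical, since the thread never shows the real archive table, and var_id and Result_On are again assumed to be columns of tests.

```sql
-- Requires SQL Server 2005 or later (OUTPUT clause, TOP in DELETE).
-- gbdb_arch..tests_archive and its columns are hypothetical names.
DELETE TOP (5000) p
OUTPUT deleted.test_id, deleted.var_id, deleted.Result_On
  INTO gbdb_arch..tests_archive (test_id, var_id, Result_On)
FROM gbdb..tests AS p
WHERE p.Result_On < DATEADD(year, -2, GETDATE())
  AND p.var_id IN (SELECT v.var_id
                   FROM gbdb..variables AS v
                   WHERE v.pu_id <> 0)
```

Because the delete and the archive insert are one statement, they succeed or fail together, which removes the risk described in point 3. Two caveats: TOP in a DELETE cannot take an ORDER BY, so which 5000 qualifying rows go first is undefined, and the target of OUTPUT ... INTO may not have enabled triggers, foreign key relationships, or CHECK constraints.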

--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

