dbTalk Databases Forums  

Transactional replication and the publisher’s transaction log

microsoft.public.sqlserver.replication microsoft.public.sqlserver.replication


Discuss Transactional replication and the publisher’s transaction log in the microsoft.public.sqlserver.replication forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Adrian Mos
 
Posts: n/a

Default Transactional replication and the publisher’s transaction log - 11-23-2004 , 01:05 PM






It’s about the transactional replication and the publisher’s transaction log.
The database that is being replicated is in ‘simple’ recovery model.
The problem is that once in a while the transaction log grows 3, 4 times
bigger than its normal size. Over the time it stabilized it self to 4 GB, but
it happens to automatically grow to 12-15 GB, and last time it got to 62 GB
when the replication failed and the database stopped to respond as well due
to no more space on the hard disk; the log reader agent was timing out. I
figure that was because it could not check the hole 62 GB file in 5 minutes
which is the default time out for gathering the default ReadBatchSize of 500
transactions (this doesn’t happen when it grows to ‘only’ 12 GB).
The next day I fixed the problem by killing all sessions on that database
and making some more room for the log file for the sessions to roll back.
Then I changed the ReadBatchSize to 20 and the time out to 50000. I noticed
that the log reader immediately returned the 957 transactions and 1914
commands outstanding in the transaction log file (the same number returned by
sp_repltrans when ran earlier) but that it took about 1 hour to get the ‘No
replicated transactions are available’ message.
Now, the questions:
1. After MinLSN got past the 957 transaction that where holding the
Checkpoint from truncating the log, why the log reader had to go trough all
transactions to the end of the log file (I think it did so since it took 1
hour to report ‘No replicated transactions are available’.) I thought that
since those where the only transactions marked for replication and the rest
of the transactions in the log file where either committed either rolled back
the Checkpoint will truncate most of the log file and the log reader will
jump and resume the checking from the new MinLSN. If so, it should have
finished in a few seconds. I should mention that because the ‘no more space’
issue SQL Server logged a message that it stopped the automatic Checkpoint,
however I manually ran the Checkpoint command several times after bringing
the database back online and before clearing the replication. That should
restart the automatic Checkpoint, isn’t it?
2. Maybe this should have been the first question: what could be the
reasons for the initial file growth (up to 12-15 GB)? There is no out of the
ordinary activity on the database at the time, no processes failing and
rolling back and no backup going on. Could this be also because of the log
reader, not being able to keep up with the number of the transactions logged,
and holding back the MinLSN? If it is, would the solution be to permanently
set the ReadBatchSize to a lower value (like 50 maybe?) and the QueryTimeout
to a higher value?

Reply With Quote
  #2  
Old   
Hilary Cotter
 
Posts: n/a

Default Re: Transactional replication and the publisher's transaction log - 11-23-2004 , 02:17 PM






That's a tough one.

First of all you should be running in Full Recovery Model and dumping your
tlogs as frequently as you can. Perhaps every minute or 5 minutes. Why?

Well pruning your log like this will improve overall database efficiency and
give the log reader agent less log to have to chew though.

When your tlogs explode like this it could be because you have open
transactions, run dbcc opentran to see what it says. Look for something
like:

Oldest active transaction:
SPID (server process ID) : 55

this says there is an orphaned transaction which is preventing your log from
recycling itself. Examine this spid and consider killing it.

Then after you dump your log try to shrink it, and watch how big it gets.
Then shrink it and allocate it in advance to this size. This minimizes the
number of vlfs and increases over all db and log reader performance.

Then when your log blows up like this drop your readbatch size to perhaps
50. This allows the log reader to grab smaller gulps of data, and instead of
having to chew through large logs getting 500 transactions, it chews through
less before committing these transactions to the distribution database and
marking them as read.

I hope this answers your questions, if not post back.

--
Hilary Cotter
Looking for a SQL Server replication book?
Now available for purchase at:
http://www.nwsu.com/0974973602.html

"Adrian Mos" <AdrianMos (AT) discussions (DOT) microsoft.com> wrote

Quote:
It's about the transactional replication and the publisher's transaction
log.
The database that is being replicated is in 'simple' recovery model.
The problem is that once in a while the transaction log grows 3, 4 times
bigger than its normal size. Over the time it stabilized it self to 4 GB,
but
it happens to automatically grow to 12-15 GB, and last time it got to 62
GB
when the replication failed and the database stopped to respond as well
due
to no more space on the hard disk; the log reader agent was timing out. I
figure that was because it could not check the hole 62 GB file in 5
minutes
which is the default time out for gathering the default ReadBatchSize of
500
transactions (this doesn't happen when it grows to 'only' 12 GB).
The next day I fixed the problem by killing all sessions on that database
and making some more room for the log file for the sessions to roll back.
Then I changed the ReadBatchSize to 20 and the time out to 50000. I
noticed
that the log reader immediately returned the 957 transactions and 1914
commands outstanding in the transaction log file (the same number returned
by
sp_repltrans when ran earlier) but that it took about 1 hour to get the
'No
replicated transactions are available' message.
Now, the questions:
1. After MinLSN got past the 957 transaction that where holding the
Checkpoint from truncating the log, why the log reader had to go trough
all
transactions to the end of the log file (I think it did so since it took 1
hour to report 'No replicated transactions are available'.) I thought that
since those where the only transactions marked for replication and the
rest
of the transactions in the log file where either committed either rolled
back
the Checkpoint will truncate most of the log file and the log reader will
jump and resume the checking from the new MinLSN. If so, it should have
finished in a few seconds. I should mention that because the 'no more
space'
issue SQL Server logged a message that it stopped the automatic
Checkpoint,
however I manually ran the Checkpoint command several times after bringing
the database back online and before clearing the replication. That should
restart the automatic Checkpoint, isn't it?
2. Maybe this should have been the first question: what could be
the
reasons for the initial file growth (up to 12-15 GB)? There is no out of
the
ordinary activity on the database at the time, no processes failing and
rolling back and no backup going on. Could this be also because of the log
reader, not being able to keep up with the number of the transactions
logged,
and holding back the MinLSN? If it is, would the solution be to
permanently
set the ReadBatchSize to a lower value (like 50 maybe?) and the
QueryTimeout
to a higher value?



Reply With Quote
  #3  
Old   
Adrian Mos
 
Posts: n/a

Default Re: Transactional replication and the publisher's transaction log - 11-23-2004 , 02:51 PM



Thanks for reply!
Simple recovery model is not my decision. The “business” is happy with a
possible 24 hours data lost. However, for the replication this is a good
thing since this keeps the active transaction log small and so the log reader
is faster.
I guess since I don’t keep the log I can’t never say what happened
initially. Usually this happens over night, and by the time I come to work
the log is clear (just a big, almost empty file) and I can shrink it back. So
this kind of answers my second question.
The only exception was the last time when it grew to 62GB and when it was
the un-replicated transactions that was holding back the MinLSN. Setting a
smaller value for the readbatch size did solve the problem. With my first
question I’m trying to find out how the log reader advances trough the log. I
know that the only transactions needed to be replicated where at the
beginning of the file, but I could not shrink the log until after an hour
when the log reader finished scanning the entire log (without finding any
other transaction for repl.) Why didn’t the Checkpoint just discard the log
since there where no more open or marked for replication transactions? Then
the log reader would not have to scan the entire 62GB file.
I think I’m slowly answering my self: Checkpoint needs the log reader to
advance in order to be shore that those transactions are not needed anymore.
If you have any insights about how log reader progresses trough the log and
how Checkpoint gets mixed into this cocktail please post them.
Thank you.


Reply With Quote
  #4  
Old   
Hilary Cotter
 
Posts: n/a

Default Re: Transactional replication and the publisher's transaction log - 11-24-2004 , 07:02 AM



My understanding is this. When the log reader starts up, it will read the
msrepl_transactions table and get the last LSN (xact_id IIRC) in this table.
It then starts to seek through the transaction log looking for this lsn.
When it find it, it starts to read all the transactions which are marked for
replication. It then breaks these transactions down to singletons.

So if a transaction affects 100 rows, the log reader agent write transaction
information to the msrepl_transactions table, and then writes 100 singletons
to the msrepl_commands table.

The log reader agent reads in batches defined in the ReadBatchSize. So if
ReaderBatchSize is set to 500, it reads through the tlog until it reads 500
transactions and then it writes them to the msrepl_transaction table (and
the msrepl_commands table). Then it marks these transactions as replicated
in the log.

Then when a checkpoint is issues the log can be chopped to remove all these
transactions.

If your log is huge, it means the log reader has to read through more log to
gather the 500 transactions and mark them as replicated in the log. Dropping
the ReadBatchSize (and upping QueryTimeOut) can help the LogReaderAgent read
the log. Getting as small a log as possible also helps.

Another factor is how your log is being used. You may have portions of your
log which are still open, and others which are closed, but no contiguous
closed sets so the log reader agent can recycle it. Use DBCC logInfo to see
this. When you have a status of 2, this means open, 0 means closed. You want
the 2 at the start. If you have lots of 0's at the end, these rows (and
vlfs) can be chopped by the SQL Server.

To fix this you will need to size your log accordingly, by using the full
recovery model, and adjusting your log to a representative size. You should
be running with 16 or so vlfs (the number of row returned in dbcc loginfo).
Running with several hundered or over a thousand means a badly sized tlog.
Autogrows causes multiple vlfs, and causes poor log maintenance.


--
Hilary Cotter
Looking for a SQL Server replication book?
Now available for purchase at:
http://www.nwsu.com/0974973602.html

"Adrian Mos" <AdrianMos (AT) discussions (DOT) microsoft.com> wrote

Quote:
Thanks for reply!
Simple recovery model is not my decision. The "business" is happy with a
possible 24 hours data lost. However, for the replication this is a good
thing since this keeps the active transaction log small and so the log
reader
is faster.
I guess since I don't keep the log I can't never say what happened
initially. Usually this happens over night, and by the time I come to work
the log is clear (just a big, almost empty file) and I can shrink it back.
So
this kind of answers my second question.
The only exception was the last time when it grew to 62GB and when it was
the un-replicated transactions that was holding back the MinLSN. Setting a
smaller value for the readbatch size did solve the problem. With my first
question I'm trying to find out how the log reader advances trough the
log. I
know that the only transactions needed to be replicated where at the
beginning of the file, but I could not shrink the log until after an hour
when the log reader finished scanning the entire log (without finding any
other transaction for repl.) Why didn't the Checkpoint just discard the
log
since there where no more open or marked for replication transactions?
Then
the log reader would not have to scan the entire 62GB file.
I think I'm slowly answering my self: Checkpoint needs the log reader to
advance in order to be shore that those transactions are not needed
anymore.
If you have any insights about how log reader progresses trough the log
and
how Checkpoint gets mixed into this cocktail please post them.
Thank you.




Reply With Quote
  #5  
Old   
Adrian Mos
 
Posts: n/a

Default Re: Transactional replication and the publisher's transaction log - 11-24-2004 , 08:45 AM



I thought that ReadBatchSize = 500 means 500 commands (rows) that the log
reader needs to read. But you say that it means 500 transactions with who
knows how man rows, it could be thousands or even hundreds of thousands of
rows. If this is so than I can’t really have a tight control over the amount
of data the log reader has to read in one shot. It depends on how big the
transactions are; I can keep asking the developers to keep the transactions
small, but I could never be sure. Looking at the replication statistics I can
see an average of 5000 commands per transaction for the log reader and an
average of 8000 commands per transaction for the distribution agent. Does
this mean that the log reader had to read 500x5000=2.5 millions rows in one
shot? Isn’t this a big number?
By the way, why is it that the statistics for the log reader and
distribution agents are never the same? For instance, when I had the
replication failure only the log reader agent failed, the distribution agent
was reporting that there are no more transactions to distribute. After I
finally managed to read the log file the log reader agent showed 20
transactions and 40 commands but the distribution agent showed 957
transactions (the same number that sp_repltrans returned) and 1914
commands!?!?
Thank you very much for the detailed description of the log reader
methodology. I know now why I always see at least one record for each
publication in the MSrepl_transactios table.

I have 114 vlfs for a 4 GB transaction log file. Do you think I should chop
it to, say, 2 MB to clear most of the vlfs and then increase the size
manually in one shot to 4 GB and then SQL will only create 10-20 big vlfs?


Reply With Quote
  #6  
Old   
Hilary Cotter
 
Posts: n/a

Default Re: Transactional replication and the publisher's transaction log - 11-24-2004 , 10:16 AM



Quoting from BOL, and BOL could be wrong

-ReadBatchSize number_of_transactions

Is the maximum number of transactions read out of the source. For the Log
Reader Agent, the source is the transaction log of the publishing database.
For the Distribution Agent, the source is the distribution database. The
default is 500.

This is a transaction update TableName set ColumnName=ColumnName. This
single transaction will generate one command for each row in this table.

The statistics for the log reader and agent differ, as they are per second
delivered. So transactions can get queued in the distribution database.

Transactions are converted into commands in batches of 27. Your statistics
look correct. I think you should look at your commands in the distribution
database to get an idea of what is pooling there and being distributed. Have
a look at the view MSdistribution_status.

Your system certainly is under some load.


--
Hilary Cotter
Looking for a SQL Server replication book?
Now available for purchase at:
http://www.nwsu.com/0974973602.html


"Adrian Mos" <AdrianMos (AT) discussions (DOT) microsoft.com> wrote

Quote:
I thought that ReadBatchSize = 500 means 500 commands (rows) that the log
reader needs to read. But you say that it means 500 transactions with who
knows how man rows, it could be thousands or even hundreds of thousands of
rows. If this is so than I can't really have a tight control over the
amount
of data the log reader has to read in one shot. It depends on how big the
transactions are; I can keep asking the developers to keep the
transactions
small, but I could never be sure. Looking at the replication statistics I
can
see an average of 5000 commands per transaction for the log reader and an
average of 8000 commands per transaction for the distribution agent. Does
this mean that the log reader had to read 500x5000=2.5 millions rows in
one
shot? Isn't this a big number?
By the way, why is it that the statistics for the log reader and
distribution agents are never the same? For instance, when I had the
replication failure only the log reader agent failed, the distribution
agent
was reporting that there are no more transactions to distribute. After I
finally managed to read the log file the log reader agent showed 20
transactions and 40 commands but the distribution agent showed 957
transactions (the same number that sp_repltrans returned) and 1914
commands!?!?
Thank you very much for the detailed description of the log reader
methodology. I know now why I always see at least one record for each
publication in the MSrepl_transactios table.

I have 114 vlfs for a 4 GB transaction log file. Do you think I should
chop
it to, say, 2 MB to clear most of the vlfs and then increase the size
manually in one shot to 4 GB and then SQL will only create 10-20 big vlfs?




Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.