RE: Extremly long checkpoints: How to find the reason and solve the

comp.databases.informix


#1
Willem Roos
RE: Extremly long checkpoints: How to find the reason and solve the - 05-19-2004, 10:32 AM

You'll sometimes see user threads in log-buffer wait (the flags column
of onstat -u has a B). onstat -l also shows it, though I don't recall
exactly what it says; something about waiting in the header. It boils
down to not being able to switch to the next of the three log buffers
(triple buffering) because the current log buffer hasn't been
replicated to the secondary, or at least there's been no reply yet from
the secondary. Check with onstat -g ath and you'll see the hdr threads
running.

When the primary goes into a checkpoint, it may wait (possibly for a
very long time) for a session to leave its critical section, which the
session can't do because it's waiting on a log buffer.

Getting any useful stats out of the secondary sucks in general. Other
than that, I quite like HDR.

Quote:
-----Original Message-----
From: owner-informix-list (AT) iiug (DOT) org
[mailto:owner-informix-list (AT) iiug (DOT) org] On Behalf Of TBP
Sent: 19 May 2004 10:42
To: informix-list (AT) iiug (DOT) org
Subject: Re: Extremly long checkpoints: How to find the reason and solve the


Francisco Roldan wrote:
Are you replicating to another server?
Some time ago I got very long checkpoints on the primary server of an
Informix server with High Availability Data Replication (HDR).

I found out that the reason was an extremely complex query
executed in the secondary server (Stand By Server) for generating
a report (DSS Reports in OLTP System, not a good idea!!).

Checkpoints for HDR systems are always synchronous.
It doesn't matter if you configure the system to be asynchronous (I
don't remember the name of the parameter in the onconfig file); the
only thing that really becomes asynchronous is the transactions (no
2-phase commit protocol). The onconfig parameter should be named
TwoPhaseCommit instead of the name that I don't remember.
The primary server always waits for an acknowledgement message from the
other servers before finishing its own checkpoint.

If you are not replicating, ignore this message; I just wanted to share
my frustrating experience with HDR.
Enterprise Replication (ER) would solve the problem.

Regards


snip ...

DRINTERVAL -1 (synchronous) or DRINTERVAL > 0 (asynchronous)

There appears to be some activity that the checkpoint depends on (not
the checkpoint itself) which is synchronous; you can see this when the
checkpoint completes on the primary but hasn't started or completed on
the secondary when DRINTERVAL > 0. It has something to do with flushing
the physical log buffer on the secondary, and with threads in critical
sections.

What was the "extremely complex query executed in the secondary server
(Stand By Server) for generating a report (DSS Reports in OLTP System,
not a good idea!!)"? (I thought putting DSS on the secondary was a nice
way to split it out from the OLTP primary.)

Did you log a Tech Support case?

Were there a lot of writes to temp tables involved on the secondary?


#2
Francisco Roldan
Re: Extremly long checkpoints: How to find the reason and solve the - 05-19-2004, 10:43 AM

I don't have the DSS query right now; as a matter of fact, I no longer
work at the company where I had that problem.
At the time, the decision was to migrate to Enterprise Replication
because of this problem, and primarily because of the flexibility in
replication granularity that ER allows.

For the record:

From the Administration Guide for IDS 7.3 (4354.pdf),
Chapter 25, "What is High Availability Data Replication" (page 578),
section "Checkpoints Between Database Servers":

"Checkpoints between database servers in a high-availability
data-replication pair are synchronous, regardless of the value of
DRINTERVAL. A checkpoint on the primary database server completes only
after it completes on the secondary database server."
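
To put the two statements from this thread side by side (DRINTERVAL -1
is synchronous and DRINTERVAL > 0 is asynchronous, while checkpoints are
synchronous either way), here is a minimal Python sketch. The names
hdr_mode and HdrMode are made up for illustration and are not part of
any Informix tool. Values other than -1 and positive integers are not
covered in this thread, so the sketch simply rejects them.

```python
from typing import NamedTuple


class HdrMode(NamedTuple):
    """How an HDR pair behaves for a given DRINTERVAL (hypothetical helper type)."""
    updates: str       # how logged updates are shipped to the secondary
    checkpoints: str   # how checkpoints are coordinated between the servers


def hdr_mode(drinterval: int) -> HdrMode:
    """Classify a DRINTERVAL value using only what this thread states.

    -1  -> synchronous updates
    > 0 -> asynchronous updates
    Checkpoints are reported as synchronous in both cases, per the
    Administration Guide passage quoted above.
    """
    if drinterval == -1:
        updates = "synchronous"
    elif drinterval > 0:
        updates = "asynchronous"
    else:
        # Assumption: the thread does not say what other values mean,
        # so this sketch refuses to guess.
        raise ValueError(f"DRINTERVAL {drinterval} is not covered here")
    return HdrMode(updates=updates, checkpoints="synchronous")


if __name__ == "__main__":
    for value in (-1, 30):
        print(value, hdr_mode(value))
```

The point of the sketch is that switching DRINTERVAL from -1 to a
positive value changes only the updates field; the checkpoints field
stays synchronous, which is why a slow secondary can still stall
checkpoints on the primary.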

HDR systems are not optimized for using the secondary server for DSS.
That sounds logical, because HDR is oriented toward high-availability
systems, i.e. OLTP systems. At the time, the company wanted the primary
server for the production system and the secondary server for
up-to-date reports.
I learned this the hard way...
I hope this message helps people who are evaluating HDR.


Anyway thanks for your help,

Regards



#3
Fernando Nunes
Re: Extremly long checkpoints: How to find the reason and solve the - 05-19-2004, 08:40 PM

Francisco Roldan wrote:
Quote:
HDR systems are not optimized for using the secondary server for DSS.
That sounds logical, because HDR is oriented toward high-availability
systems, i.e. OLTP systems. At the time, the company wanted the primary
server for the production system and the secondary server for
up-to-date reports.
I learned this the hard way...
I hope this message helps people who are evaluating HDR.

I've been working in an environment where the customer uses the
secondary mainly for data extraction (DSS queries and jobs, and
extraction for the DW).

There were, and still are, problems with HDR that can cause the
situation you mentioned. Some are related to heavy load on the
secondary, and one is (I think) caused by a recently identified bug
related to critical sections and checkpoints being held on the
secondary.

These situations MUST be considered BUGS and not "design orientation".
We currently have an open case under investigation by IBM technical
support.

I'd also like to point out one fact: around 9.30.UC1 there was a version
which forced the same ONCONFIG parameters on primary and secondary for
things like BUFFERS, CPUVPs, etc. This prevented any configuration
where the resources differed between primary and secondary. It was
eventually considered a bug and corrected. This clearly indicates that
you can use the secondary for different purposes than the primary.

The normal and desirable behaviour, in case the secondary has too much
work, is for replication to stop and eventually recover. DRPINGTIMEOUT
also serves this purpose.
Of course, if you intend your HDR environment to be mainly a stand-by
database, you won't want this to happen, but in that case you shouldn't
load the secondary too much...

Regards,

Fernando Nunes



#4
Francisco Roldan
Re: Extremly long checkpoints: How to find the reason and solve the - 05-20-2004, 10:56 AM

I completely agree with you:
this has to be considered a bug.

Checkpoints in an HDR system must be asynchronous/independent between
servers.

Please let us know the outcome of the open case.

Regards



#5
Fernando Nunes
Re: Extremly long checkpoints: How to find the reason and solve the - 05-20-2004, 11:40 AM

Francisco Roldan wrote:

Quote:
I completely agree with you:
this has to be considered a bug.

Checkpoints in an HDR system must be asynchronous/independent between
servers.

Not quite. They also mark the consistency point between the two servers.
The logging mode of the databases also influences the synchronism
between the two servers: buffered vs. unbuffered logging makes a
difference.

All this is by design. We can argue that it should be different...

Regards.

