dbTalk Databases Forums  

Time of failover of Microsoft SQL 2000

microsoft.public.sqlserver.clustering


#1
Patrice
Time of failover of Microsoft SQL 2000 - 02-09-2005, 03:28 AM

Dear All,

I have recently set up an active-active Microsoft SQL 2000 cluster on
Windows 2003, and I am starting to wonder how such a setup can reach
the target of 99.99% availability, i.e. 52 minutes and 33 seconds of
downtime per year! I know that I should have done some research beforehand,
but I was blinded by all the nice high-availability white papers...

Here are the issues:
(1) It takes between 1 and 3 min to move a group of either SQL or
DTC resources.
(2) To carry out HW/SW maintenance on the two nodes, we need 6 moves,
for a total downtime of 6 min minimum and 18 min maximum.
(3) I have more issues, but let's start with just those two.

So 52 minutes / 6 minutes gives at most 8 maintenance windows per year in
the best case, and 52 minutes / 18 minutes only 2 in the worst case. In my
opinion, this clearly shows that 99.99% with MS SQL Cluster is NOT
possible, unless I have missed something...
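The arithmetic above can be checked with a short Python sketch. It assumes a 365-day year and treats every group move as a full outage lasting the whole 1-3 minutes; `downtime_budget_minutes` and `max_maintenance_windows` are hypothetical helper names, not part of any SQL Server or cluster tool.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumption: leap years ignored


def downtime_budget_minutes(availability_pct: float) -> float:
    """Downtime allowed per year (in minutes) for an availability target in percent."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def max_maintenance_windows(availability_pct: float, minutes_per_window: float) -> int:
    """How many whole maintenance windows fit in the yearly downtime budget."""
    return int(downtime_budget_minutes(availability_pct) // minutes_per_window)


MOVES_PER_MAINTENANCE = 6  # figure from the post; assumes each move is a full outage
budget = downtime_budget_minutes(99.99)
minutes, seconds = divmod(budget * 60, 60)
print(f"99.99% budget: {int(minutes)} min {seconds:.1f} s")
for minutes_per_move in (1, 3):  # best and worst case per group move
    window = MOVES_PER_MAINTENANCE * minutes_per_move
    print(f"{window} min per maintenance -> at most "
          f"{max_maintenance_windows(99.99, window)} per year")
```

Run as-is, it prints a budget of 52 min 33.6 s, then at most 8 windows per year at 6 min each and 2 at 18 min each.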

Thanks in advance for your comments and best regards,

Patrice


#2
Mike Epprecht (SQL MVP)
RE: Time of failover of Microsoft SQL 2000 - 02-09-2005, 08:47 AM

Hi

Have a look at
http://www.microsoft.com/sql/techinf...ailability.asp

On our clusters, we see failover times of 10-32 seconds, and most of them
average 12 minutes of downtime per year. This is a banking environment, with
procedures in place to ensure this.
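To put those figures on the same scale as the original question, here is a small Python sketch that turns yearly downtime into an availability percentage and counts how many failovers of a given length fit in a 99.99% budget. It assumes a 365-day year, and `availability_pct` and `failovers_within_budget` are hypothetical helpers.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: leap years ignored


def availability_pct(downtime_minutes_per_year: float) -> float:
    """Availability in percent implied by a yearly downtime figure."""
    return 100 * (1 - downtime_minutes_per_year / MINUTES_PER_YEAR)


def failovers_within_budget(failover_seconds: float, availability_target: float = 99.99) -> int:
    """Whole failovers of a given length that fit in the yearly downtime budget."""
    budget_seconds = MINUTES_PER_YEAR * 60 * (1 - availability_target / 100)
    return int(budget_seconds // failover_seconds)


print(f"{availability_pct(12):.4f}%")  # 12 min/year of downtime -> 99.9977%
print(failovers_within_budget(32))     # slowest failover quoted above -> 98
```

With Mike's numbers, 12 minutes a year works out to roughly 99.998% availability, and about 98 failovers of 32 seconds would still fit in the 99.99% budget.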

Regards
Mike


#3
Joe Yong
Re: Time of failover of Microsoft SQL 2000 - 02-18-2005, 04:23 PM

Inline.


joe.

"Patrice" <krakowpat (AT) yahoo (DOT) com> wrote

Quote:
Dear All,

I have recently set up an active-active Microsoft SQL 2000 cluster on
Windows 2003, and I am starting to wonder how such a setup can reach
the target of 99.99% availability, i.e. 52 minutes and 33 seconds of
downtime per year! I know that I should have done some research beforehand,
but I was blinded by all the nice high-availability white papers...

Here are the issues:
(1) It takes between 1 and 3 min to move a group of either SQL or
DTC resources.
It usually takes under 10 seconds for failover at the OS/MSCS level. This
has been tested with various hardware vendors, including some that aren't
even on the Failover Clustering HCL. SQL Server failover time depends on a
number of things, but mostly on how long it takes to start up and complete
roll-forward/rollback operations. Tweaking the checkpoint interval can help,
as can keeping transactions short. Huge caches in the SAN can also affect
recovery time. You first need to figure out where the bulk of the 1-3
minutes is being spent in order to determine what you might need to do at
each end.
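One way to start on where the bulk of the time is spent is to split each group move into the phase on the old node, the gap between nodes, and the phase on the new node, using the Cluster Service Event Log timestamps. The Python sketch below does this for a single move; `move_phases` is a hypothetical helper, and the example values are the SQL01 timestamps Patrice posts later in this thread. Because the offline and online events are written by different nodes, any clock difference between the nodes ends up in the gap figure.

```python
from datetime import datetime


def _t(hms: str) -> datetime:
    """Parse an HH:MM:SS timestamp as shown in the Event Log excerpts."""
    return datetime.strptime(hms, "%H:%M:%S")


def move_phases(offline_start: str, offline_done: str,
                online_start: str, online_done: str) -> tuple:
    """Split one group move into (offline, gap, online) durations in seconds.

    Hypothetical helper: offline_* come from the old node's log and
    online_* from the new node's log, so clock skew lands in the gap.
    """
    return (
        (_t(offline_done) - _t(offline_start)).total_seconds(),
        (_t(online_start) - _t(offline_done)).total_seconds(),
        (_t(online_done) - _t(online_start)).total_seconds(),
    )


print(move_phases("17:26:32", "17:26:39", "17:27:00", "17:27:11"))  # (7.0, 21.0, 11.0)
```

For SQL01 that gives 7 s to go offline, 21 s between the two nodes and 11 s to come online.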

Quote:
(2) To carry out HW/SW maintenance on the two nodes, we need 6 moves,
for a total downtime of 6 min minimum and 18 min maximum.
Not sure what you mean by this. Why would you need 6 moves? Take down one
side and do your maintenance, then the other, and then move back to the
original layout if you really want to (not necessary or recommended unless
you have multiple instances). Please elaborate.

Quote:
(3) I have more issues, but let's start with just those two.

So 52 minutes / 6 minutes gives at most 8 maintenance windows per year in
the best case, and 52 minutes / 18 minutes only 2 in the worst case. In my
opinion, this clearly shows that 99.99% with MS SQL Cluster is NOT
possible, unless I have missed something...
99.99% availability is exceedingly difficult to achieve with hardware alone.
Processes and management play massive roles in a highly available system.
Getting to 99.9% is fairly easy with technology alone. Beyond that, you're
really looking at people and processes, which may include design changes.

At the end of the day, your business requirements should set your
availability goals. Every increment you gain beyond 99.9% generally brings
a bigger increase in cost and complexity/effort. What do you really need?
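To make the cost of each extra nine concrete, this Python sketch lists the yearly downtime allowed at each availability level. It assumes a 365-day year; `downtime_per_year_minutes` is a hypothetical helper. Each additional nine cuts the budget by a factor of ten: about 525.6 minutes at 99.9%, about 52.6 at 99.99% and about 5.3 at 99.999%.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: leap years ignored


def downtime_per_year_minutes(availability_pct: float) -> float:
    """Yearly downtime budget in minutes for an availability target in percent."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


# Print the budget for each common availability tier, in minutes and hours.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_per_year_minutes(target)
    print(f"{target:>7}%: {minutes:8.1f} min/year ({minutes / 60:5.1f} h)")
```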

Quote:
Thanks in advance for your comments and best regards,

Patrice




#4
Patrice
Re: Time of failover of Microsoft SQL 2000 - 02-25-2005, 09:08 AM

Hello Joe,

After Mike's 10-32 seconds, you come in with under 10 seconds. We must be
doing something very wrong to end up with ~2 min on an unloaded SQL
cluster.

But I also think that at this stage we should make sure that we are
measuring the failover duration the same way ;-) Last weekend we had to
install some MS patches, and we performed two complete failovers. Here
are the details of the second failover (the faster one), which took
2 min 9 sec:

- NODE01 17:25:30 The Cluster Service is attempting to offline the Resource Group "Cluster Group".
- NODE01 17:25:30 The Cluster Service brought the Resource Group "Cluster Group" offline.
- NODE02 17:25:55 The Cluster Service is attempting to bring online the Resource Group "Cluster Group".
- NODE02 17:25:59 The Cluster Service brought the Resource Group "Cluster Group" online.
- NODE01 17:26:11 The Cluster Service is attempting to offline the Resource Group "MSDTC".
- NODE01 17:26:12 The Cluster Service brought the Resource Group "MSDTC" offline.
- NODE01 17:26:32 The Cluster Service is attempting to offline the Resource Group "SQL01".
- NODE02 17:26:34 The Cluster Service is attempting to bring online the Resource Group "MSDTC".
- NODE01 17:26:39 The Cluster Service brought the Resource Group "SQL01" offline.
- NODE02 17:26:47 The Cluster Service brought the Resource Group "MSDTC" online.
- NODE01 17:27:00 The Cluster Service is attempting to offline the Resource Group "SQL02".
- NODE02 17:27:00 The Cluster Service is attempting to bring online the Resource Group "SQL01".
- NODE01 17:27:08 The Cluster Service brought the Resource Group "SQL02" offline.
- NODE02 17:27:11 The Cluster Service brought the Resource Group "SQL01" online.
- NODE02 17:27:28 The Cluster Service is attempting to bring online the Resource Group "SQL02".
- NODE02 17:27:39 The Cluster Service brought the Resource Group "SQL02" online.
[00:02:09]
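The 2 min 9 s total can be recomputed from the log itself. The Python sketch below assumes the simplified `- NODE hh:mm:ss message` layout used in this post rather than a raw Event Log export; `parse_events` and `total_span` are hypothetical helpers. It takes the wall-clock span from the first event to the last.

```python
import re
from datetime import datetime, timedelta

# Event Log lines from the failover above, one event per line.
LOG = """\
- NODE01 17:25:30 The Cluster Service is attempting to offline the Resource Group "Cluster Group".
- NODE01 17:25:30 The Cluster Service brought the Resource Group "Cluster Group" offline.
- NODE02 17:25:55 The Cluster Service is attempting to bring online the Resource Group "Cluster Group".
- NODE02 17:25:59 The Cluster Service brought the Resource Group "Cluster Group" online.
- NODE01 17:26:11 The Cluster Service is attempting to offline the Resource Group "MSDTC".
- NODE01 17:26:12 The Cluster Service brought the Resource Group "MSDTC" offline.
- NODE01 17:26:32 The Cluster Service is attempting to offline the Resource Group "SQL01".
- NODE02 17:26:34 The Cluster Service is attempting to bring online the Resource Group "MSDTC".
- NODE01 17:26:39 The Cluster Service brought the Resource Group "SQL01" offline.
- NODE02 17:26:47 The Cluster Service brought the Resource Group "MSDTC" online.
- NODE01 17:27:00 The Cluster Service is attempting to offline the Resource Group "SQL02".
- NODE02 17:27:00 The Cluster Service is attempting to bring online the Resource Group "SQL01".
- NODE01 17:27:08 The Cluster Service brought the Resource Group "SQL02" offline.
- NODE02 17:27:11 The Cluster Service brought the Resource Group "SQL01" online.
- NODE02 17:27:28 The Cluster Service is attempting to bring online the Resource Group "SQL02".
- NODE02 17:27:39 The Cluster Service brought the Resource Group "SQL02" online.
"""

# Captures node, HH:MM:SS timestamp and quoted group name. Assumption: the
# simplified "- NODE hh:mm:ss message" layout above, not a raw log export.
EVENT_RE = re.compile(r'^- (\S+) (\d{2}:\d{2}:\d{2}) .*"([^"]+)"')


def parse_events(text):
    """Return (node, time, group) tuples for every matching log line."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line)
        if m:
            node, hms, group = m.groups()
            events.append((node, datetime.strptime(hms, "%H:%M:%S"), group))
    return events


def total_span(events) -> timedelta:
    """Wall-clock time from the first event to the last one."""
    times = [t for _, t, _ in events]
    return max(times) - min(times)


events = parse_events(LOG)
print(len(events), total_span(events))  # 16 0:02:09
```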

I must admit that the administrator moved the four groups one by one,
which is probably not the most efficient way; any advice on this topic is
welcome! But if we take the four moves independently, we can see that
each of them still takes more than 10 sec:

Move of the "Cluster Group" group:
- NODE01 17:25:30 The Cluster Service is attempting to offline the Resource Group "Cluster Group".
- NODE01 17:25:30 The Cluster Service brought the Resource Group "Cluster Group" offline.
- NODE02 17:25:55 The Cluster Service is attempting to bring online the Resource Group "Cluster Group".
- NODE02 17:25:59 The Cluster Service brought the Resource Group "Cluster Group" online.
[00:00:29]

Move of the "MSDTC" group:
- NODE01 17:26:11 The Cluster Service is attempting to offline the Resource Group "MSDTC".
- NODE01 17:26:12 The Cluster Service brought the Resource Group "MSDTC" offline.
- NODE02 17:26:34 The Cluster Service is attempting to bring online the Resource Group "MSDTC".
- NODE02 17:26:47 The Cluster Service brought the Resource Group "MSDTC" online.
[00:00:36]

Move of the "SQL01" group:
- NODE01 17:26:32 The Cluster Service is attempting to offline the Resource Group "SQL01".
- NODE01 17:26:39 The Cluster Service brought the Resource Group "SQL01" offline.
- NODE02 17:27:00 The Cluster Service is attempting to bring online the Resource Group "SQL01".
- NODE02 17:27:11 The Cluster Service brought the Resource Group "SQL01" online.
[00:00:39]

Move of the "SQL02" group:
- NODE01 17:27:00 The Cluster Service is attempting to offline the Resource Group "SQL02".
- NODE01 17:27:08 The Cluster Service brought the Resource Group "SQL02" offline.
- NODE02 17:27:28 The Cluster Service is attempting to bring online the Resource Group "SQL02".
- NODE02 17:27:39 The Cluster Service brought the Resource Group "SQL02" online.
[00:00:39]
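The per-group figures follow the same rule, measured from the "attempting to offline" event on NODE01 to the "brought ... online" event on NODE02. A minimal Python sketch with the timestamps copied from the blocks above (`MOVES` and `seconds_between` are hypothetical names):

```python
from datetime import datetime

# (offline attempt on NODE01, online completed on NODE02), copied from the post.
MOVES = {
    "Cluster Group": ("17:25:30", "17:25:59"),
    "MSDTC": ("17:26:11", "17:26:47"),
    "SQL01": ("17:26:32", "17:27:11"),
    "SQL02": ("17:27:00", "17:27:39"),
}


def seconds_between(start: str, end: str) -> int:
    """Whole seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    return int((datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds())


durations = {group: seconds_between(*span) for group, span in MOVES.items()}
print(durations)  # {'Cluster Group': 29, 'MSDTC': 36, 'SQL01': 39, 'SQL02': 39}
print(sum(durations.values()), seconds_between("17:25:30", "17:27:39"))  # 143 129
```

The individual moves add up to 143 s, more than the 129 s wall-clock total, because some moves overlapped in time (for example, SQL01 was already being taken offline before MSDTC came back online).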

By the way, this is why I was talking about 6 moves. You can see 4 moves
here, which should be followed by 2 more moves in order to distribute the
groups evenly between the two servers. We did not perform the last 2 moves
because of our lack of confidence in the NODE01 server, which has crashed
twice since the beginning of the year :-(

In summary, I am looking for:
(1) Advice on the most efficient way to move all the groups of a
cluster;
(2) Similar Event Log analyses from others.

Finally, I should point out that I have no idea what the SQL "load" was
during the failover; I guess it would be very interesting to have a graph
of failover duration versus load :-))

Many thanks in advance and best regards,

Patrice

