Failover Question

microsoft.public.sqlserver.clustering


Failover Question - 08-31-2004, 12:29 PM

We recently set up an active/passive SQL Server 2000
cluster on Windows Server 2003 for our customer. They ran their SQL
client application, which was doing bulk inserts and some
reads (using an ADO stored procedure?).

When we induced a failure (pulled the network cable), the
failover took about 90 seconds, and the client
application eventually reported a failure because
it was disconnected. The customer said this was not
acceptable (90 seconds of downtime). They wanted something
more like a 1-2 second failover.

We are now investigating an Active/Active Cluster where
the SQLCLUSTER1 instance manages the database on the
primary server in the cluster and the SQLCLUSTER2 instance
will manage a database that is a replica (using
transactional replication?) of the primary database on the
secondary server. If a failure happens during access to
the SQLCLUSTER1 instance and database, the application
will detect the failure, and reconnect instantly to the
SQLCLUSTER2 instance which will be managing a duplicate
copy of the database.

Does that sound like it will work? When the application calls a
stored procedure through ADO, how quickly will the call doing the
SELECT return after a failure, so that the client can connect to
SQLCLUSTER2? For high-availability applications, I would imagine
that 1-2 seconds of downtime is not an unreasonable expectation?

I would "really" appreciate an answer to these questions.

Thanks.


Geoff N. Hiten
Re: Failover Question - 08-31-2004, 01:29 PM

Failover times vary. 90 seconds is not unreasonable, especially on a
system with lots of memory (12 GB and up); 1-2 seconds is highly
unrealistic in every situation. One reason failover takes as long as it
does is that the cluster management software runs a series of tests to
make sure the instances do not fail over unnecessarily. The client
application must be able to detect and handle a minute or two of
disconnect while the cluster handles a failover. No matter how long the
failover lasts, the client will see a connection loss and will need to
reconnect.
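
As a rough illustration of "detect and handle a minute or two of disconnect", here is a minimal Python sketch of a reconnect loop. It is not tied to ADO or any particular driver: `connect` is a hypothetical zero-argument callable that either returns a connection or raises while the cluster is failing over, and the 180-second window and 5-second delay are assumed values, not recommendations from this thread.

```python
import time


def connect_with_retry(connect, max_wait=180.0, delay=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call connect() until it succeeds or max_wait seconds have elapsed.

    connect is a hypothetical callable supplied by the application; it
    should return a connection object or raise while the server is down.
    Returns a (connection, attempts) tuple on success.
    """
    deadline = clock() + max_wait
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect(), attempts
        except Exception as exc:  # narrow to the driver's error type in real code
            # Give up if another wait would take us past the deadline.
            if clock() + delay > deadline:
                raise TimeoutError(
                    f"no connection after {attempts} attempts") from exc
            sleep(delay)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting. Any work that was in flight when the connection dropped still has to be re-submitted by the application after the reconnect.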

As for replicating to a target system, be aware that replication will not
copy everything in a database, only data in specific tables. Only a few
special DDL commands will replicate, leaving your target system with an
older schema. How do you intend to reconcile your failed instance with the
data in the now-live instance? Now describe how to do it after a couple of
fast failover and failback incidents.

If the reads can be on completely different connections than the inserts,
you might look into a small NLB cluster where the read-only load can be
shared among several nodes. If a node fails, queries are re-routed to
another node. Have a master 'insert' node that replicates to the front-end
nodes and you are in business. You can even use Standard Edition nodes.
The licensing savings may even pay for the difference in hardware.
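
NLB does the re-routing at the network layer, so the client needs no special code for it. Purely to illustrate the idea of "if a node fails, queries are re-routed to another node", here is a small application-level Python sketch. The node names and the `execute(node, query)` callable are hypothetical stand-ins, not part of NLB or of any SQL Server API.

```python
def run_read(query, nodes, execute):
    """Try a read-only query on each node in turn.

    nodes is an ordered list of hypothetical read-only node names;
    execute is a hypothetical callable execute(node, query) that returns
    rows or raises ConnectionError if the node is unreachable.
    Returns (node_that_answered, rows).
    """
    failed = []
    for node in nodes:
        try:
            return node, execute(node, query)
        except ConnectionError:
            failed.append(node)  # remember the dead node and move on
    raise ConnectionError(f"all read nodes failed: {failed}")
```

This only works for the read-only load. Inserts still go to the single master node, which keeps the replication one-way and avoids the reconciliation problem described above.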

--
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com

I support the Professional Association for SQL Server
www.sqlpass.org

John Toner [MVP]
Re: Failover Question - 08-31-2004, 11:31 PM

Clusters are not fault tolerant systems; they're high availability
solutions. This means that you'll experience some downtime while the
application restarts on the failover node. 90 seconds of downtime is very
acceptable for an HA solution; 1-2 seconds is very unreasonable.
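
To put 90 seconds in perspective, here is a quick back-of-the-envelope calculation in Python. The availability targets are assumed examples (the thread does not state the customer's target), and the year is taken as 365 days.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years


def downtime_budget(availability):
    """Seconds of downtime per year allowed by an availability target."""
    return SECONDS_PER_YEAR * (1.0 - availability)


def failovers_within_budget(availability, failover_seconds=90):
    """How many whole failovers of the given length fit in a year's budget."""
    return int(downtime_budget(availability) // failover_seconds)
```

Under those assumptions, a 99.99% target allows roughly 3,154 seconds of downtime a year, which is room for 35 failovers of 90 seconds each. Even a 99.999% target (about 315 seconds a year) leaves room for three.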

If you're looking for better uptime, you'll need to look into a "fault
tolerant" solution. Even this will not protect you from an application
crash, though it will protect you from hardware failures. You can take a
look at the following solution to see if it meets your needs.

http://www.stratus.com/products/ftserver/index.htm

Regards,
John

Re: Failover Question - 09-01-2004, 12:00 PM

Thanks Geoff and John for your replies. They were both
great and just what I needed!
