Cluster fails

microsoft.public.sqlserver.clustering

JonathanS
 
Cluster fails - 02-23-2004, 05:58 AM

Hi

I have SQL2K SP3 on W2K SP4 in an active/passive cluster. Recently I
have had problems with the cluster either failing over or attempting
to fail over and then hanging in an inaccessible state. The following
errors appear in the W2K Application log...

[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed

[sqsrvres] printODBCError: sqlstate = HYT00; native error = 0; message
= [Microsoft][ODBC SQL Server Driver]Timeout expired

[sqsrvres] OnlineThread: QP is not online.

The above are repeated for a while and then...

[sqsrvres] ODBC sqldriverconnect failed

[sqsrvres] checkODBCConnectError: sqlstate = 08001; native error = b;
message = [Microsoft][ODBC SQL Server Driver][DBNETLIB]General network
error. Check your network documentation.

The above are repeated for a while and then various other errors
including...

[sqsrvres] CheckServiceAlive: Service is dead

I am also seeing the following repeated many times in the run up to
the shutdown/failover in the SQL Error log...

2004-02-22 11:53:58.56 spid73 WARNING: EC 1aefd588, 0 waited 300
sec. on latch 1a1426c0. Not a BUF latch.
2004-02-22 11:53:58.56 spid73 Waiting for type 0x3, current count
0xa, current owning EC 0x1A197588.

Obviously with different SPIDs and ECs. The errors occur at different
times of the day and there are no common scheduled jobs or maintenance
tasks that I have noticed.
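
To check whether the repeated latch warnings point at one latch or one owning EC, the error-log entries can be tallied. This is a minimal Python sketch, assuming the log text has the same format as the two entries quoted above. The function name and the regular expressions are illustrative and are not part of any SQL Server tooling.

```python
import re
from collections import Counter

# Patterns modelled on the two log entries quoted in this post (an assumption:
# other builds or service packs may word these messages differently).
WARNING_RE = re.compile(
    r"WARNING: EC [0-9a-fA-F]+, \d+ waited \d+ sec\. on latch (?P<latch>[0-9a-fA-F]+)\."
)
OWNER_RE = re.compile(r"current owning EC 0x(?P<owner>[0-9a-fA-F]+)")

# The entries from the post, wrapped exactly as they were pasted.
SAMPLE = """\
2004-02-22 11:53:58.56 spid73 WARNING: EC 1aefd588, 0 waited 300
sec. on latch 1a1426c0. Not a BUF latch.
2004-02-22 11:53:58.56 spid73 Waiting for type 0x3, current count
0xa, current owning EC 0x1A197588.
"""


def summarize_latch_waits(log_text):
    """Count how often each latch address and each owning EC appears."""
    # Collapse line wraps so an entry split across lines still matches.
    flat = re.sub(r"\s+", " ", log_text)
    latches = Counter(m.group("latch").lower() for m in WARNING_RE.finditer(flat))
    owners = Counter(m.group("owner").lower() for m in OWNER_RE.finditer(flat))
    return latches, owners


if __name__ == "__main__":
    latches, owners = summarize_latch_waits(SAMPLE)
    print("latches:", latches.most_common(5))
    print("owning ECs:", owners.most_common(5))
```

If one latch or one owning EC dominates the counts, that narrows the search to whatever that EC was running at the time. If the counts are evenly spread, a single blocking session is less likely.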

Having searched, I have found many posts from people with similar
problems but have not managed to find a solution. I would really
appreciate any advice anyone can offer because this has been going on
for a while and it's really starting to bug me.

Thanks


Jonathan

Jeff Zavada
 
Re: Cluster fails - 02-24-2004, 12:16 PM

I am seeing the same thing running W2K SP3 and SQL 2K SP3a in an
active/passive cluster for SAP. I see the same errors in the event
log, but I see no errors in the SQL error log. The closest event in
the SQL error log was a log backup that completed successfully 36
minutes prior to the failover. I see no DB corruption from the DBCC
checks that ran earlier in the morning. Right now I am stumped. If
anyone has any info, that would be great.


Cindy Gross
 
Re: Cluster fails - 02-25-2004, 03:27 PM

"Timeout expired" indicates that SQL Server is too busy to respond to the
cluster's LooksAlive/IsAlive checks. This could be the result of a
legitimately busy instance, or SQL Server could be "hung" or otherwise
unable to respond.

There are many potential root causes. To narrow them down, you need to find
out whether other local connections succeed, whether existing connections
succeed but new connections fail, whether TCP or named pipes connections
succeed, whether Windows itself is not responding, etc. You can also do some
general performance testing/tracking/tuning and use SQL Profiler and/or
PerfMon to track what is going on in Windows and in SQL Server.
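
One of the checks above, whether a new TCP connection can be opened at all, can be scripted and left running during an incident. This is a minimal Python sketch using only the standard library. The host name `sqlvirtual` is hypothetical, and port 1433 is assumed (the usual default for a default instance, which may not match your configuration). It only tests that the listener accepts a TCP connection. It does not log in or run a query, so a success here does not show that the query processor is responding.

```python
import socket
import time


def probe_tcp(host, port, timeout=5.0):
    """Try one TCP connection; return (ok, elapsed_seconds, detail)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, "connected"
    except OSError as exc:
        return False, time.monotonic() - start, type(exc).__name__


if __name__ == "__main__":
    # Hypothetical virtual server name and assumed port; substitute your own.
    ok, elapsed, detail = probe_tcp("sqlvirtual", 1433, timeout=5.0)
    print(f"ok={ok} elapsed={elapsed:.3f}s detail={detail}")
```

Running the probe both from the active node and from a client machine, and comparing the results with a named pipes connection attempt, helps separate a network-level problem from an instance that accepts connections but does no work.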

Cindy Gross, MCDBA, MCSE
http://cindygross.tripod.com
This posting is provided "AS IS" with no warranties, and confers no rights.


JonathanS
 
Re: Cluster fails - 03-02-2004, 05:26 AM

Thanks for your responses.

Cindy, during the periods when SQL Server is generating the errors, new user connections are still accepted, but little or no work appears to get done. After we modified the login timeout for the IsAlive call, the server no longer fails over, but it still has periods of processing little or no work, even though load has not increased according to all the obvious PerfMon counters.

We are currently using PerfMon to track the Proc Cache counters. We believe there may be a memory leak related to procedure pages: the number of Proc Cache pages for SPs increases steadily over time, and although it plateaus at very low load, it continues to increase as the load returns.

We have discounted h/w issues. It may be of interest that the application uses a JDBC/ODBC bridge to connect to SQL Server, and we are suspicious of this implementation.
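
The "steadily increases over time" observation can be put on a numeric footing by fitting a straight line to the sampled page counts. This is a rough Python sketch, assuming the PerfMon samples have been exported as (hours since start, page count) pairs. The function name is illustrative and the sample values in the usage section are made up for demonstration, not measurements from this cluster.

```python
def growth_per_hour(samples):
    """Least-squares slope of page count against time, in pages per hour.

    samples: sequence of (hours_since_start, page_count) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((t - mean_t) * (p - mean_p) for t, p in samples)
    return cov / var_t


if __name__ == "__main__":
    # Made-up illustrative values, not real counter data.
    demo = [(0, 1000), (1, 1100), (2, 1200), (3, 1300)]
    print(f"{growth_per_hour(demo):.1f} pages/hour")
```

A positive slope on its own does not prove a leak, since the procedure cache normally grows under load until memory pressure trims it. Comparing the slope across busy and quiet periods, and checking whether the count ever falls, gives a better indication.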
