
System Freezes

comp.databases.informix

#21
Alexandre Marini
 
Re: System Freezes - 01-12-2011, 11:23 AM

Hello,
I noticed you have DUMPSHMEM 1 in your onconfig file.
Maybe something is making the engine dump its shared memory to disk,
and the engine freezes while the dump is being written?

I suggest turning off the shared memory dump and observing the
engine's behaviour after that.
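
A minimal sketch for checking the current setting before changing it, assuming the configuration file is in the conventional place ($INFORMIXDIR/etc/$ONCONFIG) and uses the usual "PARAMETER value" lines. The helper name onconfig_value is made up for illustration and is not part of Informix.

```shell
#!/bin/sh
# onconfig_value FILE PARAM
# Print the value of PARAM from an onconfig-style file.
# Lines look like "PARAM value"; commented-out lines ("#PARAM ...")
# are ignored because their first field does not equal PARAM.
# Hypothetical helper, shown only as an illustration.
onconfig_value() {
    awk -v p="$2" '$1 == p { print $2; exit }' "$1"
}

# Usage (adjust the path if your layout differs):
#   onconfig_value "$INFORMIXDIR/etc/$ONCONFIG" DUMPSHMEM
```

A value of 0 means no shared memory dump is taken, which is the setting suggested here.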

Best regards.

On 12/01/2011 11:14, Rubinstein, James wrote:
Quote:
I'm noticing this in top after a system freeze:

550 processes: 317 sleeping, 63 running, 168 stopped

When the system comes back, the number of stopped processes starts dropping quickly. Within a few seconds, all the stopped processes start running and I see something like this:

550 processes: 484 sleeping, 66 running
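
One way to see which processes are behind the "stopped" count is to tally the state column of ps -el. This is only a sketch: it assumes the state letter (S column) is the second field of the ps -el output and that T marks a stopped process, which should be checked against the ps man page on the system. The function name proc_states is hypothetical.

```shell
#!/bin/sh
# proc_states
# Read `ps -el` output on stdin and print one line per state letter
# with the number of processes in that state.
# Assumption: the header is line 1 and the S column is field 2.
proc_states() {
    awk 'NR > 1 { n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage:
#   ps -el | proc_states
```

Running this every few seconds into a log file would show whether the stopped count climbs before a freeze or only appears afterwards.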

-----Original Message-----
From: Keith Simmons [mailto:smiley73 (AT) gmail (DOT) com]
Sent: Wednesday, January 12, 2011 2:38 AM
To: Rubinstein, James; informix-list (AT) iiug (DOT) org
Subject: Re: System Freezes

Do you have a console that you can use for monitoring rather than relying
on a telnet-type session?
I had an issue last week on an AIX box that kept dropping connections,
not allowing a connection, 'downing' print queues etc. I identified the issue
as a SAMBA validation issue on some file shares, nothing to do with telnet
connections but still affecting the network stack on the server.
What other 'networky' type apps are running on this server? Is there
anything in their logs, is there anything in the Apache logs?

Keith

On 12 January 2011 00:12, Rajib Sarkar <rsarkar (AT) us (DOT) ibm.com> wrote:
I'm wondering if you can run sar with output to a file during that time and
analyze it later. You can run it as root with a higher priority (renice
the shell: renice -n -20 $$; note that a negative value raises the
priority, a positive one lowers it). This way you can start with what kind of
CPU activity is spiking, i.e. %usr or %sys. If %sys is high, then the
sar -c output can show the basic calls (read/write etc.) and whether they are
higher than normal, which can point to queries with bad plans.

I don't think there's any tool equivalent to tprof or dtrace on HPUX which
can shed some light on what's causing the spike.

I would suggest having a couple of telnet windows open as root with reniced
shells (so that any commands from those windows run at a higher
priority) and capturing sar, vmstat, glance and top data continuously
during the spike. The higher priority will help ensure the tools get to run.

The way you have described the issue, I think it's probably more like a
paging issue than anything else, but of course only data can show that.
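
The continuous capture described above could be sketched roughly as follows. The function name capture, the log file names and the sampling intervals are all hypothetical; the idea is simply to run a monitoring command repeatedly from a high-priority root shell and keep timestamped output for later comparison.

```shell
#!/bin/sh
# capture LABEL COUNT INTERVAL CMD [ARGS...]
# Run CMD COUNT times, INTERVAL seconds apart, appending each output
# to LABEL.log under a timestamp header line.
# Note: plain sh has no local variables, so these names are global.
capture() {
    label=$1; count=$2; interval=$3
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ===" >> "$label.log"
        "$@" >> "$label.log" 2>&1
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: one sample every 5 seconds for an hour, side by side.
#   capture vmstat 720 5 vmstat 1 2 &
#   capture sar_u  720 5 sar -u 1 1 &
```

After a freeze, a missing stretch of timestamps in the logs marks the freeze itself, and the samples just before and after it show whether paging or %sys jumped.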

Hope this helps.

Thanx much,

Rajib Sarkar
Sr. Technical Analyst
DB2 UDB APD Team
IBM Data Management Group
http://www.ibm.com/software/data/db2/udb/support/


From his neck down a man is worth a couple of dollars a day, from his neck
up he is worth anything that his brain can produce. -- T. Edison




From: "Rubinstein, James" <JRUBIN (AT) midwestern (DOT) edu>
To: "Fernando Nunes" <domusonline (AT) gmail (DOT) com>
Cc: informix-list (AT) iiug (DOT) org
Date: 01/11/2011 04:42 PM
Subject: RE: System Freezes
Sent by: informix-list-bounces (AT) iiug (DOT) org



It isn't a virtualized host. It is a dedicated HPUX box.

I'm not seeing substantially more web traffic. I have thousands of other
CGI's and I've never seen anything like this before. What is new is some
very complex SQL.

We're using raw devices.

From: Fernando Nunes [mailto:domusonline (AT) gmail (DOT) com]
Sent: Tuesday, January 11, 2011 4:39 PM
To: Rubinstein, James
Cc: informix-list (AT) iiug (DOT) org
Subject: Re: System Freezes

I understand that your system freezing will hide any useful stuff from top
or other tools like that. But if you take some snapshots (vmstat?) it may
reveal something.
Also... Is this a virtualized host? Are you sure no other host is
"stealing" your CPU cores? (This would appear in the logs...)

Wild guess... CGIs are nasty... Have you checked your netstat output? Do
you have enough TCP ports configured? CGIs typically connect/disconnect, and
if you have large TIME_WAIT parameters you can end up exhausting your
available ports (it would block new connections, but should not affect
existing ones).
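
A quick way to check the TIME_WAIT theory is to count TCP sockets per state. This sketch assumes that netstat -an prints TCP sockets on lines starting with "tcp" and puts the connection state in the last field; the function name tcp_states is made up for illustration.

```shell
#!/bin/sh
# tcp_states
# Read `netstat -an` output on stdin and print the number of TCP
# sockets in each state, largest count first.
# Assumption: TCP lines start with "tcp" and end with the state.
tcp_states() {
    awk '$1 ~ /^tcp/ { n[$NF]++ } END { for (s in n) print n[s], s }' |
        sort -rn
}

# Usage:
#   netstat -an | tcp_states
```

A TIME_WAIT count that approaches the size of the ephemeral port range while the CGIs are busy would support this guess; a small count would rule it out.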

Are you using raw devices or file system? If it's fs, which one?

As you may have noted, I'm shooting into the air...

On Tue, Jan 11, 2011 at 10:26 PM, Rubinstein, James <JRUBIN (AT) midwestern (DOT) edu>
wrote:
Nothing of interest in our system log. The problems started Friday when I
turned on my new perl CGIs (which do database operations), were
nonexistent over the weekend, when the scripts were not being used, and then
started again yesterday, when our university opened for business. I don't
notice any freezes during weekends or evenings (when no one is working
except for me) and they start again when the web traffic and database
activity starts up again.

From: informix-list-bounces (AT) iiug (DOT) org [mailto:informix-list-bounces (AT) iiug (DOT) org] On Behalf Of Everett Mills
Sent: Tuesday, January 11, 2011 3:24 PM

Cc: informix-list (AT) iiug (DOT) org
Subject: RE: System Freezes

Those symptoms sound like a hardware issue to me. Have you looked for
error messages in /var/adm/syslog/syslog.log?

--EEM

From: informix-list-bounces (AT) iiug (DOT) org [mailto:informix-list-bounces (AT) iiug (DOT) org] On Behalf Of Fernando Nunes
Sent: Tuesday, January 11, 2011 4:07 PM
To: Rubinstein, James
Cc: informix-list (AT) iiug (DOT) org
Subject: Re: System Freezes


On Tue, Jan 11, 2011 at 9:29 PM, Rubinstein, James <JRUBIN (AT) midwestern (DOT) edu>
wrote:
I'm running IDS 11.50.FC6 on HPUX 11.31. We recently rolled out an
internally developed web-based system (Apache perl/mod_perl) and
immediately started noticing that our HPUX system becomes completely
unresponsive, for 20-30 seconds at a time, many times throughout the day.
My first thought was network problems, but we have pretty much ruled this
out since I can connect to a twin HPUX server which seems fine during the
outages. During these system freezes, any connections to the database fail
and the system is completely unresponsive to the point that I cannot even
type any commands at the shell. I have seen this behavior in the past when
the oninit processes use a lot of CPU resources, but it is usually pretty
easy to track these down to some bad SQL/report writing. In this case, I'm
trying to figure out what may be causing the system freezes. I have top
and the dbtop utility from IIUG, but those don't refresh during our freezes.
I am also unable to type any onstat commands until the system comes back.
By that time, everything looks pretty normal with low load averages and our
oninit processes at normal levels. I'm looking at the various system
reports in OAT, but don't see anything that jumps out as the culprit. I'd
appreciate any troubleshooting ideas.

It may look as if I'm defending Informix, but I find it very hard to believe
that the symptoms you describe can be caused by any kind of bad SQL.
I'd start looking at memory usage etc. Check the OS ratios for filesystem
cache vs program memory. Check your memory usage. Try to keep a "top" or
similar tool open and see if you notice something. Take frequent snapshots
of paging status (so that you can compare before/after counters).
Also check your system logs.
Last time I saw something similar (not on HP) it was the filesystem/program
memory ratios. It froze the machine whenever a filesystem-intensive
operation was run.

By no means am I insinuating that everything is OK with Informix, but whatever
happens with it should not cause that effect.

Regards.

--
Fernando Nunes
Portugal

http://informix-technology.blogspot.com
My email works... but I don't check it frequently...

_______________________________________________
Informix-list mailing list
Informix-list (AT) iiug (DOT) org
http://www.iiug.org/mailman/listinfo/informix-list





--

Alexandre Marini

Information Technology - DBA

msn: alexandre_marini (AT) hotmail (DOT) com

SEFAZ-MS / SGI-UGSR / Sistemas IBM-Informix


IBM Certified System Administrator - Informix Dynamic Server V10 / V11

#22
Superboer

Re: System Freezes - 01-13-2011, 02:15 AM

Hello James,

Not much to add, I'm afraid; I would ask HP anyway.


You state that the problems started showing up after adding the CGI stuff,
which sounds like it could be related to opening files (number of open
files limit?).
You might want to look at sar -v when this happens,
e.g. start collecting sar output so you can analyze it later:

e.g. sar -A -o ....

Collect all the info so you can choose afterwards.

Also make sure that your DBSPACETEMP setting is correct and the dbspaces
exist; otherwise the database uses the filesystem for sorts, which uses
file descriptors.

We had issues with another IDS, namely HP's intrusion detection system,
which could hang the system for XX or
could hang the machine completely. In that case we had to push the
button...

Is there anything in the syslog when this happens?

Sorry, I have no more suggestions.

Hope above helps a bit.

Superboer.



On 12 Jan, 16:14, "Rubinstein, James" <JRU... (AT) midwestern (DOT) edu> wrote:
Quote:
I'm noticing this in top after a system freeze:

550 processes: 317 sleeping, 63 running, 168 stopped

When the system comes back, the number of stopped processes starts dropping quickly. Within a few seconds, all the stopped processes start running and I see something like this:

550 processes: 484 sleeping, 66 running


#23
mpruet

Re: System Freezes - 01-14-2011, 07:00 PM

On Jan 11, 3:29 pm, "Rubinstein, James" <JRU... (AT) midwestern (DOT) edu> wrote:
Quote:
I'm running IDS 11.50.FC6 on HPUX 11.31. We recently rolled out an
internally developed web-based system (Apache perl/mod_perl) and
immediately started noticing that our HPUX system becomes completely
unresponsive, for 20-30 seconds at a time, many times throughout the
day. My first thought was network problems, but we have pretty much
ruled this out since I can connect to a twin HPUX server which seems
fine during the outages. During these system freezes, any connections
to the database fail and the system is completely unresponsive to the
point that I cannot even type any commands at the shell. I have seen
this behavior in the past when the oninit processes use a lot of CPU
resources, but it is usually pretty easy to track these down to some bad
SQL/report writing. In this case, I'm trying to figure out what may be
causing the system freezes. I have top and the dbtop utility from IIUG,
but those don't refresh during our freezes. I am also unable to type any
onstat commands until the system comes back.

If you can't get top or onstat commands to work, then that indicates a
problem at the OS level. Now this could be some looping in the
database code, but probably not.

Are you overconfiguring the system (i.e. having more CPU VPs than you
have CPUs)? That might cause this type of problem because of lock
inversion (often called priority inversion). This is the nasty problem
where one process holds a lock and is running at a lower priority than
another process which is trying to obtain the lock. Basically you have
to wait until the process trying to acquire the lock ages enough so
that its OS priority drops. Nasty problem.
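
To compare the configured CPU VP count with the number of processors, something like the following sketch could be used. It assumes the onconfig file sets the count either with a "VPCLASS cpu,num=N,..." line or with the older "NUMCPUVPS N" line; the helper name cpu_vps is hypothetical, and it prints nothing if neither form with an explicit number is found.

```shell
#!/bin/sh
# cpu_vps FILE
# Print the number of CPU virtual processors set in an onconfig-style
# file. Prefers "VPCLASS cpu,num=N,..." and falls back to "NUMCPUVPS N".
# Commented-out lines are ignored because their first field differs.
cpu_vps() {
    awk '
        $1 == "VPCLASS" && $2 ~ /^cpu,/ {
            k = split($2, opt, ",")
            for (i = 1; i <= k; i++)
                if (opt[i] ~ /^num=/) vp = substr(opt[i], 5)
        }
        $1 == "NUMCPUVPS" { old = $2 }
        END {
            if (vp != "") print vp
            else if (old != "") print old
        }
    ' "$1"
}

# Usage (conventional path, adjust as needed):
#   cpu_vps "$INFORMIXDIR/etc/$ONCONFIG"
```

If the number printed is larger than the processor count the OS reports, the system is overconfigured in the sense described above.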

One thing that you might consider is something like the following...

while true; do
    date              # timestamp each sample
    onstat -g glo     # multithreading info, including CPU time per VP
    sleep 1
done

When/if the problem reoccurs, you can check the output to see if it
looks like the CPU time increased on one of the CPU VPs when the hang
occurred...
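
Since the loop itself stalls while the machine is frozen, each freeze also shows up as a gap between consecutive timestamps. A rough sketch for finding those gaps, assuming a separate log with one epoch timestamp (in seconds) per line; the function name find_gaps and the file name ticks.log are hypothetical, and perl is used only as a portable way to print epoch seconds.

```shell
#!/bin/sh
# find_gaps THRESHOLD
# Read one epoch timestamp (seconds) per line on stdin. For every pair
# of consecutive samples more than THRESHOLD seconds apart, print:
#   <previous timestamp> <current timestamp> <gap in seconds>
find_gaps() {
    awk -v t="$1" '
        NR > 1 && $1 - prev > t { print prev, $1, $1 - prev }
        { prev = $1 }
    '
}

# Logging side, left running in a spare window:
#   while true; do perl -e "print time, qq(\n)"; sleep 1; done >> ticks.log
# Afterwards, list every pause longer than 5 seconds:
#   find_gaps 5 < ticks.log
```

The start and end times of each gap can then be matched against the onstat -g glo samples taken just before and after it.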

Quote:
By that time, everything
looks pretty normal with low load averages and our oninit processes at
normal levels. I'm looking at the various system reports in OAT, but
don't see anything that jumps out as the culprit. I'd appreciate any
troubleshooting ideas.

#24
caver

Re: System Freezes - 01-15-2011, 07:15 AM

James,
We had a situation that might be similar: we were running an Informix,
Apache, HPUX and perl web application.
Our problem had nothing to do with Informix though, and everything to do
with the Apache version/configuration. What version of Apache are you
running? I will go back and look at my notes, but we were in a panic
since it was a production system, so we threw three fixes in at one time
and one of the three fixed the issue; none of them had anything to do
with Informix:
1) Changed the httpd.conf configuration file
2) Downgraded the version of Apache (we were using 2.2)
3) I'll have to check my notes, I forgot #3.
I always believed it was the version of Apache. We always use the
pre-compiled version from HP. Once we backed off on the Apache version,
the CPU stopped pegging at 100%.
We had a call open with HP, so HP should have case notes.
Might be worth a try.
-Daniel
Quote:
On Jan 11, 3:29 pm, "Rubinstein, James" <JRU... (AT) midwestern (DOT) edu> wrote:

I'm running IDS 11.50.FC6 on HPUX 11.31. We recently rolled out an
internally developed web-based system (Apache perl/mod_perl) and
immediately started noticing that our HPUX system becomes completely
unresponsive, for 20-30 seconds at a time, many times throughout the
day. My first thought was network problems, but we have pretty much

By that time, everything
looks pretty normal with low load averages and our oninit processes at
normal levels. I'm looking at the various system reports in OAT, but
don't see anything that jumps out as the culprit. I'd appreciate any
troubleshooting ideas.

#25
Ulf

Re: System Freezes - 01-16-2011, 04:01 AM

Are you using PRM (Process Resource Manager)? We had similar
problems when we hit the limits set up in PRM.

Even if you are not using PRM, I would think that the problem is to be
found in HP-UX. The fact that nothing in the syslog indicates this
does not prove that you have not hit limits in the OS, such as the
number of files etc.

Ulf
