#21
I'm noticing this in top after a system freeze:

550 processes: 317 sleeping, 63 running, 168 stopped

When the system comes back, the number of stopped processes starts dropping quickly. Within a few seconds, all the stopped processes start running and I see something like this:

550 processes: 484 sleeping, 66 running

-----Original Message-----
From: Keith Simmons [mailto:smiley73 (AT) gmail (DOT) com]
Sent: Wednesday, January 12, 2011 2:38 AM
To: Rubinstein, James; informix-list (AT) iiug (DOT) org
Subject: Re: System Freezes

Do you have a console that you can use for monitoring rather than relying on a telnet type session?

I had an issue last week on an AIX box that kept dropping connections, not allowing a connection, 'downing' print queues etc. I identified the issue as a SAMBA validation issue on some file shares, nothing to do with telnet connections but still affecting the network stack on the server.

What other 'networky' type apps are running on this server? Is there anything in their logs? Is there anything in the Apache logs?

Keith

On 12 January 2011 00:12, Rajib Sarkar <rsarkar (AT) us (DOT) ibm.com> wrote:

I'm wondering if you can run sar with output to a file during that time and analyze it later. You can run it as root with a higher priority (renice the shell: renice -n -20 $$). This way you can start with what kind of CPU activity is spiking, i.e. %usr or %sys. If %sys is high, then the sar -c output can show you whether the basic calls (read/write etc.) are higher than normal, which can point to queries with bad plans.

I don't think there's any tool equivalent to tprof or dtrace on HPUX which can shed some light on what's causing the spike. I would suggest having a couple of telnet windows open as root with reniced shells (so that any commands from those windows will run at a higher priority) and capturing sar, vmstat, glance and top data continuously during the spike. The higher priority will ensure the tools run.

The way you have described the issue, I think it's probably more like a paging issue than anything else, but of course only data can show that.

Hope this helps.

Thanx much,
Rajib Sarkar
Sr. Technical Analyst
DB2 UDB APD Team
IBM Data Management Group
http://www.ibm.com/software/data/db2/udb/support/

From his neck down a man is worth a couple of dollars a day, from his neck up he is worth anything that his brain can produce. -- T. Edison

From: "Rubinstein, James" <JRUBIN (AT) midwestern (DOT) edu>
To: "Fernando Nunes" <domusonline (AT) gmail (DOT) com>
Cc: informix-list (AT) iiug (DOT) org
Date: 01/11/2011 04:42 PM
Subject: RE: System Freezes
Sent by: informix-list-bounces (AT) iiug (DOT) org

It isn't a virtualized host. It is a dedicated HPUX box. I'm not seeing substantially more web traffic. I have thousands of other CGIs and I've never seen anything like this before. What is new is some very complex SQL. We're using raw devices.

From: Fernando Nunes [mailto:domusonline (AT) gmail (DOT) com]
Sent: Tuesday, January 11, 2011 4:39 PM
To: Rubinstein, James
Cc: informix-list (AT) iiug (DOT) org
Subject: Re: System Freezes

I understand that your system freezing will hide any useful stuff from top or other tools like that. But if you take some snapshots (vmstat?) it may reveal something.

Also... Is this a virtualized host? Are you sure no other host is "stealing" your CPU cores? (This would appear in the logs...)

Wild guess... CGIs are nasty... Have you checked your netstat output? Do you have enough TCP ports configured? CGIs typically connect/disconnect, and if you have large TIME_WAIT parameters you can end up exhausting your available ports (it would block new connections, but should not affect existing ones).

Are you using raw devices or file system? If it's fs, which one?

As you may have noted, I'm shooting into the air...

On Tue, Jan 11, 2011 at 10:26 PM, Rubinstein, James <JRUBIN (AT) midwestern (DOT) edu> wrote:

Nothing of interest in our system log. The problems started Friday when I turned on my new perl CGIs (which do database operations), were nonexistent over the weekend, when the scripts were not being used, and then started again yesterday, when our university opened for business. I don't notice any freezes during weekends or evenings (when no one is working except for me), and they start again when the web traffic and database activity start up again.

From: informix-list-bounces (AT) iiug (DOT) org [mailto:informix-list-bounces (AT) iiug (DOT) org] On Behalf Of Everett Mills
Sent: Tuesday, January 11, 2011 3:24 PM
Cc: informix-list (AT) iiug (DOT) org
Subject: RE: System Freezes

Those symptoms sound like a hardware issue to me. Have you looked for error messages in /var/adm/syslog/syslog.log?

--EEM

From: informix-list-bounces (AT) iiug (DOT) org [mailto:informix-list-bounces (AT) iiug (DOT) org] On Behalf Of Fernando Nunes
Sent: Tuesday, January 11, 2011 4:07 PM
To: Rubinstein, James
Cc: informix-list (AT) iiug (DOT) org
Subject: Re: System Freezes

On Tue, Jan 11, 2011 at 9:29 PM, Rubinstein, James <JRUBIN (AT) midwestern (DOT) edu> wrote:

I'm running IDS 11.50.FC6 on HPUX 11.31. We recently rolled out an internally developed web-based system (Apache perl/mod_perl) and immediately started noticing that our HPUX system becomes completely unresponsive, for 20-30 seconds at a time, many times throughout the day. My first thought was network problems, but we have pretty much ruled this out since I can connect to a twin HPUX server which seems fine during the outages.

During these system freezes, any connections to the database fail and the system is completely unresponsive to the point that I cannot even type any commands at the shell. I have seen this behavior in the past when the oninit processes use a lot of CPU resources, but it is usually pretty easy to track these down to some bad SQL/report writing.

In this case, I'm trying to figure out what may be causing the system freezes. I have top and the dbtop utility from IIUG, but those don't refresh during our freezes. I am also unable to type any onstat commands until the system comes back. By that time, everything looks pretty normal with low load averages and our oninit processes at normal levels. I'm looking at the various system reports in OAT, but don't see anything that jumps out as the culprit. I'd appreciate any troubleshooting ideas.

It may look as if I'm defending Informix, but I find it very hard to believe that the symptoms you describe can be caused by any kind of bad SQL.

I'd start looking at memory usage etc. Check the OS ratios for filesystem cache vs program memory. Check your memory usage. Try to keep a "top" or similar tool open and see if you notice something. Take frequent snapshots of paging status (so that you can compare before/after counters). Also check your system logs.

Last time I saw something similar (not on HP) it was the filesystem/program memory ratios. It froze the machine whenever a filesystem-intensive operation was run.

By no means am I insinuating that everything is OK with Informix, but whatever happens with it should not cause that effect.

Regards.

--
Fernando Nunes
Portugal
http://informix-technology.blogspot.com
My email works... but I don't check it frequently...

_______________________________________________
Informix-list mailing list
Informix-list (AT) iiug (DOT) org
http://www.iiug.org/mailman/listinfo/informix-list
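
For anyone wanting to compare the two top summary lines quoted at the start of this post, here is a rough Python sketch that parses them and prints the change per process state. It is only an illustration: the function names are made up, and the list of state words (including "zombie") is an assumption about what top may print, not something taken from the HP-UX documentation.

```python
import re

# State words are an assumption; only sleeping/running/stopped appear in the post.
STATE_RE = re.compile(r"(\d+)\s+(sleeping|running|stopped|zombie)")
TOTAL_RE = re.compile(r"(\d+)\s+processes:")


def parse_top_summary(line):
    """Return a dict of process counts parsed from one top summary line."""
    counts = {state: int(n) for n, state in STATE_RE.findall(line)}
    total = TOTAL_RE.search(line)
    if total:
        counts["total"] = int(total.group(1))
    return counts


def state_delta(before, after):
    """Return after-minus-before for every state seen in either snapshot."""
    keys = sorted(set(before) | set(after))
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}


# The two summary lines quoted in the post above.
during = parse_top_summary("550 processes: 317 sleeping, 63 running, 168 stopped")
after = parse_top_summary("550 processes: 484 sleeping, 66 running")
print(state_delta(during, after))
# {'running': 3, 'sleeping': 167, 'stopped': -168, 'total': 0}
```

Fed with summary lines saved at regular intervals, the same two functions would show how quickly the stopped count drains once the box comes back.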
#23
I'm running IDS 11.50.FC6 on HPUX 11.31. We recently rolled out an internally developed web-based system (Apache perl/mod_perl) and immediately started noticing that our HPUX system becomes completely unresponsive, for 20-30 seconds at a time, many times throughout the day. My first thought was network problems, but we have pretty much ruled this out since I can connect to a twin HPUX server which seems fine during the outages.

During these system freezes, any connections to the database fail and the system is completely unresponsive to the point that I cannot even type any commands at the shell. I have seen this behavior in the past when the oninit processes use a lot of CPU resources, but it is usually pretty easy to track these down to some bad SQL/report writing.

In this case, I'm trying to figure out what may be causing the system freezes. I have top and the dbtop utility from IIUG, but those don't refresh during our freezes. I am also unable to type any onstat commands until the system comes back.

By that time, everything looks pretty normal with low load averages and our oninit processes at normal levels. I'm looking at the various system reports in OAT, but don't see anything that jumps out as the culprit. I'd appreciate any troubleshooting ideas.
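
Since interactive tools don't refresh during the freezes, one way to at least pin down when each freeze started and how long it lasted is to log a timestamp every second and look for gaps afterwards. The Python sketch below is only an illustration of that idea under stated assumptions: `heartbeat`, `find_gaps`, the one-second interval and the two-second tolerance are all hypothetical choices, not something from the Informix or HP-UX tooling.

```python
import time


def heartbeat(path, interval=1.0, beats=5):
    """Append one epoch timestamp per line; a frozen box simply stops writing."""
    with open(path, "a") as log:
        for _ in range(beats):
            log.write("%.3f\n" % time.time())
            log.flush()
            time.sleep(interval)


def find_gaps(timestamps, interval=1.0, tolerance=2.0):
    """Return (start, end, seconds) for consecutive beats more than interval + tolerance apart."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > interval + tolerance:
            gaps.append((prev, cur, cur - prev))
    return gaps


# Made-up sample: beats at 0, 1 and 2 seconds, then nothing until 27 seconds.
sample = [0.0, 1.0, 2.0, 27.0, 28.0]
print(find_gaps(sample))
# [(2.0, 27.0, 25.0)]
```

The gap windows can then be lined up against sar or vmstat output captured to a file over the same period.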
#24
On Jan 11, 3:29 pm, "Rubinstein, James" <JRU... (AT) midwestern (DOT) edu> wrote:

I'm running IDS 11.50.FC6 on HPUX 11.31. We recently rolled out an internally developed web-based system (Apache perl/mod_perl) and immediately started noticing that our HPUX system becomes completely unresponsive, for 20-30 seconds at a time, many times throughout the day. My first thought was network problems, but we have pretty much

By that time, everything looks pretty normal with low load averages and our oninit processes at normal levels. I'm looking at the various system reports in OAT, but don't see anything that jumps out as the culprit. I'd appreciate any troubleshooting ideas.
#25