Host machine powered off without closing IDS instance - soc connections now won't work

comp.databases.informix


#1  Malc - 10-19-2010, 10:22 AM

IDS9.30HC5 on HP-UX 11i (yes I know it's old).
One of our test database server machines was powered off the other day
by the Unix team without us taking the IDS instance down first, and
since then we've found that connections using onsoctcp fail with
"-27001 Read error occurred during connection attempt." ipcshm
connections are fine.
Another instance on that machine is working properly with no errors,
just lucky I guess.

Unix can't find anything wrong with any network connections on the
machine, the /etc/services file was not changed, nothing different.
The online log for the server comes up with "listener-thread: err =
-408: oserr = 0: errstr = : Invalid message type received from the
sqlexec process" if we try to telnet to the port the onsoctcp service
is running on, so I'm assuming connection attempts are getting to the
engine but it's failing to recognise or act on them properly. We've
also tried altering the port number in /etc/services and the same
things happen.

I reckon an internal stack on the IDS instance was trashed when the
machine was powered down. What I'm wondering is, would a cold restore
fix it (especially if I do an "oninit -iy" beforehand)? We have a backup
available from before the power-off.

Cheers
Malc

#2  Fernando Nunes - 10-19-2010, 12:57 PM

Sorry... I sent an email directly to the OP. Now for the list:

It doesn't make sense.
No "stack" could be destroyed that would affect this.
Are the instances that run OK on the same IDS version as this one?
Why did they stop the machine... to install a patch?
Did the online.log register any change in $ONCONFIG?

--
Fernando Nunes
Portugal

http://informix-technology.blogspot.com
My email works... but I don't check it frequently...

#3  Malc - 10-19-2010, 01:26 PM

On Oct 19, 6:57 pm, Fernando Nunes wrote:
Quote:
It doesn't make sense.
No "stack" could be destroyed that would affect this.
Are the instances that run OK on the same IDS version as this one?
Why did they stop the machine... to install a patch?
Did the online.log register any change in $ONCONFIG?

All three instances on that machine are running the same server
version, just with separate ONCONFIGs.
Unix powered the machine down to add a couple of disks and a network
card on a separate LAN, that's all (which is why I tried using different
service numbers for the instance that's causing a problem, to see if
there was a conflict).
Since power-up and restarting the instances, one of them won't accept
socket connections.
No config files have changed anywhere, I just brought the instances up
as normal.
I know it's bad form to power a machine off without taking the
instances down but we weren't given a chance!

Confused? You bet.

#4  Andreas Legner - 10-19-2010, 03:24 PM

Surely a cold restore won't fix it: nothing contained in IDS chunks affects
connectivity in any way, so restoring them wouldn't correct anything relevant.

I'd rather look at things like name resolution (DNS) and user authentication in the OS.
You might want to truss the MSC VP, which performs this type of work during
connection and session initialisation.
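
As a quick way to test the name-resolution side of this, a minimal Python sketch along the following lines asks the OS resolver for the host and service names used in an sqlhosts entry. The helper name `check_names` is hypothetical, and the sketch covers only the lookups, not user authentication or anything the MSC VP does internally.

```python
import socket


def check_names(host, service, proto="tcp"):
    """Resolve a host name and a service name via the OS resolver.

    Returns (address, port); either element is None if its lookup fails.
    """
    try:
        address = socket.gethostbyname(host)
    except OSError:
        address = None
    try:
        if service.isdigit():
            # A purely numeric service field is already a port number.
            port = int(service)
        else:
            # Looks the name up in the services database (e.g. /etc/services).
            port = socket.getservbyname(service, proto)
    except OSError:
        port = None
    return address, port
```

Run on the database host as `check_names("dev", "itsoc")`, using the host and service names from this thread, a `None` in either position would point at the hosts/DNS or /etc/services side rather than at IDS itself.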

Regards,
Andreas


#5  jrenaut - 10-19-2010, 03:33 PM

Hmm, that 27001 error generally means we tried to do a read on the
socket and got a 0-byte return; because we had expected data to
be there, we assume we got 0 bytes because the other side of the
connection closed down. I wouldn't do an archive restore or anything, but
you said this was a test machine, yeah? I might try bringing all of
the Informix instances offline and then clearing out the /INFORMIXTMP
directory (make sure you get all hidden files; several start
with .something). I've seen files left over in this directory cause
problems before, but I don't think it was 27001 errors specifically.
I don't think there is anything in that directory that won't get
rebuilt (but if you are worried you could make a copy of everything in
there before clearing it out). After you clear it out,
bring the servers back up and see if all the instances' socket
connections work at that point.
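
To illustrate the 0-byte read described above, here is a minimal Python sketch. It is not Informix code: a throwaway local listener accepts one connection and closes it straight away, and the client's `recv` then returns an empty byte string, which is how a TCP client sees the peer going away. The function name and the loopback setup are made up for illustration.

```python
import socket
import threading


def read_after_peer_close():
    """Connect to a local listener that closes at once; return what recv() yields."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def accept_and_close():
        conn, _ = listener.accept()
        conn.close()  # the server side goes away without sending anything

    worker = threading.Thread(target=accept_and_close)
    worker.start()

    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    try:
        data = client.recv(1024)  # the data the client expected never arrives
    finally:
        client.close()
        worker.join()
        listener.close()
    return data
```

An empty result from `read_after_peer_close()` is the same condition a client library would have to interpret as "the other side closed down".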

Jacques Renaut
IBM Informix Advanced Support
APD Team

#6  Fernando Nunes - 10-19-2010, 03:53 PM


Nothing was changed in the routing? Was this the first instance to be
brought online? Have you restarted the failing instance or one of the
others?
Believe us: it was not the fact that the machine was turned off with the
instance online that caused this.
It's clear that the objective of the change was something related to the
network, and I bet the problem has only to do with the network and will be
solved with some network change... Maybe it was solved after this instance
came online and before the others. Or the opposite.
Also, are you using "*" in SQLHOSTS for this or the other instances?
--
Fernando Nunes
Portugal


#7  Malc - 10-19-2010, 04:06 PM

Well, all 3 instances have been brought up and down a couple of times
since the machine was restarted, and it's still only that one instance
("itsoc") that is exhibiting any issues.
Anyway, I've now trashed it with oninit -i and am at present restoring
it; that should finish in 4 hours or so, so I'll report on the situation
tomorrow.
I must admit that anything the Unix boys say ("We didn't do anything,
honest") can often be taken with a grain (aka an imperial "pinch") of
salt.

Cheers for reading my drivel, guys, appreciated.

/INFORMIXTMP lists as follows. (Instance qrepsoc hasn't been online or
used since 2005.)

dev:informixmxp):/opt/informix/reporting>ls -al /INFORMIXTMP/
total 112
drwxrwxrwt   2 root     informix  8192 Oct 19 20:26 .
drwxr-xr-x  42 root     root      8192 Oct 19 15:29 ..
-r--r--r--   1 root     informix    40 Oct 18 15:01 .inf.nothing1
-r--r--r--   1 root     informix    40 Oct 19 20:26 .inf.nothing2
-r--r--r--   1 root     informix    40 Oct 19 16:03 .inf.nothing3
-r--r--r--   1 root     informix    40 Jul  6  2005 .inf.qrepshm
-rwxrwxrwx   1 root     informix    14 Aug 19  2003 .infxdirs
srwxrwxrwx   1 root     informix     0 Oct 18 15:01 VP.devsoc.0101s
srwxrwxrwx   1 informix informix     0 Oct 18 15:04 VP.devsoc.0401s
srwxrwxrwx   1 informix informix     0 Oct 18 15:04 VP.devsoc.0501s
srwxrwxrwx   1 root     informix     0 Oct 19 16:03 VP.evalsoc.0101s
srwxrwxrwx   1 informix informix     0 Oct 19 16:09 VP.evalsoc.0401s
srwxrwxrwx   1 root     informix     0 Oct 19 20:26 VP.itsoc.0101s
srwxrwxrwx   1 dba      informix     0 Oct  4 09:12 VP.itsoc.0401s
srwxrwxrwx   1 dba      informix     0 Oct  4 09:29 VP.itsoc.0501s
srwxrwxrwx   1 root     informix     0 Jul  6  2005 VP.qrepsoc.0101s
srwxrwxrwx   1 informix informix     0 Jul  6  2005 VP.qrepsoc.0301s

#8  Malc - 10-19-2010, 04:32 PM

Oh and SQLHOSTS contains:

# Define 'dev' server connections
devshm     onipcshm  dev  nothing1
g_devsoc   group     -    -         i=100
devsoc     onsoctcp  dev  devsoc    g=g_devsoc

# Define 'it' server connections
itshm      onipcshm  dev  nothing2
g_itsoc    group     -    -         i=101
itsoc      onsoctcp  dev  itsoc     g=g_itsoc

# Define 'eval' server connections
evalshm    onipcshm  dev  nothing3
g_evalsoc  group     -    -         i=103
evalsoc    onsoctcp  dev  evalsoc   g=g_evalsoc
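
For anyone wanting to cross-check entries like these against /etc/services, a small Python sketch that splits sqlhosts lines into their fields might look like this. It assumes the whitespace-separated layout used above (server name, connection type, host, service, then options); `parse_sqlhosts` and `tcp_services` are hypothetical helpers, not Informix tools.

```python
def parse_sqlhosts(text):
    """Split sqlhosts text into dicts, skipping blank lines and # comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed or wrapped line; ignored in this sketch
        entries.append({
            "server": fields[0],
            "nettype": fields[1],
            "host": fields[2],
            "service": fields[3],
            "options": fields[4:],
        })
    return entries


def tcp_services(entries):
    """Return {server: service} for the onsoctcp entries only."""
    return {e["server"]: e["service"] for e in entries if e["nettype"] == "onsoctcp"}
```

The service names returned by `tcp_services` are the ones that need a matching line in /etc/services (or a numeric port) for socket connections to be set up.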

#9  Malc - 10-20-2010, 04:21 AM

OK we've fixed it, and thanks to Jacques Renaut for the initial clue!

With the instance up (and still refusing socket connections) we looked
at /INFORMIXTMP and found the following for the instance giving
problems:
srwxrwxrwx   1 root     informix     0 Oct 19 20:05 VP.itsoc.0101s
srwxrwxrwx   1 informix informix     0 Oct  4 09:12 VP.itsoc.0401s
srwxrwxrwx   1 informix informix     0 Oct  4 09:29 VP.itsoc.0501s

The accidental powerdown was on Sunday 17th, so the files from Oct 4th
were obviously left behind at that point.
We just took the instance down and saw that the two Oct 4 files
remained, so we removed the files (renamed them with mv) and restarted
the server and all is OK.
We're a little dubious as to whether we may have broken something
somewhere else by moving the files, so we'll arrange for a clean
reboot which hopefully should ensure that all is tidy.
Thanks for your help on this, appreciated.
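
The check done by eye here could be sketched in Python roughly as follows: list the VP.<server>.* entries in the directory, flag those last modified before a cutoff (for example, the time the machine came back up), and move them aside rather than delete them. The function names, the cutoff argument and the backup directory are assumptions for illustration, the backup directory is assumed to be on the same filesystem, and something like this should only be run with the instance down.

```python
import os


def stale_vp_files(directory, server, cutoff_epoch):
    """Return sorted names of VP.<server>.* entries last modified before cutoff_epoch."""
    prefix = "VP.%s." % server
    stale = []
    for name in os.listdir(directory):
        if not name.startswith(prefix):
            continue
        # lstat inspects the entry itself, without following symlinks.
        if os.lstat(os.path.join(directory, name)).st_mtime < cutoff_epoch:
            stale.append(name)
    return sorted(stale)


def move_aside(directory, names, backup_dir):
    """Rename the named entries into backup_dir (same filesystem) instead of deleting them."""
    os.makedirs(backup_dir, exist_ok=True)
    for name in names:
        os.rename(os.path.join(directory, name), os.path.join(backup_dir, name))
```

With the server name `itsoc` and a cutoff of the reboot time, the two Oct 4 files in the listing above are the ones this would be expected to flag.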
