
[Info-Ingres] unloaddb stalled in futex

comp.databases.ingres comp.databases.ingres


  #1  
Martin Bowes

[Info-Ingres] unloaddb stalled in futex - 01-12-2009, 08:56 AM






Hi All,



I'm running II 9.0.4 (a64.lnx/105)NPTL + p12479.



I have two databases which have decided to stall in unloaddb. Last week's
unload was no problem.

There is no sign of lock contention, and there are no problems listed in
the errlog. Other databases on this host are unaffected.



The unloads have both stalled at the point where they have raised the
diagnostic message:

There are XX rules in the database.



copydb is similarly affected, but sysmod will complete.



If I recover both of these databases to the Disaster Recovery Host then
the problem goes away.

However, if I use relocatedb (on the initial host) to copy the database
to a new database, the unloaddb will stall on the new database as well.



If I run the unloaddb command under strace, its final output is:

write(1, "There are 73 rules in the databa"..., 36There are 73 rules in the database.
) = 36
stat("/dbsystem/II/ingres/files/symbol.tbl", {st_mode=S_IFREG|0644, st_size=7936, ...}) = 0
write(4, "\0\0\0\0?\1\0\0?\1\0\0\r\0\0\0\30\0\0\0\'\1\0\0\5\0\0\0"..., 327) = 327
poll([{fd=4, events=POLLIN, revents=POLLIN}], 1, 2147476137) = 1
read(4, "\0\0\0\0\7\1\0\0\267\0\0\0\25\0\0\0\30\0\0\0\237\0\0\0"..., 4096) = 271
stat("/dbsystem/II/ingres/files/symbol.tbl", {st_mode=S_IFREG|0644, st_size=7936, ...}) = 0
write(4, "\0\0\0\0\v\3\0\0\v\3\0\0\r\0\0\0\30\0\0\0\363\2\0\0\5\0"..., 787) = 787
poll([{fd=4, events=POLLIN, revents=POLLIN}], 1, 2147476127) = 1
read(4, "\0\0\0\0\370\17\0\0#\3\0\0\25\0\0\0\30\0\0\0\v\3\0\0\1"..., 4096) = 4096
poll([{fd=4, events=POLLIN, revents=POLLIN}], 1, 2147476117) = 1
read(4, "\0\0\0\0\370\17\0\0\0\0\1\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0"..., 4096) = 4096
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
rt_sigprocmask(SIG_UNBLOCK, [SEGV], NULL, 8) = 0
lseek(5, 0, SEEK_SET) = 0
read(5, "\4\0\232\2\v\4\0\0\31\0\0\0001\0\0\0D\0\0\0\202\0\0\0\322"..., 4096) = 4096
lseek(5, 4096, SEEK_SET) = 4096
lseek(5, 1589248, SEEK_SET) = 1589248
read(5, "\314\0\263\0\37\0\211\0\10\00050000E_UG001F\tFEut a: s"..., 4096) = 4096
futex(0x2aaaab256360, FUTEX_WAIT, 2, NULL



The SIGSEGV is impressive...Having just read the Linux manual on futex
and glazed over...I was wondering if anyone can explain the problem and
a solution.
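
For anyone who wants to pull the key facts out of a trace like this without reading every line, here is a minimal Python sketch. It is only an illustration: the function name summarize_strace is made up, and it assumes each system call sits on a single line (so wrapped mail text has to be rejoined first). It lists the signals delivered and reports whether the last call ever returned.

```python
import re

def summarize_strace(text):
    """Return (signals, last_call_name, unfinished) for a block of strace output.

    Hypothetical helper: assumes one system call per line.
    """
    signals = []
    last_call = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Signal deliveries look like: --- SIGSEGV (Segmentation fault) @ 0 (0) ---
        sig = re.match(r"^--- (SIG[A-Z0-9]+)\b", line)
        if sig:
            signals.append(sig.group(1))
            continue
        # A system call line starts with the call name and an opening parenthesis.
        call = re.match(r"^([a-z_][a-z0-9_]*)\(", line)
        if call:
            last_call = (call.group(1), line)
    # A call that never returned has no ") = <result>" on its line.
    unfinished = (last_call is not None
                  and re.search(r"\)\s*=\s*\S", last_call[1]) is None)
    return signals, (last_call[0] if last_call else None), unfinished

sample = """\
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
rt_sigprocmask(SIG_UNBLOCK, [SEGV], NULL, 8) = 0
lseek(5, 0, SEEK_SET) = 0
futex(0x2aaaab256360, FUTEX_WAIT, 2, NULL
"""
print(summarize_strace(sample))  # (['SIGSEGV'], 'futex', True)
```

On the tail of the trace above this reports one SIGSEGV followed by a futex call that never completes, which matches the stall: the process is blocked waiting on the futex after the signal was delivered.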



Martin Bowes

PS. I'm negotiating for downtime so I can do an installation restart.

PPS. The Ingres version cannot be upgraded as the host is externally
audited and an attempt to upgrade software has a fairly intensive audit
process attached to it...about 3 months of work!





  #2  
Martin Bowes

Re: [Info-Ingres] unloaddb stalled in futex - 01-14-2009, 05:09 AM






I've just managed to restart the Ingres installation and the problem
persists.



Anyone got any ideas on this before I strap on the official DBA leathers
and raise it with IngresCorp?



Marty



  #3  
Martin Bowes

Re: [Info-Ingres] unloaddb stalled in futex - 01-16-2009, 07:45 AM



Hi All,



After a lot of experimentation with copydb and chasing my own tail, I
managed to track this down to a single procedure.
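
The post does not say how the search was carried out. One systematic way to narrow a failure down to a single object is bisection, sketched below in Python. Everything here is an assumption for illustration: find_culprit, the fails callback, and the procedure names are hypothetical. In practice fails would have to wrap a real test, such as running copydb against a scratch database holding only the given procedures and treating a timeout as a stall.

```python
def find_culprit(items, fails):
    """Return the single item whose presence makes fails(subset) true.

    Assumes exactly one culprit and a deterministic fails() callback.
    Returns None when the full set does not fail at all.
    """
    candidates = list(items)
    if not candidates or not fails(candidates):
        return None
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Keep whichever half still reproduces the failure.
        candidates = half if fails(half) else candidates[len(half):]
    return candidates[0]

# Hypothetical procedure names and a stand-in test, for demonstration only.
procs = ["proc_a", "proc_b", "proc_c", "proc_d", "proc_e"]
stalls = lambda subset: "proc_d" in subset
print(find_culprit(procs, stalls))  # proc_d
```

With N procedures this needs roughly log2(N) test runs instead of N, which matters when each run can hang until it is killed.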



Although the procedure is correct and functions perfectly well, it is
probably just the wrong length. (Bug 119825)

The line I altered originally read: msg = 'Error: ' +varchar(:errorno)

If I change that to: msg = 'Error: ' + varchar(:errorno)
then unloaddb and copydb work correctly.

Note that this was one of many possible edits...including adding an
empty comment...that makes the procedure work in unloaddb/copydb.
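
To make the "wrong length" idea concrete, this small Python check compares the two versions of the line quoted above. It only shows that the edit leaves the statement unchanged apart from whitespace and makes the text one character longer. It says nothing about how Bug 119825 actually works inside Ingres.

```python
before = "msg = 'Error: ' +varchar(:errorno)"
after = "msg = 'Error: ' + varchar(:errorno)"

def strip_spaces(s):
    """Remove all spaces so only the non-blank characters are compared."""
    return s.replace(" ", "")

same_tokens = strip_spaces(before) == strip_spaces(after)
extra_chars = len(after) - len(before)
print(same_tokens, extra_chars)  # True 1
```

That fits the observation that other harmless edits, such as adding an empty comment, also make the problem go away: each one changes the length of the procedure text without changing what it does.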



Gee, this has been a fun few days!



Marty


