Re: [Info-Ingres] unloaddb stalled in futex
01-16-2009, 07:45 AM
Hi All,
After a lot of experimentation with copydb and chasing my own tail, I
managed to track this down to a single procedure.
Although the procedure is correct and functions perfectly well, it is
probably just the wrong length (Bug 119825).
If I take the line: msg = 'Error: ' +varchar(:errorno)
and change it to: msg = 'Error: ' + varchar(:errorno)
then the unloaddb and copydb work correctly.
Note that this was one of many possible edits...including adding an
empty comment...that make the procedure work in unloaddb/copydb.
Gee, this has been a fun few days!
Marty
From: Martin Bowes
Sent: 14 January 2009 10:10
To: 'Ingres and related product discussion forum'
Subject: RE: unloaddb stalled in futex
I've just managed to restart the Ingres installation and the problem
persists.
Anyone got any ideas on this before I strap on the official DBA leathers
and raise it with IngresCorp?
Marty
From: Martin Bowes
Sent: 12 January 2009 13:57
To: 'Ingres and related product discussion forum'
Subject: unloaddb stalled in futex
Hi All,
I'm running II 9.0.4 (a64.lnx/105)NPTL + p12479.
I have two databases which have decided to stall in unloaddb. Last week's
unload was no problem.
There is no sign of lock contention, and there are no problems listed in the
errlog. Other databases on this host are unaffected.
The unloads have both stalled at the point where they have raised the
diagnostic message:
There are XX rules in the database.
copydb is similarly affected, but sysmod will complete.
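For anyone wanting to check for the same kind of stall without watching a terminal, here is a minimal sketch in Python that runs a command with a time limit and reports whether it finished. The helper name run_with_timeout, the database name mydb and the 600 second limit are all made up for illustration; only the plain "unloaddb dbname" invocation comes from this thread.

```python
import subprocess

def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments) and return (finished, returncode).

    finished is False, and returncode is None, if the command was still
    running after timeout_s seconds. subprocess.run kills the child in
    that case before raising TimeoutExpired.
    """
    try:
        proc = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False, None
    return True, proc.returncode

# Hypothetical usage (database name and limit are assumptions):
#   finished, rc = run_with_timeout(["unloaddb", "mydb"], 600)
#   if not finished:
#       print("unloaddb did not complete within 10 minutes")
```

A timeout only shows that the command did not finish in time; it says nothing about where or why it is stuck, which is what the strace run further down is for.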
If I recover both of these databases to the Disaster Recovery Host then
the problem goes away.
However, if I use relocatedb (on the initial host) to copy the database
to a new database, the unloaddb will stall on the new database as well.
If I run the unloaddb command under strace, its final output is:
write(1, "There are 73 rules in the databa"..., 36There are 73 rules in
the database.
) = 36
stat("/dbsystem/II/ingres/files/symbol.tbl", {st_mode=S_IFREG|0644,
st_size=7936, ...}) = 0
write(4, "\0\0\0\0?\1\0\0?\1\0\0\r\0\0\0\30\0\0\0\'\1\0\0\5 \0\0\0"...,
327) = 327
poll([{fd=4, events=POLLIN, revents=POLLIN}], 1, 2147476137) = 1
read(4, "\0\0\0\0\7\1\0\0\267\0\0\0\25\0\0\0\30\0\0\0\237\0\0\0"...,
4096) = 271
stat("/dbsystem/II/ingres/files/symbol.tbl", {st_mode=S_IFREG|0644,
st_size=7936, ...}) = 0
write(4, "\0\0\0\0\v\3\0\0\v\3\0\0\r\0\0\0\30\0\0\0\363\2\0 \0\5\0"...,
787) = 787
poll([{fd=4, events=POLLIN, revents=POLLIN}], 1, 2147476127) = 1
read(4, "\0\0\0\0\370\17\0\0#\3\0\0\25\0\0\0\30\0\0\0\v\3\0\0\1"...,
4096) = 4096
poll([{fd=4, events=POLLIN, revents=POLLIN}], 1, 2147476117) = 1
read(4, "\0\0\0\0\370\17\0\0\0\0\1\0\0\0\0\0\0\0\0\1\0\0\0 \0\0\0"...,
4096) = 4096
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
rt_sigprocmask(SIG_UNBLOCK, [SEGV], NULL, 8) = 0
lseek(5, 0, SEEK_SET) = 0
read(5, "\4\0\232\2\v\4\0\0\31\0\0\0001\0\0\0D\0\0\0\202\0 \0\0\322"...,
4096) = 4096
lseek(5, 4096, SEEK_SET) = 4096
lseek(5, 1589248, SEEK_SET) = 1589248
read(5, "\314\0\263\0\37\0\211\0\10\00050000E_UG001F\tFEut a: s"...,
4096) = 4096
futex(0x2aaaab256360, FUTEX_WAIT, 2, NULL
The SIGSEGV is impressive...Having just read the Linux manual on futex
and glazed over...I was wondering if anyone can explain the problem and
a solution.
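The trace above ends with a SIGSEGV followed by a futex(FUTEX_WAIT) call that never returns. A small Python sketch for spotting that shape in a saved strace log is below. The function name summarize_trace is made up, and it only looks at the text of the log, so it cannot say why the process is waiting.

```python
import re

# strace prints delivered signals as lines like:
#   --- SIGSEGV (Segmentation fault) @ 0 (0) ---
SIGNAL_RE = re.compile(r"^--- (SIG[A-Z0-9]+)\b")
CALL_RE = re.compile(r"^(\w+)\(")

def summarize_trace(text):
    """Return (signals_seen, last_call, last_call_unfinished).

    last_call is the syscall name on the final non-empty line, or None if
    that line does not start with a call. It counts as unfinished when the
    line has no ') = ' return value, as with a process stuck in the call.
    """
    signals = []
    last_line = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = SIGNAL_RE.match(line)
        if m:
            signals.append(m.group(1))
        last_line = line
    m = CALL_RE.match(last_line)
    last_call = m.group(1) if m else None
    unfinished = last_call is not None and ") = " not in last_line
    return signals, last_call, unfinished
```

Fed the tail of the output above, this would report the SIGSEGV, futex as the last call, and that the call never returned.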
Martin Bowes
PS. I'm negotiating for downtime so I can do an installation restart.
PPS. The Ingres version cannot be upgraded as the host is externally
audited and an attempt to upgrade software has a fairly intensive audit
process attached to it...about 3 months of work!