Steve,
At a minimum, capture the iimonitor output of SHOW SESSIONS FORMATTED for
each DBMS server. This shows which mutex each session is waiting for, and
the output looks like this:
---snip---
Session 8000000186DEC200:16610 (???????? ) cs_state: CS_MUTEX
((x) 8000000100C18068) cs_mask:
Mutex: ULM pool (QSF)
---snip---
You can then parse the hex mutex ID out of this and run the iimonitor
command SHOW MUTEX 8000000100C18068:
IIMONITOR> Mutex at 8000000100C18068: Name: ULM pool (QSF), EXCL owner: (tid: 17627, pid: 25129)
Shared: 0 Collisions: 0 Hwm: 0
Excl: 39546636 Collisions: 52870083
This shows the process ID (pid) and thread ID (tid) holding the mutex (if
it is held exclusively). Where you go from there depends on what you find,
but you should be able to analyse the chains of wait conditions and find
what is at the head of the chain.
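The parsing step can be scripted. What follows is only a rough sketch in Python, assuming the output looks exactly like the two samples above (the format may differ between Ingres versions). The function names and the blocked_by mapping are made up for illustration; working out which waiting thread is blocked by which owner is still up to you.

```python
import re
from collections import Counter

# Hex address in "CS_MUTEX ((x) 8000000100C18068)". Whitespace, including a
# mail-wrapped line break, is allowed between the state and the address.
MUTEX_WAIT_RE = re.compile(r"CS_MUTEX\s*\(\(\w\)\s*([0-9A-Fa-f]+)\)")

# Owner line from SHOW MUTEX. The pid group tolerates a line wrap inside the
# number, which is how the sample was wrapped in this mail.
OWNER_RE = re.compile(
    r"Mutex at\s+([0-9A-Fa-f]+):\s*Name:\s*(.*?),\s*EXCL owner:\s*"
    r"\(tid:\s*(\d+),\s*pid:\s*([\d\s]+?)\)",
    re.DOTALL,
)


def waited_mutexes(sessions_text):
    """Count how many sessions are waiting on each mutex address."""
    return Counter(addr.upper() for addr in MUTEX_WAIT_RE.findall(sessions_text))


def mutex_owner(show_mutex_text):
    """Return the exclusive owner from SHOW MUTEX output, or None if absent."""
    match = OWNER_RE.search(show_mutex_text)
    if match is None:
        return None
    addr, name, tid, pid = match.groups()
    return {
        "mutex": addr.upper(),
        "name": name.strip(),
        "tid": int(tid),
        "pid": int("".join(pid.split())),
    }


def head_of_chain(start, blocked_by):
    """Follow waiter -> holder links until reaching a thread that is not waiting.

    blocked_by is a hypothetical dict mapping each waiting thread to the
    thread that owns the mutex it wants. Returns None if the links form a loop.
    """
    seen = set()
    current = start
    while current in blocked_by:
        if current in seen:
            return None
        seen.add(current)
        current = blocked_by[current]
    return current
```

Feed waited_mutexes() the saved SHOW SESSIONS FORMATTED text, run SHOW MUTEX for the busiest addresses, and pass each result to mutex_owner(). The mutex with the most waiters is usually the first one worth looking at.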
Also grab lockstat, logstat and output from ps at the same time.
Inglogs (as mentioned by Linda) does all this for you, including the mutex
listings (it does not do the analysis though).
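If Inglogs is not to hand, the "at the same time" part can be approximated with a small wrapper. This is a hedged sketch only: it assumes lockstat, logstat and ps are on the PATH of the Ingres user, the "-ef" flag for ps and the output directory name are my choices, and it does not drive iimonitor (which is interactive). It needs Python 3.7 or later.

```python
import subprocess
import time
from pathlib import Path

# Commands named in this mail; the ps flags are an assumption.
COMMANDS = {
    "lockstat": ["lockstat"],
    "logstat": ["logstat"],
    "ps": ["ps", "-ef"],
}


def snapshot(commands, out_dir):
    """Run each command once and save its output under a timestamped directory."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(out_dir) / stamp
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for label, argv in commands.items():
        try:
            # The timeout stops a hung utility from blocking the other captures.
            result = subprocess.run(argv, capture_output=True, text=True, timeout=60)
            text = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            text = f"could not run {argv}: {exc}\n"
        path = target / f"{label}.txt"
        path.write_text(text)
        written.append(path)
    return written
```

Calling snapshot(COMMANDS, "mutex-snapshots") while the server is jammed leaves one file per command, so the outputs can be lined up against the iimonitor listings afterwards.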
Notwithstanding all this, you should perhaps also be looking at a few other
options:
1. Upgrading to 9.2
2. If not 9.2, then the latest Service Pack for 2.6. You are some way behind
on 2.6/0305.
3. For 1800+ users you might consider reducing the number of DBMS servers.
This will require many adjustments to the config and plenty of testing, but
may help to reduce contention. However, I would put Ingres 9.2 or the latest
SP ahead of this.
Whichever route you choose, please take a look at the Query Recording and
Playback capability in Ingres, which will help significantly with testing.
The SC930 trace point (the recording bit) is available for 2.6, but you will
need a patch. Here are a couple of references on the subject:
http://www.iua.org.uk/conference/Sum...d_Playback.pdf
http://community.ingres.com/wiki/Dynamic_Playback
Hope that helps
Peter Gale
2010/1/21 Steve McElhinney <stevem (AT) 141 (DOT) com>
Quote:
Ingres 2.6/0305 on AIX 5.2
6 x DBMS's running OS threads.
We are getting MUTEX problems on one of our busy (1800+ user) systems.
Typically one dbms becomes fully jammed with sessions in a MUTEX
state.
After that, IPM/ima (usually) grind to a halt, we end up running
ingstop -force.
This is starting to happen regularly, every 3-4 weeks.
And there isn't much useful info in errlog.log or the DBMS logs ....
Just looking at some previous newsgroup threads, it looks like I need
to establish
what 'flavour/type' of MUTEX I'm getting to take this any further...
(?)
I have seen the iimonitor options "start|stop sampling" and "show
mutex {mid}"
but have not used them on live yet.
Do I need to wait for the system to lock-up again and run these
commands while
the problem is actually occurring to get any meaningful diagnostic
info?
Or is there any low-level monitoring I can run now (and leave running)
to localise this further?
TIA
Steve
--
Peter Gale
pgale61 (AT) gmail (DOT) com