
Localising MUTEX lock-up on Ing 2.6/0305

comp.databases.ingres





  #1  
Steve McElhinney
 

Localising MUTEX lock-up on Ing 2.6/0305 - 01-21-2010, 11:52 AM






Ingres 2.6/0305 on AIX 5.2
6 x DBMS servers running OS threads.

We are getting MUTEX problems on one of our busy (1800+ user) systems.
Typically one dbms becomes fully jammed with sessions in a MUTEX
state.
After that, IPM/ima (usually) grind to a halt, we end up running
ingstop -force.

This is starting to happen regularly, every 3-4 weeks.
And there isn't much useful info in errlog.log or the DBMS logs ....

Just looking at some previous newsgroup threads, it looks like I need to
establish what 'flavour/type' of MUTEX I'm getting to take this any
further... (?)

I have seen the iimonitor options "start|stop sampling" and
"show mutex {mid}" but have not used them on live yet.
Do I need to wait for the system to lock-up again and run these commands
while the problem is actually occurring to get any meaningful diagnostic
info?
Or is there any low-level monitoring I can run now (and leave running)
to localise this further?

TIA
Steve

  #2  
Ingres Forums
 

Re: Localising MUTEX lock-up on Ing 2.6/0305 - 01-21-2010, 12:28 PM






Hello Steve,
If you have an Ingres contract, you can open an issue and get a copy of
inglogs.sh to run at the time the system hangs. This way we can see why
the system is hanging.
Thanks,
Linda


--
andli02

  #3  
Peter Gale
 

Re: [Info-Ingres] Localising MUTEX lock-up on Ing 2.6/0305 - 01-22-2010, 04:16 AM



Steve,

The minimum you need to do is grab the iimonitor output of SHOW
SESSIONS FORMATTED for each DBMS server.
This shows which mutex each session is waiting for. The output looks
like this:

---snip---
Session 8000000186DEC200:16610 (???????? ) cs_state: CS_MUTEX ((x) 8000000100C18068) cs_mask:
Mutex: ULM pool (QSF)
---snip---

You can then parse the hex mutex ID out of this and run the iimonitor
command SHOW MUTEX 8000000100C18068:

IIMONITOR> Mutex at 8000000100C18068: Name: ULM pool (QSF), EXCL owner: (tid: 17627, pid: 25129)
Shared: 0 Collisions: 0 Hwm: 0
Excl: 39546636 Collisions: 52870083

This shows the process ID (pid) and thread ID (tid) holding the mutex (if
it's held exclusively). Where you go from there depends on what you find,
but you should be able to analyse the chains of wait conditions and find
what is at the head of the chain.
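
As a rough illustration of that chain analysis, the sketch below pulls the owner out of a SHOW MUTEX line and then follows session -> mutex -> owner links until it reaches a session that is not itself waiting. It rests on assumptions: the owner line looks like the sample above, and you have already worked out which session each owning tid belongs to (the output shown here does not give that mapping directly). `parse_mutex_owner` and `find_chain_head` are hypothetical helper names.

```python
import re

# Assumed layout: "EXCL owner: (tid: N, pid: N)", possibly wrapped by the mailer.
OWNER_RE = re.compile(r"EXCL owner:\s*\(tid:\s*(\d+),\s*pid:\s*([\d\s]+)\)")


def parse_mutex_owner(text):
    """Return (tid, pid) of the exclusive owner, or None if none is shown."""
    match = OWNER_RE.search(text)
    if not match:
        return None
    # Rejoin digits in case a line wrap fell inside the number (an assumption).
    pid = int("".join(match.group(2).split()))
    return int(match.group(1)), pid


def find_chain_head(start, waits_on, owner_of):
    """Walk from a waiting session to the session at the head of its chain.

    waits_on: session -> mutex ID it is waiting for
    owner_of: mutex ID -> session holding it exclusively
    Returns (head, path). head is None when the owner is unknown or the
    chain loops back on itself (a mutex deadlock).
    """
    path = [start]
    seen = {start}
    current = start
    while current in waits_on:
        holder = owner_of.get(waits_on[current])
        if holder is None:
            return None, path
        path.append(holder)
        if holder in seen:
            return None, path
        seen.add(holder)
        current = holder
    return current, path
```

If many sessions all resolve to the same head, that session (and whatever it is doing, for example waiting on a lock or on I/O) is the first thing to look at.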

Also grab lockstat, logstat and output from ps at the same time.

Inglogs (as mentioned by Linda) does all this for you, including the mutex
listings (it does not do the analysis though).
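
If inglogs is not to hand, the collection step could be approximated with something like the sketch below, which runs lockstat, logstat and ps and writes each output to a timestamped file. It is not a substitute for inglogs. The `ps -ef` flags, the file naming and the `snapshot` function itself are my assumptions, and the iimonitor commands are left out because they are driven interactively.

```python
import subprocess
import time
from pathlib import Path

# Commands named in the post; running ps as "ps -ef" is an assumption.
DEFAULT_COMMANDS = {
    "lockstat": ["lockstat"],
    "logstat": ["logstat"],
    "ps": ["ps", "-ef"],
}


def snapshot(outdir, commands=DEFAULT_COMMANDS, timeout=60):
    """Run each command once and save its output under outdir.

    Returns a dict mapping the command label to the file written.
    A command that is missing or times out is recorded in its file
    rather than aborting the remaining captures.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    written = {}
    for name, argv in commands.items():
        target = outdir / f"{name}-{stamp}.out"
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            body = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            body = f"could not run {argv}: {exc}\n"
        target.write_text(body)
        written[name] = target
    return written
```

Calling `snapshot("/some/dir")` as the Ingres owner while the hang is in progress keeps the outputs together under one timestamp, which makes it easier to line them up with the iimonitor listings afterwards.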

Notwithstanding all this, you should perhaps also be looking at a few other
options:

1. Upgrading to 9.2
2. If not 9.2 then the latest Service Pack for 2.6. You are some way behind
on 2.6/0305
3. For 1800+ users you might consider reducing the number of DBMS servers.
This will require many adjustments to the config and plenty of testing but
may help to reduce contention. However I would put Ingres 9.2 or the latest
SP ahead of this.

Whatever route you choose, please take a look at the Query Recording and
Playback capability in Ingres, which will help significantly with testing.
The SC930 trace point (the recording bit) is available for 2.6 but you will
need a patch. Here are a couple of references on the subject.

http://www.iua.org.uk/conference/Sum...d_Playback.pdf
http://community.ingres.com/wiki/Dynamic_Playback

Hope that helps

Peter Gale

--
Peter Gale
pgale61 (AT) gmail (DOT) com

  #4  
Steve McElhinney
 

Re: Localising MUTEX lock-up on Ing 2.6/0305 - 01-25-2010, 06:55 AM



<snipped>

That's great info thanks all.
Steve
