Hi Martin,
Typically with these parallel thread issues, one of the threads has
finished and tidied up before the unfortunate remaining thread has
finished. To reproduce the problem you will need something
approximating the original data, so that you get a similar query plan
and similar timings at runtime.
I started writing a general guide in this email on how to create a
test case for this situation, but then decided it's probably better if
you open a support issue and we discuss it step by step via the issue.
It may be that with details of the SQL procedure, some diagnostics to
see the query execution plan, and a decode of the stack, we could even
work out what has happened without a test case.
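
As a rough sketch of the first step of a stack decode, the frames can
be pulled out of an errlog.log excerpt like the one quoted below. This
is only an illustration: the regular expression assumes frames look
like "iidbms(qen_position+0x269) [0x814031]" as in that excerpt, and
the names Frame and parse_frames are hypothetical, not part of any
Ingres tool.

```python
import re
from typing import List, NamedTuple

# Assumed frame shape, taken from the quoted trace:
#   iidbms(qen_position+0x269) [0x814031]( ... )
FRAME_RE = re.compile(
    r"(?P<binary>\w+)\((?P<symbol>\w+)\+0x(?P<offset>[0-9a-fA-F]+)\)"
    r"\s+\[0x(?P<pc>[0-9a-fA-F]+)\]"
)


class Frame(NamedTuple):
    binary: str   # executable name, e.g. "iidbms"
    symbol: str   # function name, e.g. "qen_position"
    offset: int   # offset into the function
    pc: int       # program counter for the frame


def parse_frames(log_text: str) -> List[Frame]:
    """Return the frames found between the BEGIN/END STACK TRACE markers."""
    frames: List[Frame] = []
    inside = False
    for line in log_text.splitlines():
        if "BEGIN STACK TRACE" in line:
            inside = True
            continue
        if "END STACK TRACE" in line:
            inside = False
            continue
        if not inside:
            continue
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                Frame(
                    binary=match.group("binary"),
                    symbol=match.group("symbol"),
                    offset=int(match.group("offset"), 16),
                    pc=int(match.group("pc"), 16),
                )
            )
    return frames
```

Calling parse_frames on the text of the log and printing the symbol of
each frame gives the call chain innermost first, which is a convenient
summary to paste into a support issue.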
Regards,
Kevin Smedley.
From: info-ingres-bounces (AT) kettleriver...ting (DOT) com
[mailto:info-ingres-bounces (AT) kettleriverconsulting (DOT) com] On Behalf Of
Martin Bowes
Sent: 10 October 2011 14:40
To: Ingres and related product discussion forum
Subject: [Info-Ingres] level of parallelism and SIGSEGV in procedure call.
Hi All,
On Ingres II 10.0.0 (a64.lnx/132)NPTL + p14139
We have a procedure which, when we execute it several times in
succession, fails with a SIGSEGV (see below)...and at different points
in that sequence. I've seen situations where we run the sequence and
get no errors, then rollback, repeat and get errors aplenty.
The SIGSEGV seems to implicate the Parallel Query Thread and the server
in question has opf_pq_dop = 8. I've recovered the database to another
installation and duplicated the results there. I then reset opf_pq_dop
to zero on the recovery installation and the sequence always executes
OK.
Having reset the recovery box to use opf_pq_dop = 8, I then altered the
sequence to commence with set noparallel and it works fine every time.
So I'm pretty convinced we have a bug related to parallel query
processing.
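
For anyone wanting to repeat the comparison, here is a minimal sketch
of generating the two variants of the sequence so that they differ only
in the leading set noparallel. The batch size of 250 is inferred from
the fmin/fmax values in the log below; the function name build_script,
the starting value, the number of batches and the use of ";" as a
statement terminator are assumptions for illustration.

```python
from typing import Iterator, Tuple


def batch_ranges(start: int, batches: int, size: int = 250) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (fmin, fmax) pairs, e.g. (36000, 36249), (36250, 36499)."""
    for i in range(batches):
        fmin = start + i * size
        yield fmin, fmin + size - 1


def build_script(start: int, batches: int, parallel: bool, size: int = 250) -> str:
    """Build the SQL text for one run of the procedure sequence.

    With parallel=False the script starts with "set noparallel", the
    variant that ran cleanly. The trailing rollback mirrors the
    run / rollback / repeat cycle described above.
    """
    lines = []
    if not parallel:
        lines.append("set noparallel;")
    for fmin, fmax in batch_ranges(start, batches, size):
        lines.append(
            f"execute procedure value_summary(fmin = {fmin}, fmax = {fmax});"
        )
    lines.append("rollback;")
    return "\n".join(lines) + "\n"
```

Writing build_script(36000, 2, parallel=True) and
build_script(36000, 2, parallel=False) to two files gives a pair of
scripts that can be run back to back against the same installation.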
Now the fun part of building a test case and sending to support....
The database is way too large and holds way too much sensitive data to
be sent so I've tried creating empty tables of the same structure as the
originals, copying the stats over to the new tables and repeating the
procedure sequence (modified for the new tables)...and it works every
time regardless of the parallelism.
Anyone got any ideas on what might be going wrong?
The error log showing a SIGSEGV typically looks like:
E_QE0002_INTERNAL_ERROR A QEF internal error occurred.
Associated error messages which provide more detailed information about
the problem can be found in the error log (errlog.log)
An error occurred in the following session:
Session 00002AAAD92383C0:1080228160
DB Name: ace_trove_live (Owned by: ace)
User: dispatch (<Parallel Query Thread>)
User Name at Session Startup: <Parallel Query Thread>
Terminal: pts/2
Group Id: ace_group
Role Id:
Application Code: 00000000 Current Facility: DMF (00000003)
Description:
Query: EXECUTE PROCEDURE value_summary( fmin=36250, fmax=36499 )
Last Query: EXECUTE PROCEDURE value_summary( fmin=36000, fmax=36249 )
bb5.ctsu::[53935 , d92383c0]: Mon Oct 10 11:16:34 2011
Segmentation Violation (SIGSEGV) @PC 0000000000814031
RSP 00000000406227b0 RBP 0000000040622840 RSI 00002aaad902eaa0
RDI 00002aaad902e7c0 RAX 00002aaad902ff00 RBX 0000000000000001
RCX 0000000000000000 RDX 0000000000000000
-----------BEGIN STACK TRACE------------
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:0:40622840
iidbms(qen_position+0x269) [0x814031]( ... )
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:1:406229b0
iidbms(qen_orig+0xc9a) [0x81579b]( ... )
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:2:40622d90
iidbms(qen_exchange_child+0x1b44) [0x80c2f4]( ... )
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:3:40622e50
iidbms(scs_dbms_task+0xcdf) [0x78f013]( ... )
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:4:40627030
iidbms(scs_sequencer+0x349) [0x48cf7e]( ... )
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:5:4062f120
iidbms(CSMT_setup+0x528) [0x74eb27]( ... )
-----------END STACK TRACE----------
Martin Bowes