Re: [Info-Ingres] level of parallelism and SIGSEGV in procedure call.

comp.databases.ingres


Martin Bowes
 
Re: [Info-Ingres] level of parallelism and SIGSEGV in procedure call. - 10-12-2011, 02:40 AM

Hi Kevin,

I managed to make a test case yesterday and attach it to issue 152671. Christian Challot has picked it up.

As you mentioned, I had to take a smattering of the original data and load in the stats from the original data (using -zp flag amongst others) to get a case which can duplicate the problem...albeit after possibly a few runs of the procedure sequence.

Marty

From: Kevin Smedley [mailto:Kevin.Smedley (AT) actian (DOT) com]
Sent: 11 October 2011 11:06
To: Ingres and related product discussion forum
Subject: Re: [Info-Ingres] level of parallelism and SIGSEGV in procedure call.

Hi Martin,

Typically with these parallel thread issues, one of the threads has finished and tidied up before the unfortunate remaining thread has finished. In order to reproduce the problem you will need something approximating the original data, both to make sure you get a similar query plan and to make sure you get similar timings at runtime.

I started in this email to write a general guide on how to create a testcase for this situation, but then decided it's probably better if you open a support issue and we discuss it step by step via the issue. It may be that with details of the SQL procedure, some diagnostics to see the query execution plan, and a decode of the stack, we could even work out what has happened without a testcase.

Regards,

Kevin Smedley.

From: info-ingres-bounces (AT) kettleriver...ting (DOT) com [mailto:info-ingres-bounces (AT) kettleriverconsulting (DOT) com] On Behalf Of Martin Bowes
Sent: 10 October 2011 14:40
To: Ingres and related product discussion forum
Subject: [Info-Ingres] level of parallelism and SIGSEGV in procedure call.

Hi All,

On Ingres II 10.0.0 (a64.lnx/132)NPTL + p14139

We have a procedure which, when executed several times in succession, fails with a SIGSEGV (see below), and at different points in that sequence. I've seen cases where we run the sequence and get no errors, then roll back, repeat, and get errors aplenty.

The SIGSEGV seems to implicate the Parallel Query Thread and the server in question has opf_pq_dop = 8. I've recovered the database to another installation and duplicated the results there. I then reset opf_pq_dop to zero on the recovery installation and the sequence always executes OK.

Having reset the recovery box to use opf_pq_dop = 8, I then altered the sequence to commence with set noparallel and it works fine every time.

So I'm pretty convinced we have a bug related to parallel query processing.

Now the fun part of building a test case and sending to support....

The database is way too large and holds way too much sensitive data to be sent, so I've tried creating empty tables of the same structure as the originals, copying the stats over to the new tables and repeating the procedure sequence (modified for the new tables)...and it works every time regardless of the parallelism.

Anyone got any ideas on what might be going wrong?

The error log showing a SIGSEGV typically looks like:
E_QE0002_INTERNAL_ERROR A QEF internal error occurred.
Associated error messages which provide more detailed information about the problem can be found in the error log (errlog.log)
An error occurred in the following session:
Session 00002AAAD92383C0:1080228160
DB Name: ace_trove_live (Owned by: ace)
User: dispatch ( <Parallel Query Thread> )
User Name at Session Startup: <Parallel Query Thread>
Terminal: pts/2
Group Id: ace_group
Role Id:
Application Code: 00000000 Current Facility: DMF (00000003)
Description:
Query: EXECUTE PROCEDURE value_summary( fmin=36250, fmax=36499 )
Last Query: EXECUTE PROCEDURE value_summary( fmin=36000, fmax=36249 )
bb5.ctsu::[53935 , d92383c0]: Mon Oct 10 11:16:34 2011 Segmentation Violation (SIGSEGV) @PC 0000000000814031
RSP 00000000406227b0 RBP 0000000040622840 RSI 00002aaad902eaa0
RDI 00002aaad902e7c0 RAX 00002aaad902ff00 RBX 0000000000000001
RCX 0000000000000000 RDX 0000000000000000

-----------BEGIN STACK TRACE------------
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:0:40622840 iidbms(qen_position+0x269) [0x814031]( ... )
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:1:406229b0 iidbms(qen_orig+0xc9a) [0x81579b]( ... )
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:2:40622d90 iidbms(qen_exchange_child+0x1b44) [0x80c2f4]( ... )
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:3:40622e50 iidbms(scs_dbms_task+0xcdf) [0x78f013]( ... )
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:4:40627030 iidbms(scs_sequencer+0x349) [0x48cf7e]( ... )
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:5:4062f120 iidbms(CSMT_setup+0x528) [0x74eb27]( ... )
-----------END STACK TRACE----------

Martin Bowes
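
For anyone who wants to pull the frame list out of a trace like the one above before attaching it to a support issue, here is a minimal sketch in Python. The regular expression is an assumption inferred only from the six trace lines in this post, not from any documented errlog.log format, and `parse_frames` is a hypothetical helper name. It only extracts what is already printed; it does not resolve symbols or decode anything further.

```python
import re

# Assumed line shape, taken from the trace in this post only:
#   ... pid <pid>:<depth>:<hex> iidbms(<function>+0x<offset>) [0x<address>]( ... )
FRAME_RE = re.compile(
    r"pid\s+\d+:(?P<depth>\d+):[0-9a-fA-F]+\s+"
    r"\S+\((?P<func>\w+)\+0x(?P<offset>[0-9a-fA-F]+)\)\s+"
    r"\[0x(?P<addr>[0-9a-fA-F]+)\]"
)

def parse_frames(text):
    """Return (depth, function, offset, address) for each matching line."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((
                int(m.group("depth")),
                m.group("func"),
                int(m.group("offset"), 16),
                int(m.group("addr"), 16),
            ))
    return frames

# The stack trace lines exactly as posted above.
trace = """\
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:0:40622840 iidbms(qen_position+0x269) [0x814031]( ... )
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:1:406229b0 iidbms(qen_orig+0xc9a) [0x81579b]( ... )
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:2:40622d90 iidbms(qen_exchange_child+0x1b44) [0x80c2f4]( ... )
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:3:40622e50 iidbms(scs_dbms_task+0xcdf) [0x78f013]( ... )
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:4:40627030 iidbms(scs_sequencer+0x349) [0x48cf7e]( ... )
bb5_ctsu::[53935 , 2aaad92383c0]: pid 22966:5:4062f120 iidbms(CSMT_setup+0x528) [0x74eb27]( ... )
"""

frames = parse_frames(trace)
for depth, func, offset, addr in frames:
    print(f"#{depth} {func}+{offset:#x} at {addr:#x}")
```

The innermost frame's address, 0x814031 in `qen_position`, is the same value the log reports as the PC of the SIGSEGV, and the frame below it is `qen_exchange_child`, which fits the session being a Parallel Query Thread.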
