dbTalk Databases Forums  

High concurrency SELECT / UPDATE procedure

comp.databases.mysql comp.databases.mysql


Discuss High concurrency SELECT / UPDATE procedure in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
Peter H. Coffin
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-11-2011 , 07:22 AM






On Fri, 11 Feb 2011 03:46:53 -0800 (PST), Captain Paralytic wrote:
Quote:
On Feb 11, 10:41?am, Mau C <nob... (AT) hotmail (DOT) com> wrote:
Il 11/02/2011 8.41, Mau C ha scritto:

Il 10/02/2011 22.22, Axel Schwenke ha scritto:
[...]

Thank you both. I definitively agree ...
As <my_id> I can use CONNECTION_ID()

I notice right now that it is not easy to find a *unique* <my_id> ...

CONNECTION_ID() returns an unique client id ... But each client may use
the same connection id untile the connection pool decides to invalidate it.
So, the scenario is :
- A single client has a connection id xx
- A client thread FOO retreives <n> rows marked with status=xx and puts
into a buffer
- Another client thread BAR processes independently those <n> rows/jobs
reading from the buffer and when completed deletes each row; in the
meanwhile FOO tries to fill again the buffer but.... it marks others <n
rows and retreives <n> + "the previous already marked" which are not
still processed (and deleted) ...
- The thread BAR may get and process from buffer the same rows/jobs again.

So the problem is to find an unique <my_id> which won't cause contention
in a high concurrency scenario...

Any rand() or hash code function could be the solution... but it does
not theorically prevent from contention...

You haven't give us any details of what the clients are, but there is
normally some sort of task number or session id that is unique to a
client. If you could tell us something about the clients, maybe we
could help further.
Yup. "Subject to host application determination" is how it gets written
in my process doco, but it essentially means "You're on your own,
sucker." And one, at this point, would tend to want it to be be
completely independant of the DB's features anyway, for portability
reasons.

--
This was, apparently, beyond her ken. So far beyond her ken that she
was well into barbie territory.
-- J. D. Baldwin

Reply With Quote
  #12  
Old   
Mau C
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-11-2011 , 07:36 AM






Il 11/02/2011 12.46, Captain Paralytic ha scritto:
[...]
Quote:
You haven't give us any details of what the clients are, but there is
normally some sort of task number or session id that is unique to a
client. If you could tell us something about the clients, maybe we
could help further.
You are right.

The client software is :

- A master thread FOO which makes a loop as
a) gets <x> free rows / jobs, sets them as "taken" and puts them into a
buffer;
b) delete completed rows / jobs.

- A fixed number of slave threads BAR(n) (n= number of client cores).
Each thread BAR retreives (and deletes) a single row / job from the
buffer, processes the row / job and when completed sends the "completed"
information to the FOO thread.

So,
according with your suggestion the thread FOO should check if the buffer
is empty and then gets the rows. In this case I can use a unique id
(such as connection id) because I'm sure I won't have any already
"taken" unprocessed rows from db.
So, the a) step could be
a) if the buffer is EMPTY and all rows are been processed / deleted then
gets <x> free...bla bla...

In a "desired" world I would always have a "filled" buffer to avoid
performance lacks (I don't want FOO waits so long the BAR(n) threads
clean up the buffer).
So, the FOO step a) should be
a) if the buffer size < MAX_BUFFER_LIMIT then gets <x> rows... bla bla

I hope this makes clear the scenario.

Best regards,
M.

Reply With Quote
  #13  
Old   
Captain Paralytic
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-11-2011 , 07:48 AM



On Feb 11, 1:36*pm, Mau C <nob... (AT) hotmail (DOT) com> wrote:
Quote:
Il 11/02/2011 12.46, Captain Paralytic ha scritto:
[...]

You haven't give us any details of what the clients are, but there is
normally some sort of task number or session id that is unique to a
client. If you could tell us something about the clients, maybe we
could help further.

You are right.

The client software is :

*- A master thread FOO which makes a loop as
a) gets <x> free rows / jobs, sets them as "taken" and puts them into a
buffer;
b) delete completed rows / jobs.

*- A fixed number of slave threads BAR(n) (n= number of client cores)..
Each thread BAR retreives (and deletes) a single row / job from the
buffer, processes the row / job and when completed sends the "completed"
information to the FOO thread.
Why doesn't each thread pick up its own job from the table. ISTM that
the table already supplies a "buffer" for you.

Reply With Quote
  #14  
Old   
Mau C
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-11-2011 , 08:14 AM



Il 11/02/2011 14.48, Captain Paralytic ha scritto:
Quote:
Why doesn't each thread pick up its own job from the table. ISTM that
the table already supplies a "buffer" for you.
The master FOO thread is unique because I want to avoid jdbc read /
write concurrency inside the same client. Slave BAR(n) threads need to
read / write from common (synchronized) job list, so I want buffers.

The goal should be :
- FOO gets <x> rows where <x> is not so big to avoid a huge table lock
and network usage, not so small to avoid too much read attempts.
- MAX_BUFFER_SIZE >= <x> to assure to the BAR(n) threads always
something to do.

M.

Reply With Quote
  #15  
Old   
Captain Paralytic
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-11-2011 , 08:32 AM



On Feb 11, 2:14*pm, Mau C <nob... (AT) hotmail (DOT) com> wrote:
Quote:
Il 11/02/2011 14.48, Captain Paralytic ha scritto:

Why doesn't each thread pick up its own job from the table. ISTM that
the table already supplies a "buffer" for you.

The master FOO thread is unique because I want to avoid jdbc read /
write concurrency inside the same client. Slave BAR(n) threads need to
read / write from common (synchronized) job list, so I want buffers.

The goal should be :
- FOO gets <x> rows where <x> is not so big to avoid a huge table lock
and network usage, not so small to avoid too much read attempts.
- MAX_BUFFER_SIZE >= <x> to assure to the BAR(n) threads always
something to do.

M.
I get the feeling that you are prematurely optimising.

Reply With Quote
  #16  
Old   
Mau C
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-11-2011 , 08:57 AM



Il 11/02/2011 15.32, Captain Paralytic ha scritto:
Quote:
I get the feeling that you are prematurely optimising.
Well,
that's the architecture. <x> and MAX_BUFFER_SIZE are still undefined :-)

Reply With Quote
  #17  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-11-2011 , 09:56 AM



On 2/11/2011 9:32 AM, Captain Paralytic wrote:
Quote:
On Feb 11, 2:14 pm, Mau C<nob... (AT) hotmail (DOT) com> wrote:
Il 11/02/2011 14.48, Captain Paralytic ha scritto:

Why doesn't each thread pick up its own job from the table. ISTM that
the table already supplies a "buffer" for you.

The master FOO thread is unique because I want to avoid jdbc read /
write concurrency inside the same client. Slave BAR(n) threads need to
read / write from common (synchronized) job list, so I want buffers.

The goal should be :
- FOO gets<x> rows where<x> is not so big to avoid a huge table lock
and network usage, not so small to avoid too much read attempts.
- MAX_BUFFER_SIZE>=<x> to assure to the BAR(n) threads always
something to do.

M.

I get the feeling that you are prematurely optimising.
Paul,

I've been following this thread also, and have to agree with you. Often
the simplest solutions are the best; making the solution needlessly
complex not only creates a headache nightmare, but can decrease
performance.

For instance - how often do jobs get scheduled? How long does it take
to start them? If they are scheduling 5 jobs/sec but can start 50
job/sec with a single thread, all this complexity is completely unnecessary.

And if they're concerned about performance, jdbc is NOT the way to go!

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
  #18  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-11-2011 , 09:58 AM



On 2/11/2011 9:57 AM, Mau C wrote:
Quote:
Il 11/02/2011 15.32, Captain Paralytic ha scritto:
I get the feeling that you are prematurely optimising.

Well,
that's the architecture.<x> and MAX_BUFFER_SIZE are still undefined :-)
I agree with Paul here. If that's the architecture, you need to
reexamine the architecture.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
  #19  
Old   
Mau C
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-14-2011 , 02:19 AM



Il 11/02/2011 16.58, Jerry Stuckle ha scritto:
Quote:
On 2/11/2011 9:57 AM, Mau C wrote:
Il 11/02/2011 15.32, Captain Paralytic ha scritto:
I get the feeling that you are prematurely optimising.

Well,
that's the architecture.<x> and MAX_BUFFER_SIZE are still undefined :-)

I agree with Paul here. If that's the architecture, you need to
reexamine the architecture.

I did.
After some preliminay test I saw there's no any special reason to delay
the DELETE from Tmp_Table.

The DELETE was there as error recovery: if the job execution fails I do
not DELETE but UPDATE SET status = 0 its id and run it again.
Now, if the job execution fails, I do the simply INSERT
field1,field2,... again.

So, according with your suggestions, the stored can be
1) @my_id = CONNECTION_ID()
2) UPDATE SET status = @my_id
3) retrevied rows
4) DELETE rows WHERE status = @my_id

Jdbc and stored procedures are necessary due to the high portability
requirements. In fact I've also submitted this scenario on the Oracle e
SQL Server groups :-)

Any total run might count up to 500000 job executions. Each job
execution depends on the input parameters and should last between 1 and
50 s. The network traffic over the jdbc connection could be the
bottleneck so I've decided to have buffers and reduce open/close/sql
statements calls. Ok, buffers mean RAM which means no transaction-safe
and possibly data lacks... But I can't think about everythink today,
it's S.Valentine :-)

Thanks for all your support.
M.

Reply With Quote
  #20  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-14-2011 , 07:09 AM



On 2/14/2011 3:19 AM, Mau C wrote:
Quote:
Il 11/02/2011 16.58, Jerry Stuckle ha scritto:
On 2/11/2011 9:57 AM, Mau C wrote:
Il 11/02/2011 15.32, Captain Paralytic ha scritto:
I get the feeling that you are prematurely optimising.

Well,
that's the architecture.<x> and MAX_BUFFER_SIZE are still undefined :-)

I agree with Paul here. If that's the architecture, you need to
reexamine the architecture.


I did.
After some preliminay test I saw there's no any special reason to delay
the DELETE from Tmp_Table.

The DELETE was there as error recovery: if the job execution fails I do
not DELETE but UPDATE SET status = 0 its id and run it again.
Now, if the job execution fails, I do the simply INSERT
field1,field2,... again.

So, according with your suggestions, the stored can be
1) @my_id = CONNECTION_ID()
2) UPDATE SET status = @my_id
3) retrevied rows
4) DELETE rows WHERE status = @my_id

Jdbc and stored procedures are necessary due to the high portability
requirements. In fact I've also submitted this scenario on the Oracle e
SQL Server groups :-)

Any total run might count up to 500000 job executions. Each job
execution depends on the input parameters and should last between 1 and
50 s. The network traffic over the jdbc connection could be the
bottleneck so I've decided to have buffers and reduce open/close/sql
statements calls. Ok, buffers mean RAM which means no transaction-safe
and possibly data lacks... But I can't think about everythink today,
it's S.Valentine :-)

Thanks for all your support.
M.
I wasn't referring to delaying the delete on the tmp table - I was
referring to your entire architecture. And I'd say the exact same thing
if you were using Oracle, SQL Server or DB2 (all of which I've used over
the years).

As for portability - there are other ways to get portability than using
jdbc. And there are few things slower than jdbc.

But you don't really know if the network connection is a bottleneck or
not. With jdbc, I highly suspect it will not be. But without solid
information, you are prematurely optimizing - which almost always leads
to a substandard solution.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.