dbTalk Databases Forums  

High concurrency SELECT / UPDATE procedure

comp.databases.mysql comp.databases.mysql


Discuss High concurrency SELECT / UPDATE procedure in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Mau C
 
Posts: n/a

Default High concurrency SELECT / UPDATE procedure - 02-10-2011 , 04:26 AM






Hi,
I'm messing with this scenario:
- MySQL 5.0.x (it should work on 5.5 too)
- high concurrency jdbc client connections.

The code below is a "simplified" extract of the real application to
make easier the example which is like a job scheduler or, if you prefer,
a "process starter" :

Requirements (steps)
1) it takes the first 10 available jobs (row=job)
2) it marks "taken" those rows and pass them to the external jdbc client
(as resultset)
3) the external client processes those jobs and, when each job is
completed, deletes the corresponding row.

The SQL code below refers to the 1) and 2) step only:

CREATE TABLE `tmp_table` (
`id` int(10) unsigned NOT NULL auto_increment,
`field1` varchar(45) NOT NULL default '',
`field2` varchar(45) NOT NULL default '',
`status` smallint(5) unsigned NOT NULL default '0', -- 0 free, 1 busy
PRIMARY KEY (`id`)
) ENGINE=InnoDB;


DELIMITER $$
CREATE PROCEDURE `pSelectTmpTable`()
BEGIN

START TRANSACTION;

SELECT id, field1, field2 FROM tmp_table WHERE status=0 LIMIT 10 FOR UPDATE;
UPDATE tmp_table SET status=1 WHERE status=0 LIMIT 10;

COMMIT;

END $$

Is it the right way to manipulate records in high concurrency context ?
Do particularly the clauses FOR UPDATE properly work under those
requirements ? Is there a more efficiently way to achieve this goal ?

Thanks in advance
Best regards,
M.

Reply With Quote
  #2  
Old   
Captain Paralytic
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-10-2011 , 06:47 AM






On Feb 10, 10:26*am, Mau C <nob... (AT) hotmail (DOT) com> wrote:
Quote:
Hi,
I'm messing with this scenario:
- MySQL 5.0.x (it should work on 5.5 too)
- high concurrency jdbc client connections.

The code below is a "simplified" extract of the real application to
make easier the example which is like a job scheduler or, if you prefer,
a "process starter" :

Requirements (steps)
1) it takes the first 10 available jobs (row=job)
2) it marks "taken" those rows and pass them to the external jdbc client
(as resultset)
3) the external client processes those jobs and, when each job is
completed, deletes the corresponding row.

The SQL code below refers to the 1) and 2) step only:

CREATE TABLE *`tmp_table` (
* `id` int(10) unsigned NOT NULL auto_increment,
* `field1` varchar(45) NOT NULL default '',
* `field2` varchar(45) NOT NULL default '',
* `status` smallint(5) unsigned NOT NULL default '0', -- 0 free, 1 busy
* PRIMARY KEY *(`id`)
) ENGINE=InnoDB;

DELIMITER $$
CREATE PROCEDURE *`pSelectTmpTable`()
BEGIN

START TRANSACTION;

SELECT id, field1, field2 FROM tmp_table WHERE status=0 LIMIT 10 FOR UPDATE;
UPDATE tmp_table SET status=1 WHERE status=0 LIMIT 10;

COMMIT;

END $$

Is it the right way to manipulate records in high concurrency context ?
Do particularly the clauses FOR UPDATE properly work under those
requirements ? Is there a more efficiently way to achieve this goal ?

Thanks in advance
Best regards,
M.
You can actually achieve a similar result without the need for
transactions.

Suppose that each client has a unique client_id. The status field
holds 0 if the job has not been taken or the handling client_id if it
has been.

You therefore do
UPDATE tmp_table SET status={client_id} WHERE status=0 LIMIT 10;
SELECT id, field1, field2 FROM tmp_table WHERE status={client_id}

The benefit of this is that you know which client is handling which
jobs.

Reply With Quote
  #3  
Old   
Mau C
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-10-2011 , 08:02 AM



Il 10/02/2011 13.47, Captain Paralytic ha scritto:
[...]
Quote:
You therefore do
UPDATE tmp_table SET status={client_id} WHERE status=0 LIMIT 10;
SELECT id, field1, field2 FROM tmp_table WHERE status={client_id}

The benefit of this is that you know which client is handling which
jobs.
Ok,
I like this point of view. Thank you.
Anyway, does this way prevent from any deadlock and/or row contention ?

Reply With Quote
  #4  
Old   
Captain Paralytic
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-10-2011 , 08:19 AM



On Feb 10, 2:02*pm, Mau C <nob... (AT) hotmail (DOT) com> wrote:
Quote:
Il 10/02/2011 13.47, Captain Paralytic ha scritto:
[...]

You therefore do
UPDATE tmp_table SET status={client_id} WHERE status=0 LIMIT 10;
SELECT id, field1, field2 FROM tmp_table WHERE status={client_id}

The benefit of this is that you know which client is handling which
jobs.

Ok,
I like this point of view. Thank you.
Anyway, does this way prevent from any deadlock and/or row contention ?
Yes because it will only choose rows that have the status = 0. If that
wasn't the case, then there would be no point in having the WHERE
clause on a UPDATE statement.

Reply With Quote
  #5  
Old   
Mau C
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-10-2011 , 08:30 AM



Il 10/02/2011 15.19, Captain Paralytic ha scritto:
Quote:
Yes because it will only choose rows that have the status = 0. If that
wasn't the case, then there would be no point in having the WHERE
clause on a UPDATE statement.
Ok,
but, without an isolated transaction, imagine to have something like this

TIME CLIENT xx CLIENT yy
---- --------- ---------
0 UPDATE
1 UPDATE
2 SELECT
3 SELECT

Which subset of rows will get the client xx ?
Client yy may override the xx updates... ? If UPDATE is perfectly
"atomic", the answer is definitely no.
If the answer is yes, for sure client xx won't never get wrong rows but
maybe less then 10...

Reply With Quote
  #6  
Old   
Captain Paralytic
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-10-2011 , 10:25 AM



On Feb 10, 2:30*pm, Mau C <nob... (AT) hotmail (DOT) com> wrote:
Quote:
Il 10/02/2011 15.19, Captain Paralytic ha scritto:

Yes because it will only choose rows that have the status = 0. If that
wasn't the case, then there would be no point in having the WHERE
clause on a UPDATE statement.

Ok,
but, without an isolated transaction, imagine to have something like this

TIME * *CLIENT xx * * CLIENT yy
---- * *--------- * * ---------
*0 * * *UPDATE
*1 * * * * * * * * * *UPDATE
*2 * * *SELECT
*3 * * * * * * * * * *SELECT

Which subset of rows will get the client xx ?
Client yy may override the xx updates... ? If UPDATE is perfectly
"atomic", the answer is definitely no.
If the answer is yes, for sure client xx won't never get wrong rows but
maybe less then 10...
I have told you that it works and it does. I cannot tell you which
clients will get which rows, but I can tell you that, providing that
there are sufficient rows to allow for 10 rows per client, they will
each get their own 10 rows.

Reply With Quote
  #7  
Old   
Axel Schwenke
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-10-2011 , 03:22 PM



Mau C <nobody (AT) hotmail (DOT) com> wrote:

Quote:
I'm messing with this scenario:
- MySQL 5.0.x (it should work on 5.5 too)
- high concurrency jdbc client connections.

The code below is a "simplified" extract of the real application to
make easier the example which is like a job scheduler or, if you prefer,
a "process starter" :

Requirements (steps)
1) it takes the first 10 available jobs (row=job)
2) it marks "taken" those rows and pass them to the external jdbc client
(as resultset)
3) the external client processes those jobs and, when each job is
completed, deletes the corresponding row.
OK so far.

Quote:
The SQL code below refers to the 1) and 2) step only:

CREATE TABLE `tmp_table` (
`id` int(10) unsigned NOT NULL auto_increment,
`field1` varchar(45) NOT NULL default '',
`field2` varchar(45) NOT NULL default '',
`status` smallint(5) unsigned NOT NULL default '0', -- 0 free, 1 busy
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
This is far from perfect, but read on.

Quote:
CREATE PROCEDURE `pSelectTmpTable`()
BEGIN
START TRANSACTION;
SELECT id, field1, field2 FROM tmp_table WHERE status=0 LIMIT 10 FOR UPDATE;
UPDATE tmp_table SET status=1 WHERE status=0 LIMIT 10;
COMMIT;
END $$
Why using a procedure? Makes no sense IMHO.

Quote:
Is it the right way to manipulate records in high concurrency context ?
No. The above makes completely no sense. For several reasons.

*If* you do it in two stages
1. lock the rows with SELECT ... FOR UPDATE
2. modify those records
*then* you must do it right and do the UPDATE based on the id(s)
delivered by the first SELECT.

However it's much easier to do it how the captain suggested: give each
scheduler it's own unique id and reserve the rows with

UPDATE jobs SET taker=<my_id> WHERE taker=0 LIMIT 10

Then later select the reserved jobs with SELECT ... WHERE taker=<my_id>
and finally DELETE the rows based on id.


But there is more to it. In a highly concurrent situation you want not
only short transactions, you also want each transaction to lock as few
records as possible. And here your design above meets a problem in
MySQL. An UPDATE puts locks on all rows that it modifies. But if MySQL
has to do a scan to find those rows (table scan or index scan) then it
locks *all* records as it scans them.

Now look at the update statement again:

UPDATE jobs SET taker=<my_id> WHERE taker=0

since there is no index on the `taker` column (you called it `status`)
MySQL has to do a table scan and will thus lock all rows from the
beginning of the table up to the 10th record with `taker`=0.
Conclusion: you definitely want an index on the `taker` column.

Finally: the obligatory RTFM!

http://dev.mysql.com/doc/refman/5.5/...locks-set.html


XL

Reply With Quote
  #8  
Old   
Mau C
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-11-2011 , 01:41 AM



Il 10/02/2011 22.22, Axel Schwenke ha scritto:
[...]

Thank you both. I definitively agree ...
As <my_id> I can use CONNECTION_ID()

Regards,
M.

Reply With Quote
  #9  
Old   
Mau C
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-11-2011 , 04:41 AM



Il 11/02/2011 8.41, Mau C ha scritto:
Quote:
Il 10/02/2011 22.22, Axel Schwenke ha scritto:
[...]

Thank you both. I definitively agree ...
As <my_id> I can use CONNECTION_ID()
I notice right now that it is not easy to find a *unique* <my_id> ...

CONNECTION_ID() returns an unique client id ... But each client may use
the same connection id untile the connection pool decides to invalidate it.
So, the scenario is :
- A single client has a connection id xx
- A client thread FOO retreives <n> rows marked with status=xx and puts
into a buffer
- Another client thread BAR processes independently those <n> rows/jobs
reading from the buffer and when completed deletes each row; in the
meanwhile FOO tries to fill again the buffer but.... it marks others <n>
rows and retreives <n> + "the previous already marked" which are not
still processed (and deleted) ...
- The thread BAR may get and process from buffer the same rows/jobs again.

So the problem is to find an unique <my_id> which won't cause contention
in a high concurrency scenario...

Any rand() or hash code function could be the solution... but it does
not theorically prevent from contention...

Reply With Quote
  #10  
Old   
Captain Paralytic
 
Posts: n/a

Default Re: High concurrency SELECT / UPDATE procedure - 02-11-2011 , 05:46 AM



On Feb 11, 10:41*am, Mau C <nob... (AT) hotmail (DOT) com> wrote:
Quote:
Il 11/02/2011 8.41, Mau C ha scritto:

Il 10/02/2011 22.22, Axel Schwenke ha scritto:
[...]

Thank you both. I definitively agree ...
As <my_id> I can use CONNECTION_ID()

I notice right now that it is not easy to find a *unique* <my_id> ...

CONNECTION_ID() returns an unique client id ... But each client may use
the same connection id untile the connection pool decides to invalidate it.
So, the scenario is :
- A single client has a connection id xx
- A client thread FOO retreives <n> rows marked with status=xx and puts
into a buffer
- Another client thread BAR processes independently those <n> rows/jobs
reading from the buffer and when completed deletes each row; in the
meanwhile FOO tries to fill again the buffer but.... it marks others <n
rows and retreives <n> + "the previous already marked" which are not
still processed (and deleted) ...
- The thread BAR may get and process from buffer the same rows/jobs again..

So the problem is to find an unique <my_id> which won't cause contention
in a high concurrency scenario...

Any rand() or hash code function could be the solution... but it does
not theorically prevent from contention...
You haven't give us any details of what the clients are, but there is
normally some sort of task number or session id that is unique to a
client. If you could tell us something about the clients, maybe we
could help further.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.