![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Hi, I'm messing with this scenario: - MySQL 5.0.x (it should work on 5.5 too) - high concurrency jdbc client connections. The code below is a "simplified" extract of the real application to make easier the example which is like a job scheduler or, if you prefer, a "process starter" : Requirements (steps) 1) it takes the first 10 available jobs (row=job) 2) it marks "taken" those rows and pass them to the external jdbc client (as resultset) 3) the external client processes those jobs and, when each job is completed, deletes the corresponding row. The SQL code below refers to the 1) and 2) step only: CREATE TABLE *`tmp_table` ( * `id` int(10) unsigned NOT NULL auto_increment, * `field1` varchar(45) NOT NULL default '', * `field2` varchar(45) NOT NULL default '', * `status` smallint(5) unsigned NOT NULL default '0', -- 0 free, 1 busy * PRIMARY KEY *(`id`) ) ENGINE=InnoDB; DELIMITER $$ CREATE PROCEDURE *`pSelectTmpTable`() BEGIN START TRANSACTION; SELECT id, field1, field2 FROM tmp_table WHERE status=0 LIMIT 10 FOR UPDATE; UPDATE tmp_table SET status=1 WHERE status=0 LIMIT 10; COMMIT; END $$ Is it the right way to manipulate records in high concurrency context ? Do particularly the clauses FOR UPDATE properly work under those requirements ? Is there a more efficiently way to achieve this goal ? Thanks in advance Best regards, M. |
#3
| |||
| |||
|
|
You therefore do UPDATE tmp_table SET status={client_id} WHERE status=0 LIMIT 10; SELECT id, field1, field2 FROM tmp_table WHERE status={client_id} The benefit of this is that you know which client is handling which jobs. |
#4
| |||
| |||
|
|
Il 10/02/2011 13.47, Captain Paralytic ha scritto: [...] You therefore do UPDATE tmp_table SET status={client_id} WHERE status=0 LIMIT 10; SELECT id, field1, field2 FROM tmp_table WHERE status={client_id} The benefit of this is that you know which client is handling which jobs. Ok, I like this point of view. Thank you. Anyway, does this way prevent from any deadlock and/or row contention ? |
#5
| |||
| |||
|
|
Yes because it will only choose rows that have the status = 0. If that wasn't the case, then there would be no point in having the WHERE clause on a UPDATE statement. |
#6
| |||
| |||
|
|
Il 10/02/2011 15.19, Captain Paralytic ha scritto: Yes because it will only choose rows that have the status = 0. If that wasn't the case, then there would be no point in having the WHERE clause on a UPDATE statement. Ok, but, without an isolated transaction, imagine to have something like this TIME * *CLIENT xx * * CLIENT yy ---- * *--------- * * --------- *0 * * *UPDATE *1 * * * * * * * * * *UPDATE *2 * * *SELECT *3 * * * * * * * * * *SELECT Which subset of rows will get the client xx ? Client yy may override the xx updates... ? If UPDATE is perfectly "atomic", the answer is definitely no. If the answer is yes, for sure client xx won't never get wrong rows but maybe less then 10... |
#7
| ||||
| ||||
|
|
I'm messing with this scenario: - MySQL 5.0.x (it should work on 5.5 too) - high concurrency jdbc client connections. The code below is a "simplified" extract of the real application to make easier the example which is like a job scheduler or, if you prefer, a "process starter" : Requirements (steps) 1) it takes the first 10 available jobs (row=job) 2) it marks "taken" those rows and pass them to the external jdbc client (as resultset) 3) the external client processes those jobs and, when each job is completed, deletes the corresponding row. |
|
The SQL code below refers to the 1) and 2) step only: CREATE TABLE `tmp_table` ( `id` int(10) unsigned NOT NULL auto_increment, `field1` varchar(45) NOT NULL default '', `field2` varchar(45) NOT NULL default '', `status` smallint(5) unsigned NOT NULL default '0', -- 0 free, 1 busy PRIMARY KEY (`id`) ) ENGINE=InnoDB; |
|
CREATE PROCEDURE `pSelectTmpTable`() BEGIN START TRANSACTION; SELECT id, field1, field2 FROM tmp_table WHERE status=0 LIMIT 10 FOR UPDATE; UPDATE tmp_table SET status=1 WHERE status=0 LIMIT 10; COMMIT; END $$ |
|
Is it the right way to manipulate records in high concurrency context ? |
#8
| |||
| |||
|
#9
| |||
| |||
|
|
Il 10/02/2011 22.22, Axel Schwenke ha scritto: [...] Thank you both. I definitively agree ... As <my_id> I can use CONNECTION_ID() |
#10
| |||
| |||
|
|
Il 11/02/2011 8.41, Mau C ha scritto: Il 10/02/2011 22.22, Axel Schwenke ha scritto: [...] Thank you both. I definitively agree ... As <my_id> I can use CONNECTION_ID() I notice right now that it is not easy to find a *unique* <my_id> ... CONNECTION_ID() returns an unique client id ... But each client may use the same connection id untile the connection pool decides to invalidate it. So, the scenario is : - A single client has a connection id xx - A client thread FOO retreives <n> rows marked with status=xx and puts into a buffer - Another client thread BAR processes independently those <n> rows/jobs reading from the buffer and when completed deletes each row; in the meanwhile FOO tries to fill again the buffer but.... it marks others <n rows and retreives <n> + "the previous already marked" which are not still processed (and deleted) ... - The thread BAR may get and process from buffer the same rows/jobs again.. So the problem is to find an unique <my_id> which won't cause contention in a high concurrency scenario... Any rand() or hash code function could be the solution... but it does not theorically prevent from contention... |
![]() |
| Thread Tools | |
| Display Modes | |
| |