![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
I have a table containing a list of unique words, defined as: CREATE TABLE word_list ( id int(10) unsigned NOT NULL auto_increment, word VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, PRIMARY KEY (id), UNIQUE KEY (word) ) ENGINE=InnoDB; I have a lot of threads that concurrently try to get the id of a word from the word_list table and if the word is not found, the word is inserted and the id returned. |
|
My concern is that I can incurr in integrity errors (unique key) since I'm using concurrency. |
|
Any idea on how to avoid such problems without removing concurrency? No idea how to do it, but the solution could be using a lock and a condition variable to tell every thread that the word's id is available once one thread is writing that value, but it seems to me a far complex solution... |
#3
| |||
| |||
|
|
I have a table containing a list of unique words, defined as: CREATE TABLE word_list ( id int(10) unsigned NOT NULL auto_increment, word VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, PRIMARY KEY (id), UNIQUE KEY (word) ) ENGINE=InnoDB; I have a lot of threads that concurrently try to get the id of a word from the word_list table and if the word is not found, the word is inserted and the id returned. My concern is that I can incurr in integrity errors (unique key) since I'm using concurrency. Any idea on how to avoid such problems without removing concurrency? No idea how to do it, but the solution could be using a lock and a condition variable to tell every thread that the word's id is available once one thread is writing that value, but it seems to me a far complex solution... |
#4
| |||
| |||
|
|
El 22/08/2011 22:19, gervaz escribi /wrote: I have a table containing a list of unique words, defined as: CREATE TABLE word_list ( * * *id int(10) unsigned NOT NULL auto_increment, * * *word VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, * * *PRIMARY KEY (id), * * *UNIQUE KEY (word) ) ENGINE=InnoDB; I have a lot of threads that concurrently try to get the id of a word from the word_list table and if the word is not found, the word is inserted and the id returned. My concern is that I can incurr in integrity errors (unique key) since I'm using concurrency. Any idea on how to avoid such problems without removing concurrency? No idea how to do it, but the solution could be using a lock and a condition variable to tell every thread that the word's id is available once one thread is writing that value, but it seems to me a far complex solution... Duplicates trigger a 1062 error code: http://dev.mysql.com/doc/refman/5.5/...erver.html#err... Some people suggest that it's better make a SELECT query prior to insertion (which possibly requires a table lock to work reliably) but I can't see anything wrong in adding logic to capture this specific error. (Can your threads remove data as well?) -- --http://alvaro.es- lvaro G. Vicario - Burgos, Spain -- Mi sitio sobre programaci n web:http://borrame.com -- Mi web de humor satinado:http://www.demogracia.com --- Nascondi testo citato - Mostra testo citato - |
#5
| |||
| |||
|
|
1000 clients issue 1000 queries looking for the word "aint", don't find it, and 1 millisecond later at 12:01:00.001000, 1000 clients issue 1000 queries to insert the word "aint", exactly 999 of them should fail with a duplicate key error and one should succeed. |
#6
| |||
| |||
|
|
Every thread uses a separate database connection. When looking for a word id I execute """SELECT id FROM word_list WHERE word = '{}';""" If the word is not found, then I use """INSERT INTO word_list (word) VALUES ('{}');""" No word deletion nor update is involved. My concern is regarding the example proposed by Gordon, where I can have duplicate key errors. |
|
In that case should I raise an exception and execute the SELECT again? |
#7
| |||
| |||
|
|
Every thread uses a separate database connection. When looking for a word id I execute """SELECT id FROM word_list WHERE word = '{}';""" If the word is not found, then I use """INSERT INTO word_list (word) VALUES ('{}');""" No word deletion nor update is involved. My concern is regarding the example proposed by Gordon, where I can have duplicate key errors. |
|
In that case should I raise an exception and execute the SELECT again? Consider that the whole code is inserted in a python program. |
#8
| |||
| |||
|
|
Gordon Burditt wrote: If at 12:01:00.000000, 1000 clients issue 1000 queries looking for the word "aint", don't find it, and 1 millisecond later at 12:01:00.001000, 1000 clients issue 1000 queries to insert the word "aint", exactly 999 of them should fail with a duplicate key error and one should succeed. I have exactly this problem. Or nearly so. I am counting hits from incoming IP addresses. The logic is Get the record for this IP address. If successful, increment the hits counter, record time ELSE create it with counter as one (and some other stuff) Sometimes I get duplicate records.. It's not a real world problem (records are for human analysis only) , but I'd be curious to know a suitable strategy to avoid it. |
#9
| |||
| |||
|
|
If at 12:01:00.000000, 1000 clients issue 1000 queries looking for the word "aint", don't find it, and 1 millisecond later at 12:01:00.001000, 1000 clients issue 1000 queries to insert the word "aint", exactly 999 of them should fail with a duplicate key error and one should succeed. I have exactly this problem. Or nearly so. I am counting hits from incoming IP addresses. |
|
The logic is Get the record for this IP address. If successful, increment the hits counter, record time ELSE create it with counter as one (and some other stuff) Sometimes I get duplicate records.. It's not a real world problem (records are for human analysis only) , but I'd be curious to know a suitable strategy to avoid it. |
#10
| |||
| |||
|
|
If at 12:01:00.000000, 1000 clients issue 1000 queries looking for the word "aint", don't find it, and 1 millisecond later at 12:01:00.001000, 1000 clients issue 1000 queries to insert the word "aint", exactly 999 of them should fail with a duplicate key error and one should succeed. I have exactly this problem. Or nearly so. I am counting hits from incoming IP addresses. They invented an extended INSERT statement just for your problem. It works a bit backwards from your scenario: try an insert. If it fails because the record already exists, do an update on the existing record. But it does what you want: update the record, creating it if necessary. And since it's done in one query, no transactions or extra locking required. The logic is Get the record for this IP address. If successful, increment the hits counter, record time ELSE create it with counter as one (and some other stuff) Sometimes I get duplicate records.. It's not a real world problem (records are for human analysis only) , but I'd be curious to know a suitable strategy to avoid it. Assuming you have 4 fields in the table hitlog: ip_address (varchar), count (int), first_contact (datetime) and last_contact (datetime). There is a unique index (or primary key) on ip_address. INSERT INTO hitlog(ip_address, count, first_contact, last_contact) VALUES ('$ip', 1, now(), now()) ON DUPLICATE KEY UPDATE count = count+1, last_contact = now(); Here, $ip is replaced with an appropriate value, say, by PHP. Note that it's ONE query. |
|
Field | Type | Null | Key | Default | Extra | +-------------+------------------+------+-----+---------+----------------+ id | int(10) unsigned | NO | PRI | NULL | auto_increment | ipaddr | tinytext | YES | | NULL | | hits | int(10) unsigned | YES | | NULL | | whois | varchar(1024) | YES | | NULL | | last_access | datetime | YES | | NULL | | +-------------+------------------+------+-----+---------+----------------+ |
![]() |
| Thread Tools | |
| Display Modes | |
| |