![]() | |
![]() |
| | Thread Tools | Display Modes |
#11
| |||||
| |||||
|
|
Gordon Burditt wrote: If at 12:01:00.000000, 1000 clients issue 1000 queries looking for the word "aint", don't find it, and 1 millisecond later at 12:01:00.001000, 1000 clients issue 1000 queries to insert the word "aint", exactly 999 of them should fail with a duplicate key error and one should succeed. I have exactly this problem. Or nearly so. I am counting hits from incoming IP addresses. They invented an extended INSERT statement just for your problem. It works a bit backwards from your scenario: try an insert. If it fails because the record already exists, do an update on the existing record. But it does what you want: update the record, creating it if necessary. And since it's done in one query, no transactions or extra locking required. The logic is Get the record for this IP address. If successful, increment the hits counter, record time ELSE create it with counter as one (and some other stuff) Sometimes I get duplicate records.. It's not a real world problem (records are for human analysis only) , but I'd be curious to know a suitable strategy to avoid it. Assuming you have 4 fields in the table hitlog: ip_address (varchar), count (int), first_contact (datetime) and last_contact (datetime). There is a unique index (or primary key) on ip_address. INSERT INTO hitlog(ip_address, count, first_contact, last_contact) VALUES ('$ip', 1, now(), now()) ON DUPLICATE KEY UPDATE count = count+1, last_contact = now(); Here, $ip is replaced with an appropriate value, say, by PHP. Note that it's ONE query. Sadly there is a separate key.. |
|
There is a unique index (or primary key) on ip_address. |
|
+-------------+------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------------+------------------+------+-----+---------+----------------+ | id | int(10) unsigned | NO | PRI | NULL | auto_increment | | ipaddr | tinytext | YES | | NULL | | | hits | int(10) unsigned | YES | | NULL | | | whois | varchar(1024) | YES | | NULL | | | last_access | datetime | YES | | NULL | | +-------------+------------------+------+-----+---------+----------------+ 5 rows in set (0.07 sec) That probably scuppers that approach unless I redesign the table. :-( |
|
So ON DUPLICATE(ipaddr) is not allowed huh? :-) |
|
I hate table locks... |
#12
| |||
| |||
|
|
On Tue, 23 Aug 2011 15:52:38 +0100, The Natural Philosopher <tnp (AT) invalid (DOT) invalid> wrote: Gordon Burditt wrote: If at 12:01:00.000000, 1000 clients issue 1000 queries looking for the word "aint", don't find it, and 1 millisecond later at 12:01:00.001000, 1000 clients issue 1000 queries to insert the word "aint", exactly 999 of them should fail with a duplicate key error and one should succeed. I have exactly this problem. Or nearly so. I am counting hits from incoming IP addresses. They invented an extended INSERT statement just for your problem. It works a bit backwards from your scenario: try an insert. If it fails because the record already exists, do an update on the existing record. But it does what you want: update the record, creating it if necessary. And since it's done in one query, no transactions or extra locking required. The logic is Get the record for this IP address. If successful, increment the hits counter, record time ELSE create it with counter as one (and some other stuff) Sometimes I get duplicate records.. It's not a real world problem (records are for human analysis only) , but I'd be curious to know a suitable strategy to avoid it. Assuming you have 4 fields in the table hitlog: ip_address (varchar), count (int), first_contact (datetime) and last_contact (datetime). There is a unique index (or primary key) on ip_address. INSERT INTO hitlog(ip_address, count, first_contact, last_contact) VALUES ('$ip', 1, now(), now()) ON DUPLICATE KEY UPDATE count = count+1, last_contact = now(); Here, $ip is replaced with an appropriate value, say, by PHP. Note that it's ONE query. Sadly there is a separate key.. Read and think before you post. As Gordon said: There is a unique index (or primary key) on ip_address. If ipaddr is uniquely identifying a row, it is a candidate key. Candidate keys can be used as the primary key. If you don't use a candidate key as the primary key, but the datamodel requires it to be unique, you should use a unique constraint on that column. +-------------+------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------------+------------------+------+-----+---------+----------------+ | id | int(10) unsigned | NO | PRI | NULL | auto_increment | | ipaddr | tinytext | YES | | NULL | | | hits | int(10) unsigned | YES | | NULL | | | whois | varchar(1024) | YES | | NULL | | | last_access | datetime | YES | | NULL | | +-------------+------------------+------+-----+---------+----------------+ 5 rows in set (0.07 sec) That probably scuppers that approach unless I redesign the table. :-( Easy change. So ON DUPLICATE(ipaddr) is not allowed huh? :-) ON DUPLICATE will fire on any violated UNIQUE constraint. I hate table locks... So, why don't you use InnoDB then? Best regards, |
#13
| |||
| |||
|
|
CREATE TABLE word_list ( id int(10) unsigned NOT NULL auto_increment, word VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, PRIMARY KEY (id), UNIQUE KEY (word) ) ENGINE=InnoDB; I have a lot of threads that concurrently try to get the id of a word from the word_list table and if the word is not found, the word is inserted and the id returned. My concern is that I can incurr in integrity errors (unique key) since I'm using concurrency. |
![]() |
| Thread Tools | |
| Display Modes | |
| |