dbTalk Databases Forums  

Parallel access to the values of a table

comp.databases.mysql comp.databases.mysql


Discuss Parallel access to the values of a table in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
Kees Nuyt
 
Posts: n/a

Default Re: Parallel access to the values of a table - 08-23-2011 , 11:22 AM






On Tue, 23 Aug 2011 15:52:38 +0100, The Natural
Philosopher <tnp (AT) invalid (DOT) invalid> wrote:

Quote:
Gordon Burditt wrote:
If at 12:01:00.000000,
1000 clients issue 1000 queries looking for the word "aint", don't
find it, and 1 millisecond later at 12:01:00.001000, 1000 clients
issue 1000 queries to insert the word "aint", exactly 999 of them
should fail with a duplicate key error and one should succeed.
I have exactly this problem. Or nearly so.

I am counting hits from incoming IP addresses.

They invented an extended INSERT statement just for your problem.
It works a bit backwards from your scenario: try an insert. If
it fails because the record already exists, do an update on the
existing record. But it does what you want: update the record,
creating it if necessary. And since it's done in one query,
no transactions or extra locking required.

The logic is

Get the record for this IP address.

If successful, increment the hits counter, record time

ELSE create it with counter as one (and some other stuff)

Sometimes I get duplicate records..

It's not a real world problem (records are for human analysis only) ,
but I'd be curious to know a suitable strategy to avoid it.

Assuming you have 4 fields in the table hitlog:
ip_address (varchar), count (int), first_contact (datetime)
and last_contact (datetime).
There is a unique index (or primary key) on ip_address.

INSERT INTO hitlog(ip_address, count, first_contact, last_contact)
VALUES ('$ip', 1, now(), now())
ON DUPLICATE KEY
UPDATE count = count+1, last_contact = now();

Here, $ip is replaced with an appropriate value, say, by PHP.
Note that it's ONE query.


Sadly there is a separate key..
Read and think before you post.
As Gordon said:

Quote:
There is a unique index (or primary key) on ip_address.
If ipaddr is uniquely identifying a row, it is a
candidate key. Candidate keys can be used as the primary
key.
If you don't use a candidate key as the primary key, but
the datamodel requires it to be unique, you should use a
unique constraint on that column.

Quote:
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| ipaddr | tinytext | YES | | NULL | |
| hits | int(10) unsigned | YES | | NULL | |
| whois | varchar(1024) | YES | | NULL | |
| last_access | datetime | YES | | NULL | |
+-------------+------------------+------+-----+---------+----------------+
5 rows in set (0.07 sec)

That probably scuppers that approach unless I redesign the table. :-(
Easy change.

Quote:
So ON DUPLICATE(ipaddr) is not allowed huh? :-)
ON DUPLICATE will fire on any violated UNIQUE constraint.

Quote:
I hate table locks...
So, why don't you use InnoDB then?
Best regards,
--
( Kees Nuyt
)
c[_]

Reply With Quote
  #12  
Old   
The Natural Philosopher
 
Posts: n/a

Default Re: Parallel access to the values of a table - 08-23-2011 , 02:04 PM






Kees Nuyt wrote:
Quote:
On Tue, 23 Aug 2011 15:52:38 +0100, The Natural
Philosopher <tnp (AT) invalid (DOT) invalid> wrote:

Gordon Burditt wrote:
If at 12:01:00.000000,
1000 clients issue 1000 queries looking for the word "aint", don't
find it, and 1 millisecond later at 12:01:00.001000, 1000 clients
issue 1000 queries to insert the word "aint", exactly 999 of them
should fail with a duplicate key error and one should succeed.
I have exactly this problem. Or nearly so.
I am counting hits from incoming IP addresses.
They invented an extended INSERT statement just for your problem.
It works a bit backwards from your scenario: try an insert. If
it fails because the record already exists, do an update on the
existing record. But it does what you want: update the record,
creating it if necessary. And since it's done in one query,
no transactions or extra locking required.

The logic is

Get the record for this IP address.

If successful, increment the hits counter, record time

ELSE create it with counter as one (and some other stuff)

Sometimes I get duplicate records..

It's not a real world problem (records are for human analysis only) ,
but I'd be curious to know a suitable strategy to avoid it.
Assuming you have 4 fields in the table hitlog:
ip_address (varchar), count (int), first_contact (datetime)
and last_contact (datetime).
There is a unique index (or primary key) on ip_address.

INSERT INTO hitlog(ip_address, count, first_contact, last_contact)
VALUES ('$ip', 1, now(), now())
ON DUPLICATE KEY
UPDATE count = count+1, last_contact = now();

Here, $ip is replaced with an appropriate value, say, by PHP.
Note that it's ONE query.

Sadly there is a separate key..

Read and think before you post.
As Gordon said:

There is a unique index (or primary key) on ip_address.

If ipaddr is uniquely identifying a row, it is a
candidate key. Candidate keys can be used as the primary
key.
If you don't use a candidate key as the primary key, but
the datamodel requires it to be unique, you should use a
unique constraint on that column.

+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| ipaddr | tinytext | YES | | NULL | |
| hits | int(10) unsigned | YES | | NULL | |
| whois | varchar(1024) | YES | | NULL | |
| last_access | datetime | YES | | NULL | |
+-------------+------------------+------+-----+---------+----------------+
5 rows in set (0.07 sec)

That probably scuppers that approach unless I redesign the table. :-(

Easy change.

So ON DUPLICATE(ipaddr) is not allowed huh? :-)

ON DUPLICATE will fire on any violated UNIQUE constraint.

I hate table locks...

So, why don't you use InnoDB then?
Best regards,
Thanks for that. I'll ponder which approach is simplest.

Reply With Quote
  #13  
Old   
Axel Schwenke
 
Posts: n/a

Default Re: Parallel access to the values of a table - 08-30-2011 , 04:04 AM



gervaz <gervaz (AT) gmail (DOT) com> wrote:

Quote:
CREATE TABLE word_list (
id int(10) unsigned NOT NULL auto_increment,
word VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY (word)
) ENGINE=InnoDB;

I have a lot of threads that concurrently try to get the id of a word
from the word_list table and if the word is not found, the word is
inserted and the id returned.

My concern is that I can incurr in integrity errors (unique key) since
I'm using concurrency.
What concern? Handling concurrent access to data *is* one of the major
tasks of a DBMS.

For the above workload, the following may occur:

1. threads A and B both look for the same nonexisting word
2. the faster thread (we assume: A) first gets the lock for the
new row
3. thread B has to wait until thread A either commits or rolls back
4a. on commit thread B will get a "duplicate key" error; it now knows
that the word is there and repeats the select for the id
4b. on rollback thread B will succeed to insert the row itself

Of course your application must be able to cope with such "duplicate"
errors as well as "lock wait timeout" and "deadlock". RTFM!


XL

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.