dbTalk Databases Forums  

Efficient range query

comp.databases.mysql comp.databases.mysql


Discuss Efficient range query in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Captain Paralytic
 
Posts: n/a

Default Efficient range query - 11-24-2010 , 05:07 AM






Supposing I have a table of non-overlapping ranges thus:

CREATE TABLE `ranges` (
`start` int(10) unsigned NOT NULL DEFAULT '0',
`end` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`start`)
);
INSERT INTO `ranges` VALUES (1,6),(20,28),(50,65);

Now I know that, since the ranges are non-overlapping, if I want to
know if a number lies in any of the ranges (I don't really care which
one, only that it is in one of them), all I need to do is to check the
largest start that is lower than or equal to my number to see if it is
also smaller than or equal to that row's end number.

A query such as
SELECT
1
FROM ranges
WHERE 21 BETWEEN start AND end
LIMIT 1

will need to check the first 2 records, whereas if I am checking 51,
it will need to check all of them.

Can anyone think of a more efficient structure/indexing/query which
will be able to give the DBMS a good enough clue so that it only needs
to check a single row?

Reply With Quote
  #2  
Old   
strawberry
 
Posts: n/a

Default Re: Efficient range query - 11-24-2010 , 07:26 AM






On Nov 24, 11:07*am, Captain Paralytic <paul_laut... (AT) yahoo (DOT) com> wrote:
Quote:
Supposing I have a table of non-overlapping ranges thus:

CREATE TABLE `ranges` (
* `start` int(10) unsigned NOT NULL DEFAULT '0',
* `end` int(10) unsigned NOT NULL DEFAULT '0',
* PRIMARY KEY (`start`)
);
INSERT INTO `ranges` VALUES (1,6),(20,28),(50,65);

Now I know that, since the ranges are non-overlapping, if I want to
know if a number lies in any of the ranges (I don't really care which
one, only that it is in one of them), all I need to do is to check the
largest start that is lower than or equal to my number to see if it is
also smaller than or equal to that row's end number.

A query such as
SELECT
* 1
FROM ranges
WHERE 21 BETWEEN start AND end
LIMIT 1

will need to check the first 2 records, whereas if I am checking 51,
it will need to check all of them.

Can anyone think of a more efficient structure/indexing/query which
will be able to give the DBMS a good enough clue so that it only needs
to check a single row?
Just a thought... have you considered spatial indexes

Reply With Quote
  #3  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: Efficient range query - 11-24-2010 , 07:27 AM



On 11/24/2010 6:07 AM, Captain Paralytic wrote:
Quote:
Supposing I have a table of non-overlapping ranges thus:

CREATE TABLE `ranges` (
`start` int(10) unsigned NOT NULL DEFAULT '0',
`end` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`start`)
);
INSERT INTO `ranges` VALUES (1,6),(20,28),(50,65);

Now I know that, since the ranges are non-overlapping, if I want to
know if a number lies in any of the ranges (I don't really care which
one, only that it is in one of them), all I need to do is to check the
largest start that is lower than or equal to my number to see if it is
also smaller than or equal to that row's end number.

A query such as
SELECT
1
FROM ranges
WHERE 21 BETWEEN start AND end
LIMIT 1

will need to check the first 2 records, whereas if I am checking 51,
it will need to check all of them.

Can anyone think of a more efficient structure/indexing/query which
will be able to give the DBMS a good enough clue so that it only needs
to check a single row?
Hi, Paul,

Actually, both will do a complete table scan - there is no way for MySQL
to know the ranges are non-overlapping.

An index on (start, end) should help, depending on the number of rows in
the table (3 rows will almost assuredly do a table scan no matter what).
EXPLAIN should help figure it out.

Also, I haven't checked, but I would hope that MySQL optimizer is smart
enough to know that

21 BETWEEN start AND end

is the same as

start <= 21 AND end >= 21

but I've never checked it specifically.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
  #4  
Old   
Peter H. Coffin
 
Posts: n/a

Default Re: Efficient range query - 11-24-2010 , 07:55 AM



On Wed, 24 Nov 2010 03:07:05 -0800 (PST), Captain Paralytic wrote:
Quote:
Supposing I have a table of non-overlapping ranges thus:

CREATE TABLE `ranges` (
`start` int(10) unsigned NOT NULL DEFAULT '0',
`end` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`start`)
);
INSERT INTO `ranges` VALUES (1,6),(20,28),(50,65);

Now I know that, since the ranges are non-overlapping, if I want to
know if a number lies in any of the ranges (I don't really care which
one, only that it is in one of them), all I need to do is to check the
largest start that is lower than or equal to my number to see if it is
also smaller than or equal to that row's end number.

A query such as
SELECT
1
FROM ranges
WHERE 21 BETWEEN start AND end
LIMIT 1

will need to check the first 2 records, whereas if I am checking 51,
it will need to check all of them.

Can anyone think of a more efficient structure/indexing/query which
will be able to give the DBMS a good enough clue so that it only needs
to check a single row?
You may be overthinking this. Index both columns and let the optimizer
sort it out. (;

--
86. I will make sure that my doomsday device is up to code and properly
grounded.
--Peter Anspach's list of things to do as an Evil Overlord

Reply With Quote
  #5  
Old   
Captain Paralytic
 
Posts: n/a

Default Re: Efficient range query - 11-24-2010 , 08:03 AM



On Nov 24, 1:27*pm, Jerry Stuckle <jstuck... (AT) attglobal (DOT) net> wrote:
Quote:
On 11/24/2010 6:07 AM, Captain Paralytic wrote:
Supposing I have a table of non-overlapping ranges thus:

CREATE TABLE `ranges` (
* *`start` int(10) unsigned NOT NULL DEFAULT '0',
* *`end` int(10) unsigned NOT NULL DEFAULT '0',
* *PRIMARY KEY (`start`)
);
INSERT INTO `ranges` VALUES (1,6),(20,28),(50,65);

Now I know that, since the ranges are non-overlapping, if I want to
know if a number lies in any of the ranges (I don't really care which
one, only that it is in one of them), all I need to do is to check the
largest start that is lower than or equal to my number to see if it is
also smaller than or equal to that row's end number.

A query such as
SELECT
* *1
FROM ranges
WHERE 21 BETWEEN start AND end
LIMIT 1

will need to check the first 2 records, whereas if I am checking 51,
it will need to check all of them.

Can anyone think of a more efficient structure/indexing/query which
will be able to give the DBMS a good enough clue so that it only needs
to check a single row?

Hi, Paul,

Actually, both will do a complete table scan - there is no way for MySQL
to know the ranges are non-overlapping.

An index on (start, end) should help
Well I did think about this, but I can't actually picture in my mind a
way that it would help.

Quote:
depending on the number of rows in
the table (3 rows will almost assuredly do a table scan no matter what).
I'm sure you guessed that this was just a set up to demonstrate what
it is I want to do. The real table will be rather to big to include
here :-)

If I was doing this in VSAM, I would just position the pointer by that
record and then I would only need to read one record.

Reply With Quote
  #6  
Old   
Captain Paralytic
 
Posts: n/a

Default Re: Efficient range query - 11-24-2010 , 08:07 AM



On Nov 24, 1:26*pm, strawberry <zac.ca... (AT) gmail (DOT) com> wrote:
Quote:
Just a thought... have you considered spatial indexes
No I hadn't. I just took a look and I'm not sure it'd work because
this isn't spatial data.

Reply With Quote
  #7  
Old   
strawberry
 
Posts: n/a

Default Re: Efficient range query - 11-24-2010 , 08:17 AM



On Nov 24, 2:07*pm, Captain Paralytic <paul_laut... (AT) yahoo (DOT) com> wrote:
Quote:
On Nov 24, 1:26*pm, strawberry <zac.ca... (AT) gmail (DOT) com> wrote:

Just a thought... have you considered spatial indexes

No I hadn't. I just took a look and I'm not sure it'd work because
this isn't spatial data.
It isn't... but it could be ;-)

It's not something I've ever explored before myself but here's an
article on something similar...
http://explainextended.com/2009/09/2...sql/#more-3273

Reply With Quote
  #8  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: Efficient range query - 11-24-2010 , 08:19 AM



On 11/24/2010 9:03 AM, Captain Paralytic wrote:
Quote:
On Nov 24, 1:27 pm, Jerry Stuckle<jstuck... (AT) attglobal (DOT) net> wrote:
On 11/24/2010 6:07 AM, Captain Paralytic wrote:
Supposing I have a table of non-overlapping ranges thus:

CREATE TABLE `ranges` (
`start` int(10) unsigned NOT NULL DEFAULT '0',
`end` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`start`)
);
INSERT INTO `ranges` VALUES (1,6),(20,28),(50,65);

Now I know that, since the ranges are non-overlapping, if I want to
know if a number lies in any of the ranges (I don't really care which
one, only that it is in one of them), all I need to do is to check the
largest start that is lower than or equal to my number to see if it is
also smaller than or equal to that row's end number.

A query such as
SELECT
1
FROM ranges
WHERE 21 BETWEEN start AND end
LIMIT 1

will need to check the first 2 records, whereas if I am checking 51,
it will need to check all of them.

Can anyone think of a more efficient structure/indexing/query which
will be able to give the DBMS a good enough clue so that it only needs
to check a single row?

Hi, Paul,

Actually, both will do a complete table scan - there is no way for MySQL
to know the ranges are non-overlapping.

An index on (start, end) should help
Well I did think about this, but I can't actually picture in my mind a
way that it would help.

depending on the number of rows in
the table (3 rows will almost assuredly do a table scan no matter what).
I'm sure you guessed that this was just a set up to demonstrate what
it is I want to do. The real table will be rather to big to include
here :-)

If I was doing this in VSAM, I would just position the pointer by that
record and then I would only need to read one record.

An index on start would return rows 1 and 2. An index on end would
return rows 2 and 3. The intersection of the two would return row 2.

The only question would be if the optimizer is smart enough to handle
it, and whether you need one combined index or two separate ones. I
know Oracle, SQL Server and DB2 will all do it, with separate indexes
being more efficient.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
  #9  
Old   
Axel Schwenke
 
Posts: n/a

Default Re: Efficient range query - 11-24-2010 , 11:51 AM



Captain Paralytic <paul_lautman (AT) yahoo (DOT) com> wrote:
Quote:
Supposing I have a table of non-overlapping ranges thus:

CREATE TABLE `ranges` (
`start` int(10) unsigned NOT NULL DEFAULT '0',
`end` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`start`)
);
INSERT INTO `ranges` VALUES (1,6),(20,28),(50,65);

Now I know that, since the ranges are non-overlapping, if I want to
know if a number lies in any of the ranges (I don't really care which
one, only that it is in one of them), all I need to do is to check the
largest start that is lower than or equal to my number to see if it is
also smaller than or equal to that row's end number.

A query such as
SELECT
1
FROM ranges
WHERE 21 BETWEEN start AND end
LIMIT 1

will need to check the first 2 records, whereas if I am checking 51,
it will need to check all of them.

Can anyone think of a more efficient structure/indexing/query which
will be able to give the DBMS a good enough clue so that it only needs
to check a single row?
If you know for sure that your ranges are non-overlapping then you
should use that knowledge to write a better query. I.e.

SELECT start, end
FROM ranges
WHERE start <= 21
ORDER BY start DESC
LIMIT 1;

This will find the last (with the largest `start` number)
range starting below 21. MySQL executes it like so:

EXPLAIN SELECT start, end FROM ranges WHERE start <= 21 ORDER BY start DESC LIMIT 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ranges
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 2
Extra: Using where

It does a range scan to read the records in sort order. The LIMIT
makes this scan stop after the first hit. This is *very* efficient,
because to find the starting point for the range scan, MySQL uses
an index access:

flush status;
SELECT start, end FROM ranges WHERE start <= 21 ORDER BY start DESC LIMIT 1;
show status like 'handler_read%';
+-----------------------+-------+
Quote:
Variable_name | Value |
+-----------------------+-------+
Handler_read_first | 0 |
Handler_read_key | 1 |
Handler_read_next | 0 |
Handler_read_prev | 0 |
Handler_read_rnd | 0 |
Handler_read_rnd_next | 0 |
+-----------------------+-------+


Still we aren't there yet because that query will also find
non-matches:

SELECT start, end FROM ranges WHERE start <= 8 ORDER BY start DESC LIMIT 1;
+-------+-----+
Quote:
start | end |
+-------+-----+
1 | 6 |
+-------+-----+
1 row in set (0,00 sec)


But this can be easily cured by wrapping the first result into another
query:

SELECT * FROM
(SELECT start, end FROM ranges WHERE start <= 8 ORDER BY start DESC LIMIT 1) a
WHERE 8 BETWEEN a.start AND a.end;
Empty set (0,00 sec)

Here the outer query will have to scan a "table" with only one row.
MySQL handles this case efficiently, using the "system" access type.


XL

Reply With Quote
  #10  
Old   
Jerry Stuckle
 
Posts: n/a

Default Re: Efficient range query - 11-24-2010 , 12:27 PM



On 11/24/2010 12:51 PM, Axel Schwenke wrote:
Quote:
Captain Paralytic<paul_lautman (AT) yahoo (DOT) com> wrote:
Supposing I have a table of non-overlapping ranges thus:

CREATE TABLE `ranges` (
`start` int(10) unsigned NOT NULL DEFAULT '0',
`end` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`start`)
);
INSERT INTO `ranges` VALUES (1,6),(20,28),(50,65);

Now I know that, since the ranges are non-overlapping, if I want to
know if a number lies in any of the ranges (I don't really care which
one, only that it is in one of them), all I need to do is to check the
largest start that is lower than or equal to my number to see if it is
also smaller than or equal to that row's end number.

A query such as
SELECT
1
FROM ranges
WHERE 21 BETWEEN start AND end
LIMIT 1

will need to check the first 2 records, whereas if I am checking 51,
it will need to check all of them.

Can anyone think of a more efficient structure/indexing/query which
will be able to give the DBMS a good enough clue so that it only needs
to check a single row?

If you know for sure that your ranges are non-overlapping then you
should use that knowledge to write a better query. I.e.

SELECT start, end
FROM ranges
WHERE start<= 21
ORDER BY start DESC
LIMIT 1;

This will find the last (with the largest `start` number)
range starting below 21. MySQL executes it like so:

EXPLAIN SELECT start, end FROM ranges WHERE start<= 21 ORDER BY start DESC LIMIT 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ranges
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 2
Extra: Using where

It does a range scan to read the records in sort order. The LIMIT
makes this scan stop after the first hit. This is *very* efficient,
because to find the starting point for the range scan, MySQL uses
an index access:

flush status;
SELECT start, end FROM ranges WHERE start<= 21 ORDER BY start DESC LIMIT 1;
show status like 'handler_read%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 0 |
| Handler_read_key | 1 |
| Handler_read_next | 0 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
+-----------------------+-------+


Still we aren't there yet because that query will also find
non-matches:

SELECT start, end FROM ranges WHERE start<= 8 ORDER BY start DESC LIMIT 1;
+-------+-----+
| start | end |
+-------+-----+
| 1 | 6 |
+-------+-----+
1 row in set (0,00 sec)


But this can be easily cured by wrapping the first result into another
query:

SELECT * FROM
(SELECT start, end FROM ranges WHERE start<= 8 ORDER BY start DESC LIMIT 1) a
WHERE 8 BETWEEN a.start AND a.end;
Empty set (0,00 sec)

Here the outer query will have to scan a "table" with only one row.
MySQL handles this case efficiently, using the "system" access type.


XL
But what if the value isn't within a range? You're query will return
incorrect results.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex (AT) attglobal (DOT) net
==================

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.