![]() | |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Supposing I have a table of non-overlapping ranges thus: CREATE TABLE `ranges` ( * `start` int(10) unsigned NOT NULL DEFAULT '0', * `end` int(10) unsigned NOT NULL DEFAULT '0', * PRIMARY KEY (`start`) ); INSERT INTO `ranges` VALUES (1,6),(20,28),(50,65); Now I know that, since the ranges are non-overlapping, if I want to know if a number lies in any of the ranges (I don't really care which one, only that it is in one of them), all I need to do is to check the largest start that is lower than or equal to my number to see if it is also smaller than or equal to that row's end number. A query such as SELECT * 1 FROM ranges WHERE 21 BETWEEN start AND end LIMIT 1 will need to check the first 2 records, whereas if I am checking 51, it will need to check all of them. Can anyone think of a more efficient structure/indexing/query which will be able to give the DBMS a good enough clue so that it only needs to check a single row? |
#3
| |||
| |||
|
|
Supposing I have a table of non-overlapping ranges thus: CREATE TABLE `ranges` ( `start` int(10) unsigned NOT NULL DEFAULT '0', `end` int(10) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`start`) ); INSERT INTO `ranges` VALUES (1,6),(20,28),(50,65); Now I know that, since the ranges are non-overlapping, if I want to know if a number lies in any of the ranges (I don't really care which one, only that it is in one of them), all I need to do is to check the largest start that is lower than or equal to my number to see if it is also smaller than or equal to that row's end number. A query such as SELECT 1 FROM ranges WHERE 21 BETWEEN start AND end LIMIT 1 will need to check the first 2 records, whereas if I am checking 51, it will need to check all of them. Can anyone think of a more efficient structure/indexing/query which will be able to give the DBMS a good enough clue so that it only needs to check a single row? |
#4
| |||
| |||
|
|
Supposing I have a table of non-overlapping ranges thus: CREATE TABLE `ranges` ( `start` int(10) unsigned NOT NULL DEFAULT '0', `end` int(10) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`start`) ); INSERT INTO `ranges` VALUES (1,6),(20,28),(50,65); Now I know that, since the ranges are non-overlapping, if I want to know if a number lies in any of the ranges (I don't really care which one, only that it is in one of them), all I need to do is to check the largest start that is lower than or equal to my number to see if it is also smaller than or equal to that row's end number. A query such as SELECT 1 FROM ranges WHERE 21 BETWEEN start AND end LIMIT 1 will need to check the first 2 records, whereas if I am checking 51, it will need to check all of them. Can anyone think of a more efficient structure/indexing/query which will be able to give the DBMS a good enough clue so that it only needs to check a single row? |
#5
| |||
| |||
|
|
On 11/24/2010 6:07 AM, Captain Paralytic wrote: Supposing I have a table of non-overlapping ranges thus: CREATE TABLE `ranges` ( * *`start` int(10) unsigned NOT NULL DEFAULT '0', * *`end` int(10) unsigned NOT NULL DEFAULT '0', * *PRIMARY KEY (`start`) ); INSERT INTO `ranges` VALUES (1,6),(20,28),(50,65); Now I know that, since the ranges are non-overlapping, if I want to know if a number lies in any of the ranges (I don't really care which one, only that it is in one of them), all I need to do is to check the largest start that is lower than or equal to my number to see if it is also smaller than or equal to that row's end number. A query such as SELECT * *1 FROM ranges WHERE 21 BETWEEN start AND end LIMIT 1 will need to check the first 2 records, whereas if I am checking 51, it will need to check all of them. Can anyone think of a more efficient structure/indexing/query which will be able to give the DBMS a good enough clue so that it only needs to check a single row? Hi, Paul, Actually, both will do a complete table scan - there is no way for MySQL to know the ranges are non-overlapping. An index on (start, end) should help Well I did think about this, but I can't actually picture in my mind a |
|
depending on the number of rows in the table (3 rows will almost assuredly do a table scan no matter what). I'm sure you guessed that this was just a set up to demonstrate what |
#6
| |||
| |||
|
|
Just a thought... have you considered spatial indexes No I hadn't. I just took a look and I'm not sure it'd work because |
#7
| |||
| |||
|
|
On Nov 24, 1:26*pm, strawberry <zac.ca... (AT) gmail (DOT) com> wrote: Just a thought... have you considered spatial indexes No I hadn't. I just took a look and I'm not sure it'd work because this isn't spatial data. |
#8
| |||
| |||
|
|
On Nov 24, 1:27 pm, Jerry Stuckle<jstuck... (AT) attglobal (DOT) net> wrote: On 11/24/2010 6:07 AM, Captain Paralytic wrote: Supposing I have a table of non-overlapping ranges thus: CREATE TABLE `ranges` ( `start` int(10) unsigned NOT NULL DEFAULT '0', `end` int(10) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`start`) ); INSERT INTO `ranges` VALUES (1,6),(20,28),(50,65); Now I know that, since the ranges are non-overlapping, if I want to know if a number lies in any of the ranges (I don't really care which one, only that it is in one of them), all I need to do is to check the largest start that is lower than or equal to my number to see if it is also smaller than or equal to that row's end number. A query such as SELECT 1 FROM ranges WHERE 21 BETWEEN start AND end LIMIT 1 will need to check the first 2 records, whereas if I am checking 51, it will need to check all of them. Can anyone think of a more efficient structure/indexing/query which will be able to give the DBMS a good enough clue so that it only needs to check a single row? Hi, Paul, Actually, both will do a complete table scan - there is no way for MySQL to know the ranges are non-overlapping. An index on (start, end) should help Well I did think about this, but I can't actually picture in my mind a way that it would help. depending on the number of rows in the table (3 rows will almost assuredly do a table scan no matter what). I'm sure you guessed that this was just a set up to demonstrate what it is I want to do. The real table will be rather to big to include here :-) If I was doing this in VSAM, I would just position the pointer by that record and then I would only need to read one record. |
#9
| |||
| |||
|
|
Supposing I have a table of non-overlapping ranges thus: CREATE TABLE `ranges` ( `start` int(10) unsigned NOT NULL DEFAULT '0', `end` int(10) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`start`) ); INSERT INTO `ranges` VALUES (1,6),(20,28),(50,65); Now I know that, since the ranges are non-overlapping, if I want to know if a number lies in any of the ranges (I don't really care which one, only that it is in one of them), all I need to do is to check the largest start that is lower than or equal to my number to see if it is also smaller than or equal to that row's end number. A query such as SELECT 1 FROM ranges WHERE 21 BETWEEN start AND end LIMIT 1 will need to check the first 2 records, whereas if I am checking 51, it will need to check all of them. Can anyone think of a more efficient structure/indexing/query which will be able to give the DBMS a good enough clue so that it only needs to check a single row? |
|
Variable_name | Value | +-----------------------+-------+ Handler_read_first | 0 | Handler_read_key | 1 | Handler_read_next | 0 | Handler_read_prev | 0 | Handler_read_rnd | 0 | Handler_read_rnd_next | 0 | +-----------------------+-------+ |
|
start | end | +-------+-----+ 1 | 6 | +-------+-----+ |
#10
| |||
| |||
|
|
Captain Paralytic<paul_lautman (AT) yahoo (DOT) com> wrote: Supposing I have a table of non-overlapping ranges thus: CREATE TABLE `ranges` ( `start` int(10) unsigned NOT NULL DEFAULT '0', `end` int(10) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`start`) ); INSERT INTO `ranges` VALUES (1,6),(20,28),(50,65); Now I know that, since the ranges are non-overlapping, if I want to know if a number lies in any of the ranges (I don't really care which one, only that it is in one of them), all I need to do is to check the largest start that is lower than or equal to my number to see if it is also smaller than or equal to that row's end number. A query such as SELECT 1 FROM ranges WHERE 21 BETWEEN start AND end LIMIT 1 will need to check the first 2 records, whereas if I am checking 51, it will need to check all of them. Can anyone think of a more efficient structure/indexing/query which will be able to give the DBMS a good enough clue so that it only needs to check a single row? If you know for sure that your ranges are non-overlapping then you should use that knowledge to write a better query. I.e. SELECT start, end FROM ranges WHERE start<= 21 ORDER BY start DESC LIMIT 1; This will find the last (with the largest `start` number) range starting below 21. MySQL executes it like so: EXPLAIN SELECT start, end FROM ranges WHERE start<= 21 ORDER BY start DESC LIMIT 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: ranges type: range possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 2 Extra: Using where It does a range scan to read the records in sort order. The LIMIT makes this scan stop after the first hit. This is *very* efficient, because to find the starting point for the range scan, MySQL uses an index access: flush status; SELECT start, end FROM ranges WHERE start<= 21 ORDER BY start DESC LIMIT 1; show status like 'handler_read%'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | Handler_read_first | 0 | | Handler_read_key | 1 | | Handler_read_next | 0 | | Handler_read_prev | 0 | | Handler_read_rnd | 0 | | Handler_read_rnd_next | 0 | +-----------------------+-------+ Still we aren't there yet because that query will also find non-matches: SELECT start, end FROM ranges WHERE start<= 8 ORDER BY start DESC LIMIT 1; +-------+-----+ | start | end | +-------+-----+ | 1 | 6 | +-------+-----+ 1 row in set (0,00 sec) But this can be easily cured by wrapping the first result into another query: SELECT * FROM (SELECT start, end FROM ranges WHERE start<= 8 ORDER BY start DESC LIMIT 1) a WHERE 8 BETWEEN a.start AND a.end; Empty set (0,00 sec) Here the outer query will have to scan a "table" with only one row. MySQL handles this case efficiently, using the "system" access type. XL |
![]() |
| Thread Tools | |
| Display Modes | |
| |