dbTalk Databases Forums  

Re: Self-Join conundrum

comp.databases.rdb comp.databases.rdb


Discuss Re: Self-Join conundrum in the comp.databases.rdb forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
ventre-à-pattes
 
Posts: n/a

Default Re: Self-Join conundrum - 06-25-2003 , 06:44 PM






"VisionSet" <spam (AT) ntlworld (DOT) com> wrote

Quote:
Consider this table (Person) with example row:

pKey | name | city | county | coordX | coordY
4321 | Tom | Peel | Ville | 123654 | 987123
And this self join:
SELECT * FROM Person AS P1
LEFT JOIN Person AS P2 ON
sqrt(pow(P1.coordX - P2.coordX, 2) + pow(P1.coordY - P2.coordY, 2)) < 666
GROUP BY P1.pKey
So each resulting row will represent a group of Persons that are located
within 666 of each other.
However joining on this pythag expression requires the calculation on
every
row for every row, cartesian join style.
I have 2000 rows in this table, it takes along time
by adding an addition join condition like
p1.county = p2.county
I can perform the query in a fraction of a second, but this is a hack,
what
about boundary conditions where a person is located between two cities?
Anyone have any solution to this?

Not sure if this would work, but assuming your coordinate fields are
indexed, you could restrict first on (ABS(P1.X - P2.X) < 666) AND
(ABS(P1.Y - P2.Y) < 666)) which is implicit as long as your coordinates are
real numbers (no imaginary part).
These should compute faster and, as long as MySQL does not perform a
complete boolean evaluation, speed up the query.
The restriction on counties will not work if you have two locations closer
to each other than your limit, each of them in a different county (I guess
this can happen).
Keep us posted on the results
--
Eric Lafontaine




Reply With Quote
  #2  
Old   
Paul G. Brown
 
Posts: n/a

Default Re: Self-Join conundrum - 06-25-2003 , 07:46 PM






"VisionSet" <spam (AT) ntlworld (DOT) com> wrote

Quote:
Consider this table (Person) with example row:

pKey | name | city | county | coordX | coordY
----------------------------------------
4321 | Tom | Peel | Ville | 123654 | 987123


And this self join:

SELECT * FROM Person AS P1
LEFT JOIN Person AS P2 ON
sqrt(pow(P1.coordX - P2.coordX, 2) + pow(P1.coordY - P2.coordY, 2)) < 666
GROUP BY P1.pKey
Use postgres. It comes with a set of spatial types and operators, and a
spatial index to speed up the query.

http://www.postgresql.org/docs/view....tatype1681.htm

CREATE TABLE Person (
PKey INTEGER PRIMARY KEY,
Name VARCHAR(32) NOT NULL,
City VARCHAR(32) NOT NULL,
County VARCHAR(32) NOT NULL,
Where Point NOT NULL
);

SELECT P1.Person, COUNT(*)
FROM Person P1, Person P2
WHERE Contains ( Circle ( P1.Where, 666), P2.Where )
GROUP BY P1.Person;


Reply With Quote
  #3  
Old   
programmer
 
Posts: n/a

Default Re: Self-Join conundrum - 06-26-2003 , 10:35 AM



This performs checks that the N-S and E-W distances are within the given
distance, cutting down on the number of rows that need the sqrt check.


SELECT * FROM Person AS P1
LEFT JOIN Person AS P2 ON
abs(P1.coordX - P2.coordX) < 666
and abs(P1.coordY - P2.coordY) < 666
and sqrt(pow(P1.coordX - P2.coordX, 2) + pow(P1.coordY - P2.coordY, 2)) <
666
GROUP BY P1.pKey






Reply With Quote
  #4  
Old   
VisionSet
 
Posts: n/a

Default Re: Self-Join conundrum - 06-26-2003 , 03:07 PM




"ventre-à-pattes" <news (AT) ventre-a-pattes (DOT) com> wrote

Quote:
"VisionSet" <spam (AT) ntlworld (DOT) com> wrote in message
newsIjKa.357$%a.5664 (AT) newsfep4-glfd (DOT) server.ntli.net...

Not sure if this would work, but assuming your coordinate fields are
indexed, you could restrict first on (ABS(P1.X - P2.X) < 666) AND
(ABS(P1.Y - P2.Y) < 666)) which is implicit as long as your coordinates
are
real numbers (no imaginary part).
These should compute faster and, as long as MySQL does not perform a
complete boolean evaluation, speed up the query.
mySQL does not use indexes on

ABS(P1.X - P2.X) < 40

or on

P2.y BETWEEN P1.y - 40 AND P1.y + 40

It would use an index on

P2.y BETWEEN 7000 AND 8000

So I'm still left with a 2000x2000 join, and a 2 minute query time.

There was another suggestion about creating a neighbours table, and I could
do this, despite thinking before I couldn't
Even though the 666 value is actually arrived at at runtime, I can give it
an upper limit, if I use this upper limit to limit the neighbours in the new
table, then this should cut down the number of rows to examine in the
pythag. function.
However this new table will be huge, I estimate possibly 50,000 rows say if
each 'city' has 25 neighbours.
(They aren't really cities of course, thats just an analogy.)

mySQL 4.1 has geometrical abilities that should fit the bill, anyone know
when this will production released?

Cheers,
--
Mike W





Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.