dbTalk Databases Forums  

Groupings based on time offset

comp.databases.postgresql comp.databases.postgresql


Discuss Groupings based on time offset in the comp.databases.postgresql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Kurt
 
Posts: n/a

Default Groupings based on time offset - 07-01-2007 , 06:25 AM






Hi All,

I am pretty new to SQL and having trouble trying to formulate a query
from what I have been reading in a couple of books (and got a
suggestion from a friend that included cursors and is totally over my
head). I would really appreciate any help getting this figured out.

I am trying to group track positions from a ship into sections split
by gaps of no recorded position from a particular ship for a period of
time (say 60 minutes). I have a table "position_reports" something
like this:

key SERIAL PRIMARY KEY, ship_id INTEGER, position GEOMETRY, received
TIMESTAMP, received_sec INTEGER

position is a postgis POINT. The received_sec is the seconds since
the epoc from the original logs that I have converted to a TIMESTAMP.
Figure this query might be easier with one over the other?

The goal is to be able to generate LINESTRINGs from each ship by time
block that I record the ship. Right now, I end up doing a MakeLine
for each ship which joins different passes of the ship through an area
(which has ships flying over land quite frequently

1 123 someXY 12:01
2 123 someXY 12:04
3 123 someXY 12:13

# more than 60 minutes gap, so split groups here

4 123 someXY 15:23
5 123 someXY 15:46
6 123 someXY 15:53
7 123 someXY 16:01

Thanks,
-Kurt


Reply With Quote
  #2  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: Groupings based on time offset - 07-02-2007 , 02:36 AM






Kurt <schwehr (AT) gmail (DOT) com> wrote:
Quote:
I am trying to group track positions from a ship into sections split
by gaps of no recorded position from a particular ship for a period of
time (say 60 minutes).
One of the challenges of computer engineering is to translate a problem
from some other field (say ship logistics) into a language that a
computer engineer (in this case a database specialist) understands.

Could you assume that role and explain "for idiots" what you are trying
to achieve?

Yours,
Laurenz Albe


Reply With Quote
  #3  
Old   
Kurt
 
Posts: n/a

Default Re: Groupings based on time offset - 07-02-2007 , 11:09 AM




Quote:
Could you assume that role and explain "for idiots" what you are trying
to achieve?

Yours,
Laurenz Albe
Hi Laurenz,

I am trying to group into a series of runs where the runs are
separated by periods when there are no received points for 60
minutes. Does that make any sense?

select key,userid,cg_sec,cg_timestamp from oneShip;
key | userid | cg_sec | cg_timestamp
-----+-----------+------------+---------------------
251 | 366725230 | 1178661410 | 2007-05-08 21:56:50
252 | 366725230 | 1178661421 | 2007-05-08 21:57:01
253 | 366725230 | 1178661561 | 2007-05-08 21:59:21
254 | 366725230 | 1178661710 | 2007-05-08 22:01:50
255 | 366725230 | 1178661900 | 2007-05-08 22:05:00
256 | 366725230 | 1178661940 | 2007-05-08 22:05:40
257 | 366725230 | 1178663311 | 2007-05-08 22:28:31
258 | 366725230 | 1178663360 | 2007-05-08 22:29:20
259 | 366725230 | 1178663441 | 2007-05-08 22:30:41
260 | 366725230 | 1178663540 | 2007-05-08 22:32:20
-- here is the time jump between the two that I need to split on
266 | 366725230 | 1178729461 | 2007-05-09 16:51:01
267 | 366725230 | 1178730221 | 2007-05-09 17:03:41
(12 rows)

Where the results should be this for the separated transits:

userid | startKey | endKey
366725230 | 251 | 260
366725230 | 266 | 267

I've also put a dump of the db up on the web of "pg_dump --create ais>
ais_pgdump.sql" in case that might be easier to deal with.

http://vislab-ccom.unh.edu/~schwehr/ais/tmp/


Or to state it differently, this is how I solved the problem in python/
psycopg2. I could still really use a SQL solution both for learning
SQL and for down the road when I want to craft a trigger to make
transits when new data arrives:

psql ais_2006
ais_2006=# CREATE INDEX userid_idx ON position (userid);
took about 30 seconds...

ais_2006=# \d position
....
Indexes:
"position_pkey" PRIMARY KEY, btree ("key")
"userid_idx" btree (userid)
....

../ais_pg_create_transit_table.py -C -d ais_2006

#!/usr/bin/env python

__version__ = '$Revision: 4791 $'.split()[1]

import os,sys

if __name__=='__main__':
from optparse import OptionParser
parser = OptionParser(usage="%prog [options] ",version="%prog
"+__version__)
parser.add_option('-d','--database-
name',dest='databaseName',default='ais',
help='Name of database within the postgres server [default:
%default]')
parser.add_option('-D','--database-
host',dest='databaseHost',default='localhost',
help='Host name of the computer serving the dbx [default:
%default]')
defaultUser = os.getlogin()
parser.add_option('-u','--database-
user',dest='databaseUser',default=defaultUser,
help='Host name of the to access the database with [default:
%default]')
parser.add_option('-C','--with-
create',dest='createTables',default=False, action='store_true',
help='Do not create the tables in the database')

parser.add_option('-t','--delta-time',dest='deltaT'
,default=60*60
,type='int'
,help='Time gap in seconds that determines when a new transit
starts [default: %default]')

(options,args) = parser.parse_args()

import psycopg2 as psycopg
deltaT = options.deltaT
connectStr = "dbname='"+options.databaseName+"'
user='"+options.databaseUser+"' host='"+options.databaseHost+"'"
cx = psycopg.connect(connectStr)
cu = cx.cursor()

if options.createTables:
cu.execute('''
CREATE TABLE transit
(
id serial NOT NULL,
userid integer NOT NULL,
startpos integer NOT NULL,
endpos integer NOT NULL,
CONSTRAINT transit_pkey PRIMARY KEY (id)
);
''')
cx.commit()

# BEGIN algorithm to find transits...

cu.execute('SELECT DISTINCT(userid) FROM position;');
ships= [ship[0] for ship in cu.fetchall()]
print ships
for ship in ships:
print 'Processing ship: ',ship
cu.execute('SELECT key,cg_sec FROM position WHERE userid=
%s ORDER BY cg_sec',(ship,))
startKey,startTime=cu.fetchone()
print startKey,startTime
lastKey,lastTime=startKey,startTime
needFinal=True
# Now go through the rest of the ship position records
for row in cu.fetchall():
needFinal=True
key,time = row
if time>lastTime+deltaT:
print 'FOUND',startKey,startTime,'-
Quote:
',lastKey,lastTime
cu.execute('INSERT INTO transit
(userid,startPos,endPos) VALUES (%s,%s,%s);',(ship,startKey,lastKey))
startKey,startTime=key,time
needFinal=False
lastKey,lastTime=key,time # Save for the next loop
#sys.exit()

if needFinal:
print 'Final transit...'
print 'FOUND',startKey,startTime,'->',lastKey,lastTime
cu.execute('INSERT INTO transit
(userid,startPos,endPos) VALUES (%s,%s,%s);',(ship,startKey,lastKey))
cx.commit()



Reply With Quote
  #4  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: Groupings based on time offset - 07-03-2007 , 05:09 AM



Kurt <schwehr (AT) gmail (DOT) com> wrote:
Quote:
I am trying to group into a series of runs where the runs are
separated by periods when there are no received points for 60
minutes. Does that make any sense?

select key,userid,cg_sec,cg_timestamp from oneShip;
key | userid | cg_sec | cg_timestamp
-----+-----------+------------+---------------------
251 | 366725230 | 1178661410 | 2007-05-08 21:56:50
252 | 366725230 | 1178661421 | 2007-05-08 21:57:01
253 | 366725230 | 1178661561 | 2007-05-08 21:59:21
254 | 366725230 | 1178661710 | 2007-05-08 22:01:50
255 | 366725230 | 1178661900 | 2007-05-08 22:05:00
256 | 366725230 | 1178661940 | 2007-05-08 22:05:40
257 | 366725230 | 1178663311 | 2007-05-08 22:28:31
258 | 366725230 | 1178663360 | 2007-05-08 22:29:20
259 | 366725230 | 1178663441 | 2007-05-08 22:30:41
260 | 366725230 | 1178663540 | 2007-05-08 22:32:20
-- here is the time jump between the two that I need to split on
266 | 366725230 | 1178729461 | 2007-05-09 16:51:01
267 | 366725230 | 1178730221 | 2007-05-09 17:03:41
(12 rows)

Where the results should be this for the separated transits:

userid | startKey | endKey
366725230 | 251 | 260
366725230 | 266 | 267
This query definitely does NOT perform well, as it will execute several
sequential scans on the table, but it should produce what you want
(there may be errors if the time gap is exactly 60 minutes, but fixing
those is left as an exercise to the reader):

SELECT ship1.key AS startkey, ship2.key AS endkey
FROM (SELECT s.key, s.cg_sec
FROM oneShip AS s
WHERE NOT EXISTS
(SELECT 1 FROM oneShip AS s2
WHERE (s.cg_sec > s2.cg_sec)
AND (s.cg_sec <= s2.cg_sec + 3600))) AS ship1,
(SELECT s.key, s.cg_sec
FROM oneShip AS s
WHERE NOT EXISTS
(SELECT 1 FROM oneShip AS s2
WHERE (s.cg_sec < s2.cg_sec)
AND (s.cg_sec >= s2.cg_sec - 3600))) AS ship2
WHERE ship1.cg_sec < ship2.cg_sec
AND NOT EXISTS
(SELECT 1
FROM (SELECT s.key, s.cg_sec
FROM oneShip AS s
WHERE NOT EXISTS
(SELECT 1 FROM oneShip AS s2
WHERE (s.cg_sec < s2.cg_sec)
AND (s.cg_sec >= s2.cg_sec - 3600))) AS ship3
WHERE (ship1.cg_sec < ship3.cg_sec)
AND (ship3.cg_sec < ship2.cg_sec));

startkey | endkey
----------+--------
251 | 260
266 | 267
(2 rows)

I would recommend to not code this as an SQL query because it can be
done much easier and with better performance as a function
RETURNS SETOF oneShip in PL/pgSQL or any language of your choice.

Yours,
Laurenz Albe


Reply With Quote
  #5  
Old   
Kurt
 
Posts: n/a

Default Re: Groupings based on time offset - 07-03-2007 , 05:37 AM



Laurenz, Thanks very much!

-kurt

Quote:
This query definitely does NOT perform well, as it will execute several
sequential scans on the table, but it should produce what you want
(there may be errors if the time gap is exactly 60 minutes, but fixing
those is left as an exercise to the reader):

SELECT ship1.key AS startkey, ship2.key AS endkey
....

I would recommend to not code this as an SQL query because it can be
done much easier and with better performance as a function
RETURNS SETOF oneShip in PL/pgSQL or any language of your choice.

Yours,
Laurenz Albe



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.