dbTalk Databases Forums  

Querying tables with different aggregation levels

comp.databases comp.databases


Discuss Querying tables with different aggregation levels in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Mark Wilden
 
Posts: n/a

Default Querying tables with different aggregation levels - 09-17-2008 , 05:18 PM






We have been considering improving the performance of date-based
queries by maintaining two tables: one that contains a row for each
event, and another that aggregates events by month.

We could support any user query with three table queries: one query
against the base table for the beginning of the requested period up
until the next month boundary, one query against the aggregate table
from that month boundary to the boundary nearest the end of the
requested period, and one query against the base table from that month
boundary to the end of the requested period.

1) Has anyone heard of such a thing being done?

2) What is it called? I can't even figure out how to Google it. Our
working name is "horizontal rollup."

advthanksance,

///ark

Reply With Quote
  #2  
Old   
Robert Klemme
 
Posts: n/a

Default Re: Querying tables with different aggregation levels - 09-18-2008 , 01:40 AM






On 18.09.2008 00:18, Mark Wilden wrote:
Quote:
We have been considering improving the performance of date-based
queries by maintaining two tables: one that contains a row for each
event, and another that aggregates events by month.

We could support any user query with three table queries: one query
against the base table for the beginning of the requested period up
until the next month boundary, one query against the aggregate table
from that month boundary to the boundary nearest the end of the
requested period, and one query against the base table from that month
boundary to the end of the requested period.
If I understand you correctly you want to have overlap between data in
tables. The more familiar scheme seems to be that you have a table with
current data (up to x days old) and one table with aggregated data for
older entries. So there is no overlap. In that case you could easily
define a view with UNION ALL which you then query.

If you want to have redundant storage I would prefer to use a
materialized view for your query of aggregated data. You would for
example have the MV aggregate on day basis so you loose precision if the
original data contained timestamps. But you could at least do queries
with day precision efficiently.

You should probably give more details about your data and the types of
querying and aggregation you do.

Cheers

robert


Reply With Quote
  #3  
Old   
Robert Klemme
 
Posts: n/a

Default Re: Querying tables with different aggregation levels - 09-18-2008 , 01:40 AM



On 18.09.2008 00:18, Mark Wilden wrote:
Quote:
We have been considering improving the performance of date-based
queries by maintaining two tables: one that contains a row for each
event, and another that aggregates events by month.

We could support any user query with three table queries: one query
against the base table for the beginning of the requested period up
until the next month boundary, one query against the aggregate table
from that month boundary to the boundary nearest the end of the
requested period, and one query against the base table from that month
boundary to the end of the requested period.
If I understand you correctly you want to have overlap between data in
tables. The more familiar scheme seems to be that you have a table with
current data (up to x days old) and one table with aggregated data for
older entries. So there is no overlap. In that case you could easily
define a view with UNION ALL which you then query.

If you want to have redundant storage I would prefer to use a
materialized view for your query of aggregated data. You would for
example have the MV aggregate on day basis so you loose precision if the
original data contained timestamps. But you could at least do queries
with day precision efficiently.

You should probably give more details about your data and the types of
querying and aggregation you do.

Cheers

robert


Reply With Quote
  #4  
Old   
Robert Klemme
 
Posts: n/a

Default Re: Querying tables with different aggregation levels - 09-18-2008 , 01:40 AM



On 18.09.2008 00:18, Mark Wilden wrote:
Quote:
We have been considering improving the performance of date-based
queries by maintaining two tables: one that contains a row for each
event, and another that aggregates events by month.

We could support any user query with three table queries: one query
against the base table for the beginning of the requested period up
until the next month boundary, one query against the aggregate table
from that month boundary to the boundary nearest the end of the
requested period, and one query against the base table from that month
boundary to the end of the requested period.
If I understand you correctly you want to have overlap between data in
tables. The more familiar scheme seems to be that you have a table with
current data (up to x days old) and one table with aggregated data for
older entries. So there is no overlap. In that case you could easily
define a view with UNION ALL which you then query.

If you want to have redundant storage I would prefer to use a
materialized view for your query of aggregated data. You would for
example have the MV aggregate on day basis so you loose precision if the
original data contained timestamps. But you could at least do queries
with day precision efficiently.

You should probably give more details about your data and the types of
querying and aggregation you do.

Cheers

robert


Reply With Quote
  #5  
Old   
--CELKO--
 
Posts: n/a

Default Re: Querying tables with different aggregation levels - 09-18-2008 , 08:13 AM



A more common schema is to have a table of reporting periods and use a
VIEW to get the summary information. Here is a skeleton:

CREATE TABLE Events
(event_date DATE NOT NULL,
...);

CREATE TABLE ReportPeriods
(report_period_name CHAR(15) NOT NULL PRIMARY KEY,
event_start_date DATE NOT NULL,
event_end_date DATE NOT NULL,
CHECK(event_start_date < event_end_date),
..);

CREATE VIEW EventSummary
AS
SELECT ..
FROM Events AS E, ReportPeriods AS R
WHERE E.event_date BETWEEN R.event_start_date AND R. event_end_date
GROUP BY R.report_period_name;

Reply With Quote
  #6  
Old   
--CELKO--
 
Posts: n/a

Default Re: Querying tables with different aggregation levels - 09-18-2008 , 08:13 AM



A more common schema is to have a table of reporting periods and use a
VIEW to get the summary information. Here is a skeleton:

CREATE TABLE Events
(event_date DATE NOT NULL,
...);

CREATE TABLE ReportPeriods
(report_period_name CHAR(15) NOT NULL PRIMARY KEY,
event_start_date DATE NOT NULL,
event_end_date DATE NOT NULL,
CHECK(event_start_date < event_end_date),
..);

CREATE VIEW EventSummary
AS
SELECT ..
FROM Events AS E, ReportPeriods AS R
WHERE E.event_date BETWEEN R.event_start_date AND R. event_end_date
GROUP BY R.report_period_name;

Reply With Quote
  #7  
Old   
--CELKO--
 
Posts: n/a

Default Re: Querying tables with different aggregation levels - 09-18-2008 , 08:13 AM



A more common schema is to have a table of reporting periods and use a
VIEW to get the summary information. Here is a skeleton:

CREATE TABLE Events
(event_date DATE NOT NULL,
...);

CREATE TABLE ReportPeriods
(report_period_name CHAR(15) NOT NULL PRIMARY KEY,
event_start_date DATE NOT NULL,
event_end_date DATE NOT NULL,
CHECK(event_start_date < event_end_date),
..);

CREATE VIEW EventSummary
AS
SELECT ..
FROM Events AS E, ReportPeriods AS R
WHERE E.event_date BETWEEN R.event_start_date AND R. event_end_date
GROUP BY R.report_period_name;

Reply With Quote
  #8  
Old   
Mark Wilden
 
Posts: n/a

Default Re: Querying tables with different aggregation levels - 09-18-2008 , 12:24 PM



On Sep 17, 11:40*pm, Robert Klemme <shortcut... (AT) googlemail (DOT) com> wrote:

Quote:
On 18.09.2008 00:18, Mark Wilden wrote:
If I understand you correctly you want to have overlap between data in
tables.
That's right. The reason is to be able to support queries such as
"data between 2/15/2008 and 6/15/2008" without querying a single huge
table containing data at that granularity. In the proposed schema,
there would be one query of the huge base table for data between
2/15-2/28, another query of the month-aggregated table for the months
of Mar, Apr, and May, then a final query of the base table from
6/1-6/15. In theory, at least, this would mean reading far fewer rows
than a single query of the base table over the same date range.

The types of data and queries we want to do are very simple. We're
recording web page hits (essentially). There could be millions of
these tiny rows inserted per day. Our customers would want to see
counts of these rows over a date range. My goal is to allow that date
range to be specified at the day level but use a monthly aggregation
for the "middle" of the date range.

///ark


Reply With Quote
  #9  
Old   
Mark Wilden
 
Posts: n/a

Default Re: Querying tables with different aggregation levels - 09-18-2008 , 12:24 PM



On Sep 17, 11:40*pm, Robert Klemme <shortcut... (AT) googlemail (DOT) com> wrote:

Quote:
On 18.09.2008 00:18, Mark Wilden wrote:
If I understand you correctly you want to have overlap between data in
tables.
That's right. The reason is to be able to support queries such as
"data between 2/15/2008 and 6/15/2008" without querying a single huge
table containing data at that granularity. In the proposed schema,
there would be one query of the huge base table for data between
2/15-2/28, another query of the month-aggregated table for the months
of Mar, Apr, and May, then a final query of the base table from
6/1-6/15. In theory, at least, this would mean reading far fewer rows
than a single query of the base table over the same date range.

The types of data and queries we want to do are very simple. We're
recording web page hits (essentially). There could be millions of
these tiny rows inserted per day. Our customers would want to see
counts of these rows over a date range. My goal is to allow that date
range to be specified at the day level but use a monthly aggregation
for the "middle" of the date range.

///ark


Reply With Quote
  #10  
Old   
Mark Wilden
 
Posts: n/a

Default Re: Querying tables with different aggregation levels - 09-18-2008 , 12:24 PM



On Sep 17, 11:40*pm, Robert Klemme <shortcut... (AT) googlemail (DOT) com> wrote:

Quote:
On 18.09.2008 00:18, Mark Wilden wrote:
If I understand you correctly you want to have overlap between data in
tables.
That's right. The reason is to be able to support queries such as
"data between 2/15/2008 and 6/15/2008" without querying a single huge
table containing data at that granularity. In the proposed schema,
there would be one query of the huge base table for data between
2/15-2/28, another query of the month-aggregated table for the months
of Mar, Apr, and May, then a final query of the base table from
6/1-6/15. In theory, at least, this would mean reading far fewer rows
than a single query of the base table over the same date range.

The types of data and queries we want to do are very simple. We're
recording web page hits (essentially). There could be millions of
these tiny rows inserted per day. Our customers would want to see
counts of these rows over a date range. My goal is to allow that date
range to be specified at the day level but use a monthly aggregation
for the "middle" of the date range.

///ark


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.