dbTalk Databases Forums  

[Info-ingres] Getting More Info

comp.databases.ingres comp.databases.ingres


Discuss [Info-ingres] Getting More Info in the comp.databases.ingres forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Andrew Hill
 
Posts: n/a

Default [Info-ingres] Getting More Info - 03-28-2006 , 09:11 AM






Hi all,

I'm looking into Ingres 2006 as a replacement for a MySQL database. I've
read through what I can find on the Ingres web site, but I have a number
of questions about Ingres that I haven't been able to find the answers
to/have been to dense to comprehend from the documentation.


Most of these questions are seemingly simple, for example:

Does Ingres 2006 support:
- Correlated Subqueries?
- Online node addition?
- Mixed hardware on nodes? Mixed OSes? Mixed 32-bit/64-bit installs?

Is there a limit to the number of nodes?


Is this an appropriate place to ask about these kinds of questions, or
should I be looking/asking elsewhere?

TIA!

--
Andrew Hill
list (AT) fornax (DOT) net


Reply With Quote
  #2  
Old   
Betty & Karl Schendel
 
Posts: n/a

Default Re: [Info-ingres] Getting More Info - 03-28-2006 , 09:30 AM






At 3:11 PM +0100 3/28/06, Andrew Hill wrote:
Quote:
Hi all,

I'm looking into Ingres 2006 as a replacement for a MySQL database. I've
read through what I can find on the Ingres web site, but I have a number
of questions about Ingres that I haven't been able to find the answers
to/have been to dense to comprehend from the documentation.


Most of these questions are seemingly simple, for example:

Does Ingres 2006 support:
- Correlated Subqueries?
Yes

Quote:
- Online node addition?
- Mixed hardware on nodes? Mixed OSes? Mixed 32-bit/64-bit installs?
Are you talking about clusters here? I'm not sure that active/active
clusters are officially GA yet. (I gather that it's due to ongoing
glitches with the available cluster filesystems, none of the free ones
seem to be truly production ready. They pretty much mostly work, and
that isn't good enough.)

Ingres does failover "clustering", about which I know little.

Quote:
Is this an appropriate place to ask about these kinds of questions, or
should I be looking/asking elsewhere?
This is a pretty good place. You can also check out ingres.com for
license info and sales email addresses.

--
Karl R. Schendel, Jr. schendel (AT) kbcomputer (DOT) com
Ingres Corp
Ingres Server Development


Reply With Quote
  #3  
Old   
Andrew Hill
 
Posts: n/a

Default Re: [Info-ingres] Getting More Info - 03-28-2006 , 09:52 AM



On Tue, 2006-03-28 at 09:30 -0500, Betty & Karl Schendel wrote:
Quote:
At 3:11 PM +0100 3/28/06, Andrew Hill wrote:
- Online node addition?
- Mixed hardware on nodes? Mixed OSes? Mixed 32-bit/64-bit installs?

Are you talking about clusters here? I'm not sure that active/active
clusters are officially GA yet. (I gather that it's due to ongoing
glitches with the available cluster filesystems, none of the free ones
seem to be truly production ready. They pretty much mostly work, and
that isn't good enough.)
Yes, I was talking about clusters. The MySQL database I manage is
starting to push the limits of what it can do, and horizontal scaling is
starting to look attractive.

Does anyone have experience with running a cluster that they would like
to share?

TIA,

--
Andrew Hill
list (AT) fornax (DOT) net



Reply With Quote
  #4  
Old   
Roy Hann
 
Posts: n/a

Default Re: [Info-ingres] Getting More Info - 03-28-2006 , 01:21 PM



"Andrew Hill" <list (AT) fornax (DOT) net> wrote

Quote:
On Tue, 2006-03-28 at 09:30 -0500, Betty & Karl Schendel wrote:
At 3:11 PM +0100 3/28/06, Andrew Hill wrote:
- Online node addition?
- Mixed hardware on nodes? Mixed OSes? Mixed 32-bit/64-bit installs?

Are you talking about clusters here? I'm not sure that active/active
clusters are officially GA yet. (I gather that it's due to ongoing
glitches with the available cluster filesystems, none of the free ones
seem to be truly production ready. They pretty much mostly work, and
that isn't good enough.)

Yes, I was talking about clusters. The MySQL database I manage is
starting to push the limits of what it can do, and horizontal scaling is
starting to look attractive.
You haven't given us any information about volume and job-mix so it is hard
to be confident about any advice, but if you are reaching throughput limits
I suspect Ingres clusters are a long way from helping you with that even if
the filesystems can be made to work. Ingres will do a fine job of using a
cluster for failover (i.e. high-availability clusters (HAC)), but the
distributed lock management still needs to be proven to scale to support a
high-performance cluster (HPC). Personally I would not bet my farm on it,
although by the same token I don't know for sure that it doesn't scale
either.

On the other hand, Ingres does allow up to 30 servers instances to serve the
same database, and a practically unlimited number of "net" servers, so
perhaps that gives you some room to scale up.

What are your performance/workload targets?

Roy Hann (rhann at rationalcommerce dot com)
Rational Commerce Ltd.
www.rationalcommerce.com
"Ingres development, tuning, and training experts"
Ingres Corporation Partner




Reply With Quote
  #5  
Old   
Andrew Hill
 
Posts: n/a

Default Re: [Info-ingres] Getting More Info - 03-30-2006 , 05:13 AM



On Tue, 2006-03-28 at 19:21 +0100, Roy Hann wrote:
Quote:
"Andrew Hill" <list (AT) fornax (DOT) net> wrote in message
news:mailman.1143557582.8013.info-ingres (AT) cariboulake (DOT) com...
I was talking about clusters. The MySQL database I manage is
starting to push the limits of what it can do, and horizontal scaling is
starting to look attractive.

You haven't given us any information about volume and job-mix so it is hard
to be confident about any advice, but if you are reaching throughput limits
I suspect Ingres clusters are a long way from helping you with that even if
the filesystems can be made to work. Ingres will do a fine job of using a
cluster for failover (i.e. high-availability clusters (HAC)), but the
distributed lock management still needs to be proven to scale to support a
high-performance cluster (HPC). Personally I would not bet my farm on it,
although by the same token I don't know for sure that it doesn't scale
either.

On the other hand, Ingres does allow up to 30 servers instances to serve the
same database, and a practically unlimited number of "net" servers, so
perhaps that gives you some room to scale up.

What are your performance/workload targets?
I was a bit vague, wasn't I? But aren't you going to do my thinking for
me? ;-)

The main problem I have with MySQL at the moment is for a quite simple
query:

INSERT INTO
summary_table
(
day,
hour,
operation_interval,
operation_interval_id,
interval_start,
interval_end,
ad_id,
creative_id,
zone_id,
impressions
)
SELECT
DATE_FORMAT(drad.date_time, '%Y-%m-%d') AS day,
DATE_FORMAT(drad.date_time, '%k') AS hour,
60 AS operation_interval,
0 AS operation_interval_id,
'2006-03-26 00:00:00' AS interval_start,
'2006-03-26 00:59:59' AS interval_end,
drad.ad_id AS ad_id,
drad.creative_id AS creative_id,
drad.zone_id AS zone_id,
COUNT(*) AS impressions
FROM
raw_impressions_20060326 AS drad
WHERE
drad.date_time >= 20060326000000
AND drad.date_time <= 20060326005959
GROUP BY
day, hour, ad_id, creative_id, zone_id;

As you can see, it's not rocket science; however, with the number of
rows the daily raw_impressions_20060326 table (currently about 40
million), MySQL has problems performing the GROUP BY in memory, and so
it's slow. (Yes, there's various optimisations in place, etc. but they
will only take us so far...)

I'd like to be able to split this query up so that it runs on a number
of nodes, so that we can add more nodes as the number of rows
increases...

Is this something that Ingres can do?

TIA,

--
Andrew Hill
list (AT) fornax (DOT) net



Reply With Quote
  #6  
Old   
Roy Hann
 
Posts: n/a

Default Re: [Info-ingres] Getting More Info - 03-30-2006 , 07:11 AM



"Andrew Hill" <list (AT) fornax (DOT) net> wrote

Quote:
What are your performance/workload targets?

I was a bit vague, wasn't I? But aren't you going to do my thinking for
me? ;-)

The main problem I have with MySQL at the moment is for a quite simple
query:

INSERT INTO
summary_table
(
day,
hour,
operation_interval,
operation_interval_id,
interval_start,
interval_end,
ad_id,
creative_id,
zone_id,
impressions
)
SELECT
DATE_FORMAT(drad.date_time, '%Y-%m-%d') AS day,
DATE_FORMAT(drad.date_time, '%k') AS hour,
60 AS operation_interval,
0 AS operation_interval_id,
'2006-03-26 00:00:00' AS interval_start,
'2006-03-26 00:59:59' AS interval_end,
drad.ad_id AS ad_id,
drad.creative_id AS creative_id,
drad.zone_id AS zone_id,
COUNT(*) AS impressions
FROM
raw_impressions_20060326 AS drad
WHERE
drad.date_time >= 20060326000000
AND drad.date_time <= 20060326005959
GROUP BY
day, hour, ad_id, creative_id, zone_id;

As you can see, it's not rocket science; however, with the number of
rows the daily raw_impressions_20060326 table (currently about 40
million), MySQL has problems performing the GROUP BY in memory, and so
it's slow. (Yes, there's various optimisations in place, etc. but they
will only take us so far...)

I'd like to be able to split this query up so that it runs on a number
of nodes, so that we can add more nodes as the number of rows
increases...

Is this something that Ingres can do?
My first impulse is to say "in its sleep", but before I shoot my mouth off,
where does the raw_impressions_20060326 table itself come from, and how many
queries similar to the above will you want to run on it? Also what is its
useful lifespan, and how many distinct sessions need to access it during its
lifespan?

Roy Hann (rhann at rationalcommerce dot com)
Rational Commerce Ltd.
www.rationalcommerce.com
"Ingres development, tuning, and training experts"
Ingres Corporation Partner




Reply With Quote
  #7  
Old   
Andrew Hill
 
Posts: n/a

Default Re: [Info-ingres] Getting More Info - 03-30-2006 , 07:24 AM



On Thu, 2006-03-30 at 13:11 +0100, Roy Hann wrote:
Quote:
"Andrew Hill" <list (AT) fornax (DOT) net> wrote in message
news:mailman.1143713640.6031.info-ingres (AT) cariboulake (DOT) com...
What are your performance/workload targets?

The main problem I have with MySQL at the moment is for a quite simple
query:
*snip*

Quote:
I'd like to be able to split this query up so that it runs on a number
of nodes, so that we can add more nodes as the number of rows
increases...

Is this something that Ingres can do?

My first impulse is to say "in its sleep", but before I shoot my mouth off,
where does the raw_impressions_20060326 table itself come from, and how many
queries similar to the above will you want to run on it? Also what is its
useful lifespan, and how many distinct sessions need to access it during its
lifespan?
Well, the data comes from a collection of web servers. Each has a local
instance of MySQL, and we package up our raw_impressions every 5 minutes
on each web server, and bulk import them into the central
raw_impressions_20060326 table.

(A nice, fast bulk import into the Ingres 2006 cluster would be handy
here. Can that be done in // too?)

We run 3 queries similar to the above, but all one after the other, so
there's no problems with lots of people wanting to read/write the table
- the only other thing that would be (might be) happening at the same
time would be the insertion of new data (as per the above process).

All of our reporting/BI work is done from the summary tables - however,
the lifespan of the raw daily tables is ~90 days, as we do also do a
join on a (smaller) raw table to this table as well. (However, it's
quite fast, and I'd be happy limiting that query to run on a single
node, to avoid inter-node communications, if it worked better that
way...)

Still sounding good?

TIA,

--
Andrew Hill
list (AT) fornax (DOT) net



Reply With Quote
  #8  
Old   
Betty & Karl Schendel
 
Posts: n/a

Default Re: [Info-ingres] Getting More Info - 03-30-2006 , 07:51 AM



At 1:24 PM +0100 3/30/06, Andrew Hill wrote:
Quote:
On Thu, 2006-03-30 at 13:11 +0100, Roy Hann wrote:
"Andrew Hill" <list (AT) fornax (DOT) net> wrote in message
news:mailman.1143713640.6031.info-ingres (AT) cariboulake (DOT) com...
What are your performance/workload targets?

The main problem I have with MySQL at the moment is for a quite simple
query:

*snip*

I'd like to be able to split this query up so that it runs on a number
of nodes, so that we can add more nodes as the number of rows
increases...

Is this something that Ingres can do?

My first impulse is to say "in its sleep"...

I am inclined to agree with Roy. Unless the query results in many
tens or hundreds of thousands of result groups, I would expect
Ingres to hash-aggregate the data without really breaking a sweat.
On average hardware, no partitioning, I would expect 40 million rows to
take a few minutes. If the query DOES result in hundreds of thousands
of result groups, the aggregation will overflow to disk, and it might
take a minute or two longer.

As a data point, Q1 of the TPC-H suite is basically a pure aggregation.
Last time I ran it on my Ultra-60, I think it took about 30 minutes for
600 million rows. We know how to take a factor of 5 or so out of that
by pushing the aggregation lower in the query plan, it's just a matter
of time before someone gets around to doing that. (there are some
QP architecture things to deal with first.)

Quote:
(A nice, fast bulk import into the Ingres 2006 cluster would be handy
here. Can that be done in // too?)
Bulk load is easy enough if you observe the rules. No parallel bulk-load
yet, that's on the plate for the next major release. (I will note
that sorting and table structure modify are both parallelized now.)

Karl


Reply With Quote
  #9  
Old   
Roy Hann
 
Posts: n/a

Default Re: [Info-ingres] Getting More Info - 03-30-2006 , 08:19 AM



"Andrew Hill" <list (AT) fornax (DOT) net> wrote

Quote:
My first impulse is to say "in its sleep", but before I shoot my mouth
off,
where does the raw_impressions_20060326 table itself come from, and how
many
queries similar to the above will you want to run on it? Also what is
its
useful lifespan, and how many distinct sessions need to access it during
its
lifespan?

Well, the data comes from a collection of web servers. Each has a local
instance of MySQL, and we package up our raw_impressions every 5 minutes
on each web server, and bulk import them into the central
raw_impressions_20060326 table.
I'm still not entirely clear on this. These packages at 5 minute intervals:
are they incremental (i.e. just the additional data from the last 5
minutes), or are they are complete refresh of the whole day-till-now done
five minutes after the last one?

Quote:
(A nice, fast bulk import into the Ingres 2006 cluster would be handy
here. Can that be done in // too?)
There's a couple of possibilities. There is the Fastload utility (RT*M), or
there is the endlessly versatile COPY...FROM extension to SQL for loading
data into tables from a file (ASCII or binary). If you are desperate to
load data in parallel--and I am not certain it's worth the bother--one can
use table synonyms to multiplex two tables (to do a kind of
double-buffering) without changing the application code. Again, I am still
very unclear about the target response times for these operations, so I may
be talking the purest tosh.

Quote:
We run 3 queries similar to the above, but all one after the other, so
there's no problems with lots of people wanting to read/write the table
- the only other thing that would be (might be) happening at the same
time would be the insertion of new data (as per the above process).
With an isolation level of read uncommitted there won't be any locking
problems. Raw power is another matter (to be determined by trial-and-error
I am afraid).

Quote:
All of our reporting/BI work is done from the summary tables - however,
the lifespan of the raw daily tables is ~90 days, as we do also do a
join on a (smaller) raw table to this table as well. (However, it's
quite fast, and I'd be happy limiting that query to run on a single
node, to avoid inter-node communications, if it worked better that
way...)

Still sounding good?
Yes, a recipe is emerging. I shall assume your packages at 5-minute
intervals are incremental. I am still vague about the number of sessions
that will be involved, so I am not going to advise the use of global
temporary tables, although they could speed things up vastly if you could
use them. (The problem is that they are visible only to the session that
created them.)

Based in your example query above, which is restricted by a range search on
date_time, my first attempt is to tell you to create the
raw_impressions_20060326 table partitioned using a range rule on date_time.
Maybe divide it into 24 partitions of 1 hour each if that's how the queries
work. I'd make the table a HEAP to enable the bulk-load algorithm. You are
going to be scanning it for all your queries anyway, so that will also
permit efficient read-ahead. For the same reason I'd opt for a large page
size too, say 32k.

I say all the above from a position of unassailable ignorance. I don't know
your target response times, so nothing that I've said may be good enough.
Also I have no idea if the entire approach you are currently taking is
efficient (I shall assume you know best). Even if this is on the right
track there is loads more to add.

Roy Hann (rhann at rationalcommerce dot com)
Rational Commerce Ltd.
www.rationalcommerce.com
"Ingres development, tuning, and training experts"
Ingres Corporation Partner




Reply With Quote
  #10  
Old   
Andrew Hill
 
Posts: n/a

Default Re: [Info-ingres] Getting More Info - 03-31-2006 , 09:02 AM



On Thu, 2006-03-30 at 07:51 -0500, Betty & Karl Schendel wrote:
Quote:
I am inclined to agree with Roy. Unless the query results in many
tens or hundreds of thousands of result groups, I would expect
Ingres to hash-aggregate the data without really breaking a sweat.
No, there's < 1000 at the moment.

Okay, so starting to sound very good :-)

Thanks!

--
Andrew Hill
list (AT) fornax (DOT) net



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.