dbTalk Databases Forums  

MySQL vs Postgres for a "data warehouse", 5-10 TB

comp.databases.mysql comp.databases.mysql


Discuss MySQL vs Postgres for a "data warehouse", 5-10 TB in the comp.databases.mysql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Ignoramus5263
 
Posts: n/a

Default MySQL vs Postgres for a "data warehouse", 5-10 TB - 09-10-2011 , 12:14 AM






First of all, this is not an attempt to create a flamewar of any
kind. I have been a MySQL user for years, including owning a few
multi-gigabyte databases for my websites, and using it to host
algebra.com (about 12 GB database).

I have had my ups and downs with MySQL. The ups were ease of use and
decent performance for small databases such as algebra.com. The downs
were things like twenty hour REPAIR TABLE operations on a 35 GB
table, etc.

Right now I have a personal (one user) project to create a 5-10
Terabyte data warehouse. The largest table will consume the most space
and will take, perhaps, 200,000,000 rows.

I want to use it to obtain valuable business intelligence and to make
money.

I expect it to grow, never shrink, and to be accessed via batch
queries. I do not care for batch queries to be super fast, for example
an hour per query would be just fine.

However, while an hour is fine, two weeks per query is NOT fine.

I have a server with about 18 TB of storage and 48 GB of RAM, and 12
CPU cores.

My initial plan was to use MySQL, InnoDB, and deal with problems as
they arise. Perhaps, say, I would implement my own joining
procedures.

After reading some disparaging stuff about InnoDB performance on large
datasets, however, I am getting cold feet. I have a general feeling
that, perhaps, I will not be able to succeed with MySQL, or, perhaps,
with either MySQL and Postgres.

I do not know much about Postgres, but I am very eager to learn and
see if I can use it for my purposes more effectively than MySQL.

I cannot shell out $47,000 per CPU for Oracle for this project.

To be more specific, the batch queries that I would do, I hope,
would either use small JOINS of a small dataset to a large dataset, or
just SELECTS from one big table.

So... Can Postgres support a 5-10 TB database with the use pattern
stated above?

i

Reply With Quote
  #2  
Old   
Luuk
 
Posts: n/a

Default Re: MySQL vs Postgres for a "data warehouse", 5-10 TB - 09-10-2011 , 03:13 AM






On 10-09-2011 08:30, Michael Vilain wrote:
Quote:
Basically, unless you're a high power database person who knows the ins
and outs of each vendor's products, how to optimize performance (that's
$150K/year easy), and can design and administer such systems, I think
you're going to have do the research *own* your own.
nice typo


--
Luuk

Reply With Quote
  #3  
Old   
John Levine
 
Posts: n/a

Default Re: MySQL vs Postgres for a "data warehouse", 5-10 TB - 09-10-2011 , 02:47 PM



Quote:
For example, I remember the horrors and upsets of REPAIR TABLE
statements that took almost a day. That was with a MyISAM table around
40 gigs (I do not remember the size exactly).
Yeah, they do. I'm most of the way through repairing a 57G table which
takes about a day and a half.

Quote:
This means that I cannot use MyISAM, period, end of story, for
something 100 times the size of the above.
Hmmn. This suggests that you might want to reread the manual.

A few moments spent looking at the MySQL manual reveals partitioned
tables and MERGE tables, both of which are reasonable ways to divide
one logical table into several physical ones. If your data just
grows, but the oldest stuff never changes, merged tables may be
appropriate, since you can make the older tables packed and read-only.
Either way, the individual tables can be much smaller and more
tractable.

If your data naturally falls into slices (by numeric value in 5.1,
number or string in 5.5), partitioned tables may work, particularly if
you can construct your queries to take advantage of partition pruning
in your queries.

R's,
John

Reply With Quote
  #4  
Old   
Ignoramus13208
 
Posts: n/a

Default Re: MySQL vs Postgres for a "data warehouse", 5-10 TB - 09-10-2011 , 04:02 PM



On 2011-09-10, John Levine <johnl (AT) iecc (DOT) com> wrote:
Quote:
For example, I remember the horrors and upsets of REPAIR TABLE
statements that took almost a day. That was with a MyISAM table around
40 gigs (I do not remember the size exactly).

Yeah, they do. I'm most of the way through repairing a 57G table which
takes about a day and a half.
Then you know!

Quote:
This means that I cannot use MyISAM, period, end of story, for
something 100 times the size of the above.

Hmmn. This suggests that you might want to reread the manual.

A few moments spent looking at the MySQL manual reveals partitioned
tables and MERGE tables, both of which are reasonable ways to divide
one logical table into several physical ones. If your data just
grows, but the oldest stuff never changes, merged tables may be
appropriate, since you can make the older tables packed and read-only.
Either way, the individual tables can be much smaller and more
tractable.
At first, I really liked your idea of a MERGE. It does fit my dataset
very well. I had NO idea that it was possible. I thought that I can,
essentially, split my data by month.

Then I realized that I perform my updates for the more recent items by
means of INSERT... ON DUPLICATE KEY UPDATE, and it would not work with
MERGE tables.


``Because the MERGE engine cannot enforce uniqueness over the set of
underlying tables''

Quote:
If your data naturally falls into slices (by numeric value in 5.1,
number or string in 5.5), partitioned tables may work, particularly if
you can construct your queries to take advantage of partition pruning
in your queries.
Maybe I can operate it by slicing data by, say, last digit of the
numeric ID. (I get the numeric ID of my items with the items, they
already have a natural unique key)

The more I think about the problem, the more I have a bad feeling
about it.

i

Reply With Quote
  #5  
Old   
Matthew Woodcraft
 
Posts: n/a

Default Re: MySQL vs Postgres for a "data warehouse", 5-10 TB - 09-11-2011 , 04:24 AM



Ignoramus5263 <ignoramus5263 (AT) NOSPAM (DOT) 5263.invalid> wrote:
Quote:
So... Can Postgres support a 5-10 TB database with the use pattern
stated above?
If nobody here gives you a concrete answer, I suggest asking on the
pgsql-performance mailing list; it's the sort of question they're used
to answering.

http://archives.postgresql.org/pgsql-performance/

-M-

Reply With Quote
  #6  
Old   
Ignoramus21330
 
Posts: n/a

Default Re: MySQL vs Postgres for a "data warehouse", 5-10 TB - 09-11-2011 , 07:35 AM



On 2011-09-11, Matthew Woodcraft <mattheww (AT) chiark (DOT) greenend.org.uk> wrote:
Quote:
Ignoramus5263 <ignoramus5263 (AT) NOSPAM (DOT) 5263.invalid> wrote:
So... Can Postgres support a 5-10 TB database with the use pattern
stated above?

If nobody here gives you a concrete answer, I suggest asking on the
pgsql-performance mailing list; it's the sort of question they're used
to answering.

http://archives.postgresql.org/pgsql-performance/

-M-
Thanks. I sbuscribed and sent the same question, slightly reworded, to that list.

Reply With Quote
  #7  
Old   
The Natural Philosopher
 
Posts: n/a

Default Re: MySQL vs Postgres for a "data warehouse", 5-10 TB - 09-11-2011 , 09:40 AM



Ignoramus13208 wrote:
Quote:
On 2011-09-10, John Levine <johnl (AT) iecc (DOT) com> wrote:
For example, I remember the horrors and upsets of REPAIR TABLE
statements that took almost a day. That was with a MyISAM table around
40 gigs (I do not remember the size exactly).
Yeah, they do. I'm most of the way through repairing a 57G table which
takes about a day and a half.

Then you know!

This means that I cannot use MyISAM, period, end of story, for
something 100 times the size of the above.
Hmmn. This suggests that you might want to reread the manual.

A few moments spent looking at the MySQL manual reveals partitioned
tables and MERGE tables, both of which are reasonable ways to divide
one logical table into several physical ones. If your data just
grows, but the oldest stuff never changes, merged tables may be
appropriate, since you can make the older tables packed and read-only.
Either way, the individual tables can be much smaller and more
tractable.

At first, I really liked your idea of a MERGE. It does fit my dataset
very well. I had NO idea that it was possible. I thought that I can,
essentially, split my data by month.

Then I realized that I perform my updates for the more recent items by
means of INSERT... ON DUPLICATE KEY UPDATE, and it would not work with
MERGE tables.

so do it a different way.

Reply With Quote
  #8  
Old   
Axel Schwenke
 
Posts: n/a

Default Re: MySQL vs Postgres for a "data warehouse", 5-10 TB - 09-11-2011 , 11:11 AM



Ignoramus13208 <ignoramus13208 (AT) NOSPAM (DOT) 13208.invalid> wrote:
Quote:
A few moments spent looking at the MySQL manual reveals partitioned
tables and MERGE tables

At first, I really liked your idea of a MERGE. It does fit my dataset
very well.
I doubt that. MERGE is the much older and quite crappy sister of
partitioned tables. And since you claimed partitions would not fit,
MERGE won't too.

Also the MERGE engine is notoriously unstable. I expect to see
support for MERGE tables being dropped soon.


XL

Reply With Quote
  #9  
Old   
Ignoramus21330
 
Posts: n/a

Default Re: MySQL vs Postgres for a "data warehouse", 5-10 TB - 09-11-2011 , 12:07 PM



On 2011-09-11, Axel Schwenke <axel.schwenke (AT) gmx (DOT) de> wrote:
Quote:
Ignoramus13208 <ignoramus13208 (AT) NOSPAM (DOT) 13208.invalid> wrote:

A few moments spent looking at the MySQL manual reveals partitioned
tables and MERGE tables

At first, I really liked your idea of a MERGE. It does fit my dataset
very well.

I doubt that. MERGE is the much older and quite crappy sister of
partitioned tables. And since you claimed partitions would not fit,
MERGE won't too.

Also the MERGE engine is notoriously unstable. I expect to see
support for MERGE tables being dropped soon.
OK, thanks for letting me know... I will not even consider...

i

Reply With Quote
  #10  
Old   
Bodo
 
Posts: n/a

Default Re: MySQL vs Postgres for a "data warehouse", 5-10 TB - 09-23-2011 , 06:01 AM



Quote:
The largest table will consume the most space
and will take, perhaps, 200,000,000 rows.

That should be no problem.
We run a 250m rows beast here unpartitioned.
Updated in real time.

BTW select count(*) needs 5 minutes under full load and 2 minutes on a smaller but less busy box.

regards,

Toni

MySQL Innobase 5.5 and 5.1

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.