dbTalk Databases Forums  

Direct I/O

comp.databases.postgresql comp.databases.postgresql


Discuss Direct I/O in the comp.databases.postgresql forum.



Reply
 
Thread Tools Display Modes
  #31  
Old   
V.J. Kumar
 
Posts: n/a

Default Re: Direct I/O - 03-17-2010 , 07:04 AM






On Jan 28, 12:24*pm, Mladen Gogala <n... (AT) email (DOT) here.invalid> wrote:
Quote:
On Thu, 28 Jan 2010 14:08:20 +0100, Laurenz Albe wrote:
*assume that the reason for this is that in the old days, file systems
used to be much worse than they are now, with performance penalties on
directories with many files in them and the like.


The company wants to move a data warehouse from an Oracle RAC to
PostgreSQL. The database on RAC is essentially one gigantic table, having
400 million records, partitioned into month-sized chunks and
searched via text indexes. The pilot project loaded 3 months worth of
partitions into PgSQL and developers are now writing a little Groovy
Grails application to search it. I am just the DBA here. The size of the
DW is going to exceed 4TB.

Well, it's a brave decision on part of you management. Whilst
Postgresql is not a bad choice in many cases, datawarehousing is not
one of them. For some more or less objective comparison, you may
want to run a 100GB scale TPC-H benchmark on the same hardware and
compare the results. Now, many would claim that TPC-H is not
representative of the "real world queries", but the queries are in
fact nothing more but a bunch of joins and aggregations on a pretty
simple schema. If the results do not convince the management,
nothing will. In our experiments with Pg, the latter was about 6
times slower on our typical queries than the existing Oracle DB.

In my opinion, it might be a better idea to run your warehouse on
Oracle SE rather than on any of the free databases, even implementing
partitioning manually.

Also, you may want to take a look at a free single node Greenplum
edition which might be in my opinion a viable option for 'free'
datawarehousing. It is based on Postgres, but improves its optimizer
in many respects, has additional windowing queries and more
importantly can parallelize queries according to the number of CPUs
(which Postgres cannot). With SN Greenplum using 6 CPUs, we could
match and sometimes exceed Oracle performance with one CPU (a single
threaded query).

VJ

> --http://mgogala.byethost5.com

Reply With Quote
  #32  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Direct I/O - 03-17-2010 , 10:13 AM






On Wed, 17 Mar 2010 06:04:42 -0700, V.J. Kumar wrote:

Quote:
Well, it's a brave decision on part of you management. Whilst
Postgresql is not a bad choice in many cases, datawarehousing is not
one of them.
This data warehouse is specific, it doesn't include joins or aggregation.
It is essentially a huge document repository which would allow searches.
Library would, at least in this case, be much better paradigm. The only
snag that I've hit so far is the fact that Postgres built in text search
engine is woefully inadequate because it cannot do things like phrase
search or the usual "near" operator. We tested Sphinx and Lucene and
opted for Sphinx. There is, however, another commercial product which is
still in the game and will probably win the game because of the fabulous
ways with text search.

Quote:
For some more or less objective comparison, you may want
to run a 100GB scale TPC-H benchmark on the same hardware and compare
the results. Now, many would claim that TPC-H is not representative of
the "real world queries", but the queries are in fact nothing more but
a bunch of joins and aggregations on a pretty simple schema. If the
results do not convince the management, nothing will. In our
experiments with Pg, the latter was about 6 times slower on our typical
queries than the existing Oracle DB.
I am well aware of that. I am primarily an Oracle DBA. Postgres doesn't
have star query and its bitmap ("hash") indexes are a very crude device
which is not well advised for production use. As a matter of fact, 8.4
manuals explicitly advise users against using hash indexes.
Another thing that Oracle does have and Postgres does not is parallel
query. However, with partitioning, replication and sqlgrid, I can achieve
much the same effects, at the expense of the duplicate disk space.

Oracle RDBMS still has much better engine and much better instrumentation
with wait event interface, tracing, hints and automatic SQL tuning.
Postgres comes nowhere near this, even with the latest development
versions.

On the other hand, Oracle is not needed for everything. There are
applications which utilize some form of RDBMS but not nearly enough to
justify the purchase of an enormously expensive Oracle RDBMS license.
Postgres also has a very interesting optimizer which compiles the list of
the most frequently used values, which oracle does not. Postgres is
reliable, easily integrated with Perl, PHP, Symfony, Groovy&Grails and
other development frameworks. For instance, HR needs a database with all
the resumes that are sent to us and all the candidates. The uptime
requirements for this database are: 9-5, Mon-Fri. Not exactly justifying
RAC system with a hot standby. There are 12 tables with less than 10000
records in the database. Postgres does the job beautifully.




--
http://mgogala.byethost5.com

Reply With Quote
  #33  
Old   
V.J. Kumar
 
Posts: n/a

Default Re: Direct I/O - 03-18-2010 , 08:47 AM



On Mar 17, 12:13*pm, Mladen Gogala <n... (AT) email (DOT) here.invalid> wrote:
Quote:
On Wed, 17 Mar 2010 06:04:42 -0700, V.J. Kumar wrote:
Well, *it's a brave decision on part of you management. *Whilst
Postgresql is not a bad choice in many cases, *datawarehousing is not
one of them. *

This data warehouse is specific, it doesn't include joins or aggregation.
It is essentially a huge document repository which would allow searches.
Ah, it may be usable then. We've experienced a 6x slowdown even on
simple aggregations with a 25mil rows table (about 2.5 GB). Event
count(*) was about 4 times slower with the same table on the same
hardware. In both experiments, the table was fully cached, so no IO
penalty incurred.
Quote:
--http://mgogala.byethost5.com

Reply With Quote
  #34  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Direct I/O - 03-18-2010 , 08:31 PM



On Thu, 18 Mar 2010 07:47:40 -0700, V.J. Kumar wrote:

Quote:
On Mar 17, 12:13Â*pm, Mladen Gogala <n... (AT) email (DOT) here.invalid> wrote:
On Wed, 17 Mar 2010 06:04:42 -0700, V.J. Kumar wrote:
Well, Â*it's a brave decision on part of you management. Â*Whilst
Postgresql is not a bad choice in many cases, Â*datawarehousing is not
one of them.

This data warehouse is specific, it doesn't include joins or
aggregation. It is essentially a huge document repository which would
allow searches.

Ah, it may be usable then. We've experienced a 6x slowdown even on
simple aggregations with a 25mil rows table (about 2.5 GB). Event
count(*) was about 4 times slower with the same table on the same
hardware. In both experiments, the table was fully cached, so no IO
penalty incurred.
I am not sure what was the version of Oracle, but newer versions do
tricks like direct I/O read into PGA when doing full table scans and can
optimize away count(*) to use fast full scan of the primary key, which
will really help if the table has row significantly larger than the
primary key. Also, if the table was created with "PARALLEL" attribute,
you may actually be using parallel query, especially likely for
aggregations. Also, version 11 can cache client query results which does
wonders on frequently executed "stay alive" queries like "select 1 from
dual".
The only problem with the Oracle is its price, not the performance of its
engine. I've been asked to come up with the strategies to replace as many
Oracle databases with Postgres as I can, without hurting business. I am
working with Oracle ever since 1988 (yes, it was version 4 then) and it's
still the best database around, but the price and licensing policies are
a cause for concern. EnterpriseDB is well funded, minority stake held by
Red Hat, IBM and some other well known companies, we can possibly expect
some improvements soon. Postgres is rather primitive in many aspects when
compared with full fledged Oracle 11.2 EE database but it can be easily
integrated with Perl or Python which provides many useful capabilities in
the XML, email or web access arena.
We were making our mind between PgSQL and MySQL and when Oracle bought
MySQL, the decision was made. There are several smaller databases in
house that do not actually need a full fledged Oracle RDBMS. Postgres
seems to be a good solution for that.

--
http://mgogala.freehostia.com

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.