dbTalk Databases Forums  

A few questions

comp.databases.postgresql comp.databases.postgresql


Discuss A few questions in the comp.databases.postgresql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Jurgen Haan
 
Posts: n/a

Default A few questions - 10-11-2007 , 09:28 AM






Hello.

At the moment, I'm active as an IBM DB2 DBA.
But just out of curiousity, I'm restoring a copy of our production
database into an PostgreSQL (8.2) database.
It's an 60+ GB database.
My goal is to get an insight on performance of postgres vs db2, I've
seen some pretty impressive cases on postgres, so now I'm actually
curious as to how our environment would run on a PG environment as
opposed to a db2 envoronment.

As we speak, I'm recreating all of our 40.000+ tables into a postgres
instance (after making some necessary adjustments in the SQL) and
transferring a CSV export of all our data.

Now, before I start populating the database, I'd like to know what the
major influences on performance are on a PSQL environment. What memory
parameter adjustments do I have to make in order for postgres to make an
optimal use of available system memory? And do I need to make any
adjustments to the optimizer? How do I influence multi-processs /
multithreading use?

Thanks in advance for any replies.

Kind Regards,

-R-

Reply With Quote
  #2  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: A few questions - 10-12-2007 , 02:46 AM






Jurgen Haan <jurgen (AT) fake (DOT) dom> wrote:
Quote:
At the moment, I'm active as an IBM DB2 DBA.
But just out of curiousity, I'm restoring a copy of our production
database into an PostgreSQL (8.2) database.
It's an 60+ GB database.
My goal is to get an insight on performance of postgres vs db2, I've
seen some pretty impressive cases on postgres, so now I'm actually
curious as to how our environment would run on a PG environment as
opposed to a db2 envoronment.
I have worked with DB2 a lot myself, and I'd like to alert you to
what I consider one of the major differences between DB2 and PostgreSQL:
The handling of concurrency.
While DB2 ensures transaction isolation with read locks and always has
only one version of each database row, PostgreSQL doesn't use read locks
at all, but maintains more than one version of each row to present
each transaction with the appropriate version of the data.

One of the consequences is a radically different behaviour to ascertain
serializable transactions: when a serializable transaction has read a row
in DB2, this row will be locked and concurrent updates will have to wait.
In PostgreSQL, there is no read lock, concurrent transactions can alter
the row and the serializable transaction will continue to see the old
version of the row. When the serializable transaction then tries to update
this row, it will be aborted with a serialization error.
So while on a busy OLTP database in DB2 a big serializable transaction
will finish, but disrupt transaction processing, in PostgreSQL such a
big transaction is likely to never succeed at all.

Quote:
As we speak, I'm recreating all of our 40.000+ tables into a postgres
instance (after making some necessary adjustments in the SQL) and
transferring a CSV export of all our data.

Now, before I start populating the database, I'd like to know what the
major influences on performance are on a PSQL environment. What memory
parameter adjustments do I have to make in order for postgres to make an
optimal use of available system memory? And do I need to make any
adjustments to the optimizer? How do I influence multi-processs /
multithreading use?
You can get the whole story from
http://www.postgresql.org/docs/curre...me-config.html

The most important parameter is probably "shared_buffers", which determines
the size of the shared memory allocated by the database processes,
the "buffer pool", so to say.

"work_mem" is also important. It is the amount of memory that each
backend process is allowed to allocate; performance will sink drastically
if a backend process has too little memory and has to use disk files for
temporary storage. Keep in mind that there will be a backend process for
each client connection, and each of those can allocate this amount.

Make sure that you VACUUM the database regularly, and (particularly if
there are many changes in the database) make sure that "max_fsm_pages" is
set big enough, else the database cannot remember all free pages, which
will lead to bloat.

On busy databases, increasing "checkpoint_segments" will lead to fewer
checkpoints and increase performance at the cost of recovery time.

The optimizer is best left alone; set "effective_cache_size" to at least
the value of "shared_buffers" to encourage the optimizer to use indexes.

There are several "dangerous" parameters that can give you a performance
boost for updates at the possible price of integrity and durability.
Read section 17.5.1 for details.

You cannot influence the number of processes - each client connection will
be served by its own server process. PostgreSQL has decided against
multithreading in the backend: it reduces complexity, makes other backend
processes immune against crashes and (at least on Linux) isn't much of
a performance penalty.

Yours,
Laurenz Albe


Reply With Quote
  #3  
Old   
Jurgen Haan
 
Posts: n/a

Default Re: A few questions - 10-12-2007 , 03:24 AM



Laurenz Albe wrote:
Quote:
One of the consequences is a radically different behaviour to ascertain
serializable transactions: ..
Ah.. did not know that. But knowing most of the transactions that take
place, I don't think it will be a culprit. But thanks for the warning.

Quote:
You can get the whole story from
http://www.postgresql.org/docs/curre...me-config.html

The most important parameter is probably "shared_buffers",...
"work_mem" ...
Cool, thanks for the info. I've set them properly now, and indeed,
memory consumption seems a lot better now. (Always a shame to have
memory unused).

Quote:
Make sure that you VACUUM the database regularly, and (particularly if
there are many changes in the database) make sure that "max_fsm_pages" is
set big enough, else the database cannot remember all free pages, which
will lead to bloat.
I've read somewhere about an Auto Vacuum Daemon? I take it, that it's a
lot like DB2's auto Runstats & Reorgs?
In any case, I can always schedule it through Cron.

Quote:
On busy databases, increasing "checkpoint_segments" ...
The optimizer is best left alone; set "effective_cache_size" ...
*notes*
Quote:
There are several "dangerous" parameters that can give you a performance
boost for updates at the possible price of integrity and durability.
Read section 17.5.1 for details.
Brrr... rather not :P
I choose stability over performance, besides, a database corruption does
degrade overall performance a lot.

Quote:
You cannot influence the number of processes - each client connection will
be served by its own server process. PostgreSQL has decided against
multithreading in the backend: it reduces complexity, makes other backend
processes immune against crashes and (at least on Linux) isn't much of
a performance penalty.

Sounds fair enough.

Thanks for all the info. Exactly what I needed.

Kind regards.

-R-


Reply With Quote
  #4  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: A few questions - 10-12-2007 , 04:46 AM



Jurgen Haan <jurgen (AT) fake (DOT) dom> wrote:
Quote:
Make sure that you VACUUM the database regularly, and (particularly if
there are many changes in the database) make sure that "max_fsm_pages" is
set big enough, else the database cannot remember all free pages, which
will lead to bloat.

I've read somewhere about an Auto Vacuum Daemon? I take it, that it's a
lot like DB2's auto Runstats & Reorgs?
In any case, I can always schedule it through Cron.
You can also use autovacuum, no problem - be sure to enable it in
the config file (I think it is off by default).

Yours,
Laurenz Albe


Reply With Quote
  #5  
Old   
Lew
 
Posts: n/a

Default Re: A few questions - 10-14-2007 , 10:05 PM



Jurgen Haan <jurgen (AT) fake (DOT) dom> wrote:
Quote:
My goal is to get an insight on performance of postgres vs db2, I've
seen some pretty impressive cases on postgres, so now I'm actually
curious as to how our environment would run on a PG environment as
opposed to a db2 environment.
FWIW, a colleague of mine did a similar comparison between PG and Oracle. He
was the designated PG expert, and a co-worker was the Oracle maven. They
implemented their company's actual data in both RDBMSes (around 1 TB),
supporting the same applications, and optimized the heck out of both, sharing
all information that they could between them. They bet the next round of
beers on who would have the fastest result. The PG guy won, even after the
two of them worked together to speed up the Oracle database for a second round.

My friend tweaked many PG parameters, including some of the more obscure. The
details I don't know, but he alluded to some of the same parameters mentioned
upthread. I only have his description of a qualitative result, so I cannot
give hard data on this case. If there is a lesson, it's that PG is at least
competitive with the traditional Big Guns.

--
Lew


Reply With Quote
  #6  
Old   
Mario Splivalo
 
Posts: n/a

Default Re: A few questions - 10-31-2007 , 02:38 AM



On 2007-10-15, Lew <lew (AT) lewscanon (DOT) com> wrote:
Quote:
Jurgen Haan <jurgen (AT) fake (DOT) dom> wrote:
My goal is to get an insight on performance of postgres vs db2, I've
seen some pretty impressive cases on postgres, so now I'm actually
curious as to how our environment would run on a PG environment as
opposed to a db2 environment.

FWIW, a colleague of mine did a similar comparison between PG and Oracle. He
was the designated PG expert, and a co-worker was the Oracle maven. They
implemented their company's actual data in both RDBMSes (around 1 TB),
supporting the same applications, and optimized the heck out of both, sharing
all information that they could between them. They bet the next round of
beers on who would have the fastest result. The PG guy won, even after the
two of them worked together to speed up the Oracle database for a second round.

My friend tweaked many PG parameters, including some of the more obscure. The
details I don't know, but he alluded to some of the same parameters mentioned
upthread. I only have his description of a qualitative result, so I cannot
give hard data on this case. If there is a lesson, it's that PG is at least
competitive with the traditional Big Guns.

It would be realy nice if you could provide some more data on the subject
you just mentioned above. I have made "similair" tests between Oracle,
MSSQL, the database was around 20 GB in size, and I found out that all of
the RDBMSes perform very similair. I even found out that PostgreSQL on
Windows is not slower nor faster than PostgreSQL on Linux. I love MSSQL
query analyzer and Enterprise Manager. It's just that MSSQL is tied to
Windows, and I'm much more inclined on ease of Linux administration. And
Postgres is free. And the community is great!

So, I'd like to have a peek on the data you mention above... or, would it be
possible for me to contact that friend of yours?

Mike
--
"I can do it quick. I can do it cheap. I can do it well. Pick any two."

Mario Splivalo
majk (AT) fly (DOT) srk.fer.hr


Reply With Quote
  #7  
Old   
Lew
 
Posts: n/a

Default Re: A few questions - 10-31-2007 , 07:14 PM



Lew wrote:
Quote:
The details I don't know,
... I only have his description of a qualitative result, so I cannot
give hard data on this case. If there is a lesson, it's that PG is at least
competitive with the traditional Big Guns.
Mario Splivalo wrote:
Quote:
It would be realy nice if you could provide some more data on the subject
you just mentioned above.
Sorry.

Quote:
I have made "similair" tests between Oracle, MSSQL, the database was around 20 GB in size, and I found out that all of
the RDBMSes perform very similair. I even found out that PostgreSQL on
Windows is not slower nor faster than PostgreSQL on Linux. I love MSSQL
query analyzer and Enterprise Manager. It's just that MSSQL is tied to
Windows, and I'm much more inclined on ease of Linux administration. And
Postgres is free. And the community is great!
Showing that Postgres is at least competitive, as above.

Quote:
So, I'd like to have a peek on the data you mention above... or, would it be
possible for me to contact that friend of yours?
Sorry. My friend no longer works for the company, so neither of us has access
to any data. It wouldn't be meaningful out of context anyway. Furthermore,
doesn't Oracle prohibit publication of benchmark results involving their
products? If so, even the company that owns the results wouldn't be able to
tell us.

We're going to have to leave this one as hearsay.

--
Lew


Reply With Quote
  #8  
Old   
Jurgen Haan
 
Posts: n/a

Default Re: A few questions - 11-01-2007 , 04:01 AM



<snip>

A little update from my side.
I've been running PGSQL with our production data for a while now on a
small test setting. It's just a dual-core AMD64 with 4GB mem. I use 2
disks, one for the logs and one for the data. This is because for the
amount of disks, a single disk performs better than a raid-5 with 3-4
disks, because with such an amount of disk, the controller overhead is a
bottleneck. Besides, I don't really care if it breaks down, it's only a
test setting anyway. :P

For comparison material, I'm using a comparable system installed with
DB2. (Though hdparm reports a slightly higher throughput on the disks).

And to my genuine surprise, both databases performed just about the
same. With some queries PGSQL was faster and with others DB2 had the
upper hand, but in both cases, it was only marginal. I can't really
comment on stability, just that neither of them has ever crashed on me
in the process.

So, I'm pleasantly surprised, to say the least.

But there's still a long way of benchmarking to go.
I've only covered single client performance so far.
So as a DW environment, PGSQL seems viable.
But now testing it OLTP style; Lot's of concurrent queries.

Our production environment is one consisting of a bunch of webservers
communicating with another bunch of application servers which molest a
database back-end. Which means a constant flurry of queries.
So, it'll be interesting.

I'll keep you posted.

Regards,

-R-

Reply With Quote
  #9  
Old   
Marco Mariani
 
Posts: n/a

Default Re: A few questions - 11-07-2007 , 07:14 AM



Lew wrote:

Quote:
Furthermore, doesn't Oracle prohibit publication of benchmark results
involving their products?
And... does it make any sense? In the US, or anywhere else?



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.