dbTalk Databases Forums  

The Two Towers

comp.databases.postgresql comp.databases.postgresql


Discuss The Two Towers in the comp.databases.postgresql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Mladen Gogala
 
Posts: n/a

Default The Two Towers - 06-07-2010 , 08:58 PM






Battleground was a machine running CentOS 5.5 and the two towers were 2
databases. First, PostgreSQL 8.4.4:

scott=# select count(*) from emp e1,emp e2, emp e3, emp e4,emp e5, emp
e6;
count
---------
7529536
(1 row)

Time: 2127.345 ms
scott=# select version();

version

--------------------------------------------------------------------------------
--------------------------------
PostgreSQL 8.4.4 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 2008
0704 (Red Hat 4.1.2-46), 32-bit
(1 row)

Time: 1.008 ms
scott=# explain analyze
scott-# select count(*) from emp e1,emp e2, emp e3, emp e4,emp e5, emp e6;
QUERY
PLAN

--------------------------------------------------------------------------------
---------------------------------------------------------
Aggregate (cost=1052615959933232.62..1052615959933232.62 rows=1
width=0) (actu
al time=13684.811..13684.811 rows=1 loops=1)
-> Nested Loop (cost=16.60..935975959933232.62
rows=46656000000000000 width
=0) (actual time=0.038..10508.108 rows=7529536 loops=1)
-> Nested Loop (cost=0.00..2855959933216.00
rows=77760000000000 width
=0) (actual time=0.025..820.829 rows=537824 loops=1)
-> Nested Loop (cost=0.00..4759933216.00
rows=129600000000 widt
h=0) (actual time=0.021..59.701 rows=38416 loops=1)
-> Nested Loop (cost=0.00..7933216.00
rows=216000000 widt
h=0) (actual time=0.018..4.424 rows=2744 loops=1)
-> Nested Loop (cost=0.00..13216.00
rows=360000 wid
th=0) (actual time=0.015..0.333 rows=196 loops=1)
-> Seq Scan on emp e1
(cost=0.00..16.00 rows=
600 width=0) (actual time=0.009..0.018 rows=14 loops=1)
-> Seq Scan on emp e6
(cost=0.00..16.00 rows=
600 width=0) (actual time=0.002..0.010 rows=14 loops=14)
-> Seq Scan on emp e5 (cost=0.00..16.00
rows=600 wi
dth=0) (actual time=0.002..0.008 rows=14 loops=196)
-> Seq Scan on emp e4 (cost=0.00..16.00 rows=600
width=0)
(actual time=0.001..0.007 rows=14 loops=2744)
-> Seq Scan on emp e3 (cost=0.00..16.00 rows=600
width=0) (actu
al time=0.001..0.007 rows=14 loops=38416)
-> Materialize (cost=16.60..22.60 rows=600 width=0) (actual
time=0.00
0..0.006 rows=14 loops=537824)
-> Seq Scan on emp e2 (cost=0.00..16.00 rows=600
width=0) (actu
al time=0.003..0.012 rows=14 loops=1)
Total runtime: 13684.977 ms
(14 rows)
Time: 13714.576 ms
scott=# \d+ emp
Table "public.emp"
Column | Type | Modifiers | Storage |
Description
----------+-----------------------------+-----------+----------
+-------------
empno | smallint | not null | plain |
ename | character varying(10) | not null | extended |
job | character varying(9) | | extended |
mgr | smallint | | plain |
hiredate | timestamp without time zone | | plain |
sal | double precision | | plain |
comm | double precision | | plain |
deptno | smallint | | plain |
Indexes:
"emp_pkey" PRIMARY KEY, btree (empno)
"emp_deptno_i" btree (deptno)
"emp_ename_id" btree (ename)
"ind_emp_deptno" btree (deptno)
Foreign-key constraints:
"fk_deptno" FOREIGN KEY (deptno) REFERENCES dept(deptno)
Has OIDs: no

Now, Oracle 11.2.0.1 with the April patch:
SQL> set autotrace on
SQL> select count(*) from emp e1,emp e2, emp e3, emp e4,emp e5, emp e6;

COUNT(*)
----------
7529536

Elapsed: 00:00:00.85

Execution Plan
----------------------------------------------------------
Plan hash value: 1049923164

------------------------------------------------------------------------------
Quote:
Id | Operation | Name | Rows | Cost (%CPU)|
Time |
------------------------------------------------------------------------------
Quote:
0 | SELECT STATEMENT | | 1 | 157K (1)|
00:31:25 |
1 | SORT AGGREGATE | | 1 |
|
2 | MERGE JOIN CARTESIAN | | 7529K| 157K (1)|
00:31:25 |
3 | MERGE JOIN CARTESIAN | | 537K| 11224 (1)|
00:02:15 |
4 | MERGE JOIN CARTESIAN | | 38416 | 808 (1)|
00:00:10 |
5 | MERGE JOIN CARTESIAN | | 2744 | 62 (0)|
00:00:01 |
6 | MERGE JOIN CARTESIAN | | 196 | 7 (0)|
00:00:01 |
7 | INDEX FULL SCAN | PK_EMP | 14 | 1 (0)|
00:00:01 |
8 | BUFFER SORT | | 14 | 6 (0)|
00:00:01 |
9 | INDEX FAST FULL SCAN| PK_EMP | 14 | 0 (0)|
00:00:01 |
10 | BUFFER SORT | | 14 | 62 (0)|
00:00:01 |
11 | INDEX FAST FULL SCAN | PK_EMP | 14 | 0 (0)|
00:00:01 |
12 | BUFFER SORT | | 14 | 808 (1)|
00:00:10 |
13 | INDEX FAST FULL SCAN | PK_EMP | 14 | 0 (0)|
00:00:01 |
14 | BUFFER SORT | | 14 | 11224 (1)|
00:02:15 |
15 | INDEX FAST FULL SCAN | PK_EMP | 14 | 0 (0)|
00:00:01 |
16 | BUFFER SORT | | 14 | 157K (1)|
00:31:25 |
17 | INDEX FAST FULL SCAN | PK_EMP | 14 | 0 (0)|
00:00:01 |
------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
21 consistent gets
0 physical reads
0 redo size
425 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
5 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE 11.2.0.1.0 Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

Both databases have current statistics, the data is exactly the same, as
shown by the results. Oracle finished in 0.85 second (850 milliseconds)
while PostgreSQL took 2127.345 ms, 2.5 times slower than Oracle. The
difference that is obvious is the access path: Postgres chose nested
loops, while Oracle chose merge join and utilized the primary key for the
table. Postgres, apparently cannot do "fast full scan" of an index and
doesn't know how to utilize the primary key in cases like this. However,
my gripe is that the optimizer should have selected merge join, just like
Oracle did. In this case, nested loops are definitely the wrong choice.
Rule based optimizers, the kind of the optimizer that takes into
consideration only the structure of the table, usually ends up being
dominated by the nested loops method. Nested loops method usually
dominates the OLTP type applications but can really mess up large
reports. I am under the impression that Postgres query planner is geared
toward the OLTP type of the database. Maybe a new parameter is needed
that would somehow shift gears to "data warehouse use", on demand? I have
to say, the advantage is still on the side of Sauron.


--
http://mgogala.byethost5.com

Reply With Quote
  #2  
Old   
Lennart Jonsson
 
Posts: n/a

Default Re: The Two Towers - 06-07-2010 , 11:46 PM






On 2010-06-08 03:58, Mladen Gogala wrote:
Quote:
Battleground was a machine running CentOS 5.5 and the two towers were 2
databases. First, PostgreSQL 8.4.4:

Is the table definition and the sample data available somewhere? If it
is of interest I can examine the behaviour of db2 9.5 and mysql 5.1 for
comparison. Perhaps others will jump in with other dbms as well.


/Lennart

[...]

Reply With Quote
  #3  
Old   
Robert Klemme
 
Posts: n/a

Default Re: The Two Towers - 06-09-2010 , 12:53 AM



On 08.06.2010 03:58, Mladen Gogala wrote:
Quote:
Battleground was a machine running CentOS 5.5 and the two towers were 2
databases. First, PostgreSQL 8.4.4:

scott=# select count(*) from emp e1,emp e2, emp e3, emp e4,emp e5, emp
e6;
count
---------
7529536
(1 row)

Time: 2127.345 ms

Now, Oracle 11.2.0.1 with the April patch:
SQL> set autotrace on
SQL> select count(*) from emp e1,emp e2, emp e3, emp e4,emp e5, emp e6;

COUNT(*)
----------
7529536

Elapsed: 00:00:00.85

Both databases have current statistics, the data is exactly the same, as
shown by the results. Oracle finished in 0.85 second (850 milliseconds)
while PostgreSQL took 2127.345 ms, 2.5 times slower than Oracle. The
difference that is obvious is the access path: Postgres chose nested
loops, while Oracle chose merge join and utilized the primary key for the
table. Postgres, apparently cannot do "fast full scan" of an index and
doesn't know how to utilize the primary key in cases like this. However,
my gripe is that the optimizer should have selected merge join, just like
Oracle did. In this case, nested loops are definitely the wrong choice.
Rule based optimizers, the kind of the optimizer that takes into
consideration only the structure of the table, usually ends up being
dominated by the nested loops method. Nested loops method usually
dominates the OLTP type applications but can really mess up large
reports. I am under the impression that Postgres query planner is geared
toward the OLTP type of the database. Maybe a new parameter is needed
that would somehow shift gears to "data warehouse use", on demand? I have
to say, the advantage is still on the side of Sauron.
I don't have enough insight into PostgreSQL yet but another explanation
would be that developers did not bother to optimize for such an
artificial query. Or did you derive this from some realistic use case?

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Reply With Quote
  #4  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: The Two Towers - 06-09-2010 , 07:28 AM



On Wed, 09 Jun 2010 07:53:14 +0200, Robert Klemme wrote:

Quote:
I don't have enough insight into PostgreSQL yet but another explanation
would be that developers did not bother to optimize for such an
artificial query. Or did you derive this from some realistic use case?

Kind regards

robert
No,there is no application here. This is a completely artificial query
which I used to use as a benchmark for the machine running Oracle RDBMS.
The first machine I tested was a HP workstation running HP-UX 7 and
Oracle 5.1.22. As I moved Oracle demo account (SCOTT/TIGER) to Postgres,
it dawned to me that the same query can be used to compare the relative
speed of the two databases. You see, this query generates large amount of
rows in the result set from a single block table, primarily by utilizing
the CPU resources at hand. There is no I/O involved. Having been a
consultant, I used the query to gage how fast the CPU on the machine
really is, by comparing the time needed to execute this query.

My idea was to see how fast will Postgres do the same operation but the
results were something of a disappointment, the optimizer chose the wrong
plan, the comparison cannot be made that way.

Instead of comparing the efficiency of the database processing, this was
turned into a tale of the two optimizers. This is an artificial query,
indeed. And Oracle's optimizer did undeniably do much better job with it
than the Postgres optimizer.



--
http://mgogala.byethost5.com

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.