dbTalk Databases Forums  

[DOCS] Observation on integer types documentation

mailing.database.pgsql-docs mailing.database.pgsql-docs


Discuss [DOCS] Observation on integer types documentation in the mailing.database.pgsql-docs forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Dan McGee
 
Posts: n/a

Default [DOCS] Observation on integer types documentation - 04-30-2012 , 03:59 PM






Hey everyone,

Reading the docs today, I came across this paragraph
(http://www.postgresql.org/docs/devel...#DATATYPE-INT),
which goes back several major versions:

Quote:
The type integer is the common choice, as it offers the best balance between range, storage size, and performance. The smallint type is generally only used if disk space is at a premium. The bigint type should only be used if the range of the integer type is insufficient, because the latter is definitely faster.
A few thoughts on this.
1) the use of the word "latter" isn't totally clear, or at least I had
to re-read it to realize former was 'bigint' and latter was 'integer'.
It might just be the style of writing.
2) I'm less than convinced this note belongs in modern documentation,
and set out to test that theory. My full results are below, but the
summary is this: on a 64-bit system, there seems to be only a minimal
measurable performance difference (< 5%) and very little size
difference. In the case of the indexes, the size difference is zero.
This is not true for a 32-bit system (where it is 39% slower), but the
blanket statement doesn't hold true, which is why I'm writing all this
up.

On a final note, the following paragraph also seems like it has
outlived its useful life:

Quote:
On very minimal operating systems the bigint type might not function correctly, because it relies on compiler support for eight-byte integers. On such machines, bigint acts the same as integer, but still takes up eight bytes of storage. (We are not aware of any modern platform where this is the case.)
Thanks!
-Dan


Table setup (only difference is type of 'id' column):

Table "public.package_files"
Column | Type | Modifiers
--------------+------------------------+------------------------
id | integer | not null
pkg_id | integer | not null
is_directory | boolean | not null default false
directory | character varying(255) | not null
filename | character varying(255) |
Indexes:
"package_files_pkey" PRIMARY KEY, btree (id)
"package_files_pkg_id" btree (pkg_id) CLUSTER


Table "public.package_files_int8"
Column | Type | Modifiers
--------------+------------------------+------------------------
id | bigint | not null
pkg_id | integer | not null
is_directory | boolean | not null default false
directory | character varying(255) | not null
filename | character varying(255) |
Indexes:
"package_files_int8_pkey" PRIMARY KEY, btree (id)
"package_files_int8_pkg_id" btree (pkg_id) CLUSTER


# select count(*) from package_files;
2621418
# select count(*) from package_files_int8 ;
2621418


All runs below were done after issuing a few warm up queries, and both
tables went through a VACUUM/CLUSTER/ANALYZE sequence.

32-bit P4 2.4 GHz (single core). no enabled CPU frequency scaling, 1GB
total ram, shared_buffers 128MB, work_mem 4MB:

relation | size
----------------------------------+------------
public.package_files_int8 | 239 MB
public.package_files | 229 MB
public.package_files_int8_pkey | 56 MB
public.package_files_int8_pkg_id | 45 MB
public.package_files_pkey | 45 MB
public.package_files_pkg_id | 45 MB

archweb=> \timing on
Timing is on.
archweb=> \t
Showing only tuples.
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 516.558 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 519.720 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 533.330 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 519.095 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 520.253 ms

archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 731.194 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 730.329 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 724.646 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 710.815 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 724.519 ms




64-bit Core2 Quad 2.66 GHz (four core), CPU freq scaling disabled
(performance governor used), 8GB total ram, shared_buffers 128MB,
work_mem 4MB:

relation | size
----------------------------------+------------
public.package_files_int8 | 245 MB
public.package_files | 234 MB
public.package_files_int8_pkey | 56 MB
public.package_files_pkg_id | 56 MB
public.package_files_int8_pkg_id | 56 MB
public.package_files_pkey | 56 MB

dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 177.078 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 176.109 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 177.478 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 176.639 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 176.453 ms

dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 185.768 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 185.159 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 184.407 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 184.555 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 184.290 ms


Overall results:

i686 x86_64
int4 int8 int4 int8
516.558 731.194 177.078 185.768
519.72 730.329 176.109 185.159
533.33 724.646 177.478 184.407
519.095 710.815 176.639 184.555
520.253 724.519 176.453 184.29



Average 521.7912 724.3006 176.7514 184.8358
Stddev 6.6040841681 8.1530512264 0.5359499044 0.619288059
Ratio 1.3881042839 1.0457388173

--
Sent via pgsql-docs mailing list (pgsql-docs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

Reply With Quote
  #2  
Old   
Marcelo Sena
 
Posts: n/a

Default Re: [DOCS] Observation on integer types documentation - 05-13-2012 , 07:08 PM






I'm new here but your proposal makes sense to me. Are the query plans equal
on both architectures?

--
Marcelo Lacerda


On Mon, Apr 30, 2012 at 5:59 PM, Dan McGee <dpmcgee (AT) gmail (DOT) com> wrote:

Quote:
Hey everyone,

Reading the docs today, I came across this paragraph
(
http://www.postgresql.org/docs/devel...l#DATATYPE-INT
),
which goes back several major versions:

The type integer is the common choice, as it offers the best balance
between range, storage size, and performance. The smallint type is
generally only used if disk space is at a premium. The bigint type should
only be used if the range of the integer type is insufficient, because the
latter is definitely faster.

A few thoughts on this.
1) the use of the word "latter" isn't totally clear, or at least I had
to re-read it to realize former was 'bigint' and latter was 'integer'.
It might just be the style of writing.
2) I'm less than convinced this note belongs in modern documentation,
and set out to test that theory. My full results are below, but the
summary is this: on a 64-bit system, there seems to be only a minimal
measurable performance difference (< 5%) and very little size
difference. In the case of the indexes, the size difference is zero.
This is not true for a 32-bit system (where it is 39% slower), but the
blanket statement doesn't hold true, which is why I'm writing all this
up.

On a final note, the following paragraph also seems like it has
outlived its useful life:

On very minimal operating systems the bigint type might not function
correctly, because it relies on compiler support for eight-byte integers.
On such machines, bigint acts the same as integer, but still takes up eight
bytes of storage. (We are not aware of any modern platform where this is
the case.)

Thanks!
-Dan


Table setup (only difference is type of 'id' column):

Table "public.package_files"
Column | Type | Modifiers
--------------+------------------------+------------------------
id | integer | not null
pkg_id | integer | not null
is_directory | boolean | not null default false
directory | character varying(255) | not null
filename | character varying(255) |
Indexes:
"package_files_pkey" PRIMARY KEY, btree (id)
"package_files_pkg_id" btree (pkg_id) CLUSTER


Table "public.package_files_int8"
Column | Type | Modifiers
--------------+------------------------+------------------------
id | bigint | not null
pkg_id | integer | not null
is_directory | boolean | not null default false
directory | character varying(255) | not null
filename | character varying(255) |
Indexes:
"package_files_int8_pkey" PRIMARY KEY, btree (id)
"package_files_int8_pkg_id" btree (pkg_id) CLUSTER


# select count(*) from package_files;
2621418
# select count(*) from package_files_int8 ;
2621418


All runs below were done after issuing a few warm up queries, and both
tables went through a VACUUM/CLUSTER/ANALYZE sequence.

32-bit P4 2.4 GHz (single core). no enabled CPU frequency scaling, 1GB
total ram, shared_buffers 128MB, work_mem 4MB:

relation | size
----------------------------------+------------
public.package_files_int8 | 239 MB
public.package_files | 229 MB
public.package_files_int8_pkey | 56 MB
public.package_files_int8_pkg_id | 45 MB
public.package_files_pkey | 45 MB
public.package_files_pkg_id | 45 MB

archweb=> \timing on
Timing is on.
archweb=> \t
Showing only tuples.
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 516.558 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 519.720 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 533.330 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 519.095 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 520.253 ms

archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 731.194 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 730.329 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 724.646 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 710.815 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 724.519 ms




64-bit Core2 Quad 2.66 GHz (four core), CPU freq scaling disabled
(performance governor used), 8GB total ram, shared_buffers 128MB,
work_mem 4MB:

relation | size
----------------------------------+------------
public.package_files_int8 | 245 MB
public.package_files | 234 MB
public.package_files_int8_pkey | 56 MB
public.package_files_pkg_id | 56 MB
public.package_files_int8_pkg_id | 56 MB
public.package_files_pkey | 56 MB

dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 177.078 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 176.109 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 177.478 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 176.639 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 176.453 ms

dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 185.768 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 185.159 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 184.407 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 184.555 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 184.290 ms


Overall results:

i686 x86_64
int4 int8 int4 int8
516.558 731.194 177.078 185.768
519.72 730.329 176.109 185.159
533.33 724.646 177.478 184.407
519.095 710.815 176.639 184.555
520.253 724.519 176.453 184.29



Average 521.7912 724.3006 176.7514 184.8358
Stddev 6.6040841681 8.1530512264 0.5359499044 0.619288059
Ratio 1.3881042839 1.0457388173

--
Sent via pgsql-docs mailing list (pgsql-docs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

Reply With Quote
  #3  
Old   
Dan McGee
 
Posts: n/a

Default Re: [DOCS] Observation on integer types documentation - 05-21-2012 , 05:11 AM



Yes, I didn't see any difference in query plans between the two.

-Dan

On Sun, May 13, 2012 at 7:08 PM, Marcelo Sena <marceloslacerda (AT) gmail (DOT) com> wrote:
Quote:
I'm new here but your proposal makes sense to me. Are the query plans equal
on both architectures?

--
Marcelo Lacerda


On Mon, Apr 30, 2012 at 5:59 PM, Dan McGee <dpmcgee (AT) gmail (DOT) com> wrote:

Hey everyone,

Reading the docs today, I came across this paragraph

(http://www.postgresql.org/docs/devel...#DATATYPE-INT),
which goes back several major versions:

The type integer is the common choice, as it offers the best balance
between range, storage size, and performance. The smallint type is generally
only used if disk space is at a premium. The bigint type should only be used
if the range of the integer type is insufficient, because the latter is
definitely faster.

A few thoughts on this.
1) the use of the word "latter" isn't totally clear, or at least I had
to re-read it to realize former was 'bigint' and latter was 'integer'.
It might just be the style of writing.
2) I'm less than convinced this note belongs in modern documentation,
and set out to test that theory. My full results are below, but the
summary is this: on a 64-bit system, there seems to be only a minimal
measurable performance difference (< 5%) and very little size
difference. In the case of the indexes, the size difference is zero.
This is not true for a 32-bit system (where it is 39% slower), but the
blanket statement doesn't hold true, which is why I'm writing all this
up.

On a final note, the following paragraph also seems like it has
outlived its useful life:

On very minimal operating systems the bigint type might not function
correctly, because it relies on compiler support for eight-byte integers. On
such machines, bigint acts the same as integer, but still takes up eight
bytes of storage. (We are not aware of any modern platform where this is the
case.)

Thanks!
-Dan


Table setup (only difference is type of 'id' column):

Table "public.package_files"
Â* Â*Column Â* Â*| Â* Â* Â* Â* Â*Type Â* Â* Â* Â* Â*| Â* Â* Â* Modifiers
--------------+------------------------+------------------------
Â*id Â* Â* Â* Â* Â* | integer Â* Â* Â* Â* Â* Â* Â* Â*| not null
Â*pkg_id Â* Â* Â* | integer Â* Â* Â* Â* Â* Â* Â* Â*| not null
Â*is_directory | boolean Â* Â* Â* Â* Â* Â* Â* Â*| not null default false
Â*directory Â* Â*| character varying(255) | not null
Â*filename Â* Â* | character varying(255) |
Indexes:
Â* Â*"package_files_pkey" PRIMARY KEY, btree (id)
Â* Â*"package_files_pkg_id" btree (pkg_id) CLUSTER


Table "public.package_files_int8"
Â* Â*Column Â* Â*| Â* Â* Â* Â* Â*Type Â* Â* Â* Â* Â*| Â* Â* Â* Modifiers
--------------+------------------------+------------------------
Â*id Â* Â* Â* Â* Â* | bigint Â* Â* Â* Â* Â* Â* Â* Â* | not null
Â*pkg_id Â* Â* Â* | integer Â* Â* Â* Â* Â* Â* Â* Â*| not null
Â*is_directory | boolean Â* Â* Â* Â* Â* Â* Â* Â*| not null default false
Â*directory Â* Â*| character varying(255) | not null
Â*filename Â* Â* | character varying(255) |
Indexes:
Â* Â*"package_files_int8_pkey" PRIMARY KEY, btree (id)
Â* Â*"package_files_int8_pkg_id" btree (pkg_id) CLUSTER


# select count(*) from package_files;
Â*2621418
# select count(*) from package_files_int8 ;
Â*2621418


All runs below were done after issuing a few warm up queries, and both
tables went through a VACUUM/CLUSTER/ANALYZE sequence.

32-bit P4 2.4 GHz (single core). no enabled CPU frequency scaling, 1GB
total ram, shared_buffers 128MB, work_mem 4MB:

Â* Â* Â* Â* Â* Â* relation Â* Â* Â* Â* Â* Â* | Â* Â*size
----------------------------------+------------
Â*public.package_files_int8 Â* Â* Â* Â*| 239 MB
Â*public.package_files Â* Â* Â* Â* Â* Â* | 229 MB
Â*public.package_files_int8_pkey Â* | 56 MB
Â*public.package_files_int8_pkg_id | 45 MB
Â*public.package_files_pkey Â* Â* Â* Â*| 45 MB
Â*public.package_files_pkg_id Â* Â* Â*| 45 MB

archweb=> \timing on
Timing is on.
archweb=> \t
Showing only tuples.
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 516.558 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 519.720 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 533.330 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 519.095 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 520.253 ms

archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 731.194 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 730.329 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 724.646 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 710.815 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 724.519 ms




64-bit Core2 Quad 2.66 GHz (four core), CPU freq scaling disabled
(performance governor used), 8GB total ram, shared_buffers 128MB,
work_mem 4MB:

Â* Â* Â* Â* Â* Â* relation Â* Â* Â* Â* Â* Â* | Â* Â*size
----------------------------------+------------
Â*public.package_files_int8 Â* Â* Â* Â*| 245 MB
Â*public.package_files Â* Â* Â* Â* Â* Â* | 234 MB
Â*public.package_files_int8_pkey Â* | 56 MB
Â*public.package_files_pkg_id Â* Â* Â*| 56 MB
Â*public.package_files_int8_pkg_id | 56 MB
Â*public.package_files_pkey Â* Â* Â* Â*| 56 MB

dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 177.078 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 176.109 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 177.478 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 176.639 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 176.453 ms

dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 185.768 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 185.159 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 184.407 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 184.555 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
Â*119325
Time: 184.290 ms


Overall results:

Â* Â* Â* Â* Â* Â* Â* Â* Â* Â* Â* i686 Â* Â* Â* Â* Â* Â* Â* Â* Â* Â* Â* Â* Â* Â*x86_64
Â* Â* Â* Â* Â* Â* Â* Â*int4 Â* Â* Â* Â* Â* Â*int8 Â* Â* Â* Â* Â* Â*int4 Â* Â* Â* Â* Â* Â*int8
Â* Â* Â* Â* Â* Â* Â* Â*516.558 Â* Â* Â* Â* 731.194 Â* Â* Â* Â* 177.078 Â* Â* Â* Â* 185.768
Â* Â* Â* Â* Â* Â* Â* Â*519.72 Â* Â* Â* Â* Â*730.329 Â* Â* Â* Â* 176.109 Â* Â* Â* Â* 185.159
Â* Â* Â* Â* Â* Â* Â* Â*533.33 Â* Â* Â* Â* Â*724.646 Â* Â* Â* Â* 177.478 Â* Â* Â* Â* 184.407
Â* Â* Â* Â* Â* Â* Â* Â*519.095 Â* Â* Â* Â* 710.815 Â* Â* Â* Â* 176.639 Â* Â* Â* Â* 184.555
Â* Â* Â* Â* Â* Â* Â* Â*520.253 Â* Â* Â* Â* 724.519 Â* Â* Â* Â* 176.453 Â* Â* Â* Â* 184.29



Average Â* Â* Â* Â* 521.7912 Â* Â* Â* Â*724.3006 Â* Â* Â* Â*176.7514 Â* Â* Â* Â*184..8358
Stddev Â* Â* Â* Â* Â*6.6040841681 Â* Â*8.1530512264 Â* Â*0.5359499044
Â*0.619288059
Ratio Â* Â* Â* Â* Â* Â* Â* Â* Â* 1.3881042839 Â* Â* Â* Â* Â* Â* Â* Â* Â* Â*1.0457388173

--
Sent via pgsql-docs mailing list (pgsql-docs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs


--
Sent via pgsql-docs mailing list (pgsql-docs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

Reply With Quote
  #4  
Old   
Bruce Momjian
 
Posts: n/a

Default Re: [DOCS] Observation on integer types documentation - 08-29-2012 , 05:38 PM



I have developed the attached patch based on your observations.

---------------------------------------------------------------------------

On Mon, Apr 30, 2012 at 03:59:20PM -0500, Dan McGee wrote:
Quote:
Hey everyone,

Reading the docs today, I came across this paragraph
(http://www.postgresql.org/docs/devel...#DATATYPE-INT),
which goes back several major versions:

The type integer is the common choice, as it offers the best balance between range, storage size, and performance. The smallint type is generally only used if disk space is at a premium. The bigint type should only be used if the range of the integer type is insufficient, because the latter is definitely faster.

A few thoughts on this.
1) the use of the word "latter" isn't totally clear, or at least I had
to re-read it to realize former was 'bigint' and latter was 'integer'.
It might just be the style of writing.
2) I'm less than convinced this note belongs in modern documentation,
and set out to test that theory. My full results are below, but the
summary is this: on a 64-bit system, there seems to be only a minimal
measurable performance difference (< 5%) and very little size
difference. In the case of the indexes, the size difference is zero.
This is not true for a 32-bit system (where it is 39% slower), but the
blanket statement doesn't hold true, which is why I'm writing all this
up.

On a final note, the following paragraph also seems like it has
outlived its useful life:

On very minimal operating systems the bigint type might not function correctly, because it relies on compiler support for eight-byte integers. On such machines, bigint acts the same as integer, but still takes up eight bytes of storage. (We are not aware of any modern platform where this is the case.)

Thanks!
-Dan


Table setup (only difference is type of 'id' column):

Table "public.package_files"
Column | Type | Modifiers
--------------+------------------------+------------------------
id | integer | not null
pkg_id | integer | not null
is_directory | boolean | not null default false
directory | character varying(255) | not null
filename | character varying(255) |
Indexes:
"package_files_pkey" PRIMARY KEY, btree (id)
"package_files_pkg_id" btree (pkg_id) CLUSTER


Table "public.package_files_int8"
Column | Type | Modifiers
--------------+------------------------+------------------------
id | bigint | not null
pkg_id | integer | not null
is_directory | boolean | not null default false
directory | character varying(255) | not null
filename | character varying(255) |
Indexes:
"package_files_int8_pkey" PRIMARY KEY, btree (id)
"package_files_int8_pkg_id" btree (pkg_id) CLUSTER


# select count(*) from package_files;
2621418
# select count(*) from package_files_int8 ;
2621418


All runs below were done after issuing a few warm up queries, and both
tables went through a VACUUM/CLUSTER/ANALYZE sequence.

32-bit P4 2.4 GHz (single core). no enabled CPU frequency scaling, 1GB
total ram, shared_buffers 128MB, work_mem 4MB:

relation | size
----------------------------------+------------
public.package_files_int8 | 239 MB
public.package_files | 229 MB
public.package_files_int8_pkey | 56 MB
public.package_files_int8_pkg_id | 45 MB
public.package_files_pkey | 45 MB
public.package_files_pkg_id | 45 MB

archweb=> \timing on
Timing is on.
archweb=> \t
Showing only tuples.
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 516.558 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 519.720 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 533.330 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 519.095 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 520.253 ms

archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 731.194 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 730.329 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 724.646 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 710.815 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 724.519 ms




64-bit Core2 Quad 2.66 GHz (four core), CPU freq scaling disabled
(performance governor used), 8GB total ram, shared_buffers 128MB,
work_mem 4MB:

relation | size
----------------------------------+------------
public.package_files_int8 | 245 MB
public.package_files | 234 MB
public.package_files_int8_pkey | 56 MB
public.package_files_pkg_id | 56 MB
public.package_files_int8_pkg_id | 56 MB
public.package_files_pkey | 56 MB

dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 177.078 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 176.109 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 177.478 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 176.639 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 176.453 ms

dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 185.768 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 185.159 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 184.407 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 184.555 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 184.290 ms


Overall results:

i686 x86_64
int4 int8 int4 int8
516.558 731.194 177.078 185.768
519.72 730.329 176.109 185.159
533.33 724.646 177.478 184.407
519.095 710.815 176.639 184.555
520.253 724.519 176.453 184.29



Average 521.7912 724.3006 176.7514 184.8358
Stddev 6.6040841681 8.1530512264 0.5359499044 0.619288059
Ratio 1.3881042839 1.0457388173

--
Sent via pgsql-docs mailing list (pgsql-docs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs
--
Bruce Momjian <bruce (AT) momjian (DOT) us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


--
Sent via pgsql-docs mailing list (pgsql-docs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

Reply With Quote
  #5  
Old   
Bruce Momjian
 
Posts: n/a

Default Re: [DOCS] Observation on integer types documentation - 08-30-2012 , 12:13 PM



On Wed, Aug 29, 2012 at 06:38:47PM -0400, Bruce Momjian wrote:
Quote:
I have developed the attached patch based on your observations.
Applied.

---------------------------------------------------------------------------

Quote:
On Mon, Apr 30, 2012 at 03:59:20PM -0500, Dan McGee wrote:
Hey everyone,

Reading the docs today, I came across this paragraph
(http://www.postgresql.org/docs/devel...#DATATYPE-INT),
which goes back several major versions:

The type integer is the common choice, as it offers the best balance between range, storage size, and performance. The smallint type is generally only used if disk space is at a premium. The bigint type should only be used if the range of the integer type is insufficient, because the latter is definitely faster.

A few thoughts on this.
1) the use of the word "latter" isn't totally clear, or at least I had
to re-read it to realize former was 'bigint' and latter was 'integer'.
It might just be the style of writing.
2) I'm less than convinced this note belongs in modern documentation,
and set out to test that theory. My full results are below, but the
summary is this: on a 64-bit system, there seems to be only a minimal
measurable performance difference (< 5%) and very little size
difference. In the case of the indexes, the size difference is zero.
This is not true for a 32-bit system (where it is 39% slower), but the
blanket statement doesn't hold true, which is why I'm writing all this
up.

On a final note, the following paragraph also seems like it has
outlived its useful life:

On very minimal operating systems the bigint type might not function correctly, because it relies on compiler support for eight-byte integers. On such machines, bigint acts the same as integer, but still takes up eight bytes of storage. (We are not aware of any modern platform where this is the case.)

Thanks!
-Dan


Table setup (only difference is type of 'id' column):

Table "public.package_files"
Column | Type | Modifiers
--------------+------------------------+------------------------
id | integer | not null
pkg_id | integer | not null
is_directory | boolean | not null default false
directory | character varying(255) | not null
filename | character varying(255) |
Indexes:
"package_files_pkey" PRIMARY KEY, btree (id)
"package_files_pkg_id" btree (pkg_id) CLUSTER


Table "public.package_files_int8"
Column | Type | Modifiers
--------------+------------------------+------------------------
id | bigint | not null
pkg_id | integer | not null
is_directory | boolean | not null default false
directory | character varying(255) | not null
filename | character varying(255) |
Indexes:
"package_files_int8_pkey" PRIMARY KEY, btree (id)
"package_files_int8_pkg_id" btree (pkg_id) CLUSTER


# select count(*) from package_files;
2621418
# select count(*) from package_files_int8 ;
2621418


All runs below were done after issuing a few warm up queries, and both
tables went through a VACUUM/CLUSTER/ANALYZE sequence.

32-bit P4 2.4 GHz (single core). no enabled CPU frequency scaling, 1GB
total ram, shared_buffers 128MB, work_mem 4MB:

relation | size
----------------------------------+------------
public.package_files_int8 | 239 MB
public.package_files | 229 MB
public.package_files_int8_pkey | 56 MB
public.package_files_int8_pkg_id | 45 MB
public.package_files_pkey | 45 MB
public.package_files_pkg_id | 45 MB

archweb=> \timing on
Timing is on.
archweb=> \t
Showing only tuples.
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 516.558 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 519.720 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 533.330 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 519.095 ms
archweb=> select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 520.253 ms

archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 731.194 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 730.329 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 724.646 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 710.815 ms
archweb=> select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 724.519 ms




64-bit Core2 Quad 2.66 GHz (four core), CPU freq scaling disabled
(performance governor used), 8GB total ram, shared_buffers 128MB,
work_mem 4MB:

relation | size
----------------------------------+------------
public.package_files_int8 | 245 MB
public.package_files | 234 MB
public.package_files_int8_pkey | 56 MB
public.package_files_pkg_id | 56 MB
public.package_files_int8_pkg_id | 56 MB
public.package_files_pkey | 56 MB

dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 177.078 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 176.109 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 177.478 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 176.639 ms
dmcgee=# select count(*) from (select * from package_files where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 176.453 ms

dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 185.768 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 185.159 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 184.407 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 184.555 ms
dmcgee=# select count(*) from (select * from package_files_int8 where
pkg_id in (48024, 48025, 40343) order by id) a;
119325
Time: 184.290 ms


Overall results:

i686 x86_64
int4 int8 int4 int8
516.558 731.194 177.078 185.768
519.72 730.329 176.109 185.159
533.33 724.646 177.478 184.407
519.095 710.815 176.639 184.555
520.253 724.519 176.453 184.29



Average 521.7912 724.3006 176.7514 184.8358
Stddev 6.6040841681 8.1530512264 0.5359499044 0.619288059
Ratio 1.3881042839 1.0457388173

--
Sent via pgsql-docs mailing list (pgsql-docs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

--
Bruce Momjian <bruce (AT) momjian (DOT) us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
new file mode 100644
index 7f6e661..71cf59e
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
***************
*** 453,470 ****
The type <type>integer</type> is the common choice, as it offers
the best balance between range, storage size, and performance.
The <type>smallint</type> type is generally only used if disk
! space is at a premium. The <type>bigint</type> type should only
! be used if the range of the <type>integer</type> type is insufficient,
! because the latter is definitely faster.
! </para
!
! <para
! On very minimal operating systems the <type>bigint</type> type
! might not function correctly, because it relies on compiler support
! for eight-byte integers. On such machines, <type>bigint</type
! acts the same as <type>integer</type>, but still takes up eight
! bytes of storage. (We are not aware of any modern
! platform where this is the case.)
/para

para
--- 453,460 ----
The type <type>integer</type> is the common choice, as it offers
the best balance between range, storage size, and performance.
The <type>smallint</type> type is generally only used if disk
! space is at a premium. The <type>bigint</type> type is designed to be
! used when the range of the <type>integer</type> type is insufficient.
/para

para


--
Sent via pgsql-docs mailing list (pgsql-docs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

--
Bruce Momjian <bruce (AT) momjian (DOT) us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


--
Sent via pgsql-docs mailing list (pgsql-docs (AT) postgresql (DOT) org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.