dbTalk Databases Forums  

Direct I/O

comp.databases.postgresql comp.databases.postgresql


Discuss Direct I/O in the comp.databases.postgresql forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Direct I/O - 01-27-2010 , 05:05 PM






On Wed, 27 Jan 2010 16:51:56 +0100, Laurenz Albe wrote:

Quote:
As far as I know, the memory *is* usable by other processes. Normally
"free" will report very small values on a Linux machine, but that won't
keep a large memory allocation request from succeeding.
Well, the story is a bit more complex than that. The memory used for
buffer cache will allocate swap partition, too. It will not actually
write to it, but the picture you will see when you execute "free" is that
a lot of swap space is taken. Furthermore, memory consumed by the FS
cache is not declared as "free", which means that it has to be freed for
each larger allocation. You can circumvent that by setting
min_free_kbytes to 5% of the memory. These are my settings for the
PostgreSQL server:

vm.min_free_kbytes = 262144
vm.swappiness = 0
vm.page-cluster = 5
vm.overcommit_memory = 1
kernel.shmall = 2097152
kernel.shmmax = 1073741824
kernel.shmmni = 4096
kernel.sem = 1024 32000 100 1024
fs.file-max = 6815744
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576

I've also increased the readahead on my main tablespace:

[root@lpo-postgres-01 vm]# hdparm /dev/cciss/c0d0p8

/dev/cciss/c0d0p8:
readonly = 0 (off)
readahead = 2048 (on)
geometry = 4730/255/32, sectors = 513389142, start = 59970708
[root@lpo-postgres-01 vm]#

This is what the memory looks like:
[mgogala@lpo-postgres-01 ~]$ free
total used free shared buffers cached
Mem: 4148528 3062840 1085688 0 13016 2939612
-/+ buffers/cache: 110212 4038316
Swap: 8385920 997744 7388176

The machine has 4 CPU threads:
[mgogala@lpo-postgres-01 ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
stepping : 6
cpu MHz : 3000.134
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm
constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 6003.20

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
stepping : 6
cpu MHz : 3000.134
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm
constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 5999.97

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
stepping : 6
cpu MHz : 3000.134
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm
constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 5999.98

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
stepping : 6
cpu MHz : 3000.134
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm
constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 6000.01

[mgogala@lpo-postgres-01 ~]$

Yet, despite the fact that it is a fairly powerful machine, the
performance is not satisfactory:

mgogala@nycwxp2622:~$ psql -d news_archive -h lpo-postgres-01
Timing is on.
psql (8.4.2, server 8.3.9)
WARNING: psql version 8.4, server version 8.3.
Some psql features might not work.
Type "help" for help.

news_archive=# set search_path=news,public;
SET
Time: 30.157 ms
news_archive=# select count(*) from news_segments;
count
----------
16192541
(1 row)

Time: 2173193.882 ms
news_archive=#

That is almost 40 minutes to count rows in a partitioned table.




--
http://mgogala.byethost5.com

Reply With Quote
  #12  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: Direct I/O - 01-28-2010 , 08:08 AM






Mladen Gogala wrote:
Quote:
http://kerneltrap.org/node/7563

I am afraid that I am the one of those "brain dead database people" who
has been using it on Unix, long before the advent of Linux. What is more,
I am hell bent on remaining one of those "brain dead database people" and
if Linux doesn't provide that interface, something else will. The
question is only whether my DB of choice will allow me to use it. If not,
there is always another DB.
Again, I am no expert in this matter and I don't want to change your ways.

Yet when I compare old DBMS (like Oracle) with newer ones (like PostgreSQL),
I find that the old ones have a strong mistrust in file systems.

Oracle supports raw devices, and it always uses its own "file system"
(an Oracle tablespace is a file system to all practical purposes, with
"segments" being files).

PostgreSQL, on the other hand, relies a lot on the underlying
file system - up to the point of embracing file system buffering
as being something helpful.

I assume that the reason for this is that in the old days, file systems
used to be much worse than they are now, with performance penalties
on directories with many files in them and the like.


The test case you show in another posting on the same thread
seems indeed very slow. Did you try it without kernel/file system tuning
(excluding the necessary shared memory parameters)?
Was is even slower there?
Is there a lot of dead space in the table?

Yours,
Laurenz Albe

Reply With Quote
  #13  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Direct I/O - 01-28-2010 , 11:24 AM



On Thu, 28 Jan 2010 14:08:20 +0100, Laurenz Albe wrote:

Quote:
assume that the reason for this is that in the old days, file systems
used to be much worse than they are now, with performance penalties on
directories with many files in them and the like.
Actually, file systems haven't changed that much. Ext3 is, essentially, a
plain vanilla file system which doesn't perform well when compared with
more advanced file systems like VxFS or JFS2. Given the ACID
requirements, the mistrust of the file systems is well founded.

Quote:

The test case you show in another posting on the same thread seems
indeed very slow. Did you try it without kernel/file system tuning
(excluding the necessary shared memory parameters)? Was is even slower
there?
Yes, it was slower.

Quote:
Is there a lot of dead space in the table?
That is the main problem. I should have done a full vacuum. Also, the
table is partitioned into 3 tables, each containing a month worth of data
and a large text column on which I am creating indexes:

news_archive=# create index fti_segs_data0558 on news.news_segs_data0558
news_archive-# using gin(to_tsvector('english',segment_text));
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.

The company wants to move a data warehouse from an Oracle RAC to
PostgreSQL. The database on RAC is essentially one gigantic table, having
400 million records, partitioned into month-sized chunks and
searched via text indexes. The pilot project loaded 3 months worth of
partitions into PgSQL and developers are now writing a little Groovy
Grails application to search it. I am just the DBA here. The size of the
DW is going to exceed 4TB.


--
http://mgogala.byethost5.com

Reply With Quote
  #14  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: Direct I/O - 01-29-2010 , 02:41 AM



Mladen Gogala wrote:
Quote:
That is the main problem. I should have done a full vacuum. Also, the
table is partitioned into 3 tables, each containing a month worth of data
and a large text column on which I am creating indexes:

news_archive=# create index fti_segs_data0558 on news.news_segs_data0558
news_archive-# using gin(to_tsvector('english',segment_text));
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.
The difference between the subject and the message indicates that
this thread is almost done :^)

The notice is nothing to worry about and has nothing to
do with the size of the text column. It indicates that one row
contains a single word longer than 2047 characters, which will
be ignored. That is rarely a problem.

Yours,
Laurenz Albe

Reply With Quote
  #15  
Old   
Anselmo Canfora
 
Posts: n/a

Default Re: Direct I/O - 01-29-2010 , 05:02 AM



Il 27/01/2010 16.52, Mladen Gogala ha scritto:
Quote:
On Wed, 27 Jan 2010 15:44:15 +0100, Anselmo Canfora wrote:

Contrary to what I had expected, many knowledgable PostgreSQL
developers advocate version b) and say that they experience better
performance that way.

are performance better only on write or in queries too?

I imagine that the performance would be better primarily on queries,
because of the pre-fetch.
I think in performance would be worse with a low shared_buffers value
when fetching data, I had some tests some time ago that confirmed me
that (in my particular conf, Linux 2.6, ext3 and default fs behavior)

Reply With Quote
  #16  
Old   
Anselmo Canfora
 
Posts: n/a

Default Re: Direct I/O - 01-29-2010 , 06:07 AM



Il 27/01/2010 16.56, Laurenz Albe ha scritto:
Quote:
Anselmo Canfora wrote:
I ask because there are two approaches to buffering PostgreSQL
databases in RAM:
a) make shared_buffers as large as possible and let PostgreSQL
do the buffering (this is probably what you are doing).
b) keep shared_buffers comparatively small and let the file
system buffer do the work.

Hi, usually I keep shared_buffers high (25% of RAM)
how much % of memory is it meant for comparatively small?

I don't know :^)

25% of RAM seems to be the wrong value in any case -
see the mail I quoted in my response to Mladen's posting
in the same thread.
Some people advises so:
http://www.slideshare.net/oscon2007/...e-whack-a-mole
slide 30

I am used to keep shared_buffers at 25% with good performances.
The only thing I am dissatisfied about is update performances on large
tables (say, +50.000 records).

Quote:
What impact would have a low shared_buffers value on checkpoints?

I would expect checkpoints to be faster, because there are
probably fewer dirty pages. On the downside, overall I/O performance
may be worse because dirty pages will be forced to disk more
often...
In concurrent scenarios to have a short RAM queue against disk
operations perhaps would drive to very bad performances, isn't?

Quote:
Contrary to what I had expected, many knowledgable PostgreSQL
developers advocate version b) and say that they experience
better performance that way.

are performance better only on write or in queries too?

Again, I don't know. You'd have to experiment.

Yours,
Laurenz Albe


Reply With Quote
  #17  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: Direct I/O - 01-29-2010 , 09:52 AM



Anselmo Canfora wrote:
Quote:
25% of RAM seems to be the wrong value in any case -
see the mail I quoted in my response to Mladen's posting
in the same thread.

Some people advises so:
http://www.slideshare.net/oscon2007/...e-whack-a-mole
slide 30
Immediately below that piece of advice there is a recommendation
to set work_mem to some constant, with the comment:
- but not more than RAM / no_connections

That is pretty obviously nonsense, because that limit is way too
high: if all connections use their work_mem, then all your RAM
will get exhausted.

So I am not too sure if I would trust that source.

Quote:
I am used to keep shared_buffers at 25% with good performances.
Never change a working system.

Quote:
The only thing I am dissatisfied about is update performances on large tables (say, +50.000 records).
Do you have fillfactor < 100? That can improve update performance.

Quote:
I would expect checkpoints to be faster, because there are
probably fewer dirty pages. On the downside, overall I/O performance
may be worse because dirty pages will be forced to disk more
often...

In concurrent scenarios to have a short RAM queue against disk operations perhaps would drive to very bad performances, isn't?
What is a "RAM queue"?

Yours,
Laurenz Albe

Reply With Quote
  #18  
Old   
Anselmo Canfora
 
Posts: n/a

Default Re: Direct I/O - 01-29-2010 , 10:57 AM



Il 29/01/2010 15.52, Laurenz Albe ha scritto:
Quote:
Anselmo Canfora wrote:
25% of RAM seems to be the wrong value in any case -
see the mail I quoted in my response to Mladen's posting
in the same thread.

Some people advises so:
http://www.slideshare.net/oscon2007/...e-whack-a-mole
slide 30

Immediately below that piece of advice there is a recommendation
to set work_mem to some constant, with the comment:
- but not more than RAM / no_connections
yes, that piece is inaccurate

Quote:
That is pretty obviously nonsense, because that limit is way too
high: if all connections use their work_mem, then all your RAM
will get exhausted.

So I am not too sure if I would trust that source.

I am used to keep shared_buffers at 25% with good performances.

Never change a working system.

The only thing I am dissatisfied about is update performances on large tables (say, +50.000 records).

Do you have fillfactor< 100? That can improve update performance.
do you mean for indexes? never tweaked that, how can I visualize the
fillfactor of an already existing index?

Quote:
I would expect checkpoints to be faster, because there are
probably fewer dirty pages. On the downside, overall I/O performance
may be worse because dirty pages will be forced to disk more
often...

In concurrent scenarios to have a short RAM queue against disk operations perhaps would drive to very bad performances, isn't?

What is a "RAM queue"?
generic term to denote the operations performed in RAM

Quote:
Yours,
Laurenz Albe


Reply With Quote
  #19  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: Direct I/O - 02-02-2010 , 05:17 AM



Anselmo Canfora wrote:
Quote:
The only thing I am dissatisfied about is update performances on
large tables (say, +50.000 records).

Do you have fillfactor< 100? That can improve update performance.

do you mean for indexes? never tweaked that, how can I visualize the fillfactor of an already existing index?
Thsi is already quite off-thread...

No, I meant fillfactor on tables.

The default fillfactor for a B-tree index is 90 by default anyway:
http://www.postgresql.org/docs/curre...AGE-PARAMETERS

You can find the setting in the "reloptions" column of
the "pg_class" catalog. If you *cannot* find it there, it
is set to the default.

If fillfactor on a table is less than 100, you can profit
from "heap only tuples": if there is enough free space in the
data block, the new row will be written to the same block
as the old row and any indexes will not need to be updated
(if no indexed column has changed of course) because the old
row will reference the new row.
Moreover, only one table block will be touched.

Yours,
Laurenz Albe

Reply With Quote
  #20  
Old   
Anselmo Canfora
 
Posts: n/a

Default fillfactor and updates - 02-02-2010 , 08:05 AM



Il 02/02/2010 11.17, Laurenz Albe ha scritto:
Quote:
Anselmo Canfora wrote:
The only thing I am dissatisfied about is update performances on
large tables (say, +50.000 records).

Do you have fillfactor< 100? That can improve update performance.

do you mean for indexes? never tweaked that, how can I visualize the fillfactor of an already existing index?

Thsi is already quite off-thread...
I changed object

Quote:
No, I meant fillfactor on tables.

The default fillfactor for a B-tree index is 90 by default anyway:
http://www.postgresql.org/docs/curre...AGE-PARAMETERS

You can find the setting in the "reloptions" column of
the "pg_class" catalog. If you *cannot* find it there, it
is set to the default.

If fillfactor on a table is less than 100, you can profit
from "heap only tuples": if there is enough free space in the
data block, the new row will be written to the same block
as the old row and any indexes will not need to be updated
(if no indexed column has changed of course) because the old
row will reference the new row.
Moreover, only one table block will be touched.
I ran a simple dumb test, actually it seems that lowering fillfactor
improves update speed and decrease insert speed as it shuld be in
theory, it is a matter of tradeoffs between update speed and read speed.
Perhaps for maintenance updatates on a large part of big tables as I am
thinking about it is better to drop related indexes, do the update and
then rebuild the index.

db_atm=# create table tmp.test (id int primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"test_pkey" for table "test"
CREATE TABLE
Time: 48,258 ms
db_atm=# select relname, reloptions from pg_class where relname='test';
relname | reloptions
---------+------------
test |
(1 row)

Time: 0,985 ms
db_atm=# insert into tmp.test select * from generate_series(1, 2000000);
INSERT 0 2000000
Time: 16342,381 ms
db_atm=# update tmp.test set id=id+2000000;
UPDATE 2000000
Time: 20341,573 ms
db_atm=# truncate table tmp.test;
TRUNCATE TABLE
Time: 336,947 ms
db_atm=# alter table tmp.test set (fillfactor=50);
ALTER TABLE
Time: 98,226 ms
db_atm=# reindex table tmp.test;
REINDEX
Time: 53,652 ms
db_atm=# insert into tmp.test select * from generate_series(1, 2000000);
INSERT 0 2000000
Time: 19497,379 ms
db_atm=# update tmp.test set id=id+2000000;
UPDATE 2000000
Time: 12693,774 ms
db_atm=# select relname, reloptions from pg_class where relname='test';
relname | reloptions
---------+-----------------
test | {fillfactor=50}
(1 row)

Time: 7,796 ms
db_atm=# drop table tmp.test ;
DROP TABLE
Time: 145,062 ms
db_atm=#

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.