dbTalk Databases Forums  

Constraints for R/O ./data/{base, et al.}

comp.databases.postgresql comp.databases.postgresql


Discuss Constraints for R/O ./data/{base, et al.} in the comp.databases.postgresql forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Don Y
 
Posts: n/a

Default Constraints for R/O ./data/{base, et al.} - 09-28-2011 , 10:58 AM






Hi,

Given largely static tables (etc.), how much of ./data
can I move onto R/O media? And, which types of operations
would a user then have to avoid?

Loosening constraints a little, how much more could a user
do if the R/O media was just "very slowly writable" media?

Finally, what operations *require* the fastest media updates
(i.e., "RAM")?

Thx,
--don

Reply With Quote
  #2  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Constraints for R/O ./data/{base, et al.} - 09-28-2011 , 01:51 PM






On Wed, 28 Sep 2011 08:58:31 -0700, Don Y wrote:

Quote:
Given largely static tables (etc.), how much of ./data can I move onto
R/O media?
No. Postgres doesn't support read only media.



--
http://mgogala.byethost5.com

Reply With Quote
  #3  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: Constraints for R/O ./data/{base, et al.} - 09-29-2011 , 02:37 AM



Don Y wrote:
Quote:
Given largely static tables (etc.), how much of ./data
can I move onto R/O media? And, which types of operations
would a user then have to avoid?
As has been mentioned, that is not supported.
For example, every checkpoint changes files in the data directory.

Quote:
Loosening constraints a little, how much more could a user
do if the R/O media was just "very slowly writable" media?
He or she could do everything, DML would be very slow.
The only file that is updated frequently is
$PGDATA/pg_stat_tmp/pgstat.stat, that should be the only impact
on read-only queries.

Quote:
Finally, what operations *require* the fastest media updates
(i.e., "RAM")?
I am not sure if I understand you right, maybe you can clarify
that question.
No operation requires fast media updates.
RAM disks are a bad choice for persistent data.

Yours,
Laurenz Albe

Reply With Quote
  #4  
Old   
Don Y
 
Posts: n/a

Default Re: Constraints for R/O ./data/{base, et al.} - 10-07-2011 , 12:17 AM



Hi Laurenz,

On 9/29/2011 12:37 AM, Laurenz Albe wrote:
Quote:
Don Y wrote:
Given largely static tables (etc.), how much of ./data
can I move onto R/O media? And, which types of operations
would a user then have to avoid?

As has been mentioned, that is not supported.
"Supporting" R/O media and "working (properly) in its
presence" are two different issues. E.g., one could
argue that it is folly to have tables based in non-writable
media -- but that assumes you would ALWAYS want to be able
to alter those tables!

Quote:
For example, every checkpoint changes files in the data directory.
*Every* file? I.e., the "modified" times of all files are
changed AND their contents differ from what they were just
prior to the event?

Quote:
Loosening constraints a little, how much more could a user
do if the R/O media was just "very slowly writable" media?

He or she could do everything, DML would be very slow.
The only file that is updated frequently is
$PGDATA/pg_stat_tmp/pgstat.stat, that should be the only impact
on read-only queries.
What about the *results* of those queries? Or, are they just
pushed off to VM?

Quote:
Finally, what operations *require* the fastest media updates
(i.e., "RAM")?

I am not sure if I understand you right, maybe you can clarify
that question.
No operation requires fast media updates.
Queries need to be able to store their results "somewhere".
The amount and speed of that "somewhere" determines the
effective rate at which queries can happen.

OTOH, indexes that aren't being updated could tolerate
slow writes (esp if the OS implements async writes).

Quote:
RAM disks are a bad choice for persistent data.
Think in terms of:

ROM (i.e. UNwritable after manufacture)
Flash (*slowly* writable)
Disk (more quickly writeable)
RAM (very quickly writable)

Note that you can move individual files ("how much of ./data")
to different media as their needs dictate. What I am after
is some guidance as to how often/extensively particular files
are updated in ./data and what operations *cause* those
updates (i.e., controlling those operations means I can
control those updates).

To think of it in a more conventional environment, if you had
limited local store, what would you chose to move off to an
NAS? And, what operations would that affect?

--don

Reply With Quote
  #5  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: Constraints for R/O ./data/{base, et al.} - 10-07-2011 , 03:10 AM



Don Y wrote:
Quote:
Given largely static tables (etc.), how much of ./data
can I move onto R/O media? And, which types of operations
would a user then have to avoid?

As has been mentioned, that is not supported.

"Supporting" R/O media and "working (properly) in its
presence" are two different issues. E.g., one could
argue that it is folly to have tables based in non-writable
media -- but that assumes you would ALWAYS want to be able
to alter those tables!
"Not supported" in that context means that if you have a problem
starting a PostgreSQL server where (part of) the data directory
is on a CDROM, very few people would be willing to investigate
and help you solve the problem.

Is that a theoretical question or do you really plan to move
your database to read-only media?

What is the problem behind your question?

Quote:
For example, every checkpoint changes files in the data directory.

*Every* file? I.e., the "modified" times of all files are
changed AND their contents differ from what they were just
prior to the event?
Of course not. I didn't say "all files".

I can't give you any guarantees which files will change and
which will not (I know too little). It is probably safe to
assume that user tables and indexes won't ever change if you
don't update (and if no VACUUM is run).

Quote:
Loosening constraints a little, how much more could a user
do if the R/O media was just "very slowly writable" media?

He or she could do everything, DML would be very slow.
The only file that is updated frequently is
$PGDATA/pg_stat_tmp/pgstat.stat, that should be the only impact
on read-only queries.

What about the *results* of those queries? Or, are they just
pushed off to VM?
Query results are in memory and get sent to the client.

But that brings up a good point: large sorts and the like
will use temporary files in the temporary tablespace if they
don't fit in memory. There's another example of things that
*will* be written in a read-only database.

Quote:
Queries need to be able to store their results "somewhere".
The amount and speed of that "somewhere" determines the
effective rate at which queries can happen.
The effective query speed is determined by many things,
depending on the kind of query. Often disk speed is the
limiting factor, but with big sort and hash operations
it can be RAM and CPU.

Quote:
Note that you can move individual files ("how much of ./data")
to different media as their needs dictate. What I am after
is some guidance as to how often/extensively particular files
are updated in ./data and what operations *cause* those
updates (i.e., controlling those operations means I can
control those updates).
Most files in a PostgreSQL data directory have their fixed
places, they can't be moved around.

Quote:
To think of it in a more conventional environment, if you had
limited local store, what would you chose to move off to an
NAS? And, what operations would that affect?
I would have the database directory on fast storage and
create tablespaces in slow storage. Tables and indexes that
are not used a lot (or are small and don't change much)
would go there.

Yours,
Laurenz Albe

Reply With Quote
  #6  
Old   
Don Y
 
Posts: n/a

Default Re: Constraints for R/O ./data/{base, et al.} - 10-07-2011 , 01:12 PM



Hi Laurenz,

On 10/7/2011 1:10 AM, Laurenz Albe wrote:
Quote:
Don Y wrote:
Given largely static tables (etc.), how much of ./data
can I move onto R/O media? And, which types of operations
would a user then have to avoid?

As has been mentioned, that is not supported.

"Supporting" R/O media and "working (properly) in its
presence" are two different issues. E.g., one could
argue that it is folly to have tables based in non-writable
media -- but that assumes you would ALWAYS want to be able
to alter those tables!

"Not supported" in that context means that if you have a problem
starting a PostgreSQL server where (part of) the data directory
is on a CDROM, very few people would be willing to investigate
and help you solve the problem.
Understood. I expect our port will have diverged sufficiently from
the main distribution that this sort of "support" simply wouldn't
be practical, either. It would take more effort to bring that
other party up to speed on the implementation and the toolchain
than it would cost to troubleshoot it ourselves.

Quote:
Is that a theoretical question or do you really plan to move
your database to read-only media?
The latter. Though I would qualify "database" to be "as much
as practical", given the issues I put forward in the initial
post.

Quote:
What is the problem behind your question?
Disks don't like to be dropped. Semiconductor memory has higher
operating power per bit. Cost. Size. Power consumption. etc.
If you can characterize the types of accesses (performance) and
alterations that individual parts of ./data are expected to
encounter in a particular application, then you can move those
parts to media most appropriate for those criteria.

E.g., if your tables consist of US Census data, those tables
themselves won't change for 10 years. Why store them on media
that is designed to *support* change? (Or, that can't be dropped.
Or, that dissipates lots of power. Or...)

OTOH, indexes might vary with use. Query results surely will.
Etc.

Quote:
For example, every checkpoint changes files in the data directory.

*Every* file? I.e., the "modified" times of all files are
changed AND their contents differ from what they were just
prior to the event?

Of course not. I didn't say "all files".

I can't give you any guarantees which files will change and
which will not (I know too little). It is probably safe to
assume that user tables and indexes won't ever change if you
don't update (and if no VACUUM is run).
If you've never made any changes to the tables/indexes, why
would VACUUM touch those files? (i.e., assuming the DB had been
vacuumed prior to being cast in stone)

Quote:
Loosening constraints a little, how much more could a user
do if the R/O media was just "very slowly writable" media?

He or she could do everything, DML would be very slow.
The only file that is updated frequently is
$PGDATA/pg_stat_tmp/pgstat.stat, that should be the only impact
on read-only queries.

What about the *results* of those queries? Or, are they just
pushed off to VM?

Query results are in memory and get sent to the client.
So, *big* query results just rely on the VM system?

Quote:
But that brings up a good point: large sorts and the like
will use temporary files in the temporary tablespace if they
don't fit in memory. There's another example of things that
*will* be written in a read-only database.
Yes, but temporary files can be recognized as such. I.e.,
it is relatively trivial to hook the creat(3c) call so that it
always uses writable media.

Quote:
Queries need to be able to store their results "somewhere".
The amount and speed of that "somewhere" determines the
effective rate at which queries can happen.

The effective query speed is determined by many things,
Yes, of course. My point was that queries effectively
create (temporary) results that have to be stored. Hence,
the write speed of that medium becomes a factor (as is
the *read* speed of the sources that it consults)

Quote:
depending on the kind of query. Often disk speed is the
limiting factor, but with big sort and hash operations
it can be RAM and CPU.

Note that you can move individual files ("how much of ./data")
to different media as their needs dictate. What I am after
is some guidance as to how often/extensively particular files
are updated in ./data and what operations *cause* those
updates (i.e., controlling those operations means I can
control those updates).

Most files in a PostgreSQL data directory have their fixed
places, they can't be moved around.
Understood. But, we can arrange for the underlying OS to
move them without PostgreSQL even being aware that they
reside on different media. This can be as simple as linking
PostgreSQL against a libc that has suitably modified fileio
functions that embody these mechanisms.

Quote:
To think of it in a more conventional environment, if you had
limited local store, what would you chose to move off to an
NAS? And, what operations would that affect?

I would have the database directory on fast storage and
create tablespaces in slow storage. Tables and indexes that
are not used a lot (or are small and don't change much)
would go there.
Thanks! We'll look to see if we can identify the places where
each of these actions take place and see what "rules" we can
distill from them.

--don

Reply With Quote
  #7  
Old   
Mladen Gogala
 
Posts: n/a

Default Re: Constraints for R/O ./data/{base, et al.} - 10-08-2011 , 12:48 AM



On Fri, 07 Oct 2011 11:12:04 -0700, Don Y wrote:


Quote:
Is that a theoretical question or do you really plan to move your
database to read-only media?

The latter. Though I would qualify "database" to be "as much as
practical", given the issues I put forward in the initial post.
There are databases that do support read only media. MySQL has something
called myisampack that generates compressed, read only tables. MySQL can
have read only tables on read only media, like DVD. There are also
commercial databases that can switch a table or even the entire tablespace
from read write to read only and back. Postgresql doesn't have that
option. You may try using dbi-link to read a read-only table from another
database and access it from Postgresql.
--
http://mgogala.byethost5.com

Reply With Quote
  #8  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: Constraints for R/O ./data/{base, et al.} - 10-10-2011 , 09:48 AM



Don Y wrote:
[wants to put part of the data directory on read-only storage]
Quote:
I can't give you any guarantees which files will change and
which will not (I know too little). It is probably safe to
assume that user tables and indexes won't ever change if you
don't update (and if no VACUUM is run).

If you've never made any changes to the tables/indexes, why
would VACUUM touch those files? (i.e., assuming the DB had been
vacuumed prior to being cast in stone)
I don't know the server code well enough to answer that question
without some research.

Quote:
Query results are in memory and get sent to the client.

So, *big* query results just rely on the VM system?
I *think* so. Again I don't know the relevant parts of the server
code well enough.

Quote:
But that brings up a good point: large sorts and the like
will use temporary files in the temporary tablespace if they
don't fit in memory. There's another example of things that
*will* be written in a read-only database.

Yes, but temporary files can be recognized as such. I.e.,
it is relatively trivial to hook the creat(3c) call so that it
always uses writable media.
It would be simpler than that, because these files will be
created in the temporary tablespace.

Quote:
I would have the database directory on fast storage and
create tablespaces in slow storage. Tables and indexes that
are not used a lot (or are small and don't change much)
would go there.

Thanks! We'll look to see if we can identify the places where
each of these actions take place and see what "rules" we can
distill from them.
To really get definite answers which files in the data directory
will never change, you should look for somebody who knows
PostgreSQL better than I do. I suggest asking the pgsql-hackers
mailing list.

You would probably have to read a fair bit of server code and
understand the design principles well.

Maybe a tablespace with tables and indexes on read-only storage
would work. You could easily try it out.

Yours,
Laurenz Albe

Reply With Quote
  #9  
Old   
Laurenz Albe
 
Posts: n/a

Default Re: Constraints for R/O ./data/{base, et al.} - 10-25-2011 , 05:37 AM



I wrote:
Quote:
If you've never made any changes to the tables/indexes, why
would VACUUM touch those files? (i.e., assuming the DB had been
vacuumed prior to being cast in stone)

I don't know the server code well enough to answer that question
without some research.
[...]

Quote:
To really get definite answers which files in the data directory
will never change, you should look for somebody who knows
PostgreSQL better than I do. I suggest asking the pgsql-hackers
mailing list.
I decided to increase my knowledge by playing around a little,
and it turns out that you need write permission even to *read*
a table. The file is opened with "O_RDWR | PG_BINARY", so even
SELECTing from a table whose file is read-only will give you
ERROR: could not open file "base/17046/17047": Permission denied

That's no bug, because a SELECT can cause writes (update the
hint bits).

So in order to have something like PostgreSQL tables on read-only
media, you'd definitely have to hack up PostgreSQL.

Yours,
Laurenz Albe

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.