dbTalk Databases Forums  

implementing a database log

comp.databases.theory comp.databases.theory


Discuss implementing a database log in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #201  
Old   
Brian Selzer
 
Posts: n/a

Default Re: implementing a database log - 04-29-2008 , 11:31 AM







"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Apr 28, 9:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:ae011bd0-63c3-4373-ba35-2fe9fe6be331 (AT) 26g2000hsk (DOT) googlegroups.com...





On Apr 28, 12:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:289d3678-7a00-463e-aa99-33c7944ce68b (AT) 27g2000hsf (DOT) googlegroups.com...

On Apr 24, 10:11 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:fd63f466-18f7-4986-b378-f5e9f512bbd8 (AT) r66g2000hsg (DOT) googlegroups.com...

On Apr 22, 11:39 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:265621e9-25fd-46d5-9e2d-7a4f63fa84b4 (AT) m3g2000hsc (DOT) googlegroups.com...

On Apr 22, 6:58 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"Christoph Rupp" <cruppst... (AT) gmail (DOT) com> wrote in message

news:91140c56-1f05-4b5d-b45f-b34920db2051 (AT) x41g2000hsb (DOT) googlegroups.com...

Brian,

On Apr 21, 11:00 pm, "Brian Selzer"
br... (AT) selzer-software (DOT) com
wrote:
Why not go with #4:

4. a physical log based on modified rows. Whenever a row
is
modified,
added
or removed, it is logged. Then you could also implement
row
versioning--just add a row version field to the physical
rows.
I
believe
that this what snapshot isolation is built on.

It's not an SQL database, i don't even have the notion of
"rows",
but
basically i think your #4 is the same as my #1 or #2.

No, it isn't. #1 requires the logging of additional records
that
may
not
have been affected by an update. #2 doesn't log the entire
changed
record,
but only bits and pieces. I would think that limiting the
units
of
change
to individual records--entire records--would simplify the
process
of
marking
and isolating units of work while at the same time
guaranteeing
consistency.

I don't think an atomic unit of work is always associated with
a
change to an individual record. Are you suggesting
transactions
to
define arbitrarily large units of work aren't needed?

No, that's not what I'm suggesting. What I'm suggesting is that
the
atomic
unit of work should be a /set/ of /records/--either the old
records
in
the
case of a before image or the new records in the case of an
after
image.

Ok but that sounds like the system snapshots an entire table in
the
before/after images.

For efficiency one would expect to only store the set of records
that
have been added and the set that have been removed by a given
transaction. It is easy to see how we get the inverse which is
required for roll back of an uncommitted transaction during
recovery.
However these logical operations aren't idempotent (at least for
bags
of records). How does recovery deal with non-idempotent redo()/
undo() changes in the log?

Why would there be bags of records? At the physical level, each
record
has
a specific offset in some file, and that offset would uniquely
identify
it.
Why would you strip off that identification? Consequently, there
wouldn't
be bags of records, only sets of records.

One could say the log records are representing physical changes if
they record the physical locations.

If you start out with a set of records and you know what is to be
inserted,
updated, and deleted, you can compute the resulting set of records.
If
you
start out with the resulting set of records, and you know what was
inserted,
updated and deleted in order to arrive at that result, then you can
compute
the original set of records. For simplicity, even if it isn't
necessarily
the most efficient, what is updated could be implemented as a set
of
ordered
pairs of records, tying each original record to its replacement.
So
the
log
would consist of a sequence of triples (D, U, I) separated by
transaction
markers where D is a set of records that were deleted, U is a set
of
pairs
of records that were updated, and I is a set of records that were
inserted.

Now, provided that the log is written before the database--that is,
(1)
write the triple to the log, (2) write the database, (3) write the
transaction marker in the log--, it should be possible to determine
whether
or not what was written to the log actually made it into the
database,
and
thus it should be possible to roll back any uncommitted
transaction.

A modern DBMS using WAL will tend to use a "lazy writer" that writes
dirty pages to disk in the background. For performance this will
tend to write dirty pages in a physical order so the disk head moves
uniformly over the platter. This assumes the only constraint
between
the writing to the log and the dirty pages is the WAL constraint -
ie
dirty pages must be written to disk strictly after the associated
log
records have been written to disk. To meet this constraint the lazy
writer simply needs to ignore dirty pages that haven't (yet) been
logged.

Your suggestion brings far more onerous constraints on the disk
writing policies.

No it doesn't. It only requires that the data be written to disk
before
the
transaction marker is written to the log. Whether the disk writes are
queued up to be written lazily is a separate issue.

When is the transaction deemed to have committed? I was assuming your
transaction marker was used to commit the transaction. If not what is
it for? Check pointing? I think there are better ways to check
point than that. AFAIK there is no need to check point *individual*
transactions with special markers.

The transaction marker is used to indicate that the transaction is
complete.
What that means is that the changes that are called for in the triple
have
been written to the database. It's not enough to simply write the log
and
then write the database, because without the marker following the triple
in
the log, there is no way to be sure that the changes called for in the
triple actually made it into the database.

Ok so it relates to check pointing. I don't see how these markers are
particularly useful - ie to determine where to start replaying the log
during recovery. For example, when ARIES performs a check point it
records
1) the active transaction list
2) last LSN (Log Sequence Number) for each active transaction
3) the set of dirty pages.
4) for each dirty page, the LSN of the earliest log
record whose effect is not reflected in the page on disk.

This allows the recovery scan to determine where to start replaying
the log (in the REDO pass).

That's not what the markers are for. In fact, during recovery only the last
marker really matters.

Quote:
In fact to make a transaction durable it seems
necessary to write all the dirty pages to disk as part of the
transaction. This is certainly not normally the case in a modern
DBMS. A database that avoids a "force" policy requires REDO during
recovery.

Often a DBMS that supports long running transactions will allow
dirty
pages to be written to disk even though the associated transaction
hasn't yet committed. This is the so called "steal" policy and
leads
to the need for UNDO during recovery.

The only way this can happen is if what was replaced as a result of
the
writes is cached while the transactions are outstanding. In order to
improve performance in the presence of long-running transactions, it
may
be
necessary to maintain that cache on disk. I should emphasize here
that
such
caching is strictly a mechanism for improving performance.

With a general purpose DBMS using strict 2PL there is inevitably a
chance of dead-lock and the need to abort transactions. This in turn
requires UNDO of all changes made by a particular transaction. You
can either record before images of pages of uncommitted transactions,
or else use the log directly to roll back a transaction. The latter
is economical with caching of the tail of the log and backward
chaining of log records for a given transaction.

Ideally, the dbms engine would serialize physical transactions. The
before
images of uncommitted transactions could be read by other transactions
instead of the database during writes to the database. Since what is
written to the log is just the changes required by a particular
transaction,
it should be possible to cache the changes required by other transactions
while each is committed in turn. Note that by 'serialize physical
transactions' I'm not referring to transaction isolation levels, which
involve how information is read, but instead how information is written.

Is this for MVCC?

Not exactly. While it should be possible to have several outstanding
transactions, it makes sense to me to allow only one to commit (at least
physically) at a time. If you have two transactions, they can either
(1) be completely independent of each other so that it shouldn't matter
which transaction commits first, or
(2) interact with each other so that the order in which they commit can
result in different database states.

It should be possible to extend this to involve any number of outstanding
transactions so that a sequence of writes to the log and the database can be
determined.




Reply With Quote
  #202  
Old   
Brian Selzer
 
Posts: n/a

Default Re: implementing a database log - 04-29-2008 , 11:31 AM







"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Apr 28, 9:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:ae011bd0-63c3-4373-ba35-2fe9fe6be331 (AT) 26g2000hsk (DOT) googlegroups.com...





On Apr 28, 12:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:289d3678-7a00-463e-aa99-33c7944ce68b (AT) 27g2000hsf (DOT) googlegroups.com...

On Apr 24, 10:11 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:fd63f466-18f7-4986-b378-f5e9f512bbd8 (AT) r66g2000hsg (DOT) googlegroups.com...

On Apr 22, 11:39 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:265621e9-25fd-46d5-9e2d-7a4f63fa84b4 (AT) m3g2000hsc (DOT) googlegroups.com...

On Apr 22, 6:58 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"Christoph Rupp" <cruppst... (AT) gmail (DOT) com> wrote in message

news:91140c56-1f05-4b5d-b45f-b34920db2051 (AT) x41g2000hsb (DOT) googlegroups.com...

Brian,

On Apr 21, 11:00 pm, "Brian Selzer"
br... (AT) selzer-software (DOT) com
wrote:
Why not go with #4:

4. a physical log based on modified rows. Whenever a row
is
modified,
added
or removed, it is logged. Then you could also implement
row
versioning--just add a row version field to the physical
rows.
I
believe
that this what snapshot isolation is built on.

It's not an SQL database, i don't even have the notion of
"rows",
but
basically i think your #4 is the same as my #1 or #2.

No, it isn't. #1 requires the logging of additional records
that
may
not
have been affected by an update. #2 doesn't log the entire
changed
record,
but only bits and pieces. I would think that limiting the
units
of
change
to individual records--entire records--would simplify the
process
of
marking
and isolating units of work while at the same time
guaranteeing
consistency.

I don't think an atomic unit of work is always associated with
a
change to an individual record. Are you suggesting
transactions
to
define arbitrarily large units of work aren't needed?

No, that's not what I'm suggesting. What I'm suggesting is that
the
atomic
unit of work should be a /set/ of /records/--either the old
records
in
the
case of a before image or the new records in the case of an
after
image.

Ok but that sounds like the system snapshots an entire table in
the
before/after images.

For efficiency one would expect to only store the set of records
that
have been added and the set that have been removed by a given
transaction. It is easy to see how we get the inverse which is
required for roll back of an uncommitted transaction during
recovery.
However these logical operations aren't idempotent (at least for
bags
of records). How does recovery deal with non-idempotent redo()/
undo() changes in the log?

Why would there be bags of records? At the physical level, each
record
has
a specific offset in some file, and that offset would uniquely
identify
it.
Why would you strip off that identification? Consequently, there
wouldn't
be bags of records, only sets of records.

One could say the log records are representing physical changes if
they record the physical locations.

If you start out with a set of records and you know what is to be
inserted,
updated, and deleted, you can compute the resulting set of records.
If
you
start out with the resulting set of records, and you know what was
inserted,
updated and deleted in order to arrive at that result, then you can
compute
the original set of records. For simplicity, even if it isn't
necessarily
the most efficient, what is updated could be implemented as a set
of
ordered
pairs of records, tying each original record to its replacement.
So
the
log
would consist of a sequence of triples (D, U, I) separated by
transaction
markers where D is a set of records that were deleted, U is a set
of
pairs
of records that were updated, and I is a set of records that were
inserted.

Now, provided that the log is written before the database--that is,
(1)
write the triple to the log, (2) write the database, (3) write the
transaction marker in the log--, it should be possible to determine
whether
or not what was written to the log actually made it into the
database,
and
thus it should be possible to roll back any uncommitted
transaction.

A modern DBMS using WAL will tend to use a "lazy writer" that writes
dirty pages to disk in the background. For performance this will
tend to write dirty pages in a physical order so the disk head moves
uniformly over the platter. This assumes the only constraint
between
the writing to the log and the dirty pages is the WAL constraint -
ie
dirty pages must be written to disk strictly after the associated
log
records have been written to disk. To meet this constraint the lazy
writer simply needs to ignore dirty pages that haven't (yet) been
logged.

Your suggestion brings far more onerous constraints on the disk
writing policies.

No it doesn't. It only requires that the data be written to disk
before
the
transaction marker is written to the log. Whether the disk writes are
queued up to be written lazily is a separate issue.

When is the transaction deemed to have committed? I was assuming your
transaction marker was used to commit the transaction. If not what is
it for? Check pointing? I think there are better ways to check
point than that. AFAIK there is no need to check point *individual*
transactions with special markers.

The transaction marker is used to indicate that the transaction is
complete.
What that means is that the changes that are called for in the triple
have
been written to the database. It's not enough to simply write the log
and
then write the database, because without the marker following the triple
in
the log, there is no way to be sure that the changes called for in the
triple actually made it into the database.

Ok so it relates to check pointing. I don't see how these markers are
particularly useful - ie to determine where to start replaying the log
during recovery. For example, when ARIES performs a check point it
records
1) the active transaction list
2) last LSN (Log Sequence Number) for each active transaction
3) the set of dirty pages.
4) for each dirty page, the LSN of the earliest log
record whose effect is not reflected in the page on disk.

This allows the recovery scan to determine where to start replaying
the log (in the REDO pass).

That's not what the markers are for. In fact, during recovery only the last
marker really matters.

Quote:
In fact to make a transaction durable it seems
necessary to write all the dirty pages to disk as part of the
transaction. This is certainly not normally the case in a modern
DBMS. A database that avoids a "force" policy requires REDO during
recovery.

Often a DBMS that supports long running transactions will allow
dirty
pages to be written to disk even though the associated transaction
hasn't yet committed. This is the so called "steal" policy and
leads
to the need for UNDO during recovery.

The only way this can happen is if what was replaced as a result of
the
writes is cached while the transactions are outstanding. In order to
improve performance in the presence of long-running transactions, it
may
be
necessary to maintain that cache on disk. I should emphasize here
that
such
caching is strictly a mechanism for improving performance.

With a general purpose DBMS using strict 2PL there is inevitably a
chance of dead-lock and the need to abort transactions. This in turn
requires UNDO of all changes made by a particular transaction. You
can either record before images of pages of uncommitted transactions,
or else use the log directly to roll back a transaction. The latter
is economical with caching of the tail of the log and backward
chaining of log records for a given transaction.

Ideally, the dbms engine would serialize physical transactions. The
before
images of uncommitted transactions could be read by other transactions
instead of the database during writes to the database. Since what is
written to the log is just the changes required by a particular
transaction,
it should be possible to cache the changes required by other transactions
while each is committed in turn. Note that by 'serialize physical
transactions' I'm not referring to transaction isolation levels, which
involve how information is read, but instead how information is written.

Is this for MVCC?

Not exactly. While it should be possible to have several outstanding
transactions, it makes sense to me to allow only one to commit (at least
physically) at a time. If you have two transactions, they can either
(1) be completely independent of each other so that it shouldn't matter
which transaction commits first, or
(2) interact with each other so that the order in which they commit can
result in different database states.

It should be possible to extend this to involve any number of outstanding
transactions so that a sequence of writes to the log and the database can be
determined.




Reply With Quote
  #203  
Old   
Brian Selzer
 
Posts: n/a

Default Re: implementing a database log - 04-29-2008 , 11:31 AM




"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Apr 28, 9:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:ae011bd0-63c3-4373-ba35-2fe9fe6be331 (AT) 26g2000hsk (DOT) googlegroups.com...





On Apr 28, 12:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:289d3678-7a00-463e-aa99-33c7944ce68b (AT) 27g2000hsf (DOT) googlegroups.com...

On Apr 24, 10:11 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:fd63f466-18f7-4986-b378-f5e9f512bbd8 (AT) r66g2000hsg (DOT) googlegroups.com...

On Apr 22, 11:39 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:265621e9-25fd-46d5-9e2d-7a4f63fa84b4 (AT) m3g2000hsc (DOT) googlegroups.com...

On Apr 22, 6:58 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"Christoph Rupp" <cruppst... (AT) gmail (DOT) com> wrote in message

news:91140c56-1f05-4b5d-b45f-b34920db2051 (AT) x41g2000hsb (DOT) googlegroups.com...

Brian,

On Apr 21, 11:00 pm, "Brian Selzer"
br... (AT) selzer-software (DOT) com
wrote:
Why not go with #4:

4. a physical log based on modified rows. Whenever a row
is
modified,
added
or removed, it is logged. Then you could also implement
row
versioning--just add a row version field to the physical
rows.
I
believe
that this what snapshot isolation is built on.

It's not an SQL database, i don't even have the notion of
"rows",
but
basically i think your #4 is the same as my #1 or #2.

No, it isn't. #1 requires the logging of additional records
that
may
not
have been affected by an update. #2 doesn't log the entire
changed
record,
but only bits and pieces. I would think that limiting the
units
of
change
to individual records--entire records--would simplify the
process
of
marking
and isolating units of work while at the same time
guaranteeing
consistency.

I don't think an atomic unit of work is always associated with
a
change to an individual record. Are you suggesting
transactions
to
define arbitrarily large units of work aren't needed?

No, that's not what I'm suggesting. What I'm suggesting is that
the
atomic
unit of work should be a /set/ of /records/--either the old
records
in
the
case of a before image or the new records in the case of an
after
image.

Ok but that sounds like the system snapshots an entire table in
the
before/after images.

For efficiency one would expect to only store the set of records
that
have been added and the set that have been removed by a given
transaction. It is easy to see how we get the inverse which is
required for roll back of an uncommitted transaction during
recovery.
However these logical operations aren't idempotent (at least for
bags
of records). How does recovery deal with non-idempotent redo()/
undo() changes in the log?

Why would there be bags of records? At the physical level, each
record
has
a specific offset in some file, and that offset would uniquely
identify
it.
Why would you strip off that identification? Consequently, there
wouldn't
be bags of records, only sets of records.

One could say the log records are representing physical changes if
they record the physical locations.

If you start out with a set of records and you know what is to be
inserted,
updated, and deleted, you can compute the resulting set of records.
If
you
start out with the resulting set of records, and you know what was
inserted,
updated and deleted in order to arrive at that result, then you can
compute
the original set of records. For simplicity, even if it isn't
necessarily
the most efficient, what is updated could be implemented as a set
of
ordered
pairs of records, tying each original record to its replacement.
So
the
log
would consist of a sequence of triples (D, U, I) separated by
transaction
markers where D is a set of records that were deleted, U is a set
of
pairs
of records that were updated, and I is a set of records that were
inserted.

Now, provided that the log is written before the database--that is,
(1)
write the triple to the log, (2) write the database, (3) write the
transaction marker in the log--, it should be possible to determine
whether
or not what was written to the log actually made it into the
database,
and
thus it should be possible to roll back any uncommitted
transaction.

A modern DBMS using WAL will tend to use a "lazy writer" that writes
dirty pages to disk in the background. For performance this will
tend to write dirty pages in a physical order so the disk head moves
uniformly over the platter. This assumes the only constraint
between
the writing to the log and the dirty pages is the WAL constraint -
ie
dirty pages must be written to disk strictly after the associated
log
records have been written to disk. To meet this constraint the lazy
writer simply needs to ignore dirty pages that haven't (yet) been
logged.

Your suggestion brings far more onerous constraints on the disk
writing policies.

No it doesn't. It only requires that the data be written to disk
before
the
transaction marker is written to the log. Whether the disk writes are
queued up to be written lazily is a separate issue.

When is the transaction deemed to have committed? I was assuming your
transaction marker was used to commit the transaction. If not what is
it for? Check pointing? I think there are better ways to check
point than that. AFAIK there is no need to check point *individual*
transactions with special markers.

The transaction marker is used to indicate that the transaction is
complete.
What that means is that the changes that are called for in the triple
have
been written to the database. It's not enough to simply write the log
and
then write the database, because without the marker following the triple
in
the log, there is no way to be sure that the changes called for in the
triple actually made it into the database.

Ok so it relates to check pointing. I don't see how these markers are
particularly useful - ie to determine where to start replaying the log
during recovery. For example, when ARIES performs a check point it
records
1) the active transaction list
2) last LSN (Log Sequence Number) for each active transaction
3) the set of dirty pages.
4) for each dirty page, the LSN of the earliest log
record whose effect is not reflected in the page on disk.

This allows the recovery scan to determine where to start replaying
the log (in the REDO pass).

That's not what the markers are for. In fact, during recovery only the last
marker really matters.

Quote:
In fact to make a transaction durable it seems
necessary to write all the dirty pages to disk as part of the
transaction. This is certainly not normally the case in a modern
DBMS. A database that avoids a "force" policy requires REDO during
recovery.

Often a DBMS that supports long running transactions will allow
dirty
pages to be written to disk even though the associated transaction
hasn't yet committed. This is the so called "steal" policy and
leads
to the need for UNDO during recovery.

The only way this can happen is if what was replaced as a result of
the
writes is cached while the transactions are outstanding. In order to
improve performance in the presence of long-running transactions, it
may
be
necessary to maintain that cache on disk. I should emphasize here
that
such
caching is strictly a mechanism for improving performance.

With a general purpose DBMS using strict 2PL there is inevitably a
chance of dead-lock and the need to abort transactions. This in turn
requires UNDO of all changes made by a particular transaction. You
can either record before images of pages of uncommitted transactions,
or else use the log directly to roll back a transaction. The latter
is economical with caching of the tail of the log and backward
chaining of log records for a given transaction.

Ideally, the dbms engine would serialize physical transactions. The
before
images of uncommitted transactions could be read by other transactions
instead of the database during writes to the database. Since what is
written to the log is just the changes required by a particular
transaction,
it should be possible to cache the changes required by other transactions
while each is committed in turn. Note that by 'serialize physical
transactions' I'm not referring to transaction isolation levels, which
involve how information is read, but instead how information is written.

Is this for MVCC?

Not exactly. While it should be possible to have several outstanding
transactions, it makes sense to me to allow only one to commit (at least
physically) at a time. If you have two transactions, they can either
(1) be completely independent of each other so that it shouldn't matter
which transaction commits first, or
(2) interact with each other so that the order in which they commit can
result in different database states.

It should be possible to extend this to involve any number of outstanding
transactions so that a sequence of writes to the log and the database can be
determined.




Reply With Quote
  #204  
Old   
Brian Selzer
 
Posts: n/a

Default Re: implementing a database log - 04-29-2008 , 11:31 AM




"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Apr 28, 9:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:ae011bd0-63c3-4373-ba35-2fe9fe6be331 (AT) 26g2000hsk (DOT) googlegroups.com...





On Apr 28, 12:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:289d3678-7a00-463e-aa99-33c7944ce68b (AT) 27g2000hsf (DOT) googlegroups.com...

On Apr 24, 10:11 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:fd63f466-18f7-4986-b378-f5e9f512bbd8 (AT) r66g2000hsg (DOT) googlegroups.com...

On Apr 22, 11:39 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:265621e9-25fd-46d5-9e2d-7a4f63fa84b4 (AT) m3g2000hsc (DOT) googlegroups.com...

On Apr 22, 6:58 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"Christoph Rupp" <cruppst... (AT) gmail (DOT) com> wrote in message

news:91140c56-1f05-4b5d-b45f-b34920db2051 (AT) x41g2000hsb (DOT) googlegroups.com...

Brian,

On Apr 21, 11:00 pm, "Brian Selzer"
br... (AT) selzer-software (DOT) com
wrote:
Why not go with #4:

4. a physical log based on modified rows. Whenever a row
is
modified,
added
or removed, it is logged. Then you could also implement
row
versioning--just add a row version field to the physical
rows.
I
believe
that this what snapshot isolation is built on.

It's not an SQL database, i don't even have the notion of
"rows",
but
basically i think your #4 is the same as my #1 or #2.

No, it isn't. #1 requires the logging of additional records
that
may
not
have been affected by an update. #2 doesn't log the entire
changed
record,
but only bits and pieces. I would think that limiting the
units
of
change
to individual records--entire records--would simplify the
process
of
marking
and isolating units of work while at the same time
guaranteeing
consistency.

I don't think an atomic unit of work is always associated with
a
change to an individual record. Are you suggesting
transactions
to
define arbitrarily large units of work aren't needed?

No, that's not what I'm suggesting. What I'm suggesting is that
the
atomic
unit of work should be a /set/ of /records/--either the old
records
in
the
case of a before image or the new records in the case of an
after
image.

Ok but that sounds like the system snapshots an entire table in
the
before/after images.

For efficiency one would expect to only store the set of records
that
have been added and the set that have been removed by a given
transaction. It is easy to see how we get the inverse which is
required for roll back of an uncommitted transaction during
recovery.
However these logical operations aren't idempotent (at least for
bags
of records). How does recovery deal with non-idempotent redo()/
undo() changes in the log?

Why would there be bags of records? At the physical level, each
record
has
a specific offset in some file, and that offset would uniquely
identify
it.
Why would you strip off that identification? Consequently, there
wouldn't
be bags of records, only sets of records.

One could say the log records are representing physical changes if
they record the physical locations.

If you start out with a set of records and you know what is to be
inserted,
updated, and deleted, you can compute the resulting set of records.
If
you
start out with the resulting set of records, and you know what was
inserted,
updated and deleted in order to arrive at that result, then you can
compute
the original set of records. For simplicity, even if it isn't
necessarily
the most efficient, what is updated could be implemented as a set
of
ordered
pairs of records, tying each original record to its replacement.
So
the
log
would consist of a sequence of triples (D, U, I) separated by
transaction
markers where D is a set of records that were deleted, U is a set
of
pairs
of records that were updated, and I is a set of records that were
inserted.

Now, provided that the log is written before the database--that is,
(1)
write the triple to the log, (2) write the database, (3) write the
transaction marker in the log--, it should be possible to determine
whether
or not what was written to the log actually made it into the
database,
and
thus it should be possible to roll back any uncommitted
transaction.

A modern DBMS using WAL will tend to use a "lazy writer" that writes
dirty pages to disk in the background. For performance this will
tend to write dirty pages in a physical order so the disk head moves
uniformly over the platter. This assumes the only constraint
between
the writing to the log and the dirty pages is the WAL constraint -
ie
dirty pages must be written to disk strictly after the associated
log
records have been written to disk. To meet this constraint the lazy
writer simply needs to ignore dirty pages that haven't (yet) been
logged.

Your suggestion brings far more onerous constraints on the disk
writing policies.

No it doesn't. It only requires that the data be written to disk
before
the
transaction marker is written to the log. Whether the disk writes are
queued up to be written lazily is a separate issue.

When is the transaction deemed to have committed? I was assuming your
transaction marker was used to commit the transaction. If not what is
it for? Check pointing? I think there are better ways to check
point than that. AFAIK there is no need to check point *individual*
transactions with special markers.

The transaction marker is used to indicate that the transaction is
complete.
What that means is that the changes that are called for in the triple
have
been written to the database. It's not enough to simply write the log
and
then write the database, because without the marker following the triple
in
the log, there is no way to be sure that the changes called for in the
triple actually made it into the database.

Ok so it relates to check pointing. I don't see how these markers are
particularly useful - ie to determine where to start replaying the log
during recovery. For example, when ARIES performs a check point it
records
1) the active transaction list
2) last LSN (Log Sequence Number) for each active transaction
3) the set of dirty pages.
4) for each dirty page, the LSN of the earliest log
record whose effect is not reflected in the page on disk.

This allows the recovery scan to determine where to start replaying
the log (in the REDO pass).

That's not what the markers are for. In fact, during recovery only the last
marker really matters.

Quote:
In fact to make a transaction durable it seems
necessary to write all the dirty pages to disk as part of the
transaction. This is certainly not normally the case in a modern
DBMS. A database that avoids a "force" policy requires REDO during
recovery.

Often a DBMS that supports long running transactions will allow
dirty
pages to be written to disk even though the associated transaction
hasn't yet committed. This is the so called "steal" policy and
leads
to the need for UNDO during recovery.

The only way this can happen is if what was replaced as a result of
the
writes is cached while the transactions are outstanding. In order to
improve performance in the presence of long-running transactions, it
may
be
necessary to maintain that cache on disk. I should emphasize here
that
such
caching is strictly a mechanism for improving performance.

With a general purpose DBMS using strict 2PL there is inevitably a
chance of dead-lock and the need to abort transactions. This in turn
requires UNDO of all changes made by a particular transaction. You
can either record before images of pages of uncommitted transactions,
or else use the log directly to roll back a transaction. The latter
is economical with caching of the tail of the log and backward
chaining of log records for a given transaction.

Ideally, the dbms engine would serialize physical transactions. The
before
images of uncommitted transactions could be read by other transactions
instead of the database during writes to the database. Since what is
written to the log is just the changes required by a particular
transaction,
it should be possible to cache the changes required by other transactions
while each is committed in turn. Note that by 'serialize physical
transactions' I'm not referring to transaction isolation levels, which
involve how information is read, but instead how information is written.

Is this for MVCC?

Not exactly. While it should be possible to have several outstanding
transactions, it makes sense to me to allow only one to commit (at least
physically) at a time. If you have two transactions, they can either
(1) be completely independent of each other so that it shouldn't matter
which transaction commits first, or
(2) interact with each other so that the order in which they commit can
result in different database states.

It should be possible to extend this to involve any number of outstanding
transactions so that a sequence of writes to the log and the database can be
determined.




Reply With Quote
  #205  
Old   
Brian Selzer
 
Posts: n/a

Default Re: implementing a database log - 04-29-2008 , 11:31 AM




"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Apr 28, 9:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:ae011bd0-63c3-4373-ba35-2fe9fe6be331 (AT) 26g2000hsk (DOT) googlegroups.com...





On Apr 28, 12:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:289d3678-7a00-463e-aa99-33c7944ce68b (AT) 27g2000hsf (DOT) googlegroups.com...

On Apr 24, 10:11 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:fd63f466-18f7-4986-b378-f5e9f512bbd8 (AT) r66g2000hsg (DOT) googlegroups.com...

On Apr 22, 11:39 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:265621e9-25fd-46d5-9e2d-7a4f63fa84b4 (AT) m3g2000hsc (DOT) googlegroups.com...

On Apr 22, 6:58 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"Christoph Rupp" <cruppst... (AT) gmail (DOT) com> wrote in message

news:91140c56-1f05-4b5d-b45f-b34920db2051 (AT) x41g2000hsb (DOT) googlegroups.com...

Brian,

On Apr 21, 11:00 pm, "Brian Selzer"
br... (AT) selzer-software (DOT) com
wrote:
Why not go with #4:

4. a physical log based on modified rows. Whenever a row
is
modified,
added
or removed, it is logged. Then you could also implement
row
versioning--just add a row version field to the physical
rows.
I
believe
that this what snapshot isolation is built on.

It's not an SQL database, i don't even have the notion of
"rows",
but
basically i think your #4 is the same as my #1 or #2.

No, it isn't. #1 requires the logging of additional records
that
may
not
have been affected by an update. #2 doesn't log the entire
changed
record,
but only bits and pieces. I would think that limiting the
units
of
change
to individual records--entire records--would simplify the
process
of
marking
and isolating units of work while at the same time
guaranteeing
consistency.

I don't think an atomic unit of work is always associated with
a
change to an individual record. Are you suggesting
transactions
to
define arbitrarily large units of work aren't needed?

No, that's not what I'm suggesting. What I'm suggesting is that
the
atomic
unit of work should be a /set/ of /records/--either the old
records
in
the
case of a before image or the new records in the case of an
after
image.

Ok but that sounds like the system snapshots an entire table in
the
before/after images.

For efficiency one would expect to only store the set of records
that
have been added and the set that have been removed by a given
transaction. It is easy to see how we get the inverse which is
required for roll back of an uncommitted transaction during
recovery.
However these logical operations aren't idempotent (at least for
bags
of records). How does recovery deal with non-idempotent redo()/
undo() changes in the log?

Why would there be bags of records? At the physical level, each
record
has
a specific offset in some file, and that offset would uniquely
identify
it.
Why would you strip off that identification? Consequently, there
wouldn't
be bags of records, only sets of records.

One could say the log records are representing physical changes if
they record the physical locations.

If you start out with a set of records and you know what is to be
inserted,
updated, and deleted, you can compute the resulting set of records.
If
you
start out with the resulting set of records, and you know what was
inserted,
updated and deleted in order to arrive at that result, then you can
compute
the original set of records. For simplicity, even if it isn't
necessarily
the most efficient, what is updated could be implemented as a set
of
ordered
pairs of records, tying each original record to its replacement.
So
the
log
would consist of a sequence of triples (D, U, I) separated by
transaction
markers where D is a set of records that were deleted, U is a set
of
pairs
of records that were updated, and I is a set of records that were
inserted.

Now, provided that the log is written before the database--that is,
(1)
write the triple to the log, (2) write the database, (3) write the
transaction marker in the log--, it should be possible to determine
whether
or not what was written to the log actually made it into the
database,
and
thus it should be possible to roll back any uncommitted
transaction.

A modern DBMS using WAL will tend to use a "lazy writer" that writes
dirty pages to disk in the background. For performance this will
tend to write dirty pages in a physical order so the disk head moves
uniformly over the platter. This assumes the only constraint
between
the writing to the log and the dirty pages is the WAL constraint -
ie
dirty pages must be written to disk strictly after the associated
log
records have been written to disk. To meet this constraint the lazy
writer simply needs to ignore dirty pages that haven't (yet) been
logged.

Your suggestion brings far more onerous constraints on the disk
writing policies.

No it doesn't. It only requires that the data be written to disk
before
the
transaction marker is written to the log. Whether the disk writes are
queued up to be written lazily is a separate issue.

When is the transaction deemed to have committed? I was assuming your
transaction marker was used to commit the transaction. If not what is
it for? Check pointing? I think there are better ways to check
point than that. AFAIK there is no need to check point *individual*
transactions with special markers.

The transaction marker is used to indicate that the transaction is
complete.
What that means is that the changes that are called for in the triple
have
been written to the database. It's not enough to simply write the log
and
then write the database, because without the marker following the triple
in
the log, there is no way to be sure that the changes called for in the
triple actually made it into the database.

Ok so it relates to check pointing. I don't see how these markers are
particularly useful - ie to determine where to start replaying the log
during recovery. For example, when ARIES performs a check point it
records
1) the active transaction list
2) last LSN (Log Sequence Number) for each active transaction
3) the set of dirty pages.
4) for each dirty page, the LSN of the earliest log
record whose effect is not reflected in the page on disk.

This allows the recovery scan to determine where to start replaying
the log (in the REDO pass).

That's not what the markers are for. In fact, during recovery only the last
marker really matters.

Quote:
In fact to make a transaction durable it seems
necessary to write all the dirty pages to disk as part of the
transaction. This is certainly not normally the case in a modern
DBMS. A database that avoids a "force" policy requires REDO during
recovery.

Often a DBMS that supports long running transactions will allow
dirty
pages to be written to disk even though the associated transaction
hasn't yet committed. This is the so called "steal" policy and
leads
to the need for UNDO during recovery.

The only way this can happen is if what was replaced as a result of
the
writes is cached while the transactions are outstanding. In order to
improve performance in the presence of long-running transactions, it
may
be
necessary to maintain that cache on disk. I should emphasize here
that
such
caching is strictly a mechanism for improving performance.

With a general purpose DBMS using strict 2PL there is inevitably a
chance of dead-lock and the need to abort transactions. This in turn
requires UNDO of all changes made by a particular transaction. You
can either record before images of pages of uncommitted transactions,
or else use the log directly to roll back a transaction. The latter
is economical with caching of the tail of the log and backward
chaining of log records for a given transaction.

Ideally, the dbms engine would serialize physical transactions. The
before
images of uncommitted transactions could be read by other transactions
instead of the database during writes to the database. Since what is
written to the log is just the changes required by a particular
transaction,
it should be possible to cache the changes required by other transactions
while each is committed in turn. Note that by 'serialize physical
transactions' I'm not referring to transaction isolation levels, which
involve how information is read, but instead how information is written.

Is this for MVCC?

Not exactly. While it should be possible to have several outstanding
transactions, it makes sense to me to allow only one to commit (at least
physically) at a time. If you have two transactions, they can either
(1) be completely independent of each other so that it shouldn't matter
which transaction commits first, or
(2) interact with each other so that the order in which they commit can
result in different database states.

It should be possible to extend this to involve any number of outstanding
transactions so that a sequence of writes to the log and the database can be
determined.




Reply With Quote
  #206  
Old   
Brian Selzer
 
Posts: n/a

Default Re: implementing a database log - 04-29-2008 , 11:31 AM




"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Apr 28, 9:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:ae011bd0-63c3-4373-ba35-2fe9fe6be331 (AT) 26g2000hsk (DOT) googlegroups.com...





On Apr 28, 12:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:289d3678-7a00-463e-aa99-33c7944ce68b (AT) 27g2000hsf (DOT) googlegroups.com...

On Apr 24, 10:11 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:fd63f466-18f7-4986-b378-f5e9f512bbd8 (AT) r66g2000hsg (DOT) googlegroups.com...

On Apr 22, 11:39 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:265621e9-25fd-46d5-9e2d-7a4f63fa84b4 (AT) m3g2000hsc (DOT) googlegroups.com...

On Apr 22, 6:58 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"Christoph Rupp" <cruppst... (AT) gmail (DOT) com> wrote in message

news:91140c56-1f05-4b5d-b45f-b34920db2051 (AT) x41g2000hsb (DOT) googlegroups.com...

Brian,

On Apr 21, 11:00 pm, "Brian Selzer"
br... (AT) selzer-software (DOT) com
wrote:
Why not go with #4:

4. a physical log based on modified rows. Whenever a row
is
modified,
added
or removed, it is logged. Then you could also implement
row
versioning--just add a row version field to the physical
rows.
I
believe
that this what snapshot isolation is built on.

It's not an SQL database, i don't even have the notion of
"rows",
but
basically i think your #4 is the same as my #1 or #2.

No, it isn't. #1 requires the logging of additional records
that
may
not
have been affected by an update. #2 doesn't log the entire
changed
record,
but only bits and pieces. I would think that limiting the
units
of
change
to individual records--entire records--would simplify the
process
of
marking
and isolating units of work while at the same time
guaranteeing
consistency.

I don't think an atomic unit of work is always associated with
a
change to an individual record. Are you suggesting
transactions
to
define arbitrarily large units of work aren't needed?

No, that's not what I'm suggesting. What I'm suggesting is that
the
atomic
unit of work should be a /set/ of /records/--either the old
records
in
the
case of a before image or the new records in the case of an
after
image.

Ok but that sounds like the system snapshots an entire table in
the
before/after images.

For efficiency one would expect to only store the set of records
that
have been added and the set that have been removed by a given
transaction. It is easy to see how we get the inverse which is
required for roll back of an uncommitted transaction during
recovery.
However these logical operations aren't idempotent (at least for
bags
of records). How does recovery deal with non-idempotent redo()/
undo() changes in the log?

Why would there be bags of records? At the physical level, each
record
has
a specific offset in some file, and that offset would uniquely
identify
it.
Why would you strip off that identification? Consequently, there
wouldn't
be bags of records, only sets of records.

One could say the log records are representing physical changes if
they record the physical locations.

If you start out with a set of records and you know what is to be
inserted,
updated, and deleted, you can compute the resulting set of records.
If
you
start out with the resulting set of records, and you know what was
inserted,
updated and deleted in order to arrive at that result, then you can
compute
the original set of records. For simplicity, even if it isn't
necessarily
the most efficient, what is updated could be implemented as a set
of
ordered
pairs of records, tying each original record to its replacement.
So
the
log
would consist of a sequence of triples (D, U, I) separated by
transaction
markers where D is a set of records that were deleted, U is a set
of
pairs
of records that were updated, and I is a set of records that were
inserted.

Now, provided that the log is written before the database--that is,
(1)
write the triple to the log, (2) write the database, (3) write the
transaction marker in the log--, it should be possible to determine
whether
or not what was written to the log actually made it into the
database,
and
thus it should be possible to roll back any uncommitted
transaction.

A modern DBMS using WAL will tend to use a "lazy writer" that writes
dirty pages to disk in the background. For performance this will
tend to write dirty pages in a physical order so the disk head moves
uniformly over the platter. This assumes the only constraint
between
the writing to the log and the dirty pages is the WAL constraint -
ie
dirty pages must be written to disk strictly after the associated
log
records have been written to disk. To meet this constraint the lazy
writer simply needs to ignore dirty pages that haven't (yet) been
logged.

Your suggestion brings far more onerous constraints on the disk
writing policies.

No it doesn't. It only requires that the data be written to disk
before
the
transaction marker is written to the log. Whether the disk writes are
queued up to be written lazily is a separate issue.

When is the transaction deemed to have committed? I was assuming your
transaction marker was used to commit the transaction. If not what is
it for? Check pointing? I think there are better ways to check
point than that. AFAIK there is no need to check point *individual*
transactions with special markers.

The transaction marker is used to indicate that the transaction is
complete.
What that means is that the changes that are called for in the triple
have
been written to the database. It's not enough to simply write the log
and
then write the database, because without the marker following the triple
in
the log, there is no way to be sure that the changes called for in the
triple actually made it into the database.

Ok so it relates to check pointing. I don't see how these markers are
particularly useful - ie to determine where to start replaying the log
during recovery. For example, when ARIES performs a check point it
records
1) the active transaction list
2) last LSN (Log Sequence Number) for each active transaction
3) the set of dirty pages.
4) for each dirty page, the LSN of the earliest log
record whose effect is not reflected in the page on disk.

This allows the recovery scan to determine where to start replaying
the log (in the REDO pass).

That's not what the markers are for. In fact, during recovery only the last
marker really matters.

Quote:
In fact to make a transaction durable it seems
necessary to write all the dirty pages to disk as part of the
transaction. This is certainly not normally the case in a modern
DBMS. A database that avoids a "force" policy requires REDO during
recovery.

Often a DBMS that supports long running transactions will allow
dirty
pages to be written to disk even though the associated transaction
hasn't yet committed. This is the so called "steal" policy and
leads
to the need for UNDO during recovery.

The only way this can happen is if what was replaced as a result of
the
writes is cached while the transactions are outstanding. In order to
improve performance in the presence of long-running transactions, it
may
be
necessary to maintain that cache on disk. I should emphasize here
that
such
caching is strictly a mechanism for improving performance.

With a general purpose DBMS using strict 2PL there is inevitably a
chance of dead-lock and the need to abort transactions. This in turn
requires UNDO of all changes made by a particular transaction. You
can either record before images of pages of uncommitted transactions,
or else use the log directly to roll back a transaction. The latter
is economical with caching of the tail of the log and backward
chaining of log records for a given transaction.

Ideally, the dbms engine would serialize physical transactions. The
before
images of uncommitted transactions could be read by other transactions
instead of the database during writes to the database. Since what is
written to the log is just the changes required by a particular
transaction,
it should be possible to cache the changes required by other transactions
while each is committed in turn. Note that by 'serialize physical
transactions' I'm not referring to transaction isolation levels, which
involve how information is read, but instead how information is written.

Is this for MVCC?

Not exactly. While it should be possible to have several outstanding
transactions, it makes sense to me to allow only one to commit (at least
physically) at a time. If you have two transactions, they can either
(1) be completely independent of each other so that it shouldn't matter
which transaction commits first, or
(2) interact with each other so that the order in which they commit can
result in different database states.

It should be possible to extend this to involve any number of outstanding
transactions so that a sequence of writes to the log and the database can be
determined.




Reply With Quote
  #207  
Old   
Brian Selzer
 
Posts: n/a

Default Re: implementing a database log - 04-29-2008 , 11:31 AM




"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Apr 28, 9:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:ae011bd0-63c3-4373-ba35-2fe9fe6be331 (AT) 26g2000hsk (DOT) googlegroups.com...





On Apr 28, 12:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:289d3678-7a00-463e-aa99-33c7944ce68b (AT) 27g2000hsf (DOT) googlegroups.com...

On Apr 24, 10:11 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:fd63f466-18f7-4986-b378-f5e9f512bbd8 (AT) r66g2000hsg (DOT) googlegroups.com...

On Apr 22, 11:39 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:265621e9-25fd-46d5-9e2d-7a4f63fa84b4 (AT) m3g2000hsc (DOT) googlegroups.com...

On Apr 22, 6:58 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"Christoph Rupp" <cruppst... (AT) gmail (DOT) com> wrote in message

news:91140c56-1f05-4b5d-b45f-b34920db2051 (AT) x41g2000hsb (DOT) googlegroups.com...

Brian,

On Apr 21, 11:00 pm, "Brian Selzer"
br... (AT) selzer-software (DOT) com
wrote:
Why not go with #4:

4. a physical log based on modified rows. Whenever a row
is
modified,
added
or removed, it is logged. Then you could also implement
row
versioning--just add a row version field to the physical
rows.
I
believe
that this what snapshot isolation is built on.

It's not an SQL database, i don't even have the notion of
"rows",
but
basically i think your #4 is the same as my #1 or #2.

No, it isn't. #1 requires the logging of additional records
that
may
not
have been affected by an update. #2 doesn't log the entire
changed
record,
but only bits and pieces. I would think that limiting the
units
of
change
to individual records--entire records--would simplify the
process
of
marking
and isolating units of work while at the same time
guaranteeing
consistency.

I don't think an atomic unit of work is always associated with
a
change to an individual record. Are you suggesting
transactions
to
define arbitrarily large units of work aren't needed?

No, that's not what I'm suggesting. What I'm suggesting is that
the
atomic
unit of work should be a /set/ of /records/--either the old
records
in
the
case of a before image or the new records in the case of an
after
image.

Ok but that sounds like the system snapshots an entire table in
the
before/after images.

For efficiency one would expect to only store the set of records
that
have been added and the set that have been removed by a given
transaction. It is easy to see how we get the inverse which is
required for roll back of an uncommitted transaction during
recovery.
However these logical operations aren't idempotent (at least for
bags
of records). How does recovery deal with non-idempotent redo()/
undo() changes in the log?

Why would there be bags of records? At the physical level, each
record
has
a specific offset in some file, and that offset would uniquely
identify
it.
Why would you strip off that identification? Consequently, there
wouldn't
be bags of records, only sets of records.

One could say the log records are representing physical changes if
they record the physical locations.

If you start out with a set of records and you know what is to be
inserted,
updated, and deleted, you can compute the resulting set of records.
If
you
start out with the resulting set of records, and you know what was
inserted,
updated and deleted in order to arrive at that result, then you can
compute
the original set of records. For simplicity, even if it isn't
necessarily
the most efficient, what is updated could be implemented as a set
of
ordered
pairs of records, tying each original record to its replacement.
So
the
log
would consist of a sequence of triples (D, U, I) separated by
transaction
markers where D is a set of records that were deleted, U is a set
of
pairs
of records that were updated, and I is a set of records that were
inserted.

Now, provided that the log is written before the database--that is,
(1)
write the triple to the log, (2) write the database, (3) write the
transaction marker in the log--, it should be possible to determine
whether
or not what was written to the log actually made it into the
database,
and
thus it should be possible to roll back any uncommitted
transaction.

A modern DBMS using WAL will tend to use a "lazy writer" that writes
dirty pages to disk in the background. For performance this will
tend to write dirty pages in a physical order so the disk head moves
uniformly over the platter. This assumes the only constraint
between
the writing to the log and the dirty pages is the WAL constraint -
ie
dirty pages must be written to disk strictly after the associated
log
records have been written to disk. To meet this constraint the lazy
writer simply needs to ignore dirty pages that haven't (yet) been
logged.

Your suggestion brings far more onerous constraints on the disk
writing policies.

No it doesn't. It only requires that the data be written to disk
before
the
transaction marker is written to the log. Whether the disk writes are
queued up to be written lazily is a separate issue.

When is the transaction deemed to have committed? I was assuming your
transaction marker was used to commit the transaction. If not what is
it for? Check pointing? I think there are better ways to check
point than that. AFAIK there is no need to check point *individual*
transactions with special markers.

The transaction marker is used to indicate that the transaction is
complete.
What that means is that the changes that are called for in the triple
have
been written to the database. It's not enough to simply write the log
and
then write the database, because without the marker following the triple
in
the log, there is no way to be sure that the changes called for in the
triple actually made it into the database.

Ok so it relates to check pointing. I don't see how these markers are
particularly useful - ie to determine where to start replaying the log
during recovery. For example, when ARIES performs a check point it
records
1) the active transaction list
2) last LSN (Log Sequence Number) for each active transaction
3) the set of dirty pages.
4) for each dirty page, the LSN of the earliest log
record whose effect is not reflected in the page on disk.

This allows the recovery scan to determine where to start replaying
the log (in the REDO pass).

That's not what the markers are for. In fact, during recovery only the last
marker really matters.

Quote:
In fact to make a transaction durable it seems
necessary to write all the dirty pages to disk as part of the
transaction. This is certainly not normally the case in a modern
DBMS. A database that avoids a "force" policy requires REDO during
recovery.

Often a DBMS that supports long running transactions will allow
dirty
pages to be written to disk even though the associated transaction
hasn't yet committed. This is the so called "steal" policy and
leads
to the need for UNDO during recovery.

The only way this can happen is if what was replaced as a result of
the
writes is cached while the transactions are outstanding. In order to
improve performance in the presence of long-running transactions, it
may
be
necessary to maintain that cache on disk. I should emphasize here
that
such
caching is strictly a mechanism for improving performance.

With a general purpose DBMS using strict 2PL there is inevitably a
chance of dead-lock and the need to abort transactions. This in turn
requires UNDO of all changes made by a particular transaction. You
can either record before images of pages of uncommitted transactions,
or else use the log directly to roll back a transaction. The latter
is economical with caching of the tail of the log and backward
chaining of log records for a given transaction.

Ideally, the dbms engine would serialize physical transactions. The
before
images of uncommitted transactions could be read by other transactions
instead of the database during writes to the database. Since what is
written to the log is just the changes required by a particular
transaction,
it should be possible to cache the changes required by other transactions
while each is committed in turn. Note that by 'serialize physical
transactions' I'm not referring to transaction isolation levels, which
involve how information is read, but instead how information is written.

Is this for MVCC?

Not exactly. While it should be possible to have several outstanding
transactions, it makes sense to me to allow only one to commit (at least
physically) at a time. If you have two transactions, they can either
(1) be completely independent of each other so that it shouldn't matter
which transaction commits first, or
(2) interact with each other so that the order in which they commit can
result in different database states.

It should be possible to extend this to involve any number of outstanding
transactions so that a sequence of writes to the log and the database can be
determined.




Reply With Quote
  #208  
Old   
Brian Selzer
 
Posts: n/a

Default Re: implementing a database log - 04-29-2008 , 11:31 AM




"David BL" <davidbl (AT) iinet (DOT) net.au> wrote

Quote:
On Apr 28, 9:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:ae011bd0-63c3-4373-ba35-2fe9fe6be331 (AT) 26g2000hsk (DOT) googlegroups.com...





On Apr 28, 12:25 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com> wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:289d3678-7a00-463e-aa99-33c7944ce68b (AT) 27g2000hsf (DOT) googlegroups.com...

On Apr 24, 10:11 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:fd63f466-18f7-4986-b378-f5e9f512bbd8 (AT) r66g2000hsg (DOT) googlegroups.com...

On Apr 22, 11:39 pm, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"David BL" <davi... (AT) iinet (DOT) net.au> wrote in message

news:265621e9-25fd-46d5-9e2d-7a4f63fa84b4 (AT) m3g2000hsc (DOT) googlegroups.com...

On Apr 22, 6:58 am, "Brian Selzer" <br... (AT) selzer-software (DOT) com
wrote:
"Christoph Rupp" <cruppst... (AT) gmail (DOT) com> wrote in message

news:91140c56-1f05-4b5d-b45f-b34920db2051 (AT) x41g2000hsb (DOT) googlegroups.com...

Brian,

On Apr 21, 11:00 pm, "Brian Selzer"
br... (AT) selzer-software (DOT) com
wrote:
Why not go with #4:

4. a physical log based on modified rows. Whenever a row
is
modified,
added
or removed, it is logged. Then you could also implement
row
versioning--just add a row version field to the physical
rows.
I
believe
that this what snapshot isolation is built on.

It's not an SQL database, i don't even have the notion of
"rows",
but
basically i think your #4 is the same as my #1 or #2.

No, it isn't. #1 requires the logging of additional records
that
may
not
have been affected by an update. #2 doesn't log the entire
changed
record,
but only bits and pieces. I would think that limiting the
units
of
change
to individual records--entire records--would simplify the
process
of
marking
and isolating units of work while at the same time
guaranteeing
consistency.

I don't think an atomic unit of work is always associated with
a
change to an individual record. Are you suggesting
transactions
to
define arbitrarily large units of work aren't needed?

No, that's not what I'm suggesting. What I'm suggesting is that
the
atomic
unit of work should be a /set/ of /records/--either the old
records
in
the
case of a before image or the new records in the case of an
after
image.

Ok but that sounds like the system snapshots an entire table in
the
before/after images.

For efficiency one would expect to only store the set of records
that
have been added and the set that have been removed by a given
transaction. It is easy to see how we get the inverse which is
required for roll back of an uncommitted transaction during
recovery.
However these logical operations aren't idempotent (at least for
bags
of records). How does recovery deal with non-idempotent redo()/
undo() changes in the log?

Why would there be bags of records? At the physical level, each
record
has
a specific offset in some file, and that offset would uniquely
identify
it.
Why would you strip off that identification? Consequently, there
wouldn't
be bags of records, only sets of records.

One could say the log records are representing physical changes if
they record the physical locations.

If you start out with a set of records and you know what is to be
inserted,
updated, and deleted, you can compute the resulting set of records.
If
you
start out with the resulting set of records, and you know what was
inserted,
updated and deleted in order to arrive at that result, then you can
compute
the original set of records. For simplicity, even if it isn't
necessarily
the most efficient, what is updated could be implemented as a set
of
ordered
pairs of records, tying each original record to its replacement.
So
the
log
would consist of a sequence of triples (D, U, I) separated by
transaction
markers where D is a set of records that were deleted, U is a set
of
pairs
of records that were updated, and I is a set of records that were
inserted.

Now, provided that the log is written before the database--that is,
(1)
write the triple to the log, (2) write the database, (3) write the
transaction marker in the log--, it should be possible to determine
whether
or not what was written to the log actually made it into the
database,
and
thus it should be possible to roll back any uncommitted
transaction.

A modern DBMS using WAL will tend to use a "lazy writer" that writes
dirty pages to disk in the background. For performance this will
tend to write dirty pages in a physical order so the disk head moves
uniformly over the platter. This assumes the only constraint
between
the writing to the log and the dirty pages is the WAL constraint -
ie
dirty pages must be written to disk strictly after the associated
log
records have been written to disk. To meet this constraint the lazy
writer simply needs to ignore dirty pages that haven't (yet) been
logged.

Your suggestion brings far more onerous constraints on the disk
writing policies.

No it doesn't. It only requires that the data be written to disk
before
the
transaction marker is written to the log. Whether the disk writes are
queued up to be written lazily is a separate issue.

When is the transaction deemed to have committed? I was assuming your
transaction marker was used to commit the transaction. If not what is
it for? Check pointing? I think there are better ways to check
point than that. AFAIK there is no need to check point *individual*
transactions with special markers.

The transaction marker is used to indicate that the transaction is
complete.
What that means is that the changes that are called for in the triple
have
been written to the database. It's not enough to simply write the log
and
then write the database, because without the marker following the triple
in
the log, there is no way to be sure that the changes called for in the
triple actually made it into the database.

Ok so it relates to check pointing. I don't see how these markers are
particularly useful - ie to determine where to start replaying the log
during recovery. For example, when ARIES performs a check point it
records
1) the active transaction list
2) last LSN (Log Sequence Number) for each active transaction
3) the set of dirty pages.
4) for each dirty page, the LSN of the earliest log
record whose effect is not reflected in the page on disk.

This allows the recovery scan to determine where to start replaying
the log (in the REDO pass).

That's not what the markers are for. In fact, during recovery only the last
marker really matters.

Quote:
In fact to make a transaction durable it seems
necessary to write all the dirty pages to disk as part of the
transaction. This is certainly not normally the case in a modern
DBMS. A database that avoids a "force" policy requires REDO during
recovery.

Often a DBMS that supports long running transactions will allow
dirty
pages to be written to disk even though the associated transaction
hasn't yet committed. This is the so called "steal" policy and
leads
to the need for UNDO during recovery.

The only way this can happen is if what was replaced as a result of
the
writes is cached while the transactions are outstanding. In order to
improve performance in the presence of long-running transactions, it
may
be
necessary to maintain that cache on disk. I should emphasize here
that
such
caching is strictly a mechanism for improving performance.

With a general purpose DBMS using strict 2PL there is inevitably a
chance of dead-lock and the need to abort transactions. This in turn
requires UNDO of all changes made by a particular transaction. You
can either record before images of pages of uncommitted transactions,
or else use the log directly to roll back a transaction. The latter
is economical with caching of the tail of the log and backward
chaining of log records for a given transaction.

Ideally, the dbms engine would serialize physical transactions. The
before
images of uncommitted transactions could be read by other transactions
instead of the database during writes to the database. Since what is
written to the log is just the changes required by a particular
transaction,
it should be possible to cache the changes required by other transactions
while each is committed in turn. Note that by 'serialize physical
transactions' I'm not referring to transaction isolation levels, which
involve how information is read, but instead how information is written.

Is this for MVCC?

Not exactly. While it should be possible to have several outstanding
transactions, it makes sense to me to allow only one to commit (at least
physically) at a time. If you have two transactions, they can either
(1) be completely independent of each other so that it shouldn't matter
which transaction commits first, or
(2) interact with each other so that the order in which they commit can
result in different database states.

It should be possible to extend this to involve any number of outstanding
transactions so that a sequence of writes to the log and the database can be
determined.




Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.