SUPPORT FOR DECLARATIVE TRANSITION CONSTRAINTS -
09-18-2010
, 08:42 PM
SUPPORT FOR DECLARATIVE TRANSITION CONSTRAINTS
A proposed enhancement to D
By Brian S. Selzer
ABSTRACT
A non-procedural mechanism is proposed for declaring transition
constraints using set-based relational operators, without altering
relvar headings and independent of whether updates affect keys.
INTRODUCTION
Transition constraints can be used to enforce business rules.
Limited support for defining transition constraints has been
available for some time in SQL implementations through the use of
triggered procedures, but that support is limited to one table at a
time, and in implementations that don't support FOR EACH ROW
triggers, the tables must be altered to include IDENTITY or GUID
columns.
C.J. Date and Hugh Darwen suggest on pages 220-221 of TTM, Third
Edition, that transition constraints can be enforced by defining
constraints using primed relvar names that refer to the
corresponding relvar as it was prior to the update, but that
suggestion suffers from the same limitation that SQL implementations
that don't support FOR EACH ROW triggers do.
There has to be a better way.
A MOTIVATING EXAMPLE
What follows is a fragment of a database that is used to record
labor activities performed by employees in a manufacturing facility.
/* stub work order master */
VAR WORKORDER REAL RELATION { WO# CHAR, PART# CHAR } KEY { WO# };
/* work order detail */
VAR WODETAIL REAL RELATION
{ WO# CHAR, SEQ INTEGER, OP CHAR, INSTRUCT CHAR }
KEY { WO#, SEQ };
CONSTRAINT FK_WODETAIL_WORKORDER ( COUNT ( WODETAIL { WO } )
= COUNT ( ( ( WODETAIL { WO } ) JOIN WORKORDER ) { WO# } ) );
/* labor activities are recorded here. */
VAR LABOR REAL RELATION
{ EMP# CHAR, WO# CHAR, SEQ INTEGER, M# CHAR, OP CHAR,
LBRTYPE CHAR, LBRDATE DATE, TIMEON TIME, TIMEOFF TIME,
ELAPSED INTEGER, APPLIED RATIONAL, PRODUCED INTEGER,
REJECTED INTEGER, STATUS CHAR },
KEY { EMP#, WO#, SEQ, OP, M#, LBRTYPE, LBRDATE, TIMEON },
KEY { EMP#, WO#, SEQ, OP, M#, LBRTYPE, LBRDATE, TIMEOFF };
/* descriptions of attributes
EMP# identifies the employee who performed the activity.
WO# identifies the work order to which the activity applies.
SEQ is the sequence number in the work order.
OP is the operation performed.
M# identifies the machine on which the activity was performed.
LBRTYPE identifies the type of labor: S[etup], D[irect],
R[ework].
LBRDATE identifies the date on which the labor activity was
performed. Activities performed after midnight on a shift that
starts before midnight are applied to the previous day.
TIMEON and TIMEOFF are the times the labor activity began and
ended respectively. Labor activities that are currently being
performed have the same TIMEON and TIMEOFF. TIMEOFF can be
less than TIMEON when an activity starts before and ends after
midnight.
ELAPSED is the elapsed working time in minutes. Working time
does not include breaks or lunch. ELAPSED is always less than
1440 minutes. ELAPSED is zero when TIMEON and TIMEOFF are the
same.
APPLIED is the working time in hours that is to be applied to
the work order. (This is not necessarily the same as
ELAPSED/60. An employee may have been on more than one job at
the same time, for example, monitoring several different
machines at the same time.) Like ELAPSED, APPLIED is zero when
TIMEON and TIMEOFF are the same.
PRODUCED and REJECTED are the quantities of the WIP part that
were produced and rejected as a result of the labor activity.
It is possible for these to be zero--there are operations that
can take more than one shift to complete, or the activity may
have begun late in the shift. These are also zero when TIMEON
and TIMEOFF are the same.
STATUS is one of O[pen], C[losed], or A[pproved]. STATUS is
O[pen] for labor activities that are currently being performed;
STATUS is C[losed] for activities that are no longer being
performed; STATUS is A[pproved] for activities that have been
approved by management.
** foreign key and inclusion dependency declarations */
CONSTRAINT FK_LABOR_WORKORDER ( COUNT ( LABOR { WO# } )
= COUNT ( ( ( LABOR { WO# } ) JOIN WORKORDER ) { WO# } ) );
CONSTRAINT IND_LABOR_WODETAIL
( COUNT ( LABOR WHERE ( STATUS != 'A' ) { WO#, SEQ, OP } )
= COUNT ( LABOR WHERE ( STATUS != 'A' ) { WO#, SEQ, OP }
JOIN WODETAIL ) { WO#, SEQ, OP } ) );
/* state constraints to implement the rules described above */
CONSTRAINT IS_EMPTY ( LABOR WHERE NOT ( LBRTYPE = 'S'
OR LBRTYPE = 'D' OR LBRTYPE = 'R' ) );
CONSTRAINT IS_EMPTY ( LABOR WHERE ELAPSED < 0
OR ELAPSED >= 1440 );
CONSTRAINT IS_EMPTY ( LABOR WHERE APPLIED < 0.0
OR APPLIED >= 24.0 );
CONSTRAINT IS_EMPTY ( LABOR WHERE PRODUCED < 0
OR REJECTED < 0 );
CONSTRAINT IS_EMPTY ( LABOR WHERE NOT ( STATUS = 'O'
OR STATUS = 'C' OR STATUS = 'A' ) );
CONSTRAINT IS_EMPTY ( LABOR WHERE ( TIMEON = TIMEOFF
AND STATUS <> 'O' ) );
CONSTRAINT IS_EMPTY ( LABOR WHERE ( STATUS = 'O'
AND ( TIMEON <> TIMEOFF OR ELAPSED <> 0 OR APPLIED <> 0.0
OR PRODUCED <> 0 OR REJECTED <> 0 ) ) );
Below are five transition constraints that need to be declared to
implement specific business rules:
Transition constraint #1:
STATUS can only transition from O[pen] to C[losed]
or from C[losed] to A[pproved].
Business rules implemented:
- Closed activities cannot be reopened; instead, a new activity
must be opened.
- Labor activities can't be approved while they’re being performed.
Transition constraint #2:
Labor activities with STATUS A[pproved] cannot be INSERTed,
UPDATEd or DELETEd.
Business rules implemented:
- Approved labor activities have to have been reviewed before
they can be approved. They can't have been reviewed if they
hadn't already been recorded in the database with STATUS
C[losed].
- Labor activities that have been approved affect WIP (work in
progress inventory); as a consequence, once an activity has
been approved, it cannot be altered or eliminated. Corrections
to offset activities approved in error are accomplished through
auditable journal entries recorded elsewhere in the database.
Transition constraint #3:
When STATUS transitions from C[losed] to A[pproved], only STATUS
transitions from C[losed] to A[pproved].
Business rule implemented:
- A labor activity that is being approved has to have been
reviewed before it can be approved. That can't have happened
if any component other than STATUS differs.
Transition constraint #4:
LBRDATE can't be different.
Business rule implemented:
- The date that a labor activity was performed does not change.
Transition constraint #5:
Updates to SEQ in WODETAILS must cascade into LABOR--that is,
updates to SEQ must also be reflected in each referencing labor
activity.
Business rule implemented:
- When the sequence of operations required to manufacture a part
is altered, all labor activities that haven't yet been approved
must refer to the new sequence.
DISCUSSION
It is not possible to implement the above transition constraints
using primed relvar names. The keys are compound, and it's possible
for both keys to be updated. For example, Joe entered the wrong
work order when he clocked onto labor. When he tried to clock off,
the system couldn't find the work order, so the next day an entry
showed up on his supervisor's error log. The labor activity from
the day before remained open. After determining what actually
occurred, the supervisor specifies the correct work order number
along with the time off, the quantities produced and rejected, the
elapsed time and the hours to be applied, setting STATUS to C[losed].
Before: RELATION
{ TUPLE { EMP# '22', WO# '2343', SEQ 12, OP '46', M# 'M3',
LBRTYPE 'D', LBRDATE '2010-06-30', TIMEON '08:00',
TIMEOFF '08:00', ELAPSED 0, APPLIED 0.0,
PRODUCED 0, REJECTED 0, STATUS 'O' } }
After: RELATION
{ TUPLE { EMP# '22', WO# '2334', SEQ 12, OP '46', M# 'M3',
LBRTYPE 'D', LBRDATE '2010-06-30', TIMEON '08:00',
TIMEOFF '15:00', ELAPSED 330, APPLIED 5.5,
PRODUCED 33, REJECTED 2, STATUS 'C' } }
The work order number is an element of both candidate keys, so both
keys differ as a result of the supervisor's update. How would it
be possible to match the tuple in the relvar as it was prior to the
update under consideration to the tuple in the relvar as it would
be after the update to verify that LBRDATE isn't different?
Just in case you're tempted, it can't be assumed that the update
under consideration involves only one tuple. There may have been a
multiple assignment, and constraints are checked at statement
boundaries--that is, after the entire multiple assignment, not
between its component assignments. For example, when the employee
tried to clock on today, an error occurred because he was already
clocked on (the activity from yesterday remained open), so the
supervisor also had to manually enter the activity.
LABOR Before: RELATION
{ TUPLE { EMP# '22', WO# '2343', SEQ 12, OP '46', M# 'M3',
LBRTYPE 'D', LBRDATE '2010-06-30', TIMEON '08:00',
TIMEOFF '08:00', ELAPSED 0, APPLIED 0.0,
PRODUCED 0, REJECTED 0, STATUS 'O' } }
LABOR After: RELATION
{ TUPLE { EMP# '22', WO# '2334', SEQ 12, OP '46', M# 'M3',
LBRTYPE 'D', LBRDATE '2010-06-30', TIMEON '08:00',
TIMEOFF '15:00', ELAPSED 330, APPLIED 5.5,
PRODUCED 33, REJECTED 2, STATUS 'C' },
TUPLE { EMP# '22', WO# '2343', SEQ 12, OP '46', M# 'M3',
LBRTYPE 'D', LBRDATE '2010-07-01', TIMEON '08:00',
TIMEOFF '08:00', ELAPSED 0, APPLIED 0.0,
PRODUCED 0, REJECTED 0, STATUS 'O' } }
At first glance, this appears to be a clear violation of Transition
Constraint #4 (Every component of the tuples with WO# '2343' are
identical except for LBRDATE!), but it isn't a violation. The tuple
with WO#:2343 before matches the tuple with WO#2334 after, and the
tuple with WO#2343 represents a new labor activity that began on
'2010-07-01'. How can that be determined, given just the relvar as
it was prior to the update under consideration and the relvar as it
is afterward?
Let's look at another situation. In this scenario, Joe started
filing off burrs yesterday and informed his supervisor that it is
taking forever. The supervisor then tells Joe to grind each piece
smooth, and then file off the burrs. When Joe tried to clock off
operation '67', 'GRIND SMOOTH' it bombed because there wasn't
already an open activity for operation '67'. When Joe tried
clocking on today, it bombed because as far as the system was
concerned, he was still filing off burrs since yesterday. The
supervisor now has to adjust the work order to reflect the
additional step, close the open labor activity from yesterday with
the correct operation, and open up a new activity for Joe today.
WODETAIL Before: RELATION
{ TUPLE { WO# '2334', SEQ 11, OP '23', INSTRUCT 'CUT OFF EXCESS' },
TUPLE { WO# '2334', SEQ 12, OP '46', INSTRUCT 'FILE OFF BURRS' },
TUPLE { WO# '2343', SEQ 12, OP '46', INSTRUCT 'FILE OFF BURRS' } }
LABOR Before: RELATION
{ TUPLE { EMP# '22', WO# '2334', SEQ 12, OP '46', M# 'M3',
LBRTYPE 'D', LBRDATE '2010-06-30', TIMEON '08:00',
TIMEOFF '08:00', ELAPSED 0, APPLIED 0.0,
PRODUCED 0, REJECTED 0, STATUS 'O' } }
WODETAIL After: RELATION
{ TUPLE { WO# '2334', SEQ 11, OP '23', INSTRUCT 'CUT OFF EXCESS' },
TUPLE { WO# '2334', SEQ 12, OP '67', INSTRUCT 'GRIND SMOOTH' },
TUPLE { WO# '2334', SEQ 13, OP '46', INSTRUCT 'FILE OFF BURRS' },
TUPLE { WO# '2343', SEQ 12, OP '46', INSTRUCT 'FILE OFF BURRS' } }
LABOR After: RELATION
{ TUPLE { EMP# '22', WO# '2334', SEQ 12, OP '67', M# 'M3',
LBRTYPE 'D', LBRDATE '2010-06-30', TIMEON '08:00',
TIMEOFF '17:00', ELAPSED 390, APPLIED 6.5,
PRODUCED 313, REJECTED 0, STATUS 'C' },
TUPLE { EMP# '22', WO# '2334', SEQ 13, OP '46', M# 'M3',
LBRTYPE 'D', LBRDATE '2010-07-01', TIMEON '08:00',
TIMEOFF '08:00', ELAPSED 0, APPLIED 0.0,
PRODUCED 0, REJECTED 0, STATUS 'O' } }
In this case, given just the before and after values of WODETAIL and
LABOR, it's not clear whether the tuple in LABOR as it was prior to
the update matches the tuple with OP '67' or the tuple with OP '46'
in LABOR after. If the OP is what's different, then the update
doesn't violate Transition Constraint 4; if the SEQ is what's
different, then it does violate it. It's also not clear in WODETAIL
whether SEQ 12 or SEQ 13 for WO# '2334' was added. If SEQ 12 was
added, then what had SEQ 12 in LABOR Before must be what has SEQ 13
in LABOR After, otherwise Transition Constraint #5 is violated, but
that violates Transition Constraint #4. If SEQ 13 was added, then
what had SEQ 12 in LABOR before must be what has SEQ 12 in LABOR
After, otherwise Transition Constraint #5 is violated, and here
Transition Constraint #4 is not violated. These are two possible
mappings from the tuples in the initial state to the tuples in the
terminal state of the database, one that violates transition
constraints and one that doesn't.
THE PROBLEM
There can be more than one mapping from the tuples in database as it
was prior to an update to the tuples in the database as it would be
if the update were to succeed. Each mapping constitutes a possible
transition. Unless there is a mechanism to determine exactly which
mapping should be checked, it is not possible to enforce some
constraints that permit one mapping but not another.
One possible solution is to introduce permanent surrogates, but that
requires altering relvar headings, along with additional machinery
to prevent surrogates from being reused. Another possible solution
would be to introduce temporal attributes and maintain histories for
entities, thereby treating transition constraints as temporal
constraints, but again, that requires altering relvar headings and
may also require the introduction of permanent surrogates.
There is a better way: simply let the user inform the system which
mapping should be checked. In a multiple assignment consisting of
just DELETEs, INSERTs and UPDATEs, there is enough information to
determine exactly which mapping the user selected. Here's how:
To start out with, there are two collections of tuples: the tuples
in the database as it was prior to the update under consideration
and the tuples in the database as it would be if the multiple
assignment were to succeed.
The tuples in the database as it was prior to the update that are
targeted by a DELETE don't correspond to any tuples in the database
as would be if the multiple assignment were to succeed.
The tuples in the database as it would be if the multiple assignment
succeeds that are the result of an INSERT don't correspond to any
tuples in the database as it was prior to the update. (I don't
think it makes sense to allow duplicate tuples to be INSERTed.)
There is a one-to-one correspondence--a bijective mapping--from the
tuples in the relations in the database as it was prior to the
update that are targeted by an UPDATE to the tuples in the relations
in the database as it would be if the multiple assignment were to
succeed. The tuples that are targeted are EXTENDed with additional
attributes, and at that point a one-to-one correspondence is
established. Renaming occurs, and then that which established
the one-to-one correspondence is projected away.
All that are left are the tuples that were not targeted by a DELETE,
an INSERT, or an UPDATE, and those must appear unchanged in both the
database as it was prior to the update and the database as it would
be if the multiple assignment were to succeed.
It may be significant that a tuple was the target of an UPDATE or a
"DELETE, INSERT" multiple assignment, even though there is no
apparent difference--that is, even if the same tuple is an element
of both the database as it was prior to the update and the database
as it would be if the multiple assignment were to succeed. For
example, some renters may already be paying $750 per month for a
single bedroom apartment when a set-based update sets the rent at
$750 per month for all single bedroom apartments. It wouldn't make
sense to send legally required notifications of rent increases to
renters that were already paying the new rate.
THE MECHANISM
Following the convention introduced in the section, RM VERY STRONG
SUGGESTION 4: TRANSITION CONSTRAINTS, I propose that three
additional relvar names be significant during an update. Given an
arbitrary database D, for each relvar Ri in D, {i = 1, 2, …, n}, in
addition to Ri' which is understood to refer to the corresponding
relvar as it was prior to the update under consideration:
The relvar name Ri- is understood to refer to the union of all
tuples in Ri as it was prior to the update that are identified by
the WHERE of any DELETEs that target Ri.
The relvar name Ri+ is understood to refer to the union of all of
the relation expressions for INSERTs that target Ri.
The relvar name Ri~ is understood to refer to the information
supplied by all UPDATEs that target Ri. What follows is the
information supplied by means of UPDATE:
a. WHERE <bool exp> identifies which tuples in Ri as it was prior
to the update were the target of the update.
b. For each tuple identified by a WHERE <bool exp>, <attribute
assign commalist> identifies which components of the tuple are the
target of the update.
c. For each tuple identified by a WHERE <bool exp>, <attribute
assign commalist> specifies the proposed value for each component
targeted by the update.
There are several ways to make this information available for
defining transition constraints. What I propose is that each tuple
in Ri~ have a set of components with attribute names postfixed
with ' that correspond to the components of the tuple as it was
prior to the update, a set of Boolean components with attribute
names postfixed with ~ that indicate whether each component in the
tuple was targeted by the update, and a set of components with
unaltered attribute names which correspond to the components of the
tuple as it would be if the update were to succeed.
Since
<relvar target> := <relation exp>
is semantically equivalent to the multiple assignment,
DELETE <relvar target>, INSERT <relvar target> <relation exp>,
the description of Ri-, Ri+ and Ri~ above apply also to :=
assignments, the equivalent multiple assignment being substituted
prior to evaluation.
With this mechanism, not only can it be determined exactly what is
different, but also the exact scope of the transition can be
determined, regardless of whether the components of tuples within
that scope actually differ.
Now that the mechanism has been defined, the transition constraints
required in the example above can be declared:
Transition Constraint #1 would be
CONSTRAINT IS_EMPTY ( LABOR~
WHERE ( STATUS = 'A' AND STATUS' = 'O' )
OR ( STATUS = 'O' AND STATUS' = 'C' )
OR ( STATUS' = 'A' AND STATUS <> 'A' ) );
We're only concerned with UPDATEs here.
Transition Constraint #2 would be
CONSTRAINT IS_EMPTY ( ( LABOR- { STATUS } WHERE STATUS = 'A' )
UNION ( ( LABOR~ WHERE STATUS' = 'A' ) { STATUS } )
UNION ( LABOR+ { STATUS } WHERE STATUS ='A' ) );
The only permissible way that STATUS can be 'A' is if it has been
UPDATEd from 'C' to 'A' at some point.
Transition Constraint #3 would be
CONSTRAINT IS_EMPTY ( LABOR~ WHERE STATUS = 'A' AND STATUS' = 'C'
AND ( EMP# <> EMP#' OR WO# <> WO#' OR SEQ <> SEQ'
OR OP <> OP' M# <> M#' OR LBRTYPE <> LBRTYPE'
OR LBRDATE <> LBRDATE' OR TIMEON <> TIMEON'
OR TIMEOFF <> TIMEOFF' OR ELAPSED <> ELAPSED'
OR APPLIED <> APPLIED' OR PRODUCED <> PRODUCED'
OR REJECTED <> REJECTED' ) );
Transition Constraint #4 would be
CONSTRAINT IS_EMPTY ( LABOR~ WHERE LBRDATE <> LBRDATE' );
Transition Constraint #5 would be
CONSTRAINT COUNT ( ( LABOR~
WHERE STATUS' <> 'A' ) { WO#, WO#', SEQ, SEQ' OP, OP' } )
= COUNT ( ( ( LABOR~
WHERE STATUS' <> 'A' ) { WO#, WO#', SEQ, SEQ' OP, OP' } )
JOIN ( WODETAIL~ { WO#, WO#', SEQ, SEQ' OP, OP' } ) );
This constraint enforces the constraint specified by a cascading
update. What is objectionable about cascading referential actions
is that changes to one real relvar affects other real revars. A
cascading delete or update can easily be transformed into a
multiple assignment, but a cascading referential action does more
than that. It imposes a constraint that prevents updates that
don't cascade. This mechanism for declaring transition
constraints makes it possible for those constraints to be enforced
as well.
A brief sketch of how to arrive at what each additional relvar name
references:
Given an arbitrary database D,
for each relvar Ri in D, {i = 1, 2, …, n}:
Populating Ri- and Ri+ are pretty straightforward.
Given DELETE Ri WHERE delete_bool_exp_Ri_k, {k = 1, 2, ..., m}.
Populate Ri- by evaluating the union of the expression:
Ri' WHERE delete_bool_exp_Ri_k
for each k,{k = 1, 2, ..., m}
Given INSERT Ri insert_relation_exp_Ri_k, {k = 1, 2, ..., m}.
Populate Ri+ by evaluating the union of
insert_relation_exp_Ri_k
for each k, {k = 1, 2, ..., m}
Populating Ri~ is a bit more involved.
Given UPDATE Ri
WHERE update_bool_exp_Ri_k ( attribute_assign_commalist_Ri_k ),
{k = 1, 2, ..., m}.
For each UPDATE,
1. Construct a relation literal, relation_literal_Ri_k,
consisting of a single tuple, such that:
for each Aj, {j = 1, 2, ..., l} in Ri,
if Aj appears as an attribute target in
attribute_assign_commalist_Ri_k,
Aj~ TRUE
appears as a component of the tuple in
relation_literal_Ri_k,
otherwise,
Aj~ FALSE
appears as a component.
2. Transform attribute_assign_commalist_Ri_k into an
<extend add commalist>, complete_extend_add_commalist_Ri_k,
such that:
for each Aj, {j = 1, 2, ..., l},
if Aj appears as an attribute target in
attribute_assign_commalist_Ri_k,
the <attribute assign>, Aj := <exp>
is transformed into the <extend add>, <exp> AS Aj,
in complete_extend_add_commalist_Ri_k,
otherwise
the <extend add>, Aj' AS Aj, appears in
complete_extend_add_commalist_Ri_k.
3. Populate Ri~ by evaluating the union of the expression:
EXTEND ( ( Ri' WHERE update_bool_exp_Ri_k )
RENAME { SUFFIX '' AS '''' } )
JOIN relation_literal_Ri_k )
ADD { complete_extend_add_commalist_Ri_k );
for each k, {k = 1, 2, ..., m}.
Note: RENAME (SUFFIX '' AS '''') adds a single ' to each
attribute name, the convention being that '' stands for '
within a character literal.
FURTHER ENHANCEMENT
Another enhancement to D that I think would be worthwhile would be
the addition of FROM in DELETE and UPDATE.
DELETE <relvar target> [ FROM <relation exp> ] [ WHERE <bool exp> ]
UPDATE <relvar target> [ FROM <relation exp> ]
[ WHERE <bool exp> ] ( <attribute assign commalist> )
In both cases, <relvar target> is understood to have been joined to
<relation exp>, and <bool exp> and <attribute assign commalist>
permit <attribute ref>s (but not <attribute target>s) not just from
the attributes of <relvar target> but instead from the attributes of
( <relvar target> JOIN <relation exp> ). If FROM <relation exp> is
omitted, <relation exp> is understood to be TABLE_DEE.
The tuples in <relvar target> that are the target of DELETE or
UPDATE becomes
<relvar target> SEMIJOIN ( ( <relvar target> JOIN <relation exp> )
WHERE <bool exp> ).
In the preceding section,
Ri' WHERE delete_bool_exp_Ri_k
becomes
Ri' SEMIJOIN ( ( Ri' JOIN from_expression_Ri_k )
WHERE delete_bool_exp_Ri_k )
and
EXTEND ( ( Ri' WHERE update_bool_exp_Ri_k )
RENAME { SUFFIX '' AS '''' } )
JOIN relation_literal_Ri_k )
ADD { complete_extend_add_commalist_Ri_k );
becomes
EXTEND ( ( Ri' SEMIJOIN ( ( Ri' JOIN from_expression_Ri_k )
WHERE update_bool_exp_Ri_k ) ) RENAME { SUFFIX '' AS '''' } )
JOIN relation_literal_Ri_k )
ADD { complete_extend_add_commalist_Ri_k ); |