On 26 Feb, 15:43, paul c <toledobythe... (AT) oohay (DOT) ac> wrote:
Quote:
Roy Hann wrote:
Tony D wrote:
Oh well.
Take a programmer and ask them what they think a relational database
should be, you get MySQL.
Ask a programmer what they actually want to use, you get this clunk.
Programmers want to write programs. *...
(1) ie. not read them?
... The last thing they want is
software that could do away with 75% of them. *
...
(2) ie. do away with 75% of both programmers and their code?
Solving (1) and (2) seem to me to be the most basic (ie. layman's)
purposes in favour of relational approaches for making db apps. |
The relational model is one of the great mathematical creations of the
20th century; Codd is widely and rightly recognised as somewhat of a
genius.
The only problem with the relational model, however, is that it is a
rigorous, precise mathematical and logical framework.
Logic and mathematics are all-or-nothing affairs. You can't have a
mathematical proof that's just 78% logical, or 99.999% logical. It has
to be 100% formally, precisely logical or else it's nothing.
The relational model is unashamedly based on the represntation of data
as n-ary relations. If data is not represented as relations, it is not
a part of the relational model. You can't represent 52% of the data as
relations and the rest as something else.
Yet every one of the biggest-selling, professional "relational dbms"
vendors have refused to use relations (or relvars) to represent their
data: they use tables.
This is the Undermined Relational Foundation Fallacy: a table is not a
relation.
This may sound like a quibble, given that a table looks a little like
a relation and feels a little like a relation, but this quibble is of
the utmost importance: it is simply illogical to build an entire
mathematical framework upon the concept of a relation, only then, at
the last minute, to substitute ad-hoc the concept of a table and
expect all the logical conclusions based on relations still to hold.
This is mathematical nonsense.
And given that the relational model is one of rigorous, mathematical
formality, this last-minute, ad hoc substitution of a table for a
relation robs every "relational dbms" of its entire mathematical
heritage. This is like planning the first womanned mission to Mars and
spending years building a state-of-the-art interplanetary spaceship to
take our heroic astronauts to the red planet, but on launch-day,
wheeling out a propeller-driven bi-plane to the launch pad and
claiming that all the plans laid for one are valid for the other. It
doesn't matter that they share similarities, a space ship is not a bi-
plane.
To make matters worse, a table itself is mathematically undefined.
And to make matters even worse, the consensual, casual, imprecise
"definition" of a table claims that a table is an ordered collection
of rows and columns, which fully admits duplicated rows. Yet both row-
duplication and ordering completely rule-out the ad hoc relation-table
substitution, given that a relation by definition can have neither
duplicated tuples nor ordered tuples. The very best a table can be is
an ad hoc approximation of a relation, and a very poor one at that.
It seems that there could hardly have been a worse substitution.
This is especially a problem for those who approach the "relational
dbms" from a mathematical point-of-view rather than from the point of
a view of a potentially defective piece of software. It is all too
easy for these people to view a table as a relation and to forget to
enforce integrity and referential constraints, thinking them somehow
bundled with the table itself. This causes companies millions of
dollars per year in recovery efforts.
(Wikipedia gives the laughably imprecise, "A table is an accepted
visual representation of a relation;" yet no one would dare suggest
that the fundamental assumption of the relational model is that all
data is represented as accepted visual representations of relations.
Accepted visual representations of relations are simply not relations.
In mathematics, this makes all the difference, no matter how much the
average woman thinks they are the same thing. And besides: "accepted"
by whom?)
So, given that a table is an ad-hoc, mathematically ill-defined
approximation of a relation, have "relational dbms" vendors tried
their best to minimise the damage of undermining the mathematical
foundations of the relational model?
Actually, no.
The tables in most of the big-selling "relational dbmses" actually
allow the row duplication that could easily have been prevented. Some
of them even offer row-duplication as an optional extra when creating
a table!
Some "relational dbmses" also offer the wonderful optional extra of
allowing null-values in their tables, even though this completely
contradicts Date's Information Principle; indeed Date has gone on
record as saying that if you're using null-values then, whatever you
think you're doing in a relation, you're not use relational algebra or
relation calculus. Null has no place in the relational model and
allowing null-values in the tables of "relational dbmses" just
compounds the error of the initial abandonment of mathematical rigour.
Of course, there is no table algebra or table calculus rigorously
defined to mirror the algebra and calculus of the relational model,
not in any strictly formal sense. The only thing a join in an SQL
command has in common with the relational model operator is the name:
no mathematical precision can be inferred or proved. The Undermined
Relational Foundation Fallacy is all-conquering.
(In fact, the most accurate description of a table as implemented by
"relational dbms" vendors is: a pointer-indexed array of memory
addresses.)
It may be argued that the abandonment of mathematical formality by
"relational dbms" vendors and their embarrassing surrender to
consequent imprecision is excusable if they could at least guarantee
the other great alleged benefit of the "relational dbms," namely
database integrity: the ability to guarantee that data are accurate,
valid and consistent.
Alas, this is also false.
Take the widely acclaimed and wildly popular MySQL "relational dbms"
for example. A quick glance at its fault history at http://bugs.mysql.com/
proves a thoroughly dispiriting exercise. The "relational dbms" has
been crippled by a total of 5,814 S1-criticality faults during its
serivce lifetime, S1 being the highest possible criticality of fault
(the number of S2-criticality faults, the next highest, is currently
over 10,000).
Most egregious of all are the non-deterministic S1 faults that crashed
the dbms and destroyed the entire database integrity, without warning,
leaving the reconstuction of the data where possible to various tools
and scripts that might (or might not) happen to be at hand.
Bug ID 3105, for example, " ... marked tables wrongly [and] crashed
when doing commands like delete."
Bug ID 38566, "When executing an insert/update workload with
falcon_index_chill_threshold=4K and falcon_record_chill_threshold=4K,
Falcon crashed." This particular bug was not even deemed serious
enough to be corrected, despite being the highest criticality
possible; it recieved the vapid answer, "Executing a light INSERT and
UPDATE workload with falcon_index_chill_threshold set to 4K and
falcon_record_chill_threshold set to to 4K, MySQL could crash."
Marvellous.
Bug ID 42052 crashed a large data base doing a single UPDATE; the
unreassuring answer to which from the MySQL team is, "To clarify, this
is a problem with all large transactions in 6.4, whether it be
updates, deletes, or inserts." So, that sorts that out.
(To be fair, the best thing a "relational dbms" can do on error is to
crash immediately. It can be much worse when data is corrupted
silently and undetected, only to manifest itself much later when the
initial cause of the corruption is untraceably lost.)
The list goes on. 184 S1-critical transaction faults between 2003 and
2009: that's over 25 data integrity faults per year, or one every two
weeks on average, any one of which could lead to catastrophic and
unrecoverable data loss. And yet, though we can predict on average
when a catastrophic failure will occur, they happen non-
deterministically and without warning, meaning that the "relational
dbms" itself is crawling with non-deterministic behaviour just waiting
to happen. Your UPDATE might work 1,000 times, or 10,000 times, but
you can never be sure it will work properly on the 10,001st time.
Every SQL operation has a non-zero probability of catastrophic
failure. The best and only thing you can do is follow the Relational
Damage-Limitation Standard (see below).
Nor are transaction and data integrity failures the only disastrous
faults to which "relational dbmses" are addicted. Another area of
vital importance is security: the guaranteeing that the data in the
"relational dbms" is protected from outside interference and malicious
misuse. We would hope that the great and multi-million dollar
"relational dbms" vendors of the world would have impeccable security
records.
Not so.
Let's take Oracle, a giant in the "relational dbms" arena, whose
security fault history makes awfully depressing reading at:
http://www.oracle.com/technology/dep...ity/alerts.htm
In April 2008, it lists 15 new security risks discovered for its
flagship 10g "relational dbms." Just three months later, in July 2008,
it lists 11 more security risks discovered. Three months later still,
in October 2008, 15 more security risks were discovered. Just at the
beginning of this year, in January 2009, 10 more security risks were
discovered in 10g. There are Critical Patch Updates going all the way
back to January 2005. And these are not just minor faults: some
actually allow remote, malicious users to gain full access to the
"relational dbms" completely unauthenticated: requiring neither user
name nor password. There are few other areas of industry that would
accept such sloppy and dangerously low standards as those enjoyed by
these "relational dbms" contraptions.
And we haven't even reached Microsoft yet!
Microsoft has brought us the SQL Server "relational dbms." Receivers
of its Service Pack 2 would surely have been shocked to see the list
of over 130 faults discovered in the previous version they had been
using, including the deeply worrying bug 917886 , "Error message after
you roll back transactions on a table in SQL Server 2005: Error: 3315,
Severity: 21, State: 1." Time to look for those ad hoc recovery
scripts, it would seem. If roll back transactions cannot be trusted,
what can?
Moving on, how about these for open and frank admissions of
catastrophic data integrity failure. First bug 918757, "The value of a
cell is not updated correctly when you use multiple UPDATE CUBE
statements inside a single transaction in SQL Server 2005 Analysis
Services." Bug 919775, "The BULK INSERT statement may not return any
errors when you try to import data from a text file to a table by
using the BULK INSERT statement in Microsoft SQL Server 2005." Bug
919955, "You may receive incorrect results when you run a
Multidimensional Expressions (MDX) query that involves a calculated
member and the autoexist functionality in SQL Server 2005 Analysis
Services." And then the jaw-dropping 919957, "Some cells return the
NULL value instead of returning the actual value when you query a
dimension that contains a parent/child hierarchy in a SQL Server 2005
Analysis Services cube."
SQL Server also has faults that are, if anything, even more serious
than database integrity and security faults: faults relating to the
logging system, a point to which we'll return later. Faults include
bug 40000100, "If the log reader agent and the DBCC DBREINDEX or ALTER
INDEX ...REBUILD commands run at the same time, the log reader agent
may miss some transactions to the distributor." And bug 40000102,
"When you configure Log Shipping and add two or more secondary servers
to Log Shipping, only the secondary server that you added last is
monitored by the monitor server and the other secondary servers are
not."
(We leave the last word to the Microsoft marketeers who wrote, "SQL
Server 2008 Performance and Scale White Paper," in which they
gleefully proclaim, "SQL Server 2008 provides a robust database engine
that supports large relational databases and complex query
processing." Yeah. Sure.)
Do not think, of course, that these three, perhaps the most popular
"relational dbms" vendors on the market, are unique in having such
core and inexcusable flaws: there just isn't space (or inclination) to
present the lumbering disasters of any of the other, less popular
"relational dbms" offerings. Just search the web for "list of bugs"
and your favourite dbms: db2, postgreSQL, Berkeley DB, etc., and watch
page after excruciating page of human misery scroll before your eyes.
In conclusion, the modern, "relational dbms," is a table-based dbms
and given that a table is the ad-hoc, mathematically ill-defined
approximation of relation, the table-based dbms has no mathematical
foundation whatsoever and in particular, with its ordering, duplicate
rows and null-values, cannot choose willy-nilly to claim that the
conclusions of the relation model, or any other formal system, apply
to it.
We have seen, furthermore, that the modern "relational dbms"
fundamentally cannot guarantee the accuracy, validity or consistency
of stored data. Thousands upon thousands of new and disastrously
painful, costly faults continue to plague the "technology."
Of course, the introduction of a "relational dbms" to a software
project is neither unprofessional nor is it criminal malpractice per
se, but the decision should be taken with the utmost care.
So, what's to be done?
There is only one strategy that enables the introduction of a
"relational dbms" into a software project. This strategy has been
named the Relational Damage-Limitation Standard, and consists of the
following three recommendations:
1.Expect catastrophic failure! Your "relational dbms" is going to fail
and there's nothing you can do to prevent it. (Recall that MySql
discovers a new, potentially integrity-destroying fault every two
weeks.) All you can do is take out insurance: use a RAID system.
Separate your database components across as many different disc
resources as you can afford and duplicate the data as much as you can
afford. Despite the, "Inexpensive," of the acronym, this can quickly
become expensive, but the total corruption of data can be even more
so.
2.Expect catastrophic failure! Your "relational dbms" is going to fail
and there's nothing you can do to prevent it. Make back-ups of your
entire database as often as you can afford. Your data will be an order
of magniture more secure when it's stored on a file system somewhere
that your defective "relational dbms" can't reach it.
3.Expect catastrophic failure! Your "relational dbms" is going to fail
and there's nothing you can do to prevent it. Log everything! When
your data is corrupt, at least analysing the logs might show you what
caused the fault. (This is why the Microsoft faults relating to
logging reported above were so worrying: if the logs fail, then the
cause of a catastrophic data loss may never be found.)