Joe Thurbon wrote:
Quote:
Hi folks,
When JOG used to post here regularly he talked quite a bit about query
bias. It also came up recently on the TTM mailing list. It also gets a
vague nod on the wikipedia page on database normalisation.
Does the term have any formal, well defined or shared meaning?
I think I understand the intuition, that in some sense an 'unbiased'
data model minimises some sort of complexity metric aggregated across
some set of queries. (I think that 'some set' might actually be
replaced by 'all' for some people).
But complexity can mean lots of things:
How simple is the query to write?
How simple is the query to execute?
How simple is the query to understand?
What does it mean to the folk here?
It's only a curiosity to me, but I'd be interested to hear what people
think.
Cheers,
Joe |
Query bias is more a property of the data model or language than of a
design. Redundant, implicit or asymmetric language features or needless
complexity in the data model mean that when one chooses a particular
redundant, implicit, asymmetric or needless feature in a design, one
makes some queries easy to express and other queries difficult or
impossible to express.
Some languages or data models introduce all sorts of unecessary
structural elements like pointers, records, sets, arrays, parents, children.
As an example, if orders have an array of child order items, there may
be no way to access order items without going through orders. A query to
determine the most expensive order item in the database becomes
relatively difficult to express. (Note: the problem statement involves
only order items not orders.)
In some cases, the bias can relate to performance. As a general rule,
when one changes the physical layout of the data, one makes some queries
faster and some queries slower. That can be quite intentional and
desired. If the logical formalism (language) has redundancies that
affect performance, too often naive developers will choose the "fastest"
option for the first problem they have to solve without any regard for
future needs.
In the above orders/order items example, making order items a child
array of order might cluster order items with orders. While this will
make it extremely fast to access an order's order items once one has
navigated to the order, scanning all orders will be slowed--possibly by
a wide margin.