Quote:
I am looking forward to Part 2 now. |
A long time ago when I thought I already knew something about data but
hadn't yet worked with databases -- telling in itself -- I had the
opportunity to spread my views on file formats to a bunch of nerds. Of
course I already knew that no such discussion would be complete
without at least mentioning normalization. So I decided to illustrate
it in passing by an example. Since obviously the goal of normalization
is to remove redundancy, the first example naturally was pushing
strings into a lookup table and linking to them via their index in it.
Now, perhaps I could have gotten away with that had a gone on to
actually factor something into 3NF, but no. That was the only example.
I don't think Hail Maries quite cut that one, eventhough in a very
limited sense the part about redundancy is true.
Half a decade later I was designing one of my first schemas. Now I
actually knew what normal forms and integrity were about. But it seems
old habits die hard. The result was a beautifully constrained and
normalized base...with every single entity keyed by surrogate. I was
happy and went my merry way.
A couple of years ago I decided to see how the base was doing
nowadays. It was humming along nicely. The surrogates bothered me a
bit, but since they did simplify a couple of practical things and
indeed compressed the base (to its then whopping 20MB), I thought
everything was fine. Until I realized two minor details. First, I
hadn't been stern enough with the web developers, so they'd of course
put my surrogates on the user visible reports, and now they'd leaked
into the wild. Ooops. And second, it suddenly dawned on me why the
rest of the stuff seemed to be in order: during development I had
indeed been strict enough in some things. There was not a single null
in the schema...nor a single table auto generating surrogates which
wasn't also constrained by a unique key other than the surrogate.
There are no free lunches, but there is such a thing as karmic debt. I
think that goes a long way towards explaining why I now work with the
internals of a Siebel instance. In case you haven't heard, that's the
product whose trainer asks on the top of the introductory course
whether anybody present is a DBA. In case one is found, he's told he's
not going to like what he's about to see. One of those things is that
everything, and I mean *everything* is keyed by surrogates, and in one
particular table not a single field besides that surrogate is declared
NOT NULL.
Of course, my debt being what it is, we're not quite done yet. You
see, our setup is the central hub for a number of independent Siebel
installations. We distribute, deduplicate and publish data between
them. That means that each of the systems has its own, identically
formatted, and overlapping in range surrogate for each of the objects.
Since our system is supposed to have something to do with master data,
it of course additionally generates its own, authoritative, master ID
for each new object. Which isn't to say that some of the affiliate
systems wouldn't have their own, autogenerated, user visible keys in
addition to what Siebel does under the covers. The result is that the
maximum number of separate autogen keys for a single object I've seen
thus far has been in the vicinity of ten.
My title is global data administrator. It basically means I'm
responsible for the quality and uniqueness of the data.
--
Sampo