dbTalk Databases Forums  

Re: data modeling confessions, part 2

comp.databases.theory comp.databases.theory


Discuss Re: data modeling confessions, part 2 in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Sampo Syreeni
 
Posts: n/a

Default Re: data modeling confessions, part 2 - 10-20-2009 , 10:56 AM






Quote:
I am looking forward to Part 2 now.
A long time ago when I thought I already knew something about data but
hadn't yet worked with databases -- telling in itself -- I had the
opportunity to spread my views on file formats to a bunch of nerds. Of
course I already knew that no such discussion would be complete
without at least mentioning normalization. So I decided to illustrate
it in passing by an example. Since obviously the goal of normalization
is to remove redundancy, the first example naturally was pushing
strings into a lookup table and linking to them via their index in it.
Now, perhaps I could have gotten away with that had a gone on to
actually factor something into 3NF, but no. That was the only example.
I don't think Hail Maries quite cut that one, eventhough in a very
limited sense the part about redundancy is true.

Half a decade later I was designing one of my first schemas. Now I
actually knew what normal forms and integrity were about. But it seems
old habits die hard. The result was a beautifully constrained and
normalized base...with every single entity keyed by surrogate. I was
happy and went my merry way.

A couple of years ago I decided to see how the base was doing
nowadays. It was humming along nicely. The surrogates bothered me a
bit, but since they did simplify a couple of practical things and
indeed compressed the base (to its then whopping 20MB), I thought
everything was fine. Until I realized two minor details. First, I
hadn't been stern enough with the web developers, so they'd of course
put my surrogates on the user visible reports, and now they'd leaked
into the wild. Ooops. And second, it suddenly dawned on me why the
rest of the stuff seemed to be in order: during development I had
indeed been strict enough in some things. There was not a single null
in the schema...nor a single table auto generating surrogates which
wasn't also constrained by a unique key other than the surrogate.

There are no free lunches, but there is such a thing as karmic debt. I
think that goes a long way towards explaining why I now work with the
internals of a Siebel instance. In case you haven't heard, that's the
product whose trainer asks on the top of the introductory course
whether anybody present is a DBA. In case one is found, he's told he's
not going to like what he's about to see. One of those things is that
everything, and I mean *everything* is keyed by surrogates, and in one
particular table not a single field besides that surrogate is declared
NOT NULL.

Of course, my debt being what it is, we're not quite done yet. You
see, our setup is the central hub for a number of independent Siebel
installations. We distribute, deduplicate and publish data between
them. That means that each of the systems has its own, identically
formatted, and overlapping in range surrogate for each of the objects.
Since our system is supposed to have something to do with master data,
it of course additionally generates its own, authoritative, master ID
for each new object. Which isn't to say that some of the affiliate
systems wouldn't have their own, autogenerated, user visible keys in
addition to what Siebel does under the covers. The result is that the
maximum number of separate autogen keys for a single object I've seen
thus far has been in the vicinity of ten.

My title is global data administrator. It basically means I'm
responsible for the quality and uniqueness of the data.
--
Sampo

Reply With Quote
  #2  
Old   
Sampo Syreeni
 
Posts: n/a

Default Re: data modeling confessions, part 2 - 10-21-2009 , 01:08 PM






Quote:
Since our system is supposed to have something to do with master data,
it of course additionally generates its own, authoritative, master ID
for each new object.
As an addendum, I just learned today that one of our affiliate systems
contains a nifty feature: they can manually change this authoritative,
master ID. And no, updates to our database are not correlated by the
ID we assigned. Instead they're done via some fuzzy logic, among other
things involving the surrogate the source system is using. The result?
We have duplicate, authoritative master ID's now.

That problem is not a new one. A thorough cleansing exercise already
took place before my "reign" to eradicate it. It's just that nobody
thought of declaring the problematic attribute UNIQUE after the
cleasing was done, and the problem recurred. (It couldn't have been
declared the primary key since, after all, we're working on top of
Siebel's out-of-the-box data model which elects to place objects there
which cannot be assigned one of our ID's. Plus of course the primary
key is already set as the Siebel internal surrogate.)

On the plus side, I was assured the fix to stop this from happening
again was coming Any Day Now. It essentially consists of keeping the
matching logic the same, but ignoring updates to the master ID.
Correlating updates by our ID was apparently not an option, eventhough
one of my colleagues has been talking about that ever since the
cleanup took place.

As the perennial optimist, I'm hoping the new code rejects updates
with changed master ID's or actively reverts them. But at this point I
think it's more likely that the update will be accepted, only with the
master ID stripped off, giving rise to inconsistency between the two
communicating systems.

The lesson? If you couple loosely, you don't need yet another key but
simply a rock-solid mapping mechanism. If you try to do MDM, you
should have the authority to force your affiliates to adopt your
master key as the One True Key. Mixing the two approaches ain't really
an option.
--
Sampo

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.