dbTalk Databases Forums  

Data modeling in MV, revisited

comp.databases.pick comp.databases.pick


Discuss Data modeling in MV, revisited in the comp.databases.pick forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
dawn
 
Posts: n/a

Default Data modeling in MV, revisited - 10-18-2006 , 05:11 PM






I'm hoping to write another blog entry (have been slack for a while)
related to modeling data in MV and I could use some help. I thought I
would state that up front so you know to ignore if you have no interest
in such.

Where traditional relational theory is easy to teach because it has
that nice 1, 2, 3 NF thing going for it so you can teach how to put the
data in first normal form and then go from there, typical MV s/w
developers might be more inclined to say something like "you just do
it" or "model the data the way it makes sense" or "model the data to
align with reality."

My take is that there really are some best practices that could be
taught, even if many of them are not as mathematical as the normal
forms. Perhaps some of the following are agreeable?
--designing a solution without sub-values, just files, attributes
including single and multi-values as well as associated multi-values
--using a single file for all of the relatively small lookup tables
(valid values for marital status, gender, hair color, customer status,
etc) which is frowned upon in the SQL world (we discussed this a while
back on cdp)
--using files for metadata and generating dicts from those, using these
files as the basis for data entry forms/screens/pages & related
constraints as well as user-maintained business rules
--adding some misc user-defined single and multi-valued fields to
various files without any specified intent so users can use them as
needed to help tailor the system for their needs over time (like the
star in gmail)
--creating return links so you can navigate from file to file in both
directions (things like that really fly in the face of relational
theory, but hey ho)
--adding associated multivalues with a status code and date stamp for
any status codes where history should be kept (rather than requiring
data marts and data warehouses for such information) and using this
same pattern for other specific attributes as needed, howbeit
sparingly.
--keeping properties (not well-defined), whether single or
multi-valued, as attributes of an entity (also not well-defined other
than entity = "thing") in a single file definition rather than
splitting them out into separate files
--splitting out attributes into separate files based on functional
dependencies, such as is done in relational normalization, but getting
data into 3rd normal form without the need of 1NF.
--identifying the main entities about which users will want to report
and beefing up those dictionaries (entities) with a lot of virtual
fields (or using 3rd party tools for reporting)
--choosing naming conventions, such as plural names for multivalues,
singular for single-values, and plural for entities (relational
standards are for tables to be singular, but LIST statements read much
better with plural names)

I'm guessing I could come up with a bunch of other ideas for data
modeling best practices with multi-values since the above possibilities
are jumping off my fingertips, but this gives the idea of what I'm
looking for.

So, in answer to the question of what might be some industry best
practices for modeling/designing data for a Pick/MV implementation,
which of the above do you agree with and what would you add?

Thanks. --dawn
P.S. In the resulting blog entry, I'll acknowledge all those who help
build or refine the list unless you request no such acknowledgement.


Reply With Quote
  #2  
Old   
Pickie
 
Posts: n/a

Default Re: Data modeling in MV, revisited - 10-18-2006 , 07:03 PM






Hi Dawn,

I'm going to top, inter-, and bottom post - I hope for clarity.

I think it's easiest if one considers the Pick data structure to be a
physical implemenation of a version of the Relational Model. For me,
that seems to guide my efforts without restricting what I can do.

dawn wrote:
Quote:
I'm hoping to write another blog entry (have been slack for a while)
related to modeling data in MV and I could use some help. I thought I
would state that up front so you know to ignore if you have no interest
in such.

Where traditional relational theory is easy to teach because it has
that nice 1, 2, 3 NF thing going for it so you can teach how to put the
data in first normal form and then go from there, typical MV s/w
developers might be more inclined to say something like "you just do
it" or "model the data the way it makes sense" or "model the data to
align with reality."

My take is that there really are some best practices that could be
taught, even if many of them are not as mathematical as the normal
forms. Perhaps some of the following are agreeable?
--designing a solution without sub-values, just files, attributes
including single and multi-values as well as associated multi-values
I agree that subvalues ought not to be used. Primarily this is because
they are not supported by the enquiry language. Secondarily because
the structure of an item becomes too complex to be readily understood.

However, I have seen them used in one circumstance that made sense to
me. This was in an application that was originally written with ALL
(Applied Language Liberator - I think). In ALL, an attribute could
contain a table - with the value marks being record seperators and the
subvalue marks acting as attribute marks.

In order to report on the subvalues I used a dictionary program to
convert them to multivalues which was something like this

001: SUBROUTINE MULTISUB(THAT,ATTR,SUBV)
002: *
003: * Where data is kept in subvalues, this returns a multi-valued
list
004: * from a given attribute of each occurrance of the specified
subvalue.
005: *
006: THAT = ''
007: IF NOT(ATTR MATCHES '1N0N') THEN RETURN
008: IF NOT(SUBV MATCHES '1N0N') THEN RETURN
009: LINE = @RECORD<ATTR>
010: YYNO = DCOUNT(LINE,@VM)
011: FOR YY = 1 TO YYNO
012: THAT<1,YY> = LINE<1,YY,SUBV>
013: NEXT YY
014: RETURN


Quote:
--using a single file for all of the relatively small lookup tables
(valid values for marital status, gender, hair color, customer status,
etc) which is frowned upon in the SQL world (we discussed this a while
back on cdp)
This is a reasonable way to reduce the number of files. It makes
perfect sense when this is regarded as a physical implementation of a
relational model. Each table is its own relation from the logical
point of view, but each one is an item in a file from the physical
point of view.

I would use multiple files to store tables of similar types. One might
be for the simplest type where there is a code and a description (like
gender). Another might be for more complex ones. For example those
that have a code, a description, an implementation date, and an
obsolescent date. For me, this is a bit tidier than cramming all the
tables in one file.

Quote:
--using files for metadata and generating dicts from those, using these
files as the basis for data entry forms/screens/pages & related
constraints as well as user-maintained business rules
This makes a lot of sense to me. In all the sites I have been on, the
dictionaries get filled up with a whole bunch of alternatives for
reporting. They just can't be relied on for understanding the
application system.

Quote:
--adding some misc user-defined single and multi-valued fields to
various files without any specified intent so users can use them as
needed to help tailor the system for their needs over time (like the
star in gmail)
This doesn't really resonate with me. It's so easy to add new fields
in that I don't see the need to pre-add them, as it were.

Quote:
--creating return links so you can navigate from file to file in both
directions (things like that really fly in the face of relational
theory, but hey ho)
Again, this is a physical system. This sort of thing is just adding an
index to the severely minimalist logical model.

Quote:
--adding associated multivalues with a status code and date stamp for
any status codes where history should be kept (rather than requiring
data marts and data warehouses for such information) and using this
same pattern for other specific attributes as needed, howbeit
sparingly.
The history is its own relation. In the MV world we just have a useful
way to clump it physically with the "source" relation.

Quote:
--keeping properties (not well-defined), whether single or
multi-valued, as attributes of an entity (also not well-defined other
than entity = "thing") in a single file definition rather than
splitting them out into separate files
In general, this ability to clump information is the heart of MV.
However, it is not antithetical to the relational model; rather, it is
orthogonal to it.

Quote:
--splitting out attributes into separate files based on functional
dependencies, such as is done in relational normalization, but getting
data into 3rd normal form without the need of 1NF.
Yes, functionality is the reason for the clumping. At the logical
level, the data should be fully normalised. It should be a mere
mechanical process to convert the MV structure into a table structure
that obviously follows the relational model. (Items are subtle, tables
are clumsy?)

Quote:
--identifying the main entities about which users will want to report
and beefing up those dictionaries (entities) with a lot of virtual
fields (or using 3rd party tools for reporting)
Yes, and this is why the dictionaries fill up with cruft. I don't
really have a strong objection to this, it's just that I seem to want
to use a word that someone else has used. The problem is that one
cannot change the dictionary item in case some report somewhere throws
a fit (or a did-not-fit).

Quote:
--choosing naming conventions, such as plural names for multivalues,
singular for single-values, and plural for entities (relational
standards are for tables to be singular, but LIST statements read much
better with plural names)
That's very much a matter of taste. A bit too specific to be a rule.
Quote:
I'm guessing I could come up with a bunch of other ideas for data
modeling best practices with multi-values since the above possibilities
are jumping off my fingertips, but this gives the idea of what I'm
looking for.

So, in answer to the question of what might be some industry best
practices for modeling/designing data for a Pick/MV implementation,
which of the above do you agree with and what would you add?

Thanks. --dawn
P.S. In the resulting blog entry, I'll acknowledge all those who help
build or refine the list unless you request no such acknowledgement.

To make the resulting blog as useful as possible, may I suggest you
present the pros and cons of each point. I'm assuming there will be
arguements presented on each side. Thank God this isn't C.D.T. where
the trump arguement is "Read the same book I once did, you
self-aggrandising ignorant - I'm gonna filter you out - nya nya nya -
I'm not listening again!".


A few things I do - coding standards really. It's not exactly on
topic, but what the heck.

1. Always open a file to a variable of the same name. EG

OPEN 'CUSTOMERS' TO CUSTOMERS

because this makes for easier searching through the source code. I
have seen the following variant advised (which seemed reasonable too)

OPEN 'CUSTOMERS' TO CUSTOMER.F

READ CUSTOMERS.R FROM CUSTOMERS.F, CUSTOMERS.I ....

2. Don't use capital i for a loop variable. On a bad printout it's
very difficult to dstinguish between capital i and numeric one. I use
XX, YY, and ZZ bcause they are easy to search for - capital i is
really, really hard - SO many hits.

3. Don't comment the bleeding obvious. I don't need "* OPEN FILES" set
off and underlined, followed by 15 open file commands. Set them off
with a little white space and I'll figure it out!

4. Comment what is difficult to understand!

5. Comment _why_ something was done - I can read _what_ was done.

6. And for crying out loud, SOMEWHERE say what the program is there
for!

7. Remove obsolete code, so I can readily see the logic that's used
now. Use a versioning system to keep old stuff, not the live source
file.


You may have guessed that I'm supporting some badly commented code, and
feel a bit frazzled sometimes.

Regards, Keith.



Reply With Quote
  #3  
Old   
dawn
 
Posts: n/a

Default Re: Data modeling in MV, revisited - 10-18-2006 , 09:02 PM



Pickie wrote:

Hi Keith --
I was wondering if the earthquake was in your neck of the woods, so
good to see you online.

Quote:
Hi Dawn,

I'm going to top, inter-, and bottom post - I hope for clarity.
I don't know if I'll get through the whole thing tonight, but I'll
start on it.

Quote:
I think it's easiest if one considers the Pick data structure to be a
physical implemenation of a version of the Relational Model.
Hmmm. I like to ignore the physical implementation for anything other
than performance tuning, preferring to abstract the data model a bit,
but I'll try to think of it your way when reading your comments. Of
course the def of "physical" tends to be interpreted differently by
different folks, but I suspect I can zero in on the layer you refer to
here.

Quote:
For me,
that seems to guide my efforts without restricting what I can do.

dawn wrote:
I'm hoping to write another blog entry (have been slack for a while)
related to modeling data in MV and I could use some help. I thought I
would state that up front so you know to ignore if you have no interest
in such.

Where traditional relational theory is easy to teach because it has
that nice 1, 2, 3 NF thing going for it so you can teach how to put the
data in first normal form and then go from there, typical MV s/w
developers might be more inclined to say something like "you just do
it" or "model the data the way it makes sense" or "model the data to
align with reality."

My take is that there really are some best practices that could be
taught, even if many of them are not as mathematical as the normal
forms. Perhaps some of the following are agreeable?
--designing a solution without sub-values, just files, attributes
including single and multi-values as well as associated multi-values

I agree that subvalues ought not to be used. Primarily this is because
they are not supported by the enquiry language. Secondarily because
the structure of an item becomes too complex to be readily understood.
I would reverse the order of the rationale, but otherwise agreed.
Since we can think of files (our version of a relation) as modeling
propositions, when the "language" of the data model gets too complex,
it is hard to develop and maintain.

Quote:
However, I have seen them used in one circumstance that made sense to
me. This was in an application that was originally written with ALL
(Applied Language Liberator - I think). In ALL, an attribute could
contain a table - with the value marks being record seperators and the
subvalue marks acting as attribute marks.

In order to report on the subvalues I used a dictionary program to
convert them to multivalues which was something like this
OK, I an buy this use -- you are pushing a subtree down a level for
persistence for some reason and popping it back up when using it, if I
am understanding. It isn't pretty, but I can imagine a non-standard
scenario when such a solution might be identified. I cannot think of a
reason this would be considered a "best practice" in MV data modeling,
however, and you are not suggesting that either.

Quote:
001: SUBROUTINE MULTISUB(THAT,ATTR,SUBV)
002: *
003: * Where data is kept in subvalues, this returns a multi-valued
list
004: * from a given attribute of each occurrance of the specified
subvalue.
005: *
006: THAT = ''
007: IF NOT(ATTR MATCHES '1N0N') THEN RETURN
008: IF NOT(SUBV MATCHES '1N0N') THEN RETURN
009: LINE = @RECORD<ATTR
010: YYNO = DCOUNT(LINE,@VM)
011: FOR YY = 1 TO YYNO
012: THAT<1,YY> = LINE<1,YY,SUBV
013: NEXT YY
014: RETURN


--using a single file for all of the relatively small lookup tables
(valid values for marital status, gender, hair color, customer status,
etc) which is frowned upon in the SQL world (we discussed this a while
back on cdp)

This is a reasonable way to reduce the number of files. It makes
perfect sense when this is regarded as a physical implementation of a
relational model. Each table is its own relation from the logical
point of view, but each one is an item in a file from the physical
point of view.
Yes. In this case there are both logical advantages and can definitely
be performance advantages too.

Quote:
I would use multiple files to store tables of similar types. One might
be for the simplest type where there is a code and a description (like
gender). Another might be for more complex ones. For example those
that have a code, a description, an implementation date, and an
obsolescent date. For me, this is a bit tidier than cramming all the
tables in one file.
As my youngest used to say, "Meh" (which I think means "6 of one, half
dozen of the other" in this case, but might just mean "I don't feel
like responding to that" ;-)

Quote:
--using files for metadata and generating dicts from those, using these
files as the basis for data entry forms/screens/pages & related
constraints as well as user-maintained business rules

This makes a lot of sense to me. In all the sites I have been on, the
dictionaries get filled up with a whole bunch of alternatives for
reporting. They just can't be relied on for understanding the
application system.
This also makes more sense to me than having all of the dbms
constraints specified in another format and stored separately from
user-defined constraints.

Quote:
--adding some misc user-defined single and multi-valued fields to
various files without any specified intent so users can use them as
needed to help tailor the system for their needs over time (like the
star in gmail)

This doesn't really resonate with me. It's so easy to add new fields
in that I don't see the need to pre-add them, as it were.
I tossed that one in to see what people thought of it. I have seen
this in multiple systems. In some sense there is a defined purpose to
them -- they are "tagging" fields so that the user can tag data as they
see fit, just as Google gives me a star in gmail. When I think of how
often I add columns in spreadsheets just to browse through the data and
give it some fresh tagging in order to select based on my new tags, I
think that something along these lines is a good idea. I'm not sure
how to say it so that it would sound like a good idea to others,
however.

Quote:
--creating return links so you can navigate from file to file in both
directions (things like that really fly in the face of relational
theory, but hey ho)

Again, this is a physical system. This sort of thing is just adding an
index to the severely minimalist logical model.
And my thinking was more at the logical level again, but OK.

Quote:
--adding associated multivalues with a status code and date stamp for
any status codes where history should be kept (rather than requiring
data marts and data warehouses for such information) and using this
same pattern for other specific attributes as needed, howbeit
sparingly.

The history is its own relation. In the MV world we just have a useful
way to clump it physically with the "source" relation.
It is not quite the same as having a separate relation for the history
when we keep history on individual attributes.

We tend to keep history where it is designed, so there is an issue when
we identify something we wish we had kept a history on. However, there
is often less reason to make regular copies like in a data warehouse
because of the history we do keep.

Quote:
--keeping properties (not well-defined), whether single or
multi-valued, as attributes of an entity (also not well-defined other
than entity = "thing") in a single file definition rather than
splitting them out into separate files

In general, this ability to clump information is the heart of MV.
Yes.

Quote:
However, it is not antithetical to the relational model; rather, it is
orthogonal to it.
I was thinking that it flies in the face of RM philosophies. You often
hear RM folks say things like "I'd rather have 1000 tables than 1000
lines of code," for example. Making new tables repeatedly, without
regard for what models an actual entity compared to a property list, is
at the backbone of the RM as implemented in SQL-DBMS's today.

If you peer into a typical UniData solution (the only MV system I have
worked in extensively other than Prime Information), you can understand
the business quite handily by picking out the files that relate to "key
entities" (those are the dicts with tons of I-descs, for example). If
you peer into an Oracle schema, it will take much longer to understand
the business that is being modeled with that schema. That is not a
coincidence or even a by-product -- that is based on a difference in
how developers are supposed to design the data model in each.

Quote:
--splitting out attributes into separate files based on functional
dependencies, such as is done in relational normalization, but getting
data into 3rd normal form without the need of 1NF.

Yes, functionality is the reason for the clumping. At the logical
level, the data should be fully normalised.
Excluding 1NF, which is hard to explain to someone who thinks 3NF is
defined in terms of first having 1NF data.

Quote:
It should be a mere
mechanical process to convert the MV structure into a table structure
that obviously follows the relational model.
Yes, excellent point. One nuance is in the ordering of multivalues and
the fact that we do not specify whether the order is relevant or not (a
failing on the part of the MV model, but not one that is too
troublesome).

Quote:
(Items are subtle, tables
are clumsy?)
Not tracking with you on this parenthetical comment, sorry, please
explain.

Quote:
--identifying the main entities about which users will want to report
and beefing up those dictionaries (entities) with a lot of virtual
fields (or using 3rd party tools for reporting)

Yes, and this is why the dictionaries fill up with cruft. I don't
really have a strong objection to this, it's just that I seem to want
to use a word that someone else has used. The problem is that one
cannot change the dictionary item in case some report somewhere throws
a fit (or a did-not-fit).
Yes, and it is harder to cross-reference fields used with dict names
when some of the code for apps is outside of the MV VM arena -- when
you are using Java or C# or PHP or... with a client-server interface to
access data. There is something painful, but charming, about thinking
of each "key entity" as its own little name space/vocabulary that
people might want to use when they are entering the system through that
"portal."

Quote:
--choosing naming conventions, such as plural names for multivalues,
singular for single-values, and plural for entities (relational
standards are for tables to be singular, but LIST statements read much
better with plural names)

That's very much a matter of taste. A bit too specific to be a rule.
The "choosing naming conventions" is the rule, the rest is a "such as"
and I agree that is too specific to be a rule.

Quote:
I'm guessing I could come up with a bunch of other ideas for data
modeling best practices with multi-values since the above possibilities
are jumping off my fingertips, but this gives the idea of what I'm
looking for.

So, in answer to the question of what might be some industry best
practices for modeling/designing data for a Pick/MV implementation,
which of the above do you agree with and what would you add?

Thanks. --dawn
P.S. In the resulting blog entry, I'll acknowledge all those who help
build or refine the list unless you request no such acknowledgement.


To make the resulting blog as useful as possible, may I suggest you
present the pros and cons of each point.
I don't have have an angle on how I will do it, but I wasn't planning
to argue each point, just give them since giving an argument for each
point might be too long or cumbersome. I'll see, but readers could be
encouraged to argue points in the comments. Most of my entries have
been for intended for both SQL and MV folks (as would this one), so
this one which focusses on MV data modeling also needs to be readable
by someone who has no MV experience. It's usefulness might include
modeling data within apps or between them, such as designing XML
documents, as well as in potentially broadening the knowledge of those
who did not know you could design data for persistence any way other
than using SQL and 1NF structures.

Quote:
I'm assuming there will be
arguements presented on each side. Thank God this isn't C.D.T. where
the trump arguement is "Read the same book I once did, you
self-aggrandising ignorant - I'm gonna filter you out - nya nya nya -
I'm not listening again!".
That "self-aggrandizing ignorant" charge is occasionally disturbing,
but since I know I can always learn more and be less ignorant, I asked
a friend how I could correct a "self-aggrandizing" defect, if my
accusers were correct. Of course she pandered to me and said it was
far from accurate, but then added that if it were true, it was best
corrected by ignoring the messenger. A Zen approach that I
appreciated.

Quote:
A few things I do - coding standards really. It's not exactly on
topic, but what the heck.

1. Always open a file to a variable of the same name. EG

OPEN 'CUSTOMERS' TO CUSTOMERS

because this makes for easier searching through the source code. I
have seen the following variant advised (which seemed reasonable too)

OPEN 'CUSTOMERS' TO CUSTOMER.F

READ CUSTOMERS.R FROM CUSTOMERS.F, CUSTOMERS.I ....
These are the types of best practices that Brian Leach was working to
collect at one point. I'm focussed here on designing the data model,
but I know that most MV folks don't separate that design from any of
the other design in the system in a big way. So, for me right now,
this is out of scope, but it would be good to collect such information
somewhere as Brian was doing (and might still be doing in a U2 best
practices collection).

Quote:
2. Don't use capital i for a loop variable. On a bad printout it's
very difficult to dstinguish between capital i and numeric one. I use
XX, YY, and ZZ bcause they are easy to search for - capital i is
really, really hard - SO many hits.

3. Don't comment the bleeding obvious. I don't need "* OPEN FILES" set
off and underlined, followed by 15 open file commands. Set them off
with a little white space and I'll figure it out!
But you gotta love those rows of asterisks, right? ;-)

Quote:
4. Comment what is difficult to understand!

5. Comment _why_ something was done - I can read _what_ was done.

6. And for crying out loud, SOMEWHERE say what the program is there
for!

7. Remove obsolete code, so I can readily see the logic that's used
now.
How does this square with "keep a history of the code in the code with
id/date/reason blocks on all changes"?

Quote:
Use a versioning system to keep old stuff, not the live source
file.
Ahh. Makes sense, but I'm old and don't trust this approach yet.

Quote:
You may have guessed that I'm supporting some badly commented code, and
feel a bit frazzled sometimes.
None of the rest of us has a clue what you are talking about ;-)
smiles. --dawn

Quote:
Regards, Keith.


Reply With Quote
  #4  
Old   
Steve Alexander
 
Posts: n/a

Default Re: Data modeling in MV, revisited - 10-18-2006 , 10:42 PM



Having designed a couple of hundred MV applications over the past 30+
years, and having given this considerable thought, I am inclined to
agree with the "you just do it" sentiment. The only best practice that
comes to mind is "common sense." I don't have any words of wisdom on
how to teach or acquire that. Most people know it when they see it.
- Steve Alexander

On 18 Oct 2006 15:11:42 -0700, "dawn" <dawnwolthuis (AT) gmail (DOT) com> wrote:

Quote:
I'm hoping to write another blog entry (have been slack for a while)
related to modeling data in MV and I could use some help. I thought I
would state that up front so you know to ignore if you have no interest
in such.

Where traditional relational theory is easy to teach because it has
that nice 1, 2, 3 NF thing going for it so you can teach how to put the
data in first normal form and then go from there, typical MV s/w
developers might be more inclined to say something like "you just do
it" or "model the data the way it makes sense" or "model the data to
align with reality."

My take is that there really are some best practices that could be
taught, even if many of them are not as mathematical as the normal
forms. Perhaps some of the following are agreeable?
--designing a solution without sub-values, just files, attributes
including single and multi-values as well as associated multi-values
--using a single file for all of the relatively small lookup tables
(valid values for marital status, gender, hair color, customer status,
etc) which is frowned upon in the SQL world (we discussed this a while
back on cdp)
--using files for metadata and generating dicts from those, using these
files as the basis for data entry forms/screens/pages & related
constraints as well as user-maintained business rules
--adding some misc user-defined single and multi-valued fields to
various files without any specified intent so users can use them as
needed to help tailor the system for their needs over time (like the
star in gmail)
--creating return links so you can navigate from file to file in both
directions (things like that really fly in the face of relational
theory, but hey ho)
--adding associated multivalues with a status code and date stamp for
any status codes where history should be kept (rather than requiring
data marts and data warehouses for such information) and using this
same pattern for other specific attributes as needed, howbeit
sparingly.
--keeping properties (not well-defined), whether single or
multi-valued, as attributes of an entity (also not well-defined other
than entity = "thing") in a single file definition rather than
splitting them out into separate files
--splitting out attributes into separate files based on functional
dependencies, such as is done in relational normalization, but getting
data into 3rd normal form without the need of 1NF.
--identifying the main entities about which users will want to report
and beefing up those dictionaries (entities) with a lot of virtual
fields (or using 3rd party tools for reporting)
--choosing naming conventions, such as plural names for multivalues,
singular for single-values, and plural for entities (relational
standards are for tables to be singular, but LIST statements read much
better with plural names)

I'm guessing I could come up with a bunch of other ideas for data
modeling best practices with multi-values since the above possibilities
are jumping off my fingertips, but this gives the idea of what I'm
looking for.

So, in answer to the question of what might be some industry best
practices for modeling/designing data for a Pick/MV implementation,
which of the above do you agree with and what would you add?

Thanks. --dawn
P.S. In the resulting blog entry, I'll acknowledge all those who help
build or refine the list unless you request no such acknowledgement.


Reply With Quote
  #5  
Old   
Kevin Powick
 
Posts: n/a

Default Re: Data modeling in MV, revisited - 10-18-2006 , 11:14 PM



dawn wrote:
Quote:
I'm hoping to write another blog entry (have been slack for a while)
related to modeling data in MV and I could use some help.
Certainly not MV specific, but every developer can probably find a lot
of useful best practices in Steve McConnell's book "Code Complete".

Amazon links below my sig.

IMO, it's a must read for any developer.

--
Kevin Powick


Tiny URL: http://tinyurl.com/ycjtbo

Full URL:

http://www.amazon.com/Code-Complete-...295161?ie=UTF8



Reply With Quote
  #6  
Old   
Excalibur
 
Posts: n/a

Default Re: Data modeling in MV, revisited - 10-19-2006 , 01:17 AM



Hi
I have snipped all as I agree with all your points Dawn and I don't want to
upset Bruce's bandwidth.
One I would like to add is separate fast moving data from static
particularly where audit trails are required. For example I believe that
recalculating the balance of an account from the transactions as the sole
source of that information is bad accounting practise. However one should
keep the check totals separate from the rest of the debtor control
information or your audit update of a Debtor will become buried in a myriad
of unnecessary transaction changes.

Also separate data that should be encrypted from other data, so keep a bank
information file separate from the rest of the debtor's file.

Allow redundancy. A poster sometime back suggested keeping the full address
data in the Invoice file as well as the key to that data. Not something I
did but with many years history it can be handy for tax investigations etc.

I Look forward to your blog resumption.
Peter McMurray




Reply With Quote
  #7  
Old   
dawn
 
Posts: n/a

Default Re: Data modeling in MV, revisited - 10-19-2006 , 08:28 AM




Steve Alexander wrote:
Quote:
Having designed a couple of hundred MV applications over the past 30+
years, and having given this considerable thought, I am inclined to
agree with the "you just do it" sentiment. The only best practice that
comes to mind is "common sense." I don't have any words of wisdom on
how to teach or acquire that. Most people know it when they see it.
- Steve Alexander
Thanks, Steve. I guess we could put that in the undergrad and graduate
text books for C/S majors and see if it catches on, but...

I'm guessing that over the years you have refined your data modeling
skills based on experience. I would like to try to capture some of
that to help give a head start to the next person. Surely there are
some things you have tried that have not worked as well as other
approaches, right? --dawn

Quote:
On 18 Oct 2006 15:11:42 -0700, "dawn" <dawnwolthuis (AT) gmail (DOT) com> wrote:

I'm hoping to write another blog entry (have been slack for a while)
related to modeling data in MV and I could use some help. I thought I
would state that up front so you know to ignore if you have no interest
in such.

Where traditional relational theory is easy to teach because it has
that nice 1, 2, 3 NF thing going for it so you can teach how to put the
data in first normal form and then go from there, typical MV s/w
developers might be more inclined to say something like "you just do
it" or "model the data the way it makes sense" or "model the data to
align with reality."

My take is that there really are some best practices that could be
taught, even if many of them are not as mathematical as the normal
forms. Perhaps some of the following are agreeable?
--designing a solution without sub-values, just files, attributes
including single and multi-values as well as associated multi-values
--using a single file for all of the relatively small lookup tables
(valid values for marital status, gender, hair color, customer status,
etc) which is frowned upon in the SQL world (we discussed this a while
back on cdp)
--using files for metadata and generating dicts from those, using these
files as the basis for data entry forms/screens/pages & related
constraints as well as user-maintained business rules
--adding some misc user-defined single and multi-valued fields to
various files without any specified intent so users can use them as
needed to help tailor the system for their needs over time (like the
star in gmail)
--creating return links so you can navigate from file to file in both
directions (things like that really fly in the face of relational
theory, but hey ho)
--adding associated multivalues with a status code and date stamp for
any status codes where history should be kept (rather than requiring
data marts and data warehouses for such information) and using this
same pattern for other specific attributes as needed, howbeit
sparingly.
--keeping properties (not well-defined), whether single or
multi-valued, as attributes of an entity (also not well-defined other
than entity = "thing") in a single file definition rather than
splitting them out into separate files
--splitting out attributes into separate files based on functional
dependencies, such as is done in relational normalization, but getting
data into 3rd normal form without the need of 1NF.
--identifying the main entities about which users will want to report
and beefing up those dictionaries (entities) with a lot of virtual
fields (or using 3rd party tools for reporting)
--choosing naming conventions, such as plural names for multivalues,
singular for single-values, and plural for entities (relational
standards are for tables to be singular, but LIST statements read much
better with plural names)

I'm guessing I could come up with a bunch of other ideas for data
modeling best practices with multi-values since the above possibilities
are jumping off my fingertips, but this gives the idea of what I'm
looking for.

So, in answer to the question of what might be some industry best
practices for modeling/designing data for a Pick/MV implementation,
which of the above do you agree with and what would you add?

Thanks. --dawn
P.S. In the resulting blog entry, I'll acknowledge all those who help
build or refine the list unless you request no such acknowledgement.


Reply With Quote
  #8  
Old   
dawn
 
Posts: n/a

Default Re: Data modeling in MV, revisited - 10-19-2006 , 08:32 AM




Kevin Powick wrote:
Quote:
dawn wrote:
I'm hoping to write another blog entry (have been slack for a while)
related to modeling data in MV and I could use some help.

Certainly not MV specific, but every developer can probably find a lot
of useful best practices in Steve McConnell's book "Code Complete".
Definitely a good book. I'll have to look in mine again to see if he
has tips for data modeling that are not based on 1NF. Coupling and
cohesion might be good overall threads to bring up related to data
modeling since that is where ideas like using one validation table
arise.

Quote:
Amazon links below my sig.

IMO, it's a must read for any developer.
Agreed. It has been a few years, so I'll haul it out again. Thanks.
--dawn

Quote:
--
Kevin Powick


Tiny URL: http://tinyurl.com/ycjtbo

Full URL:

http://www.amazon.com/Code-Complete-...295161?ie=UTF8


Reply With Quote
  #9  
Old   
dawn
 
Posts: n/a

Default Re: Data modeling in MV, revisited - 10-19-2006 , 08:48 AM



Excalibur wrote:
Quote:
Hi
I have snipped all as I agree with all your points Dawn and I don't want to
upset Bruce's bandwidth.
One I would like to add is separate fast moving data from static
particularly where audit trails are required. For example I believe that
recalculating the balance of an account from the transactions as the sole
source of that information is bad accounting practise.
Good point, and different from the typical "no redundant data"
preachings.

Quote:
However one should
keep the check totals separate from the rest of the debtor control
information or your audit update of a Debtor will become buried in a myriad
of unnecessary transaction changes.
I'm not sure I'm tracking here. Are you saying that if you have a
value that you will have redundant in that you will store a total as
well as the components of the total (rather than only deriving the
total from the components), then you want to be sure that the amounts
are partitioned from other data so that you do not have to recalc the
total unnecessarily? Maybe an example baby data model to illustrate
what you are suggestion would help, if you have a chance.

Quote:
Also separate data that should be encrypted from other data, so keep a bank
information file separate from the rest of the debtor's file.
Good point. When you say "separate data" I gather you mean that you
want to have attributes in different entities files even if they are
related to the same external (real world) entity.

Quote:
Allow redundancy. A poster sometime back suggested keeping the full address
data in the Invoice file as well as the key to that data. Not something I
did but with many years history it can be handy for tax investigations etc.
Another good point on redundancy. These addresses are not really
redundant in that one is a "point in time" address while the other is
fluid.

Quote:
I Look forward to your blog resumption.
Peter McMurray
Thanks, Peter. --dawn



Reply With Quote
  #10  
Old   
Excalibur
 
Posts: n/a

Default Re: Data modeling in MV, revisited - 10-19-2006 , 05:28 PM



Hi Dawn
<snip>
When I first wrote a Debtors system it was on A3 cards with a mag stripe so
everything Debtor code, name, address, credit limits, balances etc were on
the one item and the transactions were both printed on the card and spun off
to a cassette tape.
This data storage idea tended to stick in my mind when I hit Pick in 1976.
The first to go was the address and price information. This was because
credit is a function of the total debtor but delivery and pricing is a
function of the delivery point. So I wound up with Pricing and Delivery
information in an address file with Name and credit information plus the
current balance in the Debtors Master file. The balance was there for two
or three reasons - one was fast retrieval, the second was to ensure that
credit over all addresses did not exceed the customers limit, and the third
was to ensure system integrity. It gave me a check point to cover
corruption either accidental or deliberate of the transaction file plus a
couple of other issues. I carried this approach over into the Stock file as
well. My initial training and qualifications is as an auditor and I shudder
when I see systems that allow Debtor or Stock deletion and recreation. I
have seen people delete a stock item and recreate it with the balance they
wanted (not on my system) when in fact the stuff was walking out of the door
which was proven when my system was installed. Defalcation or just plain
laziness is not new and costs money. Many a new auditor has been fooled by
accepting an adding machine strip as correct or the old cards for that
matter. The technique is simple enter the details that you want, roll the
strip or card forward, total out, enter the answer you want, roll the strip
back, total out. Bingo! everything balances.
I settled on an open item transaction system backed up by Invoices, Credits
and Receipts files and the balances in the Debtors Control File plus an
Accounting Batch control file that cross checked the lot. It is a benefit
every month when the automatic checking picks up some half completed job and
the chance of defalcation requires not only a deep knowledge of all the
files but access to the already printed audit reports. It has enabled me to
rebuild a system after some one working too late into the night for a week
or so has hit CTRL ALT DEL in a sleepy haze during a massive update and with
no backups for a couple of weeks!
Bank Master information was next to be pulled out into its own file for
encryption.
Then ENRON happened and that wonderful pair in the US passed the most ill
considered legislation in the history of accounting sending accountants into
a massive spin. Designing all sorts of ways to prove that X = Y, of course
they ignored the checks and balances already existing in the system and
demanded separate audit trails. OOPS! after 20 years having the balances in
the same item as the static data becomes an issue because every transaction
causes a new audit trail of the master as well as all the other stuff we
have making the trail quite useless - people will weed a garden but give up
on a forest. Of course we solved it but it would have been much easier with
the dynamic data - balances - separate in its own little area.
Sorry about the length.
Peter McMurray

\\

Quote:
However one should
keep the check totals separate from the rest of the debtor control
information or your audit update of a Debtor will become buried in a
myriad
of unnecessary transaction changes.

I'm not sure I'm tracking here. Are you saying that if you have a
value that you will have redundant in that you will store a total as
well as the components of the total (rather than only deriving the
total from the components), then you want to be sure that the amounts
are partitioned from other data so that you do not have to recalc the
total unnecessarily? Maybe an example baby data model to illustrate
what you are suggestion would help, if you have a chance.

Also separate data that should be encrypted from other data, so keep a
bank
information file separate from the rest of the debtor's file.

Good point. When you say "separate data" I gather you mean that you
want to have attributes in different entities files even if they are
related to the same external (real world) entity.

Allow redundancy. A poster sometime back suggested keeping the full
address
data in the Invoice file as well as the key to that data. Not something
I
did but with many years history it can be handy for tax investigations
etc.

Another good point on redundancy. These addresses are not really
redundant in that one is a "point in time" address while the other is
fluid.

I Look forward to your blog resumption.
Peter McMurray

Thanks, Peter. --dawn




Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.