![]() | |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Hello: I am trying to learn SQL and RDBMS theories. To acheive this, I decided to assign myself a project. My project is basically, account for my NFS mounts, and keep track of them. I want to see which filesystems are growing, and which filesystems are staying the same. I will get an inventory daily. I will grab the data and place it into a database, using PERL or AWK. [some illustrative data] So my intentention is, to show do stats on these filesystems, and see when it grows and when it stays the same. I will get these stats daily, and hopefully in a month, I can see what is growing, shrinking, and staying the same. Any ideas on the schema? do I really need a RDBMS for this? TIA |
#3
| |||
| |||
|
|
On Dec 25, 12:37 pm, magawake <magaw... (AT) gmail (DOT) com> wrote:> Hello: I am trying to learn SQL and RDBMS theories. To acheive this, I decided to assign myself a project. My project is basically, account for my NFS mounts, and keep track of them. I want to see which filesystems are growing, and which filesystems are staying the same. I will get an inventory daily. I will grab the data and place it into a database, using PERL or AWK. [some illustrative data] So my intentention is, to show do stats on these filesystems, and see when it grows and when it stays the same. I will get these stats daily, and hopefully in a month, I can see what is growing, shrinking, and staying the same. Any ideas on the schema? do I really need a RDBMS for this? TIA A fascinating question I've long pondered. Since I am interested in the general question, I'll take the liberty to restate it, accurately I hope: Since a relational database represents the model state of a known universe, is it possible to ask/answer the question: How has the database changed over time? Snodgrass advocates adding begin/end time attributes, for the data entry window and the data validity window (with representations for minus- and plus infinity). But am I really modeling change-over-time or just adding timeframes to snapshots? (Like using a timed exposure to take a picture with a camera when there's too little light.) Would it not be better to utilize the OP's data capture approach by periodically evaluating a query, saving the query response externally and computing "deltas" between query responses (extensions) over time?? (This is fundamentally the data warehouse approach in which a relational database is used to capture periodic snapshots.) What do other people think about this? Is it better to implement a temporal database or to capture snapshots of a non-temporal database (externally or in a data warehouse) and compute the changes? Rob |
#4
| |||
| |||
|
|
On Dec 25, 12:37 pm, magawake <magaw... (AT) gmail (DOT) com> wrote:> Hello: I am trying to learn SQL and RDBMS theories. To acheive this, I decided to assign myself a project. My project is basically, account for my NFS mounts, and keep track of them. I want to see which filesystems are growing, and which filesystems are staying the same. I will get an inventory daily. I will grab the data and place it into a database, using PERL or AWK. [some illustrative data] So my intentention is, to show do stats on these filesystems, and see when it grows and when it stays the same. I will get these stats daily, and hopefully in a month, I can see what is growing, shrinking, and staying the same. Any ideas on the schema? do I really need a RDBMS for this? TIA A fascinating question I've long pondered. Since I am interested in the general question, I'll take the liberty to restate it, accurately I hope: Since a relational database represents the model state of a known universe, is it possible to ask/answer the question: How has the database changed over time? Snodgrass advocates adding begin/end time attributes, for the data entry window and the data validity window (with representations for minus- and plus infinity). But am I really modeling change-over-time or just adding timeframes to snapshots? (Like using a timed exposure to take a picture with a camera when there's too little light.) Would it not be better to utilize the OP's data capture approach by periodically evaluating a query, saving the query response externally and computing "deltas" between query responses (extensions) over time?? (This is fundamentally the data warehouse approach in which a relational database is used to capture periodic snapshots.) What do other people think about this? Is it better to implement a temporal database or to capture snapshots of a non-temporal database (externally or in a data warehouse) and compute the changes? |
#5
| |||
| |||
|
|
On Dec 25, 12:37 pm, magawake <magaw... (AT) gmail (DOT) com> wrote:> Hello: I am trying to learn SQL and RDBMS theories. To acheive this, I decided to assign myself a project. My project is basically, account for my NFS mounts, and keep track of them. I want to see which filesystems are growing, and which filesystems are staying the same. I will get an inventory daily. I will grab the data and place it into a database, using PERL or AWK. [some illustrative data] So my intentention is, to show do stats on these filesystems, and see when it grows and when it stays the same. I will get these stats daily, and hopefully in a month, I can see what is growing, shrinking, and staying the same. Any ideas on the schema? do I really need a RDBMS for this? TIA A fascinating question I've long pondered. Since I am interested in the general question, I'll take the liberty to restate it, accurately I hope: Since a relational database represents the model state of a known universe, is it possible to ask/answer the question: How has the database changed over time? |
|
Snodgrass advocates adding begin/end time attributes, for the data entry window and the data validity window (with representations for minus- and plus infinity). But am I really modeling change-over-time or just adding timeframes to snapshots? (Like using a timed exposure to take a picture with a camera when there's too little light.) What does databases have to do with pictures? |
|
Would it not be better to utilize the OP's data capture approach by periodically evaluating a query, saving the query response externally and computing "deltas" between query responses (extensions) over time?? (This is fundamentally the data warehouse approach in which a relational database is used to capture periodic snapshots.) What do other people think about this? Is it better to implement a temporal database or to capture snapshots of a non-temporal database (externally or in a data warehouse) and compute the changes? What is a *temporal database*? Rob |
#6
| |||
| |||
|
|
Thanks for the response Rob. I want to model change-over-time. Lets say after 30 days, (I should have 30 different usages but the same x file systems. There is the possibility to have more file systems, for example if *I want to add a share for e books on the 25th day, I will only have 5 usages). Any thoughts, on the schema, now? I don't have that much experience incorporating time into data models. |
#7
| ||||||||
| ||||||||
|
|
Interesting to how this relates to a discussion in another thread... I'll try to respond to your well-thought-out response, though you have |
|
Let S' = S + d denote the application of delta d to state S to yield a new state S'. * Often S+d is much easier to calculate than d = S' - S. *In fact the latter is not always uniquely defined. Agreed. In current systems, d1,....,dn would seem to correspond to a |
|
Consider that there are two separate databases. *The State-DB records the current state S without any regard for history. * The History-DB only records the history as a sequence of deltas [d1,d2,...,dn] but is not directly concerned with calculating the current state. Here I think you are expressing the spirit of the question: Is it OK |
|
There is a concept of applying outstanding deltas to S to bring it up to date: * * S' = S + [dm,...,dn]. In that sense the State-DB is just a read only view and only the History-DB is regarded as the primary source of the data to which updates are applied with transactions. *Furthermore these updates tend to only add extra tuples to relvars rather than removing or modifying existing tuples. *Therefore in theory the History-DB can provide excellent ingestion rates and be optimised for writing new deltas as sequential data to disk. *Note that the information (as deltas) in the History-DB is total ordered. Just one observation: Sets are not ordered, so the total ordering of |
|
IMO bringing the State-DB up to date is best performed asynchronously. *The State-DB can persist a sequence number to ensure it applies each delta in the total order exactly once. *A thread can access the State-DB and by reading the sequence number, easily determine the subset of the History-DB that is associated with the current state of the State-DB. *Furthermore, in theory with pure additive changes to the History-DB it would be possible to allow read only threads to access the History-DB concurrently with its mutative transactions. *This is kind of like MVCC for free. *See * *http://en.wikipedia.org/wiki/Multive...rrency_control I see your point. But of course with "pure additive changes", your |
|
Often deltas can be regarded as events occurring at some point in space and time. * For example, a marriage event causes the state to change from unmarried to married. *A birth event causes a person to be added to a family tree database. Clearly such events are to be recorded but shouldn't require subsequent modification. I tend to think of a History-DB as an *events* database, and often represents the proper way to record the primary source of information (as distinct from the current state which is going to depend a lot more on one's conceptual model and the application). *I would go as far as saying that the events DB should be associated with the base relvars, and the DBMS should support the asynchronous calculation of the current state as a read only view. Again, a little more complex than I can follow. If the current state |
|
I imagine that only an events DB has some realistic chance of making the world's data look like a single information source - because it has less application bias and completely avoids the quandaries of distributed transactions. "making the the world's data look like a single information source" is |
|
For certain applications I anticipate that an OO physical representation could be desirable for caching specialised uses (navigations) of the data in a derived State-DB. *However there is little doubt that the RM is best for general use. Note that a History-DB can easily and efficiently make its deltas available to any number of asynchronous State-DBs without blocking its mutative threads (and upsetting the History-DB ingestion rate). When scaling to distributed data sources (ie a State-DB works against multiple History-DBs), vector times are relevant. See * *http://en.wikipedia.org/wiki/Vector_clocks- Hide quoted text - - Show quoted text - |
#8
| |||||||
| |||||||
|
|
On Dec 25, 9:10 pm, David BL <davi... (AT) iinet (DOT) net.au> wrote:> Interesting to how this relates to a discussion in another thread... I'll try to respond to your well-thought-out response, though you have gone way beyond my question. Let S' = S + d denote the application of delta d to state S to yield a new state S'. Often S+d is much easier to calculate than d = S' - S. In fact the latter is not always uniquely defined. Agreed. In current systems, d1,....,dn would seem to correspond to a change log. The log can be used to roll forward or roll back. At issue is the form of d. If each d is a before image and and after image, then d is uniquely defined, though S'-S is not. |
|
Consider that there are two separate databases. The State-DB records the current state S without any regard for history. The History-DB only records the history as a sequence of deltas [d1,d2,...,dn] but is not directly concerned with calculating the current state. Here I think you are expressing the spirit of the question: Is it OK to store the deltas in a database model? Just to be clear, let's assume that the initial database state S0 (S zero) is the empty database. As the OP suggests, I could maintain a sequence of database states S0, S1, ..., Sn in which case I could (imperfectly) compute the deltas. Or, I can maintain a sequence of deltas and perfectly compute any state. You suggest that I could have two sequences, one of deltas and one (of lower periodicity) for states. I think that's what your next paragraph says: |
|
There is a concept of applying outstanding deltas to S to bring it up to date: S' = S + [dm,...,dn]. In that sense the State-DB is just a read only view and only the History-DB is regarded as the primary source of the data to which updates are applied with transactions. Furthermore these updates tend to only add extra tuples to relvars rather than removing or modifying existing tuples. Therefore in theory the History-DB can provide excellent ingestion rates and be optimised for writing new deltas as sequential data to disk. Note that the information (as deltas) in the History-DB is total ordered. Just one observation: Sets are not ordered, so the total ordering of the History-DB is up to the app, not the RDBMS. Your point about better "ingestion rates" intuitively makes sense, but can the database designer cause the RDBMS to write tuples sequentially to disk? This kind of optimization would likely be implemented at the system level, so in a way, you are suggesting an architectural meta meta model far different from the SQL meta meta model, at least as I understand it. |
|
IMO bringing the State-DB up to date is best performed asynchronously. The State-DB can persist a sequence number to ensure it applies each delta in the total order exactly once. A thread can access the State-DB and by reading the sequence number, easily determine the subset of the History-DB that is associated with the current state of the State-DB. Furthermore, in theory with pure additive changes to the History-DB it would be possible to allow read only threads to access the History-DB concurrently with its mutative transactions. This is kind of like MVCC for free. See http://en.wikipedia.org/wiki/Multive...rrency_control I see your point. But of course with "pure additive changes", your above assertion (d = S' - S is not uniquely defined) no longer holds. |
|
The wikipedia discussion around MVCC gets into more "how" than is necessary here: The OP doesn't suggest multiple updaters, just a database-as-recording-device from which he could compute deltas. Personally, I need to understand the single user variant before wading into the more complex, multiuser case. |
|
Often deltas can be regarded as events occurring at some point in space and time. For example, a marriage event causes the state to change from unmarried to married. A birth event causes a person to be added to a family tree database. Clearly such events are to be recorded but shouldn't require subsequent modification. I tend to think of a History-DB as an *events* database, and often represents the proper way to record the primary source of information (as distinct from the current state which is going to depend a lot more on one's conceptual model and the application). I would go as far as saying that the events DB should be associated with the base relvars, and the DBMS should support the asynchronous calculation of the current state as a read only view. Again, a little more complex than I can follow. If the current state is a read only view, then either it is never brought up to date, or, the database is periodically shut down and all the events since the last current state are applied, replacing the old current state with the new current state. I assume there is some read only view of the events that answers the question "what has changed between the old and new current states?" (That's the question the OP wants to answer.) |
|
I imagine that only an events DB has some realistic chance of making the world's data look like a single information source - because it has less application bias and completely avoids the quandaries of distributed transactions. "making the the world's data look like a single information source" is beyond my ken as is the rest of your reply. (I'm just a humble toolmaker.) I do thank you for taking the time to reply. |
#9
| |||
| |||
|
![]() |
| Thread Tools | |
| Display Modes | |
| |