![]() | |
![]() |
| | Thread Tools | Display Modes |
#41
| ||||
| ||||
|
|
From the last one, I assume no updates between loads, only deletes and inserts. Yes, there are no updates. There are only insertions and deletions, running all the time. Good idea, own implementation of a DBMS with files. One table? Would it still require 900 indexes, if there would be 900 columns. I would just need to create the indexes by my self. - - Items arrive to several groups at the same time and all the time. Even if there would be own table/file per one group, there would be 900 columns per Item still. There is also table partitioning in some RDBMs. Deletion would be very fast, if table (table 2) would be partitioned (or if there would be separate table per each group). A partitioned table is quite the same as having separate tables per each group. Group ID would be needed to be added to the table 2 regarding how partitioning of the table would be best to be implemented. But I am consentrating on the queries and having an index on all of the columns in a good way, at least from performance of insertions and performance of queries points of views. - - One thought is in my mind too... If e.g. Oracle or some other RDBMS has not been able to offer a better solution(or do those offer?), would I be able to implement a better one? I have not implemented indexes etc. by myself, so I probably just do not know enough about this kind of implementation of my "own DBMS". I would need to investigate this issue more. Some practical examples presented in theory would be needed, if someone is able to implement this kind of mechanism / system. - - Reasons to "stick" with an official RDBMS (MySQL, Oracle, Postgre, ...): - Indexes - Array insert (Oracle, others?) - Support to use the DB with threads or by several process at the same time (mutexes) - Compression of indexes (and maybe compression of tables, in some version of some RDBMS) - My own implementation of a "DB system" with flat data files might probably get more easily corrupted. An already existing RDBMS might be more robust. - Need to access the DB with different programming languages from tier-2 (queries) and tier-3 (insertions). - Partitioned tables (in some RDBMSs) - Some other reasons? ( An SQL interface was already mentioned. It might be quite important too. ) All ideas are welcome. I need to go through all of those. Why don't you tell us what problem you were trying to solve I can only tell that Items arrive and values from those are monitored and analyzed. - - What leads you to require so many columns? It is the situation. An Item which arrives has all of those columns. It cannot be affected. Why is an index required on all columns? A value is wanted to be searched from one column (inside one group) at a time. Queries would be too slow, if there would not be an index on those columns. It seems to be the situation. And these queries are the issue which I try to solve. Index would be needed or some other solution(?) to have good performance with such queries. Does each column specify exactly one attribute? Yes. Each column is independent in an Item. All columns still belong to the Item. Do all the attributes describe the same "thing" I am not sure what is exactly being asked. All columns (or attributes) describe the Item. Columns are attributes of an Item. How do you plan on dealing with situations where there is a row, and a column, but no value that ought to be stored in the intersetion of the given row and the given column? I am sorry, you would need to give an example. I was not able to understand exactly what the situation would be. |
|
If there is 900 columns per one Item. Then all of those columns would have a value. |
|
Next Item could have only 100 columns (or attributes) and then only those 100 columns would have a value. |
|
So, there are: - Groups - Items, which belong to groups - attributes which belong to the Items Then each Item has also a type. One Item can have e.g. 900 attributes or 100 attributes depending on the type. |
#42
| ||||
| ||||
|
|
From the last one, I assume no updates between loads, only deletes and inserts. Yes, there are no updates. There are only insertions and deletions, running all the time. Good idea, own implementation of a DBMS with files. One table? Would it still require 900 indexes, if there would be 900 columns. I would just need to create the indexes by my self. - - Items arrive to several groups at the same time and all the time. Even if there would be own table/file per one group, there would be 900 columns per Item still. There is also table partitioning in some RDBMs. Deletion would be very fast, if table (table 2) would be partitioned (or if there would be separate table per each group). A partitioned table is quite the same as having separate tables per each group. Group ID would be needed to be added to the table 2 regarding how partitioning of the table would be best to be implemented. But I am consentrating on the queries and having an index on all of the columns in a good way, at least from performance of insertions and performance of queries points of views. - - One thought is in my mind too... If e.g. Oracle or some other RDBMS has not been able to offer a better solution(or do those offer?), would I be able to implement a better one? I have not implemented indexes etc. by myself, so I probably just do not know enough about this kind of implementation of my "own DBMS". I would need to investigate this issue more. Some practical examples presented in theory would be needed, if someone is able to implement this kind of mechanism / system. - - Reasons to "stick" with an official RDBMS (MySQL, Oracle, Postgre, ...): - Indexes - Array insert (Oracle, others?) - Support to use the DB with threads or by several process at the same time (mutexes) - Compression of indexes (and maybe compression of tables, in some version of some RDBMS) - My own implementation of a "DB system" with flat data files might probably get more easily corrupted. An already existing RDBMS might be more robust. - Need to access the DB with different programming languages from tier-2 (queries) and tier-3 (insertions). - Partitioned tables (in some RDBMSs) - Some other reasons? ( An SQL interface was already mentioned. It might be quite important too. ) All ideas are welcome. I need to go through all of those. Why don't you tell us what problem you were trying to solve I can only tell that Items arrive and values from those are monitored and analyzed. - - What leads you to require so many columns? It is the situation. An Item which arrives has all of those columns. It cannot be affected. Why is an index required on all columns? A value is wanted to be searched from one column (inside one group) at a time. Queries would be too slow, if there would not be an index on those columns. It seems to be the situation. And these queries are the issue which I try to solve. Index would be needed or some other solution(?) to have good performance with such queries. Does each column specify exactly one attribute? Yes. Each column is independent in an Item. All columns still belong to the Item. Do all the attributes describe the same "thing" I am not sure what is exactly being asked. All columns (or attributes) describe the Item. Columns are attributes of an Item. How do you plan on dealing with situations where there is a row, and a column, but no value that ought to be stored in the intersetion of the given row and the given column? I am sorry, you would need to give an example. I was not able to understand exactly what the situation would be. |
|
If there is 900 columns per one Item. Then all of those columns would have a value. |
|
Next Item could have only 100 columns (or attributes) and then only those 100 columns would have a value. |
|
So, there are: - Groups - Items, which belong to groups - attributes which belong to the Items Then each Item has also a type. One Item can have e.g. 900 attributes or 100 attributes depending on the type. |
#43
| ||||
| ||||
|
|
From the last one, I assume no updates between loads, only deletes and inserts. Yes, there are no updates. There are only insertions and deletions, running all the time. Good idea, own implementation of a DBMS with files. One table? Would it still require 900 indexes, if there would be 900 columns. I would just need to create the indexes by my self. - - Items arrive to several groups at the same time and all the time. Even if there would be own table/file per one group, there would be 900 columns per Item still. There is also table partitioning in some RDBMs. Deletion would be very fast, if table (table 2) would be partitioned (or if there would be separate table per each group). A partitioned table is quite the same as having separate tables per each group. Group ID would be needed to be added to the table 2 regarding how partitioning of the table would be best to be implemented. But I am consentrating on the queries and having an index on all of the columns in a good way, at least from performance of insertions and performance of queries points of views. - - One thought is in my mind too... If e.g. Oracle or some other RDBMS has not been able to offer a better solution(or do those offer?), would I be able to implement a better one? I have not implemented indexes etc. by myself, so I probably just do not know enough about this kind of implementation of my "own DBMS". I would need to investigate this issue more. Some practical examples presented in theory would be needed, if someone is able to implement this kind of mechanism / system. - - Reasons to "stick" with an official RDBMS (MySQL, Oracle, Postgre, ...): - Indexes - Array insert (Oracle, others?) - Support to use the DB with threads or by several process at the same time (mutexes) - Compression of indexes (and maybe compression of tables, in some version of some RDBMS) - My own implementation of a "DB system" with flat data files might probably get more easily corrupted. An already existing RDBMS might be more robust. - Need to access the DB with different programming languages from tier-2 (queries) and tier-3 (insertions). - Partitioned tables (in some RDBMSs) - Some other reasons? ( An SQL interface was already mentioned. It might be quite important too. ) All ideas are welcome. I need to go through all of those. Why don't you tell us what problem you were trying to solve I can only tell that Items arrive and values from those are monitored and analyzed. - - What leads you to require so many columns? It is the situation. An Item which arrives has all of those columns. It cannot be affected. Why is an index required on all columns? A value is wanted to be searched from one column (inside one group) at a time. Queries would be too slow, if there would not be an index on those columns. It seems to be the situation. And these queries are the issue which I try to solve. Index would be needed or some other solution(?) to have good performance with such queries. Does each column specify exactly one attribute? Yes. Each column is independent in an Item. All columns still belong to the Item. Do all the attributes describe the same "thing" I am not sure what is exactly being asked. All columns (or attributes) describe the Item. Columns are attributes of an Item. How do you plan on dealing with situations where there is a row, and a column, but no value that ought to be stored in the intersetion of the given row and the given column? I am sorry, you would need to give an example. I was not able to understand exactly what the situation would be. |
|
If there is 900 columns per one Item. Then all of those columns would have a value. |
|
Next Item could have only 100 columns (or attributes) and then only those 100 columns would have a value. |
|
So, there are: - Groups - Items, which belong to groups - attributes which belong to the Items Then each Item has also a type. One Item can have e.g. 900 attributes or 100 attributes depending on the type. |
#44
| ||||||
| ||||||
|
|
How do you plan on dealing with situations where there is a row, and a column, but no value that ought to be stored in the intersetion of the given row and the given column? I am sorry, you would need to give an example. I was not able to understand exactly what the situation would be. I was trying to get an example from you. Let me be very distinct in my question: Are all 900 columns identical? meaning: same data type (the columns are all CHAR, NUMBER, BLOB, other???) same domain of data (the column are all character, representing DNA sequences) same semantics (the columns are all CHAR, representing Colors, which are a coloring of a map) |
|
If there is 900 columns per one Item. Then all of those columns would have a value. The same value? |
|
Next Item could have only 100 columns (or attributes) and then only those 100 columns would have a value. Aha! Now a new clue. If there can be variations, are the attributes really groups of attributes? These groups could be normalized in different tables. |
|
I still have to question the need for being able to query on any column. |
|
Knowing you data and database features helps, but actually running tests is the way to confirm performance meets the goal. |
|
but I really see little progress without knowing what the data looks like. |
#45
| ||||||
| ||||||
|
|
How do you plan on dealing with situations where there is a row, and a column, but no value that ought to be stored in the intersetion of the given row and the given column? I am sorry, you would need to give an example. I was not able to understand exactly what the situation would be. I was trying to get an example from you. Let me be very distinct in my question: Are all 900 columns identical? meaning: same data type (the columns are all CHAR, NUMBER, BLOB, other???) same domain of data (the column are all character, representing DNA sequences) same semantics (the columns are all CHAR, representing Colors, which are a coloring of a map) |
|
If there is 900 columns per one Item. Then all of those columns would have a value. The same value? |
|
Next Item could have only 100 columns (or attributes) and then only those 100 columns would have a value. Aha! Now a new clue. If there can be variations, are the attributes really groups of attributes? These groups could be normalized in different tables. |
|
I still have to question the need for being able to query on any column. |
|
Knowing you data and database features helps, but actually running tests is the way to confirm performance meets the goal. |
|
but I really see little progress without knowing what the data looks like. |
#46
| ||||||
| ||||||
|
|
How do you plan on dealing with situations where there is a row, and a column, but no value that ought to be stored in the intersetion of the given row and the given column? I am sorry, you would need to give an example. I was not able to understand exactly what the situation would be. I was trying to get an example from you. Let me be very distinct in my question: Are all 900 columns identical? meaning: same data type (the columns are all CHAR, NUMBER, BLOB, other???) same domain of data (the column are all character, representing DNA sequences) same semantics (the columns are all CHAR, representing Colors, which are a coloring of a map) |
|
If there is 900 columns per one Item. Then all of those columns would have a value. The same value? |
|
Next Item could have only 100 columns (or attributes) and then only those 100 columns would have a value. Aha! Now a new clue. If there can be variations, are the attributes really groups of attributes? These groups could be normalized in different tables. |
|
I still have to question the need for being able to query on any column. |
|
Knowing you data and database features helps, but actually running tests is the way to confirm performance meets the goal. |
|
but I really see little progress without knowing what the data looks like. |
![]() |
| Thread Tools | |
| Display Modes | |
| |