dbTalk Databases Forums  

Hundreds of columns, index on all of those.

comp.databases comp.databases


Discuss Hundreds of columns, index on all of those. in the comp.databases forum.



Reply
 
Thread Tools Display Modes
  #41  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Hundreds of columns, index on all of those. - 05-08-2008 , 07:29 AM






On May 8, 12:23 am, rock_san... (AT) yahoo (DOT) com wrote:
Quote:
From the last one, I assume no updates between loads, only deletes and
inserts.

Yes, there are no updates. There are only insertions and deletions,
running all the time.

Good idea, own implementation of a DBMS with files.

One table?
Would it still require 900 indexes, if there would be 900 columns. I
would just need to create the indexes by my self.

- -
Items arrive to several groups at the same time and all the time.

Even if there would be own table/file per one group, there would be
900 columns per Item still.

There is also table partitioning in some RDBMs. Deletion would be very
fast, if table (table 2) would be partitioned (or if there would be
separate table per each group).
A partitioned table is quite the same as having separate tables per
each group.
Group ID would be needed to be added to the table 2 regarding how
partitioning of the table would be best to be implemented.

But I am consentrating on the queries and having an index on all of
the columns in a good way, at least from performance of insertions and
performance of queries points of views.
- -

One thought is in my mind too...

If e.g. Oracle or some other RDBMS has not been able to offer a better
solution(or do those offer?), would I be able to implement a better
one?

I have not implemented indexes etc. by myself, so I probably just do
not know enough about this kind of implementation of my "own DBMS". I
would need to investigate this issue more.

Some practical examples presented in theory would be needed, if
someone is able to implement this kind of mechanism / system.

- -

Reasons to "stick" with an official RDBMS (MySQL, Oracle,
Postgre, ...):
- Indexes
- Array insert (Oracle, others?)
- Support to use the DB with threads or by several process at the same
time (mutexes)
- Compression of indexes (and maybe compression of tables, in some
version of some RDBMS)
- My own implementation of a "DB system" with flat data files might
probably get more easily corrupted.
An already existing RDBMS might be more robust.
- Need to access the DB with different programming languages from
tier-2 (queries) and tier-3 (insertions).
- Partitioned tables (in some RDBMSs)
- Some other reasons? ( An SQL interface was already mentioned. It
might be quite important too. )

All ideas are welcome.
I need to go through all of those.

Why don't you tell us what problem you were trying to solve

I can only tell that Items arrive and values from those are
monitored and analyzed.

- -

What leads you to require so many columns?

It is the situation. An Item which arrives has all of those columns.
It cannot be affected.

Why is an index required on all

columns?

A value is wanted to be searched from one column (inside one group) at
a time. Queries would be too slow, if there would not be an index on
those columns.
It seems to be the situation. And these queries are the issue which I
try to solve. Index would be needed or some other solution(?) to have
good performance with such queries.

Does each column specify exactly one attribute?

Yes.
Each column is independent in an Item.
All columns still belong to the Item.

Do all the attributes

describe the same "thing"

I am not sure what is exactly being asked.
All columns (or attributes) describe the Item.
Columns are attributes of an Item.

How do you plan on dealing with situations where there is a row,
and a
column, but no value that ought to be stored in
the intersetion of the
given row and the given column?

I am sorry, you would need to give an example.
I was not able to understand exactly what the situation would be.
I was trying to get an example from you. Let me be very distinct in my
question:
Are all 900 columns identical? meaning:
same data type
(the columns are all CHAR, NUMBER, BLOB, other???)
same domain of data
(the column are all character, representing DNA sequences)
same semantics
(the columns are all CHAR, representing Colors, which are a coloring
of a map)

We are trying to understand YOUR situation.
Quote:
If there is 900 columns per one Item. Then all of those columns would
have a value.
The same value?

Quote:
Next Item could have only 100 columns (or attributes) and then only
those 100 columns would have a value.
Aha! Now a new clue. If there can be variations, are the attributes
really groups of attributes? These groups could be normalized in
different tables.
Quote:
So, there are:
- Groups
- Items, which belong to groups
- attributes which belong to the Items
Then each Item has also a type. One Item can have e.g. 900 attributes
or 100 attributes depending on the type.
Aha
If they are different types, your table is really denormalized. You
could split them on load into tables according to their type.

I still have to question the need for being able to query on any
column. Basically what you are saying is any column could be to only
one in the condition (WHERE) clause of a query AND that there is
enough data in the table to make a full table scan too slow. I still
am having trouble believing that.

I think you should consider normalizing the data.then even if the
query requirement is true, the number of rows in the 900 column table
will be reduced possibly to the point where the 900 indices are not
absolutely necessary.

This discussion in the abstract really is less productive than
necessary. Please consider giving some examples of your real problem.
Obscure the real problem domain if you can (if this is proprietary or
otherwise secret), but I really see little progress without knowing
what the data looks like.

But if I must, I'll give one final general comment about this:

Your problem is finding the level of optimization needed (do you need
indices or not) to achieve a given performance level. Generally, the
way to find out performance improvements is to test. Knowing you data
and database features helps, but actually running tests is the way to
confirm performance meets the goal.

Ed


Reply With Quote
  #42  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Hundreds of columns, index on all of those. - 05-08-2008 , 07:29 AM






On May 8, 12:23 am, rock_san... (AT) yahoo (DOT) com wrote:
Quote:
From the last one, I assume no updates between loads, only deletes and
inserts.

Yes, there are no updates. There are only insertions and deletions,
running all the time.

Good idea, own implementation of a DBMS with files.

One table?
Would it still require 900 indexes, if there would be 900 columns. I
would just need to create the indexes by my self.

- -
Items arrive to several groups at the same time and all the time.

Even if there would be own table/file per one group, there would be
900 columns per Item still.

There is also table partitioning in some RDBMs. Deletion would be very
fast, if table (table 2) would be partitioned (or if there would be
separate table per each group).
A partitioned table is quite the same as having separate tables per
each group.
Group ID would be needed to be added to the table 2 regarding how
partitioning of the table would be best to be implemented.

But I am consentrating on the queries and having an index on all of
the columns in a good way, at least from performance of insertions and
performance of queries points of views.
- -

One thought is in my mind too...

If e.g. Oracle or some other RDBMS has not been able to offer a better
solution(or do those offer?), would I be able to implement a better
one?

I have not implemented indexes etc. by myself, so I probably just do
not know enough about this kind of implementation of my "own DBMS". I
would need to investigate this issue more.

Some practical examples presented in theory would be needed, if
someone is able to implement this kind of mechanism / system.

- -

Reasons to "stick" with an official RDBMS (MySQL, Oracle,
Postgre, ...):
- Indexes
- Array insert (Oracle, others?)
- Support to use the DB with threads or by several process at the same
time (mutexes)
- Compression of indexes (and maybe compression of tables, in some
version of some RDBMS)
- My own implementation of a "DB system" with flat data files might
probably get more easily corrupted.
An already existing RDBMS might be more robust.
- Need to access the DB with different programming languages from
tier-2 (queries) and tier-3 (insertions).
- Partitioned tables (in some RDBMSs)
- Some other reasons? ( An SQL interface was already mentioned. It
might be quite important too. )

All ideas are welcome.
I need to go through all of those.

Why don't you tell us what problem you were trying to solve

I can only tell that Items arrive and values from those are
monitored and analyzed.

- -

What leads you to require so many columns?

It is the situation. An Item which arrives has all of those columns.
It cannot be affected.

Why is an index required on all

columns?

A value is wanted to be searched from one column (inside one group) at
a time. Queries would be too slow, if there would not be an index on
those columns.
It seems to be the situation. And these queries are the issue which I
try to solve. Index would be needed or some other solution(?) to have
good performance with such queries.

Does each column specify exactly one attribute?

Yes.
Each column is independent in an Item.
All columns still belong to the Item.

Do all the attributes

describe the same "thing"

I am not sure what is exactly being asked.
All columns (or attributes) describe the Item.
Columns are attributes of an Item.

How do you plan on dealing with situations where there is a row,
and a
column, but no value that ought to be stored in
the intersetion of the
given row and the given column?

I am sorry, you would need to give an example.
I was not able to understand exactly what the situation would be.
I was trying to get an example from you. Let me be very distinct in my
question:
Are all 900 columns identical? meaning:
same data type
(the columns are all CHAR, NUMBER, BLOB, other???)
same domain of data
(the column are all character, representing DNA sequences)
same semantics
(the columns are all CHAR, representing Colors, which are a coloring
of a map)

We are trying to understand YOUR situation.
Quote:
If there is 900 columns per one Item. Then all of those columns would
have a value.
The same value?

Quote:
Next Item could have only 100 columns (or attributes) and then only
those 100 columns would have a value.
Aha! Now a new clue. If there can be variations, are the attributes
really groups of attributes? These groups could be normalized in
different tables.
Quote:
So, there are:
- Groups
- Items, which belong to groups
- attributes which belong to the Items
Then each Item has also a type. One Item can have e.g. 900 attributes
or 100 attributes depending on the type.
Aha
If they are different types, your table is really denormalized. You
could split them on load into tables according to their type.

I still have to question the need for being able to query on any
column. Basically what you are saying is any column could be to only
one in the condition (WHERE) clause of a query AND that there is
enough data in the table to make a full table scan too slow. I still
am having trouble believing that.

I think you should consider normalizing the data.then even if the
query requirement is true, the number of rows in the 900 column table
will be reduced possibly to the point where the 900 indices are not
absolutely necessary.

This discussion in the abstract really is less productive than
necessary. Please consider giving some examples of your real problem.
Obscure the real problem domain if you can (if this is proprietary or
otherwise secret), but I really see little progress without knowing
what the data looks like.

But if I must, I'll give one final general comment about this:

Your problem is finding the level of optimization needed (do you need
indices or not) to achieve a given performance level. Generally, the
way to find out performance improvements is to test. Knowing you data
and database features helps, but actually running tests is the way to
confirm performance meets the goal.

Ed


Reply With Quote
  #43  
Old   
Ed Prochak
 
Posts: n/a

Default Re: Hundreds of columns, index on all of those. - 05-08-2008 , 07:29 AM



On May 8, 12:23 am, rock_san... (AT) yahoo (DOT) com wrote:
Quote:
From the last one, I assume no updates between loads, only deletes and
inserts.

Yes, there are no updates. There are only insertions and deletions,
running all the time.

Good idea, own implementation of a DBMS with files.

One table?
Would it still require 900 indexes, if there would be 900 columns. I
would just need to create the indexes by my self.

- -
Items arrive to several groups at the same time and all the time.

Even if there would be own table/file per one group, there would be
900 columns per Item still.

There is also table partitioning in some RDBMs. Deletion would be very
fast, if table (table 2) would be partitioned (or if there would be
separate table per each group).
A partitioned table is quite the same as having separate tables per
each group.
Group ID would be needed to be added to the table 2 regarding how
partitioning of the table would be best to be implemented.

But I am consentrating on the queries and having an index on all of
the columns in a good way, at least from performance of insertions and
performance of queries points of views.
- -

One thought is in my mind too...

If e.g. Oracle or some other RDBMS has not been able to offer a better
solution(or do those offer?), would I be able to implement a better
one?

I have not implemented indexes etc. by myself, so I probably just do
not know enough about this kind of implementation of my "own DBMS". I
would need to investigate this issue more.

Some practical examples presented in theory would be needed, if
someone is able to implement this kind of mechanism / system.

- -

Reasons to "stick" with an official RDBMS (MySQL, Oracle,
Postgre, ...):
- Indexes
- Array insert (Oracle, others?)
- Support to use the DB with threads or by several process at the same
time (mutexes)
- Compression of indexes (and maybe compression of tables, in some
version of some RDBMS)
- My own implementation of a "DB system" with flat data files might
probably get more easily corrupted.
An already existing RDBMS might be more robust.
- Need to access the DB with different programming languages from
tier-2 (queries) and tier-3 (insertions).
- Partitioned tables (in some RDBMSs)
- Some other reasons? ( An SQL interface was already mentioned. It
might be quite important too. )

All ideas are welcome.
I need to go through all of those.

Why don't you tell us what problem you were trying to solve

I can only tell that Items arrive and values from those are
monitored and analyzed.

- -

What leads you to require so many columns?

It is the situation. An Item which arrives has all of those columns.
It cannot be affected.

Why is an index required on all

columns?

A value is wanted to be searched from one column (inside one group) at
a time. Queries would be too slow, if there would not be an index on
those columns.
It seems to be the situation. And these queries are the issue which I
try to solve. Index would be needed or some other solution(?) to have
good performance with such queries.

Does each column specify exactly one attribute?

Yes.
Each column is independent in an Item.
All columns still belong to the Item.

Do all the attributes

describe the same "thing"

I am not sure what is exactly being asked.
All columns (or attributes) describe the Item.
Columns are attributes of an Item.

How do you plan on dealing with situations where there is a row,
and a
column, but no value that ought to be stored in
the intersetion of the
given row and the given column?

I am sorry, you would need to give an example.
I was not able to understand exactly what the situation would be.
I was trying to get an example from you. Let me be very distinct in my
question:
Are all 900 columns identical? meaning:
same data type
(the columns are all CHAR, NUMBER, BLOB, other???)
same domain of data
(the column are all character, representing DNA sequences)
same semantics
(the columns are all CHAR, representing Colors, which are a coloring
of a map)

We are trying to understand YOUR situation.
Quote:
If there is 900 columns per one Item. Then all of those columns would
have a value.
The same value?

Quote:
Next Item could have only 100 columns (or attributes) and then only
those 100 columns would have a value.
Aha! Now a new clue. If there can be variations, are the attributes
really groups of attributes? These groups could be normalized in
different tables.
Quote:
So, there are:
- Groups
- Items, which belong to groups
- attributes which belong to the Items
Then each Item has also a type. One Item can have e.g. 900 attributes
or 100 attributes depending on the type.
Aha
If they are different types, your table is really denormalized. You
could split them on load into tables according to their type.

I still have to question the need for being able to query on any
column. Basically what you are saying is any column could be to only
one in the condition (WHERE) clause of a query AND that there is
enough data in the table to make a full table scan too slow. I still
am having trouble believing that.

I think you should consider normalizing the data.then even if the
query requirement is true, the number of rows in the 900 column table
will be reduced possibly to the point where the 900 indices are not
absolutely necessary.

This discussion in the abstract really is less productive than
necessary. Please consider giving some examples of your real problem.
Obscure the real problem domain if you can (if this is proprietary or
otherwise secret), but I really see little progress without knowing
what the data looks like.

But if I must, I'll give one final general comment about this:

Your problem is finding the level of optimization needed (do you need
indices or not) to achieve a given performance level. Generally, the
way to find out performance improvements is to test. Knowing you data
and database features helps, but actually running tests is the way to
confirm performance meets the goal.

Ed


Reply With Quote
  #44  
Old   
rock_sand81@yahoo.com
 
Posts: n/a

Default Re: Hundreds of columns, index on all of those. - 05-09-2008 , 06:04 AM



Quote:
How do you plan on dealing with situations where there is a row,
and a
column, but no value that ought to be stored in
the intersetion of the
given row and the given column?

I am sorry, you would need to give an example.
I was not able to understand exactly what the situation would be.

I was trying to get an example from you. Let me be very distinct in my
question:
Are all 900 columns identical? meaning:
same data type
(the columns are all CHAR, NUMBER, BLOB, other???)
same domain of data
(the column are all character, representing DNA sequences)
same semantics
(the columns are all CHAR, representing Colors, which are a coloring
of a map)


All 900 columns will be VARCHAR types.
Even though also numbers are stored. -> Numbers are also stored as
VARCHARs.


Quote:
If there is 900 columns per one Item. Then all of those columns would
have a value.

The same value?

There are different kinds of values. All are stored as VARCHARs.


Quote:
Next Item could have only 100 columns (or attributes) and then only
those 100 columns would have a value.

Aha! Now a new clue. If there can be variations, are the attributes
really groups of attributes? These groups could be normalized in
different tables.

Each item can have e.g. 900 columns (or attributes) or 100 columns.
Group of 900 attributes form one type of an Item, or group of 100
attributes form an other type of Item. I call these groups of
attributes as types of Items.
*But*, there are many of those types of Items. Maybe even tens. So I
am not sure would it be nice to make a separate table for each of the
types of Items. (If that would be suggested.)
New types of Items can be formed and then a new table would be needed
to be created for those too.
Scalability would not be so good, if there would be separate tables
for all types of Items.
And the performance problem (related to the query) might still exist
with those tables too.

But yes, it would be an alternative still to have separate tables for
all types of Items.
--

Quote:
I still have to question the need for being able to query on any
column.
A query to fetched particular value (or values) from a column would
take tens of seconds or minute (or something like that) with one
RDBMS, if index is not be used on the column (on all of the 900
columns).
That would be too slow for my application.
That might be the requirement for the performance of the query.
So it looks like the index would be needed, at least with table
structure 3, which I tried. The query took too long. The query would
be too slow with table structure alternative 1 too (one row and e.g.
900 columns).


I mean this kind of a query:
First fetch a set of Item IDs.
Then fetch some value from one column from those Item IDs, which were
found.
( Tested with alternative 3 for table structures. All columns on
separate rows. )

Even if all types of Items would have own tables, the query might be
too slow and an index might be needed on all of the columns still. And
I would like to avoid tables with e.g. 900 columns, even if indexes
would not be needed on the columns.



Quote:
Knowing you data
and database features helps, but actually running tests is the way to
confirm performance meets the goal.

Yes, it seems I just need to test.
-

Quote:
but I really see little progress without knowing
what the data looks like.
Yes, this seems to be all I can write.


-------------------------------------------------

I have tried to describe as much about the data as I can.

SUMMARY

I will write summary once still:

Functionality: monitor and analyze Items.
New type of query needed: fetch set of Item IDs, which has value x on
column y and which belongs to group z and the type of the Item ID
should be w.

There are:
- Groups. Different (types of) Items can belong to one group.
- Items. These belong to Groups.
- Attributes which belong to the Items.
Each Item has also a type. One Item can have e.g. 900 attributes
or 100 attributes depending on the type. There are tens of types of
Items, so there can be Items, which have an amount of 200, 300,
400, ..., etc. attributes.

Attributes of the Items are stored as VARCHARs.

Requirement: I need to fetch some particular value or values from one
of 900 columns at a time. The query must be done rather in seconds
(not in tens of seconds or in minutes, or hours). Values are fetched
only inside a group and inside the type of Items.

So query is made on these Item IDs
(table structure alternative 3, all columns on separate rows):

select Item ID from table1 where group ID = ? and Item type = ?;

And only those Item IDs are chosen, which have value x on column y:

select Item ID from table2 where column_ID = ? and value = ?;

Requirement: Performance of insertions need to be "good enough".

Size of the DB is maybe not so crucial, but it would be quite big with
some table structures.


-----------

Table structure alternative #3 once more:

Table 1:
Columns: Item ID, Item name, group ID
Index 1.1 on (Item ID)
Index 1.2 on (group ID) (or something like this)

Table 2:
Columns: Item ID, column ID, value
Index 2.1 on (Item ID)
Index 2.2 on (value, column ID).


Thank you!
All comments are welcome still of course!





Reply With Quote
  #45  
Old   
rock_sand81@yahoo.com
 
Posts: n/a

Default Re: Hundreds of columns, index on all of those. - 05-09-2008 , 06:04 AM



Quote:
How do you plan on dealing with situations where there is a row,
and a
column, but no value that ought to be stored in
the intersetion of the
given row and the given column?

I am sorry, you would need to give an example.
I was not able to understand exactly what the situation would be.

I was trying to get an example from you. Let me be very distinct in my
question:
Are all 900 columns identical? meaning:
same data type
(the columns are all CHAR, NUMBER, BLOB, other???)
same domain of data
(the column are all character, representing DNA sequences)
same semantics
(the columns are all CHAR, representing Colors, which are a coloring
of a map)


All 900 columns will be VARCHAR types.
Even though also numbers are stored. -> Numbers are also stored as
VARCHARs.


Quote:
If there is 900 columns per one Item. Then all of those columns would
have a value.

The same value?

There are different kinds of values. All are stored as VARCHARs.


Quote:
Next Item could have only 100 columns (or attributes) and then only
those 100 columns would have a value.

Aha! Now a new clue. If there can be variations, are the attributes
really groups of attributes? These groups could be normalized in
different tables.

Each item can have e.g. 900 columns (or attributes) or 100 columns.
Group of 900 attributes form one type of an Item, or group of 100
attributes form an other type of Item. I call these groups of
attributes as types of Items.
*But*, there are many of those types of Items. Maybe even tens. So I
am not sure would it be nice to make a separate table for each of the
types of Items. (If that would be suggested.)
New types of Items can be formed and then a new table would be needed
to be created for those too.
Scalability would not be so good, if there would be separate tables
for all types of Items.
And the performance problem (related to the query) might still exist
with those tables too.

But yes, it would be an alternative still to have separate tables for
all types of Items.
--

Quote:
I still have to question the need for being able to query on any
column.
A query to fetched particular value (or values) from a column would
take tens of seconds or minute (or something like that) with one
RDBMS, if index is not be used on the column (on all of the 900
columns).
That would be too slow for my application.
That might be the requirement for the performance of the query.
So it looks like the index would be needed, at least with table
structure 3, which I tried. The query took too long. The query would
be too slow with table structure alternative 1 too (one row and e.g.
900 columns).


I mean this kind of a query:
First fetch a set of Item IDs.
Then fetch some value from one column from those Item IDs, which were
found.
( Tested with alternative 3 for table structures. All columns on
separate rows. )

Even if all types of Items would have own tables, the query might be
too slow and an index might be needed on all of the columns still. And
I would like to avoid tables with e.g. 900 columns, even if indexes
would not be needed on the columns.



Quote:
Knowing you data
and database features helps, but actually running tests is the way to
confirm performance meets the goal.

Yes, it seems I just need to test.
-

Quote:
but I really see little progress without knowing
what the data looks like.
Yes, this seems to be all I can write.


-------------------------------------------------

I have tried to describe as much about the data as I can.

SUMMARY

I will write summary once still:

Functionality: monitor and analyze Items.
New type of query needed: fetch set of Item IDs, which has value x on
column y and which belongs to group z and the type of the Item ID
should be w.

There are:
- Groups. Different (types of) Items can belong to one group.
- Items. These belong to Groups.
- Attributes which belong to the Items.
Each Item has also a type. One Item can have e.g. 900 attributes
or 100 attributes depending on the type. There are tens of types of
Items, so there can be Items, which have an amount of 200, 300,
400, ..., etc. attributes.

Attributes of the Items are stored as VARCHARs.

Requirement: I need to fetch some particular value or values from one
of 900 columns at a time. The query must be done rather in seconds
(not in tens of seconds or in minutes, or hours). Values are fetched
only inside a group and inside the type of Items.

So query is made on these Item IDs
(table structure alternative 3, all columns on separate rows):

select Item ID from table1 where group ID = ? and Item type = ?;

And only those Item IDs are chosen, which have value x on column y:

select Item ID from table2 where column_ID = ? and value = ?;

Requirement: Performance of insertions need to be "good enough".

Size of the DB is maybe not so crucial, but it would be quite big with
some table structures.


-----------

Table structure alternative #3 once more:

Table 1:
Columns: Item ID, Item name, group ID
Index 1.1 on (Item ID)
Index 1.2 on (group ID) (or something like this)

Table 2:
Columns: Item ID, column ID, value
Index 2.1 on (Item ID)
Index 2.2 on (value, column ID).


Thank you!
All comments are welcome still of course!





Reply With Quote
  #46  
Old   
rock_sand81@yahoo.com
 
Posts: n/a

Default Re: Hundreds of columns, index on all of those. - 05-09-2008 , 06:04 AM



Quote:
How do you plan on dealing with situations where there is a row,
and a
column, but no value that ought to be stored in
the intersetion of the
given row and the given column?

I am sorry, you would need to give an example.
I was not able to understand exactly what the situation would be.

I was trying to get an example from you. Let me be very distinct in my
question:
Are all 900 columns identical? meaning:
same data type
(the columns are all CHAR, NUMBER, BLOB, other???)
same domain of data
(the column are all character, representing DNA sequences)
same semantics
(the columns are all CHAR, representing Colors, which are a coloring
of a map)


All 900 columns will be VARCHAR types.
Even though also numbers are stored. -> Numbers are also stored as
VARCHARs.


Quote:
If there is 900 columns per one Item. Then all of those columns would
have a value.

The same value?

There are different kinds of values. All are stored as VARCHARs.


Quote:
Next Item could have only 100 columns (or attributes) and then only
those 100 columns would have a value.

Aha! Now a new clue. If there can be variations, are the attributes
really groups of attributes? These groups could be normalized in
different tables.

Each item can have e.g. 900 columns (or attributes) or 100 columns.
Group of 900 attributes form one type of an Item, or group of 100
attributes form an other type of Item. I call these groups of
attributes as types of Items.
*But*, there are many of those types of Items. Maybe even tens. So I
am not sure would it be nice to make a separate table for each of the
types of Items. (If that would be suggested.)
New types of Items can be formed and then a new table would be needed
to be created for those too.
Scalability would not be so good, if there would be separate tables
for all types of Items.
And the performance problem (related to the query) might still exist
with those tables too.

But yes, it would be an alternative still to have separate tables for
all types of Items.
--

Quote:
I still have to question the need for being able to query on any
column.
A query to fetched particular value (or values) from a column would
take tens of seconds or minute (or something like that) with one
RDBMS, if index is not be used on the column (on all of the 900
columns).
That would be too slow for my application.
That might be the requirement for the performance of the query.
So it looks like the index would be needed, at least with table
structure 3, which I tried. The query took too long. The query would
be too slow with table structure alternative 1 too (one row and e.g.
900 columns).


I mean this kind of a query:
First fetch a set of Item IDs.
Then fetch some value from one column from those Item IDs, which were
found.
( Tested with alternative 3 for table structures. All columns on
separate rows. )

Even if all types of Items would have own tables, the query might be
too slow and an index might be needed on all of the columns still. And
I would like to avoid tables with e.g. 900 columns, even if indexes
would not be needed on the columns.



Quote:
Knowing you data
and database features helps, but actually running tests is the way to
confirm performance meets the goal.

Yes, it seems I just need to test.
-

Quote:
but I really see little progress without knowing
what the data looks like.
Yes, this seems to be all I can write.


-------------------------------------------------

I have tried to describe as much about the data as I can.

SUMMARY

I will write summary once still:

Functionality: monitor and analyze Items.
New type of query needed: fetch set of Item IDs, which has value x on
column y and which belongs to group z and the type of the Item ID
should be w.

There are:
- Groups. Different (types of) Items can belong to one group.
- Items. These belong to Groups.
- Attributes which belong to the Items.
Each Item has also a type. One Item can have e.g. 900 attributes
or 100 attributes depending on the type. There are tens of types of
Items, so there can be Items, which have an amount of 200, 300,
400, ..., etc. attributes.

Attributes of the Items are stored as VARCHARs.

Requirement: I need to fetch some particular value or values from one
of 900 columns at a time. The query must be done rather in seconds
(not in tens of seconds or in minutes, or hours). Values are fetched
only inside a group and inside the type of Items.

So query is made on these Item IDs
(table structure alternative 3, all columns on separate rows):

select Item ID from table1 where group ID = ? and Item type = ?;

And only those Item IDs are chosen, which have value x on column y:

select Item ID from table2 where column_ID = ? and value = ?;

Requirement: Performance of insertions need to be "good enough".

Size of the DB is maybe not so crucial, but it would be quite big with
some table structures.


-----------

Table structure alternative #3 once more:

Table 1:
Columns: Item ID, Item name, group ID
Index 1.1 on (Item ID)
Index 1.2 on (group ID) (or something like this)

Table 2:
Columns: Item ID, column ID, value
Index 2.1 on (Item ID)
Index 2.2 on (value, column ID).


Thank you!
All comments are welcome still of course!





Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.