![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
I have an entity that along with all of the usual attributes, has a set of 140 bit flag attributes that I need to track all yes/no values of. I started going down the path of creating a table 150 columns wide, but after typing a few of them thought there may be a better way. :-) I am thinking of creating a table like this: Create table details ( DetailsID int identity(1000,1), Name nvarchar(40), Descr nvarchar(60), Category nvarchar(40) ) Thus I can take what would have been my column headers and use them in Name field, throw what would have been the bit flag value and throw that into the description field. I would tie them together with a category value to combine them into logical groups. I have around 1.2 million entities to store this info for, so this new table will get long fast. Another passing thought was to store these sets of values as XML for each entity. I have to admit I dont know what the advantage is for that other than the row count savings. how easy would it be to search through a 140 field xml blob for 1 million entites? Any advice is appreciated on how to tackle this. |
#3
| |||
| |||
|
|
carl.henth... (AT) gmail (DOT) com wrote: I have an entity that along with all of the usual attributes, has a set of 140 bit flag attributes that I need to track all yes/no values of. I started going down the path of creating a table 150 columns wide, but after typing a few of them thought there may be a better way. :-) I am thinking of creating a table like this: Create table details ( * DetailsID int identity(1000,1), * Name nvarchar(40), * Descr nvarchar(60), * Category nvarchar(40) ) Thus I can take what would have been my column headers and use them in Name field, throw what would have been the bit flag value and throw that into the description field. I would tie them together with a category value to combine them into logical groups. I have around 1.2 million entities to store this info for, so this new table will get long fast. Another passing thought was to store these sets of values as XML for each entity. I have to admit I dont know what the advantage is for that other than the row count savings. how easy would it be to search through a 140 field xml blob for 1 million entites? Any advice is appreciated on how to tackle this. Before folks start pointing and laughing and snickering about you re-inventing EAV yet again, I thought I would point out that the simplest logical representation for a boolean in a relational system is simply the existence of a tuple in a relation. Thus, a design that has 150 tables each with a single column theoretically suffices. On the other hand, such a design would clearly violate the POOD. Some might question the principle of orthogonal design, but I think one ignores it at one's own risk when creating 150 tables with identical predicates. At a more basic level, though, are you sure you have correctly modelled your problem? 150 independent booleans creates a state machine with somewhere around 10^45 states. That's a big state machine. Without any sort of transition constraints, that creates a fully connected state machine with 10^45 states and somewhere around 10^90 allowable transitions. Those are big numbers, and it seems unlikely you really need such an unwieldy state machine for each row of your table.- Hide quoted text - - Show quoted text - |
#4
| |||
| |||
|
|
carl.henth... (AT) gmail (DOT) com wrote: I have an entity that along with all of the usual attributes, has a set of 140 bit flag attributes that I need to track all yes/no values of. I started going down the path of creating a table 150 columns wide, but after typing a few of them thought there may be a better way. :-) I am thinking of creating a table like this: Create table details ( DetailsID int identity(1000,1), Name nvarchar(40), Descr nvarchar(60), Category nvarchar(40) ) Thus I can take what would have been my column headers and use them in Name field, throw what would have been the bit flag value and throw that into the description field. I would tie them together with a category value to combine them into logical groups. I have around 1.2 million entities to store this info for, so this new table will get long fast. Another passing thought was to store these sets of values as XML for each entity. I have to admit I dont know what the advantage is for that other than the row count savings. how easy would it be to search through a 140 field xml blob for 1 million entites? Any advice is appreciated on how to tackle this. Before folks start pointing and laughing and snickering about you re-inventing EAV yet again, I thought I would point out that the simplest logical representation for a boolean in a relational system is simply the existence of a tuple in a relation. Thus, a design that has 150 tables each with a single column theoretically suffices. On the other hand, such a design would clearly violate the POOD. Some might question the principle of orthogonal design, but I think one ignores it at one's own risk when creating 150 tables with identical predicates. At a more basic level, though, are you sure you have correctly modelled your problem? 150 independent booleans creates a state machine with somewhere around 10^45 states. That's a big state machine. Without any sort of transition constraints, that creates a fully connected state machine with 10^45 states and somewhere around 10^90 allowable transitions. Those are big numbers, and it seems unlikely you really need such an unwieldy state machine for each row of your table.- Hide quoted text - - Show quoted text - Thank you for bypassing the hazing of the rookie, I appreciate it greatly! :-) For the purposes of this table, my entity is a person, and the 150 bit fields are a combination of columns describing all of the travel interests that they may have in a "yes I am interested", or "No, I am not interested" format. While it is possible to have a person interested in all 150 travel destinations, it is very unlikely, thus giving me a subset to work with. Since the set of "yes" values determines which marketing pieces get sent out, I do need to keep track of those, obviously, but I do not see the reason for keeping track of the "no's" since marketing would not be sent to them anyways. In my proposed table the absense of a row is the same as a "No" value. Being moslty self taught, I was not familier with the term EAV. But after looking it up, I can see that my issue is very simualr to a patient-symptom set of data for a doctor. Sounds like this is the way to go after all. I still have to wonder if there is a better way, however, to store/ read that info. Perhaps a quasi-binary where each bit represents a field in a footprint? or even the XML blob. thank you again for the response! |
#5
| |||
| |||
|
|
At a more basic level, though, are you sure you have correctly modelled your problem? 150 independent booleans creates a state machine with somewhere around 10^45 states. That's a big state machine. Without any sort of transition constraints, that creates a fully connected state machine with 10^45 states and somewhere around 10^90 allowable transitions. Those are big numbers, and it seems unlikely you really need such an unwieldy state machine for each row of your table. |
#6
| |||
| |||
|
|
On Jan 22, 12:53 pm, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote: carl.henth... (AT) gmail (DOT) com wrote: I have an entity that along with all of the usual attributes, has a set of 140 bit flag attributes that I need to track all yes/no values of. I started going down the path of creating a table 150 columns wide, but after typing a few of them thought there may be a better way. :-) I am thinking of creating a table like this: Create table details ( DetailsID int identity(1000,1), Name nvarchar(40), Descr nvarchar(60), Category nvarchar(40) ) Thus I can take what would have been my column headers and use them in Name field, throw what would have been the bit flag value and throw that into the description field. I would tie them together with a category value to combine them into logical groups. I have around 1.2 million entities to store this info for, so this new table will get long fast. Another passing thought was to store these sets of values as XML for each entity. I have to admit I dont know what the advantage is for that other than the row count savings. how easy would it be to search through a 140 field xml blob for 1 million entites? Any advice is appreciated on how to tackle this. Before folks start pointing and laughing and snickering about you re-inventing EAV yet again, I thought I would point out that the simplest logical representation for a boolean in a relational system is simply the existence of a tuple in a relation. Thus, a design that has 150 tables each with a single column theoretically suffices. On the other hand, such a design would clearly violate the POOD. Some might question the principle of orthogonal design, but I think one ignores it at one's own risk when creating 150 tables with identical predicates. At a more basic level, though, are you sure you have correctly modelled your problem? 150 independent booleans creates a state machine with somewhere around 10^45 states. That's a big state machine. Without any sort of transition constraints, that creates a fully connected state machine with 10^45 states and somewhere around 10^90 allowable transitions. Those are big numbers, and it seems unlikely you really need such an unwieldy state machine for each row of your table.- Hide quoted text - - Show quoted text - Thank you for bypassing the hazing of the rookie, I appreciate it greatly! :-) For the purposes of this table, my entity is a person, and the 150 bit fields are a combination of columns describing all of the travel interests that they may have in a "yes I am interested", or "No, I am not interested" format. While it is possible to have a person interested in all 150 travel destinations, it is very unlikely, thus giving me a subset to work with. Since the set of "yes" values determines which marketing pieces get sent out, I do need to keep track of those, obviously, but I do not see the reason for keeping track of the "no's" since marketing would not be sent to them anyways. In my proposed table the absense of a row is the same as a "No" value. Being moslty self taught, I was not familier with the term EAV. But after looking it up, I can see that my issue is very simualr to a patient-symptom set of data for a doctor. Sounds like this is the way to go after all. I still have to wonder if there is a better way, however, to store/ read that info. Perhaps a quasi-binary where each bit represents a field in a footprint? or even the XML blob. thank you again for the response! |
#7
| |||
| |||
|
|
On Jan 22, 3:53 pm, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote: At a more basic level, though, are you sure you have correctly modelled your problem? 150 independent booleans creates a state machine with somewhere around 10^45 states. That's a big state machine. Without any sort of transition constraints, that creates a fully connected state machine with 10^45 states and somewhere around 10^90 allowable transitions. Those are big numbers, and it seems unlikely you really need such an unwieldy state machine for each row of your table. I've often wondered about this line of thinking, that if your system isn't implemented in a purely relational methodology, you have no choice but to implement it as a state machine. However, I think that I'm settling on the idea that its more the malfunction of the relational advocate that they aren't able to handle situations outside of their admittedly good relational foundations. Just because the original design used what appears to be a perfectly decent instance of repeating groups does not mean the designer is then condemned to the complexity of Mr. Badour's imaginary zillion states. While the DATA in these tables could have a zillion states, thats like saying that an ssn field has 10^9 states because it has nine characters that can range from zero to nine. Yes, there may be many many possible values, but simple payroll programs for example are not burdened with a zillion states as either entire ranges of values are handled identically, or the unneeded values are simply never entered into storage. C'mon people, programming 101 here. Does anybody have knowledge of the origin of this silly "state machine" argument? I'm really curious why it carries any weight in these sort of discussions but I also imagine that it originally was a valid point offered by someone a bit more academically inclined than Mr. Badour, and I'm genuinely interested in reading about it.i |
#8
| |||
| |||
|
#9
| |||||||||
| |||||||||
|
|
patrick61z (AT) yahoo (DOT) com wrote: On Jan 22, 3:53 pm, Bob Badour <bbad... (AT) pei (DOT) sympatico.ca> wrote: At a more basic level, though, are you sure you have correctly modelled your problem? 150 independent booleans creates a state machine with somewhere around 10^45 states. That's a big state machine. Without any sort of transition constraints, that creates a fully connected state machine with 10^45 states and somewhere around 10^90 allowable transitions. Those are big numbers, and it seems unlikely you really need such an unwieldy state machine for each row of your table. I've often wondered about this line of thinking, that if your system isn't implemented in a purely relational methodology, you have no choice but to implement it as a state machine. |
|
Just because the original design used what appears to be a perfectly decent instance of repeating groups does not mean the designer is then condemned to the complexity of Mr. Badour's imaginary zillion states. |
|
While the DATA in these tables could have a zillion states, thats like saying that an ssn field has 10^9 states because it has nine characters that can range from zero to nine. |
|
Yes, there may be many many possible values, but simple payroll programs for example are not burdened with a zillion states as either entire ranges of values are handled identically, or the unneeded values are simply never entered into storage. |
|
C'mon people, programming 101 here. Does anybody have knowledge of the origin of this silly "state machine" argument? |
|
I'm really curious why it carries any weight in these sort of discussions but I also imagine that it originally was a valid point offered by someone a bit more academically inclined than Mr. Badour, and I'm genuinely interested in reading about it.i |
|
I'd say the point of the 'state machine' comp- arison was likely to illustrate that the original post completely lacked 'actionable' requirements, ie., the 'requirement', as stated, was silly (no offence intended to the OP who qualified his questions). |
|
I've seen people try to pass off such as real requirements more times than I want to remember. Note that the original poster hadn't clarified his requirement until later. A later message showed that he's trying to deal with a survey result. But since a marketing operation seems involved, there are further applications, eg., 'who bought a trip and might become a repeat customer?'. |
|
Besides, if 1.2 million respondents, times 150 questions, is considered a big db, I have to laugh. |
#10
| ||||
| ||||
|
|
Those are big numbers, and it seems unlikely you really need such an unwieldy state machine for each row of your table. |
|
Does anybody have knowledge of the origin of this silly "state machine" argument? |
|
Yes, of course, we all do. It originated in Patrick's skull as a figment of his imagination. I doubt any argument exists beyond that scope. |
|
At a more basic level, though, are you sure you have correctly modelled your problem? 150 independent booleans creates a state machine with somewhere around 10^45 states. That's a big state machine. Without any sort of transition constraints, that creates a fully connected state machine with 10^45 states and somewhere around 10^90 allowable transitions. |
![]() |
| Thread Tools | |
| Display Modes | |
| |