![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Hello all, We have a table with about 2 million rows that is used to store log events. The table has an identity column and also a datetime column to record the event time. It is expected to at least double in size over its lifetime. Because the datetime records the current time, the value of that column goes up as the value in the identity column goes up. When doing searches on the table it is normal to have a date range as part of the search. This would lead me to think that the clustered index of the main table should be on the datetime column, letting SQL Server know that a date range is sequential in the table as opposed to the database server hopping all over the place to retrieve rows. However is there any argument for putting the clustered index on the identity column of the main table? The datetimes may not be unique (highly unlikely in the given application, but logically possible I guess) - would that lessen the advantage of the clustered index on the datetime? How about putting the clustered index on the identity column, and if given two dates to form a range then look up the two corresponding identities and turn the query into an indentity range query, thus allowing the identity column clustered index to be used for the date range query as well? Or is there some way to let SQL Server know that two columns are 'linked' so that it knows to alter the query itself? (i.e. a range on column A is equivalent to a range on column B, and column B is the clustered index column) Any ideas would be appreciated. Thanks, Tommy. |
#3
| |||
| |||
|
|
I would be inclined to cluster (unique) on the datetime/identity pair, in that order, and set the PK on the identity non-clustered. That gives you the date range efficiency you need. I can see no benefit to clustering on the identity. There is no way to tell the optimizer that the two columns are moving along "in parallel" the way they are, so don't worry about it. Roy Harvey Beacon Falls, CT On Tue, 25 Sep 2007 13:57:37 -0000, Tommy Hayes tommy.ha... (AT) gmail (DOT) com> wrote: Hello all, We have a table with about 2 million rows that is used to store log events. The table has an identity column and also a datetime column to record the event time. It is expected to at least double in size over its lifetime. Because the datetime records the current time, the value of that column goes up as the value in the identity column goes up. When doing searches on the table it is normal to have a date range as part of the search. This would lead me to think that the clustered index of the main table should be on the datetime column, letting SQL Server know that a date range is sequential in the table as opposed to the database server hopping all over the place to retrieve rows. However is there any argument for putting the clustered index on the identity column of the main table? The datetimes may not be unique (highly unlikely in the given application, but logically possible I guess) - would that lessen the advantage of the clustered index on the datetime? How about putting the clustered index on the identity column, and if given two dates to form a range then look up the two corresponding identities and turn the query into an indentity range query, thus allowing the identity column clustered index to be used for the date range query as well? Or is there some way to let SQL Server know that two columns are 'linked' so that it knows to alter the query itself? (i.e. a range on column A is equivalent to a range on column B, and column B is the clustered index column) Any ideas would be appreciated. Thanks, Tommy.- Hide quoted text - - Show quoted text - |
#4
| |||
| |||
|
|
I would be inclined to cluster (unique) on the datetime/identity pair, in that order, and set the PK on the identity non-clustered. That gives you the date range efficiency you need. I can see no benefit to clustering on the identity. There is no way to tell the optimizer that the two columns are moving along "in parallel" the way they are, so don't worry about it. Roy Harvey Beacon Falls, CT On Tue, 25 Sep 2007 13:57:37 -0000, Tommy Hayes tommy.hayes (AT) gmail (DOT) com> wrote: Hello all, We have a table with about 2 million rows that is used to store log events. The table has an identity column and also a datetime column to record the event time. It is expected to at least double in size over its lifetime. Because the datetime records the current time, the value of that column goes up as the value in the identity column goes up. When doing searches on the table it is normal to have a date range as part of the search. This would lead me to think that the clustered index of the main table should be on the datetime column, letting SQL Server know that a date range is sequential in the table as opposed to the database server hopping all over the place to retrieve rows. However is there any argument for putting the clustered index on the identity column of the main table? The datetimes may not be unique (highly unlikely in the given application, but logically possible I guess) - would that lessen the advantage of the clustered index on the datetime? How about putting the clustered index on the identity column, and if given two dates to form a range then look up the two corresponding identities and turn the query into an indentity range query, thus allowing the identity column clustered index to be used for the date range query as well? Or is there some way to let SQL Server know that two columns are 'linked' so that it knows to alter the query itself? (i.e. a range on column A is equivalent to a range on column B, and column B is the clustered index column) Any ideas would be appreciated. Thanks, Tommy. |
![]() |
| Thread Tools | |
| Display Modes | |
| |