dbTalk Databases Forums  

Clustered index on the identity column or a datetime column

comp.databases.ms-sqlserver comp.databases.ms-sqlserver


Discuss Clustered index on the identity column or a datetime column in the comp.databases.ms-sqlserver forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Tommy Hayes
 
Posts: n/a

Default Clustered index on the identity column or a datetime column - 09-25-2007 , 08:57 AM






Hello all,

We have a table with about 2 million rows that is used to store log
events. The table has an identity column and also a datetime column to
record the event time. It is expected to at least double in size over
its lifetime. Because the datetime records the current time, the value
of that column goes up as the value in the identity column goes up.

When doing searches on the table it is normal to have a date range as
part of the search. This would lead me to think that the clustered
index of the main table should be on the datetime column, letting SQL
Server know that a date range is sequential in the table as opposed to
the database server hopping all over the place to retrieve rows.

However is there any argument for putting the clustered index on the
identity column of the main table? The datetimes may not be unique
(highly unlikely in the given application, but logically possible I
guess) - would that lessen the advantage of the clustered index on the
datetime?

How about putting the clustered index on the identity column, and if
given two dates to form a range then look up the two corresponding
identities and turn the query into an indentity range query, thus
allowing the identity column clustered index to be used for the date
range query as well?

Or is there some way to let SQL Server know that two columns are
'linked' so that it knows to alter the query itself? (i.e. a range on
column A is equivalent to a range on column B, and column B is the
clustered index column)

Any ideas would be appreciated.
Thanks,
Tommy.


Reply With Quote
  #2  
Old   
Roy Harvey (SQL Server MVP)
 
Posts: n/a

Default Re: Clustered index on the identity column or a datetime column - 09-25-2007 , 09:08 AM






I would be inclined to cluster (unique) on the datetime/identity pair,
in that order, and set the PK on the identity non-clustered. That
gives you the date range efficiency you need.

I can see no benefit to clustering on the identity. There is no way
to tell the optimizer that the two columns are moving along "in
parallel" the way they are, so don't worry about it.

Roy Harvey
Beacon Falls, CT

On Tue, 25 Sep 2007 13:57:37 -0000, Tommy Hayes
<tommy.hayes (AT) gmail (DOT) com> wrote:

Quote:
Hello all,

We have a table with about 2 million rows that is used to store log
events. The table has an identity column and also a datetime column to
record the event time. It is expected to at least double in size over
its lifetime. Because the datetime records the current time, the value
of that column goes up as the value in the identity column goes up.

When doing searches on the table it is normal to have a date range as
part of the search. This would lead me to think that the clustered
index of the main table should be on the datetime column, letting SQL
Server know that a date range is sequential in the table as opposed to
the database server hopping all over the place to retrieve rows.

However is there any argument for putting the clustered index on the
identity column of the main table? The datetimes may not be unique
(highly unlikely in the given application, but logically possible I
guess) - would that lessen the advantage of the clustered index on the
datetime?

How about putting the clustered index on the identity column, and if
given two dates to form a range then look up the two corresponding
identities and turn the query into an indentity range query, thus
allowing the identity column clustered index to be used for the date
range query as well?

Or is there some way to let SQL Server know that two columns are
'linked' so that it knows to alter the query itself? (i.e. a range on
column A is equivalent to a range on column B, and column B is the
clustered index column)

Any ideas would be appreciated.
Thanks,
Tommy.

Reply With Quote
  #3  
Old   
Tommy Hayes
 
Posts: n/a

Default Re: Clustered index on the identity column or a datetime column - 09-25-2007 , 11:36 AM



Thanks again Roy,

Tommy.


On Sep 25, 3:08 pm, "Roy Harvey (SQL Server MVP)"
<roy_har... (AT) snet (DOT) net> wrote:
Quote:
I would be inclined to cluster (unique) on the datetime/identity pair,
in that order, and set the PK on the identity non-clustered. That
gives you the date range efficiency you need.

I can see no benefit to clustering on the identity. There is no way
to tell the optimizer that the two columns are moving along "in
parallel" the way they are, so don't worry about it.

Roy Harvey
Beacon Falls, CT

On Tue, 25 Sep 2007 13:57:37 -0000, Tommy Hayes



tommy.ha... (AT) gmail (DOT) com> wrote:
Hello all,

We have a table with about 2 million rows that is used to store log
events. The table has an identity column and also a datetime column to
record the event time. It is expected to at least double in size over
its lifetime. Because the datetime records the current time, the value
of that column goes up as the value in the identity column goes up.

When doing searches on the table it is normal to have a date range as
part of the search. This would lead me to think that the clustered
index of the main table should be on the datetime column, letting SQL
Server know that a date range is sequential in the table as opposed to
the database server hopping all over the place to retrieve rows.

However is there any argument for putting the clustered index on the
identity column of the main table? The datetimes may not be unique
(highly unlikely in the given application, but logically possible I
guess) - would that lessen the advantage of the clustered index on the
datetime?

How about putting the clustered index on the identity column, and if
given two dates to form a range then look up the two corresponding
identities and turn the query into an indentity range query, thus
allowing the identity column clustered index to be used for the date
range query as well?

Or is there some way to let SQL Server know that two columns are
'linked' so that it knows to alter the query itself? (i.e. a range on
column A is equivalent to a range on column B, and column B is the
clustered index column)

Any ideas would be appreciated.
Thanks,
Tommy.- Hide quoted text -

- Show quoted text -



Reply With Quote
  #4  
Old   
Gert-Jan Strik
 
Posts: n/a

Default Re: Clustered index on the identity column or a datetime column - 09-25-2007 , 04:17 PM



If there are relatively few duplicates in the datetime (which is typical
for a log file), then I would go for a clustered nonunique index on the
datetime column, because that would be a bit more efficient then a
unique compound index, since under the covers the uniqueifier is only
added for duplicates. Obviously this applies to the index tree (because
the data pages would require more space in case of duplicates), but it
might just save one index depth.

--
Gert-Jan


"Roy Harvey (SQL Server MVP)" wrote:
Quote:
I would be inclined to cluster (unique) on the datetime/identity pair,
in that order, and set the PK on the identity non-clustered. That
gives you the date range efficiency you need.

I can see no benefit to clustering on the identity. There is no way
to tell the optimizer that the two columns are moving along "in
parallel" the way they are, so don't worry about it.

Roy Harvey
Beacon Falls, CT

On Tue, 25 Sep 2007 13:57:37 -0000, Tommy Hayes
tommy.hayes (AT) gmail (DOT) com> wrote:

Hello all,

We have a table with about 2 million rows that is used to store log
events. The table has an identity column and also a datetime column to
record the event time. It is expected to at least double in size over
its lifetime. Because the datetime records the current time, the value
of that column goes up as the value in the identity column goes up.

When doing searches on the table it is normal to have a date range as
part of the search. This would lead me to think that the clustered
index of the main table should be on the datetime column, letting SQL
Server know that a date range is sequential in the table as opposed to
the database server hopping all over the place to retrieve rows.

However is there any argument for putting the clustered index on the
identity column of the main table? The datetimes may not be unique
(highly unlikely in the given application, but logically possible I
guess) - would that lessen the advantage of the clustered index on the
datetime?

How about putting the clustered index on the identity column, and if
given two dates to form a range then look up the two corresponding
identities and turn the query into an indentity range query, thus
allowing the identity column clustered index to be used for the date
range query as well?

Or is there some way to let SQL Server know that two columns are
'linked' so that it knows to alter the query itself? (i.e. a range on
column A is equivalent to a range on column B, and column B is the
clustered index column)

Any ideas would be appreciated.
Thanks,
Tommy.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.