dbTalk Databases Forums  

Indexing of very large table

microsoft.public.sqlserver.programming microsoft.public.sqlserver.programming


Discuss Indexing of very large table in the microsoft.public.sqlserver.programming forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Jay
 
Posts: n/a

Default Indexing of very large table - 08-08-2009 , 11:24 PM






I have a table with around 10million rows.
The Primary key is (Symbol varchar(20), [date] smalldatetime).
There are 50 other columns - mostly smallint, varchar(20), decimal
(12,6)
The Server is Win2000, 3GB RAM.

When I ran: SELECT COUNT(*) FROM MYTABLE, it took 2 mins. Is it
normal for it to take this long?

The main query that I will be running involves ranking.

INSERT INTO MYRESULTS
SELECT A.Date, A.Symbol, COUNT(B.MyVal) As Rank
FROM #MYTABLE A, #MYTABLE B
WHERE A.Date = B.Date, A.Symbol = B.Symbol AND B.MyVal <= A.MyVal

NOTE: I am first putting a Million or so records into a temp table as
adding a new RANK column to the main table turned out to be very time
consuming. MyVal is varchar(20) but has a numeric value)

Any suggestions on Indexing ??? and improving performance of the above
query. Note that this is a research Database so adding columns,
changing data types, indexes etc. can be done. I am the only user,
this is the only table in the DB and I mainly need to analyze this
data. Once the table is filled up it will be static.


Thanks.

Reply With Quote
  #2  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: Indexing of very large table - 08-09-2009 , 04:26 AM






Jay (jay6447 (AT) hotmail (DOT) com) writes:
Quote:
I have a table with around 10million rows.
The Primary key is (Symbol varchar(20), [date] smalldatetime).
There are 50 other columns - mostly smallint, varchar(20), decimal
(12,6)
The Server is Win2000, 3GB RAM.

When I ran: SELECT COUNT(*) FROM MYTABLE, it took 2 mins. Is it
normal for it to take this long?
If there is no non-clustered index on the table, and no data is in the
cache, that seems reasonable.

Quote:
The main query that I will be running involves ranking.

INSERT INTO MYRESULTS
SELECT A.Date, A.Symbol, COUNT(B.MyVal) As Rank
FROM #MYTABLE A, #MYTABLE B
WHERE A.Date = B.Date, A.Symbol = B.Symbol AND B.MyVal <= A.MyVal
I assume there is a GROUP BY that you accidently left out?

You don't say which version of SQL Server you are using. If you
are on SQL 2005, you can change the above to:

SELECT Date, Symbol,
rank = rank() OVER(PARTITION BY Date, Symbol ORDER BY MyVal)
FROM #MYTABLE

If you are stuck on SQL 2000, the above will never perform well, but
a non-clustered in dex on (Date, Symbol, MyVal) (Or Symbol, Date, MyVal)
is the best you can get. The query will cover the index, and SQL Server
does not have to wade through all the other columns.

Quote:
NOTE: I am first putting a Million or so records into a temp table as
adding a new RANK column to the main table turned out to be very time
consuming. MyVal is varchar(20) but has a numeric value)
Uh-oh. Beware that SQL Server will sort '10' before '9', so unless the
numeric values are zero-padded or right-adjusted, you will probably not
get the ranking you are looking for. Changing the data type may be a very
good idea.



--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinf...ons/books.mspx

Reply With Quote
  #3  
Old   
RickNZ
 
Posts: n/a

Default RE: Indexing of very large table - 08-09-2009 , 10:09 PM



If the table isn't in memory, or if it won't fit in memory, then 2 minutes to
count 10M rows seems within reason. Having enough addressable RAM can have a
huge performance impact on aggregation queries.

Having a varchar in a clustered index can cause lots of table fragmentation
if the rows weren't inserted in index order. You might try defragmenting the
table.

First, run DBCC SHOWCONTIG to see how fragmented it is. If it looks
fragmented, run DBCC DBREINDEX ('YourTableName')

You might be able to use a CTE to avoid the need for a temp table with your
rank query.

Have you considered building a cube from your data, and using SSAS to
analyze it? Aggregations and ranking in SSAS are much faster than in the
relational engine.

Reply With Quote
  #4  
Old   
Jay
 
Posts: n/a

Default Re: Indexing of very large table - 08-09-2009 , 10:27 PM



Sorry, I did not mention a couple of things...

Response to a couple of queries mentioned above....
1.) I am using SQL 2000.
2.) Yes, I did forget to mention the GROUP BY clause in my statement.
3.) I do use a CAST() when I rank by MyVal.

This is what I will do based on the above responses....
1.) Change data type of column MyVal to Numeric
2.) Add a non-clustered index on Date, Symbol, MyVal (I guess, I still
keep the PK?)
3.) Try de-fragmenting the table as suggested.

Should I install the FREE SQL 2005 Express and transfer this database
to the 2005 installation - will this give me significant improvement
in performance?

Thanks.

Reply With Quote
  #5  
Old   
RickNZ
 
Posts: n/a

Default Re: Indexing of very large table - 08-10-2009 , 12:03 AM



SQL Express generally performs better than SQL 2000, although there are cases
where it doesn't. It would certainly be worth a try if your data fits in the
4GB size limit. I would use SQL Express 2008 instead of 2005 though. You
can download the advanced features pack to get Management Studio too.

If your project is for your own personal use, as part of a development
effort, you might also look into getting SQL Developer 2008. It costs $45 at
Amazon, and includes all of the functionality of SQL Enterprise. That would
also give you access to a recent version of Analysis Services, which is
usually the fastest way to address aggregations.

--
Check out my upcoming book: ASP.NET: Building Ultra-Fast and Ultra-Scalable
Websites Using ASP.NET and SQL Server

Reply With Quote
  #6  
Old   
Jay
 
Posts: n/a

Default Re: Indexing of very large table - 08-10-2009 , 12:57 AM



Thanks - With 4GB limmit, SQL 2005 Express may not work for this
project. I will check on 2008.


On Aug 10, 10:03*am, RickNZ <kies... (AT) no-spam-gmail (DOT) com> wrote:
Quote:
SQL Express generally performs better than SQL 2000, although there are cases
where it doesn't. *It would certainly be worth a try if your data fits in the
4GB size limit. *I would use SQL Express 2008 instead of 2005 though. *You
can download the advanced features pack to get Management Studio too.

If your project is for your own personal use, as part of a development
effort, you might also look into getting SQL Developer 2008. *It costs $45 at
Amazon, and includes all of the functionality of SQL Enterprise. *That would
also give you access to a recent version of Analysis Services, which is
usually the fastest way to address aggregations.

--
Check out my upcoming book: *ASP.NET: Building Ultra-Fast and Ultra-Scalable
Websites Using ASP.NET and SQL Server

Reply With Quote
  #7  
Old   
Henrik Staun Poulsen
 
Posts: n/a

Default Re: Indexing of very large table - 08-10-2009 , 03:28 AM



Jay,

Itzik Ben-Gan has an article series in the lastest numbers of SQL
Server Magazine about "Running Aggregates"

He lists a number of options for getting a fast solution.

HIH
Best regards,
Henrik Staun Poulsen
www.Stovi.com

On Aug 9, 6:24*am, Jay <jay6... (AT) hotmail (DOT) com> wrote:
Quote:
I have a table with around 10million rows.
The Primary key is (Symbol varchar(20), [date] smalldatetime).
There are 50 other columns - mostly smallint, varchar(20), decimal
(12,6)
The Server is Win2000, 3GB RAM.

When I ran: SELECT COUNT(*) FROM MYTABLE, *it took 2 mins. Is it
normal *for it to take this long?

The main query that I will be running involves ranking.

INSERT INTO MYRESULTS
SELECT A.Date, A.Symbol, COUNT(B.MyVal) As Rank
FROM #MYTABLE A, #MYTABLE B
WHERE A.Date = B.Date, A.Symbol = B.Symbol AND B.MyVal <= A.MyVal

NOTE: I am first putting a Million or so records into a temp table as
adding a new RANK column to the main table turned out to be very time
consuming. MyVal is varchar(20) but has a numeric value)

Any suggestions on Indexing ??? and improving performance of the above
query. Note that this is a research Database so adding columns,
changing data types, indexes etc. can be done. I am the only user,
this is the only table in the DB and I mainly need to analyze this
data. Once the table is filled up it will be static.

Thanks.

Reply With Quote
  #8  
Old   
TheSQLGuru
 
Posts: n/a

Default Re: Indexing of very large table - 08-10-2009 , 09:05 AM



Is your PK clustered? If so, make a nonclustered index on myval and you
should then have a covering index since the clustering key is carried on all
nonclustered rows. This should greatly help your query performance (and the
count(*)).

Note that your query - well, once you include the group by anyway - will
perform horribly regardless since it is a 'triangular join'. The more rows
you have and the more that fall under the <= MyVal join criteria the worse
it gets.

--
Kevin G. Boles
Indicium Resources, Inc.
SQL Server MVP
kgboles a earthlink dt net


"Jay" <jay6447 (AT) hotmail (DOT) com> wrote

Quote:
I have a table with around 10million rows.
The Primary key is (Symbol varchar(20), [date] smalldatetime).
There are 50 other columns - mostly smallint, varchar(20), decimal
(12,6)
The Server is Win2000, 3GB RAM.

When I ran: SELECT COUNT(*) FROM MYTABLE, it took 2 mins. Is it
normal for it to take this long?

The main query that I will be running involves ranking.

INSERT INTO MYRESULTS
SELECT A.Date, A.Symbol, COUNT(B.MyVal) As Rank
FROM #MYTABLE A, #MYTABLE B
WHERE A.Date = B.Date, A.Symbol = B.Symbol AND B.MyVal <= A.MyVal

NOTE: I am first putting a Million or so records into a temp table as
adding a new RANK column to the main table turned out to be very time
consuming. MyVal is varchar(20) but has a numeric value)

Any suggestions on Indexing ??? and improving performance of the above
query. Note that this is a research Database so adding columns,
changing data types, indexes etc. can be done. I am the only user,
this is the only table in the DB and I mainly need to analyze this
data. Once the table is filled up it will be static.


Thanks.

Reply With Quote
  #9  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: Indexing of very large table - 08-10-2009 , 05:35 PM



Jay (jay6447 (AT) hotmail (DOT) com) writes:
Quote:
Should I install the FREE SQL 2005 Express and transfer this database
to the 2005 installation - will this give me significant improvement
in performance?
If you do a lot of this ranking, getting access to the row_number()
function may be essential for good performance.

I should also clarify that you should have an index on Date, Symbol,
MyVal on SQL 2005/2008 as well for best performance.


--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinf...ons/books.mspx

Reply With Quote
  #10  
Old   
TheSQLGuru
 
Posts: n/a

Default Re: Indexing of very large table - 08-10-2009 , 11:29 PM



since date and symbol are PK columns, wouldn't an index on MyVal be
sufficient since the clustering key is carried on all nonclustered indexes?

--
Kevin G. Boles
Indicium Resources, Inc.
SQL Server MVP
kgboles a earthlink dt net


"Erland Sommarskog" <esquel (AT) sommarskog (DOT) se> wrote

Quote:
Jay (jay6447 (AT) hotmail (DOT) com) writes:
Should I install the FREE SQL 2005 Express and transfer this database
to the 2005 installation - will this give me significant improvement
in performance?

If you do a lot of this ranking, getting access to the row_number()
function may be essential for good performance.

I should also clarify that you should have an index on Date, Symbol,
MyVal on SQL 2005/2008 as well for best performance.


--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000:
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.