dbTalk Databases Forums  

the 20% rule

comp.databases.theory comp.databases.theory


Discuss the 20% rule in the comp.databases.theory forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
Bob Badour
 
Posts: n/a

Default Re: the 20% rule - 07-24-2008 , 12:17 PM






aarklon (AT) gmail (DOT) com wrote:

Quote:
Hi all,

I recently read the following in a book(oracle 9i for dummies by
Carol McCullough Dieter)

The fastest way to retrieve rows from a table is to access the row
with exact row id. An
index is the second fastest way, but it decreases in performance as
the proprotion of the rows retrieved increases. if you are retrieving
approximately 20 % of the rows in a table, using a index is just as
fast. But beyond that magic 20 %, not using 20% is faster.

keep this rule in mind when you create indexes intended to help speed
up a query.
Queries vary in the rows that they select from a table. if you have a
query that you use often,
determine the number of rows that it selects from the table. if this
number is more than 20%
of the total no: of rows in the table , an index on the table may not
improve the performance
of the query. you may just want to try both methods. if the number of
rows is less than 20%, an index will almost certainly help
performance.

my question is to what extent this rule is true ?
If you want to understand performance optimization, I suggest you learn
about the actual physical structures themselves and read one of Shasha's
books on the subject.

Made up percentages are just bullshit.


Reply With Quote
  #12  
Old   
Bob Badour
 
Posts: n/a

Default Re: the 20% rule - 07-24-2008 , 12:17 PM






aarklon (AT) gmail (DOT) com wrote:

Quote:
Hi all,

I recently read the following in a book(oracle 9i for dummies by
Carol McCullough Dieter)

The fastest way to retrieve rows from a table is to access the row
with exact row id. An
index is the second fastest way, but it decreases in performance as
the proprotion of the rows retrieved increases. if you are retrieving
approximately 20 % of the rows in a table, using a index is just as
fast. But beyond that magic 20 %, not using 20% is faster.

keep this rule in mind when you create indexes intended to help speed
up a query.
Queries vary in the rows that they select from a table. if you have a
query that you use often,
determine the number of rows that it selects from the table. if this
number is more than 20%
of the total no: of rows in the table , an index on the table may not
improve the performance
of the query. you may just want to try both methods. if the number of
rows is less than 20%, an index will almost certainly help
performance.

my question is to what extent this rule is true ?
If you want to understand performance optimization, I suggest you learn
about the actual physical structures themselves and read one of Shasha's
books on the subject.

Made up percentages are just bullshit.


Reply With Quote
  #13  
Old   
Bob Badour
 
Posts: n/a

Default Re: the 20% rule - 07-24-2008 , 12:17 PM



aarklon (AT) gmail (DOT) com wrote:

Quote:
Hi all,

I recently read the following in a book(oracle 9i for dummies by
Carol McCullough Dieter)

The fastest way to retrieve rows from a table is to access the row
with exact row id. An
index is the second fastest way, but it decreases in performance as
the proprotion of the rows retrieved increases. if you are retrieving
approximately 20 % of the rows in a table, using a index is just as
fast. But beyond that magic 20 %, not using 20% is faster.

keep this rule in mind when you create indexes intended to help speed
up a query.
Queries vary in the rows that they select from a table. if you have a
query that you use often,
determine the number of rows that it selects from the table. if this
number is more than 20%
of the total no: of rows in the table , an index on the table may not
improve the performance
of the query. you may just want to try both methods. if the number of
rows is less than 20%, an index will almost certainly help
performance.

my question is to what extent this rule is true ?
If you want to understand performance optimization, I suggest you learn
about the actual physical structures themselves and read one of Shasha's
books on the subject.

Made up percentages are just bullshit.


Reply With Quote
  #14  
Old   
Bob Badour
 
Posts: n/a

Default Re: the 20% rule - 07-24-2008 , 12:17 PM



aarklon (AT) gmail (DOT) com wrote:

Quote:
Hi all,

I recently read the following in a book(oracle 9i for dummies by
Carol McCullough Dieter)

The fastest way to retrieve rows from a table is to access the row
with exact row id. An
index is the second fastest way, but it decreases in performance as
the proprotion of the rows retrieved increases. if you are retrieving
approximately 20 % of the rows in a table, using a index is just as
fast. But beyond that magic 20 %, not using 20% is faster.

keep this rule in mind when you create indexes intended to help speed
up a query.
Queries vary in the rows that they select from a table. if you have a
query that you use often,
determine the number of rows that it selects from the table. if this
number is more than 20%
of the total no: of rows in the table , an index on the table may not
improve the performance
of the query. you may just want to try both methods. if the number of
rows is less than 20%, an index will almost certainly help
performance.

my question is to what extent this rule is true ?
If you want to understand performance optimization, I suggest you learn
about the actual physical structures themselves and read one of Shasha's
books on the subject.

Made up percentages are just bullshit.


Reply With Quote
  #15  
Old   
Bob Badour
 
Posts: n/a

Default Re: the 20% rule - 07-24-2008 , 12:17 PM



aarklon (AT) gmail (DOT) com wrote:

Quote:
Hi all,

I recently read the following in a book(oracle 9i for dummies by
Carol McCullough Dieter)

The fastest way to retrieve rows from a table is to access the row
with exact row id. An
index is the second fastest way, but it decreases in performance as
the proprotion of the rows retrieved increases. if you are retrieving
approximately 20 % of the rows in a table, using a index is just as
fast. But beyond that magic 20 %, not using 20% is faster.

keep this rule in mind when you create indexes intended to help speed
up a query.
Queries vary in the rows that they select from a table. if you have a
query that you use often,
determine the number of rows that it selects from the table. if this
number is more than 20%
of the total no: of rows in the table , an index on the table may not
improve the performance
of the query. you may just want to try both methods. if the number of
rows is less than 20%, an index will almost certainly help
performance.

my question is to what extent this rule is true ?
If you want to understand performance optimization, I suggest you learn
about the actual physical structures themselves and read one of Shasha's
books on the subject.

Made up percentages are just bullshit.


Reply With Quote
  #16  
Old   
Bob Badour
 
Posts: n/a

Default Re: the 20% rule - 07-24-2008 , 12:17 PM



aarklon (AT) gmail (DOT) com wrote:

Quote:
Hi all,

I recently read the following in a book(oracle 9i for dummies by
Carol McCullough Dieter)

The fastest way to retrieve rows from a table is to access the row
with exact row id. An
index is the second fastest way, but it decreases in performance as
the proprotion of the rows retrieved increases. if you are retrieving
approximately 20 % of the rows in a table, using a index is just as
fast. But beyond that magic 20 %, not using 20% is faster.

keep this rule in mind when you create indexes intended to help speed
up a query.
Queries vary in the rows that they select from a table. if you have a
query that you use often,
determine the number of rows that it selects from the table. if this
number is more than 20%
of the total no: of rows in the table , an index on the table may not
improve the performance
of the query. you may just want to try both methods. if the number of
rows is less than 20%, an index will almost certainly help
performance.

my question is to what extent this rule is true ?
If you want to understand performance optimization, I suggest you learn
about the actual physical structures themselves and read one of Shasha's
books on the subject.

Made up percentages are just bullshit.


Reply With Quote
  #17  
Old   
Bob Badour
 
Posts: n/a

Default Re: the 20% rule - 07-24-2008 , 12:17 PM



aarklon (AT) gmail (DOT) com wrote:

Quote:
Hi all,

I recently read the following in a book(oracle 9i for dummies by
Carol McCullough Dieter)

The fastest way to retrieve rows from a table is to access the row
with exact row id. An
index is the second fastest way, but it decreases in performance as
the proprotion of the rows retrieved increases. if you are retrieving
approximately 20 % of the rows in a table, using a index is just as
fast. But beyond that magic 20 %, not using 20% is faster.

keep this rule in mind when you create indexes intended to help speed
up a query.
Queries vary in the rows that they select from a table. if you have a
query that you use often,
determine the number of rows that it selects from the table. if this
number is more than 20%
of the total no: of rows in the table , an index on the table may not
improve the performance
of the query. you may just want to try both methods. if the number of
rows is less than 20%, an index will almost certainly help
performance.

my question is to what extent this rule is true ?
If you want to understand performance optimization, I suggest you learn
about the actual physical structures themselves and read one of Shasha's
books on the subject.

Made up percentages are just bullshit.


Reply With Quote
  #18  
Old   
Bob Badour
 
Posts: n/a

Default Re: the 20% rule - 07-24-2008 , 12:17 PM



aarklon (AT) gmail (DOT) com wrote:

Quote:
Hi all,

I recently read the following in a book(oracle 9i for dummies by
Carol McCullough Dieter)

The fastest way to retrieve rows from a table is to access the row
with exact row id. An
index is the second fastest way, but it decreases in performance as
the proprotion of the rows retrieved increases. if you are retrieving
approximately 20 % of the rows in a table, using a index is just as
fast. But beyond that magic 20 %, not using 20% is faster.

keep this rule in mind when you create indexes intended to help speed
up a query.
Queries vary in the rows that they select from a table. if you have a
query that you use often,
determine the number of rows that it selects from the table. if this
number is more than 20%
of the total no: of rows in the table , an index on the table may not
improve the performance
of the query. you may just want to try both methods. if the number of
rows is less than 20%, an index will almost certainly help
performance.

my question is to what extent this rule is true ?
If you want to understand performance optimization, I suggest you learn
about the actual physical structures themselves and read one of Shasha's
books on the subject.

Made up percentages are just bullshit.


Reply With Quote
  #19  
Old   
Bob Badour
 
Posts: n/a

Default Re: the 20% rule - 07-24-2008 , 12:17 PM



aarklon (AT) gmail (DOT) com wrote:

Quote:
Hi all,

I recently read the following in a book(oracle 9i for dummies by
Carol McCullough Dieter)

The fastest way to retrieve rows from a table is to access the row
with exact row id. An
index is the second fastest way, but it decreases in performance as
the proprotion of the rows retrieved increases. if you are retrieving
approximately 20 % of the rows in a table, using a index is just as
fast. But beyond that magic 20 %, not using 20% is faster.

keep this rule in mind when you create indexes intended to help speed
up a query.
Queries vary in the rows that they select from a table. if you have a
query that you use often,
determine the number of rows that it selects from the table. if this
number is more than 20%
of the total no: of rows in the table , an index on the table may not
improve the performance
of the query. you may just want to try both methods. if the number of
rows is less than 20%, an index will almost certainly help
performance.

my question is to what extent this rule is true ?
If you want to understand performance optimization, I suggest you learn
about the actual physical structures themselves and read one of Shasha's
books on the subject.

Made up percentages are just bullshit.


Reply With Quote
  #20  
Old   
Tim X
 
Posts: n/a

Default Re: the 20% rule - 07-26-2008 , 12:54 AM



aarklon (AT) gmail (DOT) com writes:

Quote:
Hi all,

I recently read the following in a book(oracle 9i for dummies by
Carol McCullough Dieter)

The fastest way to retrieve rows from a table is to access the row
with exact row id. An
index is the second fastest way, but it decreases in performance as
the proprotion of the rows retrieved increases. if you are retrieving
approximately 20 % of the rows in a table, using a index is just as
fast. But beyond that magic 20 %, not using 20% is faster.

keep this rule in mind when you create indexes intended to help speed
up a query.
Queries vary in the rows that they select from a table. if you have a
query that you use often,
determine the number of rows that it selects from the table. if this
number is more than 20%
of the total no: of rows in the table , an index on the table may not
improve the performance
of the query. you may just want to try both methods. if the number of
rows is less than 20%, an index will almost certainly help
performance.

my question is to what extent this rule is true ?
This is what I would call a heuristic rather than a formal/solid rule -
a rule of thumb if you like.

The extent to which it is true depends on a number of factors. To
understand to what extent it is true, you need to udnerstand the
rationale underlying it.

consider an overly simplistic example.

You have a table with 100 records. Each record is 100k in size.

If you had a query you knew would return 10 records, consider the case
with and without an index.

In the case without the index, you would need to retrieve all rows and
determine if they match your selection criteria. Thats 100 disk reads of
100k each. Lets say its a very slow computer and each disk read takes
1 second. You would have 100 x 1 second to retrieve all the rows (100
seconds) and 100 x 100k - 10000k.

Now if you had an index, lets say the field being indexed is 10k. Assume
for simplicity, it a really good index and you can identify all 10 rows
with just 10 reads of the index. Again assuming 1 second per read. You
will have

10 x 1 sec for the index = 10 secs and 100k data
10 x 1 sec for the 10 rows = 10 sec and 1000k
= total of 20sec and 1100k.

Now consider a query where you know 60 of the records will be
returned. Same index.

60 x 1 secs = 60 seconds + 600k index retrievals
60 x 1 secs = 60 seconds + 6000 table retrievals
giving 120 sec and 6600k data. This means it took 20 seconds longer with
the index and you only retrieved 60% of the records compared to 100%
retrieval without in 100 seconds. So, in this case, it would be faster
without the index (ignoring data transfer overheads and possible impact
on memory caches due to retrieving more data etc)

I said this was simplistic. A lot of other factors come into play -
particularly with databases like Oracle where you have lots of
configuration variables, such as shared global memory areas, cached
queries, various different index schemes, different tablespace
configurations that can affect how efficient table scans/reads are etc.

In general, the larger the percentage of the rows that will be returned
by a query, the less benefit an index can be. At some point, there is a
cross-over where the index actually decreases efficiency rather than
improves it. The extreme case is when you are going to need to read all
rows - in this situation, using an index to identify the row is gaining
you nothing - you would have read that row anyway (though even this is a
rule of thumb - indexes can still be useful when you are talking about
table joins or correlated subqueries etc.

With Oracle, it is generally best to rely on tools like explain
plan. Rules like the one you mentioned can be a good guide and a helpful
tool in diagnosing performance issues, but you should only use them as a
guide. Nothing beats hard imperical facts. Consider rules like that one
to be something that can give you some direction on what to look at when
trying to improve performance. Consider other operations as well. Does
it really matter if queries become a bit slow if the index speeds up
inserts/updates (which would be unusual, but not unknown) and you have a
lot of inserts/updates compared to selects? Would indexing a different
field be better?

Of course, other factors should always be taken into consideration. For
example, what are the users expectations? How often is that query made
compared to other queries on the table? Will your expectations of
percentage of row returned change over time or with changing business
requirements or profiles/operations? Is the hardware profile one where
there is lots of memory and large cases with slow disk IO or is it a
system with less memory but really fast disk IO or some combination?

The figure of 20% seems somewhat arbitrary to me. I would expect this
figure would vary depending on the size of the data being indexed, the
type of index, the size of the rows in the table, the type of the data
in the index e.g. VARCHAR2 compared to integer etc.

Finally, note that I'm just a run of the mill developer that works a lot
with Oracle. I'm not an expert in Oracle tuning and know enough to know
I don't know enough. However, I have learnt to take many of the "rules"
I've seen with a grain of salt and not to consider them as hard and fast
rules that should never be broken.

regards,

Tim

--
tcross (at) rapttech dot com dot au


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.