![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
So, why is this a bane? The explanation on the page is less than clear. |
#3
| |||
| |||
|
|
So, why is this a bane? The explanation on the page is less than clear. |
#4
| ||||||
| ||||||
|
|
*“The columns that really do need histograms, where it really does matter, are the indexed columns. Those are the only columns where the optimizer can choose the access path.” *That is a common misconception: that histograms only matter for indexed columns. Choosing an access method ( index or tablescan ) for a row source is only part of the task of the CBO when putting together an access plan. Other, at least equally important, tasks are the types and especially order of joins – unless none of your sql contain joins. |
|
And it is there where column selectivities, derived from column statistics, play a crucial role. If you deprive the optimizer of this vital information it is bound to produce suboptimal, even very bad plans. If you have not yet encountered that count yourself lucky." |
|
snippage... Now consider that table T1 has an unindexed column named STATUS - the status is one of RELEASED, CLOSED, and CANCELLED. *Your particular query is looking for the few rows in table T1 with a STATUS of CANCELLED, with just 200 of the 1,000,000 rows matching that criteria. *If a histogram were created on that column, the optimizer might change the cardinality estimates like this (note that it likely will not be exactly correct due to the selectivity estimated being multiplied together when multiple predicates are present with AND specified between the predicates: T1: 5 T2: 500,000 T3: 100,000 In the above case, the selected join order might be T1 -> T3 -> T2 I think that the above is a case where just collecting histograms on indexed columns might be limiting. |
|
Richard Foote stated: "I wouldn’t recommend ALL INDEXED COLUMNS. Non-indexed columns can benefit from stats and histograms as well as indexed columns as the stats can provide important cardinality estimates of steps within an execution plan that can influence join orders and when steps are performed" |
|
Why not index the STATUS column in my example? *You could, but then what if you are looking for something different, like sales in the state of Alaska (probably much fewer than for California) - would you necessarily index a column that indicates the state of a customer? |
|
(On second thought, I probably would not have thought to manually build a histogram on this column either, but the statistics collection process might). |
#5
| |||
| |||
|
|
I have frequently used the method_opt from the title to analyze my tables and have been castigated for it with amazing regularity. I still don't quite understand why is that method bad. My logic is quite simple: The main decision that an optimizer has to make about the particular column condition is whether to use index to resolve it or not. That decision is based on the column statistics, as well as the the index statistics. *The things that influence the decision are the condition itself, the existence of the histograms which helps to estimate the number of rows that need to be read, the clustering factor which will help estimate the number of blocks to read, the values from SYS.AUX_STAT$ to estimate the I/O cost and the statistics from the table itself. There are two possible decisions that optimizer can make with respect to the column: whether to use an index or not. There is no decision to be made if the index is not there, except with respect to join method, whether to use sort or hash. That means that using "for all indexed columns size 254" actually makes a lot of sense and it doesn't waste as much space as "FOR ALL COLUMNS SIZE 254". There is an option "FOR ALL COLUMNS SIZE SKEWONLY" which would be nice, except that I have no idea of the histogram size. This method_opt was called "a bane":http://structureddata.org/2008/10/14...t-and-for-all- indexed-columns/ So, why is this a bane? The explanation on the page is less than clear. --http://mgogala.byethost5.com |
#6
| |||
| |||
|
|
Why not index the STATUS column in my example? You could, but then what if you are looking for something different, like sales in the state of Alaska (probably much fewer than for California) - would you necessarily index a column that indicates the state of a customer? (On second thought, I probably would not have thought to manually build a histogram on this column either, but the statistics collection process might). |
#7
| |||
| |||
|
|
- Use FOR ALL COLUMNS SIZE 1 as default. This way you ensure that you end up with basic column statistics on all columns. - Generate histograms only on columns where you can prove that it will be beneficial - do this explicitly or you could use an approach of SIZE REPEAT instead of SIZE 1 and generate the histogram once explicitly - There are cases where you will only benefit from a histogram when crafting it manually Hope this helps, Randolf |
![]() |
| Thread Tools | |
| Display Modes | |
| |