![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
|
Through the last few months I have been working on a custom storage engine, which is optimized for a specific application. I have come very far indeed, but it was a very thorny road :-( .... For purposes of comparison I filled two tables with the same data (some 130 million rows). One table uses my storage engine, the other table uses innodb. The same query command produces very different results. Here is the command: .... I checked what it is that the query optimizer wants to know about my storage engine table, and it's not a lot: number of rows (info() method), cost of a table scan (scan_time() method), cost of an index access (read_time() method). The records_in_range() method is not called, and the cost returned by scan_time() is about 10^11 times larger than the one returned by read_time(). I'm stumped. Why would the query optimizer prefer such an abismal choice? And how would I rectify this? Note that changing the query command to a "select straight_join" is not really an option for me. While it cuts the number of rows involved from 95706810 to 638075, it's far from the 693 rows the innodb query uses. Whatever problems there is, it ought to be resolved within the storage engine domain. |
#2
| |||
| |||
|
|
Through the last few months I have been working on a custom storage engine, which is optimized for a specific application. I have come very far indeed, but it was a very thorny road :-( That storage engine supports indexing, but so far I have only activated it for the primary key. As far as my tests tell me, the storage engine does what it is supposed to. There is, however, one curious fact which threatens to make all the work I put into this project obsolete. That fact is in that certain table queries can literally take days to complete. For purposes of comparison I filled two tables with the same data (some 130 million rows). One table uses my storage engine, the other table uses innodb. The same query command produces very different results. Here is the command: select d.symbol,d.syscode,if(min(h.datum)>date_sub(now(), interval 12 month),-1,0) as fehlende_tage , date_add(min(h.datum),interval 28 day) as min_datum, count(h.datum)-20 as anzahl_kurse from system.dtboerse_ranglisten as d left join master.historien as h on h.symbol=d.symbol and (h.boerse in ('xet')) and h.datum>=date_sub (now(),interval 12 month) and weekday(h.datum)<5 where d.rank_turnover is not null and h.datum>=date_sub(now(),interval 12 month) group by d.symbol,d.syscode,d.seg_id\G For the innodb table the results of "explain select..." are as follows: *************************** 1. row *************************** id: 1 select_type: SIMPLE table: d type: ALL possible_keys: symsys key: NULL key_len: NULL ref: NULL rows: 693 Extra: Using where; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: h type: ref possible_keys: PRIMARY key: PRIMARY key_len: 26 ref: system.d.symbol,const rows: 7 Extra: Using where; Using index 2 rows in set (0.00 sec) For my own storage engine the following results are produced: *************************** 1. row *************************** id: 1 select_type: SIMPLE table: h type: ALL possible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 95706810 Extra: Using where; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: d type: ref possible_keys: symsys key: symsys key_len: 22 ref: master.h.symbol rows: 2 Extra: Using where 2 rows in set (0.01 sec) Note the number of rows involved: for my own storage engine the query optimizer always prefers to run a complete table scan, which with this number of rows is deadly. I checked what it is that the query optimizer wants to know about my storage engine table, and it's not a lot: number of rows (info() method), cost of a table scan (scan_time() method), cost of an index access (read_time() method). The records_in_range() method is not called, and the cost returned by scan_time() is about 10^11 times larger than the one returned by read_time(). I'm stumped. Why would the query optimizer prefer such an abismal choice? And how would I rectify this? Note that changing the query command to a "select straight_join" is not really an option for me. While it cuts the number of rows involved from 95706810 to 638075, it's far from the 693 rows the innodb query uses. Whatever problems there is, it ought to be resolved within the storage engine domain. Any help would be appreciated! I'm using MySQL 5.0, because that's the version I started implementing my custom storage engine with. -- Home: Olaf Barthel, Gneisenaustrasse 43, D-31275 Lehrte Net: olsen (AT) sourcery (DOT) han.de (Home), o.barthel (AT) logical-line (DOT) de (Work) -- MySQL Internals Mailing List For list archives: http://lists.mysql.com/internals To unsubscribe: http://lists.mysql.com/internals? unsub=paul.mccullagh (AT) primebase (DOT) com |
#3
| |||
| |||
|
|
Try to add a FORCE INDEX to tell the optimizer you really want to use the PRIMARY KEY - though it may not help if optimizer thinks that it's not possible to use the PRIMARY KEY here. Adding 'force index (primary)' in the right place had no effect whatsoever :-( The "explain select..." query produces exactly the same results as if I had omitted the 'force index (primary)' text. I have a hunch that the optimizer considers the index on the primary key to be unusable. Here is what a "show index from master.historien" query produces: *************************** 1. row *************************** Table: historien Non_unique: 0 Key_name: PRIMARY Seq_in_index: 1 Column_name: symbol Collation: A Cardinality: NULL Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 2. row *************************** .... What I do not understand yet is why the cardinality is NULL for the first two columns. |
|
And check best_access_path() function (in sql_select.cc) in a debugger, to see why it prefers the table scan. I didn't dare to go the hard way yet ;-) |

|
Some fiddling with the built-in DBUG package produced some interesting information, though: the primary key index is in fact completely ignored in test_quick_select(), as called by get_quick_record_count(). |
#4
| |||
| |||
|
|
Sergei Golubchik schrieb: And check best_access_path() function (in sql_select.cc) in a debugger, to see why it prefers the table scan. .... Check the big if() in that function. The if() that has a big comment block before it, and that has different conditions in if() numbered as (1), (2), and so on. This is the section that begins with the comment text "Don't test table scan if it can't be better", isn't it? I can't say I understand what choices are being made, and why, looking at the code in the debugger. |
#5
| |||
| |||
|
|
The condition (4): !(s->table->force_index && best_key && !s->quick) looks like the easiest to analyze. Let's assume you used FORCE INDEX. I've looked into it, but with no success. This particular condition evaluates to false for the FORCE INDEX case. In the problematic case best_key != NULL, and since s->quick == NULL the search still ends up picking the complete table scan. |
#6
| |||
| |||
|
|
I also had another look at what the "explain select straight_join" produces on the table managed by my storage engine: .... *************************** 2. row *************************** id: 1 select_type: SIMPLE table: h type: ref possible_keys: PRIMARY key: PRIMARY key_len: 26 ref: system.d.symbol,const rows: 509683 Extra: Using where For the same select command on the innodb table, it looks like this: .... *************************** 2. row *************************** id: 1 select_type: SIMPLE table: h type: ref possible_keys: PRIMARY key: PRIMARY key_len: 26 ref: system.d.symbol,const rows: 240 Extra: Using where; Using index -- 8< -- Note the "Extra:" lines, and the missing "Using index" for my custom storage engine. |
#7
| |||
| |||
|
|
Sergei Golubchik schrieb: [..] There're two differences, as you can see. One is very different estimation for the number of rows. It comes from the cardinality values, so you need to get cardinality values which are close to reality. Hm... the cardinality values go into the table->key_info[].rec_per_key[] fields when the info() method is called with the HA_STATUS_CONST flag, which is what happens for the heap storage engine. How close do I have to approximate what's in the index? A value of 0 probably won't do, but will any non-zero value do? |
|
Second is "Using index". According to the manual it mean " * `Using index' The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row. This strategy can be used when the query uses only columns that are part of a single index. " MySQL can only use this mode if index_flags() for an index contain HA_KEYREAD_ONLY flag. Of course, ::extra() method of your handler must support HA_EXTRA_KEYREAD to make use of this optimization. Good, I can arrange that (tomorrow). What effect will this have, in general? If I understand correctly what the other storage engines do, they will limit the data retrieved from the rows to the portions covering the primary key. Which is an optimization, since it won't call in more data than necessary. Is that correct? |
#8
| |||
| |||
|
|
Sergei Golubchik schrieb: MySQL can only use this mode if index_flags() for an index contain HA_KEYREAD_ONLY flag. Of course, ::extra() method of your handler must support HA_EXTRA_KEYREAD to make use of this optimization. It's an optimization for MyISAM, because it stores indexes and data in different files. So if HA_EXTRA_KEYREAD is in use, MyISAM can only read index file, without reading data file at all. For InnoDB it makes little sense, because it keeps everything in one file anyway. InnoDB uses HA_EXTRA_KEYREAD to avoid reading blobs, as it stores blobs on separate data pages. Depending on the design of your storage engine, HA_EXTRA_KEYREAD may be a win, or a completely useless hint. It is not automatically good, and you don't have to struggle for "Using index" in the EXPLAIN output, unless you're sure that you can benefit from this "keyread" mode. Adding HA_KEYREAD_ONLY to the index_flags() return value made all the difference. I can't tell you how surprised I am... It's the more surprising to me because, if what all the other storage engines use this flag for, it ought to have no effect on the query optimizer's treatment of the tables. Yet it does. |
|
But as I had to find out again and again, there is more to implementing storage engines than there is in the documentation. For example, just today I discovered that the documentation disagrees on the "mode" parameter passed to the open() method. Is there anything I can do to help improving the documentation? |
![]() |
| Thread Tools | |
| Display Modes | |
| |