dbTalk Databases Forums  

getting aggregate UDFs to execute with temp tables

mailing.database.mysql-internals mailing.database.mysql-internals


Discuss getting aggregate UDFs to execute with temp tables in the mailing.database.mysql-internals forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Stephen Tu
 
Posts: n/a

Default getting aggregate UDFs to execute with temp tables - 02-21-2012 , 01:29 AM






Hello,

Please bear with me as this is going to be kind of a long post.

I have a query that is like (call it query A):

SELECT fieldA, agg(fieldB) FROM some_tbl WHERE fieldC <= [some
constant] GROUP BY fieldA;

Where both fieldA, fieldB, and fieldC are *not* part of any index on the
table, and where agg() is some aggregate UDF.

EXPLAIN tells me that mysql is doing the following (in the extra
column): "Using where; Using filesort"

Ok, that works. But then I notice in a very similar query (call it query B):

SELECT fieldA, sum(fieldB) FROM some_tbl WHERE fieldC <= [some
constant] GROUP BY fieldA;

EXPLAIN says: "Using where; Using temporary table; Using filesort"

When I measure the performance, I notice that query B substantially
outperforms query A (about an order of magnitude). Note that agg() isn't
doing anything terribly expensive, so the perf difference is not all
attributed to the UDF. So I want to get query A to use the same
execution strategy as query B (namely the use of a temporary table).

Therefore, I took the source code for mysql-5.5.21, and in
sql/sql_select.cc, I commented out the following lines:

15278 //if (!sum_item->quick_group)
15279 // param->quick_group=0; // UDF SUM function

Upon re-compilation, EXPLAIN now tells me for query A that it is doing
the following: "Using where; Using temporary table; Using filesort".

Great, I thought, that was easy! However, now when I execute query A,
agg(fieldB) simply returns NULL. I verified that my UDF isn't even being
called. Presumably, there is a reason why UDFs don't go down this code
path

As I am not very familiar with the mysql internals, I would like to know
how I can actually get this to work. Is this even possible?

I realize that in general, UDFs can be non-deterministic etc, so you
have to be very careful when executing them. However, in my particular
case, my agg() UDF has the exact same semantics as sum(), so I would
hopefully like it to also have the same performance characteristics. I'm
ok with running some modified mysql to get this to work.

I hope what I am asking is clear. I'm willing to provide more
information if this is not enough.

Thanks!
--
Stephen Tu

--
MySQL Internals Mailing List
For list archives: http://lists.mysql.com/internals
To unsubscribe: http://lists.mysql.com/internals

Reply With Quote
  #2  
Old   
Michael Widenius
 
Posts: n/a

Default re: getting aggregate UDFs to execute with temp tables - 02-21-2012 , 12:50 PM






Hi!

Quote:
"Stephen" == Stephen Tu <stephentu (AT) csail (DOT) mit.edu> writes:
Stephen> Hello,
Stephen> Please bear with me as this is going to be kind of a long post.

Stephen> I have a query that is like (call it query A):

Stephen> SELECT fieldA, agg(fieldB) FROM some_tbl WHERE fieldC <= [some
Stephen> constant] GROUP BY fieldA;

Stephen> Where both fieldA, fieldB, and fieldC are *not* part of any index on the
Stephen> table, and where agg() is some aggregate UDF.

<cut>

Stephen> Therefore, I took the source code for mysql-5.5.21, and in
Stephen> sql/sql_select.cc, I commented out the following lines:

Stephen> 15278 //if (!sum_item->quick_group)
Stephen> 15279 // param->quick_group=0; // UDF SUM function

Stephen> Upon re-compilation, EXPLAIN now tells me for query A that it is doing
Stephen> the following: "Using where; Using temporary table; Using filesort".

Stephen> Great, I thought, that was easy! However, now when I execute query A,
Stephen> agg(fieldB) simply returns NULL. I verified that my UDF isn't even being
Stephen> called. Presumably, there is a reason why UDFs don't go down this code
Stephen> path

Stephen> As I am not very familiar with the mysql internals, I would like to know
Stephen> how I can actually get this to work. Is this even possible?

The reason for the above is the following:

MySQL has two ways to calculate aggreagates:

a)
- Sort things according to the items in the GROUP BY part
- For each row
- call add() method for all aggregated items
- When the GROUP BY changes value, send the current calculated values
to the next stage (normally the client) and call the clear() method
on all aggregate items.

This method is is the one that is supported with the aggregate UDF
interface.

b)
- Create a temporary table to hold one result row. (For each aggregate
item we use the method create_temp_field() to create a field)

- For each row
- Generate the GROUP BY KEY
- Search for a row in the temporary table matching that key
- If row didn't exists
- Call reset_field() for all aggregate items
- Write a row with the group-by key and aggregated item values to
the temporary table
- If row existed
- call update_field() for all aggregate fields
- Update row in temporary table
- Sort temporary table in group-by order and send result to client.


The reason UDF:s can't use b) is that the UDF interface doesn't
support the reset_field(), update_field(), create_temp_field() and
result_item() methods.

(This was done to keep the udf interface simple).

If you have a high desire for speed, then you should consider creating
a native aggregate function in MariaDB/MySQL.

Check the files item_sum.h and item_sum.cc. You should be able to use
solve your problem relatively easily by inherit from some of the
already existing items.

Regards,
Monty
Creator of MySQL and MariaDB

--
MySQL Internals Mailing List
For list archives: http://lists.mysql.com/internals
To unsubscribe: http://lists.mysql.com/internals

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.