Sunday, February 26, 2012

Comparing the optimizer features of MariaDB 5.3 and MySQL 5.6

In his comment for Vadim Tkachenko's post Baron Schwartz writes:

"...I speculate that when MySQL 5.6 is GA, the official MySQL from
Oracle will have a clear advantage over GA versions of MariaDB in
several common types of workloads. When will MariaDB based on MySQL
5.6 be released? I suppose that when MariaDB based on 5.5 is
finished, we will have a rule of thumb that might be useful to
estimate the lag. I’d expect (but I could be wrong) that it will
take somewhat longer to port to 5.6, because unlike the 5.5
codebase where Oracle, Percona, and Monty Program made a lot of
changes in somewhat disparate parts of the server, in MySQL 5.6
there will be a lot of changes that will potentially conflict —
in MySQL 5.6 there are extensive changes to the query optimizer
and the replication codebase, which have been changed a lot in
MariaDB as well. These changes might be difficult to merge."

Baron is known as a prominent advocate of the Oracle MySQL products. In this role he probably believes that the less credit he gives to the competing product MariaDB, the better. As a member of the MariaDB team, I, on my part, can hardly be expected to consider it as a fair position.

My intention is not to convince Baron that MariaDB deserves a more gracious attitude from his side, but rather to provide other members of the MySQL community with a guidance how the information on the state of the MySQL optimizer development at MariaDB and at Oracle could be extracted from the product source code.

I conventionally understand “optimizer” here as the piece of the MySQL server code that is responsible for :
- transformations of the submitted query to one with better execution time
- search for the best execution plan of the transformed query.
Usually new, more efficient algorithms to perform relational operations employed by query execution plans are also considered as the area of responsibility of the optimizer. The user oriented functionality directly related to the optimization process (like support of EXPLAIN) also belong to the domain of the optimizer.

Ok, I'm a member of the MySQL optimizer team since 2003, I'm tightly involved in the MariaDB optimizer development, so it's quite probable that I know a lot of details concerning the optimizer code in MariDB. But what about the optimizer development at Oracle? Can I be considered knowledgeable here? I think so, if you take into account the following circumstances:
  • the main bulk of the optimizer features of MySQL 5.6 were back-ported from ill-fated MySQL 6.0
  • these features were either written by my colleagues/me from Monty Program AB or were partly designed by me or were reviewed by me
  • the MySQL 5.6 code is still open (lp:mysql-server/trunk) and I regularly review the optimizer related fixes from there and back-port them to the MariaDB code if I find them appropriate
  • the MySQL optimizer features that were developed by Oracle proper are not too complex yet and can be accessed by any expert
So what are the main optimizer features of MariaDB5.3/MySQL 5.6? They are:The first five of these features were back-ported into MariaDB 5.3 code line by me and Sergey Petrunia in December 2009 – February 2010. At Oracle they were back-ported into MySQL 5.6 code line by Evgeny Potemkin by June 2010. Initial back-ports code-wise were almost identical. Yet when back-porting into MariaDB some re-engineering was performed.

After the back-port of optimizer features from MySQL 6.0, the MariaDB team decided to add:
  • null-aware check of IN/NOT non-correlated subquery predicates (MWL #68)
  • smart choice between materialization of not-semijoin IN subqueries and transformations of the IN subquery predicates into EXISTS subqueries (MWL #89)
  • inside-out execution of non-semijoin subqueries (MWL #90)
  • subquery cache for correlated subqueries (MWL #66)
We also planned to implement MRR for primary keys in InnoDB. Without it InnoDB primary keys could not be used for BKA and this compromised the performance of BKA join greatly (see MWL #121).

And of course, from the very beginning we planned to back-port the code of WL #4777 and to complete the development of the optimizations for derived tables (MWL #106).

We did not experience any problems with the implementations of MWL #68 and MWL #66. MWL #106 promised to be a not so easy project, but by May 2010 it was successfully completed. MWL #90 required some re-engineering work. Yet it was finished in more or less expected time.

Quite unexpectedly MWL #89 and MWL #121 turned out to be the tasks at which we stumbled badly (especially the first one). So badly that I had enough time to finish the task I had never planned for MariaDB 5.3: an implementation of the block-based hash join algorithm (MWL #128). This task required some re-engineering work for the other block-based join algorithms though.

Anyway, by the end of July 2011 (too much for a development cycle, anyone of us agrees upon this!) all optimizers features were in and we released MariaDB 5.3.0 beta. 7 months more and now we are ready to release 5.3.5 GA.

What about the back-ported optimizer features in MySQL 5.6?Will subquery optimization appear in MySQL 5.6 eventually? Looking at the development tree I can't surely say so. Yes, they are enabled in the optimizer team tree and everybody can install the tree and build MySQL 5.6 with subquery optimizations. But I can't tell you how close the team is to the completion. I could if I had looked at their outstanding bugs. But this information is not open for me.

Anyway, if you really want to compare the gains you get from optimizer features of MySQL 5.6 with those of MariaDB 5.3 I would recommend you to build the MySQL 5.6 tree with subquery optimizations.

Did we make such comparison? Yes, we did with the DBT-3 benchmark on the databases of scale factor 10 (~30GB), 30 (~80GB), 100 (270GB) for InnoDB and MyISAM.

Where does MariaDB win? On Q3, Q5, Q7, Q8, Q10. Why? Mainly because when employing BKA we can use MRR for primary keys in InnoDB and MRR with sorted keys for MyISAM/InnoDB. How much do we win? It depends the engine and the database scale factor (and the type the disk system, of course).
For example, when executing Q7 for the InnoDB database scale factor 10 we won 257s : 1886s , while when executing Q3 we won 288s : 691, when executing Q8 – 330s : 952s (a conventional HDD was used).

For the InnoDB scale factor 100 we had the following ratio
Q3. 980s : 2008s
Q7. 1393s : 8498s
Q8. 1118s : 3209s.

Do we lose sometimes? Never for MyISAM. Yet on Q9 and Q12 for InnoDB we did. We did not lose too much, but quite noticeably : 589s : 403s on Q9, 465s : 336s on Q12 if to execute on the database of scale factor 10.
The actual cause of this loss is to be investigated yet, but the interesting thing is that we do not lose here if we do not ask for MRR key sorting.

What about subquery optimizations for DBT-3? Here MySQL 5.6 and MariaDB 5.3 show approximately the same results. A more disappointing fact is that, with the exception of Q18 execution of which on MariaDB 5.2 takes years while on MariaDB 5.3 it takes a couple of minutes, other queries with subqueries from DBT-3 do not benefit too much from the new subquery optimizations.

Comparing optimizer features of MariaDB 5.3 and MySQL 5.6 I should mention also other optimizer improvements that do not overlap.
For MariaDB 5.3 these are:For MySQL 5.6 these are:(BTW, have you ever tried to execute Q7 or Q8 with optimizer trace turned on and to look through the results?).

The MariaDB 5.5 tree is green in Buildbot and I expect the first alpha release these days. MariaDB 5.5 will not include much of new development from MP AB. It will be mainly a merge of MySQL 5.5 with MariaDB 5.3. Yet some cute optimizer features will appear there.
I already blogged about one of them. Another feature will allow us to use subquery optimizations for some [NOT] EXISTS subqueries. There will be a couple of other very specific optimizer features developed for our valuable customers.

The optimizer code of MariaDB and that of MySQL diverge more and more. Does it mean that we won't be able to back-port interesting features from MySQL 5.6 into MariaDB? Not at all. We still understand the optimizer internals pretty well.

Do Oracle's MySQL optimizer guys understand it as well? Judging the evolution of their bug fixes I could say that definitely their understanding is getting better and better. I can't say that they've already reached our level, but it may happen in the future.
I almost forgot to give you their names in order you could easily detect their patches in the huge mysql trunk tree to follow their progress yourself and to be able to make judgments yourself rather the trust some shaky speculations. Here they are (in no particular order):
  • Roy Lyseng (from Sun, Norway)
  • Jorgen Loland (from Sun, Norway),
  • Ole John Aske (from Sun, Norway),
  • Jon Olav Hauglid (from Sun, Norway),
  • Olav Sandstaa (from Sun, Norway),
  • Oystein Grovlen (from Sun, Norway),
  • Tor Didriksen (from Sun, Norway),
  • Guilhem Bichot (from MySQL AB, original Maria team, France)
  • Evgeny Potemkin (from MySQL AB, original optimizer team, Russia)
(there is another Russian guy who is from the original optimizer team, who implemented EXPLAIN for DML statements. His name is Gleb Shchepa. Yet I'm not sure that he has not moved to another team).

And here we are, those who actively work on optimizer features in MariaDB:
  • Sergey Petrunia (with MySQL/MariaDB since November 2003, Russia)
  • Timour Katchaounov (with MySQL/MariaDB since April 2004, Bulgaria)
  • Sanja Byelkin (the author of the query cache, the author the original subquery code, one who implemented views in MySQL, Ukraine)
  • Igor Babaev (with MySQL/MariaDB since December 2002, USA/Russia)
    and, of course, the last, but not the least,
  • Michael (Monty) Widenius (the main reviewer of the optimizer code now, Finland)