I’ve been following the excellent work that Jan, Kay, and others have been doing with MySQL Proxy, and it has really matured into a great piece of software. I talked to Jan at the MySQL UC and toyed with the idea of integrating libdrizzle into MySQL Proxy. I’ve also been asked by a number of folks when a Drizzle Proxy project will be started and whether it will be as feature-rich as MySQL Proxy. For a while I just said “Someday, I just don’t have the time.” Lately, though, I am hoping we never have a Drizzle Proxy project.
Let me explain.
One of the fundamental ideas in software engineering is code reuse through libraries or modules. Rather than create a Drizzle Proxy project, why not add a proxy module into the Drizzle server? This way, at any point during the query execution path, you could toss the query to the proxy module to deal with, and the main execution engine would be done. You could of course run the Drizzle server in a “proxy only” mode where new queries are only parsed and then a post-parsing module determines where and how each query is proxied. Post-proxy hooks will be needed as well for result processing. Functionally, it’s the same thing as the proxy, but without having to reinvent the components needed in the proxy. (Just as a side note, I understand this may not have been an option for the MySQL Proxy folks.)
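To make the hook idea concrete, here is a minimal sketch in Python rather than the server's own language. Everything in it is an assumption made for illustration: `Server`, `ParsedQuery`, the `post_parse` and `post_proxy` hooks, and the `proxy_only` flag are hypothetical names, not part of any Drizzle API. It only shows the control flow: parse, let a post-parse hook pick a backend (or decline), then pass proxied results through a post-proxy hook.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ParsedQuery:
    text: str
    verb: str  # leading keyword, upper-cased


def parse(text: str) -> ParsedQuery:
    """Stand-in for the server's parser; it only extracts the leading keyword."""
    stripped = text.strip()
    if not stripped:
        raise ValueError("empty query")
    return ParsedQuery(text=stripped, verb=stripped.split(None, 1)[0].upper())


# Hypothetical hook types: a post-parse hook names a backend or returns None
# (meaning "run locally"); a post-proxy hook may transform the proxied rows.
PostParseHook = Callable[[ParsedQuery], Optional[str]]
PostProxyHook = Callable[[List[tuple]], List[tuple]]
Backend = Callable[[ParsedQuery], List[tuple]]


class Server:
    def __init__(
        self,
        backends: Dict[str, Backend],
        post_parse: Optional[PostParseHook] = None,
        post_proxy: Optional[PostProxyHook] = None,
        proxy_only: bool = False,
    ):
        self.backends = backends
        self.post_parse = post_parse
        self.post_proxy = post_proxy
        self.proxy_only = proxy_only

    def run_locally(self, query: ParsedQuery) -> List[tuple]:
        """Stand-in for the local execution engine."""
        return [("local", query.text)]

    def execute(self, text: str) -> List[tuple]:
        query = parse(text)
        target = self.post_parse(query) if self.post_parse else None
        if target is not None:
            # The proxy module takes over; the local engine is done.
            rows = self.backends[target](query)
            return self.post_proxy(rows) if self.post_proxy else rows
        if self.proxy_only:
            raise RuntimeError("proxy-only mode: no backend selected for query")
        return self.run_locally(query)
```

With a routing hook such as `lambda q: "replica" if q.verb == "SELECT" else None`, reads would go to the hypothetical `replica` backend while everything else runs locally, which is the mixed mode; setting `proxy_only=True` turns the same server into a pure proxy.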
So, to be clear, I still want to have proxy functionality, just not as an independent project.
Even with a proxy module inside of the server, I’d like to address some of the reasons proxies are created and used. These are not necessarily specific to database proxies; many of these reasons apply to other server types as well. In the case of a database proxy, especially with Drizzle, I would like to address the list of reasons below in a different way. Why? In most architectures, I see a proxy server as a fix for a shortcoming in another component, possibly in the client, the server, or maybe even the application data model. It also introduces latency and another failure point that may not be necessary. The less code and the fewer machines your application has to run through, the better. Don’t get me wrong, there are reasons to use proxies, but sometimes they are used as a hack.
- Query processing and rewriting - In Drizzle we plan to add query rewrite plugin hooks, both pre-parser and post-parser. At some point we want to add pluggable parser support and clean up the abstract syntax tree. These plugins would enable rewriting of queries at a few different levels, either on the raw strings or by rearranging the syntax tree before the optimizer takes over.
- Query multi-cast, data partitioning, result merging - In my opinion, this could probably be done at the client library layer or through another system such as Gearman. If pushing that logic into the client is not an option, you could still accomplish this through the proxy module I mentioned above, possibly running the server in a mixed mode (some queries answered locally, some proxied).
- Connection Pooling/Concentration - People often confuse these two terms. Pooling is the reuse of connections on the client side. This should be pushed to client APIs whenever possible. When this is not possible, you need to use a generic TCP proxy or database proxy, but these should only be run locally (not on a separate machine). Concentration is done by a piece of software that acts as a connection multiplexer. It takes multiple client-side connections and maps them onto a single connection to the server. This is usually needed because the server does not have a threading or file descriptor handling model efficient enough to withstand thousands of connections. It’s not always an option to re-architect a server to handle this, but that should be preferred over creating another layer to do the concentration for you. In Drizzle, this is one thing I have a particular interest in. It involves improving or rewriting the pool-of-threads scheduler and making the execution engine more stateful so it can yield a thread when it knows it will block.
- Sharding, HA/failover - Again, this is something I think belongs in the client library, and it is part of the new Drizzle protocol. I’ll be adding support into libdrizzle to manage sharding and connection failover shortly.
- Debugging layer - At some point we should be adding probes into the server where output can be piped to a module of your choice. For example, you can register for a set of events and have a module send those out into a Gearman network for processing and debugging. This will give you the flexibility to process probe output however you want and does not introduce another layer just for debugging.
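Since pooling and concentration from the list above are so often confused, a small sketch may help show the difference. This is an illustration in Python under stated assumptions, not how libdrizzle or any real proxy does it: `FakeConnection`, `ConnectionPool`, and `Concentrator` are hypothetical names, and the "connection" just records the statements it carried.

```python
import queue
import threading


class FakeConnection:
    """Hypothetical stand-in for a server connection; records what it carried."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)
        return f"ok: {sql}"


class ConnectionPool:
    """Pooling: a single client process reuses its own idle connections."""

    def __init__(self, size):
        self.connections = [FakeConnection() for _ in range(size)]
        self._idle = queue.LifoQueue()
        for conn in self.connections:
            self._idle.put(conn)

    def execute(self, sql):
        conn = self._idle.get()  # blocks until a connection is free
        try:
            return conn.execute(sql)
        finally:
            self._idle.put(conn)  # hand the connection back for reuse


class Concentrator:
    """Concentration: many client sessions share one upstream connection."""

    def __init__(self):
        self.upstream = FakeConnection()
        self._lock = threading.Lock()

    def execute(self, client_id, sql):
        # Serialize access so statements from different clients do not interleave.
        with self._lock:
            return self.upstream.execute(f"/* client {client_id} */ {sql}")
```

The pool lives inside the client and never needs an extra hop, while the concentrator is a separate layer sitting between many clients and the server. A real concentrator would also have to deal with per-session state such as open transactions, which this sketch ignores.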
These are things I plan to work on at some point or would like to help someone else work on inside Drizzle. Also, these are my own thoughts and may not be shared by fellow Drizzle developers. Treat this as an invitation for discussion. :)