Calling Java code in MySQL

This post is a follow-up to my talk at JavaDeus 2008 where I showed how you can make use of Lucene inside of MySQL using an experimental branch and some triggers and stored procedures.

Since the process is not entirely obvious and the branch is very experimental, I thought it would be a nice thing to put it online. It’s pretty amazing with how little code (in Java and otherwise) you can implement a working full-text search engine that simply doesn’t get into your way :)

Note: Be advised that the MySQL server we will build is likely to crash at times. Do not use it in production! And of course, always have a safe backup of important data.

This first part will cover the basics on how the get the necessary foundation in place, configuring the server and calling Java UDFs. In the second part, which I hopefully will post tomorrow, I’ll show the juicy rest of getting Lucene going.

To achieve our goal we need several things:

The software versions I used are:

  • Mac OS X 10.5.3 (nothing specific about OS X here, though)
  • Java 1.5.0_13 (you need the SDK, the runtime will not suffice)
  • gcc 4.0.1
  • mysql-5.1-wl820 revision 2584 (as of 2008-06-19)

Ok, off we go.

After you installed Bazaar, you are ready to get the MySQL 5.1-wl820 sources. On the command line do:

$ bzr branch lp:~starbuggers/sakila-server/mysql-5.1-wl820

This might warn you that you cannot push things back to launchpad, but we’ll just ignore it, because we don’t actually want to push changes back.

At this point, I recommend getting up and doing something else, like boarding a flight to Hawaii, because it will take a bit until the sources are downloaded.

When it’s done change into the directory bzr created and configure the server. This is a branch of the MySQL 5.1 main-line, so everything that applies to building the community sources (or Enterprise for that matter) applies to this tree, as well.

You can go the autoreconf route, or be lazy like I am and use the scripts in the BUILD directory (choose the one that applies to your system).

One thing to watch out for, is that the check for a working javac assumes the Java CLASSPATH contains the current directory. Check that your settings (most likely the CLASSPATH environment variable) contain . in there! Skip the first step if you already have . in your classpath (or alter the command if you are not using bash…aah the joys of Unix).

You most certainly will want to specify an installation prefix for this server (otherwise it would be installed in a system-wide location), so don’t forget to fill in the --prefix option.

$ export CLASSPATH=$CLASSPATH:.
$ cd mysql-5.1-wl820
$ ./BUILD/compile-pentium-debug-max-no-ndb --prefix=<where you want the server to live>
[...lots of output...]

Building MySQL might take a bit, depending on your machine, so do something useful while waiting :)

Double check your classpath if the check for javac fails. (Can you tell I always forget that?!)

Aside: If you own a multicore machine, you can speed up the compile considerably by telling the build script to just configure the server and then build it manually (note the -c). Replace the 9 in the make invocation with “number of cores you have”+1:

$ ./BUILD/compile-pentium-debug-max-no-ndb -c --prefix=<where you want the server to live>
[...lots of output...]
$ make -j9
[...yet more output...]

Now is the time to install the server in the location you gave to the build script above. For simplicity I’ll assume that you have no other MySQL server running, please refer to the MySQL docs on how to setup multiple mysqlds on one box.

$ make install
[...still more output...]
$ cd <installation_directory>
$ ./bin/mysql_install_db

Follow the steps mysql_install_db outputs to secure your installation, though for the purpose of this setup guide I have skipped that and am running totally insecure :P (The real reason is that I’m to lazy to remove the anonymous account when testing stuff like this.)

Let’s start the server and make sure everything works so far:

$ ./bin/mysqld_safe &
$ ./bin/mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.26-rc-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema | 
| test               | 
+--------------------+
2 rows in set (0.00 sec)

mysql> quit
Bye

If you can successfully connect we are already halfway there :)

Check that you are connected to the right server (note the server version is 5.1.26-rc-debug) and that there are no databases other that information_schema and test.

Let’s now hit the first roadblock in installing this experimental tree ;)

To load the Java plugin, we will need to connect as root (because we need access to the mysql database):

$ ./bin/mysql -uroot
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.1.26-rc-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> INSTALL PLUGIN `Java` SONAME "psm_java.so";
ERROR 1126 (HY000): Can't open shared library '/private/tmp/mysql-5.1-wl820-build/lib/mysql/plugin/psm_java.so' (errno: 2 dlopen(/private/tmp/mysql-5.1-wl820-build/lib/mysql/plugin/psm_java.so, 2): image not found)

The problem is that apparently the installation process copies the language plugins into the wrong directory. They should be in lib/mysql/plugin/ but are being installed in lib/mysql/. For security reasons we are not allowed to use paths when loading a plugin, so we need to copy (or symlink) them.

$ cd lib/mysql/plugin
$ ln -s ../psm_* .

Another try:

$ ./bin/mysql -uroot
mysql> INSTALL PLUGIN `Java` SONAME "psm_java.so";
ERROR 2013 (HY000): Lost connection to MySQL server during query

Whoops. One problem down, and a crash pops its ugly head up. Let’s investigate by looking at the error log (it’s in var/machinename.err). It will contain a line like:

Can't find class: com/mysql/udf/UDFModule080622 20:38:17 [ERROR] Plugin 'Java' init funtion returned error.

Which basically means: Our old friend, the classpath, is wrong. This being an experimental branch, there are error checks missing to make problems painfully obvious. No worries, we’ll just fix the classpath.

But how?

The psm_java plugin contains code that embeds a JVM and as such there must be a place to tell it its startup parameters, like the heapsize and the classpath. There is a configuration file name jvm_options.cnf in the MySQL datadir, which contains those settings. If it doesn’t exist yet, simply create a file of that name.

Mine contains

-Djava.awt.headless=true
-Djava.class.path=/tmp/mysql-5.1-wl820-build/lib/mysql/java_udf.jar:/tmp/mysql-5.1-wl820-build/lib/mysql/java_udf_util.jar:/tmp/mysql-5.1-wl820-build/lib/mysql/java_udf_example.jar

This file also is the place to tell the JVM which garbage collector to use, which can be very important, as the VM is running inside a database server.

/tmp/mysql-5.1-wl820-build is the basedir where I installed MySQL into, the actual .jars you need are in lib/mysql/. java_udf_example.jar contains a couple of samples to get you started. They are built as part of the build process we did earlier and their sources are in the plugin/java_udf/org/example/mysql/udf/ directory of the MySQL source directory.

After creating or modifying jvm_options.cnf, let’s check if our settings worked. Restart mysqld_safe and connect with the client:

$ ./bin/mysqld_safe &
$ ./bin/mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.26-rc-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> SHOW PLUGINS;
+------------+--------+----------------+-------------+---------+
| Name       | Status | Type           | Library     | License |
+------------+--------+----------------+-------------+---------+
| binlog     | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| partition  | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| ARCHIVE    | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| BLACKHOLE  | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| CSV        | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| MEMORY     | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| InnoDB     | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| MyISAM     | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| MRG_MYISAM | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| Java       | ACTIVE | PSM LANGUAGE   | psm_java.so | GPL     |
+------------+--------+----------------+-------------+---------+
10 rows in set (0.00 sec)

You should see the Java plugin as in the output above, if not, recheck your classpath or try the INSTALL PLUGIN line from above again. The server will automatically load the plugin when it starts, there’s no need to issue the INSTALL PLUGIN command again.

Now let’s call our first UDF in Java. We’ll simply take the examples that were packed up for us in java_udf_example.jar. To do that, connect to the database as user root (unless you want to give the anonymous account EXECUTE privileges) and do the following:

$ ./bin/mysql -uroot
mysql> use test
Database changed
mysql> CREATE FUNCTION my_reverse(string VARCHAR(255))
    -> RETURNS VARCHAR(255)
    -> NO SQL LANGUAGE Java
    -> EXTERNAL NAME 'java_udf_example.jar:org.example.mysql.udf.Reverse.go';
Query OK, 0 rows affected (0.00 sec)
mysql> select my_reverse("looc era sFDUavaJ");
+---------------------------------+
| my_reverse("looc era sFDUavaJ") |
+---------------------------------+
| JavaUDFs are cool               | 
+---------------------------------+
1 row in set (0.01 sec)

Congratulations! You have called your first Java method from inside of MySQL!

This MySQL branch has a slightly different syntax of the CREATE FUNCTION and CREATE PROCEDURE commands, in that it has been extended to allow different language names and can associate an external name with the in-database routine. The actual format of the external name is dependent on the language plugin, of course. For Java we can give it the .jar file and the package (all the way down to the method) of the code we want to call. Other than that it’s the same old Stored Routine syntax.

For reference, here’s the code we have just called (from plugin/java_udf/org/example/mysql/udf/Reverse.java):

package org.example.mysql.udf;

public class Reverse {
    public String go(String str) {
        if (str == null) {
            return null;
        }
        StringBuffer sb = new StringBuffer(str.length());
        for (int i = str.length() - 1; i >= 0; i--) {
            sb.append(str.charAt(i));
        }
        return sb.toString();
    }
}

I love the fact that absolutely nothing in the above Java code refers to JDBC or MySQL at all. It could be anything that takes a string and returns a string, which is a whole lot :)

Now obviously we have to be a bit careful about the code we execute as part of our Stored Routine, because it will run inside the mysqld process. Avoid costly (both in terms of CPU and memory usage) code, don’t block etc. Common sense applies.

In the next post, I will show you how to put it all together and use Lucene to transparently index data flowing into your tables. You will even be able to search that fulltext index from within MySQL :)

CU next time.