Perl script to convert MySQL character set to utf8

“Groovy, baby!”, as international man of mystery likes to say.

It seems converting character and text data to utf8 character set is a common task for MySQL, especially during an upgrade. In fact, I had trouble with it during server and WordPress database upgrade for this blog site. I wrote about it in this post, where I explained how to do it step-by-step using a command line tool such as mysql, taking advantage of some nifty code generation trick with information_schema.

One drawback of that method is that it’s a manual process, therefore time-consuming and error-prone.

I spent some time today to cook up a simple Perl script to automate this task. See below. Remember to change MyDatabase to your database name (there are 3 places that need to be changed), and adjust the user name and password accordingly. This script also prints out sql statements used during the process so you will know what has been done to the database.

I have tested this successfully on my own WordPress blog. Hopefully it will help somebody out there. Enjoy!

#!/usr/bin/perl
# MyUtf8Converter.pl - convert all character data to utf8 character set
use strict;
use warnings;
use DBI;

# data source name, username, password, connection attributes
my $dsn = "DBI:mysql:MyDatabase:localhost";
my $user_name = "MySQLUserName";
my $password = "MySQLPassword";
my %conn_attrs = (RaiseError => 1, PrintError => 0, AutoCommit => 1);

# connect to database
my $dbh = DBI->connect ($dsn, $user_name, $password, \%conn_attrs);

# Convert char types first. Use information_schema to generate alter table code to convert data to binary first.
my $sth = $dbh->prepare ("SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', REPLACE(column_type, 'char', 'binary')) FROM information_schema.columns WHERE table_schema = 'MyDatabase' and data_type LIKE '%char%'");

$sth->execute ();

# Run alter table stamement. First convert char types to binary types, then convert those to char types utf8 character set
while (my ($val) = $sth->fetchrow_array ())
    {
        print $val, "\n";
        $dbh->do ($val);
        $val =~ s/binary/char/;
        $val = $val . " CHARACTER SET utf8";
        print $val, "\n";
        $dbh->do ($val);
    }
$sth->finish ();
# Now we convert text data. Use information_schema to generate alter table code to convert data to blob first.
$sth = $dbh->prepare ("SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', REPLACE(column_type, 'text', 'blob')) FROM information_schema.columns WHERE table_schema = 'MyDatabase' and data_type LIKE '%text%'");

$sth->execute ();

# Run alter table stamement. First convert text types to blob types, then convert those to text types utf8 character set
while (my ($val) = $sth->fetchrow_array ())
    {
        print $val, "\n";
        $dbh->do ($val);
        $val =~ s/blob/text/;
        $val = $val . " CHARACTER SET utf8";
        print $val, "\n";
        $dbh->do ($val);
    }
$sth->finish ();

$dbh->disconnect ();