SQL Zone is brought to you in partnership with:

Matt Raible has been building web applications for most of his adult life. He started tinkering with the web before Netscape 1.0 was even released. For the last 16 years, Matt has helped companies adopt open source technologies (Spring, Hibernate, Apache, Struts, Tapestry, Grails) and use them effectively. Matt has been a speaker at many conferences worldwide, including Devoxx, Jfokus, ÜberConf, No Fluff Just Stuff, and a host of others. Matt is a DZone MVB and is not an employee of DZone and has posted 144 posts at DZone. You can read more from them at their website. View Full User Profile

Be careful when switching MySQL to UTF-8

05.06.2011
| 6221 views |
  • submit to reddit

Earlier this week, I noticed a couple strange issues on this blog and sent an email to the roller-user mailing list. I figured both issues were caused by my upgrade to Roller 5. Basically, my tag cloud wasn't working and I noticed a bunch of blog entries that had truncated data. I'd provide links to the truncated posts, but I believe I've fixed them, so links would be useless. This post is to make others aware of something I wasn't: be careful when switching MySQL to UTF-8.

The first issue, 404s from my tag cloud links, was something my theme was missing. It seems the tagIndex action is new in Roller 5 and required for tag clouds to work. To fix this issue, I had to add the following XML to my theme's theme.xml file.

<template action="tagsIndex">
<name>TagsIndex</name>
<description>Tag index page</description>
<link></link>
<navbar>false</navbar>
<hidden>true</hidden>
<templateLanguage>velocity</templateLanguage>
<contentType>text/html</contentType>
<contentsFile>Weblog.vm</contentsFile>
</template>

Since I wanted to replicate Roller 4's behavior, pointing the contentsFile to Weblog.vm worked just fine. The nice thing is I can always change it to another page and customize it to show more information about the selected tag.

The 2nd issue, data truncation, was a bit trickier. I thought it might've been something Roller did when upgrading my database from Roller 4 to 5. I didn't suspect upgrading from MySQL 3 to 5 would cause it. From my previous upgrade post:

At this point, I figured my database might be slightly hosed, but since it was simply creating tables, I was probably OK. I restarted Tomcat and left the old version in place while I waited for a MySQL 5 database instance from my hosting provider, KGB Internet. Once I got the new instance, I imported my backed-up database, ran the upgrade script and everything worked just peachy.

Keith at KGB looked into my issue and thought the problem was the charset. My old MySQL 3.x database used latin1 while my MySQL 5.x database uses UTF-8. The symptom looked familiar:

Be careful when switching to UTF-8. Once you have converted your data, any program/webapp that uses the database will have to check that the data they are sending to the database is valid UTF-8. If it isn't then MySQL will silently truncate the data after the invalid part, which can cause all sorts of problems.

Luckily, I had a backup of my pre-upgrade database and was able to convert and recover everything successfully with a little iconv, perl and numerous mysqldump and mysql import commands. Of course, it's possible there's still some jacked entries and comments. If you noticed any truncation, please let me know and I'll get them fixed.

From http://raibledesigns.com/rd/entry/be_careful_when_switching_mysql

Published at DZone with permission of Matt Raible, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)