Alan Hogan

Things Alan Hogan feels like sharing.

These are my comments on music, movies, books, web development and programming, Mac tips, and life in general. Enjoy!

Mon Oct 11

PHP + MySQL + Unicode = ugh

Last night I spent two hours trying to get a PHP + MySQL stack to match strings with accented characters exactly (e.g. handling a URL like /résumé, which is of course percent-encoded by the browser to /r%C3%A9sum%C3%A9).

It never quite worked.

Oh, I could urldecode() the request string (not sure why that doesn’t happen automatically for me, but hey), and my database is supposedly using UTF-8, and I know how to control how exact string matching happens (without losing INDEX benefits). But something’s still broken. And why wouldn’t it be? PHP 5 is still not fully Unicode-aware. So whether the culprit is the database connection or the database abstraction layer I’m using is a bit unclear, but if it’s the latter, how could I really expect much better from a PHP library?
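For what it’s worth, the decoding step itself is easy to check in isolation. What follows is only a minimal sketch, assuming the mbstring extension is loaded; the hard-coded $raw value stands in for whatever the request actually contains. As far as I can tell, PHP decodes $_GET values automatically but leaves $_SERVER['REQUEST_URI'] untouched, which may be why the raw path needs decoding by hand.

```php
<?php
// Hypothetical request path, exactly as the browser sends it.
$raw = '/r%C3%A9sum%C3%A9';

// rawurldecode() turns percent-escapes back into raw bytes. It knows
// nothing about character encodings; it just emits bytes.
$path = rawurldecode($raw);

// Check that those bytes are well-formed UTF-8 before using them in a query.
$isUtf8 = mb_check_encoding($path, 'UTF-8');

echo bin2hex($path), "\n";             // 2f72c3a973756dc3a9
echo strlen($path), "\n";              // 9 (bytes)
echo mb_strlen($path, 'UTF-8'), "\n";  // 7 (characters)
var_dump($isUtf8);                     // bool(true)
```

If the hex dump shows c3a9 for each é, the string leaving PHP is fine, and the problem is more likely somewhere between PHP and MySQL.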

Using MySQL BINARY comparisons and CONVERT(x USING utf8) worked for me in phpMyAdmin but not in my actual CMS. Seriously, copying and pasting the query from my logs into phpMyAdmin works, but the CMS fails.
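One possible explanation, and it is only a guess, is that the two clients declare different connection character sets. If the CMS connection is left at latin1 while phpMyAdmin uses utf8, the server would treat each UTF-8 byte from the CMS as a separate Latin-1 character before comparing. The sketch below simulates that reinterpretation in plain PHP, with no database involved; it assumes mbstring and uses ISO-8859-1 as a stand-in for MySQL’s latin1.

```php
<?php
// The UTF-8 bytes PHP holds for /résumé after decoding the URL.
$utf8 = "/r\xC3\xA9sum\xC3\xA9";

// Simulate a server that believes the client speaks Latin-1: every byte
// is read as one Latin-1 character, then converted to UTF-8.
$seenByServer = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($seenByServer), "\n";   // 2f72c383c2a973756dc383c2a9
echo strlen($seenByServer), "\n";    // 13 bytes instead of 9
var_dump($seenByServer === $utf8);   // bool(false): the match fails
```

If that is what is happening, the usual remedy is to declare the connection charset explicitly, for example with mysqli’s set_charset('utf8') or a SET NAMES utf8 query right after connecting. Whether my abstraction layer does that is exactly what I can’t tell.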

The Ruby on Rails jab at Java frameworks was something like, “Up and running in the time it takes Java to do XML push-ups.” With PHP, my push-ups involve coming up with strategies to avoid issues with slashes, magic globals, Unicode, and error reporting: things that change wildly from server to server, PHP version to PHP version, and framework to framework.

I’m quite ready to migrate completely to a technology stack that actually has Unicode support and sane defaults with less legacy cruft.