Using OS X’s Built-in ICU Library in Your Own Project
Unicode strings are usually not difficult to handle [1]. The tricky part comes when your code needs actual knowledge of Unicode, for things like:
- Turning strings like “Café” to upper case or lower case (toupper() or tolower() in the Standard C Library won’t work)
- Stripping all punctuation symbols. This is not as easy as it seems. How do you also strip various European quotation marks as well as full-width ones used in East Asian text?
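To see why the C library falls short, here is a minimal sketch. It assumes the string is UTF-8 and the program runs in the default "C" locale; naive_upper is a made-up helper name, not a library function. Since toupper() works one byte at a time, the two bytes that encode “é” pass through untouched:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: upper-cases a string in place, one byte at a time,
   using only the C library. In the default "C" locale, toupper() maps
   a-z to A-Z and leaves every other byte value unchanged. */
void naive_upper(char *s)
{
    assert(s != NULL);

    size_t len = strlen(s);
    for (size_t i = 0; i < len; ++i) {
        /* Cast to unsigned char first: passing a negative char value
           to toupper() is undefined behavior. */
        s[i] = (char)toupper((unsigned char)s[i]);
    }
}
```

Given the UTF-8 bytes of “Café”, this yields “CAFé” rather than “CAFÉ”: the ASCII letters change, but the accented one does not.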
Some languages, like Java, already have such Unicode knowledge baked in. For C and C++, the standard library to use is IBM’s ICU.
You can either build ICU from its source code, or use tools like MacPorts to install it for you. But you may also wonder: Since OS X has great internationalization support, perhaps OS X also uses ICU?
It turns out that OS X does include a pre-built version of the ICU library [2], placed at /usr/lib/libicucore.dylib. But for various reasons (I assume it has much to do with the fact that ICU is written in C++, and library versioning with C++ is a pain) it does not include the headers for you to use.
Here’s how you can link against the built-in library:
- Install a copy of the latest ICU
- In your project file, set the header search path to include the installed ICU headers (e.g. /opt/local/include)
- Define U_DISABLE_RENAMING in your project (i.e. set the compiler flag -DU_DISABLE_RENAMING=1). This turns off ICU’s version renaming scheme, so that you’ll be able to link against an older library (the one that comes with OS X in this case) with the latest headers
- Link against libicucore (with the linker flag -licucore)
As the name libicucore implies, it contains only the core ICU library, so it lacks quite a few things. For example, C++ methods such as UnicodeString::toUTF8 and classes such as StringPiece and StringByteSink are missing. A good way to check availability is to use the UNIX tool nm to dump the symbols in libicucore.dylib. Your best bet is to stick to the C API. On the other hand, this saves you the trouble of including your own copy of ICU.
[1] Read Joel Spolsky’s excellent primer on Unicode if it still isn’t clear to you.
[2] Many of NS/CFString and NSCharacterSet’s internationalization features use ICU under the hood. I haven’t tested it, but I believe the tips provided here can also be used on iOS.