Unicode strings are usually not difficult to handle[1]. The tricky part comes when you need knowledge of Unicode for things like case conversion (toupper() or tolower() in the Standard C Library won't work).
Some languages, like Java, already have such Unicode knowledge baked in. For C and C++, the standard library to use is IBM's ICU.
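To see why the Standard C Library falls short here, a minimal sketch follows. It assumes UTF-8 input and the default "C" locale, and naive_upper is a hypothetical helper name of mine. toupper() works one byte at a time, so the two bytes that encode é pass through untouched.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Uppercase a NUL-terminated byte string in place with the Standard C
 * Library's toupper(), one byte at a time.
 * naive_upper is a hypothetical helper name used for illustration. */
static void naive_upper(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        /* toupper() expects an unsigned char value (or EOF) passed as int. */
        *s = (char)toupper((unsigned char)*s);
    }
}
```

Feeding it "café" encoded as UTF-8 gives "CAFé": only the ASCII letters change, because toupper() has no idea that the bytes 0xC3 0xA9 together form a single letter.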
You can either build ICU from its source code, or use tools like MacPorts to install it for you. But you may also wonder: Since OS X has great internationalization support, perhaps OS X also uses ICU?
It turns out that OS X does include a pre-built version of the ICU library[2], placed in /usr/lib/libicucore.dylib, but for various reasons (and I assume it has much to do with the fact that ICU is written in C++, and library versioning with C++ is a pain) it does not include the headers for you to use.
Here's how you can link against the built-in library:
1. Make the ICU headers available to your project (for example, MacPorts installs them in /opt/local/include)
2. Define U_DISABLE_RENAMING in your project (i.e. set the compiler flag -DU_DISABLE_RENAMING=1); this turns off ICU's version renaming scheme so that you'll be able to link against an older library (the one that comes with OS X in this case) with the latest headers
3. Link against libicucore (with the linker flag -licucore)
As the name implies, libicucore only contains the core ICU library, so it lacks quite a few things. For example, C++ methods such as UnicodeString::toUTF8 and classes such as StringPiece and StringByteSink are missing. A good way to check availability is to use the UNIX tool nm to dump libicucore.dylib. Your best bet is to stick to the C API. On the other hand, this saves the trouble of including your own copy of ICU.
[1] Read Joel Spolsky's excellent primer on Unicode if handling Unicode strings still seems difficult to you.
[2] Many of NS/CFString and NSCharacterSet's internationalization features use ICU under the hood. I haven't tested it, but I believe the tips provided here can also be used on iOS.