We’ve just released LadyBugz 1.6.9. It turns out that 1.6.7 didn’t completely fix the crashing bugs, and the crash reports we’ve received (thanks, everyone, for letting us know!) were concentrated in one specific area. We’ve revamped that part of the code, and version 1.6.9 should resolve the issue.
We have just released LadyBugz 1.6.6, available from our website and through auto update.
Since we released LadyBugz 1.6.1, we have received a number of bug reports about crashes in various places. We realized that our transition away from garbage collection was incomplete, and that there were loose ends we had overlooked. We take these issues seriously, and after a week of testing, we believe 1.6.6 behaves much better.
I have previously written about why we chose to go back from garbage collection to manual memory management. While that move was absolutely necessary, I really wish we had found the crashing bugs earlier by testing with a broader range of data. Giving up the perks of garbage collection is a hard process (not having to worry about complex object relationships and ownership is especially addictive), and hard-to-reproduce crashing bugs suck. We have learned a thing or two the hard way.
NSBlockOperation *op = [[[NSBlockOperation alloc] init] autorelease];
[op addExecutionBlock:^(void) {
    if (![op isCancelled]) {
        // run the block if the operation is not cancelled
    }
}];
[someOperationQueue addOperation:op];
If you don’t use garbage collection (gc), the snippet above will leak memory, and the Leaks instrument will not be able to report it as a leak. Why?
In Objective-C, blocks behave like objects. In the non-gc runtime, they have a retain count, can be copied[1], and must be released when you want to discard them.
When a block is copied to the heap (which is what happens when you hand it to an API that keeps it around, such as -addExecutionBlock:), it retains every Objective-C object it references. When the block is deallocated, it sends -release to all of those retained objects. The compiler generates these extra calls for you. If you set breakpoints on -retain and -release, you can observe this behavior as blocks are copied and deallocated.
Usually you don’t need to worry about these details, and that’s what lets you use blocks as proper closures, like the ones many other modern languages have.
But here’s the problem: because blocks send -release only when they are deallocated, not when their scope exits, they can end up retaining the very objects that retain them. The code snippet above is a case in point: the NSBlockOperation object op retains its execution block, and the block in turn retains op. Because of this retain cycle, op’s retain count never reaches zero, so op occupies a chunk of memory that is never reclaimed. And because op is still retained by another object, it is not an orphan you simply forgot to release, so the Leaks instrument never reports it as a leak.
A solution to this problem would be:
NSBlockOperation *op = [[[NSBlockOperation alloc] init] autorelease];
NSValue *weakOpValue = [NSValue valueWithNonretainedObject:op];
[op addExecutionBlock:^(void) {
    if (![[weakOpValue nonretainedObjectValue] isCancelled]) {
        // run the block if the operation is not cancelled
    }
}];
[someOperationQueue addOperation:op];
The NSValue object used above effectively acts as a weak reference to op: the block retains the NSValue, but the NSValue does not retain op. This breaks the retain cycle, so op’s -dealloc is now called, and the block is released along with it.
Retain cycles are not a problem for the Objective-C garbage collector, so the first code snippet works fine if you use gc.
Another, if incomplete, solution would be for Apple to change when a block calls -release on its retained objects. Releasing them when the block finishes executing seems a reasonable expectation. But given the use case above (an NSBlockOperation might never be executed), plus the fact that referenced objects must be retained before the block is entered, it would not be a comprehensive solution, although it would bring blocks closer to that expectation.
One final note: because of the potential for retain cycles, you must be very careful when you use blocks within blocks. This is one place where gc can liberate us from such a blocking issue (pun intended), but gc is not available on iOS and has its own issues on the Mac, too.
[1] But not retained. When a block is first created, it lives on the stack, together with the values it has captured from the enclosing scope. If you toss it around (i.e. pass it as an object), it must first be copied, which moves the block, including (const) copies of those captured values, onto the heap. If you merely retain a block and pass it around, chances are it still refers to the stack. And you know what happens when dangling stack references are passed around.
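The ownership bookkeeping above can be modeled in a few lines of plain C (which is also valid Objective-C). This is a minimal sketch, not the Objective-C runtime: the `Object` struct and the `object_new`, `retain`, `release`, and `live_objects_after` functions are hypothetical stand-ins. One object plays op and the other its execution block; the block refers to op either strongly, as in the first snippet, or weakly, as with the NSValue wrapper. The operation queue’s own retain and release are left out because they balance each other.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object: a stand-in for op or its block. */
typedef struct Object {
    int refcount;
    struct Object *strong_ref; /* owned: released when this object dies */
    struct Object *weak_ref;   /* not owned: never retained or released */
} Object;

static int live_objects = 0;

static Object *object_new(void) {
    Object *o = calloc(1, sizeof *o);
    if (o == NULL) abort();
    o->refcount = 1; /* the creator owns the first reference */
    live_objects++;
    return o;
}

static Object *retain(Object *o) {
    o->refcount++;
    return o;
}

static void release(Object *o) {
    assert(o->refcount > 0);
    if (--o->refcount == 0) {
        Object *owned = o->strong_ref;
        free(o);
        live_objects--;
        if (owned != NULL) release(owned); /* like -dealloc releasing ivars */
    }
}

/* Builds op -> block (strong) plus block -> op (strong or weak), drops the
   caller's reference to op, and reports how many of the two objects survive. */
static int live_objects_after(int block_refers_weakly) {
    int before = live_objects;
    Object *op = object_new();
    Object *block = object_new();
    op->strong_ref = block;             /* op owns its execution block */
    if (block_refers_weakly)
        block->weak_ref = op;           /* like the NSValue wrapper */
    else
        block->strong_ref = retain(op); /* like the block capturing op */
    release(op);                        /* like the autorelease pool draining */
    return live_objects - before;
}
```

`live_objects_after(0)` reports 2 surviving objects (the cycle, deliberately leaked here), while `live_objects_after(1)` reports 0: once op’s count hits zero, freeing it releases the block as well.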
Update: Replaced the video with a new version with English subtitles.
Earlier this year, a Taiwanese newspaper, Apple Daily (蘋果日報, not affiliated with Apple Inc.), made an Animation News (動新聞) clip on the Jay Leno-Conan O’Brien spat, which I also re-posted. But Apple Daily’s real fame started with its animated Tiger Woods tabloid series.
This new one is about Apple’s Antennagate. Once again, you don’t need to know any Mandarin Chinese (the official language in Taiwan) to understand the gist of the story. I must warn you, though: don’t eat or drink in front of your laptop while watching this clip…
Why My Next Mac App Will Not Use Garbage Collection
Among the many things I have learned from the past few years of developing desktop applications, here’s one: implement your app conservatively. This often translates to using only proven technologies. Cocoa’s garbage collection (gc), alas, does not seem to be one of them. My next Mac app will not use garbage collection. In fact, we are even going through the pain of converting an existing garbage-collected app to non-gc.
Since Mac OS X 10.5, you can choose to use automatic garbage collection in your Cocoa app. Apple’s take on gc is a brilliant engineering feat that makes Objective-C a more modern language (and it’s even open source). Before gc, memory management was much like driving a car with a manual transmission and required a lot of mental arithmetic: you retain an object (to increase its refcount) when you start using it, and release it (to decrease its refcount) when you relinquish ownership. With gc, no such mental arithmetic is required. It’s great to be spared the burden of memory management, particularly when you have other, nastier things to worry about, such as multithreading (which makes manual management harder) and bindings (which complicate the object graph).
Unfortunately, I have found a few cases where Cocoa’s gc doesn’t work that well, particularly with libraries that are beyond gc’s reach. I’ll name three: Secure Transport (which handles things like HTTPS), icucore (which handles localization and date/time formatting), and XML parsers (I have tried both NSXMLParser and expat). All three can leak from time to time even when called from the main thread alone, and are almost guaranteed to leak when used from multiple threads, even if proper locks and a one-instance-per-thread-at-a-time policy are enforced.
You might be tempted to think: big deal, if those libraries have their baggage and work best in a non-gc app, I’ll just write a separate app and use distributed objects (DO), another fancy technology, to bridge the gc and non-gc processes. Here’s the bad news: there is apparently a bug in Apple’s DO implementation under gc, and proxy objects created to communicate with the remote process are never destroyed until the app terminates.
So for the time being, my conclusion is this: if your app is network-intensive, needs to do a lot of things in parallel, parses a lot of XML, and has a fairly complex user interface that relies on bindings, then garbage collection probably isn’t for you.
Now the only big question I have is: how did Xcode manage it? Xcode, as we know, is a garbage-collected app[1]. It’s also said to be Apple’s most complicated application. Unlike Mail.app, though, Xcode doesn’t seem to do much date/time formatting. It reads plists (for which Apple has a faster parser implementation) but not really XML. The only network-bound parts seem to be the documentation fetcher and the recently added automatic iPhone provisioning. Much of the IDE seems to have been in place since the OS X 10.3 days. All told, Xcode seems to hold up well.
I’d like to know how to make those non-Cocoa parts work with gc. Until then, I’ll take the more conservative path.
[1] I know this because the previous generation of Mondrianum, our Adobe Kuler color picker, used (and still uses) a cover flow image view, which doesn’t work under gc. In that generation, we used some NSGarbageCollector hacks (don’t ask) to make the plug-in work within Xcode. In the current version, we have decided not to support gc apps like Xcode, because those hacks never worked perfectly.
My company’s FogBugz client application for Mac, LadyBugz, has a new version. We have given the case history view a facelift, along with many improvements and bug fixes. We weren’t sure we were headed in the right direction when we started the major interface change, so we asked a few of our customers whether they’d like to try out a beta version. The initial response was positive, so we merged, and the tentative branch became the main line. This lets us move on to other frequently requested features, and we believe this version is a substantial improvement.
D. Richard Hipp, creator of SQLite, on the sqlite-users mailing list:
Some of the code in SQLite (such as the Lemon parser generator and the printf implementation) dates back to the late 1980s. But the core of SQLite was not started until 10 years ago. Ten years is not that long ago, though it has been long enough to amass 7114 check-ins - an average of 2.1 check-ins per day. If you are overseeing such a project, 10 years seems like forever. It is hard for me to remember a time when I wasn’t working on SQLite.
SQLite is my favorite software project and a role model. It is lightweight, efficient, self-contained, and vastly powerful. Not many software projects can claim all four, especially self-containedness. SQLite now states its public-domain status in a more official manner (for the benefit of institutional users), but I believe all of us can learn a lot from the blessing in its source code:
May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
I also had the opportunity to work on a client project that used both of SQLite’s paid extensions, SEE and CEROD, and SQLite never disappointed.
I wish the best for the project, and look forward to its 20th and 30th birthdays.
Formosana, a Collection of C++ Libraries for Processing Taiwanese Languages
Formosana is a C++ library collection that provides basic building blocks for processing Taiwanese languages. Currently three languages are supported: Mandarin, Holo and Hakka. It also provides a language-agnostic, bigram-based word segmentation library. It has no external dependencies and can be built on most platforms I know of. It is available on github under the MIT License.
My day job is commercial iPhone and Mac software development. I also develop open source software, mostly in the form of libraries. Designing libraries and frameworks is both a good exercise in itself and an important part of software development. It pushes you to plan ahead for future use, and it gives you a good opportunity to think about the fundamentals of a given problem domain.
Formosana currently has three major components:
Formosa::Mandarin: A library for processing Mandarin syllables and handling text input keyboard layouts. An abstract data type represents a Mandarin syllable. The syllable data type accepts both Pinyin and Bopomofo as input and can convert to either form as output. Its internal representation guarantees that the syllable is always in the CVCT form, although it does not guarantee that the syllable is phonetically valid (i.e. it can be used to produce syllables not found in actual Mandarin). It also supports four major keyboard layouts (the set is expandable) that map a standard US keyboard to Bopomofo symbols.
Formosa::TaiwaneseRomanization: A library for processing romanized Holo and Hakka. An abstract data type represents a Holo or Hakka syllable. Internally it uses POJ (pe̍h-oē-jī, also called Church Romanization by some), and it accepts POJ for both input and output. Tâi-lô (TL, or Taiwanese Romanization System), technically a variant of POJ and the standard romanization of Holo used by Taiwan’s Ministry of Education, can also be used for both input and output. The syllable class has a normalization member function that guarantees the composed tone mark is placed only on the correct vowel character, according to vowel resonance in Holo. It is weaker than its Mandarin counterpart in that the syllable class does not guarantee the represented syllable is always in the Initial+Vowel+Final+Tone form. It accepts both the “composed” form (syllable with diacritics) and the “uncomposed” form (tone as a numeral, called the database query form in the library) as input, and can produce output in either form. The library also supports keyboard layout mappings, with both numerical tone input and dead key combinations.
Formosa::Gramambular (literally, “gram walking”): A language-agnostic, bigram-based word segmentation library. It accepts an input set of weighted unigram and bigram key-value pairs and outputs the best-scoring path. If the keys are input syllables and the values are the Chinese phrases those syllables represent, the walk is an input method. If we reverse keys and values, it becomes a word segmentation tool. Because the library works without any grammatical knowledge, the quality of the dictionary (which provides the data for the weighted nodes) determines the quality of the output. I discussed the principles behind the library’s design in a talk at the Open Source Developer Conference in Taipei, Taiwan, in 2008. As a bonus, Gramambular has a debug helper that can produce output in the Graphviz DOT format, which you can feed into Graphviz to visualize the walk.
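The CVCT guarantee in Formosa::Mandarin can be pictured as a syllable with exactly one slot per component (assumed here to be initial consonant, medial, vowel, and tone). The plain C sketch below is hypothetical: the 16-bit layout, the slot widths, and the function names are assumptions for illustration, not Formosana’s actual C++ API. Component indices start at 1, with 0 meaning “absent”, and combining a component (say, from a keystroke) overwrites whatever occupied that slot, so a syllable can never end up with two initial consonants.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packed layout: one slot per syllable component. */
typedef uint16_t Syllable;

enum {
    CONSONANT_SHIFT = 0,  CONSONANT_MASK = 0x001f, /* 5 bits */
    MEDIAL_SHIFT    = 5,  MEDIAL_MASK    = 0x0060, /* 2 bits */
    VOWEL_SHIFT     = 7,  VOWEL_MASK     = 0x0780, /* 4 bits */
    TONE_SHIFT      = 11, TONE_MASK      = 0x3800  /* 3 bits */
};

static Syllable syllable_make(unsigned consonant, unsigned medial,
                              unsigned vowel, unsigned tone) {
    assert(consonant <= 31 && medial <= 3 && vowel <= 15 && tone <= 7);
    return (Syllable)((consonant << CONSONANT_SHIFT) | (medial << MEDIAL_SHIFT) |
                      (vowel << VOWEL_SHIFT) | (tone << TONE_SHIFT));
}

static unsigned syllable_consonant(Syllable s) { return (s & CONSONANT_MASK) >> CONSONANT_SHIFT; }
static unsigned syllable_medial(Syllable s)    { return (s & MEDIAL_MASK) >> MEDIAL_SHIFT; }
static unsigned syllable_vowel(Syllable s)     { return (s & VOWEL_MASK) >> VOWEL_SHIFT; }
static unsigned syllable_tone(Syllable s)      { return (s & TONE_MASK) >> TONE_SHIFT; }

/* Every non-empty slot in part overwrites the same slot in base, so each
   slot holds at most one component no matter what order they arrive in. */
static Syllable syllable_combine(Syllable base, Syllable part) {
    static const uint16_t masks[] = {CONSONANT_MASK, MEDIAL_MASK, VOWEL_MASK, TONE_MASK};
    for (int i = 0; i < 4; i++) {
        if (part & masks[i])
            base = (Syllable)((base & ~masks[i]) | (part & masks[i]));
    }
    return base;
}
```

Because every change goes through slot replacement, the one-component-per-slot invariant holds regardless of input order; what the representation cannot rule out is a combination that doesn’t occur in real Mandarin, which matches the limitation described above.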
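To make the “uncomposed” to “composed” conversion described for Formosa::TaiwaneseRomanization concrete, here is a hypothetical C sketch (not the library’s API) that turns numeric-tone POJ such as peh8 into letters plus a Unicode combining diacritic. The placement rule is deliberately simplified: the first vowel found in the priority order a, o, e, u, i gets the mark. It does not reproduce the full POJ conventions for clusters such as oa, oe, or ui, syllabic m and ng, tones 6 and 9, or uppercase input, and it emits decomposed (combining-mark) sequences rather than precomposed characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* UTF-8 combining marks for POJ Holo tones; tones 1 and 4 are unmarked.
   Tones this sketch does not handle return NULL. */
static const char *poj_tone_mark(int tone) {
    switch (tone) {
    case 1: case 4: return "";
    case 2: return "\xCC\x81"; /* U+0301 combining acute */
    case 3: return "\xCC\x80"; /* U+0300 combining grave */
    case 5: return "\xCC\x82"; /* U+0302 combining circumflex */
    case 7: return "\xCC\x84"; /* U+0304 combining macron */
    case 8: return "\xCC\x8D"; /* U+030D combining vertical line above */
    default: return NULL;
    }
}

/* Converts lowercase numeric-tone input such as "peh8" into letters plus a
   combining mark after the chosen vowel. Returns 1 on success, 0 otherwise. */
static int poj_compose(const char *numeric, char *out, size_t out_size) {
    size_t len = strlen(numeric);
    if (len < 2 || numeric[len - 1] < '0' || numeric[len - 1] > '9') return 0;
    const char *mark = poj_tone_mark(numeric[len - 1] - '0');
    if (mark == NULL) return 0;
    size_t base_len = len - 1, pos = base_len, mark_len = strlen(mark);
    /* Simplified placement: first vowel found in priority order a, o, e, u, i. */
    for (const char *v = "aoeui"; *v != '\0' && pos == base_len; v++) {
        const char *hit = memchr(numeric, *v, base_len);
        if (hit != NULL) pos = (size_t)(hit - numeric);
    }
    if (pos == base_len || base_len + mark_len + 1 > out_size) return 0;
    memcpy(out, numeric, pos + 1);                        /* up to and incl. the vowel */
    memcpy(out + pos + 1, mark, mark_len);                /* the tone mark */
    memcpy(out + pos + 1 + mark_len, numeric + pos + 1,   /* the rest of the syllable */
           base_len - pos - 1);
    out[base_len + mark_len] = '\0';
    return 1;
}
```

For example, kiau2 becomes “kia” + U+0301 + “u”, and si1 comes back unmarked as “si”. A real normalizer would also need the library’s fuller placement rules and, if precomposed output is wanted, Unicode normalization.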
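The “best-scoring path” that Formosa::Gramambular walks can be illustrated with a standard Viterbi-style dynamic program. The C sketch below is not Gramambular’s code: the three-syllable lattice, the log-probability weights, and the rule of backing off to a node’s unigram weight when no bigram exists are all invented for illustration. Each node covers a span of the input, and the walk keeps, for every node, the best-scoring path that ends there.

```c
#include <assert.h>
#include <float.h>
#include <string.h>

/* Hypothetical lattice for a 3-syllable input: each node spans
   [start, end) of the input and carries a log-probability weight. */
typedef struct { int start, end; const char *value; double unigram; } Node;
typedef struct { const char *prev, *next; double score; } Bigram;

enum { NODE_COUNT = 5, BIGRAM_COUNT = 1 };

static const Node nodes[NODE_COUNT] = {
    {0, 1, "A", -2.0}, {1, 2, "B", -2.0}, {2, 3, "C", -2.0},
    {0, 2, "AB", -3.5}, {1, 3, "BC", -3.0},
};
static const Bigram bigrams[BIGRAM_COUNT] = {{"AB", "C", -0.5}};

/* Score of stepping from p to n: the bigram weight if one exists,
   otherwise back off to n's unigram weight. */
static double transition(const Node *p, const Node *n) {
    for (int i = 0; i < BIGRAM_COUNT; i++)
        if (strcmp(bigrams[i].prev, p->value) == 0 &&
            strcmp(bigrams[i].next, n->value) == 0)
            return bigrams[i].score;
    return n->unigram;
}

/* Fills path with the node indices of the best-scoring segmentation of
   [0, length), stores its total in *score, and returns the path length. */
static int walk(int length, int path[NODE_COUNT], double *score) {
    double best[NODE_COUNT];
    int prev[NODE_COUNT];
    assert(length > 0);
    for (int e = 1; e <= length; e++) {          /* nodes in order of end position */
        for (int k = 0; k < NODE_COUNT; k++) {
            if (nodes[k].end != e) continue;
            prev[k] = -1;
            best[k] = nodes[k].start == 0 ? nodes[k].unigram : -DBL_MAX;
            for (int p = 0; p < NODE_COUNT && nodes[k].start > 0; p++) {
                if (nodes[p].end != nodes[k].start || best[p] == -DBL_MAX) continue;
                double s = best[p] + transition(&nodes[p], &nodes[k]);
                if (s > best[k]) { best[k] = s; prev[k] = p; }
            }
        }
    }
    int last = -1;
    for (int k = 0; k < NODE_COUNT; k++)
        if (nodes[k].end == length && best[k] > -DBL_MAX &&
            (last < 0 || best[k] > best[last]))
            last = k;
    if (last < 0) return 0;
    int count = 0, reversed[NODE_COUNT];
    for (int k = last; k >= 0; k = prev[k]) reversed[count++] = k; /* backtrack */
    for (int i = 0; i < count; i++) path[i] = reversed[count - 1 - i];
    *score = best[last];
    return count;
}
```

Here `walk(3, path, &score)` picks AB + C with a total score of -4.0. Without the single AB→C bigram entry it would pick A + BC (-5.0 versus -5.5 for AB + C), which shows how bigram weights can override what unigram weights alone would choose.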
Each of the components comes with its own demo code. I have also supplied Makefiles (for Mac OS X and other UNIX platforms) and Microsoft Visual C++ solution files for those sample projects.
The library collection makes use of a few helper classes from The OpenVanilla Project. I have included those class files (also written by me) in the source to make the collection buildable with no external dependencies.
Formosana was first designed for developing input methods, and both the Mandarin and the Taiwanese Romanization modules have been used in actual products. Although Gramambular has not yet been used in production, I previously worked on an implementation based on similar principles for an internal project at my company. As the project’s commit history shows, Gramambular was written from the ground up fairly quickly, in about two days. For me it was also an exercise in starting over from scratch to see whether the design was solid.
The library collection has many other uses in processing Taiwanese languages, and there is room for improvement. For example, a syllable class that can validate against the phonetic grammar is highly desirable. Currently, Taiwanese Romanization class instances are mutable: normalization changes the internal representation instead of returning a new, immutable object. In addition, for the libraries to be useful for building language-related web applications, bindings for major scripting languages are also desirable. These are things that developers interested in the field can work on.
I’d be very interested in hearing from you if you use or plan to use Formosana in your own projects. My contact info for this project can be found on my GitHub profile.