Rant on Optimizations -> Faster Beats 1.5b?

Warning! Long, random, technical rant ahead! Read at your own risk!
I may also misuse a few words here and there – I apologize in advance for any incoherence in my rant

[rant]

Modern Android

According to the latest data from Google, the majority of phones these days run Android 2.1 (Eclair) or 2.2 (Froyo) and are most likely sporting mean, snappy 1GHz+ processors capable of running even advanced, graphically/arithmetically-intensive programs such as PlayStation emulators decently. With Android 3.0 (Honeycomb) not too far off and expected to require top-of-the-line 1GHz processors (or even dual-core processors), expect to be using your next-generation smartphones for far more than just simple communication. If you look around today, you may occasionally spot old, cranky Android phones (like the Motorola Cliq, HTC G1, or even the original Motorola DROID) stumbling along with their 500MHz processors and incomplete multitouch touch-screens (or sometimes lack thereof). But those ancient Android 1.5/1.6 phones are easily eclipsed by newer high-end phones such as the Samsung Galaxy S, Droid X, and even the Nexus One, all running 1GHz processors with complete multitouch support and capable of running processor-intensive apps such as Beats without a hitch (no, I haven’t actually done any real benchmark comparisons as I do not own or have access to any phones other than my own, but these tests can give you a nice comparison point).

iPodLinux

With these high-performance wonder machines right around the corner, one might wonder, “why bother even trying to optimize things when it’s going to be running at full speed anyway?” The answer from me is, “because I can.” Back when I worked on the iPodLinux project, it was all about pushing things to the limit. Sure, I didn’t have the coding knowledge to program things like the dynarec in Ducky and Fellni’s iBoy project, but I was always fascinated by making things run fast on my little first-generation iPod nano, strongly encouraged by the fact that my hDoom port easily ran at full speed and much faster than the initial iDoom port it was inspired by/based on. The crowning achievement was my igpSP port of Exophase’s popular Game Boy Advance emulator. Despite initial skepticism about ever seeing a playable GBA port on the iPod, igpSP 0.9-2xb K7 (available only in SVN) ended up running most games at roughly 70% speed. So what does this all have to do with Beats? Nothing directly. I’m just ranting about why, when it comes to making things run faster (aka making sure Beats runs smoothly and lag-free even on old phones that nobody really cares about), I try anyway and don’t give up easily.

Beats 1.5b?
With so many radical internal code changes between 1.3b and 1.4b, we decided to start taking a deeper look into optimizing Beats. If you recall, Beats 1.0b was literally coded from scratch and tossed together during the 48hr PennApps 2010 hackathon, where performance and code efficiency were dropped in favour of functionality. So after two SVN repo switches and approaching 300 revisions in the current repo, it’s time we sidetrack a bit from the bells and whistles planned for Beats 1.5b and take a look at performance.

Summarizing the last week or so, we’ve made quite a few performance improvements in the current Beats code. Neither Matt nor I can actually confirm that these improvements will have a real impact on the gameplay experience across the wide spectrum of Android devices – between the two of us, I have a fast Samsung Captivate that has always run Beats without issue ever since 1.0a, while Matt doesn’t even own an Android device – he’s stuck with a slow, clunky emulator running on his MacBook. Nevertheless, the raw numbers are looking better and the FPS counter is going up. Here are a few brief notes on what we’ve come across:

Traceview

Traceview is Google’s nifty Android profiler. After playing around with it, I ran it on the two methods I thought were the most performance critical – the game’s onDraw method (which draws the arrows on the screen) and the notes’ update method (which updates the arrays of falling notes, missed notes, etc.). The results weren’t surprising, but they were very much reassuring about what needs to be optimized. For update(), the iterator loops are used far more than they should be – each call currently creates 8 queues that are reconstructed on every update (effectively every frame). Not only do the iterators loop through each note object multiple times, but each queue also creates new objects for each relevant note. The result is a huge mess of loops and (unnecessary) object creations that can definitely be optimized through many approaches.
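To make the update() problem a bit more concrete, here’s a minimal sketch of one possible direction: a single pass over one preallocated array that drops hit and missed notes in place, so nothing gets reconstructed or allocated per frame. This is not the actual Beats code. Note, NoteField, and the 150ms miss window are all made-up names and numbers for illustration only.

```java
// Hypothetical sketch: Note, NoteField and MISS_WINDOW_MS are made-up names,
// not taken from the Beats source.
final class Note {
    int timeMs;     // when the note should be hit, in song time
    boolean hit;    // set elsewhere when the player taps the note
}

final class NoteField {
    static final int MISS_WINDOW_MS = 150; // assumed timing window

    private final Note[] falling; // allocated once, reused every frame
    private int fallingCount;
    private int missedTotal;

    NoteField(int capacity) {
        falling = new Note[capacity];
    }

    /** Adds a note; returns false if the fixed-size buffer is full. */
    boolean add(Note n) {
        if (fallingCount == falling.length) return false;
        falling[fallingCount++] = n;
        return true;
    }

    /** Single pass per frame: compacts the array in place, allocates nothing. */
    void update(int songTimeMs) {
        int kept = 0;
        for (int i = 0; i < fallingCount; i++) {
            Note n = falling[i];
            if (n.hit) continue;                       // already tapped
            if (songTimeMs - n.timeMs > MISS_WINDOW_MS) {
                missedTotal++;                         // scrolled past the window
                continue;
            }
            falling[kept++] = n;                       // still falling
        }
        // Clear the tail so dropped notes are not kept alive by stale slots.
        for (int i = kept; i < fallingCount; i++) falling[i] = null;
        fallingCount = kept;
    }

    int fallingCount() { return fallingCount; }
    int missedTotal() { return missedTotal; }
}
```

The trade-off is a fixed capacity: add() returns false once the buffer is full, so the caller has to decide what happens when a chart has more notes on screen than expected.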

The current plan involves experimenting with static arrays, the optimized System.arraycopy where applicable, and running it all on a second thread with locking. Whatever approach we ultimately decide on, it should be more efficient than the current massive “bruteforce” mess.

For onDraw, drawBitmap took the stage (as expected, of course). What we did find surprising was the total time spent – 109ms in onDraw() vs 2ms in update(). Sure, we can optimize away the backend however much we want, but what’s the point when graphics accounts for about 98% of the slowdown anyway? Things we are considering looking into include the current bitmap scaling, “dirty” redrawing, the clipping used in the holds, static bitmap objects, etc. The possibility of just rewriting everything in OpenGL (and letting/hoping the library takes care of making sure everything is optimized) is still there, but with both Matt and me busy every day with our univ classes and studies, it probably won’t be happening any time soon. Besides, it’s always a lot more fun when you do things your own way from scratch ; )

Micro-Optimizations
“Designing for Performance”: http://developer.android.com/guide/practices/design/performance.html
Some of the advice there made sense, but I found much of it silly. “Avoid Internal Getters/Setters” – while I completely understand the logic and reasoning behind it all, I just find it silly that there isn’t any preprocessor smart enough to optimize out the method call overhead for single-line get/set methods (aka force inline everything or something along those lines). To be honest/fair, no, I haven’t actually looked too deeply into the Android build/run process, but the fact that un-encapsulating all the data fields (lol-Java-OOP-concepts) and increasing the number of static methods has led to slightly faster code makes me wonder why Google chose Java for embedded device programming. That said, however, I love Java and I love Google, so I’m not really complaining, just pondering answers outside my scope of knowledge.

Background Redrawing
“Speed up your Android UI”: http://www.curious-creature.org/2009/03/04/speed-up-your-android-ui/
This hack was a godsend. We already knew that background drawing was a major contributor to low framerate, but removing it would mean no pretty picture to look at while you tap away furiously at your fragile phone. Neither of us suspected, however, that this entire time, the window’s own default background was also being redrawn on each screen update, even though we were using an opaque background of our own. I’m sure there are plenty more graphical optimization tricks out there – we just have to keep looking!

Haptic Feedback Lag
This might be entirely a Samsung Galaxy S thing, but through a bit of profiling and experimentation, I discovered that a major source of lag actually came from the vibrate calls triggered by taps (or note misses). One way or another, too many calls to Vibrator.vibrate() were causing the phone to hiccup (blocking the UI thread somehow?), which is especially problematic when the screen fills with notes and the user panics and reacts by just spam tapping all the arrows. Each tap calls the vibrate function, which in turn slows the phone down and makes the game even choppier, resulting in even more panic. A vicious cycle. Having added haptic feedback specifically so that the user can feel the game actually responding to his/her taps, I am very reluctant to remove the feature or turn it off by default.

Some quick Google searches of the web and the Android developer’s site/blog returned nothing, and I doubt there are even many active/performance-critical apps out there that use the vibrate service, so I doubt I’ll be able to find out anything more than what I discover through experimentation. I tried moving the vibrate call to an AsyncTask, but it resulted in noticeably delayed vibrate responses that came across as very distinct, prominent lag (even though the actual game itself ran slightly smoother). Whether this was due to the AsyncTask overhead or the Vibrator service being slow, I don’t know, and I probably won’t look into it much because trading user experience for performance just didn’t seem worth it. I was, however, able to spot my silly mistake of allowing multiple vibrates per game cycle (i.e. requesting a vibrate multiple times even when the arrows are being tapped at the exact same time/frame). Removing those redundant requests definitely did lessen the lag significantly, but apart from that, I don’t see much else that can be done without sacrificing responsive haptic feedback. Just got to optimize elsewhere, I guess.
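The “one vibrate per game cycle” fix can be sketched roughly like this. It’s a minimal, off-device illustration and not the actual Beats code: Buzzer is a hypothetical stand-in interface for whatever ends up calling Vibrator.vibrate(), and HapticGate and its method names are made up. The first request in a frame buzzes immediately (so the feedback stays responsive), and any further requests in the same frame are dropped.

```java
// Hypothetical stand-in for the real vibrate call so the sketch runs off-device.
interface Buzzer {
    void buzz(long durationMs);
}

// Hypothetical helper: lets at most one vibrate request through per game cycle.
final class HapticGate {
    private final Buzzer buzzer;
    private final long durationMs;
    private boolean firedThisFrame;

    HapticGate(Buzzer buzzer, long durationMs) {
        this.buzzer = buzzer;
        this.durationMs = durationMs;
    }

    /** Call once at the start of every game cycle. */
    void onFrameStart() {
        firedThisFrame = false;
    }

    /**
     * Call on every tap or miss. Buzzes right away on the first request of
     * the frame and ignores the rest. Returns true if it actually buzzed.
     */
    boolean request() {
        if (firedThisFrame) return false;
        firedThisFrame = true;
        buzzer.buzz(durationMs);
        return true;
    }
}
```

On a device, the Buzzer implementation would just forward to the Vibrator service. This only removes the redundant requests; it does nothing about the cost of each individual vibrate call.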

Garbage Collector
So there were a few reports of random lag spikes here and there that were clearly not associated with any particular event. I played around with Beats over the past few days and did indeed encounter a few instances of these random lag spikes. Only half a second in duration, they were not caused by any background data, any phone background process, or even a mass influx of notes in the game itself. In fact, they were seemingly 100% random and impossible to reliably replicate. After much thought about the possible cause of the problem, we thought of one (now blatantly obvious) culprit: the Garbage Collector! (no, I haven’t actually gotten around to finishing the “Google I/O 2009 – Writing real-time games for Android” or “Google I/O 2010 – Writing real-time games for Android redux” videos, but I plan to one day when my classes/schedule allow for it).

At the moment, I strongly suspect the frequent GC runs are due to the largely inefficient iterator(s) described earlier and the numerous, unnecessary objects that are created. There are two approaches to this: 1) explicitly call the GC when things aren’t laggy, or 2) reduce memory usage so that the GC doesn’t even need to run during gameplay (where even a split second of lag could lead to a C-C-C-C-COMBO BREAKER). 1) would work nicely if we could predict at what times in the song it is alright to lag the phone and/or when there isn’t any upcoming note in the next many milliseconds. Unfortunately, that is very hard to predict (boo, theoretical textbook scheduling schemes don’t always work in the dynamic real world) and won’t work if the song you play happens to be spamtastic and full of non-stop notes. 2), of course, is the preferred method.

How would we reduce memory usage? For one, I am very much hoping that whatever approach we take for optimizing the iterators will also result in drastically lowered object creation and subsequently fewer GC runs. But even that isn’t a complete guarantee, as what if it’s an old phone with limited RAM running Beats? Another approach we were thinking of is the “buckets” approach (essentially an object pool). We pre-allocate a certain number of falling note objects and only allow those to display on the screen. Once a note object is removed (either by being hit or missed), it returns to the pool of available objects for reuse. The pro of this approach is that the notes themselves never become garbage, so they give the GC nothing to collect. The con is that we don’t know what a good pool size is (this would be hardware dependent, wouldn’t it?). Even after we choose a good size, can a situation arise (1.0x, jumps on, Challenge) where there are far more notes that need to be displayed on screen than the pool can hold? Hopefully the iterator optimizations will make interruptive, random GC runs an extremely rare event, but if not, I think the buckets approach is worth a try.
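Here’s a rough sketch of what the “buckets” idea could look like, assuming a fixed-size pool and made-up names (FallingNote, NotePool, obtain, release), none of which come from the Beats source. When the pool runs dry, obtain() returns null, which is exactly the “more notes than the pool can hold” case the caller would have to handle somehow.

```java
// Hypothetical note object; the fields are assumptions for illustration.
final class FallingNote {
    int column;
    int timeMs;
    boolean inUse;
}

// Hypothetical fixed-size pool: every note is allocated once, up front.
final class NotePool {
    private final FallingNote[] free;
    private int freeCount;

    NotePool(int capacity) {
        free = new FallingNote[capacity];
        for (int i = 0; i < capacity; i++) {
            free[i] = new FallingNote();
        }
        freeCount = capacity;
    }

    /** Returns a recycled note, or null if every note is already on screen. */
    FallingNote obtain(int column, int timeMs) {
        if (freeCount == 0) return null;
        FallingNote n = free[--freeCount];
        free[freeCount] = null;
        n.column = column;
        n.timeMs = timeMs;
        n.inUse = true;
        return n;
    }

    /** Hands a hit or missed note back for reuse; double releases are ignored. */
    void release(FallingNote n) {
        if (!n.inUse) return;
        n.inUse = false;
        free[freeCount++] = n;
    }

    int available() {
        return freeCount;
    }
}
```

Nothing is allocated after the constructor runs, so the only tuning knob left is the capacity, which is the open question about pool size raised above.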

Results?

(for the stuff that I did try prior to typing out this long rant)
When I first heard about users’ wishes for smoother gameplay on their older phones, I decided to add a simple live FPS counter (and later also a cumulative one) just to get rough comparative numbers. On my Samsung Captivate, with the refresh rate set at 40Hz (i.e. up to 40 redraws every second, one invalidate call every 25ms, resulting in a cap of 40 FPS), I would consistently run at a comfortable 26-28 FPS during simple gameplay, maybe 24-26 FPS when the screen gets busy. Before the vibrate lag fix/tweak, framerate would drop down to 19-21 FPS if I spam tapped; after the fix, it drops by only 2 or 3 FPS. Meaning no more crazy lag when you panic and spam tap your phone to death (or at least less lag). The accumulated micro-optimizations that Google’s documentation recommended improved things a bit; while I couldn’t notice any visible difference (at best I could say I saw my phone peak occasionally at 34 FPS, but that may have just been a placebo effect), Traceview said something improved, and something is better than nothing (I really need to implement some real benchmarking functionality for Beats one day…).

The most significant boost, however, came from the background hack. Average FPS skyrocketed from 28-32 FPS to 38-42 FPS (I actually had to raise the refresh rate to 50Hz since it was being capped by the 40Hz limit). If I turned off the background image entirely and set the refresh rate to 60Hz, the game would run at an incredibly smooth 52-56 FPS, dipping down no further than 40-45 FPS at any given moment (as mentioned earlier, however, these numbers aren’t 100% accurate, as I re-calculate the FPS every 10 frames based on the time elapsed between the two calculations – I only use these numbers for comparative purposes). Of course, I can’t wait to try it out on another phone, but if it leads to a smooth 20+ FPS on even a Motorola Milestone, I’ll be happy enough in terms of optimization (of course the internal loops still need to be redone, but that would be for the sake of programming and design and not for whatever small performance improvement results from it).
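For reference, the “re-calculate every 10 frames” style of FPS counter can be sketched as below. This is a minimal illustration with a made-up class name (FpsCounter) and timestamps passed in by the caller, not the counter actually used in Beats. Because it averages over a window of 10 frames, the reading lags behind sudden dips, which is one reason numbers like these are only good for rough comparisons.

```java
// Hypothetical sketch of a windowed FPS counter; names are made up.
final class FpsCounter {
    private static final int WINDOW = 10; // frames between recalculations

    private int frames;
    private long windowStartMs = -1;      // -1 means no frame seen yet
    private double fps;

    /** Call once per drawn frame with the current clock reading in ms. */
    void onFrame(long nowMs) {
        if (windowStartMs < 0) {
            windowStartMs = nowMs;        // first frame only starts the clock
            return;
        }
        frames++;
        if (frames == WINDOW) {
            long elapsedMs = nowMs - windowStartMs;
            if (elapsedMs > 0) {
                fps = WINDOW * 1000.0 / elapsedMs;
            }
            frames = 0;
            windowStartMs = nowMs;
        }
    }

    /** Most recent reading; stays 0 until the first full window completes. */
    double fps() {
        return fps;
    }
}
```

With one frame every 25ms (the 40Hz setting), ten frames take 250ms, and the counter reports 10 * 1000 / 250 = 40 FPS, which matches the cap.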

[/rant]

That’s it for now in terms of my rant.
tl;dr –> Beats 1.5b is still very much in the works, but when it’s ready, in addition to having a bunch of cool new features, it will run so fast and smooth on your old phone that you might have to worry about breaking your touch screen from playing too much Beats ; )

~Keripo

Post your comments and feedback in the forum thread here!