MP4 improvements in Firefox for Android

One of the things that has always been a bit of a struggle in Firefox for Android is getting reliable video decoding for H.264. For a couple of years, we’ve been shipping an implementation that went through great heroics in order to use libstagefright directly. While it does work fine in many cases, we consistently get reports of videos not playing, not displaying correctly, or just crashing.

In Android 4.1, Google added the MediaCodec class to the SDK. This provides a blessed interface to the underlying libstagefright API, so presumably it will be far more reliable. This summer, my intern Martin McDonough worked on adding a decoding backend in Firefox for Android that uses this class. I expected him to be able to get something that sort of worked by the end of the internship, but he totally shocked me by having video on the screen inside of two weeks. This included some time spent modifying our JNI bindings generator to work against the Android SDK. You can view Martin’s intern presentation on Air Mozilla.
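
For readers unfamiliar with the API, here’s a minimal sketch of what standing up an H.264 decoder with MediaCodec looks like. This is not our actual backend code; the method name and the width/height parameters are just illustrative, and the real implementation pulls those values from the demuxed MP4 stream.

```java
import android.media.MediaCodec;
import android.media.MediaFormat;
import java.io.IOException;

// Minimal MediaCodec setup for H.264, roughly as it looked on Jelly Bean.
// Width and height are placeholders; real code gets them from the container.
MediaCodec createH264Decoder(int width, int height) throws IOException {
    MediaFormat format = MediaFormat.createVideoFormat("video/avc", width, height);
    MediaCodec codec = MediaCodec.createDecoderByType("video/avc");
    // Passing a Surface as the second argument renders decoded frames
    // directly; passing null means we dequeue raw output buffers ourselves.
    codec.configure(format, /* surface */ null, /* crypto */ null, /* flags */ 0);
    codec.start();
    return codec;
}
```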

While the API for MediaCodec seems relatively straightforward, there are several details you need to get right or the whole thing falls apart. Martin constantly ran into problems where it would throw IllegalStateException for no apparent reason, with no error message or other explanation attached to the exception. This made development pretty frustrating, but he fought through it. It looks like Google has improved both the documentation and the error handling in the API as of Lollipop, so that’s good to see.
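
To give a sense of where those exceptions lurk, here’s a rough sketch of one turn of the decode loop as the pre-Lollipop API worked. The method and variable names are made up for illustration; the point is that each of these calls is only legal in certain codec states, and getting the ordering wrong earned you a bare IllegalStateException.

```java
import android.media.MediaCodec;
import java.nio.ByteBuffer;

// One iteration of the input/output pump (pre-Lollipop style API).
// `sample` and `presentationUs` would come from the MP4 demuxer.
void pumpDecoder(MediaCodec codec, byte[] sample, long presentationUs) {
    int inIndex = codec.dequeueInputBuffer(10000 /* timeout in microseconds */);
    if (inIndex >= 0) {
        ByteBuffer input = codec.getInputBuffers()[inIndex];
        input.clear();
        input.put(sample);
        // Queueing a buffer you didn't dequeue, or queueing after stop(),
        // throws IllegalStateException with no further explanation.
        codec.queueInputBuffer(inIndex, 0, sample.length, presentationUs, 0);
    }

    MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
    int outIndex = codec.dequeueOutputBuffer(info, 10000);
    if (outIndex >= 0) {
        // ... consume the decoded frame here ...
        codec.releaseOutputBuffer(outIndex, /* render to Surface */ false);
    }
}
```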

As Martin wrapped up his internship he was working on handling the video frames output by the decoder. Ideally you would get some kind of sane YUV variant, but that is often not the case. Qualcomm devices frequently output in their own proprietary format, OMX_QCOM_COLOR_FormatYUV420PackedSemiPlanar64x32Tile2m8ka. You’ll notice this doesn’t even appear in the list of possibilities according to MediaCodecInfo.CodecCapabilities. It does, however, appear in the OMX headers, along with a handful of other proprietary formats. Great, so Android has this mostly-nice class to decode video, but you can’t do anything with the output? Yeah. Kinda. It turns out we actually have code to handle this format for B2G, because we run on Qualcomm hardware there, so this specific case had a possible solution. But maybe there is a better way?
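
If you want to see what your own device claims to support, something like the following sketch enumerates the color formats advertised by each H.264 decoder. The logging is mine, not Firefox code; the takeaway is that vendor formats show up as raw integers with no corresponding constant in MediaCodecInfo.CodecCapabilities.

```java
import android.media.MediaCodecInfo;
import android.media.MediaCodecList;
import android.util.Log;

// Dump the output color formats each H.264 decoder on the device claims to
// support (pre-Lollipop MediaCodecList API). Vendor-specific formats appear
// as bare integers in the OMX vendor range with no SDK constant.
void dumpAvcColorFormats() {
    for (int i = 0; i < MediaCodecList.getCodecCount(); i++) {
        MediaCodecInfo info = MediaCodecList.getCodecInfoAt(i);
        if (info.isEncoder()) {
            continue;
        }
        for (String type : info.getSupportedTypes()) {
            if (!"video/avc".equals(type)) {
                continue;
            }
            MediaCodecInfo.CodecCapabilities caps = info.getCapabilitiesForType(type);
            for (int format : caps.colorFormats) {
                // e.g. 21 == COLOR_FormatYUV420SemiPlanar; values up in the
                // 0x7F000000 range are vendor-specific OMX formats.
                Log.d("ColorFormats", info.getName() + ": 0x" + Integer.toHexString(format));
            }
        }
    }
}
```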

I knew from my work on supporting Flash on Android that we used a SurfaceTexture there to render video layers from the plugin, and it worked really well most of the time. We can use the same approach with MediaCodec. With this output path we never see the raw data; it goes straight into the Surface attached to the SurfaceTexture. You can then composite it with OpenGL, and the crazy format conversions are done by the GPU. Pretty nice! I think handling all the different YUV conversions would’ve been a huge source of pain, so I was happy to eliminate that entire class of bugs. I imagine the GPU conversions are probably faster, too.
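
Configuring the decoder for this path is pleasantly small. Here’s a hedged sketch, assuming an external OES texture name created on the compositor’s GL context; in the output loop, releaseOutputBuffer(index, true) pushes the frame to the Surface, and updateTexImage() on the GL thread latches it into the texture.

```java
import android.graphics.SurfaceTexture;
import android.media.MediaCodec;
import android.media.MediaFormat;
import android.view.Surface;
import java.io.IOException;

// Wire the decoder's output directly to a SurfaceTexture. `oesTextureId` is
// assumed to be a GL_TEXTURE_EXTERNAL_OES texture created on the compositor's
// GL context; we never touch the raw YUV data ourselves.
MediaCodec configureForSurfaceOutput(MediaFormat format, int oesTextureId)
        throws IOException {
    SurfaceTexture surfaceTexture = new SurfaceTexture(oesTextureId);
    Surface surface = new Surface(surfaceTexture);
    MediaCodec codec = MediaCodec.createDecoderByType("video/avc");
    codec.configure(format, surface, null, 0);
    codec.start();
    // Later, in the output loop: releaseOutputBuffer(index, true) sends the
    // frame to `surface`; updateTexImage() on the GL thread picks it up.
    return codec;
}
```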

There is one problem with this approach: sometimes we need to do something with the video other than composite it onto the screen with OpenGL. One common case is drawing the video into a canvas (either 2D or WebGL). Now we have a problem, because the only way to get data out of the SurfaceTexture (and the attached Surface) is to draw it with OpenGL. Initially, my plan was to ask the compositor to draw this single SurfaceTexture separately into a temporary FBO, read it back, and hand me those bits. It worked, but boy was it ugly.

There has to be a better way, right? There is, but it’s still not great. SurfaceTexture, as of Jelly Bean, allows you to attach and detach a GL context. Once attached, the updateTexImage() call updates whatever texture you attached; detaching frees that texture and leaves the SurfaceTexture free to be attached to another texture (or another GL context). My idea was to attach the compositor to the SurfaceTexture only while drawing it, and detach afterward, leaving it available to be consumed by another GL context/texture. To do the readback, we attach to a context created specifically for this purpose on the main thread, blit the texture to an FBO, read the pixels, and detach. Performance is not great, as glReadPixels() always seems to be slow on mobile GPUs, but it works. And it doesn’t involve IPC to the compositor.

I did have to resort to a little hack to make some of this work well, though. Right now there is no way to create a SurfaceTexture in an initially detached state: you must always pass a texture in the constructor, so I pass 0 and then immediately call detachFromGLContext(). Pretty crappy, but it should be relatively safe. I filed an Android bug requesting a no-arg constructor for SurfaceTexture more than two years ago, but nothing has happened. I’m not sure why Google even allows people to file stuff, honestly.
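
Here’s roughly what the detach/attach dance looks like, including the constructor hack. This is a sketch under the assumptions above (a dedicated readback GL context current on the calling thread, and the FBO/shader setup elided, since drawing an OES texture needs a small external-texture shader); it’s not the literal Firefox code.

```java
import android.graphics.SurfaceTexture;
import android.opengl.GLES20;
import java.nio.ByteBuffer;

// The constructor hack: there's no way to create a SurfaceTexture in a
// detached state, so pass a dummy texture name and detach immediately.
SurfaceTexture createDetachedSurfaceTexture() {
    SurfaceTexture st = new SurfaceTexture(0);
    st.detachFromGLContext();
    return st;
}

// Read the current frame back, assuming our readback GL context is current
// on this thread and the compositor has already detached from `st`.
ByteBuffer readBackFrame(SurfaceTexture st, int width, int height) {
    // Attach to a texture owned by this context and latch the latest frame.
    int[] tex = new int[1];
    GLES20.glGenTextures(1, tex, 0);
    st.attachToGLContext(tex[0]);
    st.updateTexImage();

    // Draw the external texture into an FBO here (shader and FBO setup
    // elided), then read the pixels back. glReadPixels() is the slow part.
    ByteBuffer pixels = ByteBuffer.allocateDirect(width * height * 4);
    GLES20.glReadPixels(0, 0, width, height,
            GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, pixels);

    // Detaching deletes the texture and frees `st` to be attached elsewhere.
    st.detachFromGLContext();
    return pixels;
}
```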

tl;dr: Video decoding should be much better in Firefox for Android as of today’s Nightly if you are on Jelly Bean or higher. Please give it a try, especially if you’ve had problems in the past. Also, file bugs if you have issues!