Gwenole's Blog

Video Decode Acceleration: Benchmarks

If you wondered how well video decode acceleration works on Linux, here are a few tests I ran on current hardware (HW).

Platforms. This only covers AMD, Intel and NVIDIA systems since I don't have any VIA platform capable of HD video decode acceleration through dedicated HW. It would be interesting to have such configurations for completeness, though.

                   G84                   M98                       PSB
CPU                2x Opteron 2218       1x Phenom 8450            1x Atom Z530
                   2.6 GHz, dual core    2.1 GHz, triple core      1.6 GHz, hyper-threaded
GPU                GeForce 8600 GT       Mobility Radeon HD 4870   GMA500 (US15W, 'Poulsbo')
                   580 MHz, 256 MB       550 MHz, 1 GB             200 MHz
Graphics driver    nvidia 185.18.14      fglrx 8.62.4              psb 0.23 / psb-video 0.24
Screen resolution  1920×1080             1920×1080                 1280×1024

Video players. All players are based on MPlayer SVN rev 29365 and FFmpeg SVN rev 19203. Other dependencies include:

  • XVideo for */xv configurations. This only accelerates color space conversion (YV12 → RGBA)
  • VDPAU patches (integrated) for nvidia/vdpau
  • VA API patches for */vaapi, which depend on libVA 0.29+sds8 API. Additional backends may be necessary:
    • xvba-video 0.2.2 for XvBA support (AMD chipsets)
    • vdpau-video 0.3.1 for VDPAU support (NVIDIA chipsets)

As a side note, it's worth mentioning that VA API implementations are capable of addressing a wide range of hardware video decoders under Linux, either directly (Intel, VIA) or indirectly through adaptors (VDPAU/NVIDIA, XvBA/AMD).

Results. The figures below show CPU utilisation for MPlayer, in percent. This accounts for video decode and video display only, i.e. audio decoding is disabled. The whole video is played for each test, which takes 1 to 2 minutes to complete. Note: anything above 95% CPU is considered to be maxed out at 100%, and frame drops are expected. In other words, the video is not suitable for the designated configuration.

Configuration                  “300” trailer (1)     Flight Simulator X (2)  “Riddick” trailer (3)
Video codec information        H.264, 1080p, 8 Mbps  VC-1, 720p, 15 Mbps     VC-1, 1080p
G84, nvidia/xv                 68.3%                 66.3%                   86.0%
G84, nvidia/vaapi/vdpau-video  0.6%                  24.5%                   26.5%
G84, nvidia/vdpau              0.9%                  20.8%                   26.7%
M98, fglrx/xv                  75.3%                 78.9%                   86.3%
M98, fglrx/vaapi/xvba-video    4.3%                  10.9%                   12.8%
PSB, poulsbo/xv                98.4%                 96.5%                   97.8%
PSB, poulsbo/vaapi             2.2%                  6.1%                    12.4%
  1. “300” Trailer
    I don't quite remember where it comes from

Some rapid conclusions:

  • Poulsbo is the most power-efficient (<2.5W) solution capable of full HD video decode acceleration
  • VC-1 decode acceleration on NVIDIA chips looks rather poor, the worst of the three platforms tested here. Anyway, who cares about VC-1?
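
To put the figures in perspective, here is a small Python sketch (not part of the original benchmark) that applies the 95% rule and computes how many percentage points of CPU the accelerated path saves over plain XVideo. The numbers are copied from the results table; the names RESULTS, is_maxed and savings are made up for this illustration.

```python
# CPU utilisation (%) copied from the results table.
# Columns: "300" trailer, Flight Simulator X, "Riddick" trailer.
RESULTS = {
    "G84, nvidia/xv": (68.3, 66.3, 86.0),
    "G84, nvidia/vaapi/vdpau-video": (0.6, 24.5, 26.5),
    "G84, nvidia/vdpau": (0.9, 20.8, 26.7),
    "M98, fglrx/xv": (75.3, 78.9, 86.3),
    "M98, fglrx/vaapi/xvba-video": (4.3, 10.9, 12.8),
    "PSB, poulsbo/xv": (98.4, 96.5, 97.8),
    "PSB, poulsbo/vaapi": (2.2, 6.1, 12.4),
}

MAXED_THRESHOLD = 95.0  # above this, the CPU is considered maxed out


def is_maxed(cpu_percent):
    """Apply the rule from the text: above 95%, frame drops are expected."""
    return cpu_percent > MAXED_THRESHOLD


def savings(baseline, accelerated):
    """Percentage points of CPU saved per video, accelerated vs. baseline."""
    return tuple(
        round(b - a, 1)
        for b, a in zip(RESULTS[baseline], RESULTS[accelerated])
    )


if __name__ == "__main__":
    for name, figures in RESULTS.items():
        status = ["maxed" if is_maxed(f) else "ok" for f in figures]
        print(name, figures, status)
    print(savings("PSB, poulsbo/xv", "PSB, poulsbo/vaapi"))
```

On the Atom system, every video maxes out the CPU without hardware decode, which is why the Poulsbo rows show the largest gain.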

NSPluginWrapper 1.3.0

nspluginwrapper 1.3.0, a snapshot from the development branch, is now available. This is work-in-progress, so you will only find source code here (at the bottom) and binary packages are not supported.

Changes from 1.2.2 to 1.3.0:

  • Don't poll for Xt events in Gtk (XEMBED) plug-ins
  • Use 40 Hz timer for Xt events only when necessary (Xt input sources)
  • Add NPIdentifier and NPClass::HasMethod caches, i.e. lower RPC traffic
  • Add support for multiple viewer paths, see --viewer-paths=PATH-EXPR configure option
  • Add basic checks for malloc()'ed buffer underflow/overflow
  • Add checks for single-threaded calls into the browser (NPN_*() functions)

Changes from 1.2.0 to 1.2.2:

  • Fix support for the VLC plug-in (0.8.6)
  • Fix memory deallocation in NPN_GetStringIdentifiers()
  • Fix return value if stream creation failed in standalone player

Release focus. nspluginwrapper 1.3.x development series (leading to 1.4.0) focuses on performance optimization, with an emphasis on energy efficiency. This means various optimizations are under development to reduce the activity of nspluginwrapper processes, thus making them less CPU demanding. However, it's important to note we are bound to the plug-in's limitations (typically, Adobe Flash Player), so this is not a magical solution. Only performance issues on the nspluginwrapper side are examined. There is no way you could expect hungry plug-ins to reduce CPU usage. Also note that nspluginwrapper overhead, incurred from the out-of-process plug-in execution model, is typically under 5%, and even under 1% actually…

Reduced timers. Prior to this version, nspluginwrapper always used two additional timers for Xt-specific tasks, even if the plug-in was not using this toolkit but Gtk, for example (e.g. Adobe Flash Player). Now, Xt event polling functions are only used for… Xt plug-ins, as it ought to be. This condition is met if we are not running the plug-in in XEMBED mode.

There is also another optimization specific to Xt plug-ins. The other timer was a 40 Hz (25 ms) timer checking for anything that was not reaching the X communication socket, i.e. everything other than XtIMXEvent in Xt literature. This covers signals (XtAppAddSignal(), XtIMSignal), inputs (XtAppAddInput(), XtIMAlternateInput) and of course timers (XtAppAddTimeOut(), XtIMTimer). Various tricks are now used to fold them into the timer that was initially polling for XEvents. The technique is actually based on internal Xt data structures. Of course, there are checks to see if the data structures meet the expectations. Otherwise, everything operates as if we still had a 40 Hz timer.

Reduced RPC traffic. For starters, I wanted to look at a specific example: Google GrandCentral. When I debugged an XEMBED problem (in nspluginwrapper 1.1.x series), I also noticed excessive RPC traffic. In particular, many round-trips were occurring for the following functions: NPClass::HasProperty(), NPClass::HasMethod(), NPN_UTF8FromIdentifier() and NPN_Invoke(). While we can't really do anything for the latter invoke function, some simple things can be done for the former functions: caching.

NPClass::HasMethod() results can be cached trivially because the set of member functions in an NPObject doesn't change. NPClass::HasProperty() can't be cached because the set of properties can be changed without notification: properties are not necessarily constructed or modified by NPClass::setProperty(). However, given an NPIdentifier, we can be assured that it doesn't reference a property if it does actually reference a method. So, the NPClass::HasMethod() cache can also be used for NPClass::HasProperty().
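
The reasoning above can be sketched in a few lines of Python. This is only an illustration of the caching rule, not the actual nspluginwrapper code (which is C); the class name MethodCache and the two callables standing in for RPC round-trips are hypothetical.

```python
class MethodCache:
    """Per-object cache of NPClass::HasMethod() answers (illustrative only)."""

    def __init__(self, remote_has_method, remote_has_property):
        # Both callables stand in for RPC round-trips to the real plug-in.
        self._remote_has_method = remote_has_method
        self._remote_has_property = remote_has_property
        self._methods = {}  # identifier -> bool

    def has_method(self, identifier):
        # The set of methods never changes, so any answer can be cached.
        if identifier not in self._methods:
            self._methods[identifier] = self._remote_has_method(identifier)
        return self._methods[identifier]

    def has_property(self, identifier):
        # An identifier known to be a method cannot be a property:
        # answer locally without any RPC.
        if self._methods.get(identifier):
            return False
        # Properties can change without notification: always ask.
        return self._remote_has_property(identifier)


if __name__ == "__main__":
    cache = MethodCache(lambda i: i == "play", lambda i: i == "volume")
    print(cache.has_method("play"), cache.has_property("play"))
```

Only positive HasMethod() answers short-circuit HasProperty(); a negative answer says nothing about properties, so that case still goes over RPC.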

NPN_UTF8FromIdentifier() results can also be cached. Of course, an NPIdentifier is defined to be unique, so it could actually be implemented as a pointer to its inner information: kind (integer or string), value, etc. However, there is no garbage collector in nspluginwrapper and no means to know when an NPIdentifier is no longer necessary, so another solution is needed to guarantee good use of memory. A simple hash table of (up to) 256 elements, with NPIdentifier as the key, will fit this purpose.
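
A minimal Python sketch of such a bounded cache follows. The 256-entry limit comes from the text; everything else is an assumption made for illustration, in particular the eviction policy (drop the oldest entry), the class name IdentifierCache and the callable that stands in for the NPN_UTF8FromIdentifier() round-trip.

```python
MAX_ENTRIES = 256  # capacity quoted in the text


class IdentifierCache:
    """Bounded NPIdentifier -> UTF-8 name cache (illustrative only)."""

    def __init__(self, remote_utf8_from_identifier, capacity=MAX_ENTRIES):
        # The callable stands in for the NPN_UTF8FromIdentifier() RPC.
        self._lookup = remote_utf8_from_identifier
        self._capacity = capacity
        self._entries = {}  # dicts keep insertion order in Python 3.7+

    def utf8_from_identifier(self, identifier):
        if identifier in self._entries:
            return self._entries[identifier]  # cache hit, no RPC
        name = self._lookup(identifier)
        if len(self._entries) >= self._capacity:
            # Assumption: evict the oldest entry. The original policy
            # is not described in the post.
            oldest = next(iter(self._entries))
            del self._entries[oldest]
        self._entries[identifier] = name
        return name

    def __len__(self):
        return len(self._entries)
```

Capping the table keeps memory use predictable even though nothing ever tells the wrapper that an identifier is no longer needed.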

I instrumented nspluginwrapper to gather a few stats about RPC: time spent in rpc_method_invoke() et al. functions, how many bytes are transferred, etc. Lifetime of the plug-in was fixed programmatically to 3 minutes.

Without NPRuntime data caching:

  • RPC overhead: 1.37% (2.458 sec / 179.850 sec)
  • RPC transfer: 2048820 bytes (recv: 1290671, send: 758149)

With NPRuntime data caching:

  • RPC overhead: 0.52% (0.930 sec / 180.567 sec)
  • RPC transfer: 677978 bytes (recv: 475629, send: 202349)

As you can see, those simple changes are a benefit to GrandCentral: the number of bytes transferred through RPC was reduced by approx. 67%!
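
For readers who want to check the arithmetic, here is a short Python sketch that recomputes the percentages from the raw figures listed above. The function and variable names are invented for this example.

```python
# Raw figures copied from the two measurement runs above.
WITHOUT_CACHE = {"rpc_sec": 2.458, "total_sec": 179.850,
                 "recv": 1290671, "send": 758149}
WITH_CACHE = {"rpc_sec": 0.930, "total_sec": 180.567,
              "recv": 475629, "send": 202349}


def overhead_percent(run):
    """Share of the plug-in lifetime spent in RPC, in percent."""
    return 100.0 * run["rpc_sec"] / run["total_sec"]


def bytes_transferred(run):
    """Total RPC traffic in bytes (received plus sent)."""
    return run["recv"] + run["send"]


def reduction_percent(before, after):
    """Relative reduction from 'before' to 'after', in percent."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    print(round(overhead_percent(WITHOUT_CACHE), 2))
    print(round(overhead_percent(WITH_CACHE), 2))
    print(round(reduction_percent(bytes_transferred(WITHOUT_CACHE),
                                  bytes_transferred(WITH_CACHE)), 1))
```

The traffic drops to roughly a third of its original volume, i.e. about two thirds of the bytes are saved.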

Downloads. nspluginwrapper 1.3.0 is available as source code only. You can get packages from your distributor or build your own. RPMs are really simple to get: rpmbuild -tb nspluginwrapper-1.3.0.tar.bz2 will generate binary RPMs in your usual tree.

NSPluginWrapper 1.2.0

nspluginwrapper reached stable release! This is a great improvement over version 1.0.0, released almost 6 months ago. Thanks to everyone who tested the development series early.

Changes from 1.1.10 to 1.2.0:

  • Drop the obsolete mkruntime script
  • Use valgrind if NPW_USE_VALGRIND=yes
  • Add support for SunStudio compilers
  • Add support for Flash Player 10 on OpenSolaris 2008.11
  • Fix build on non-Linux platforms
  • Fix NPP_Destroy() to keep NPP instances longer
  • Fix NPP_Destroy() to destroy the plugin windows

Now, for those who missed development snapshots, here is a list of the most notable changes.

Windowless plugins are now supported in Firefox 3. Only Flash Player 10 and a modified DiamondX plugin were tested. There currently is no means to disable windowless plugins at run-time, unless they offer this possibility themselves (e.g. WindowlessDisable config option for Flash Player). However, if you really insist, this can be turned off at compile-time through the ALLOW_WINDOWLESS_PLUGINS macro in npw-viewer.c. Also note that, sometimes, popup menus triggered by a right-click won't show up in windowless mode. Likewise, you may also experience a few browser window redraw issues when Gtk2 modal dialogs are raised by the plugin in windowless mode.

Native plugins. nspluginwrapper is basically an out-of-process plugin viewer communicating with the browser through some RPC. Its primary goal was to allow Linux/i386 plugins (Adobe Flash, in particular) to run in Linux/x86_64 browsers. However, it recently became apparent that it can actually be useful to all kinds of plugins, for reliability and security purposes. Red Hat pioneered this on Linux, for a couple of distribution releases now. Then, this approach got popularised in Google Chrome. What's confinement (aka sandbox, in Google terms) about? Well, the goals are two-fold:

  • First, ensure that a plugin crash doesn't bring down the whole browser. A plugin crashed? Simply restart it on next page reload.
  • Next, ensure that a plugin only uses resources it is actually allowed to, e.g. allow it to only connect to HTTP ports, forbid access to sensitive files (e.g. /etc/shadow), etc.

However, note that if a plugin crashed, there is no effort made to “replay” it at this time. This means you explicitly have to request a new page reload, or wait for the next refresh requested by the page's JavaScript. Replaying plugins implicitly could cause some issues, in particular for interactive sessions, or simply annoyance with things that could pop up again. IMHO, it's better to let users decide whether they want to replay the plugins or not. Besides, having the plugin content disappear actually gives the user a chance to report bugs (in plugins, browsers or nspluginwrapper). Hiding bugs is never good. ;-)

Standalone plugins player. nspluginplayer is a new application that enables you to execute a plugin without a browser. This can be useful for Flash presentations or… simply debugging. The player does not emulate the whole NPAPI but enough is implemented to support Flash Player and Acrobat Reader. Well, this is because I only tested those, actually.

Usage is very simple: nspluginplayer embed-args, where embed-args are the arguments from an <embed> tag. Here are some examples to demonstrate the program.

  • Play a Google video:
    $ nspluginplayer style="width:400px; height:326px;" id="VideoPlayback" type="application/x-shockwave-flash" src="http://video.google.com/googleplayer.swf?docId=-7309713943323243972&hl=en-GB" flashvars=""
  • Play a Flash game:
    $ nspluginplayer src=http://magic.pen.fizzlebot.com/magic-pen.swf width=800 height=520
  • View a PDF document:
    $ nspluginplayer src=/path/to/some/file.pdf

New hosts and targets. nspluginwrapper can now be run on Solaris/x86 (2008.xx). I have only tested native Flash Player plugins there.

It's also worth mentioning that Transitive is using nspluginwrapper to allow cross-execution of plugins with their flagship dynamic binary translation technology: QuickTransit. The first (free) product publicly available is QuickTransit® for Solaris™/x86 with Adobe® Reader® that, as its name suggests, allows you to run Adobe Reader for Solaris/SPARC on a Solaris/x86 platform. Another use of nspluginwrapper with their technology is to run Flash Player 9 on some ARM targets.

Flash 9 support in FreeBSD. Well, I have not changed anything for that, but it's worth mentioning that Flash 9 can now be run in FreeBSD 7.1. Thanks to the FreeBSD maintainers for fixing their Linux emulation layer. However, note that you can't run Flash 10 at this time. If you want to debug the problem, don't forget to install nss, nspr and curl libraries, from the Fedora 8 repository for example. Then, instead of starting the browser over and over again, use nspluginplayer. I have not investigated the problem further but it seems the communication socket is blocked, though it was set in non-blocking mode (and the fcntl() didn't fail).
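
If you want to investigate that kind of symptom, a quick sanity check is to read the flags back after setting them and to verify that a read on an empty socket fails immediately. The following Python sketch shows the idea on a local socket pair; it is not taken from nspluginwrapper, and the helper names are hypothetical.

```python
import fcntl
import os
import socket


def set_nonblocking(fd):
    """Set O_NONBLOCK with fcntl() and return the flags read back."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    return fcntl.fcntl(fd, fcntl.F_GETFL)


def is_nonblocking(fd):
    """Check whether O_NONBLOCK is really set on the descriptor."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


def read_would_block(sock):
    """True if recv() on an empty socket fails fast instead of blocking.

    Only call this on a socket that is_nonblocking() reports as
    non-blocking, otherwise the call waits for data.
    """
    try:
        sock.recv(1)
    except BlockingIOError:
        return True
    return False


if __name__ == "__main__":
    a, b = socket.socketpair()
    set_nonblocking(a.fileno())
    print(is_nonblocking(a.fileno()), read_would_block(a))
```

If fcntl() succeeds but the flag does not read back, or the read still blocks, the emulation layer rather than the caller would be the first suspect.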

MPlayer with VA API acceleration

VA API is the Video Acceleration API for Linux systems. Provided you have a working VA API implementation (check with vainfo), a VA API enabled player will make it possible to use the underlying hardware for video decode tasks. In particular, I am pleased to announce a preliminary implementation for MPlayer and FFmpeg. That is, a fully Open Source one.

Why MPlayer? Well, simply because it supports many more video formats than other solutions, and also because the NVIDIA VDPAU work helped that effort, i.e. understanding how to integrate a HW-assisted video decode solution with MPlayer and FFmpeg. Sure, it's probably duplicate work, but it looked better to make it work first, then make it nice later. To be fair, I will mention that both Helix and Fluendo have proprietary and commercial solutions for HW-assisted video decoding, based on VA API too. Intel is also working on their own Open Source libva-based codecs for gstreamer.

How to build it? First, get the patches from mplayer-vaapi-latest.tar.bz2. Then, carefully follow the steps described in the README.txt file. Instructions are reproduced here for your convenience.

Make sure you have svn (subversion) installed on your system. You will also need libva and all other MPlayer build dependencies, i.e. development files for X11 and other libraries. If you use a Debian system, the following commands should help you get started:

# apt-get build-dep mplayer
# apt-get install libva-dev

Next, you can run the supplied script to build mplayer-vaapi. It will check out and patch sources from the MPlayer SVN tree, then build the program with as many parallel jobs as sounds reasonable for this task.

$ ./checkout-patch-build.sh

How to run it? If the build completes properly, the mplayer program lives in the mplayer-vaapi/ directory. Execute it as follows:

$ ./mplayer -vo vaapi -vc <VAAPI-codec-name> <URI>

<VAAPI-codec-name> can be one of:

  • ffmpeg2vaapi for MPEG-2
  • ffmpeg4vaapi for MPEG-4 ASP (DivX)
  • ffh264vaapi for MPEG-4 AVC (H.264)

<URI> can be a local filename or a URL.
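
If you launch the player from a script, the codec selection can be wrapped in a small helper. Here is a hedged Python sketch: the codec names and the command-line options are the ones listed above, while the short keys (mpeg2, mpeg4, h264) and the function names are made up for this example.

```python
import subprocess

# -vc names from the list above, keyed by labels chosen for this sketch.
VAAPI_CODECS = {
    "mpeg2": "ffmpeg2vaapi",  # MPEG-2
    "mpeg4": "ffmpeg4vaapi",  # MPEG-4 ASP (DivX)
    "h264": "ffh264vaapi",    # MPEG-4 AVC (H.264)
}


def build_command(codec, uri, mplayer="./mplayer"):
    """Build the argument list for a VA API accelerated playback."""
    if codec not in VAAPI_CODECS:
        raise ValueError("no VA API codec known for %r" % codec)
    return [mplayer, "-vo", "vaapi", "-vc", VAAPI_CODECS[codec], uri]


def play(codec, uri):
    """Run the player from the mplayer-vaapi/ directory; return its status."""
    return subprocess.call(build_command(codec, uri))
```

For instance, build_command("h264", "movie.mkv") yields the same command line as ./mplayer -vo vaapi -vc ffh264vaapi movie.mkv.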

Are there any problems? This is preliminary work, so there are issues. Here are a few that come to mind:

  • MPlayer OSD is currently not supported.
  • VC-1 streams are currently not supported.
  • Non-accelerated decoding is currently not supported with the vaapi renderer.
  • Reference picture lists construction for H.264 has bugs that can cause visual issues with some videos.

Splitted-Desktop Systems

Hi, I have noticed there were too many entries about nspluginwrapper on this blog. So, let me take the opportunity to talk about the company I joined 18 months ago: Splitted-Desktop Systems (SDS, hereinafter). I am not a good salesperson, so you won't find any marketing material here. Rather, descriptive offerings may be found on the company website. In short, we aim at developing next-generation set-top boxes and companion devices for various ISPs.

The first product we designed is an “all-in-one” computer for Orange, a major ISP in France. SDS's involvement in this project concerned the hardware and the device management. The hardware is pretty standard PC components: a mini-ITX board jointly developed with AMD and a well-known 4-letter manufacturer. Technically, the board uses an RS690 chipset and supports AMD socket S1 processors like a Sempron (single core) or a Turion64 X2 (dual core). The SDS innovation also resides in the cooling solution: totally fanless and supporting up to a Turion64 X2 TL-62 in the case! On the software side, the OS is based on Debian “Lenny” and the UI was developed by e-sidor.

What am I doing at SDS? Nothing really visible as I am usually a “low-level” guy: making sure things work well, fast and in an energy-efficient way, but also exploring/debugging(!) new technologies ahead of time, i.e. I am working on N+1 or N+2 projects. I gained some new experience in the following areas:

  • Clutter, a cross-platform toolkit based on OpenGL;
  • WebKit, an Open Source web browser engine (the core of Safari);
  • Media acceleration APIs: VA API and VDPAU, to name only those that were officially announced.

Note: it's not because I am doing Clutter stuff that I aim at developing user interfaces. Likewise, it's not because I maintain a personal webserver that I aim at being a system administrator. ;-) The Clutter work is simply to show people what is technically possible with the hardware we develop. It's a really interesting toolkit, GObject things put aside. Yes, you will have noticed that I rather dislike GObject and I am still wondering why people would want to rewrite C++ core language features in C… A C compiler will optimise those poorly, e.g. devirtualisation through GObject anyone? The rest of glib is nice and very helpful though.

That's all for now. I would simply add that it's reasonable to expect some patches to land in the community in the near future. In particular, this concerns things that (i) I don't want to maintain eternally at SDS and that we don't see fitting into our future products, or (ii) we want to expose to a larger testing audience.

NSPluginWrapper 1.1.10

nspluginwrapper 1.1.10 is now available. This is a snapshot from the development branch but it has matured enough to reach “release candidate” status for 1.2.0.

Changes from 1.1.8 to 1.1.10:

  • Fix NPPVpluginScriptableNPObject::Invalidate()
  • Fix condition for delayed NPN_ReleaseObject() call
  • Fix XEMBED (rework for lost events/focus regressions)
  • Fix RPC for calls initiated by the plugin (SYNC mode)
  • Fix invalid RPC after the plugin was NPP_Destroy()ed

Changes from 1.1.6 to 1.1.8:

  • Delay NPN_ReleaseObject() if there is incoming RPC
  • Improve plugins restart machinery (Martin Stransky)
  • Close npviewer.bin sockets on exec()
  • Close all open files on fork() (initial patch by Dan Walsh)
  • Make `which` failures silent for soundwrappers (Stanislav Brabec)
  • Allow direct execution of native plugins if NPW_DIRECT_EXEC is set

Security improvements. We now get notified of a plugin crash earlier so that we can take proper measures, e.g. stop accessing data that could be garbage or lead to breakage. Besides, open files are closed on fork() and before we execute the actual plugin execution environment; this makes sure plugins don't access browser files they were not supposed to see. RPC sockets are also created with the SOCK_CLOEXEC flag on systems that support it (that is, Linux kernel >= 2.6.27). You can read more about this extension on Ulrich Drepper's blog.
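
The close-on-exec idea can be sketched as follows. nspluginwrapper itself is written in C; this Python version only illustrates the two paths (atomic SOCK_CLOEXEC where available, fcntl() fallback elsewhere), and the function names are invented for the example.

```python
import fcntl
import socket


def set_cloexec(fd):
    """Fallback path: mark a descriptor close-on-exec with fcntl()."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def is_cloexec(fd):
    """Check whether the descriptor will be closed across exec()."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def make_rpc_socket():
    """Create a Unix socket that is not leaked to exec()'ed programs."""
    # SOCK_CLOEXEC is only exposed where the platform supports it.
    cloexec = getattr(socket, "SOCK_CLOEXEC", 0)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM | cloexec)
    if not cloexec:
        # Non-atomic fallback: a fork()+exec() in another thread could
        # still slip in between socket() and fcntl().
        set_cloexec(sock.fileno())
    return sock


if __name__ == "__main__":
    s = make_rpc_socket()
    print(is_cloexec(s.fileno()))
    s.close()
```

Note that Python 3.4 and later already create descriptors as non-inheritable, so the explicit flag is redundant there; it is kept to mirror what a C program has to do.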

Direct execution mode. If the NPW_DIRECT_EXEC environment variable is set, the native (i.e. same operating system, same architecture) plugin will run directly in-process, without nspluginwrapper. This helps debugging… and helps users report bugs. Note, however, that this is not a 100% identical environment. In particular, the NPAPI version exposed is the one from nspluginwrapper: 0.17. This is a desired feature, and is equivalent to an older browser supporting NPAPI up to that version. Besides, the following functions are not implemented: NPN_ForceRedraw(), NPN_ReloadPlugins(), NPN_InvalidateRegion(), NPP_SetValue(). If you see the following “WARNING: Unimplemented function function() at file:line-number” message, please tell me how and where to reproduce it. It's easy to fix but I would like to know how to check it's OK.

Fixed XEMBED regressions. The previous workaround for Firefox / Gtk2 bugs caused a major focus regression on some websites. That hack is now removed and the standard GtkPlug / GtkSocket machinery is used again… but with a difference. I spent some time tracing X and Gdk calls. The major difference between in-process and out-of-process operation is that the plugin window is destroyed aggressively by the browser. Why? This is intended behaviour from Gtk2: it sends a WM_DELETE_EVENT message to the plugin as if it were a window manager. Another striking difference is that the plugin window is actually never destroyed in the in-process case. Yes, XDestroyWindow() is never called for the lifetime of the program, but I assume an XCloseDisplay() will tell the X server to free up all those resources. Is this a bug? Probably, but one thing is sure: if Gtk were to destroy the X window (when requested in _gdk_window_destroy_hierarchy()), then Firefox would start crashing the same way nspluginwrapper did. Note: the GdkWindow is destroyed (abstraction resources), not the actual X window. What trickery!

So, I tried an in-between solution: don't destroy the plugin window immediately, but wait until nobody references the associated plugin instance again. That's NPP_Destroy() in the best case, or NPP_Shutdown() if an NPObject still references it (e.g. the NPPVpluginScriptableNPObject from Flash, it seems).
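
As a rough illustration of that deferred destruction, here is a toy Python model. It is not the nspluginwrapper implementation: the class, its fields and the idea of checking again when a reference is released are assumptions made for the sketch; only the NPP_Destroy() / NPP_Shutdown() roles come from the text.

```python
class PluginInstance:
    """Toy model of an instance whose window may outlive NPP_Destroy()."""

    def __init__(self):
        self.refcount = 0  # NPObjects still referencing this instance
        self.destroy_requested = False
        self.window_alive = True

    def retain(self):
        """An NPObject starts referencing the instance."""
        self.refcount += 1

    def release(self):
        """An NPObject drops its reference (assumed re-check point)."""
        self.refcount -= 1
        self._maybe_destroy_window()

    def npp_destroy(self):
        """Best case: nobody references the instance, window goes now."""
        self.destroy_requested = True
        self._maybe_destroy_window()

    def npp_shutdown(self):
        """Last resort: the window goes away with the plugin itself."""
        self.window_alive = False

    def _maybe_destroy_window(self):
        if self.destroy_requested and self.refcount == 0:
            self.window_alive = False


if __name__ == "__main__":
    inst = PluginInstance()
    inst.retain()        # e.g. a scriptable NPObject holds the instance
    inst.npp_destroy()
    print(inst.window_alive)   # still alive, someone references it
    inst.npp_shutdown()
    print(inst.window_alive)
```

The point of the model is simply that destruction is decoupled from the browser's request and tied to the last reference instead.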

Downloads. nspluginwrapper 1.1.10 is available as source code only. You can get packages from your distributor or build your own. RPMs are really simple to get: rpmbuild -tb nspluginwrapper-1.1.10.tar.bz2 will generate binary RPMs in your usual tree. [Well, actually I lied, there are binary packages available on this server, but I have not tested them. If you really insist, you can guess the names easily]
