Why xHE-AAC is being embraced at Meta
- We share how Meta delivers high-quality audio at scale using the xHE-AAC audio codec.
- xHE-AAC has already been deployed on Facebook and Instagram to provide enhanced audio for features like Reels and Stories.
At Meta, we serve every imaginable media use case for billions of people around the world – from short, user-generated content such as Reels to premium video on demand (VOD) and live broadcasts. With this in mind, we need a next-generation audio codec that supports a range of operating points with excellent compression efficiency and advanced system-level audio capabilities.
To meet these needs now and in the future, Meta has embraced xHE-AAC as a vehicle for delivering high-quality audio at scale.
The benefits of xHE-AAC
xHE-AAC is the newest member of the MPEG-AAC audio codec family. The Fraunhofer Institute for Integrated Circuits IIS played an integral role in the development of xHE-AAC and the MPEG-D DRC standard.
Already today, xHE-AAC offers a superior audio experience on Facebook and Instagram, including on Reels and Stories, and has a number of valuable features.
Loudness management
With hundreds of millions of uploads per day via Facebook and Instagram, we get audio tracks with volume levels ranging from silence to full volume and everything in between.
When people play these videos back-to-back, they may find some audio signals too loud or too quiet. This tires the listener because they have to constantly adjust the volume.
xHE-AAC’s built-in loudness management system resolves loudness inconsistencies while meticulously preserving creator intent, bringing the average loudness of all sessions to the same target level and adjusting each session’s dynamic range to match the playback environment.
Instead of burning in a specific target level and DRC (Dynamic Range Compression) profile during encoding, xHE-AAC allows us to leave the original audio characteristics untouched and delegate loudness management processing to the client via loudness metadata to achieve the optimal audio experience based on context.
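The basic arithmetic behind client-side loudness normalization can be sketched as follows. This is a minimal illustration, not the MPEG-D DRC algorithm: the function names, the LUFS values, and the idea of applying a single static gain are all assumptions made for the example. It shows only how a program loudness value carried as metadata lets the client compute a gain at playback time while the encoded audio stays untouched.

```python
def normalization_gain_db(program_loudness_lufs: float, target_lufs: float) -> float:
    """Gain (in dB) that moves the measured program loudness to the target level."""
    return target_lufs - program_loudness_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude scale factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db: float):
    """Scale decoded PCM samples by the given gain; the source audio is not modified."""
    scale = db_to_linear(gain_db)
    return [s * scale for s in samples]


# Hypothetical metadata for two uploads, both normalized to a hypothetical -16 LUFS target.
quiet_gain = normalization_gain_db(program_loudness_lufs=-30.0, target_lufs=-16.0)
loud_gain = normalization_gain_db(program_loudness_lufs=-8.0, target_lufs=-16.0)
print(quiet_gain, loud_gain)
```

Because the gain is derived at playback time, the same bitstream can be played at a different target level simply by passing a different `target_lufs`.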
As a result of xHE-AAC’s loudness management, users can spend more time immersing themselves in their favorite content and less time fiddling with volume controls.
Adaptive bit rate audio
Most of the people who use our apps consume media on mobile devices and expect the highest audio quality without interruption. This poses a challenge for streaming media, as connection quality varies across mobile devices and can result in a very inconsistent user experience.
To optimize quality under dynamic bandwidth constraints, we produce multiple video and audio qualities to adapt to different network conditions at playback time. Even though we produce several audio tracks, historically we have only employed adaptive bit rate (ABR) algorithms for switching video quality during playback, since it is difficult to enable adaptive bit rate audio without sacrificing quality at track transitions.
To enable seamless audio ABR, xHE-AAC introduces the concept of instant playout frames (IPFs), which contain all the data needed to start playing a new audio track without relying on data from other frames. By placing an IPF at the beginning of each DASH (Dynamic Adaptive Streaming over HTTP) segment and aligning the segment durations of each track, we can seamlessly switch between audio tracks during playback to provide the highest audio quality at any available bandwidth while avoiding stalls.
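The switching logic this enables can be sketched roughly as below. It is an illustration under stated assumptions, not Meta's player logic: the `AudioTrack` type, the bitrates, the safety factor, and the per-segment bandwidth estimates are all hypothetical. The point is that when every segment starts with an IPF and segment durations are aligned, a track decision can be made independently at each segment boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioTrack:
    track_id: str
    bitrate_bps: int
    segment_duration_s: float


def segments_aligned(tracks) -> bool:
    """Switching at segment boundaries needs the same segment duration on every track."""
    return len({t.segment_duration_s for t in tracks}) == 1


def pick_track(tracks, bandwidth_bps: float, safety: float = 0.8) -> AudioTrack:
    """Pick the highest-bitrate track that fits the budget, else the lowest one."""
    budget = bandwidth_bps * safety
    ordered = sorted(tracks, key=lambda t: t.bitrate_bps)
    chosen = ordered[0]
    for track in ordered:
        if track.bitrate_bps <= budget:
            chosen = track
    return chosen


def plan_playback(tracks, bandwidth_per_segment_bps):
    """Choose a track per segment; each segment is assumed to start with an IPF."""
    if not segments_aligned(tracks):
        raise ValueError("segment durations differ; boundaries would not line up")
    return [pick_track(tracks, bw).track_id for bw in bandwidth_per_segment_bps]


# Hypothetical ladder and bandwidth estimates (bits per second available for audio).
ladder = [
    AudioTrack("low", 24_000, 2.0),
    AudioTrack("mid", 64_000, 2.0),
    AudioTrack("high", 128_000, 2.0),
]
print(plan_playback(ladder, [200_000, 50_000, 90_000]))
```

Falling back to the lowest track when nothing fits the budget keeps playback going instead of stalling, which matches the goal described above.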
After launching audio ABR on Facebook for Android, we were able to improve the user experience by reducing the number of sessions in which playback stalled.
How we deployed xHE-AAC
We generate xHE-AAC bitstreams using an encoder SDK provided by Fraunhofer Institute for Integrated Circuits IIS, and then prepare the resulting audio files for DASH streaming using shaka-packager. The two-pass encoding mode of the xHE-AAC encoder is used to measure the input loudness envelope and the average program loudness in the first pass and to perform the actual audio data compression in the second pass. As an added benefit, two-pass encoding allows us to use Loudness Range Control (LRAC) DRC, which mitigates pumping artifacts otherwise introduced by single-pass DRC algorithms.
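The two-pass structure can be pictured with a small sketch. This is not the Fraunhofer encoder SDK and does not reproduce its measurements: the function names are hypothetical, the level is a plain mean-square value in dBFS rather than a perceptual loudness measure, and the second pass is a stand-in that only attaches metadata. It illustrates why a first pass over the whole input is needed before encoding: the program-wide value is only known once every sample has been seen.

```python
import math


def level_db(block) -> float:
    """Mean-square level of a block in dBFS (simplified; not perceptual loudness)."""
    if not block:
        return float("-inf")
    mean_square = sum(s * s for s in block) / len(block)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def measure_pass(samples, block_size: int):
    """Pass 1: per-block level envelope plus the overall program level."""
    envelope = [
        level_db(samples[i:i + block_size])
        for i in range(0, len(samples), block_size)
    ]
    return envelope, level_db(samples)


def encode_pass(samples, envelope, program_level_db: float) -> dict:
    """Pass 2 (stand-in for compression): keep the audio as-is, attach metadata."""
    return {
        "audio": list(samples),
        "metadata": {"program_level_db": program_level_db, "envelope_db": envelope},
    }


# Hypothetical input: a louder block followed by a quieter one.
pcm = [0.5] * 4 + [0.1] * 4
envelope, program_level = measure_pass(pcm, block_size=4)
encoded = encode_pass(pcm, envelope, program_level)
print(encoded["metadata"])
```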
To prepare an xHE-AAC audio adaptation set for ABR delivery, IPFs are inserted at constant time intervals, audio configuration parameters such as sample rate and channel configuration are kept constant, and unique stream identifiers are chosen for each track in the audio adaptation set.
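These constraints lend themselves to a simple pre-publish check. The sketch below is an assumption about how such a check might look, not part of shaka-packager or the encoder SDK: the `Representation` type and its field names are hypothetical, and only the three rules stated above are verified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    stream_id: int
    bitrate_bps: int
    sample_rate_hz: int
    channels: int
    ipf_interval_s: float


def validate_adaptation_set(reps) -> list:
    """Return a list of human-readable problems; an empty list means the set looks consistent."""
    problems = []
    if len({r.sample_rate_hz for r in reps}) > 1:
        problems.append("sample rate differs between tracks")
    if len({r.channels for r in reps}) > 1:
        problems.append("channel configuration differs between tracks")
    if len({r.ipf_interval_s for r in reps}) > 1:
        problems.append("IPF interval differs between tracks")
    if len({r.stream_id for r in reps}) != len(reps):
        problems.append("stream identifiers are not unique")
    return problems


# Hypothetical adaptation sets: one consistent, one with two mistakes.
good = [
    Representation(1, 24_000, 48_000, 2, 2.0),
    Representation(2, 64_000, 48_000, 2, 2.0),
]
bad = [
    Representation(1, 24_000, 44_100, 2, 2.0),
    Representation(1, 64_000, 48_000, 2, 2.0),
]
print(validate_adaptation_set(good))
print(validate_adaptation_set(bad))
```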
At the time of playback, we customize the audio to the listening environment by configuring a target loudness level and a DRC effect type based on the context. Thanks to the embedded loudness metadata, we can adapt a single xHE-AAC bitstream to a variety of audio consumption use cases, from headphones to device speakers and various background noise levels. Finally, when the client is starved for data or bandwidth is plentiful, audio ABR automatically switches audio quality to ensure the highest audio quality is played without interrupting the playback session.
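As a rough illustration of that per-context configuration, the sketch below maps a listening context to a target loudness and a DRC effect label. Everything in it is hypothetical: the context fields, the LUFS targets, and the effect labels are placeholders chosen for the example, not the values or names Meta's clients use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaybackContext:
    output: str   # hypothetical values: "headphones" or "speaker"
    noisy: bool   # whether the environment is assumed to be noisy


@dataclass(frozen=True)
class LoudnessConfig:
    target_loudness_lufs: float
    drc_effect: str


def choose_loudness_config(ctx: PlaybackContext) -> LoudnessConfig:
    """Pick decoder loudness settings for a context (illustrative values only)."""
    if ctx.output == "speaker":
        # Small speakers: louder target and a narrower dynamic range.
        return LoudnessConfig(-16.0, "limited_playback_range")
    if ctx.noisy:
        # Headphones in a noisy place: lift quiet passages above the noise.
        return LoudnessConfig(-16.0, "noisy_environment")
    # Quiet headphone listening: keep the full dynamic range.
    return LoudnessConfig(-24.0, "none")


print(choose_loudness_config(PlaybackContext("headphones", noisy=False)))
print(choose_loudness_config(PlaybackContext("speaker", noisy=True)))
```

The same bitstream feeds every branch; only the settings handed to the decoder change.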
Where can you experience xHE-AAC today?
You can experience xHE-AAC audio on Facebook for iOS and Android, as well as targeted surfaces on Instagram like Reels and Stories. We recommend you install the latest version of the Facebook and Instagram apps on iOS 13+ and Android 9+ to ensure you can experience it.
This work is the joint result of the entire video infrastructure and Instagram media platform teams at Meta in collaboration with Fraunhofer Institute for Integrated Circuits IIS. The author would like to give special thanks to Abhishek Gera, Tim Harris, Arun Kotidath, Edward Li, Meng Li, Srinivas Lingutla, Denise Noyes, Mohanish Penta, David Ronca, Haixia Shi, Mike Starr, Cosmin Stejerean, Simha Venkataramaiah, Juehui Zhang, Runshen Zhu and the engineering team at Fraunhofer Institute for Integrated Circuits IIS.