<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>RTC | DAILY ZSOCIAL MEDIA NEWS</title>
	<atom:link href="https://dailyzsocialmedianews.com/tag/rtc/feed/" rel="self" type="application/rss+xml" />
	<link>https://dailyzsocialmedianews.com</link>
	<description>ALL ABOUT DAILY ZSOCIAL MEDIA NEWS</description>
	<lastBuildDate>Thu, 21 Mar 2024 01:34:52 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.7.1</generator>

<image>
	<url>https://dailyzsocialmedianews.com/wp-content/uploads/2020/12/cropped-DAILY-ZSOCIAL-MEDIA-NEWS-e1607166156946-32x32.png</url>
	<title>RTC | DAILY ZSOCIAL MEDIA NEWS</title>
	<link>https://dailyzsocialmedianews.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Optimizing RTC bandwidth estimation with machine learning</title>
		<link>https://dailyzsocialmedianews.com/optimizing-rtc-bandwidth-estimation-with-machine-studying/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Thu, 21 Mar 2024 01:34:51 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[bandwidth]]></category>
		<category><![CDATA[estimation]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[Machine]]></category>
		<category><![CDATA[Optimizing]]></category>
		<category><![CDATA[RTC]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24934</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="1023" height="576" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Optimizing RTC bandwidth estimation with machine learning" decoding="async" fetchpriority="high" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning.png 1023w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning-300x169.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning-768x432.png 768w" sizes="(max-width: 1023px) 100vw, 1023px" /></div><p>Bandwidth estimation (BWE) and congestion control play an important role in delivering high-quality real-time communication (RTC) across Meta’s family of apps. We’ve adopted a machine learning (ML)-based approach that allows us to solve networking problems holistically across cross-layers such as BWE, network resiliency, and transport. We’re sharing our experiment results from this approach, some of [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/optimizing-rtc-bandwidth-estimation-with-machine-studying/">Optimizing RTC bandwidth estimation with machine learning</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="1023" height="576" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Optimizing RTC bandwidth estimation with machine learning" decoding="async" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning.png 1023w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning-300x169.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning-768x432.png 768w" sizes="(max-width: 1023px) 100vw, 1023px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Bandwidth estimation (BWE) and congestion control play an important role in delivering high-quality real-time communication (RTC) across Meta’s family of apps.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We’ve adopted a machine learning (ML)-based approach that allows us</span><span style="font-weight: 400;"> to solve networking problems holistically across cross-layers such as BWE, network resiliency, and transport.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We’re sharing our experiment results from this approach, some of the challenges we encountered during execution, and learnings for new adopters.</span></li>
</ul>
<p><span style="font-weight: 400;">Our existing bandwidth estimation (BWE) module at Meta is</span> <span style="font-weight: 400;">based on WebRTC’s Google Congestion Controller (GCC)</span><span style="font-weight: 400;">. We have made several improvements through parameter tuning, but this has resulted in a more complex system, as shown in Figure 1.</span></p>
<p>Figure 1: BWE module’s system diagram for congestion control in RTC.</p>
<p><span style="font-weight: 400;">One challenge with the tuned congestion control (CC)/BWE algorithm was that it had multiple parameters and actions that were dependent on network conditions. For example, there was a trade-off between quality and reliability; improving quality for high-bandwidth users often led to reliability regressions for low-bandwidth users, and vice versa, making it challenging to optimize the user experience for different network conditions.</span></p>
<p><span style="font-weight: 400;">Additionally, we noticed some inefficiencies in improving and maintaining the complex BWE module:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Due to the absence of realistic network conditions during our experimentation process, fine-tuning the parameters for user clients necessitated several attempts.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Even after the rollout, it wasn’t clear if the optimized parameters were still applicable for the targeted network types.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">This resulted in complex code logic and branches that engineers had to maintain.</span></li>
</ol>
<p><span style="font-weight: 400;">To solve these inefficiencies, we developed a machine learning (ML)-based, network-targeting approach that offers a cleaner alternative to hand-tuned rules. This approach also allows us to solve networking problems holistically across cross-layers such as BWE, network resiliency, and transport.</span></p>
<h2><span style="font-weight: 400;">Network characterization</span></h2>
<p><span style="font-weight: 400;">An ML model-based approach leverages time series data to improve the bandwidth estimation by using offline parameter tuning for characterized network types. </span></p>
<p><span style="font-weight: 400;">For an RTC call to be completed, the endpoints must be connected to each other through network devices. The optimal configs that have been tuned offline are stored on the server and can be updated in real-time. During the call connection setup, these optimal configs are delivered to the client. During the call, media is transferred directly between the endpoints or through a relay server. Depending on the network signals collected during the call, an ML-based approach characterizes the network into different types and applies the optimal configs for the detected type.</span></p>
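<p>Mechanically, that flow amounts to a server-side lookup: offline tuning produces a table of per-network-type parameter sets, and the type detected during the call selects one. A minimal sketch follows; the type names and parameter values are illustrative assumptions, not Meta's actual configs.</p>

```python
# Hypothetical sketch of per-network-type config delivery. Offline tuning
# fills this table on the server; the client applies the entry matching
# the network type the ML model detects during the call.

OPTIMAL_CONFIGS = {
    "random_loss": {"loss_tolerance_pct": 10, "fec_ratio": 0.2},
    "bursty_loss": {"loss_tolerance_pct": 2, "fec_ratio": 0.5},
    "default": {"loss_tolerance_pct": 2, "fec_ratio": 0.1},
}

def configs_for(network_type: str) -> dict:
    """Return the offline-tuned config for a detected network type."""
    return OPTIMAL_CONFIGS.get(network_type, OPTIMAL_CONFIGS["default"])
```

<p>Because the table lives on the server, the tuned values can be updated in real time without shipping a new client, as the article notes.</p>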
<p><span style="font-weight: 400;">Figure 2 illustrates an example of an RTC call that’s optimized using the ML-based approach. </span><span style="font-weight: 400;"> </span></p>
<p><img decoding="async" class="size-large wp-image-21120" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=192,108 192w" sizes="(max-width: 992px) 100vw, 62vw"/>Figure 2: An example RTC call configuration with optimized parameters delivered from the server and based on the current network type.</p>
<h2><span style="font-weight: 400;">Model learning and offline parameter tuning</span></h2>
<p><span style="font-weight: 400;">On a high level, network characterization consists of two main components, as shown in Figure 3. The first component uses offline ML model learning to categorize the network type (random packet loss versus bursty loss). The second component uses offline simulations to tune parameters optimally for the categorized network type. </span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21121" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 3: Offline ML-model learning and parameter tuning.</p>
<p><span style="font-weight: 400;">For model learning, we leverage the time series data (network signals and non-personally identifiable information, see Figure 6, below) from production calls and simulations. Compared to the aggregate metrics logged after the call, time series captures the time-varying nature of the network and dynamics. We use</span><span style="font-weight: 400;"> FBLearner</span><span style="font-weight: 400;">, our internal AI stack, for the training pipeline and deliver the PyTorch model files on demand to the clients at the start of the call.</span></p>
<p><span style="font-weight: 400;">For offline tuning, we use simulations to run network profiles for the detected types and choose the optimal parameters for the modules based on improvements in technical metrics (such as quality, freezes, and so on).</span></p>
<h2><span style="font-weight: 400;">Model architecture</span></h2>
<p><span style="font-weight: 400;">From our experience, we’ve found it necessary to combine time series features with non-time series features (i.e., metrics derived from the time window) for highly accurate modeling.</span></p>
<p><span style="font-weight: 400;">To handle both time series and non-time series data, we’ve designed a model architecture that can process input from both sources.</span></p>
<p><span style="font-weight: 400;">The time series data will pass through a</span> <span style="font-weight: 400;">long short-term memory (LSTM) layer</span><span style="font-weight: 400;"> that will convert time series input into a one-dimensional vector representation, such as 16×1. The non-time series data or dense data will pass through a dense layer (i.e., a fully connected layer). Then the two vectors will be concatenated, to fully represent the network condition in the past, and passed through a fully connected layer again. The final output from the neural network model will be the predicted output of the target/task, as shown in Figure 4. </span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21122" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 4: Combined-model architecture with LSTM and Dense Layers</p>
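<p>The combined architecture can be sketched in a few lines of PyTorch. The article specifies only the 16-wide time-series embedding and the overall topology; the input widths, dense-branch size, and number of output classes below are illustrative assumptions.</p>

```python
import torch
import torch.nn as nn

class CombinedNet(nn.Module):
    """LSTM branch for time series + dense branch for derived metrics,
    concatenated and passed through a fully connected head, as in Figure 4.
    Layer sizes other than the 16-wide LSTM embedding are assumptions."""

    def __init__(self, ts_features: int = 8, dense_features: int = 4,
                 num_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(ts_features, hidden_size=16, batch_first=True)
        self.dense = nn.Linear(dense_features, 16)
        self.head = nn.Linear(16 + 16, num_classes)

    def forward(self, ts: torch.Tensor, dense: torch.Tensor) -> torch.Tensor:
        # ts: (batch, time, ts_features); keep the last hidden state as the
        # 16-dimensional vector representation of the time series.
        _, (h_n, _) = self.lstm(ts)
        ts_vec = h_n[-1]                          # (batch, 16)
        dense_vec = torch.relu(self.dense(dense)) # (batch, 16)
        combined = torch.cat([ts_vec, dense_vec], dim=1)
        return self.head(combined)                # predicted task output

model = CombinedNet()
# 2 samples, 10 time steps of 8 signals each, plus 4 derived metrics
logits = model(torch.randn(2, 10, 8), torch.randn(2, 4))
```

<p>The same trunk can serve different targets (classification or prediction) by swapping the head, which is what the multi-task consolidation mentioned later builds on.</p>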
<h2><span style="font-weight: 400;">Use case: Random packet loss classification</span></h2>
<p><span style="font-weight: 400;">Let’s consider the use case of classifying packet loss as either random or congestion-induced. The former is caused by unreliable network components, and the latter by limits in queue length (which are delay dependent). Here is the ML task definition:</span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;">Given the network conditions in the past N seconds (N = 10), and that the network is currently incurring packet loss, characterize the packet loss at the current timestamp as RANDOM or not.</span></p>
<p><span style="font-weight: 400;">Figure 5 illustrates how we leverage the architecture to achieve that goal:</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21123" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 5: Model architecture for a random packet loss classification task.</p>
<h3><span style="font-weight: 400;">Time series features</span></h3>
<p><span style="font-weight: 400;">We leverage the following time series features gathered from logs:</span></p>
<p><img loading="lazy" decoding="async" class="wp-image-21136 size-large" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png 2500w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=916,515 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=2048,1152 2048w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 6: Time series features used for model training.</p>
<h3><span style="font-weight: 400;">BWE optimization</span></h3>
<p><span style="font-weight: 400;">When the ML model detects random packet loss, we perform local optimization on the BWE module by:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Increasing the tolerance to random packet loss in the loss-based BWE (holding the bitrate).</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Increasing the ramp-up speed, depending on the link capacity on high bandwidths.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Increasing the network resiliency by sending additional forward-error correction packets to recover from packet loss.</span></li>
</ul>
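<p>Applied mechanically, those three adjustments look something like the sketch below. The field names, multipliers, and FEC increment are hypothetical placeholders, not the production values.</p>

```python
# Hypothetical sketch of the local BWE adjustments applied when the
# classifier reports random (non-congestion) packet loss.

def apply_random_loss_actions(bwe: dict, high_bandwidth: bool) -> dict:
    bwe["loss_tolerance_pct"] *= 2   # tolerate random loss; hold the bitrate
    if high_bandwidth:
        bwe["rampup_factor"] *= 1.5  # ramp up faster on high-capacity links
    bwe["fec_ratio"] += 0.1          # extra FEC packets for resiliency
    return bwe

state = {"loss_tolerance_pct": 2, "rampup_factor": 1.0, "fec_ratio": 0.1}
state = apply_random_loss_actions(state, high_bandwidth=True)
```
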
<h2><span style="font-weight: 400;">Network prediction</span></h2>
<p><span style="font-weight: 400;">The network characterization problem discussed in the previous sections focuses on classifying network types based on past information using time series data. Simple classification tasks like these can also be handled with hand-tuned rules, albeit with limitations. The real power of leveraging ML for networking, however, comes from using it to predict future network conditions.</span></p>
<p><span style="font-weight: 400;">We have applied ML for solving congestion-prediction problems for optimizing low-bandwidth users’ experience.</span></p>
<h2><span style="font-weight: 400;">Congestion prediction</span></h2>
<p><span style="font-weight: 400;">From our analysis of production data, we found that low-bandwidth users often incur congestion due to the behavior of the GCC module. By predicting this congestion, we can improve reliability for these users. Towards this, we addressed the following problem statement using round-trip time (RTT) and packet loss:</span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;">Given the historical time-series data (“N” seconds) from production/simulation, the goal is to predict packet loss due to congestion, or the congestion itself, in the next “N” seconds; that is, a spike in RTT followed by packet loss or further growth in RTT.</span></p>
<p><span style="font-weight: 400;">Figure 7 shows an example from a simulation where the bandwidth alternates between 500 Kbps and 100 Kbps every 30 seconds. As we lower the bandwidth, the network incurs congestion and the ML model predictions fire the green spikes even before the delay spikes and packet loss occur. This early prediction of congestion is helpful in faster reactions and thus improves the user experience by preventing video freezes and connection drops.</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21137" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png 2500w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=916,515 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=2048,1152 2048w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 7: Simulated network scenario with alternating bandwidth for congestion prediction</p>
<h2><span style="font-weight: 400;">Generating training samples</span></h2>
<p><span style="font-weight: 400;">The main challenge in modeling is generating training samples for a variety of congestion situations. With simulations, it’s harder to capture different types of congestion that real user clients would encounter in production networks. As a result, we used actual production logs for labeling congestion samples, following the RTT-spikes criteria in the past and future windows according to the following assumptions:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Absent past RTT spikes, packet losses in the past and future are independent.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Absent past RTT spikes, we cannot predict future RTT spikes or fractional losses (i.e., flosses).</span></li>
</ul>
<p><span style="font-weight: 400;">We split the time window into past (4 seconds) and future (4 seconds) for labeling.</span><span style="font-weight: 400;"><br /></span></p>
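<p>One way to implement that labeling rule is sketched below. The RTT-spike threshold is an illustrative assumption; the article does not specify the production criteria beyond the 4-second past/future split and the two assumptions above.</p>

```python
# Hypothetical labeling sketch: a sample is labeled CONGESTED when an RTT
# spike in the past (4 s) window is followed by packet loss or further RTT
# growth in the future (4 s) window. The spike ratio is illustrative.

def label_congestion(past_rtt, future_rtt, future_loss, spike_ratio=1.5):
    baseline = min(past_rtt)
    past_spike = max(past_rtt) > spike_ratio * baseline
    if not past_spike:
        # Per the assumptions above, without a past RTT spike the future
        # window is unpredictable, so the sample is left unlabeled.
        return None
    future_bad = any(future_loss) or max(future_rtt) > max(past_rtt)
    return future_bad
```
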
<p><img loading="lazy" decoding="async" class="size-large wp-image-21126" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 8: Labeling criteria for congestion prediction</p>
<h2><span style="font-weight: 400;">Model performance</span></h2>
<p><span style="font-weight: 400;">Unlike network characterization, where ground truth is unavailable, we can obtain ground truth by examining the future time window after it has passed and then comparing it with the prediction made four seconds earlier. With this logging information gathered from real production clients, we compared the performance in offline training to online data from user clients:</span></p>
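<p>Once the future window has elapsed, each online prediction can be scored against what actually happened, yielding standard precision/recall numbers for the offline-versus-online comparison. A minimal sketch, assuming boolean per-sample predictions and outcomes:</p>

```python
# Sketch of online evaluation: after each 4-second future window passes,
# compare the outcome with the prediction made at the window's start.

def evaluate(predictions, outcomes):
    """predictions/outcomes: parallel lists of booleans, one per sample."""
    tp = sum(p and o for p, o in zip(predictions, outcomes))
    fp = sum(p and not o for p, o in zip(predictions, outcomes))
    fn = sum(o and not p for p, o in zip(predictions, outcomes))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

precision, recall = evaluate([True, True, False, False],
                             [True, False, True, False])
```
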
<p><img loading="lazy" decoding="async" class="size-large wp-image-21127" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 9: Offline versus online model performance comparison.</p>
<h2><span style="font-weight: 400;">Experiment results</span></h2>
<p><span style="font-weight: 400;">Here are some highlights from our deployment of various ML models to improve bandwidth estimation:</span></p>
<h3><span style="font-weight: 400;">Reliability wins for congestion prediction</span></h3>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></span> <span style="font-weight: 400;">connection_drop_rate -0.326371 +/- 0.216084<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> last_minute_quality_regression_v1 -0.421602 +/- 0.206063<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> last_minute_quality_regression_v2 -0.371398 +/- 0.196064<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> bad_experience_percentage -0.230152 +/- 0.148308<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> transport_not_ready_pct -0.437294 +/- 0.400812</span></p>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></span><span style="font-weight: 400;"> peer_video_freeze_percentage -0.749419 +/- 0.180661<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> peer_video_freeze_percentage_above_500ms -0.438967 +/- 0.212394</span></p>
<h3><span style="font-weight: 400;">Quality and user engagement wins for random packet loss characterization in high bandwidth</span></h3>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></span><span style="font-weight: 400;"> peer_video_freeze_percentage -0.379246 +/- 0.124718<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> peer_video_freeze_percentage_above_500ms -0.541780 +/- 0.141212<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> peer_neteq_plc_cng_perc -0.242295 +/- 0.137200</span></p>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> total_talk_time 0.154204 +/- 0.148788</span></p>
<h3><span style="font-weight: 400;">Reliability and quality wins for cellular low bandwidth classification</span></h3>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> connection_drop_rate -0.195908 +/- 0.127956<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> last_minute_quality_regression_v1 -0.198618 +/- 0.124958<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> last_minute_quality_regression_v2 -0.188115 +/- 0.138033</span></p>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> peer_neteq_plc_cng_perc -0.359957 +/- 0.191557<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> peer_video_freeze_percentage -0.653212 +/- 0.142822</span></p>
<h3><span style="font-weight: 400;">Reliability and quality wins for cellular high bandwidth classification</span></h3>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> avg_sender_video_encode_fps 0.152003 +/- 0.046807<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> avg_sender_video_qp -0.228167 +/- 0.041793<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> avg_video_quality_score 0.296694 +/- 0.043079<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> avg_video_sent_bitrate 0.430266 +/- 0.092045</span></p>
<h2><span style="font-weight: 400;">Future plans for applying ML to RTC</span></h2>
<p><span style="font-weight: 400;">From our project execution and experimentation on production clients, we found that an ML-based approach is more efficient to target, monitor end-to-end, and update than traditional hand-tuned rules for networking. However, the efficiency of ML solutions largely depends on data quality and labeling (using simulations or production logs). By applying ML-based solutions to network prediction problems, congestion in particular, we fully leveraged the power of ML. </span></p>
<p><span style="font-weight: 400;">In the future, we will consolidate all of the network characterization models into a single multi-task model to eliminate the redundancy in model downloads, inference, and so on. We will build a shared representation model for the time series that serves different network characterization tasks (e.g., bandwidth classification and packet loss classification). We will also focus on building realistic production network scenarios for model training and validation, which will enable us to use ML to identify optimal network actions for given network conditions. And we will continue refining our learning-based methods to enhance network performance based on existing network signals.</span></p>The post <a href="https://dailyzsocialmedianews.com/optimizing-rtc-bandwidth-estimation-with-machine-studying/">Optimizing RTC bandwidth estimation with machine learning</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Better video for mobile RTC with AV1 and HD</title>
		<link>https://dailyzsocialmedianews.com/higher-video-for-cellular-rtc-with-av1-and-hd/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Wed, 20 Mar 2024 21:32:05 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[AV1]]></category>
		<category><![CDATA[Mobile]]></category>
		<category><![CDATA[RTC]]></category>
		<category><![CDATA[video]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24930</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="1023" height="610" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Better video for mobile RTC with AV1 and HD" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD.png 1023w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD-300x179.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD-768x458.png 768w" sizes="auto, (max-width: 1023px) 100vw, 1023px" /></div><p>At Meta, we support real-time communication (RTC) for billions of people through our apps, including Messenger, Instagram, and WhatsApp. We’ve seen significant benefits by adopting the AV1 codec for RTC. Here’s how we are improving the RTC video quality for our apps with tools like the AV1 codec, the challenges we face, and how we [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/higher-video-for-cellular-rtc-with-av1-and-hd/">Better video for mobile RTC with AV1 and HD</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="1023" height="610" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Better video for mobile RTC with AV1 and HD" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD.png 1023w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD-300x179.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD-768x458.png 768w" sizes="auto, (max-width: 1023px) 100vw, 1023px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">At Meta, we support real-time communication (RTC) for billions of people through our apps, including Messenger, Instagram, and WhatsApp.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We’ve seen significant benefits by adopting the </span><span style="font-weight: 400;">AV1 codec for RTC</span><span style="font-weight: 400;">.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Here’s how we are improving the RTC video quality for our apps with tools like the AV1 codec, the challenges we face, and how we mitigate those challenges.</span></li>
</ul>
<p><span style="font-weight: 400;">The last few decades have seen tremendous improvements in mobile phone camera quality as well as in video quality for streaming video services. But if we look at real-time communication (RTC) applications, while video quality has also improved over time, it has always lagged behind camera quality. </span></p>
<p><span style="font-weight: 400;">When we looked at ways to improve video quality for RTC across our family of apps, AV1 stood out as the best option. Meta has increasingly adopted the AV1 codec over the years because it offers high video quality at bitrates much lower than older codecs. But as we’ve implemented AV1 for mobile RTC, we’ve also had to address a number of challenges, including scaling, improving video quality for low-bandwidth users as well as high-end networks, CPU and battery usage, and maintaining quality stability.</span></p>
<h2><span style="font-weight: 400;">Improving video quality for low-bandwidth networks</span></h2>
<p><span style="font-weight: 400;">This post is going to focus on peer-to-peer (P2P, or 1:1) calls, which involve two participants. </span></p>
<p><span style="font-weight: 400;">People who use our products and services experience a range of network conditions – some have really great networks, while others are using throttled or low-bandwidth networks.</span></p>
<p><span style="font-weight: 400;">This chart illustrates what the distribution of bandwidth looks like for some of these calls on Messenger:</span></p>
<p>Figure 1: Bandwidth distribution of P2P calls on Messenger.</p>
<p><span style="font-weight: 400;">As seen in Figure 1, some calls operate in very low-bandwidth conditions. </span></p>
<p><span style="font-weight: 400;">We consider anything less than 300 Kbps to be a low-end network, but we also see a lot of video calls operating at just 50 Kbps, or even under 25 Kbps.</span></p>
<p><span style="font-weight: 400;">Note that this bandwidth is the share for the video encoder. Total bandwidth is shared with audio, RTP overhead, signaling overhead, RTX (re-transmissions of packets to handle lost packets)/FEC (forward error correction)/duplication (packet duplication), and so on. The big assumption here is that the bandwidth estimator is working correctly and estimating true bitrates. </span></p>
<p><span style="font-weight: 400;">There are no universal definitions for low, mid, and high networks, but for the purpose of this blog post, less than 300 Kbps will be considered as low, 300-800 Kbps as mid, and above 800 Kbps as a high, HD-capable, or high-end network.</span></p>
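<p>The tiering above, together with the note that the video encoder only gets the bandwidth left over after audio and overhead, can be sketched as follows. The fixed audio/overhead costs below are hypothetical placeholders; a real stack sizes them dynamically:</p>

```python
def video_budget_kbps(total_estimate_kbps, audio_kbps=40, overhead_kbps=30):
    # Hypothetical fixed costs for audio, RTP/signaling overhead, and
    # RTX/FEC/duplication; the real share varies per call.
    return max(0, total_estimate_kbps - audio_kbps - overhead_kbps)

def bandwidth_tier(video_kbps):
    # Thresholds from this post: <300 Kbps low, 300-800 mid, >800 high/HD-capable.
    if video_kbps < 300:
        return "low"
    if video_kbps <= 800:
        return "mid"
    return "high"
```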
<p><span style="font-weight: 400;">When we looked into improving video quality for low-bandwidth users, there were a few key options. Migrating to a newer codec such as AV1 presented the greatest opportunity, while options such as better video scalers and region-of-interest encoding offered incremental improvements. </span></p>
<h3><span style="font-weight: 400;">Video scalers</span></h3>
<p><span style="font-weight: 400;">We use WebRTC in most of our apps, but the video scalers shipped with WebRTC don’t have the best quality video scaling. We have been able to improve the video scaling quality significantly by leveraging in-house scalers. </span></p>
<p><span style="font-weight: 400;">At low bitrates, we often end up downscaling the video to encode at ¼ resolution (assuming the camera capture is 640×480 or 1280×720). With our custom scaler implementations, we have seen significant improvements in video quality. From public tests we saw gains in peak signal-to-noise ratio (PSNR) of 0.75 dB on average.</span></p>
<p><span style="font-weight: 400;">Here is a snapshot showing results with the default</span> <span style="font-weight: 400;">libyuv</span><span style="font-weight: 400;"> scaler (a box filter):</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21107" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2a.png?w=866" alt="" width="866" height="290" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2a.png 866w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2a.png?resize=768,257 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2a.png?resize=96,32 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2a.png?resize=192,64 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 2.a: Video image results using WebRTC/libyuv video scaler.</p>
<p><span style="font-weight: 400;">And the results after downscaling with our video scaler:</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21108" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2b.png?w=866" alt="" width="866" height="290" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2b.png 866w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2b.png?resize=768,257 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2b.png?resize=96,32 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2b.png?resize=192,64 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 2.b: Video image results using Meta’s video scaler.</p>
<h3><span style="font-weight: 400;">Region-of-interest encoding</span></h3>
<p><span style="font-weight: 400;">Identifying the region of interest (ROI) allowed us to spend more encoder bitrate on the area that matters most to a viewer (the speaker’s face in a talking-head video, for example). Most mobile devices have APIs to locate the face region with minimal CPU overhead. Once we have found the face region, we can configure the encoder to spend more bits on this important region and fewer on the rest. The easiest way to do this was to expose encoder APIs for configuring the quantization parameters (QP) for the ROI versus the rest of the image. These changes provided incremental improvements in video quality metrics like PSNR. </span></p>
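<p>A minimal sketch of the QP-map idea: given a detected face rectangle, emit per-block QP offsets that spend more bits inside the ROI. The block size and delta values here are illustrative, not any encoder’s actual API:</p>

```python
def roi_qp_delta_map(width, height, face, block=16, roi_delta=-6, bg_delta=2):
    """Per-block QP offsets: negative (finer quantization, more bits) for
    blocks overlapping the face rectangle, slightly positive elsewhere.
    `face` is (x, y, w, h) in pixels."""
    fx, fy, fw, fh = face
    cols = (width + block - 1) // block
    rows = (height + block - 1) // block
    qmap = []
    for r in range(rows):
        row = []
        for c in range(cols):
            bx, by = c * block, r * block
            # Axis-aligned rectangle overlap test between block and ROI.
            inside = (bx < fx + fw and bx + block > fx and
                      by < fy + fh and by + block > fy)
            row.append(roi_delta if inside else bg_delta)
        qmap.append(row)
    return qmap
```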
<h2><span style="font-weight: 400;">Adopting the AV1 video codec</span></h2>
<p><span style="font-weight: 400;">The video encoder is a key element when it comes to video quality for RTC. H.264 has been the most popular codec over the last decade, with hardware support and most applications supporting it. But it is a 20-year-old codec. Back in 2018, the Alliance for Open Media (AOMedia) standardized the AV1 video codec. Since then, several companies including Meta, YouTube, and Netflix have </span><span style="font-weight: 400;">deployed it at a large scale for video streaming</span><span style="font-weight: 400;">. </span></p>
<p><span style="font-weight: 400;">At Meta, moving from H.264 to AV1 led us to our greatest improvements in video quality at low bitrates.</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21109" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png?w=1024" alt="" width="1024" height="427" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png 1600w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png?resize=916,382 916w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png?resize=768,320 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png?resize=1024,427 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png?resize=1536,640 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png?resize=96,40 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png?resize=192,80 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 3: Improvements over time, moving from H.262 to AV1 and H.266</p>
<h3><span style="font-weight: 400;">Why AV1?</span></h3>
<p><span style="font-weight: 400;">We chose to use AV1 in part because it’s royalty-free. Codec licensing (and the associated fees) was an important aspect of our decision-making process. Typically, if an application uses a device’s hardware codec, no additional codec licensing costs are incurred. But if an application ships a software version of the codec, there will most likely be licensing costs to cover.</span></p>
<p><span style="font-weight: 400;">But why do we need to use software codecs even though most phones have hardware-supported codecs?</span></p>
<p><span style="font-weight: 400;">Most mobile devices have dedicated hardware for video encoding and decoding. And these days most mobile devices support H.264 and even H.265. But those encoders are designed for common use cases such as camera capture, which uses much higher resolutions, frame rates, and bitrates. Most mobile device hardware is currently capable of encoding 4K 60 FPS in real time with very low battery usage, but the results of encoding a 7 FPS, 320×180, 200 Kbps video are often worse than those of software encoders running on the same device. </span></p>
<p><span style="font-weight: 400;">The reason for that is prioritization of the RTC use case. Most independent hardware vendors (IHVs) are not aware of the network conditions where RTC calls operate; hence, these hardware codecs are not optimized for RTC scenarios, especially for low bitrates, resolutions, and frame rates. So, we leverage software encoders when operating in these low bitrates to provide high-quality video.</span></p>
<p><span style="font-weight: 400;">And since we can’t ship software codecs without a license, AV1 is a very good option for RTC.</span></p>
<h2><span style="font-weight: 400;">AV1 for RTC</span></h2>
<p><span style="font-weight: 400;">The biggest reason to move to a more advanced video codec is simple: The same quality experience can be delivered with a much lower bitrate, and we can deliver a much higher-quality real-time calling experience for our users who are on bandwidth-constrained networks.</span></p>
<p><span style="font-weight: 400;">Measuring video quality is a complex topic, but a relatively simple way to approach it is the </span><span style="font-weight: 400;">Bjontegaard Delta-Bit Rate</span><span style="font-weight: 400;"> (BD-BR) metric, which compares how much bitrate different codecs need to produce a given quality level. By encoding multiple samples at different bitrates and measuring the quality of the resulting video, you obtain a rate-distortion (RD) curve, and from the RD curve you can derive the BD-BR (as shown below).</span></p>
<p><span style="font-weight: 400;">As can be seen in Figure 4, AV1 provided higher quality for all bitrate ranges in our local tests.</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21110" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-4.png?w=1024" alt="" width="1024" height="650" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-4.png 1282w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-4.png?resize=916,582 916w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-4.png?resize=768,488 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-4.png?resize=1024,650 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-4.png?resize=96,61 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-4.png?resize=192,122 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 4: Bitrate distortion comparison chart.</p>
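<p>For readers who want to reproduce a BD-BR-style comparison, here is a simplified sketch that interpolates two RD curves piecewise-linearly in the log-rate domain. The classic Bjontegaard method fits cubic polynomials instead, so treat this as an approximation:</p>

```python
import math

def bd_rate_percent(anchor, test):
    """Approximate Bjontegaard Delta rate: average vertical distance between
    two rate-distortion curves in the log-rate domain over the overlapping
    quality range, converted to a percent bitrate change (negative means the
    test codec needs less bitrate). `anchor` and `test` are lists of
    (bitrate_kbps, quality) points."""
    def log_rate_at(curve, q):
        pts = sorted(curve, key=lambda p: p[1])
        for (r0, q0), (r1, q1) in zip(pts, pts[1:]):
            if q0 <= q <= q1:
                t = (q - q0) / (q1 - q0)
                return math.log(r0) + t * (math.log(r1) - math.log(r0))
        raise ValueError("quality out of range")
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    n = 100  # sample the overlapping quality range at n+1 points
    diffs = [log_rate_at(test, lo + (hi - lo) * i / n) -
             log_rate_at(anchor, lo + (hi - lo) * i / n)
             for i in range(n + 1)]
    return (math.exp(sum(diffs) / len(diffs)) - 1) * 100
```

A test codec that hits the same quality at half the bitrate everywhere yields a BD-rate of -50 percent.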
<h3><span style="font-weight: 400;">Screen-encoding tools</span></h3>
<p><span style="font-weight: 400;">AV1 also has a few key tools that are useful for RTC. Screen content quality is becoming an increasingly important factor for Meta, with relevant use cases, including screen sharing, game streaming, and VR remote desktop, requiring high-quality encoding. In these areas, AV1 truly shines. </span></p>
<p><span style="font-weight: 400;">Traditionally, video encoders aren’t well suited to complex content such as text, with its abundance of high-frequency detail, and humans are sensitive to reading blurry text. AV1 has a set of coding tools (palette mode and intra-block copy) that drastically improve performance for screen content. Palette mode is based on the observation that pixel values in a screen-content frame usually concentrate on a limited number of colors, so it can represent the content efficiently by signaling color clusters instead of quantized transform-domain coefficients. In addition, typical screen content contains repetitive patterns within the same picture, and intra-block copy facilitates block prediction within the same frame, so compression efficiency improves significantly. That AV1 provides these two tools in the baseline profile is a huge plus.</span></p>
<h3><span style="font-weight: 400;">Reference picture resampling</span><span style="font-weight: 400;">: Fewer key frames</span></h3>
<p><span style="font-weight: 400;">Another useful feature is reference picture resampling (RPR), which allows resolution changes without generating a key frame. In video compression, a key frame is one that’s encoded independently, like a still image. It’s the only type of frame that can be decoded without having another frame as reference. </span></p>
<p><span style="font-weight: 400;">For RTC applications, since the bandwidth keeps on changing often, there are frequent resolution changes needed to adapt to these network changes. With older codecs like H.264, each of these resolution changes requires a key frame that is much larger in size and thus inefficient for RTC apps. Such large key frames increase the amount of data needing to be sent over the network and result in higher end-to-end latencies and congestion. </span></p>
<p><span style="font-weight: 400;">By using RPR, we can avoid generating any key frames.</span></p>
<p><img loading="lazy" decoding="async" class="alignnone size-large wp-image-21111" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-5.png?w=788" alt="" width="788" height="242" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-5.png 788w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-5.png?resize=768,236 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-5.png?resize=96,29 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-5.png?resize=192,59 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/></p>
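<p>The RPR benefit can be summarized in a tiny decision sketch; the codec names and frame-type labels below are illustrative, not any API’s real enum:</p>

```python
def frame_type_on_resize(codec, old_res, new_res):
    """With an RPR-capable codec (AV1), a resolution change can be coded as a
    normal inter frame that rescales its reference; older codecs such as H.264
    must insert a costly key frame."""
    if old_res == new_res:
        return "inter"
    return "inter-with-rpr" if codec == "av1" else "key"
```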
<h2><span style="font-weight: 400;">Challenges around improving video quality for low-bandwidth users</span></h2>
<h3><span style="font-weight: 400;">CPU/battery usage</span></h3>
<p><span style="font-weight: 400;">AV1 delivers great coding efficiency, but it achieves this at the cost of higher CPU and battery usage. Many modern codecs pose this challenge when running real-time applications on mobile platforms.</span></p>
<p><span style="font-weight: 400;">Based on local lab testing, we anticipated a roughly 4 percent increase in battery usage, and we saw similar results in public tests. We used a power meter to do this local battery measurement.</span></p>
<p><span style="font-weight: 400;">Even though the AV1 encoder itself increased CPU usage three-fold compared to the H.264 implementation, the encoder accounted for only a small part of overall battery usage. The phone’s display, networking/radio, and other processes using the CPU contribute significantly to battery drain, so the overall increase in battery usage was 5-6 percent, which is still significant. </span></p>
<p><span style="font-weight: 400;">Many calls end because the device runs out of battery, or people hang up once their operating system indicates low battery, so increasing battery usage isn’t worthwhile for users unless it provides commensurate value, such as improved video quality. Even then it’s a trade-off between video quality and battery use.</span></p>
<p><span style="font-weight: 400;">We use WebRTC and Session Description Protocol (SDP) for codec negotiation, which allows us to negotiate multiple codecs (e.g., AV1 and H.264) up front and then switch the codecs without any need for signaling or a handshake during the call. This means the codec switch is seamless, without users noticing any glitches or pauses in video.</span></p>
<p><span style="font-weight: 400;">We created a custom encoder that encapsulates both H.264 and the AV1 encoders. We call it a hybrid encoder. This allowed us to switch the codec during the call based on triggers such as CPU usage, battery level, or encoding time — and to switch to the more battery-efficient H.264 encoder when needed. </span></p>
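<p>A sketch of the hybrid-encoder idea described above, with hypothetical trigger thresholds; the real implementation wraps actual encoder instances and considers more signals:</p>

```python
class HybridEncoder:
    """Wraps an AV1 and an H.264 encoder and picks one per frame based on
    runtime triggers. Because SDP negotiated both codecs up front, switching
    requires no re-handshake mid-call."""

    def __init__(self, av1, h264, low_battery=0.2, max_encode_ms=33.0):
        self.av1, self.h264 = av1, h264
        self.low_battery = low_battery      # hypothetical battery threshold
        self.max_encode_ms = max_encode_ms  # hypothetical encode-time budget

    def pick(self, battery_level, avg_encode_ms):
        # Fall back to the cheaper H.264 encoder under resource pressure.
        if battery_level < self.low_battery or avg_encode_ms > self.max_encode_ms:
            return self.h264
        return self.av1

    def encode(self, frame, battery_level, avg_encode_ms):
        return self.pick(battery_level, avg_encode_ms).encode(frame)
```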
<h3><span style="font-weight: 400;">Increased crashes and out of memory errors</span></h3>
<p><span style="font-weight: 400;">Even without any new leaks, AV1 uses more memory than H.264. Any time additional memory is used, apps are more likely to hit out-of-memory (OOM) crashes, or to hit them sooner because of other leaks or memory demands from other apps on the system. To mitigate this, we had to disable AV1 on devices with low memory. Further optimizing the encoder’s memory usage remains an area for improvement.</span></p>
<h3><span style="font-weight: 400;">In-product quality measurement</span></h3>
<p><span style="font-weight: 400;">To compare the quality of H.264 and AV1 in public tests, we needed a low-complexity metric. Metrics such as encoded bitrate and frame rate won’t show any gains: the total bandwidth available to send video is limited by network capacity, so bitrates and frame rates will not change much when the codec changes. We had been using composite metrics that combine the quantization parameter (QP is often used as a proxy for video quality, since it controls how much pixel information is lost during encoding), resolution, frame rate, and freezes into a composite video metric. But QP is not comparable between AV1 and H.264, and hence can’t be used.</span></p>
<p><span style="font-weight: 400;">PSNR is a standard metric, but it’s reference-based and hence does not work for RTC. Non-reference, video-quality metrics are quite CPU-intensive (e.g., BRISQUE: Blind/Referenceless Image Spatial Quality Evaluator), though we are exploring those as well.</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21112" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-6.png?w=1024" alt="" width="1024" height="635" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-6.png 1220w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-6.png?resize=916,568 916w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-6.png?resize=768,476 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-6.png?resize=1024,635 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-6.png?resize=96,59 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-6.png?resize=192,119 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 6: High-level architecture for PSNR computation in RTC.</p>
<p><span style="font-weight: 400;">We have come up with a framework for PSNR computation. We first modified the encoder to report the distortion caused by compression (most software encoders already support this metric). Then we designed a lightweight scaling-distortion algorithm that estimates the distortion introduced by video scaling and combines it with the encoder distortion to produce an output PSNR. We developed and verified this algorithm locally and will be sharing the findings in publications and at academic conferences over the next year. With this lightweight PSNR metric, we saw a 2 dB improvement with AV1 compared to H.264.</span></p>
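<p>As a simplified illustration of combining distortion sources (assuming, unlike the full algorithm described above, that the scaler and encoder errors simply add in the MSE domain):</p>

```python
import math

def psnr_from_mse(mse, peak=255.0):
    # Standard PSNR definition for 8-bit video: 10 * log10(peak^2 / MSE).
    return 10 * math.log10(peak * peak / mse)

def combined_psnr(encoder_mse, scaling_mse, peak=255.0):
    """Sketch: treat scaler and encoder errors as independent so their MSEs
    add, then convert the total distortion to a PSNR. The production
    algorithm combines the two distortions more carefully."""
    return psnr_from_mse(encoder_mse + scaling_mse, peak)
```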
<h2><span style="font-weight: 400;">Challenges around improving video quality for high-end networks</span></h2>
<p><span style="font-weight: 400;">As a quick review: For our purposes, high bandwidth covers users for whom bandwidth is greater than 800 Kbps. </span></p>
<p><span style="font-weight: 400;">Over the years, there have been huge improvements in camera capture quality. As a result, people’s expectations have gone up, and they want to see RTC video quality on par with local camera capture quality. </span></p>
<p><span style="font-weight: 400;">Based on local testing, we settled on settings that result in video quality similar to that of camera recordings. We call this HD mode. We found that with a codec like H.264 encoding at 3.5 Mbps and 30 frames per second, 720p resolution looked very similar to local camera recordings. We also compared 720p to 1080p in subjective quality tests and found that the difference is not noticeable on most devices, except those with larger screens.</span></p>
<h3><span style="font-weight: 400;">Bandwidth estimator improvements</span></h3>
<p><span style="font-weight: 400;">Improving the video quality for users who have high-end phones with good CPUs, good batteries, hardware codecs, and good network speeds seems trivial. It may seem like all you have to do is increase the maximum bitrate, capture resolution, and capture frame rates, and users will send high-quality video. But, in reality, it’s not that simple. </span></p>
<p><span style="font-weight: 400;">If you increase the bitrate, you expose your bandwidth estimation and congestion detection algorithms to congestion more often, and they will be tested many more times than if you were not using these higher bitrates. </span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21113" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png?w=1024" alt="" width="1024" height="395" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png 1650w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png?resize=916,354 916w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png?resize=768,296 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png?resize=1024,395 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png?resize=1536,593 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png?resize=96,37 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png?resize=192,74 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 7: Example showing how using higher bandwidth increases the instances for congestion.</p>
<p><span style="font-weight: 400;">If you look at the network pipeline in Figure 7, the higher the bitrates you are using, the more your algorithm/code will be tested for robustness over the time of the RTC call. Figure 7 shows how using 1 Mbps hits more congestion than using 500 Kbps and using 3 Mbps hits more congestion than 1 Mbps, and so on. If you are using bandwidths lower than the minimum throughput of the network, however, you won’t hit congestion at all. For example, see the 500-Kbps call in Figure 7. </span></p>
<p><span style="font-weight: 400;">To mitigate these issues, we improved congestion detection. For example, we added custom ISP throttling detection, something that was not being caught by the traditional delay-based estimator of WebRTC. </span></p>
<p><span style="font-weight: 400;">Bandwidth estimator and network resilience comprise a complex area on their own, and this is where RTC products stand out. They have their own custom algorithms that work best for their products and customers.</span></p>
<h3><span style="font-weight: 400;">Stable quality</span></h3>
<p><span style="font-weight: 400;">People don’t like oscillations in video quality. These can happen when we send high-quality video for a few seconds and then drop back to low quality because of congestion. Learning from history, we added support in bandwidth estimation to prevent these oscillations.</span></p>
<h3><span style="font-weight: 400;">Audio is more important than video for RTC</span></h3>
<p><span style="font-weight: 400;">When network congestion occurs, any media packet can be lost. This causes video freezes and broken audio (a.k.a. robotic audio). For RTC, both are bad, but audio quality is more important than video. </span></p>
<p><span style="font-weight: 400;">Broken audio often completely prevents conversations from happening, causing people to hang up or redial. Broken video, on the other hand, usually results in a less delightful conversation, but, depending on the scenario, it can also be a blocker for some users.</span></p>
<p><span style="font-weight: 400;">At high bitrates like 2.5 Mbps and higher, you can afford to have three to five times more audio packets or duplication without any noticeable degradation to video. When operating in these higher bitrates with cell phone connections, we saw more of these congestion, packet loss, and ISP throttling issues, so we had to make changes to our network resiliency algorithms. And since people are highly sensitive to data usage on their cell phones, we disabled high bitrates on cellular connections.</span></p>
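<p>One way to think about the audio-duplication budget is as a rule over the spare bitrate left after reserving a video floor. The numbers below are hypothetical, chosen only to echo the 3-5x observation above:</p>

```python
def audio_redundancy_copies(total_kbps, audio_kbps=40, video_floor_kbps=800,
                            max_copies=5):
    """How many copies of each audio packet we could afford to send.
    Reserve a video floor, then spend leftover bitrate on audio redundancy,
    capped at max_copies. All thresholds here are illustrative."""
    spare = total_kbps - video_floor_kbps - audio_kbps
    if spare <= 0:
        return 1  # no headroom: send each audio packet once
    return min(max_copies, 1 + spare // audio_kbps)
```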
<h3><span style="font-weight: 400;">When to enable HD?</span></h3>
<p><span style="font-weight: 400;">We used ML-based targeting to predict which calls should be HD-capable, relying on network stats from the users’ previous calls to decide whether HD should be enabled.</span></p>
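<p>As a stand-in for the targeting model (the production model and its features are not described in detail here), a toy logistic score over previous-call stats might look like:</p>

```python
import math

def hd_capable_probability(prev_calls):
    """Hypothetical logistic score over aggregates of previous calls'
    network stats; the weights below are illustrative, not the production
    model's."""
    if not prev_calls:
        return 0.0  # no history: don't enable HD
    avg_bwe = sum(c["bwe_kbps"] for c in prev_calls) / len(prev_calls)
    avg_loss = sum(c["loss_pct"] for c in prev_calls) / len(prev_calls)
    # Higher past bandwidth estimates push toward HD; packet loss pushes away.
    z = 0.004 * (avg_bwe - 800) - 0.5 * avg_loss
    return 1 / (1 + math.exp(-z))
```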
<h3><span style="font-weight: 400;">Battery regressions</span></h3>
<p><span style="font-weight: 400;">We have lots of metrics, including performance, networking, and media quality, to track the quality of RTC calls. When we ran tests for HD, we noticed regressions in battery metrics. What we found was that most battery regressions do not come from higher bitrates or resolution but from the capture frame rates.</span></p>
<p><span style="font-weight: 400;">To mitigate the regressions, we built a mechanism for detecting both caller and callee device capabilities, including device model, battery levels, Wi-Fi or mobile usage, and so on. To enable high-quality modes, we check both sides of the call to ensure that they satisfy the requirements and only then do we enable these high-quality, resource-intensive configurations.</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21114" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-8.png?w=972" alt="" width="972" height="608" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-8.png 972w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-8.png?resize=916,573 916w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-8.png?resize=768,480 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-8.png?resize=96,60 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-8.png?resize=192,120 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 8: Signaling server setup for turning HD on or off.</p>
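<p>The both-sides capability check can be sketched as a simple predicate; the field names and thresholds here are illustrative, not the real capability schema:</p>

```python
def enable_hd(caller, callee, min_battery=0.3):
    """HD is enabled only when BOTH endpoints qualify, matching the
    check-both-sides design above. Device capability fields are stand-ins."""
    def ok(device):
        return (device["supports_hw_codec"]
                and device["battery"] >= min_battery
                and device["network"] == "wifi")
    return ok(caller) and ok(callee)
```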
<h2><span style="font-weight: 400;">What the future holds for RTC</span></h2>
<p><span style="font-weight: 400;">Hardware manufacturers are acknowledging the significant benefits of AV1 for RTC. The Apple iPhone 15 Pro includes a hardware AV1 decoder, and the Google Pixel 8 supports AV1 encoding and decoding. Hardware codecs are essential for sustaining HD resolutions on high-end networks. Video calling is becoming as ubiquitous as traditional audio calling, and we hope that as hardware manufacturers recognize this shift, there will be more opportunities for collaboration between RTC app creators and hardware manufacturers to optimize encoders for these scenarios. </span></p>
<p><span style="font-weight: 400;">On the software side, we will continue to optimize AV1 software encoders and develop new encoder implementations. We aim to provide the best experience for our users while also giving people full control over their RTC experience: we will provide controls so users can choose higher quality at the cost of battery and data usage, or vice versa.</span></p>
<p><span style="font-weight: 400;">We also plan to collaborate with IHVs on hardware codec development to make these codecs usable for RTC scenarios, including low-bandwidth use cases. </span></p>
<p><span style="font-weight: 400;">We will also investigate forward-looking features such as video processing to increase the resolution and frame rates on the receiver&#8217;s rendering stack, and AI/ML to improve bandwidth estimation (BWE) and network resiliency.</span></p>
<p><span style="font-weight: 400;">Further, we&#8217;re investigating</span> <span style="font-weight: 400;">Pixel Codec Avatar</span><span style="font-weight: 400;"> technologies that will allow us to transmit the model/share once and then send only the geometry/vectors for receiver-side rendering. This enables video rendering with much lower bandwidth usage than traditional video codecs in RTC scenarios. </span></p>The post <a href="https://dailyzsocialmedianews.com/higher-video-for-cellular-rtc-with-av1-and-hd/">Higher video for cellular RTC with AV1 and HD</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
