<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Facebook | DAILY ZSOCIAL MEDIA NEWS</title>
	<atom:link href="https://dailyzsocialmedianews.com/category/facebook/feed/" rel="self" type="application/rss+xml" />
	<link>https://dailyzsocialmedianews.com</link>
	<description>ALL ABOUT DAILY ZSOCIAL MEDIA NEWS</description>
	<lastBuildDate>Tue, 26 Mar 2024 19:29:40 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.7.1</generator>

<image>
	<url>https://dailyzsocialmedianews.com/wp-content/uploads/2020/12/cropped-DAILY-ZSOCIAL-MEDIA-NEWS-e1607166156946-32x32.png</url>
	<title>Facebook | DAILY ZSOCIAL MEDIA NEWS</title>
	<link>https://dailyzsocialmedianews.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Bringing HDR photo support to Instagram and Threads</title>
		<link>https://dailyzsocialmedianews.com/bringing-hdr-picture-assist-to-instagram-and-threads/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Tue, 26 Mar 2024 19:29:40 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[bringing]]></category>
		<category><![CDATA[HDR]]></category>
		<category><![CDATA[Instagram]]></category>
		<category><![CDATA[photo]]></category>
		<category><![CDATA[Support]]></category>
		<category><![CDATA[Threads]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24983</guid>

					<description><![CDATA[<p>Meta’s family of apps serves trillions of image download requests every day. And if you’re into high-quality images, you’ve probably noticed that Instagram and Threads have added support for high dynamic range (HDR) photos. Now people on Threads and Instagram can upload and share images that are more true-to-life, with the full color and range [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/bringing-hdr-picture-assist-to-instagram-and-threads/">Bringing HDR photo support to Instagram and Threads</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[
<p>Meta’s family of apps serves trillions of image download requests every day. And if you’re into high-quality images, you’ve probably noticed that Instagram and Threads have added support for high dynamic range (HDR) photos. Now people on Threads and Instagram can upload and share images that are more true-to-life, with the full color and range their device is capable of capturing.</p>
<p>Zuzanna Mroczek, a software engineer on Meta’s Media Platform Team, joins Pascal Hartig (@passy) on the Meta Tech Podcast to talk about how her team, which owns the entire flow from serving images from the CDN to displaying them on your device, is driving up image quality across apps, platforms, and devices.</p>
<p>Hear how the Media Platform Team brought HDR to Instagram and Threads, and how they partnered with major phone manufacturers (including Google and Samsung) on the rollout!</p>
<p>Download or listen to the episode below:</p>
<p><iframe style="border: none;" title="Libsyn Player" src="https://html5-player.libsyn.com/embed/episode/id/30326568/height/90/theme/custom/thumbnail/yes/direction/forward/render-playlist/no/custom-color/000000/" width="100%" height="90" scrolling="no" allowfullscreen="allowfullscreen"></iframe></p>
<p>You can also find the episode wherever you get your podcasts.</p>
<p>The Meta Tech Podcast, brought to you by Meta, highlights the work Meta’s engineers are doing at every level – from low-level frameworks to end-user features.</p>
<p>Send us feedback on Instagram, Threads, or X.</p>
<p>And if you’re interested in learning more about career opportunities at Meta, visit the Meta Careers page.</p>The post <a href="https://dailyzsocialmedianews.com/bringing-hdr-picture-assist-to-instagram-and-threads/">Bringing HDR photo support to Instagram and Threads</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Threads has entered the fediverse</title>
		<link>https://dailyzsocialmedianews.com/threads-has-entered-the-fediverse/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Thu, 21 Mar 2024 21:46:15 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Entered]]></category>
		<category><![CDATA[fediverse]]></category>
		<category><![CDATA[Threads]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24949</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="1024" height="546" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21214613/Threads-has-entered-the-fediverse.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Threads has entered the fediverse" decoding="async" fetchpriority="high" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21214613/Threads-has-entered-the-fediverse.png 1024w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21214613/Threads-has-entered-the-fediverse-300x160.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21214613/Threads-has-entered-the-fediverse-768x410.png 768w" sizes="(max-width: 1024px) 100vw, 1024px" /></div><p>Threads has entered the fediverse! As part of our beta experience, now available in a few countries, Threads users aged 18+ with public profiles can now choose to share their Threads posts to other ActivityPub-compliant servers. People on those servers can now follow federated Threads profiles and see, like, reply to, and repost posts from [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/threads-has-entered-the-fediverse/">Threads has entered the fediverse</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="1024" height="546" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21214613/Threads-has-entered-the-fediverse.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Threads has entered the fediverse" decoding="async" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21214613/Threads-has-entered-the-fediverse.png 1024w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21214613/Threads-has-entered-the-fediverse-300x160.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21214613/Threads-has-entered-the-fediverse-768x410.png 768w" sizes="(max-width: 1024px) 100vw, 1024px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Threads has entered the fediverse! As part of our beta experience, now available in a few countries, Threads users aged 18+ with public profiles can now </span><span style="font-weight: 400;">choose to share their Threads posts</span><span style="font-weight: 400;"> to other ActivityPub-compliant servers.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">People on those servers can now follow federated Threads profiles and see, like, reply to, and repost posts from the fediverse.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We’re sharing how we’re continuing to integrate Threads with the fediverse, the technical challenges, the solutions we’ve come up with along the way, and what’s next as we move toward making Threads fully interoperable.</span></li>
</ul>
<p><span style="font-weight: 400;">Threads’ initial launch came together in</span> <span style="font-weight: 400;">only a few short months</span><span style="font-weight: 400;">. A nimble team of engineers, leveraging</span> <span style="font-weight: 400;">Meta’s existing scalable infrastructure</span><span style="font-weight: 400;">, was able to make Threads Meta’s most successful app launch of all time.</span></p>
<p><span style="font-weight: 400;">Now, we’re integrating Threads with the fediverse. With our beta experience, </span><span style="font-weight: 400;">now available in a few countries, including the US,</span><span style="font-weight: 400;"> Threads users aged 18+ with public profiles can now </span><span style="font-weight: 400;">choose to federate their profiles</span><span style="font-weight: 400;"> – allowing them to share their Threads posts to other </span><span style="font-weight: 400;">ActivityPub-compliant</span><span style="font-weight: 400;"> servers, and enabling people on those servers to follow them, and like, reply to, and repost their posts.</span></p>
<p><span style="font-weight: 400;">Building a federated platform – Meta’s first app for open social networking – has meant new engineering challenges and opportunities. Designing for the fediverse comes with unique interoperability considerations and hurdles to overcome on the server side. </span></p>
<h2><span style="font-weight: 400;">What is the fediverse?</span></h2>
<p><span style="font-weight: 400;">When we set out to build Threads our goal was always to build a decentralized social networking app within the fediverse, where federated networking gives people greater control over their online identity and the content they see, regardless of their chosen platform.</span></p>
<p><span style="font-weight: 400;">One way to think about the fediverse is to compare it to email. You can send an email from a Gmail account to a Yahoo account, for example, because those services support the same protocols. Similarly, in the fediverse you can connect with people who use different social networking services that are built on the same protocol, removing the silos that confine people and their followers to any single platform. But unlike email, your fediverse conversations and profile are public and can be shared across servers.</span></p>
<p><span style="font-weight: 400;">Building Threads on an open social networking protocol gives people more freedom and choice in the online communities they inhabit. </span><span style="font-weight: 400;">Every fediverse server can set its own community standards and content moderation policies, meaning people have the freedom to choose spaces that align with their values.</span></p>
<p><span style="font-weight: 400;">We believe this decentralized approach, similar to the protocols governing email and the web itself, will play an important role in the future of online platforms. The fediverse promotes innovation and competition by fostering a more diverse and vibrant ecosystem of social media platforms that can easily connect with a wider audience.</span></p>
<h2><span style="font-weight: 400;">What is ActivityPub?</span></h2>
<p><span style="font-weight: 400;">Threads leverages </span><span style="font-weight: 400;">ActivityPub</span><span style="font-weight: 400;"> – a decentralized, open social networking protocol built by the </span><span style="font-weight: 400;">World Wide Web Consortium (W3C)</span><span style="font-weight: 400;"> – that is premised on a straightforward, fundamental idea: creating a social networking structure based on open protocols that allow people to communicate and network with each other regardless of the server they choose. </span></p>
<p><span style="font-weight: 400;">ActivityPub acts as a server-to-server protocol where the API allows decentralized servers to communicate with one another to deliver content and activities. </span></p>
<p><span style="font-weight: 400;">The protocol plays a key role in allowing Threads to be interoperable with other servers that also use it. Eventually, people on Threads will be able to interact with people on platforms like Mastodon and WordPress without having to sign up for accounts on those apps.</span></p>
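<p>As a rough illustration of the server-to-server idea, the sketch below builds a minimal ActivityPub-style “Create” activity for a new post. All URLs and the helper name are hypothetical, and the object is pared down to a few core Activity Streams fields; it is not Threads’ actual payload.</p>

```python
import json

# A minimal, illustrative ActivityPub "Create" activity: one server telling
# another that an actor has published a note. All URLs here are hypothetical.
def make_create_activity(actor, note_id, content, to):
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": actor,                      # the profile publishing the post
        "object": {
            "id": note_id,
            "type": "Note",
            "content": content,
            "attributedTo": actor,
        },
        "to": to,                            # recipients on other servers
    }

activity = make_create_activity(
    actor="https://threads.example/users/alice",
    note_id="https://threads.example/users/alice/posts/1",
    content="Hello, fediverse!",
    to=["https://mastodon.example/users/bob"],
)
# A federating server would POST this JSON to each recipient's inbox endpoint.
payload = json.dumps(activity)
```

<p>Any server that understands the same protocol can parse this payload and show the post to its own users – the interoperability described above.</p>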
<h2><span style="font-weight: 400;">The current state of fediverse integration in Threads</span></h2>
<p><span style="font-weight: 400;">With our beta experience, Threads users aged 18+ with public profiles can now </span><span style="font-weight: 400;">choose to enable sharing to the fediverse</span><span style="font-weight: 400;">. If they do, they’ll be able to publish posts on Threads that will be viewable on other ActivityPub-compliant servers. Threads users will also be able to see aggregated like counts on their posts from other fediverse servers directly from the Threads app. If people on other fediverse servers follow federated Threads profiles they’ll be able to see, reply to, and repost Threads posts (if their server allows it).</span></p>
<h3><span style="font-weight: 400;">What types of content are federated?</span></h3>
<p><span style="font-weight: 400;">In this initial phase federated Threads users will not be able to see who liked their posts or any replies from people in the fediverse on Threads. For now, people who want to see replies on their posts on other fediverse servers will have to visit those servers directly.</span></p>
<p><span style="font-weight: 400;">Certain types of posts and content are also not federated, including:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Posts with restricted replies.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Replies to non-federated posts.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Posts with polls (until a future update).</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Reposts of non-federated posts.</span></li>
</ul>
<p><span style="font-weight: 400;">For posts that contain links, the link attachment will be appended at the end of the post if the link is not already included in the post text. </span></p>
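<p>The eligibility rules above can be sketched as a simple predicate. The field names on this toy <code>Post</code> structure are hypothetical, invented for illustration only:</p>

```python
# Illustrative sketch of the federation rules described above.
# Field names on this Post structure are hypothetical, not Threads' schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Post:
    has_restricted_replies: bool = False
    has_poll: bool = False
    reply_to: Optional["Post"] = None     # set if this post is a reply
    repost_of: Optional["Post"] = None    # set if this post is a repost
    federated: bool = True

def is_federated(post: Post) -> bool:
    """Return False for the post types that are not shared to the fediverse."""
    if post.has_restricted_replies or post.has_poll:
        return False
    if post.reply_to is not None and not post.reply_to.federated:
        return False  # replies to non-federated posts stay on Threads
    if post.repost_of is not None and not post.repost_of.federated:
        return False  # reposts of non-federated posts stay on Threads
    return True
```

<p>For example, <code>is_federated(Post(has_poll=True))</code> is false until polls gain federation support.</p>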
<h2><span style="font-weight: 400;">Building more federated features for Threads</span></h2>
<p><span style="font-weight: 400;">More federated features for Threads will come once we have addressed other technical hurdles in a way that we feel is safest and offers the best possible user experience. Within all of this, it’s also important to us that, as we build these solutions, we do so alongside the open and decentralized fediverse developer community. </span></p>
<p><span style="font-weight: 400;">As we federate new features in Threads, we have to look at how to address the disparity in the availability and implementation of these features across servers.</span></p>
<h3><span style="font-weight: 400;">Federating quote posts</span></h3>
<p><span style="font-weight: 400;">Take quote posts as an example. They’re a popular feature across all social media, but ActivityPub does not have a formal specification for how to handle them yet. Thus, fediverse servers have come up with their own methods of integrating and handling quote posts. Some servers allow for creating and viewing quote posts; others don’t support the function at all.</span></p>
<p><span style="font-weight: 400;">There are a handful of unofficial methods for handling quote posts in ActivityPub. One fediverse enhancement proposal (FEP),</span> <span style="font-weight: 400;">FEP-e232</span><span style="font-weight: 400;">, proposes a way to represent inline quotes and other text-based links to ActivityPub in a manner similar to mentions on other social media platforms. Another method would be to use the </span><span style="font-weight: 400;"><span style="font-family: 'courier new', courier;">quoteURL</span> </span><span style="font-weight: 400;">property within ActivityPub, which would assign posts an ID that could then be pulled into other posts that want to quote them.</span> <span style="font-weight: 400;">Misskey created its own solution</span><span style="font-weight: 400;"> with its </span><span style="font-weight: 400; font-family: 'courier new', courier;">_misskey_quote</span><span style="font-weight: 400;"> property, which builds on FEP-e232.</span></p>
<p><span style="font-weight: 400;">Many fediverse servers also append extra syntax (</span><span style="font-weight: 400; font-family: 'courier new', courier;">RE:<quoted post URL></span><span style="font-weight: 400;">) to post content to make it compatible with servers that haven’t implemented any of the structured methods for handling quote posts.</span></p>
<p><span style="font-weight: 400;">After exploring different options pursued by the fediverse community, we chose to implement both FEP-e232 and</span><span style="font-weight: 400;"> <span style="font-weight: 400; font-family: 'courier new', courier;">_misskey_quote</span></span><span style="font-weight: 400;"> to federate quote posts on Threads. As of now, none of these methods are official keys in the ActivityPub namespace. We chose <span style="font-weight: 400; font-family: 'courier new', courier;">_misskey_quote</span></span><span style="font-weight: 400;"> because its naming makes it clear that it’s not an official ActivityPub method, and because we know that it’s supported by Misskey, Firefish, and potentially other servers that use quote posts.</span></p>
<p><span style="font-weight: 400;">In our current implementation, if a Threads user creates a quote post from a federated post, the quote post will contain a permalink URL (e.g. “</span><span style="font-weight: 400; font-family: 'courier new', courier;">RE: <URL to permalink></span><span style="font-weight: 400;">“) to the post along with a structured representation of the post. Platforms outside of Threads can display the quote post similar to how it’s displayed on Threads by using the structured representation to fetch the post and display it within the quote post.</span></p>
<p><span style="font-weight: 400;">If the post being quoted is not federated, the quote post’s content will only contain the permalink URL and not the structured representation. </span></p>
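<p>Putting the pieces of this section together, a quote post can carry both the plain-text <code>RE:</code> fallback and, when the quoted post is federated, the structured representations. The sketch below is an approximation only: the exact property shapes are defined by FEP-e232 and Misskey’s <code>_misskey_quote</code>, and the URLs are hypothetical.</p>

```python
# Loose sketch of the quote-post federation described above. The exact
# property shapes come from FEP-e232 and Misskey's _misskey_quote; treat
# the field details here as an approximation, and the URLs as hypothetical.
def build_quote_post(content, quoted_url, quoted_is_federated):
    note = {
        "type": "Note",
        # Textual fallback for servers with no structured quote support:
        "content": f"{content}<br>RE: {quoted_url}",
    }
    if quoted_is_federated:
        # Structured representations, so supporting servers can fetch and
        # render the quoted post inline, as Threads does:
        note["_misskey_quote"] = quoted_url
        note["tag"] = [{
            "type": "Link",
            "href": quoted_url,
            "name": f"RE: {quoted_url}",
        }]
    return note

federated_quote = build_quote_post(
    "Great point!", "https://threads.example/p/1", quoted_is_federated=True)
plain_quote = build_quote_post(
    "Great point!", "https://threads.example/p/2", quoted_is_federated=False)
```

<p>Servers that understand neither proposal still see a readable post ending in <code>RE: &lt;quoted post URL&gt;</code>, which is the graceful-degradation behavior described above.</p>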
<h3><span style="font-weight: 400;">Federated and non-federated interactions</span></h3>
<p><span style="font-weight: 400;">If a federated Threads user is replying to, quoting, or reposting a post from another federated Threads user it makes perfect sense to federate that reply, quote, or repost (which we do).</span></p>
<p><span style="font-weight: 400;">However, we had to take a careful look at the complexities that arise since not every Threads user will opt in to turn on sharing to the fediverse. Prioritizing the user experience for both those who federate and those who choose not to is important to us, which also means federated and non-federated users on Threads should still be able to interact with one another seamlessly. </span></p>
<p><span style="font-weight: 400;">Unlike other federated platforms, Threads doesn’t simply federate every post. Given that features like replies may or may not be federated, we had to build UI/UX treatments and notices to help people understand what is happening and what to expect when posting. </span></p>
<h2><span style="font-weight: 400;">Our phased approach to the fediverse</span></h2>
<p><span style="font-weight: 400;">We’re taking a phased approach to Threads’ fediverse integration to ensure we can continue to build responsibly and get valuable feedback from our users and the fediverse community. </span></p>
<p><span style="font-weight: 400;">In the future, we expect content to flow from the fediverse into Threads. Federated Threads users will be able to see and engage with replies to their posts coming from other servers, or follow people on other fediverse servers and engage with their content directly in Threads. Our plan is for fediverse-enabled Threads profiles to ultimately have one consolidated number of followers that combines users that followed them from Threads and users from other servers. </span></p>
<p><span style="font-weight: 400;">Building a federated social networking app is a complex and delicate process if it is to be done safely. While we don’t have exact dates or details on our milestones just yet, we’re committed to a fully interoperable experience, and we’ll take the time to get this right and grow the fediverse responsibly.</span></p>
<p><span style="font-weight: 400;">This is another step in our journey to make Threads fully interoperable. We will continue to collaborate with developers and policy makers so that people across services have the opportunity to experience the benefits the fediverse offers via a fully interoperable experience, including reaching new audiences and fostering their community. </span></p>The post <a href="https://dailyzsocialmedianews.com/threads-has-entered-the-fediverse/">Threads has entered the fediverse</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Optimizing RTC bandwidth estimation with machine learning</title>
		<link>https://dailyzsocialmedianews.com/optimizing-rtc-bandwidth-estimation-with-machine-studying/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Thu, 21 Mar 2024 01:34:51 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[bandwidth]]></category>
		<category><![CDATA[estimation]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[Machine]]></category>
		<category><![CDATA[Optimizing]]></category>
		<category><![CDATA[RTC]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24934</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="1023" height="576" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Optimizing RTC bandwidth estimation with machine learning" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning.png 1023w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning-300x169.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning-768x432.png 768w" sizes="auto, (max-width: 1023px) 100vw, 1023px" /></div><p>Bandwidth estimation (BWE) and congestion control play an important role in delivering high-quality real-time communication (RTC) across Meta’s family of apps. We’ve adopted a machine learning (ML)-based approach that allows us to solve networking problems holistically across cross-layers such as BWE, network resiliency, and transport. We’re sharing our experiment results from this approach, some of [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/optimizing-rtc-bandwidth-estimation-with-machine-studying/">Optimizing RTC bandwidth estimation with machine learning</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="1023" height="576" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Optimizing RTC bandwidth estimation with machine learning" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning.png 1023w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning-300x169.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning-768x432.png 768w" sizes="auto, (max-width: 1023px) 100vw, 1023px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Bandwidth estimation (BWE) and congestion control play an important role in delivering high-quality real-time communication (RTC) across Meta’s family of apps.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We’ve adopted a machine learning (ML)-based approach that allows us</span><span style="font-weight: 400;"> to solve networking problems holistically across layers such as BWE, network resiliency, and transport.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We’re sharing our experiment results from this approach, some of the challenges we encountered during execution, and learnings for new adopters.</span></li>
</ul>
<p><span style="font-weight: 400;">Our existing bandwidth estimation (BWE) module at Meta is</span> <span style="font-weight: 400;">based on WebRTC’s Google Congestion Controller (GCC)</span><span style="font-weight: 400;">. We have made several improvements through parameter tuning, but this has resulted in a more complex system, as shown in Figure 1.</span></p>
<p>Figure 1: BWE module’s system diagram for congestion control in RTC.</p>
<p><span style="font-weight: 400;">One challenge with the tuned congestion control (CC)/BWE algorithm was that it had multiple parameters and actions that were dependent on network conditions. For example, there was a trade-off between quality and reliability; improving quality for high-bandwidth users often led to reliability regressions for low-bandwidth users, and vice versa, making it challenging to optimize the user experience for different network conditions.</span></p>
<p><span style="font-weight: 400;">Additionally, we noticed some inefficiencies in regards to improving and maintaining the module with the complex BWE module:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Due to the absence of realistic network conditions during our experimentation process, fine-tuning the parameters for user clients necessitated several attempts.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Even after the rollout, it wasn’t clear if the optimized parameters were still applicable for the targeted network types.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">This resulted in complex code logic and branching for engineers to maintain.</span></li>
</ol>
<p><span style="font-weight: 400;">To solve these inefficiencies, we developed a machine learning (ML)-based, network-targeting approach that offers a cleaner alternative to hand-tuned rules. This approach also allows us to solve networking problems holistically across layers such as BWE, network resiliency, and transport.</span></p>
<h2><span style="font-weight: 400;">Network characterization</span></h2>
<p><span style="font-weight: 400;">An ML model-based approach leverages time series data to improve the bandwidth estimation by using offline parameter tuning for characterized network types. </span></p>
<p><span style="font-weight: 400;">For an RTC call to be completed, the endpoints must be connected to each other through network devices. The optimal configs that have been tuned offline are stored on the server and can be updated in real-time. During the call connection setup, these optimal configs are delivered to the client. During the call, media is transferred directly between the endpoints or through a relay server. Depending on the network signals collected during the call, an ML-based approach characterizes the network into different types and applies the optimal configs for the detected type.</span></p>
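<p>The characterize-then-configure flow above can be sketched as a lookup keyed by network type. The type names, signal fields, thresholds, and config values below are all hypothetical stand-ins for the ML model and tuned parameters, chosen only to show the shape of the mechanism:</p>

```python
# Simplified sketch of the flow described above: offline-tuned configs are
# keyed by network type, and a (stand-in) characterizer picks which config
# to apply from in-call signals. All names and values here are hypothetical.
TUNED_CONFIGS = {
    "random_loss": {"loss_tolerance": 0.10, "probe_aggressiveness": "high"},
    "bursty_loss": {"loss_tolerance": 0.02, "probe_aggressiveness": "low"},
    "default":     {"loss_tolerance": 0.05, "probe_aggressiveness": "medium"},
}

def characterize(signals):
    """Stand-in for the ML model: map in-call network signals to a type."""
    if signals.get("loss_burstiness", 0.0) > 0.5:
        return "bursty_loss"
    if signals.get("packet_loss", 0.0) > 0.01:
        return "random_loss"
    return "default"

def config_for_call(signals):
    # In production the tuned configs live on the server and are delivered
    # to the client at call setup; here they are just a local table.
    return TUNED_CONFIGS[characterize(signals)]
```

<p>The key property is that tuning happens offline per network type, so the in-call logic reduces to classification plus a table lookup rather than many hand-maintained branches.</p>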
<p><span style="font-weight: 400;">Figure 2 illustrates an example of an RTC call that’s optimized using the ML-based approach. </span><span style="font-weight: 400;"> </span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21120" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 2: An example RTC call configuration with optimized parameters delivered from the server and based on the current network type.</p>
<h2><span style="font-weight: 400;">Model learning and offline parameter tuning</span></h2>
<p><span style="font-weight: 400;">On a high level, network characterization consists of two main components, as shown in Figure 3. The first component is offline ML model learning using ML to categorize the network type (random packet loss versus bursty loss). The second component uses offline simulations to tune parameters optimally for the categorized network type. </span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21121" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 3: Offline ML-model learning and parameter tuning.</p>
<p><span style="font-weight: 400;">For model learning, we leverage the time series data (network signals and non-personally identifiable information, see Figure 6, below) from production calls and simulations. Compared to the aggregate metrics logged after the call, time series captures the time-varying nature of the network and dynamics. We use</span><span style="font-weight: 400;"> FBLearner</span><span style="font-weight: 400;">, our internal AI stack, for the training pipeline and deliver the PyTorch model files on demand to the clients at the start of the call.</span></p>
<p><span style="font-weight: 400;">For offline tuning, we use simulations to run network profiles for the detected types and choose the optimal parameters for the modules based on improvements in technical metrics (such as quality, freeze, and so on).</span></p>
<h2><span style="font-weight: 400;">Model architecture</span></h2>
<p><span style="font-weight: 400;">From our experience, we’ve found it necessary to combine time series features with non-time series features (i.e., metrics derived from the time window) for highly accurate modeling.</span></p>
<p><span style="font-weight: 400;">To handle both time series and non-time series data, we’ve designed a model architecture that can process input from both sources.</span></p>
<p><span style="font-weight: 400;">The time series data passes through a</span> <span style="font-weight: 400;">long short-term memory (LSTM) layer</span><span style="font-weight: 400;"> that converts the time series input into a one-dimensional vector representation, such as 16×1. The non-time series, or dense, data passes through a dense layer (i.e., a fully connected layer). The two vectors are then concatenated, to fully represent the network condition in the past, and passed through a fully connected layer again. The final output of the neural network model is the predicted output of the target task, as shown in Figure 4. </span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21122" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 4: Combined-model architecture with LSTM and Dense Layers</p>
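The combined architecture can be sketched in PyTorch roughly as follows. Layer sizes, feature counts, and the single-output head here are illustrative assumptions, not the production model:

```python
import torch
import torch.nn as nn

class NetworkClassifier(nn.Module):
    """Toy sketch: LSTM over time series features, dense layer over
    derived (non-time series) features, concatenated into one head."""

    def __init__(self, ts_features=8, dense_features=4, hidden=16):
        super().__init__()
        self.lstm = nn.LSTM(ts_features, hidden, batch_first=True)
        self.dense_in = nn.Linear(dense_features, hidden)
        self.head = nn.Linear(hidden * 2, 1)

    def forward(self, ts, dense):
        # ts: (batch, time, ts_features); last hidden state is the
        # one-dimensional summary vector (e.g., 16x1 per sample)
        _, (h_n, _) = self.lstm(ts)
        ts_vec = h_n[-1]                          # (batch, hidden)
        dense_vec = torch.relu(self.dense_in(dense))
        combined = torch.cat([ts_vec, dense_vec], dim=1)
        return torch.sigmoid(self.head(combined))  # task probability

model = NetworkClassifier()
out = model(torch.randn(2, 10, 8), torch.randn(2, 4))  # 10-second window
```

The concatenation step is what lets one model consume both the raw temporal dynamics and the window-level aggregates at once.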
<h2><span style="font-weight: 400;">Use case: Random packet loss classification</span></h2>
<p><span style="font-weight: 400;">Let’s consider the use case of categorizing packet loss as either random or congestion-induced. The former is caused by the network components themselves, and the latter by limits in queue length (which are delay dependent). Here is the ML task definition:</span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;">Given the network conditions in the past N seconds (here, N = 10), and that the network is currently incurring packet loss, the goal is to characterize the packet loss at the current timestamp as RANDOM or not.</span></p>
<p><span style="font-weight: 400;">Figure 5 illustrates how we leverage the architecture to achieve that goal:</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21123" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 5: Model architecture for a random packet loss classification task.</p>
<h3><span style="font-weight: 400;">Time series features</span></h3>
<p><span style="font-weight: 400;">We leverage the following time series features gathered from logs:</span></p>
<p><img loading="lazy" decoding="async" class="wp-image-21136 size-large" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png 2500w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=916,515 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=2048,1152 2048w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 6: Time series features used for model training.</p>
<h3><span style="font-weight: 400;">BWE optimization</span></h3>
<p><span style="font-weight: 400;">When the ML model detects random packet loss, we perform local optimization on the BWE module by:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Increasing the tolerance to random packet loss in the loss-based BWE (holding the bitrate).</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Increasing the ramp-up speed, depending on the link capacity on high bandwidths.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Increasing the network resiliency by sending additional forward-error correction packets to recover from packet loss.</span></li>
</ul>
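A minimal sketch of how a classifier output might gate such a parameter switch. All names and values below are hypothetical, for illustration only:

```python
# Hypothetical tuning presets; production values would come from
# the offline simulation-based parameter tuning described above.
RANDOM_LOSS_PARAMS = dict(loss_tolerance_pct=10.0, rampup_factor=2.0, fec_ratio=0.15)
DEFAULT_PARAMS = dict(loss_tolerance_pct=2.0, rampup_factor=1.0, fec_ratio=0.05)

def select_bwe_params(random_loss_prob, threshold=0.8):
    """When the model is confident the loss is random (not congestion),
    hold the bitrate, ramp up faster, and send more FEC."""
    if random_loss_prob >= threshold:
        return RANDOM_LOSS_PARAMS
    return DEFAULT_PARAMS
```

Gating on a confidence threshold keeps the default (congestion-safe) behavior unless the classifier is sure, which limits the cost of false positives.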
<h2><span style="font-weight: 400;">Network prediction</span></h2>
<p><span style="font-weight: 400;">The network characterization problem discussed in the previous sections focuses on classifying network types based on past information using time series data. For such simple classification tasks, hand-tuned rules can achieve this with some limitations. The real power of leveraging ML for networking, however, comes from using it to predict future network conditions.</span></p>
<p><span style="font-weight: 400;">We have applied ML to congestion prediction to optimize the experience of low-bandwidth users.</span></p>
<h2><span style="font-weight: 400;">Congestion prediction</span></h2>
<p><span style="font-weight: 400;">From our analysis of production data, we found that low-bandwidth users often incur congestion due to the behavior of the GCC module. By predicting this congestion, we can improve reliability for those users. Toward this, we addressed the following problem statement using round-trip time (RTT) and packet loss:</span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;">Given “N” seconds of historical time-series data from production/simulation, the goal is to predict packet loss due to congestion, or the congestion itself, in the next “N” seconds; that is, a spike in RTT followed by packet loss or further growth in RTT.</span></p>
<p><span style="font-weight: 400;">Figure 7 shows an example from a simulation where the bandwidth alternates between 500 Kbps and 100 Kbps every 30 seconds. As we lower the bandwidth, the network incurs congestion and the ML model’s predictions fire (the green spikes) even before the delay spikes and packet loss occur. This early prediction of congestion enables faster reactions and thus improves the user experience by preventing video freezes and connection drops.</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21137" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png 2500w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=916,515 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=2048,1152 2048w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 7: Simulated network scenario with alternating bandwidth for congestion prediction</p>
<h2><span style="font-weight: 400;">Generating training samples</span></h2>
<p><span style="font-weight: 400;">The main challenge in modeling is generating training samples for a variety of congestion situations. With simulations, it’s hard to capture the different types of congestion that real user clients encounter in production networks. As a result, we used actual production logs to label congestion samples, applying RTT-spike criteria to the past and future windows according to the following assumptions:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Absent past RTT spikes, packet losses in the past and future are independent.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Absent past RTT spikes, we cannot predict future RTT spikes or fractional losses (i.e., flosses).</span></li>
</ul>
<p><span style="font-weight: 400;">We split the time window into past (4 seconds) and future (4 seconds) for labeling.</span><span style="font-weight: 400;"><br /></span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21126" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 8: Labeling criteria for congestion prediction</p>
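The labeling criteria above can be sketched as a simple rule over the past/future split. The spike threshold and function name are illustrative assumptions, not the production criteria:

```python
def label_congestion(rtt_ms, loss, split, rtt_spike_factor=1.5):
    """Label a window positive (1) if a past RTT spike is followed by
    packet loss or further RTT growth in the future window.
    rtt_ms, loss: per-timestep sequences; split: past/future boundary index.
    Per the assumptions above, without a past RTT spike the future
    events are treated as unpredictable, so the label is 0."""
    past, future = rtt_ms[:split], rtt_ms[split:]
    base = min(past)
    past_spike = max(past) >= rtt_spike_factor * base
    if not past_spike:
        return 0
    future_loss = any(loss[split:])
    future_rtt_growth = max(future) > max(past)
    return 1 if (future_loss or future_rtt_growth) else 0

# With 1-second samples, a 4s past / 4s future window means split=4:
label = label_congestion(
    rtt_ms=[50, 50, 55, 120, 150, 160, 170, 180],
    loss=[0, 0, 0, 0, 1, 0, 0, 0],
    split=4,
)
```

Running this over logged production windows yields the positive/negative samples used for training.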
<h2><span style="font-weight: 400;">Model performance</span></h2>
<p><span style="font-weight: 400;">Unlike network characterization, where ground truth is unavailable, for congestion prediction we can obtain ground truth by examining the future time window after it has passed and comparing it with the prediction made four seconds earlier. With this logging information gathered from real production clients, we compared offline training performance to online performance on data from user clients:</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21127" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 9: Offline versus online model performance comparison.</p>
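Because the after-the-fact window provides ground truth, the offline/online comparison reduces to standard classification metrics. A minimal sketch (the exact metrics Meta tracks are not specified here):

```python
def precision_recall(preds, truths):
    """Precision and recall over binary predictions vs. ground truth
    obtained once each 4-second future window has elapsed."""
    tp = sum(1 for p, t in zip(preds, truths) if p and t)
    fp = sum(1 for p, t in zip(preds, truths) if p and not t)
    fn = sum(1 for p, t in zip(preds, truths) if not p and t)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# One true positive, one false positive, one miss:
p, r = precision_recall([1, 1, 0, 0], [1, 0, 1, 0])
```

Computing the same metrics on offline holdout data and on live client logs is what makes the comparison in Figure 9 possible.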
<h2><span style="font-weight: 400;">Experiment results</span></h2>
<p><span style="font-weight: 400;">Here are some highlights from our deployment of various ML models to improve bandwidth estimation:</span></p>
<h3><span style="font-weight: 400;">Reliability wins for congestion prediction</span></h3>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></span> <span style="font-weight: 400;">connection_drop_rate -0.326371 +/- 0.216084<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> last_minute_quality_regression_v1 -0.421602 +/- 0.206063<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> last_minute_quality_regression_v2 -0.371398 +/- 0.196064<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> bad_experience_percentage -0.230152 +/- 0.148308<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> transport_not_ready_pct -0.437294 +/- 0.400812</span></p>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></span><span style="font-weight: 400;"> peer_video_freeze_percentage -0.749419 +/- 0.180661<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> peer_video_freeze_percentage_above_500ms -0.438967 +/- 0.212394</span></p>
<h3><span style="font-weight: 400;">Quality and user engagement wins for random packet loss characterization in high bandwidth</span></h3>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></span><span style="font-weight: 400;"> peer_video_freeze_percentage -0.379246 +/- 0.124718<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> peer_video_freeze_percentage_above_500ms -0.541780 +/- 0.141212<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> peer_neteq_plc_cng_perc -0.242295 +/- 0.137200</span></p>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> total_talk_time 0.154204 +/- 0.148788</span></p>
<h3><span style="font-weight: 400;">Reliability and quality wins for cellular low bandwidth classification</span></h3>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> connection_drop_rate -0.195908 +/- 0.127956<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> last_minute_quality_regression_v1 -0.198618 +/- 0.124958<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> last_minute_quality_regression_v2 -0.188115 +/- 0.138033</span></p>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> peer_neteq_plc_cng_perc -0.359957 +/- 0.191557<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> peer_video_freeze_percentage -0.653212 +/- 0.142822</span></p>
<h3><span style="font-weight: 400;">Reliability and quality wins for cellular high bandwidth classification</span></h3>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> avg_sender_video_encode_fps 0.152003 +/- 0.046807<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> avg_sender_video_qp -0.228167 +/- 0.041793<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> avg_video_quality_score 0.296694 +/- 0.043079<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> avg_video_sent_bitrate 0.430266 +/- 0.092045</span></p>
<h2><span style="font-weight: 400;">Future plans for applying ML to RTC</span></h2>
<p><span style="font-weight: 400;">From our project execution and experimentation on production clients, we found that an ML-based approach is more efficient than traditional hand-tuned rules for networking in targeting, end-to-end monitoring, and updating. However, the efficiency of ML solutions largely depends on data quality and labeling (using simulations or production logs). By applying ML-based solutions to network prediction problems – congestion in particular – we fully leveraged the power of ML. </span></p>
<p><span style="font-weight: 400;">In the future, we will consolidate all the network characterization models into a single multi-task model, removing the inefficiency of redundant model downloads, inference, and so on. We will build a shared representation model of the time series to solve different network characterization tasks (e.g., bandwidth classification, packet loss classification, etc.). We will also focus on building realistic production network scenarios for model training and validation, which will enable us to use ML to identify the optimal network actions for given network conditions. And we will continue refining our learning-based methods to enhance network performance using existing network signals.</span></p>The post <a href="https://dailyzsocialmedianews.com/optimizing-rtc-bandwidth-estimation-with-machine-studying/">Optimizing RTC bandwidth estimation with machine learning</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Better video for mobile RTC with AV1 and HD</title>
		<link>https://dailyzsocialmedianews.com/higher-video-for-cellular-rtc-with-av1-and-hd/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Wed, 20 Mar 2024 21:32:05 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[AV1]]></category>
		<category><![CDATA[Mobile]]></category>
		<category><![CDATA[RTC]]></category>
		<category><![CDATA[video]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24930</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="1023" height="610" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Better video for mobile RTC with AV1 and HD" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD.png 1023w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD-300x179.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD-768x458.png 768w" sizes="auto, (max-width: 1023px) 100vw, 1023px" /></div><p>At Meta, we support real-time communication (RTC) for billions of people through our apps, including Messenger, Instagram, and WhatsApp. We’ve seen significant benefits by adopting the AV1 codec for RTC. Here’s how we are improving the RTC video quality for our apps with tools like the AV1 codec, the challenges we face, and how we [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/higher-video-for-cellular-rtc-with-av1-and-hd/">Better video for mobile RTC with AV1 and HD</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="1023" height="610" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Better video for mobile RTC with AV1 and HD" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD.png 1023w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD-300x179.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/20213203/Better-video-for-mobile-RTC-with-AV1-and-HD-768x458.png 768w" sizes="auto, (max-width: 1023px) 100vw, 1023px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">At Meta, we support real-time communication (RTC) for billions of people through our apps, including Messenger, Instagram, and WhatsApp.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We’ve seen significant benefits by adopting the </span><span style="font-weight: 400;">AV1 codec for RTC</span><span style="font-weight: 400;">.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Here’s how we are improving the RTC video quality for our apps with tools like the AV1 codec, the challenges we face, and how we mitigate those challenges.</span></li>
</ul>
<p><span style="font-weight: 400;">The last few decades have seen tremendous improvements in mobile phone camera quality as well as in video quality for streaming video services. But in real-time communication (RTC) applications, while video quality has also improved over time, it has always lagged behind camera quality. </span></p>
<p><span style="font-weight: 400;">When we looked at ways to improve video quality for RTC across our family of apps, AV1 stood out as the best option. Meta has increasingly</span> <span style="font-weight: 400;">adopted the AV1 codec</span><span style="font-weight: 400;"> over the years because it offers high video quality at bitrates much lower than older codecs. But, </span><span style="font-weight: 400;">as we’ve implemented AV1 for mobile RTC</span><span style="font-weight: 400;">, we’ve also had to address a number of challenges, including scaling, improving video quality for low-bandwidth users as well as high-end networks, managing CPU and battery usage, and maintaining quality stability.</span></p>
<h2><span style="font-weight: 400;">Improving video quality for low-bandwidth networks</span></h2>
<p><span style="font-weight: 400;">This post is going to focus on peer-to-peer (P2P, or 1:1) calls, which involve two participants. </span></p>
<p><span style="font-weight: 400;">People who use our products and services experience a range of network conditions – some have really great networks, while others are using throttled or low-bandwidth networks.</span></p>
<p><span style="font-weight: 400;">This chart illustrates what the distribution of bandwidth looks like for some of these calls on Messenger:</span></p>
<p>Figure 1: Bandwidth distribution of P2P calls on Messenger.</p>
<p><span style="font-weight: 400;">As seen in Figure 1, some calls operate in very low-bandwidth conditions. </span></p>
<p><span style="font-weight: 400;">We consider anything less than 300 Kbps to be a low-end network, but we also see a lot of video calls operating at just 50 Kbps, or even under 25 Kbps.</span></p>
<p><span style="font-weight: 400;">Note that this bandwidth is the share for the video encoder. Total bandwidth is shared with audio, RTP overhead, signaling overhead, RTX (re-transmissions of packets to handle lost packets)/FEC (forward error correction)/duplication (packet duplication), and so on. The big assumption here is that the bandwidth estimator is working correctly and estimating true bitrates. </span></p>
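As a rough illustration of how the video encoder's share might be carved out of the total estimate (all parameter values below are made up for illustration; they are not production numbers):

```python
def video_encoder_budget(total_bwe_kbps, audio_kbps=30,
                         overhead_pct=0.10, fec_pct=0.10):
    """Illustrative split of a bandwidth estimate: subtract a fraction
    for RTP/signaling overhead and RTX/FEC/duplication, then a fixed
    audio allocation; what remains goes to the video encoder."""
    after_overhead = total_bwe_kbps * (1 - overhead_pct - fec_pct)
    return max(0.0, after_overhead - audio_kbps)

budget = video_encoder_budget(300)  # a "low-end" 300 Kbps estimate
```

The point of the sketch is the dependency chain: if the bandwidth estimator over- or under-estimates the link, every downstream allocation, including the video bitrate, is wrong with it.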
<p><span style="font-weight: 400;">There are no universal definitions for low, mid, and high networks, but for the purpose of this blog post, less than 300 Kbps will be considered as low, 300-800 Kbps as mid, and above 800 Kbps as a high, HD-capable, or high-end network.</span></p>
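Using the thresholds from this post, bucketing a call's video bitrate into these tiers is straightforward (the function name is ours, and the cutoffs are the blog's working definitions, not a standard):

```python
def network_tier(video_kbps):
    """Working definitions from this post: <300 Kbps is low,
    300-800 Kbps is mid, >800 Kbps is high / HD-capable."""
    if video_kbps < 300:
        return "low"
    if video_kbps <= 800:
        return "mid"
    return "high"
```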
<p><span style="font-weight: 400;">When we looked into improving the video quality for low-bandwidth users, there were few key options. Migrating to a newer codec such as AV1 presented the greatest opportunity. Other options such as better video scalers and region-of-interest encoding offered incremental improvements. </span></p>
<h3><span style="font-weight: 400;">Video scalers</span></h3>
<p><span style="font-weight: 400;">We use WebRTC in most of our apps, but the video scalers shipped with WebRTC don’t have the best quality video scaling. We have been able to improve the video scaling quality significantly by leveraging in-house scalers. </span></p>
<p><span style="font-weight: 400;">At low bitrates, we often end up downscaling the video to encode at ¼ resolution (assuming the camera capture is 640×480 or 1280×720). With our custom scaler implementations, we have seen significant improvements in video quality: in public tests we saw average peak signal-to-noise ratio (PSNR) gains of 0.75 dB.</span></p>
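PSNR, the metric quoted above, is derived from the mean squared error between the reference and processed frames. A minimal sketch over flat pixel sequences (real implementations operate per-plane on images):

```python
import math

def psnr(orig, recon, max_val=255.0):
    """PSNR in dB between two equal-length 8-bit pixel sequences.
    Higher is better; identical inputs give infinite PSNR."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(max_val ** 2 / mse)

value = psnr([100] * 4, [110] * 4)  # a uniform error of 10 levels
```

So a 0.75 dB average gain means the scaled-then-encoded frames land measurably closer to the source, purely from a better scaling filter.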
<p><span style="font-weight: 400;">Here is a snapshot showing results with the default</span> <span style="font-weight: 400;">libyuv</span><span style="font-weight: 400;"> scaler (a box filter):</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21107" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2a.png?w=866" alt="" width="866" height="290" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2a.png 866w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2a.png?resize=768,257 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2a.png?resize=96,32 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2a.png?resize=192,64 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 2.a: Video image results using WebRTC/libyuv video scaler.</p>
<p><span style="font-weight: 400;">And the results after downscaling with our video scaler:</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21108" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2b.png?w=866" alt="" width="866" height="290" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2b.png 866w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2b.png?resize=768,257 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2b.png?resize=96,32 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-2b.png?resize=192,64 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 2.b: Video image results using Meta’s video scaler.</p>
<h3><span style="font-weight: 400;">Region-of-interest encoding</span></h3>
<p><span style="font-weight: 400;">Identifying the region of interest (ROI) allowed us to spend more encoder bitrate on the area that’s most important to a viewer (the speaker’s face in a talking-head video, for example). Most mobile devices have APIs to locate the face region without incurring meaningful CPU overhead. Once we have found the face region, we can configure the encoder to spend more bits on this important region and fewer on the rest. The easiest way to do this was through encoder APIs that configure the quantization parameters (QP) for the ROI versus the rest of the image. These changes provided incremental improvements in video quality metrics like PSNR. </span></p>
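A toy sketch of the idea: assign a lower QP (more bits, higher quality) to encoder blocks that overlap the detected face rectangle. The function name, QP offsets, and (x, y, w, h) rectangle convention are all hypothetical:

```python
def qp_for_block(block_rect, roi_rect, base_qp=32, roi_qp_offset=-6):
    """Lower QP inside the ROI, slightly raise it outside, so the
    bit budget shifts toward the face region. Rects are (x, y, w, h)."""
    bx, by, bw, bh = block_rect
    rx, ry, rw, rh = roi_rect
    overlaps = (bx < rx + rw and rx < bx + bw and
                by < ry + rh and ry < by + bh)
    return base_qp + roi_qp_offset if overlaps else base_qp + 2
```

Because QP maps roughly inverse to quality, biasing it per block is a cheap way to reallocate bits without changing the total target bitrate much.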
<h2><span style="font-weight: 400;">Adopting the AV1 video codec</span></h2>
<p><span style="font-weight: 400;">The video encoder is a key element when it comes to video quality for RTC. H.264 has been the most popular codec over the last decade, with hardware support and most applications supporting it. But it is a 20-year-old codec. Back in 2018, the Alliance for Open Media (AOMedia) standardized the AV1 video codec. Since then, several companies including Meta, YouTube, and Netflix have </span><span style="font-weight: 400;">deployed it at a large scale for video streaming</span><span style="font-weight: 400;">. </span></p>
<p><span style="font-weight: 400;">At Meta, moving from H.264 to AV1 led us to our greatest improvements in video quality at low bitrates.</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21109" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png?w=1024" alt="" width="1024" height="427" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png 1600w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png?resize=916,382 916w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png?resize=768,320 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png?resize=1024,427 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png?resize=1536,640 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png?resize=96,40 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-3.png?resize=192,80 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 3: Improvements over time, moving from H.262 to AV1 and H.266</p>
<h3><span style="font-weight: 400;">Why AV1?</span></h3>
<p><span style="font-weight: 400;">We chose AV1 in part because it’s royalty-free. Codec licensing (and the associated fees) was an important aspect of our decision-making process. Typically, if an application uses a device’s hardware codec, no additional codec licensing costs are incurred. But if an application ships a software version of the codec, there will most likely be licensing costs to cover.</span></p>
<p><span style="font-weight: 400;">But why do we need to use software codecs even though most phones have hardware-supported codecs?</span></p>
<p><span style="font-weight: 400;">Most mobile devices have dedicated hardware for video encoding and decoding, and these days most support H.264 and even H.265. But those encoders are designed for common use cases such as camera capture, which uses much higher resolutions, frame rates, and bitrates. Most mobile device hardware is currently capable of encoding 4K 60 FPS in real time with very low battery usage, yet the results of encoding a 7 FPS, 320×180, 200 Kbps video are often worse than those of software encoders running on the same device. </span></p>
<p><span style="font-weight: 400;">The reason for that is prioritization of the RTC use case. Most independent hardware vendors (IHVs) are not aware of the network conditions where RTC calls operate; hence, these hardware codecs are not optimized for RTC scenarios, especially for low bitrates, resolutions, and frame rates. So, we leverage software encoders when operating in these low bitrates to provide high-quality video.</span></p>
<p><span style="font-weight: 400;">And since we can’t ship software codecs without a license, AV1 is a very good option for RTC.</span></p>
<h2><span style="font-weight: 400;">AV1 for RTC</span></h2>
<p><span style="font-weight: 400;">The biggest reason to move to a more advanced video codec is simple: The same quality experience can be delivered with a much lower bitrate, and we can deliver a much higher-quality real-time calling experience for our users who are on bandwidth-constrained networks.</span></p>
<p><span style="font-weight: 400;">Measuring video quality is a complex topic, but a relatively simple way to look at it is to use the </span><span style="font-weight: 400;">Bjontegaard Delta-Bit Rate</span><span style="font-weight: 400;"> (BD-BR) metric. BD-BR compares how much bitrate different codecs need to produce a certain quality level. By encoding multiple samples at different bitrates and measuring the quality of each produced video, you obtain a rate-distortion (RD) curve, and from the RD curves of two codecs you can derive the BD-BR (as shown below).</span></p>
<p><span style="font-weight: 400;">As can be seen in Figure 4, AV1 provided higher quality for all bitrate ranges in our local tests.</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21110" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-4.png?w=1024" alt="" width="1024" height="650" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-4.png 1282w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-4.png?resize=916,582 916w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-4.png?resize=768,488 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-4.png?resize=1024,650 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-4.png?resize=96,61 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-4.png?resize=192,122 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 4: Bitrate distortion comparison chart.</p>
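<p>The BD-BR calculation described above can be sketched as follows: fit log-bitrate as a cubic polynomial of quality (PSNR here), integrate both rate-distortion curves over their shared quality range, and report the average bitrate difference in percent. The sample points below are purely illustrative, not Meta's measurements.</p>

```python
# Sketch of the Bjontegaard Delta-Rate (BD-BR) computation.
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    # Fit log(bitrate) as a cubic polynomial of quality for each codec.
    p_ref = np.polyfit(psnr_ref, np.log(rates_ref), 3)
    p_test = np.polyfit(psnr_test, np.log(rates_test), 3)
    # Integrate both fitted curves over the overlapping quality interval.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100  # negative = test codec saves bitrate

# Illustrative data: the "test" codec reaches the same PSNR at half the bitrate.
bd = bd_rate([200, 400, 800, 1600], [30, 33, 36, 39],
             [100, 200, 400, 800], [30, 33, 36, 39])
```

For these synthetic curves the result is a 50 percent bitrate saving at equal quality, which is how codec comparisons like Figure 4 are typically summarized.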
<h3><span style="font-weight: 400;">Screen-encoding tools</span></h3>
<p><span style="font-weight: 400;">AV1 also has a few key tools that are useful for RTC. Screen content quality is becoming an increasingly important factor for Meta, with relevant use cases, including screen sharing, game streaming, and VR remote desktop, requiring high-quality encoding. In these areas, AV1 truly shines. </span></p>
<p><span style="font-weight: 400;">Traditionally, video encoders aren’t well suited to complex content such as text, which has a lot of high-frequency detail, and humans are sensitive to blurry text. AV1 has a set of coding tools – palette mode and intra-block copy – that drastically improve performance for screen content. Palette mode exploits the observation that pixel values in a screen-content frame usually concentrate in a limited number of colors, so it can represent screen content efficiently by signaling color clusters instead of quantized transform-domain coefficients. In addition, typical screen content contains repetitive patterns within the same picture; intra-block copy facilitates block prediction within the same frame, improving compression efficiency significantly. That AV1 provides these two tools in the baseline profile is a huge plus.</span></p>
<h3><span style="font-weight: 400;">Reference picture resampling</span><span style="font-weight: 400;">: Fewer key frames</span></h3>
<p><span style="font-weight: 400;">Another useful feature is reference picture resampling (RPR), which allows resolution changes without generating a key frame. In video compression, a key frame is one that’s encoded independently, like a still image. It’s the only type of frame that can be decoded without having another frame as reference. </span></p>
<p><span style="font-weight: 400;">For RTC applications, available bandwidth changes frequently, so resolution must change often to adapt to network conditions. With older codecs like H.264, each resolution change requires a key frame, which is much larger than other frames and thus inefficient for RTC apps. Such large key frames increase the amount of data sent over the network and result in higher end-to-end latency and congestion. </span></p>
<p><span style="font-weight: 400;">By using RPR, we can avoid generating any key frames.</span></p>
<p><img loading="lazy" decoding="async" class="alignnone size-large wp-image-21111" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-5.png?w=788" alt="" width="788" height="242" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-5.png 788w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-5.png?resize=768,236 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-5.png?resize=96,29 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-5.png?resize=192,59 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/></p>
<h2><span style="font-weight: 400;">Challenges around improving video quality for low-bandwidth users</span></h2>
<h3><span style="font-weight: 400;">CPU/battery usage</span></h3>
<p><span style="font-weight: 400;">AV1 offers great coding efficiency, but codecs achieve this at the cost of higher CPU and battery usage. Many modern codecs pose these challenges when running real-time applications on mobile platforms.</span></p>
<p><span style="font-weight: 400;">Based on local lab testing, we anticipated a roughly 4 percent increase in battery usage, and we saw similar results in public tests. We used a power meter to do this local battery measurement.</span></p>
<p><span style="font-weight: 400;">Even though the AV1 encoder itself increased CPU usage three-fold compared to the H.264 implementation, the encoder accounts for only a small part of overall battery usage. The phone’s display, networking/radio, and other processes using the CPU contribute significantly, so the net increase in battery usage was 5-6 percent, which is still a significant increase. </span></p>
<p><span style="font-weight: 400;">Many calls end because the device runs out of battery, or because people hang up once their operating system indicates low battery. So increasing battery usage isn’t worthwhile for users unless it provides added value, such as improved video quality, and even then it’s a trade-off between video quality and battery use.</span></p>
<p><span style="font-weight: 400;">We use WebRTC and Session Description Protocol (SDP) for codec negotiation, which allows us to negotiate multiple codecs (e.g., AV1 and H.264) up front and then switch the codecs without any need for signaling or a handshake during the call. This means the codec switch is seamless, without users noticing any glitches or pauses in video.</span></p>
<p><span style="font-weight: 400;">We created a custom encoder that encapsulates both H.264 and the AV1 encoders. We call it a hybrid encoder. This allowed us to switch the codec during the call based on triggers such as CPU usage, battery level, or encoding time — and to switch to the more battery-efficient H.264 encoder when needed. </span></p>
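<p>A minimal sketch of the hybrid-encoder idea (not Meta's actual implementation): both codecs are negotiated up front via SDP, and a wrapper selects the active encoder based on device triggers such as battery level, CPU usage, or encoding time. The thresholds below are illustrative assumptions.</p>

```python
# Sketch of a hybrid encoder that switches between an efficient codec (AV1)
# and a battery-friendly one (H.264) mid-call, without renegotiation.

class HybridEncoder:
    def __init__(self, negotiated=("AV1", "H264")):
        self.negotiated = negotiated  # both codecs agreed up front via SDP
        self.active = "AV1"           # prefer the more efficient codec

    def update_triggers(self, battery_pct, cpu_util, encode_ms):
        # Fall back to H.264 under pressure; thresholds are illustrative.
        if battery_pct < 20 or cpu_util > 0.85 or encode_ms > 33:
            self.active = "H264"
        else:
            self.active = "AV1"
        return self.active

enc = HybridEncoder()
choice_low_batt = enc.update_triggers(battery_pct=15, cpu_util=0.4, encode_ms=10)
choice_normal = enc.update_triggers(battery_pct=80, cpu_util=0.4, encode_ms=10)
```

Because both payload types are already negotiated, flipping <code>active</code> changes the outgoing codec without any in-call signaling, which is what makes the switch invisible to users.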
<h3><span style="font-weight: 400;">Increased crashes and out of memory errors</span></h3>
<p><span style="font-weight: 400;">Even without any new leaks, AV1 uses more memory than H.264. Any additional memory use makes apps more likely to hit out-of-memory (OOM) crashes, or to hit them sooner because of other leaks or memory demands from other apps on the system. To mitigate this, we had to disable AV1 on devices with low memory. Further optimizing the encoder’s memory usage remains an area for improvement.</span></p>
<h3><span style="font-weight: 400;">In-product quality measurement</span></h3>
<p><span style="font-weight: 400;">To compare the quality between H.264 and AV1 in public tests, we needed a low-complexity metric. Metrics such as encoded bitrate and frame rate won’t show any gains: the total bandwidth available for video is limited by network capacity, which does not change, so bitrates and frame rates will not change much with the codec. We had been using composite metrics that combine the quantization parameter (QP is often used as a proxy for video quality, as it controls pixel data loss during encoding), resolution, frame rate, and freezes, but QP is not comparable between the AV1 and H.264 codecs and hence can’t be used.</span></p>
<p><span style="font-weight: 400;">PSNR is a standard metric, but it’s reference-based and hence does not work for RTC. Non-reference, video-quality metrics are quite CPU-intensive (e.g., BRISQUE: Blind/Referenceless Image Spatial Quality Evaluator), though we are exploring those as well.</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21112" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-6.png?w=1024" alt="" width="1024" height="635" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-6.png 1220w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-6.png?resize=916,568 916w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-6.png?resize=768,476 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-6.png?resize=1024,635 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-6.png?resize=96,59 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-6.png?resize=192,119 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 6: High-level architecture for PSNR computation in RTC.</p>
<p><span style="font-weight: 400;">We came up with a framework for PSNR computation. We first modified the encoder to report the distortion caused by compression (most software encoders already support this metric). Then we designed a lightweight scaling-distortion algorithm that estimates the distortion introduced by video scaling and combines it with the encoder distortion to produce an output PSNR. We developed and verified this algorithm locally and will be sharing the findings in publications and at academic conferences over the next year. With this lightweight PSNR metric, we saw 2 dB improvements with AV1 compared to H.264.</span></p>
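<p>A toy version of the framework's final step: treat the encoder-reported compression distortion and the estimated scaling distortion as additive MSE terms and convert the sum to PSNR. The additive-MSE combination and the sample values are simplifying assumptions for illustration, not the published algorithm.</p>

```python
# Sketch: combine encoder-reported distortion with an estimated scaling
# distortion to produce an output PSNR for 8-bit video (peak value 255).
import math

def combined_psnr(encoder_mse, scaling_mse, peak=255.0):
    # Assumption: the two distortion sources are modeled as additive MSE.
    total_mse = encoder_mse + scaling_mse
    return 10.0 * math.log10(peak * peak / total_mse)

psnr = combined_psnr(encoder_mse=30.0, scaling_mse=12.0)
```

The appeal is that both inputs are cheap to obtain in-call, so PSNR can be tracked in production without a pixel-level reference comparison.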
<h2><span style="font-weight: 400;">Challenges around improving video quality for high-end networks</span></h2>
<p><span style="font-weight: 400;">As a quick review: For our purposes, high bandwidth covers users for whom bandwidth is greater than 800 kbps. </span></p>
<p><span style="font-weight: 400;">Over the years, there have been huge improvements in camera capture quality. As a result, people’s expectations have gone up, and they want to see RTC video quality on par with local camera capture quality. </span></p>
<p><span style="font-weight: 400;">Based on local testing, we settled on settings that produce video quality similar to camera recordings. We call this HD mode. We found that with a video codec like H.264 encoding at 3.5 Mbps and 30 frames per second, 720p resolution looked very similar to local camera recordings. We also compared 720p to 1080p in subjective quality tests and found that the difference is not noticeable on most devices, except those with larger screens.</span></p>
<h3><span style="font-weight: 400;">Bandwidth estimator improvements</span></h3>
<p><span style="font-weight: 400;">Improving the video quality for users who have high-end phones with good CPUs, good batteries, hardware codecs, and good network speeds seems trivial. It may seem like all you have to do is increase the maximum bitrate, capture resolution, and capture frame rates, and users will send high-quality video. But, in reality, it’s not that simple. </span></p>
<p><span style="font-weight: 400;">If you increase the bitrate, your bandwidth-estimation and congestion-detection algorithms will hit congestion more often, and will be stress-tested far more than they would be at lower bitrates. </span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21113" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png?w=1024" alt="" width="1024" height="395" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png 1650w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png?resize=916,354 916w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png?resize=768,296 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png?resize=1024,395 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png?resize=1536,593 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png?resize=96,37 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-7.png?resize=192,74 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 7: Example showing how using higher bandwidth increases the instances for congestion.</p>
<p><span style="font-weight: 400;">If you look at the network pipeline in Figure 7, the higher the bitrates you are using, the more your algorithm/code will be tested for robustness over the time of the RTC call. Figure 7 shows how using 1 Mbps hits more congestion than using 500 Kbps and using 3 Mbps hits more congestion than 1 Mbps, and so on. If you are using bandwidths lower than the minimum throughput of the network, however, you won’t hit congestion at all. For example, see the 500-Kbps call in Figure 7. </span></p>
<p><span style="font-weight: 400;">To mitigate these issues, we improved congestion detection. For example, we added custom ISP throttling detection, something that was not being caught by the traditional delay-based estimator of WebRTC. </span></p>
<p><span style="font-weight: 400;">Bandwidth estimator and network resilience comprise a complex area on their own, and this is where RTC products stand out. They have their own custom algorithms that work best for their products and customers.</span></p>
<h3><span style="font-weight: 400;">Stable quality</span></h3>
<p><span style="font-weight: 400;">People don’t like oscillations in video quality. These can happen when we send high-quality video for a few seconds and then drop back to low quality because of congestion. Learning from call history, we added support in bandwidth estimation to prevent these oscillations.</span></p>
<h3><span style="font-weight: 400;">Audio is more important than video for RTC</span></h3>
<p><span style="font-weight: 400;">When network congestion occurs, any media packet can be lost. This causes video freezes and broken (aka robotic) audio. For RTC, both are bad, but audio quality is more important than video. </span></p>
<p><span style="font-weight: 400;">Broken audio often completely prevents conversations from happening, causing people to hang up or redial the call. Broken video, on the other hand, usually just makes conversations less pleasant, though depending on the scenario it can also be a blocker for some users.</span></p>
<p><span style="font-weight: 400;">At high bitrates like 2.5 Mbps and higher, you can afford to have three to five times more audio packets or duplication without any noticeable degradation to video. When operating in these higher bitrates with cell phone connections, we saw more of these congestion, packet loss, and ISP throttling issues, so we had to make changes to our network resiliency algorithms. And since people are highly sensitive to data usage on their cell phones, we disabled high bitrates on cellular connections.</span></p>
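<p>The audio-redundancy trade-off above can be sketched as a simple policy: duplicate audio packets only when the call bitrate leaves enough headroom. The 3-5x range and the 2.5 Mbps and 800 Kbps tiers come from the text; the function itself and the assumed audio bitrate are hypothetical.</p>

```python
# Sketch: choose an audio packet duplication factor based on call bitrate.
# Assumption: a nominal 32 Kbps audio stream; tiers follow the post's numbers.

def audio_duplication_factor(total_kbps, audio_kbps=32):
    if total_kbps >= 2500:      # high-bitrate calls: heavy redundancy is cheap
        factor = 5
    elif total_kbps >= 800:     # the post's high-bandwidth tier
        factor = 3
    else:
        factor = 1              # low bandwidth: every bit goes to media
    # Share of the total budget consumed by the extra audio copies.
    overhead_pct = 100.0 * (factor - 1) * audio_kbps / total_kbps
    return factor, overhead_pct

factor, overhead = audio_duplication_factor(total_kbps=3000)
```

At 3 Mbps, five-fold audio costs only a few percent of the budget, which is why redundancy barely dents video quality at these rates.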
<h3><span style="font-weight: 400;">When to enable HD?</span></h3>
<p><span style="font-weight: 400;">We used ML-based targeting to predict which calls should be HD-capable, relying on network stats from the users’ previous calls to decide whether to enable HD.</span></p>
<h3><span style="font-weight: 400;">Battery regressions</span></h3>
<p><span style="font-weight: 400;">We have lots of metrics, including performance, networking, and media quality, to track the quality of RTC calls. When we ran tests for HD, we noticed regressions in battery metrics. We found that most battery regressions come not from higher bitrates or resolutions but from higher capture frame rates.</span></p>
<p><span style="font-weight: 400;">To mitigate the regressions, we built a mechanism for detecting both caller and callee device capabilities, including device model, battery levels, Wi-Fi or mobile usage, and so on. To enable high-quality modes, we check both sides of the call to ensure that they satisfy the requirements and only then do we enable these high-quality, resource-intensive configurations.</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21114" src="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-8.png?w=972" alt="" width="972" height="608" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-8.png 972w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-8.png?resize=916,573 916w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-8.png?resize=768,480 768w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-8.png?resize=96,60 96w, https://engineering.fb.com/wp-content/uploads/2024/03/RTC-Video_Figure-8.png?resize=192,120 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 8: Signaling server setup for turning HD on or off.</p>
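<p>The two-sided capability check can be sketched as below: HD is enabled only when both the caller and the callee satisfy the requirements. The specific fields and thresholds are illustrative assumptions, not Meta's actual criteria.</p>

```python
# Sketch: gate resource-intensive HD mode on both endpoints' device state.

def device_ok(dev):
    # Illustrative requirements: enough battery, on Wi-Fi, hardware codec.
    return (dev["battery_pct"] >= 30
            and dev["on_wifi"]
            and dev["has_hw_codec"])

def enable_hd(caller, callee):
    # Both sides must qualify before the high-quality config is enabled.
    return device_ok(caller) and device_ok(callee)

a = {"battery_pct": 80, "on_wifi": True, "has_hw_codec": True}
b = {"battery_pct": 25, "on_wifi": True, "has_hw_codec": True}
hd_ab = enable_hd(a, b)   # callee battery too low -> stay in standard mode
hd_aa = enable_hd(a, a)
```

In practice this decision runs through the signaling server shown in Figure 8, so both sides agree on the mode before media starts flowing.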
<h2><span style="font-weight: 400;">What the future holds for RTC</span></h2>
<p><span style="font-weight: 400;">Hardware manufacturers are acknowledging the significant benefits of using AV1 for RTC. The new Apple iPhone 15 Pro supports AV1’s hardware decoder, and the Google Pixel 8 supports AV1 encoding and decoding. Hardware codecs are an absolute necessity for high-end network and HD resolutions. Video calling is becoming as ubiquitous as traditional audio calling and we hope that as hardware manufacturers recognize this shift, there will be more opportunities for collaboration between RTC app creators and hardware manufacturers to optimize encoders for these scenarios. </span></p>
<p><span style="font-weight: 400;">On the software side, we will continue to work on optimizing AV1 software encoders and developing new encoder implementations. We try to provide the best experience for our users, but at the same time we want to let people have full control over their RTC experience. We will provide controls to the users so that they can choose whether they want higher quality at the cost of battery and data usage, or vice versa.</span></p>
<p><span style="font-weight: 400;">We also plan to work with IHVs to collaborate on hardware codec development to make these codecs usable for RTC scenarios including low-bandwidth use cases. </span></p>
<p><span style="font-weight: 400;">We also will investigate forward-looking features such as video processing to increase the resolution and frame rates on the receiver’s rendering stack and leveraging AI/ML to improve bandwidth estimation (BWE) and network resiliency.</span></p>
<p><span style="font-weight: 400;">Further, we’re investigating</span> <span style="font-weight: 400;">Pixel Codec Avatar</span><span style="font-weight: 400;"> technologies that will allow us to transmit the model once and then send only the geometry/vectors for receiver-side rendering. This enables video rendering with much lower bandwidth usage than traditional video codecs for RTC scenarios. </span></p>The post <a href="https://dailyzsocialmedianews.com/higher-video-for-cellular-rtc-with-av1-and-hd/">Higher video for cellular RTC with AV1 and HD</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
<title>Logarithm: A logging engine for AI training workflows and services</title>
		<link>https://dailyzsocialmedianews.com/logarithm-a-logging-engine-for-ai-coaching-workflows-and-providers/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Mon, 18 Mar 2024 16:57:43 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Engine]]></category>
		<category><![CDATA[Logarithm]]></category>
		<category><![CDATA[logging]]></category>
		<category><![CDATA[services]]></category>
		<category><![CDATA[Training]]></category>
		<category><![CDATA[workflows]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24905</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="1020" height="166" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/18165741/Logarithm-A-logging-engine-for-AI-training-workflows-and-services.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Logarithm: A logging engine for AI training workflows and services" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/18165741/Logarithm-A-logging-engine-for-AI-training-workflows-and-services.png 1020w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/18165741/Logarithm-A-logging-engine-for-AI-training-workflows-and-services-300x49.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/18165741/Logarithm-A-logging-engine-for-AI-training-workflows-and-services-768x125.png 768w" sizes="auto, (max-width: 1020px) 100vw, 1020px" /></div><p>Systems and application logs play a key role in operations, observability, and debugging workflows at Meta. Logarithm is a hosted, serverless, multitenant service, used only internally at Meta, that consumes and indexes these logs and provides an interactive query interface to retrieve and view logs. In this post, we present the design behind Logarithm, and [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/logarithm-a-logging-engine-for-ai-coaching-workflows-and-providers/">Logarithm: A logging engine for AI training workflows and services</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="1020" height="166" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/18165741/Logarithm-A-logging-engine-for-AI-training-workflows-and-services.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Logarithm: A logging engine for AI training workflows and services" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/18165741/Logarithm-A-logging-engine-for-AI-training-workflows-and-services.png 1020w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/18165741/Logarithm-A-logging-engine-for-AI-training-workflows-and-services-300x49.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/18165741/Logarithm-A-logging-engine-for-AI-training-workflows-and-services-768x125.png 768w" sizes="auto, (max-width: 1020px) 100vw, 1020px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Systems and application logs play a key role in operations, observability, and debugging workflows at Meta.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Logarithm is a hosted, serverless, multitenant service, used only internally at Meta, that consumes and indexes these logs and provides an interactive query interface to retrieve and view logs.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">In this post, we present the design behind Logarithm, and show how it powers AI training debugging use cases.</span></li>
</ul>
<p><span style="font-weight: 400;">Logarithm indexes 100+GB/s of logs in real time, and thousands of queries a second. We designed the system to support service-level guarantees on log freshness, completeness, durability, query latency, and query result completeness. Users can emit logs using their choice of logging library (the common library at Meta is the Google Logging Library [glog]). Users can query using regular expressions on log lines, arbitrary </span><span style="font-weight: 400;">metadata</span><span style="font-weight: 400;"> fields attached to logs, and across log files of hosts and services.</span></p>
<p><span style="font-weight: 400;">Logarithm is written in C++20 and the codebase follows modern C++ patterns, including coroutines and async execution. This has supported both performance and maintainability, and helped the team move fast – developing Logarithm in just three years.</span></p>
<h2><span style="font-weight: 400;">Logarithm’s data model</span></h2>
<p><span style="font-weight: 400;">Logarithm represents logs as a named </span><span style="font-weight: 400;">log stream</span><span style="font-weight: 400;"> of (host-local) time-ordered sequences of immutable unstructured text, corresponding to a single log file. A process can emit multiple log streams (</span><span style="font-weight: 400; font-family: 'courier new', courier;">stdout</span><span style="font-weight: 400;">, </span><span style="font-weight: 400;"><span style="font-family: 'courier new', courier;">stderr</span>,</span><span style="font-weight: 400;"> and custom log files). Each log line can have zero or more metadata key-value pairs attached to it. A common example of metadata is rank ID in machine learning (ML) training, when multiple sequences of log lines are multiplexed into a single log stream (e.g., in PyTorch).</span></p>
<p><span style="font-weight: 400;">Logarithm supports typed structures in two ways – via typed APIs (ints, floats, and strings), and extraction from a log line using regex-based parse-and-extract rules – a common example is metrics of tensors in ML model logging. The extracted key-value pairs are added to the log line’s metadata.</span></p>
<p>Figure 1: Logarithm data model. The boxes on text represent typed structures.</p>
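<p>A sketch of the regex-based parse-and-extract rules described above: each rule pulls a typed key-value pair out of a raw log line and merges it into the line's metadata. The rule format and field names are illustrative, not Logarithm's actual API.</p>

```python
# Sketch: extract typed metadata from an unstructured log line using
# regex-based parse-and-extract rules, as in ML model logging.
import re

RULES = [
    # e.g., "loss=0.4321" in a model-logging line -> {"loss": 0.4321}
    (re.compile(r"loss=(?P<loss>\d+\.\d+)"), float),
    (re.compile(r"step=(?P<step>\d+)"), int),
]

def extract_metadata(line, metadata=None):
    """Return the line's metadata enriched with any rule matches."""
    metadata = dict(metadata or {})
    for pattern, cast in RULES:
        m = pattern.search(line)
        if m:
            for key, value in m.groupdict().items():
                metadata[key] = cast(value)
    return metadata

# Existing metadata (e.g., rank ID) is preserved; extracted pairs are added.
meta = extract_metadata("step=120 loss=0.4321 lr=1e-3", {"rank": 3})
```

Extracted pairs become queryable metadata alongside whatever the typed APIs attached, which is what makes tensor-metric queries over raw logs possible.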
<h2><span style="font-weight: 400;">AI training debugging with Logarithm</span></h2>
<p><span style="font-weight: 400;">Before looking at Logarithm’s internals, we present its support for debugging training-systems and model issues, one of the prominent use cases of Logarithm at Meta. ML model training workflows tend to have a wide range of failure modes, spanning data inputs, model code and hyperparameters, and systems components (e.g., PyTorch, data readers, checkpointing, framework code, and hardware). Further, failure root causes evolve faster than in traditional service architectures due to rapidly evolving workloads, from scale to architectures to sharding and optimizations. Triaging such dynamic failures requires collecting detailed systems and model telemetry.</span></p>
<p><span style="font-weight: 400;">Since training jobs run for extended periods of time, training systems and model telemetry and state need to be continuously captured in order to be able to debug a failure without reproducing the failure with additional logging (which may not be deterministic and wastes GPU resources).</span></p>
<p><span style="font-weight: 400;">Given the scale of training jobs, systems and model telemetry tend to be detailed and very high-throughput – logs are relatively cheap to write (e.g., compared to metrics, relational tables, and traces) and have the information content to power debugging use cases.</span></p>
<p><span style="font-weight: 400;">We stream, index and query high-throughput logs from systems and model layers using Logarithm.</span></p>
<p><span style="font-weight: 400;">Logarithm ingests both systems logs from the training stack and model telemetry from the training jobs that the stack executes. In our setup, each host runs multiple PyTorch ranks (processes), one per GPU, and the processes write their output streams to a single log file. Debugging distributed job failures is ambiguous without rank information in log lines, and adding it manually would mean modifying every logging site (including third-party code). With the Logarithm metadata API, process context such as rank ID is attached to every log line – the API adds it to thread-local context and attaches a glog handler.</span></p>
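<p>The rank-ID mechanism can be sketched with Python's standard logging module (Logarithm's API hooks into glog instead): the rank is stored in thread-local context and stamped onto every record by a filter, so individual logging sites need no changes.</p>

```python
# Sketch: attach per-process context (rank ID) to every log line via
# thread-local state and a logging filter, without touching call sites.
import logging, threading

_ctx = threading.local()

def set_rank(rank):
    _ctx.rank = rank

class RankFilter(logging.Filter):
    """Stamps the thread-local rank onto every record passing through."""
    def filter(self, record):
        record.rank = getattr(_ctx, "rank", -1)  # -1 = rank not set
        return True

logger = logging.getLogger("trainer")
logger.addFilter(RankFilter())
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("[rank %(rank)s] %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

set_rank(7)
logger.info("starting data batch")  # emits: [rank 7] starting data batch
```

Set once per process at startup, the context then appears in every line any library writes, which is what enables rank-aware queries downstream.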
<p><span style="font-weight: 400;">We added UI tools to enable common log-based interactive debugging primitives. The following figures show screenshots of two such features (on top of Logarithm’s filtering operations).</span></p>
<p><span style="font-weight: 400;">Filter-by-callsite enables hiding known log lines or verbose/noisy logging sites when walking through a log stream. Walking through multiple log streams side-by-side enables finding rank state that differs from the other ranks (i.e., additional or missing lines), which is typically a symptom or root cause. This follows directly from the single-program, multiple-data (SPMD) nature of production training jobs, where every rank iterates on data batches with the same code (with batch-level barriers).</span></p>
<p><img loading="lazy" decoding="async" class="alignnone size-large wp-image-21076" src="https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-2.png?w=1024" alt="" width="1024" height="742" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-2.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-2.png?resize=916,664 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-2.png?resize=768,557 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-2.png?resize=1024,742 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-2.png?resize=1536,1113 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-2.png?resize=96,70 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-2.png?resize=192,139 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21077" src="https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-3.png?w=1024" alt="" width="1024" height="549" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-3.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-3.png?resize=916,491 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-3.png?resize=768,412 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-3.png?resize=1024,549 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-3.png?resize=1536,824 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-3.png?resize=96,51 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-3.png?resize=192,103 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 2: Logarithm UI features for training systems debugging (Logs shown are for demonstration purposes).</p>
<p><span style="font-weight: 400;">Logarithm ingests continuous model telemetry and summary statistics that span model input and output tensors, model properties (e.g., learning rate), model internal state tensors (e.g., neuron activations) and gradients during training. This powers live training model monitoring dashboards such as an internal deployment of TensorBoard, and is used by ML engineers to debug model convergence issues and training failures (due to gradient/loss explosions) using notebooks on raw telemetry.</span></p>
<p><span style="font-weight: 400;">Model telemetry tends to be iteration-based tensor timeseries with dimensions (e.g., model architecture, neuron, or module names), and tends to be high-volume and high-throughput (which makes low-cost ingestion in Logarithm a natural choice). Collocating systems and model telemetry enables debugging issues that cascade from one layer to the other. The model telemetry APIs internally write timeseries and dimensions as typed key-value pairs using the Logarithm metadata API. Multimodal data (e.g., images) are captured as references to files written to an external blob store.</span></p>
<p><span style="font-weight: 400;">Model telemetry dashboards typically consist of a large number of timeseries visualizations arranged in a grid; this lets ML engineers eyeball the spatial and temporal dynamics of the model’s external and internal state over time and find anomalies and correlation structure. A single dashboard hence needs to fetch a large number of timeseries and their tensors. To render at interactive latencies, dashboards batch and fan out queries to Logarithm using the streaming API. The streaming API returns results in randomized order, which enables dashboards to incrementally render all plots in parallel: within hundreds of milliseconds for the first set of samples, and within seconds for the full set of points.</span></p>
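The fan-out pattern described above can be sketched as follows. This is not Logarithm's API; `fetch_series` and the worker-pool shape are illustrative assumptions showing how a dashboard can consume per-timeseries results in arrival order so every plot starts rendering as soon as its first samples stream back:

```python
import queue
import threading

def fetch_series(series_id):
    # Stand-in for a streaming read of one timeseries' samples.
    return series_id, [(step, step * 0.5) for step in range(3)]

def fan_out(series_ids, num_workers=4):
    work = queue.Queue()
    results = queue.Queue()
    for sid in series_ids:
        work.put(sid)

    def worker():
        while True:
            try:
                sid = work.get_nowait()
            except queue.Empty:
                return
            results.put(fetch_series(sid))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Results arrive in no particular order -- the caller renders each plot
    # incrementally as its samples come back, rather than waiting for all.
    out = []
    while not results.empty():
        out.append(results.get())
    return out

plots = fan_out([f"loss/rank{r}" for r in range(8)])
```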
<p><img loading="lazy" decoding="async" class="size-large wp-image-21078" src="https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-4.png?w=1024" alt="" width="1024" height="534" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-4.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-4.png?resize=916,477 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-4.png?resize=768,400 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-4.png?resize=1024,534 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-4.png?resize=1536,801 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-4.png?resize=96,50 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-4.png?resize=192,100 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 3: TensorBoard model telemetry dashboard powered by Logarithm. Renders 722 metric time series at once (total of 450k samples).</p>
<h2><span style="font-weight: 400;">Logarithm’s system architecture</span></h2>
<p><span style="font-weight: 400;">Our goal behind Logarithm is to build a highly scalable and fault-tolerant system that supports high-throughput ingestion and interactive query latencies, and provides strong guarantees on availability, durability, freshness, completeness, and query latency.</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21079" src="https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-5.png?w=1024" alt="" width="1024" height="566" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-5.png 1818w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-5.png?resize=916,506 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-5.png?resize=768,424 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-5.png?resize=1024,566 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-5.png?resize=1536,848 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-5.png?resize=96,53 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-5.png?resize=192,106 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 4: Logarithm’s system architecture.</p>
<p><span style="font-weight: 400;">At a high level, Logarithm comprises the following components:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1">Application processes<span style="font-weight: 400;"> emit logs using logging APIs. The APIs support emitting unstructured log lines along with typed metadata key-value pairs (per-line).</span></li>
<li style="font-weight: 400;" aria-level="1">A host-side agent<span style="font-weight: 400;"> discovers the format of lines and parses lines for common fields, such as timestamp, severity, process ID, and callsite.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The resulting object is buffered and written to a </span>distributed queue<span style="font-weight: 400;"> (for that log stream) that provides durability guarantees with days of object lifetime.</span></li>
<li style="font-weight: 400;" aria-level="1">Ingestion clusters<span style="font-weight: 400;"> read objects from queues, and support additional parsing based on any user-defined regex extraction rules – the extracted key-value pairs are written to the line’s metadata.</span></li>
<li style="font-weight: 400;" aria-level="1">Query clusters<span style="font-weight: 400;"> support interactive and bulk queries on one or more log streams with predicate filters on log text and metadata.</span></li>
</ol>
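Step 2 above, the host-side agent parsing common fields out of raw lines, can be sketched as a small parser. The glog-style format string below is an assumption for illustration, not the agent's actual grammar:

```python
import re

# Parse severity, timestamp, process ID, and callsite from a glog-style
# line such as: "I0314 09:21:07.123456  4242 trainer.py:88] loss=0.42".
LINE_RE = re.compile(
    r"(?P<severity>[IWEF])"
    r"(?P<timestamp>\d{4} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>\d+)\s+"
    r"(?P<callsite>[\w.]+:\d+)\]\s+"
    r"(?P<text>.*)"
)

def parse_line(raw):
    m = LINE_RE.match(raw)
    if m is None:
        # Lines whose format is not recognized still flow through unstructured.
        return {"text": raw}
    return m.groupdict()

obj = parse_line("I0314 09:21:07.123456  4242 trainer.py:88] loss=0.42")
```

The parsed object is what gets buffered and written to the distributed queue in step 3; user-defined regex extraction at the ingestion clusters (step 4) follows the same shape.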
<p><span style="font-weight: 400;">Logarithm stores locality of data blocks in a central </span><span style="font-weight: 400;">locality service</span><span style="font-weight: 400;">. We implement this on a hosted, highly partitioned and replicated collection of MySQL instances. Every block that is generated at ingestion clusters is written as a set of locality rows (one for each log stream in the block) to a deterministic shard, and reads are distributed across replicas for a shard. For scalability, we do not use distributed transactions since the workload is append-only. Note that since the ingestion processing across log streams is not coordinated by design (for scalability), federated queries across log streams may not return the same last-logged timestamps between log streams.</span></p>
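The deterministic, transaction-free placement described above can be illustrated with a simple hash-based shard function. The shard count and row shape are made up for the sketch; the point is that writers need no coordination and readers can recompute the shard for any log stream:

```python
import hashlib

NUM_SHARDS = 1024  # illustrative; the real shard count is a deployment choice

def shard_for(log_stream: str) -> int:
    # Hash the log-stream name to a fixed shard so every writer and reader
    # independently agrees on placement, with no distributed transaction.
    digest = hashlib.md5(log_stream.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def locality_rows(block_id, log_streams):
    # One (shard, row) pair per log stream contained in the block.
    return [(shard_for(s), {"block": block_id, "stream": s}) for s in log_streams]

rows = locality_rows("block-7", ["svc.web", "svc.db"])
```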
<p><span style="font-weight: 400;">Our design choices center on layering storage, query, and log analytics, and on simplicity in state distribution. We design for two common properties of logs: they are written more often than they are queried, and recent logs tend to be queried more than older ones.</span></p>
<h3><span style="font-weight: 400;">Design decisions</span></h3>
<p><span style="font-weight: 400;">Logarithm stores logs as blocks of text and metadata and maintains </span>secondary indices<span style="font-weight: 400;"> to support low latency lookups on text and/or metadata. Since logs rapidly lose query likelihood with time, Logarithm </span>tiers the storage<span style="font-weight: 400;"> of logs and secondary indices across physical memory, local SSD, and a remote durable and highly available blob storage service (at Meta we use</span> <span style="font-weight: 400;">Manifold</span><span style="font-weight: 400;">). In addition to secondary indices, tiering also ensures the lowest latencies for the most accessed (recent) logs.</span></p>
<p>Lightweight disaggregated secondary indices.<span style="font-weight: 400;"> Maintaining secondary indices on disaggregated blob storage magnifies data lookup costs at query time. Logarithm’s secondary indices are therefore designed to be lightweight, using Bloom filters. The Bloom filters are prefetched (or loaded on-query) into a distributed cache on the query clusters when blocks are published to disaggregated storage, hiding network latencies on index lookups. We later added support for caching data blocks in the query cache during query execution. The system tries to collocate data from the same log stream in order to reduce fan-outs and stragglers during query processing. The logs and metadata are implemented as ORC files. The Bloom filters currently index log stream locality and metadata key-value information (i.e., min-max values and Bloom filters for each column of ORC stripes).</span></p>
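To make the pruning role of these indices concrete, here is a toy Bloom filter showing how a lightweight per-block index lets the query tier skip blocks that definitely cannot contain a metadata value, without touching disaggregated storage. Sizes and the key format are illustrative, not Logarithm's:

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit vector packed into one integer

    def _positions(self, key):
        # Derive independent hash positions by salting one hash function.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

# One filter per block over a metadata column; a lookup prunes non-matching
# blocks before any data is fetched from blob storage.
block_filters = {}
for block, values in {"b0": ["rank=0", "rank=1"], "b1": ["rank=7"]}.items():
    f = BloomFilter()
    for v in values:
        f.add(v)
    block_filters[block] = f

candidates = [b for b, f in block_filters.items() if f.might_contain("rank=7")]
```

Because a Bloom filter never yields false negatives, pruning on it is safe: a block that actually holds matching lines is always among the candidates.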
<p><span style="font-weight: 400;">Logarithm </span>separates compute (ingestion and query) and storage<span style="font-weight: 400;"> to rapidly scale out the volume of log blocks and secondary indices. The exception to this is the in-memory </span><span style="font-weight: 400;">memtable</span><span style="font-weight: 400;"> on ingestion clusters, which buffers time-ordered lists of log streams and serves as a staging area for both writes and reads. The memtable is a bounded per-log-stream buffer covering a window of the most recent logs, long enough to span the logs most likely to be queried. The ingestion implementation is designed to be I/O-bound rather than compute- or memory-bandwidth-heavy, in order to handle close to a GB/s of per-host ingestion streaming. To minimize memtable contention, we maintain multiple memtables: a mutable one for staging, and an immutable prior version that is serialized to disk. The ingestion path follows zero-copy semantics.</span></p>
<p><span style="font-weight: 400;">Similarly, Logarithm </span>separates ingestion and query<span style="font-weight: 400;"> resources to ensure bulk processing (ingestion) and interactive workloads do not impact each other. Note that Logarithm’s design uses schema-on-write, but the data model and parsing computation is distributed between the logging hosts (which scales ingestion compute), and optionally, the ingestion clusters (for user-defined parsing). Customers can add additional anticipated capacity for storage (e.g., increased retention limits), ingestion and query workloads.</span></p>
<p><span style="font-weight: 400;">Logarithm </span>pushes down distributed state maintenance<span style="font-weight: 400;"> to the disaggregated storage layers (instead of replicating compute at the ingestion layer). The disaggregated storage in</span> <span style="font-weight: 400;">Manifold</span><span style="font-weight: 400;"> uses read-write quorums to provide strong consistency, durability, and availability guarantees. The distributed queues in</span> <span style="font-weight: 400;">Scribe</span><span style="font-weight: 400;"> use</span> <span style="font-weight: 400;">LogDevice</span><span style="font-weight: 400;"> for maintaining objects as a durable replicated log. This simplifies fault tolerance in the ingestion and query tiers. Ingestion nodes stream serialized objects from local SSDs to Manifold in 20-minute epochs, and checkpoint Scribe offsets on Manifold. When a failed ingestion node is replaced, the new node downloads the last epoch of data from Manifold, and restarts ingesting raw logs from the last Scribe checkpoint.</span></p>
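The recovery protocol above can be sketched in miniature, with plain dicts standing in for Manifold (the blob store) and the Scribe offset checkpoints; the class and method names are illustrative, not Logarithm's:

```python
class IngestionNode:
    def __init__(self, blob_store, checkpoints):
        self.blob_store = blob_store    # epoch -> serialized blocks (Manifold stand-in)
        self.checkpoints = checkpoints  # queue shard -> committed offset (Scribe stand-in)
        self.local_ssd = {}             # epochs staged on this node

    def finish_epoch(self, epoch, blocks, shard, offset):
        # Upload the epoch's blocks, then checkpoint the queue offset -- in
        # that order, so a crash between the two steps only re-ingests
        # already-uploaded data rather than losing any.
        self.blob_store[epoch] = list(blocks)
        self.checkpoints[shard] = offset

    @classmethod
    def replace_failed(cls, blob_store, checkpoints, shard):
        # A replacement node re-downloads the last epoch of data and resumes
        # reading raw logs from the last checkpointed queue offset.
        node = cls(blob_store, checkpoints)
        if blob_store:
            last = max(blob_store)
            node.local_ssd[last] = blob_store[last]
        return node, checkpoints.get(shard, 0)

blob, ckpt = {}, {}
node = IngestionNode(blob, ckpt)
node.finish_epoch(epoch=1, blocks=["blk-a", "blk-b"], shard="s0", offset=120)
replacement, resume_at = IngestionNode.replace_failed(blob, ckpt, "s0")
```

The upload-then-checkpoint ordering is what makes replacement safe: durability lives entirely in the storage and queue layers, so the ingestion node itself carries no state that must be replicated.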
<p>Ingestion elasticity.<span style="font-weight: 400;"> The Logarithm control plane (based on</span> <span style="font-weight: 400;">Shard Manager</span><span style="font-weight: 400;">) tracks ingestion node health and log stream shard-level hotspots, and relocates shards to other nodes when it finds issues or load. When the volume of logs written to a log stream increases, the control plane scales out the shard count and allocates new shards on ingestion nodes with available resources. The system is designed to provide ingestion-time resource isolation between log streams. If there is a significant surge over a very short timescale, the distributed queues in Scribe absorb the spike, but when the queues are full, the log stream can lose logs (until the elasticity mechanisms increase shard counts). Such spikes typically result from logging bugs (e.g., excessive verbosity) in application code.</span></p>
<p>Query processing.<span style="font-weight: 400;"> Queries are routed randomly across the query clusters. When a query node receives a request, it assumes the role of an </span><span style="font-weight: 400;">aggregator</span><span style="font-weight: 400;"> and partitions the request across a bounded subset of query cluster nodes (balancing between cluster load and query latency). The aggregator pushes down filter and sort operators to query nodes and returns sorted results (an end-to-end blocking operation). The query nodes read their partitions of logs by looking up locality, followed by secondary indices and data blocks – the read can span the query cache, ingestion nodes (for most recent logs) and disaggregated storage. We added 2x </span>replication of the query cache<span style="font-weight: 400;"> to support query cluster load distribution and fast failover (without waiting for cache shard movement). Logarithm also provides a streaming query API with randomized and incremental sampling that returns filtered logs (an end-to-end non-blocking operation) for lower-latency reads and time-to-first-log. Logarithm paginates result sets.</span></p>
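The aggregator's blocking path can be sketched as a scatter-gather with pushdown: filter and sort run on each query-node partition, and the aggregator performs a k-way merge of the already-sorted partial results. The function names and the tuple-shaped log lines are assumptions for illustration:

```python
import heapq

def query_node(partition, predicate):
    # Pushed-down filter + sort on one partition of log lines
    # (each line here is a (timestamp, text) tuple for simplicity).
    return sorted(line for line in partition if predicate(line))

def aggregate(partitions, predicate):
    # The aggregator gathers per-node sorted results and k-way merges them,
    # returning a globally sorted result (an end-to-end blocking operation).
    partials = [query_node(p, predicate) for p in partitions]
    return list(heapq.merge(*partials))

partitions = [
    [(3, "c"), (1, "a")],
    [(2, "b"), (4, "d")],
]
merged = aggregate(partitions, lambda line: line[0] != 4)
```

Because each partial result is already sorted, the merge is linear in the output size; the streaming API described above skips the global sort entirely and forwards filtered lines as they arrive.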
<p><span style="font-weight: 400;">Logarithm can </span>trade off query result completeness or ordering to maintain query latency<span style="font-weight: 400;"> (and flags to the client when it does so). This can happen, for example, when a partition of a query is slow or when the number of blocks to be read is too high. In the former case, it times out and skips the straggler; in the latter, it resumes from the skipped blocks (or offsets) when processing the next result page. In practice, we provide guarantees for both result completeness and query latency, which is feasible primarily because the system has mechanisms that reduce the likelihood of the root causes behind stragglers. Logarithm also performs query admission control at the client or user level.</span></p>
<p><span style="font-weight: 400;">The following figures characterize Logarithm’s aggregate production performance and scalability across all log streams. They highlight scalability as a result of design choices that make the system simpler (spanning disaggregation, ingestion-query separation, indexes, and fault tolerance design). We present our production service-level objectives (SLOs) over a month, which are defined as the fraction of time they violate thresholds on availability, durability (including completeness), freshness, and query latency.</span></p>
<p><img loading="lazy" decoding="async" class="wp-image-21095" src="https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-6b.png?w=1024" alt="" width="350" height="216" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-6b.png 1200w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-6b.png?resize=916,566 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-6b.png?resize=768,475 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-6b.png?resize=1024,633 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-6b.png?resize=96,59 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-6b.png?resize=192,119 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 5: Logarithm’s ingestion-query scalability for the month of January 2024 (one point per day).<br />
<img loading="lazy" decoding="async" class="wp-image-21096" src="https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-7b.png?w=1024" alt="" width="350" height="216" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-7b.png 1200w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-7b.png?resize=916,566 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-7b.png?resize=768,475 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-7b.png?resize=1024,633 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-7b.png?resize=96,59 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Logarithm_image-7b.png?resize=192,119 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 6: Logarithm SLOs for the month of January 2024 (one point per day).</p>
<p><span style="font-weight: 400;">Logarithm supports strong security and privacy guarantees. Access control can be enforced on a per-log line granularity at ingestion and query-time. Log streams can have configurable retention windows with line-level deletion operations.</span></p>
<h2><span style="font-weight: 400;">Next steps</span></h2>
<p><span style="font-weight: 400;">Over the last few years, several use cases have been built on the foundational log primitives that Logarithm implements. Capabilities such as relational algebra on structured data and log analytics are being layered on top, retaining Logarithm’s query latency guarantees by using pushdowns of search-filter-sort and federated retrieval operations. Logarithm supports a native UI for interactive log exploration, search, and filtering to aid debugging use cases; this UI is embedded as a widget in service consoles across Meta services. Logarithm also supports a CLI for bulk download of service logs for scripted analyses.</span></p>
<p><span style="font-weight: 400;">The Logarithm design has centered on simplicity in service of scalability guarantees. We are continuously building domain-specific and domain-agnostic log analytics capabilities, within or layered on Logarithm, with appropriate pushdowns for performance. We continue to invest in storage and query-time improvements, such as lightweight disaggregated inverted indices for text search, storage layouts optimized for query patterns, and distributed debugging UI primitives for AI systems.</span></p>
<h2><span style="font-weight: 400;">Acknowledgements</span></h2>
<p><span style="font-weight: 400;">We thank the Logarithm team’s current and past members, particularly our leads: Amir Alon, Stavros Harizopoulos, Rukmani Ravisundaram, and Laurynas Sukys, </span><span style="font-weight: 400;">and our leadership: Vinay Perneti, Shah Rahman, Nikhilesh Reddy, Gautam Shanbhag, Girish Vaitheeswaran, and </span><span style="font-weight: 400;">Yogesh Upadhay. </span><span style="font-weight: 400;">Thank you to our partners and customers: Sergey Anpilov, Jenya (Eugene) Lee, Aravind Ram, </span><span style="font-weight: 400;">Vikram Srivastava, and Mik Vyatskov.</span></p>The post <a href="https://dailyzsocialmedianews.com/logarithm-a-logging-engine-for-ai-coaching-workflows-and-providers/">Logarithm: A logging engine for AI coaching workflows and providers</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Constructing Meta’s GenAI Infrastructure &#8211; Engineering at Meta</title>
		<link>https://dailyzsocialmedianews.com/constructing-metas-genai-infrastructure-engineering-at-meta/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Tue, 12 Mar 2024 15:22:58 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Building]]></category>
		<category><![CDATA[Engineering]]></category>
		<category><![CDATA[GenAI]]></category>
		<category><![CDATA[infrastructure]]></category>
		<category><![CDATA[Meta]]></category>
		<category><![CDATA[Metas]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24857</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="1023" height="733" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Building Meta’s GenAI Infrastructure - Engineering at Meta" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta.png 1023w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta-300x215.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta-768x550.png 768w" sizes="auto, (max-width: 1023px) 100vw, 1023px" /></div><p>Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters. We are sharing details on the hardware, network, storage, design, performance, and software that help us extract high throughput and reliability for various AI workloads. We use this cluster design for Llama 3 training. We are strongly committed to open [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/constructing-metas-genai-infrastructure-engineering-at-meta/">Constructing Meta’s GenAI Infrastructure – Engineering at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="1023" height="733" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Building Meta’s GenAI Infrastructure - Engineering at Meta" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta.png 1023w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta-300x215.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta-768x550.png 768w" sizes="auto, (max-width: 1023px) 100vw, 1023px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters. We are sharing details on the hardware, network, storage, design, performance, and software that help us extract high throughput and reliability for various AI workloads. We use this cluster design for Llama 3 training.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We are strongly committed to open compute and open source. We built these clusters on top of </span><span style="font-weight: 400;">Grand Teton</span><span style="font-weight: 400;">, </span><span style="font-weight: 400;">OpenRack</span><span style="font-weight: 400;">, and </span><span style="font-weight: 400;">PyTorch</span><span style="font-weight: 400;"> and continue to push open innovation across the industry.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">This announcement is one step in our ambitious infrastructure roadmap. By the end of 2024, we’re aiming to continue to grow our infrastructure build-out that will include 350,000 NVIDIA H100 GPUs as part of a portfolio that will feature compute power equivalent to nearly 600,000 H100s.</span></li>
</ul>
<p><span style="font-weight: 400;">To lead in developing AI means leading investments in hardware infrastructure. Hardware infrastructure plays an important role in AI’s future. Today, we’re sharing details on two versions of our </span><span style="font-weight: 400;">24,576-GPU data center scale cluster at Meta. These clusters support our current and next generation AI models, including Llama 3, the successor to</span> <span style="font-weight: 400;">Llama 2</span><span style="font-weight: 400;">, our publicly released LLM, as well as AI research and development across GenAI and other areas.</span></p>
<h2><span style="font-weight: 400;">A peek into Meta’s large-scale AI clusters</span></h2>
<p><span style="font-weight: 400;">Meta’s long-term vision is to build artificial general intelligence (AGI) that is open and built responsibly so that it can be widely available for everyone to benefit from. As we work towards AGI, we have also worked on scaling our clusters to power this ambition. The progress we make towards AGI creates new products,</span> <span style="font-weight: 400;">new AI features for our family of apps</span><span style="font-weight: 400;">, and new AI-centric computing devices. </span></p>
<p><span style="font-weight: 400;">While we’ve had a long history of building AI infrastructure, we first shared details on our </span><span style="font-weight: 400;">AI Research SuperCluster (RSC)</span><span style="font-weight: 400;">, featuring 16,000 NVIDIA A100 GPUs, in 2022. RSC has accelerated our open and responsible AI research by helping us build our first generation of advanced AI models. It played and continues to play an important role in the development of </span><span style="font-weight: 400;">Llama</span><span style="font-weight: 400;"> and </span><span style="font-weight: 400;">Llama 2</span><span style="font-weight: 400;">, as well as advanced AI models for applications ranging from computer vision, NLP, and speech recognition, to</span> <span style="font-weight: 400;">image generation</span><span style="font-weight: 400;">, and even</span> <span style="font-weight: 400;">coding</span><span style="font-weight: 400;">.</span></p>
<h2><span style="font-weight: 400;">Under the hood</span></h2>
<p><span style="font-weight: 400;">Our newer AI clusters build upon the successes and lessons learned from RSC. We focused on building end-to-end AI systems with a major emphasis on researcher and developer experience and productivity. The efficiency of the high-performance network fabrics within these clusters, some of the key storage decisions, combined with the 24,576 NVIDIA Tensor Core H100 GPUs in each, allow both cluster versions to support models larger and more complex than could be supported in RSC, and pave the way for advancements in GenAI product development and AI research.</span></p>
<h3><span style="font-weight: 400;">Network</span></h3>
<p><span style="font-weight: 400;">At Meta, we handle hundreds of trillions of AI model executions per day. Delivering these services at a large scale requires a highly advanced and flexible infrastructure. Custom designing much of our own hardware, software, and network fabrics allows us to optimize the end-to-end experience for our AI researchers while ensuring our data centers operate efficiently. </span></p>
<p><span style="font-weight: 400;">With this in mind, we built one cluster with a remote direct memory access (RDMA) over converged Ethernet (RoCE) network fabric solution based on the </span><span style="font-weight: 400;">Arista 7800</span><span style="font-weight: 400;"> with </span><span style="font-weight: 400;">Wedge400</span><span style="font-weight: 400;"> and </span><span style="font-weight: 400;">Minipack2</span><span style="font-weight: 400;"> OCP rack switches. The other cluster features an </span><span style="font-weight: 400;">NVIDIA Quantum2 InfiniBand</span><span style="font-weight: 400;"> fabric. Both of these solutions interconnect 400 Gbps endpoints. With these two, we are able to assess the suitability and scalability of these </span><span style="font-weight: 400;">different types of interconnect for large-scale training,</span><span style="font-weight: 400;"> giving us more insights that will help inform how we design and build even larger, scaled-up clusters in the future. Through careful co-design of the network, software, and model architectures, we have successfully used both RoCE and InfiniBand clusters for large, GenAI workloads (including our ongoing training of Llama 3 on our RoCE cluster) without any network bottlenecks.</span></p>
<h3><span style="font-weight: 400;">Compute</span></h3>
<p><span style="font-weight: 400;">Both clusters are built using</span> <span style="font-weight: 400;">Grand Teton</span><span style="font-weight: 400;">, our in-house-designed, open GPU hardware platform that we’ve contributed to the Open Compute Project (OCP). Grand Teton builds on the many generations of AI systems that integrate power, control, compute, and fabric interfaces into a single chassis for better overall performance, signal integrity, and thermal performance. It provides rapid scalability and flexibility in a simplified design, allowing it to be quickly deployed into data center fleets and easily maintained and scaled. Combined with other in-house innovations like our</span> <span style="font-weight: 400;">Open Rack</span><span style="font-weight: 400;"> power and rack architecture, Grand Teton allows us to build new clusters in a way that is purpose-built for current and future applications at Meta.</span></p>
<p><span style="font-weight: 400;">We have been openly designing our GPU hardware platforms beginning with our </span><span style="font-weight: 400;">Big Sur platform in 2015</span><span style="font-weight: 400;">.</span></p>
<h3><span style="font-weight: 400;">Storage</span></h3>
<p><span style="font-weight: 400;">Storage plays an important role in AI training, and yet is one of the least talked-about aspects. As the GenAI training jobs become more multimodal over time, consuming large amounts of image, video, and text data, the need for data storage grows rapidly. The need to fit all that data storage into a performant, yet power-efficient footprint doesn’t go away though, which makes the problem more interesting.</span></p>
<p><span style="font-weight: 400;">Our storage deployment addresses the data and checkpointing needs of the AI clusters via a home-grown Linux Filesystem in Userspace (FUSE) API backed by a version of Meta’s ‘Tectonic’ distributed storage solution optimized for Flash media. This solution enables thousands of GPUs to save and load checkpoints in a synchronized fashion (a challenge for any storage solution) while also providing the flexible, high-throughput, exabyte-scale storage required for data loading.</span></p>
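<p>The synchronized-checkpoint pattern can be illustrated with a toy sketch. Threads stand in for GPU ranks, and the file and directory names are invented for illustration; this is not Tectonic’s actual API, just the shape of the coordination problem:</p>

```python
import tempfile
import threading
from pathlib import Path

def save_checkpoint(rank, world_size, state, out_dir, barrier):
    # Each rank writes only its own shard of the model state.
    shard = Path(out_dir) / f"shard-{rank:05d}-of-{world_size:05d}.bin"
    shard.write_bytes(state)   # in reality: GPU state streamed via the FUSE mount
    barrier.wait()             # all ranks synchronize before the checkpoint counts
    if rank == 0:              # a single rank then marks the checkpoint complete
        (Path(out_dir) / "COMPLETE").touch()

world_size = 8
ckpt_dir = tempfile.mkdtemp()
barrier = threading.Barrier(world_size)
threads = [
    threading.Thread(target=save_checkpoint,
                     args=(r, world_size, bytes([r]) * 4, ckpt_dir, barrier))
    for r in range(world_size)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The barrier is what makes this hard at cluster scale: thousands of ranks hitting the storage layer at the same moment is exactly the bursty write pattern the text describes.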
<p><span style="font-weight: 400;">We have also partnered with Hammerspace to co-develop and land a parallel network file system (NFS) deployment to meet the developer experience requirements for this AI cluster. Among other benefits, Hammerspace enables engineers to perform interactive debugging for jobs using thousands of GPUs, as code changes are immediately accessible to all nodes within the environment. Together, our Tectonic distributed storage solution and Hammerspace enable fast iteration velocity without compromising on scale.</span></p>
<p><span style="font-weight: 400;">The storage deployments in our GenAI clusters, both Tectonic- and Hammerspace-backed, are based on the YV3 Sierra Point server platform, upgraded with the highest-capacity E1.S SSDs we can procure in the market today. Aside from the higher SSD capacity, the number of servers per rack was customized to achieve the right balance of throughput capacity per server, rack-count reduction, and associated power efficiency. Utilizing OCP servers as Lego-like building blocks, our storage layer can flexibly scale to future requirements in this cluster as well as in future, bigger AI clusters, while remaining fault-tolerant to day-to-day infrastructure maintenance operations.</span></p>
<h3><span style="font-weight: 400;">Performance</span></h3>
<p><span style="font-weight: 400;">One of the principles we have in building our large-scale AI clusters is to maximize performance and ease of use simultaneously without compromising one for the other. This is an important principle in creating the best-in-class AI models. </span></p>
<p><span style="font-weight: 400;">As we push the limits of AI systems, the best way to test our ability to scale up our designs is to simply build a system, optimize it, and actually test it (while simulators help, they only go so far). In this design journey, we compared the performance of our small clusters with that of our large clusters to see where our bottlenecks are. In the graph below, AllGather collective performance is shown (as normalized bandwidth on a 0-100 scale) when a large number of GPUs are communicating with each other at message sizes where roofline performance is expected.</span></p>
<p><span style="font-weight: 400;">Our out-of-the-box performance for large clusters was initially poor and inconsistent compared to optimized small-cluster performance. To address this, we made several changes to how our internal job scheduler places jobs, adding network-topology awareness – this delivered latency benefits and minimized the amount of traffic going to upper layers of the network. We also optimized our network routing strategy in combination with NVIDIA Collective Communications Library (NCCL) changes to achieve optimal network utilization. Together, these changes helped our large clusters achieve the same great, expected performance as our small clusters.</span></p>
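<p>The topology-awareness idea can be sketched in a few lines. This is a deliberately simplified placement heuristic, not Meta’s scheduler: by packing a job into as few racks as possible, less collective traffic has to cross the rack switch into the upper (spine) layers of the network. Rack names and counts are invented:</p>

```python
def topology_aware_placement(free_gpus, job_size):
    """Greedily fill the racks with the most free GPUs first, so a job
    spans as few rack switches as possible.  `free_gpus` maps a rack id
    to its number of idle GPUs; returns rack id -> GPUs taken."""
    placement = {}
    remaining = job_size
    # Prefer the fullest racks so the job touches the fewest racks.
    for rack, free in sorted(free_gpus.items(), key=lambda kv: -kv[1]):
        if remaining == 0:
            break
        take = min(free, remaining)
        placement[rack] = take
        remaining -= take
    if remaining:
        raise RuntimeError("not enough free GPUs")
    return placement

# A 12-GPU job lands on 2 racks instead of being scattered across 4.
placement = topology_aware_placement(
    {"rack-a": 8, "rack-b": 8, "rack-c": 2, "rack-d": 2}, 12)
print(placement)  # {'rack-a': 8, 'rack-b': 4}
```

A real scheduler also weighs fragmentation, fault domains, and co-located jobs, but the core trade-off is the same: locality of placement versus traffic sent upward in the fabric.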
<p><img loading="lazy" decoding="async" class="size-large wp-image-21048" src="https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png?w=1024" alt="" width="1024" height="768" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png?resize=916,687 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png?resize=768,576 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png?resize=1024,768 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png?resize=1536,1152 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png?resize=96,72 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png?resize=192,144 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>In the figure we see that small cluster performance (overall communication bandwidth and utilization) reaches 90%+ out of the box, but an unoptimized large cluster performance has very poor utilization, ranging from 10% to 90%. After we optimize the full system (software, network, etc.), we see large cluster performance return to the ideal 90%+ range.</p>
<p><span style="font-weight: 400;">In addition to software changes targeting our internal infrastructure, we worked closely with teams authoring training frameworks and models to adapt to our evolving infrastructure. For example, NVIDIA H100 GPUs open the possibility of leveraging new data types such as 8-bit floating point (FP8) for training. Fully utilizing larger clusters required investments in additional parallelization techniques, and new storage solutions provided opportunities to highly optimize checkpointing across thousands of ranks so that it runs in hundreds of milliseconds.</span></p>
<p><span style="font-weight: 400;">We also recognize debuggability as one of the major challenges in large-scale training. Identifying a problematic GPU that is stalling an entire training job becomes very difficult at scale. We’re building tools such as desync debug, or a distributed collective flight recorder, to expose the details of distributed training and help identify issues in a much faster and easier way.</span></p>
<p><span style="font-weight: 400;">Finally, we’re continuing to evolve PyTorch, the foundational AI framework powering our AI workloads, to make it ready for training on tens, or even hundreds, of thousands of GPUs. We have identified multiple bottlenecks in process group initialization and reduced the startup time from sometimes hours down to minutes.</span></p>
<h2><span style="font-weight: 400;">Commitment to open AI innovation</span></h2>
<p><span style="font-weight: 400;">Meta maintains its commitment to open innovation in AI software and hardware. We believe open-source hardware and software will always be a valuable tool to help the industry solve problems at large scale.</span></p>
<p><span style="font-weight: 400;">Today, we continue to support</span> <span style="font-weight: 400;">open hardware innovation</span><span style="font-weight: 400;"> as a founding member of OCP, where we make designs like Grand Teton and Open Rack available to the OCP community. We also continue to be the largest and primary contributor to </span><span style="font-weight: 400;">PyTorch</span><span style="font-weight: 400;">, the AI software framework that is powering a large chunk of the industry.</span></p>
<p><span style="font-weight: 400;">We also continue to be committed to open innovation in the AI research community. We’ve launched the</span> <span style="font-weight: 400;">Open Innovation AI Research Community</span><span style="font-weight: 400;">, a partnership program for academic researchers to deepen our understanding of how to responsibly develop and share AI technologies – with a particular focus on LLMs.</span></p>
<p><span style="font-weight: 400;">An open approach to AI is not new for Meta. We’ve also launched the </span><span style="font-weight: 400;">AI Alliance</span><span style="font-weight: 400;">, a group of leading organizations across the AI industry focused on accelerating responsible innovation in AI within an open community. Our AI efforts are built on a philosophy of open science and cross-collaboration. An open ecosystem brings transparency, scrutiny, and trust to AI development, and leads to innovations that everyone can benefit from, built with safety and responsibility top of mind.</span></p>
<h2><span style="font-weight: 400;">The future of Meta’s AI infrastructure</span></h2>
<p><span style="font-weight: 400;">These two AI training cluster designs are part of our larger roadmap for the future of AI. By the end of 2024, we’re aiming to grow our infrastructure build-out to include 350,000 NVIDIA H100s, part of a portfolio featuring compute power equivalent to nearly 600,000 H100s.</span></p>
<p><span style="font-weight: 400;">As we look to the future, we recognize that what worked yesterday or today may not be sufficient for tomorrow’s needs. That’s why we are constantly evaluating and improving every aspect of our infrastructure, from the physical and virtual layers to the software layer and beyond. Our goal is to create systems that are flexible and reliable enough to support fast-evolving new models and research.</span></p>The post <a href="https://dailyzsocialmedianews.com/constructing-metas-genai-infrastructure-engineering-at-meta/">Constructing Meta’s GenAI Infrastructure – Engineering at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Making messaging interoperability with third parties safe for users in Europe</title>
		<link>https://dailyzsocialmedianews.com/making-messaging-interoperability-with-third-events-protected-for-customers-in-europe/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Wed, 06 Mar 2024 09:54:22 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Europe]]></category>
		<category><![CDATA[Interoperability]]></category>
		<category><![CDATA[Making]]></category>
		<category><![CDATA[messaging]]></category>
		<category><![CDATA[parties]]></category>
		<category><![CDATA[Safe]]></category>
		<category><![CDATA[users]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24818</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="1024" height="432" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/06095421/Making-messaging-interoperability-with-third-parties-safe-for-users-in.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Making messaging interoperability with third parties safe for users in Europe" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/06095421/Making-messaging-interoperability-with-third-parties-safe-for-users-in.png 1024w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/06095421/Making-messaging-interoperability-with-third-parties-safe-for-users-in-300x127.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/06095421/Making-messaging-interoperability-with-third-parties-safe-for-users-in-768x324.png 768w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></div><p>To comply with a new EU law, the Digital Markets Act (DMA), which comes into force on March 7th, we’ve made major changes to WhatsApp and Messenger to enable interoperability with third-party messaging services.  We’re sharing how we enabled third-party interoperability (interop) while maintaining end-to-end encryption (E2EE) and other privacy guarantees in our services as [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/making-messaging-interoperability-with-third-events-protected-for-customers-in-europe/">Making messaging interoperability with third parties safe for users in Europe</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="1024" height="432" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/06095421/Making-messaging-interoperability-with-third-parties-safe-for-users-in.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Making messaging interoperability with third parties safe for users in Europe" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/06095421/Making-messaging-interoperability-with-third-parties-safe-for-users-in.png 1024w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/06095421/Making-messaging-interoperability-with-third-parties-safe-for-users-in-300x127.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/06095421/Making-messaging-interoperability-with-third-parties-safe-for-users-in-768x324.png 768w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">To comply with a new EU law, the Digital Markets Act (DMA), which comes into force on March 7th, we’ve made major changes to WhatsApp and Messenger to enable interoperability with third-party messaging services. </span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We’re sharing how we enabled third-party interoperability (interop) while maintaining end-to-end encryption (E2EE) and other privacy guarantees in our services as far as possible.</span></li>
</ul>
<p><span style="font-weight: 400;">On March 7th, a new EU law, the Digital Markets Act (DMA), comes into force. One of its requirements is that designated messaging services must let third-party messaging services become interoperable, provided the third party meets a series of eligibility requirements, including technical and security requirements.</span></p>
<p><span style="font-weight: 400;">This allows users of third-party providers who choose to enable interoperability (interop) to send and receive messages with opted-in users of either Messenger or WhatsApp – both designated by the European Commission (EC) as being required to independently provide interoperability to third-party messaging services.  </span></p>
<p><span style="font-weight: 400;">For nearly two years our team has been working with the EC to implement interop in a way that meets the requirements of the law and maximizes the security, privacy and safety of users. Interoperability is a technical challenge – even when focused on the basic functionalities as required by the DMA. In year one, the requirement is for 1:1 text messaging between individual users and the sharing of images, voice messages, videos, and other attached files between individual end users. In the future, requirements expand to group functionality and calling. </span></p>
<p><span style="font-weight: 400;">To interoperate, third-party providers will sign an agreement with Messenger and/or WhatsApp and we’ll work together to enable interoperability. Today we’re publishing the WhatsApp Reference Offer for third-party providers, which outlines what is required to interoperate with the service. The Reference Offer for Messenger will follow in due course.</span></p>
<p><span style="font-weight: 400;">While Meta must be ready to enable interoperability with other services within three months of receiving a request, it may take longer before the functionality is ready for public use. We wanted to take this opportunity to set out the technical infrastructure and thinking that sits behind our interop solution.</span></p>
<h2>A privacy-centric approach to building interoperable messaging services</h2>
<p><span style="font-weight: 400;">Our approach to compliance with the DMA is centered around preserving privacy and security for users as far as is possible. The DMA quite rightly makes it a legal requirement that we should not weaken security provided to Meta’s own users. </span></p>
<p><span style="font-weight: 400;">The approach we have taken in terms of implementing interoperability is the best way of meeting DMA requirements, whilst also creating a viable approach for the third-party providers interested in becoming interoperable with Meta and maximizing user security and privacy.</span></p>
<h2>Implementing an end-to-end encrypted protocol</h2>
<p><span style="font-weight: 400;">First, we need to protect the underlying security that keeps communication on Meta E2EE messaging apps secure: the encryption protocol. WhatsApp and Messenger both use the tried and tested Signal protocol as a foundational piece for their encryption. </span></p>
<p><span style="font-weight: 400;">Messenger is still rolling out E2EE by default for personal communication, but on WhatsApp, this default has been the case since 2016. In both cases, we are using the Signal protocol as the foundation for these E2EE communications, as it represents the current gold standard for E2EE chats.</span></p>
<p><span style="font-weight: 400;">In order to maximize user security, we would prefer third-party providers to use the Signal protocol. Since this has to work for everyone, however, we will allow third-party providers to use a compatible protocol if they can demonstrate that it offers the same security guarantees as Signal.</span></p>
<p><span style="font-weight: 400;">To send messages, third-party providers have to construct message protobuf structures, encrypt them using the Signal protocol, and then package them into message stanzas in eXtensible Markup Language (XML).</span></p>
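<p>As a toy sketch of the packaging step only, the example below wraps an already-encrypted payload (which in the real system would be produced by the Signal protocol from a protobuf message structure) in an XML stanza. The element and attribute names here are invented for illustration and are not WhatsApp’s actual wire format:</p>

```python
import base64
import xml.etree.ElementTree as ET

def wrap_in_stanza(ciphertext: bytes, sender: str, recipient: str) -> str:
    # The ciphertext is opaque to the packaging layer: it is simply
    # base64-encoded and embedded in an XML <message> stanza.
    msg = ET.Element("message", attrib={"from": sender, "to": recipient})
    enc = ET.SubElement(msg, "enc", attrib={"type": "msg"})
    enc.text = base64.b64encode(ciphertext).decode("ascii")
    return ET.tostring(msg, encoding="unicode")

stanza = wrap_in_stanza(b"\x01\x02ciphertext", "alice@example", "bob@example")
print(stanza)
```

The point of the layering is that the server routing the stanza never needs to (and cannot) look inside the encrypted payload.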
<p><span style="font-weight: 400;">Meta servers push messages to connected clients over a persistent connection. Third-party servers are responsible for hosting any media files their client applications send to Meta clients (such as image or video files). After receiving a media message, Meta clients will subsequently download the encrypted media from the third-party messaging servers using a Meta proxy service.</span></p>
<p><span style="font-weight: 400;">It’s important to note that the</span> <span style="font-weight: 400;">E2EE promise Meta provides to users of our messaging services requires us to control both the sending and receiving clients. This allows us to ensure that only the sender and the intended recipient(s) can see what has been sent, and that no one can listen to your conversation without both parties knowing. </span></p>
<p><span style="font-weight: 400;">While we have built a secure solution for interop that uses the Signal protocol encryption to protect messages in transit, without ownership of both clients (endpoints) we cannot guarantee what a third-party provider does with sent or received messages, and we therefore cannot make the same promise.</span></p>
<h2>Our technical solution builds on Meta’s existing client / server architecture</h2>
<p><span style="font-weight: 400;">We think the best way to deliver interoperability is through a solution which builds on Meta’s existing client / server architecture [Figure 1]. In particular, the requirement that clients connect to Meta infrastructure has the following benefits:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Enables Meta to maximize the level of security and safety for all users by carrying out many of the same integrity checks as it does for existing Meta users</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Constitutes a “plug-and-play” model for third-party providers, lowering the barriers for potential new entrants and costs for third-party providers</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Helps maximize protection of user privacy by limiting the exposure of their personal data to Meta servers only</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Improves overall reliability of the interoperable service as it benefits from Meta’s infrastructure, which is already globally scaled to handle over 100 billion messages each day</span></li>
</ul>
<p>Figure 1: A simplified illustration of WhatsApp’s technical architecture.</p>
<p><span style="font-weight: 400;">Taking the example of WhatsApp, third-party clients will connect to WhatsApp servers using our protocol (based on the Extensible Messaging and Presence Protocol – XMPP). The WhatsApp server will interface with a third-party server over HTTP in order to facilitate a variety of things including authenticating third-party users and push notifications.</span></p>
<p><span style="font-weight: 400;">WhatsApp exposes an Enlistment API that third-party clients must call when opting in to the WhatsApp network. When a third-party user registers on WhatsApp or Messenger, they keep their existing user-visible identifier and are also assigned a unique, WhatsApp-internal identifier that is used at the infrastructure level (for protocols, data storage, etc.).</span></p>
<p><span style="font-weight: 400;">WhatsApp requires third-party clients to provide “proof” of their ownership of the third-party user-visible identifier when connecting or enlisting. The proof is constructed by the third-party service cryptographically signing an authentication token. WhatsApp uses the standard OpenID protocol (with some minor modifications) alongside a JSON Web Token (JWT Token) to verify the user-visible identifier through public keys periodically fetched from the third-party server.</span></p>
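<p>A toy sketch of such a proof-of-ownership token is shown below. To stay self-contained it signs with a shared HMAC secret, whereas the real flow described above uses OpenID with asymmetric keys published by the third-party server and periodically fetched by WhatsApp; the identifier and secret are invented:</p>

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    # JWT-style base64url without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_token(user_id: str, secret: bytes) -> str:
    # The third-party service signs a short-lived claim over the
    # user-visible identifier it vouches for.
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = b64url(json.dumps({"sub": user_id,
                                "exp": int(time.time()) + 300}).encode())
    signing_input = f"{header}.{claims}".encode()
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{claims}.{sig}"

def verify_token(token: str, secret: bytes) -> dict:
    # The server recomputes the signature and rejects stale tokens.
    header, claims, sig = token.split(".")
    signing_input = f"{header}.{claims}".encode()
    expected = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    payload = json.loads(base64.urlsafe_b64decode(claims + "=" * (-len(claims) % 4)))
    if payload["exp"] < time.time():
        raise ValueError("expired")
    return payload

token = sign_token("+3212345678", b"shared-secret")
print(verify_token(token, b"shared-secret")["sub"])  # +3212345678
```

With asymmetric keys, the verifier only ever holds the third party’s public key, which is why the text describes WhatsApp fetching keys from the third-party server rather than exchanging secrets.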
<p><span style="font-weight: 400;">WhatsApp uses the Noise Protocol Framework to encrypt all data traveling between the client and the WhatsApp server. As part of the Noise Protocol, the third-party client must perform a “Noise Handshake” every time the client connects to the WhatsApp server. Part of this Handshake is providing a payload to the server which also contains the JWT Token.</span></p>
<p><span style="font-weight: 400;">Once the client has successfully connected to the WhatsApp server, the client must use WhatsApp’s chat protocol to communicate with the WhatsApp server. WhatsApp’s chat protocol uses optimized XML stanzas to communicate with our servers. </span></p>
<p><span style="font-weight: 400;">As we continue to discuss this architecture with third-party providers, we think there is also an approach to implementing interop where we could give third-party providers the option to add a proxy or an “intermediary” between their client and the WhatsApp server. A proxy could potentially give third-party providers more flexibility and control over what their client can receive from the WhatsApp server, and would also remove the requirement that third-party clients implement WhatsApp’s client-to-server protocol, i.e., they could maintain their existing “chat channel” on their clients.</span></p>
<p><span style="font-weight: 400;">The challenge here is that WhatsApp would no longer have direct connection to both clients and, as a result, would lose connection level signals that are important for keeping users safe from spam and scams such as TCP fingerprints. We would therefore anticipate implementing additional requirements for third-party providers who take up this option under our Reference Offer. This approach also exposes all the chat metadata to the proxy server, which increases the likelihood that this data could be accidentally or intentionally leaked.</span></p>
<h2>Clearly explaining how interop works to users</h2>
<p><span style="font-weight: 400;">We believe it is essential that we give users transparent information about how interop works and how it differs from their chats with other WhatsApp or Messenger users. This will be the first time that users have been part of an interoperable network on our services, so giving them clear and straightforward information about what to expect will be paramount. For example, users need to know that our security and privacy promise, as well as the feature set, won’t exactly match what we offer in WhatsApp chats. </span></p>
<h2>Privacy and security is a shared responsibility</h2>
<p><span style="font-weight: 400;">As is hopefully clear from this post, preserving privacy and security in an interoperable system is a shared responsibility, and not something that Meta can do on its own. We will therefore need to continue collaborating with third-party providers in order to provide the safest and best experience for our users.</span></p>The post <a href="https://dailyzsocialmedianews.com/making-messaging-interoperability-with-third-events-protected-for-customers-in-europe/">Making messaging interoperability with third parties safe for users in Europe</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How DotSlash makes executable deployment easier</title>
		<link>https://dailyzsocialmedianews.com/how-dotslash-makes-executable-deployment-easier/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Mon, 26 Feb 2024 23:14:07 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[deployment]]></category>
		<category><![CDATA[DotSlash]]></category>
		<category><![CDATA[executable]]></category>
		<category><![CDATA[simpler]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24763</guid>

					<description><![CDATA[<p>Andres Suarez and Michael Bolin, two software engineers at Meta, join Pascal Hartig (@passy) on the Meta Tech Podcast to discuss the ins and outs of DotSlash, a new open source tool from Meta. DotSlash takes the pain out of distributing binaries and toolchains to developers. Instead of committing large, platform-specific executables to a repository, [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/how-dotslash-makes-executable-deployment-easier/">How DotSlash makes executable deployment easier</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<p></p>
<p>Andres Suarez and Michael Bolin, two software engineers at Meta, join Pascal Hartig (@passy) on the Meta Tech Podcast to discuss the ins and outs of DotSlash, a new open source tool from Meta.</p>
<p>DotSlash takes the pain out of distributing binaries and toolchains to developers. Instead of committing large, platform-specific executables to a repository, DotSlash combines a fast Rust program with a JSON manifest prefixed with a <span style="font-family: 'courier new', courier;">#!</span> to transparently fetch and execute the binary.</p>
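<p>For a concrete picture, a DotSlash file is a small text file whose first line is a <span style="font-family: 'courier new', courier;">#!</span> interpreter directive and whose body is a JSON manifest describing where to fetch the real binary for each platform. The sketch below follows the shape of DotSlash’s published format, but the tool name, size, digest, and URL are invented placeholders:</p>

```text
#!/usr/bin/env dotslash
{
  "name": "clang-format",
  "platforms": {
    "linux-x86_64": {
      "size": 1234567,
      "hash": "blake3",
      "digest": "…",
      "format": "tar.gz",
      "path": "clang-format",
      "providers": [
        { "url": "https://example.com/releases/clang-format-linux.tar.gz" }
      ]
    }
  }
}
```

When the file is executed, the <span style="font-family: 'courier new', courier;">dotslash</span> interpreter named in the shebang fetches, verifies, caches, and runs the platform-appropriate artifact, so only this small manifest needs to live in the repository.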
<p>Learn how DotSlash was built, how it’s used at Meta, and how Michael and Andres’ career trajectories led them to create this open source project at Meta.</p>
<p>To learn more about DotSlash:</p>
<p>Download or listen to the episode below:<br /><iframe loading="lazy" style="border: none;" title="Libsyn Player" src="https://html5-player.libsyn.com/embed/episode/id/29909003/height/90/theme/custom/thumbnail/yes/direction/forward/render-playlist/no/custom-color/000000/" width="100%" height="90" scrolling="no" allowfullscreen="allowfullscreen"></iframe><br />Or find the episode wherever you get your podcasts, including:</p>
<p>Spotify<br />Apple Podcasts<br />PocketCasts<br />Castro<br />Overcast</p>
<p>The Meta Tech Podcast is a podcast, brought to you by Meta, where we highlight the work Meta’s engineers are doing at every level – from low-level frameworks to end-user features.</p>
<p>Send us feedback on Instagram, Threads, or X.</p>
<p>And if you’re interested in learning more about career opportunities at Meta visit the Meta Careers page.</p>The post <a href="https://dailyzsocialmedianews.com/how-dotslash-makes-executable-deployment-easier/">How DotSlash makes executable deployment easier</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Aligning Velox and Apache Arrow: Towards composable data management</title>
		<link>https://dailyzsocialmedianews.com/aligning-velox-and-apache-arrow-in-the-direction-of-composable-knowledge-administration/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Tue, 20 Feb 2024 17:13:37 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Aligning]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[Arrow]]></category>
		<category><![CDATA[composable]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[Management]]></category>
		<category><![CDATA[Velox]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24723</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="913" height="427" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Aligning Velox and Apache Arrow: Towards composable data management" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management.png 913w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management-300x140.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management-768x359.png 768w" sizes="auto, (max-width: 913px) 100vw, 913px" /></div><p>We’ve partnered with Voltron Data and the Arrow community to align and converge Apache Arrow with Velox, Meta’s open source execution engine. Apache Arrow 15 includes three new format layouts developed through this partnership: StringView, ListView, and Run-End-Encoding (REE). This new convergence helps Meta and the larger community build data management systems that are unified, [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/aligning-velox-and-apache-arrow-in-the-direction-of-composable-knowledge-administration/">Aligning Velox and Apache Arrow: Towards composable data management</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="913" height="427" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Aligning Velox and Apache Arrow: Towards composable data management" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management.png 913w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management-300x140.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management-768x359.png 768w" sizes="auto, (max-width: 913px) 100vw, 913px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We’ve partnered with</span> <span style="font-weight: 400;">Voltron Data</span><span style="font-weight: 400;"> and the Arrow community to align and converge Apache Arrow with </span><span style="font-weight: 400;">Velox</span><span style="font-weight: 400;">, Meta’s open source execution engine.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Apache Arrow 15 includes three new format layouts developed through this partnership: StringView, ListView, and Run-End-Encoding (REE).</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">This new convergence helps Meta and the larger community build data management systems that are unified, more efficient, and composable.</span></li>
</ul>
<p><span style="font-weight: 400;">Meta’s Data Infrastructure teams have been</span> <span style="font-weight: 400;">rethinking how data management systems are designed</span><span style="font-weight: 400;">. We want to</span> <span style="font-weight: 400;">make our data management systems more composable</span><span style="font-weight: 400;"> – meaning that instead of individually developing systems as monoliths, we identify common components, factor them out as reusable libraries, and leverage common APIs and standards to increase the interoperability between them.</span></p>
<p><span style="font-weight: 400;">As we decompose our large, monolithic systems into a more modular stack of reusable components, open standards, such as</span> <span style="font-weight: 400;">Apache Arrow</span><span style="font-weight: 400;">, </span><span style="font-weight: 400;">play an important role for interoperability of these components. To further our efforts in creating a more unified data landscape for our systems as well as those in the larger community, we’ve partnered with</span> <span style="font-weight: 400;">Voltron Data</span><span style="font-weight: 400;"> and the Arrow community to converge Apache Arrow’s open source columnar layouts with Velox, Meta’s open source execution engine.</span></p>
<p><span style="font-weight: 400;">The result combines the efficiency and agility offered by Velox with the widely used Apache Arrow standard.</span></p>
<h2><span style="font-weight: 400;">Why we need a composable data management system</span></h2>
<p><span style="font-weight: 400;">Meta’s data engines support large-scale workloads that include processing large datasets offline (ETL), interactive dashboard generation, ad hoc data exploration, and stream processing. More recently, a variety of feature engineering, data preprocessing, and training systems were built to support our rapidly expanding AI/ML infrastructure. To ensure our engineering teams can efficiently maintain and enhance these engines as our products evolve, Meta has started a series of projects aimed at increasing our engineering efficiency by minimizing the duplication of work, improving the experience of internal data users through more consistent semantics across these engines, and, ultimately, accelerating the pace of innovation in data management. </span></p>
<h2><span style="font-weight: 400;">An introduction to Velox</span></h2>
<p><span style="font-weight: 400;">Velox</span><span style="font-weight: 400;"> is the first project in our composable data management system program. It’s a unified execution engine, implemented as a C++ library, aimed at replacing the very processing core of many of these data management systems – their execution engine.</span></p>
<p><span style="font-weight: 400;">Velox improves the efficiency of these systems by providing a unified, state-of-the-art implementation of features and optimizations that were previously only available in individual engines. It also improves the engineering efficiency of our organization since these features can now be written once, in a single library, and be (re-)used everywhere.</span></p>
<p><span style="font-weight: 400;">Velox is currently in different stages of integration in more than 10 of Meta’s data systems. We have observed</span> <span style="font-weight: 400;">3-10x efficiency improvements</span><span style="font-weight: 400;"> in integrations with well-known systems in the industry like Apache Spark and Presto. </span></p>
<p><span style="font-weight: 400;">We </span><span style="font-weight: 400;">open-sourced Velox in 2022</span><span style="font-weight: 400;">. Today, it is developed in collaboration with more than 200 individual contributors around the world from more than 20 companies. </span></p>
<h2><span style="font-weight: 400;">Open standards and Apache Arrow</span></h2>
<p><span style="font-weight: 400;">In order to enable interoperability with other components, a composable data management system has to understand common storage (file) formats, network serialization protocols, table APIs, and have a unified way of expressing computation. Oftentimes these components have to directly share in-memory datasets with each other, for example, when transferring data across language boundaries (C++ to Java or Python) for efficient UDF support.</span></p>
<p><span style="font-weight: 400;">Our focus is to use open standards in these APIs as often as possible.</span> <span style="font-weight: 400;">Apache Arrow</span><span style="font-weight: 400;"> is an open source in-memory layout standard for columnar data that has been widely adopted in the industry. In a way, Arrow can be seen as the layer underneath Velox: Arrow describes how columnar data is represented in memory; Velox provides a series of execution and resource management primitives to process this data.</span></p>
<p><span style="font-weight: 400;">Although the Arrow format predates Velox, we made a conscious design decision while creating Velox to extend and deviate from the Arrow format, creating a layout we call</span> <span style="font-weight: 400;">Velox Vectors</span><span style="font-weight: 400;">. The purpose was to accelerate the data processing operations commonly found in our workloads in ways that were not possible using Arrow. Velox Vectors provided the efficiency and agility we need to move fast, but in return created a fragmented space with limited component interoperability. </span></p>
<p><span style="font-weight: 400;">To bridge this gap and create a more unified data landscape for our systems and the community, we partnered with</span> <span style="font-weight: 400;">Voltron Data</span><span style="font-weight: 400;"> and the Arrow community to align and converge these two formats. After a year of work, the new Apache Arrow release,</span> <span style="font-weight: 400;">Apache Arrow 15.0.0</span><span style="font-weight: 400;">, includes three new format layouts inspired by Velox Vectors: StringView, ListView, and Run-End-Encoding (REE).</span></p>
<p><span style="font-weight: 400;">Arrow 15 not only enables efficient (zero-copy) in-memory communication across components using Velox and Arrow, but also increases Arrow’s applicability in modern execution engines, unlocking a variety of use cases across the industry. </span></p>
<h2><span style="font-weight: 400;">Details of the Arrow and Velox layout</span></h2>
<p><span style="font-weight: 400;">Both Arrow and Velox Vectors are columnar layouts whose purpose is to represent batches of data in memory. A column is usually composed of a sequential buffer where row values are stored contiguously and an optional bitmask to represent the nullability/validity of each value: </span></p>
<p>(a) Logical and (b) physical representation of an example dataset.</p>
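<p>As a toy illustration in plain Python (a sketch, not the actual Arrow or Velox C++ code), the physical representation above, a contiguous values buffer plus an optional validity bitmask, might look like this:</p>

```python
# Toy model of a nullable column: a contiguous values buffer plus a
# validity bitmask with one bit per row (1 = valid, 0 = null).
values = [7, 0, 3, 0, 9]           # values buffer; null slots hold junk data
validity = 0b10101                 # bit i set => row i is valid (LSB = row 0)

def get(i):
    """Return the value at row i, or None if its validity bit is unset."""
    if (validity >> i) & 1:
        return values[i]
    return None

print([get(i) for i in range(5)])  # [7, None, 3, None, 9]
```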
<p><span style="font-weight: 400;">The Arrow and Velox Vectors formats already had compatible layout representations for scalar fixed-size data types (such as integers, floats, and booleans) and dictionary-encoded data. However, there were incompatibilities in string representation and container types such as arrays and maps, and a lack of support for constant and run-length-encoded (RLE) data.</span></p>
<h3><span style="font-weight: 400;">StringView – strings</span></h3>
<p><span style="font-weight: 400;">Arrow’s typical string representation uses the</span> <span style="font-weight: 400;">variable-sized element layout</span><span style="font-weight: 400;">, which consists of one contiguous buffer containing the string contents (the data), and one buffer marking where each string starts (the offsets). The size of a string </span><span style="font-weight: 400;">i</span><span style="font-weight: 400;"> can be obtained by subtracting </span><span style="font-weight: 400;">offsets[i]</span><span style="font-weight: 400;"> from </span><span style="font-weight: 400;">offsets[i+1]</span><span style="font-weight: 400;">. This is equivalent to representing strings as an array of characters:</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21003" src="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-2.png?w=1018" alt="" width="1018" height="436" srcset="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-2.png 1018w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-2.png?resize=916,392 916w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-2.png?resize=768,329 768w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-2.png?resize=96,41 96w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-2.png?resize=192,82 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Arrow original string representation.</p>
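<p>A minimal Python sketch of this classic layout (illustrative only, not the Arrow implementation): one contiguous data buffer and an offsets buffer of length n+1, where string i spans data[offsets[i]:offsets[i+1]].</p>

```python
# Toy model of Arrow's classic variable-sized string layout.
data = b"hellohiworld"             # all string contents, stored contiguously
offsets = [0, 5, 7, 12]            # string i spans data[offsets[i]:offsets[i+1]]

def string_at(i):
    """The size of string i is offsets[i+1] - offsets[i]."""
    return data[offsets[i]:offsets[i + 1]].decode()

print([string_at(i) for i in range(3)])  # ['hello', 'hi', 'world']
```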
<p><span style="font-weight: 400;">While Arrow’s representation stands out in simplicity, we found through a series of experiments that the following alternate string representation (which is now referred to as </span><span style="font-weight: 400;">StringView</span><span style="font-weight: 400;">) provides compelling properties that are important for efficient string processing:</span><span style="font-weight: 400;"> </span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21004" src="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-3.png?w=1024" alt="" width="1024" height="326" srcset="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-3.png 1048w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-3.png?resize=916,292 916w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-3.png?resize=768,245 768w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-3.png?resize=1024,326 1024w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-3.png?resize=96,31 96w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-3.png?resize=192,61 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>New StringView representation in Arrow 15.</p>
<p><span style="font-weight: 400;">In the</span> <span style="font-weight: 400;">new representation</span><span style="font-weight: 400;">, the first four bytes of the </span><span style="font-weight: 400;">view</span><span style="font-weight: 400;"> object always contain the string size. If the string is short (up to 12 characters), the contents are stored inline in the view structure. Otherwise, a prefix of the string is stored in the next four bytes, followed by the buffer ID (StringViews can contain multiple data buffers) and the offset in that data buffer.</span></p>
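<p>To make the 16-byte view object concrete, here is a toy Python sketch of the packing described above (the exact field order and byte widths here are assumptions for illustration; the authoritative layout is the Arrow format specification):</p>

```python
import struct

# Toy sketch of a 16-byte StringView "view" object: a 4-byte size, then
# either the string inlined (size <= 12) or a 4-byte prefix plus a 4-byte
# buffer id and a 4-byte offset into that data buffer.
def make_view(s: bytes, buffer_id: int = 0, offset: int = 0) -> bytes:
    if len(s) <= 12:
        # Short string: contents stored fully inline, zero-padded to 12 bytes.
        return struct.pack("<I12s", len(s), s)
    # Long string: prefix + (buffer id, offset) pointing into a data buffer.
    return struct.pack("<I4sII", len(s), s[:4], buffer_id, offset)

short = make_view(b"hi")
long_ = make_view(b"a string longer than twelve bytes", buffer_id=1, offset=64)
print(len(short), len(long_))  # both views are a fixed 16 bytes
```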
<p><span style="font-weight: 400;">The benefits of this layout are:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Small strings of up to 12 bytes are fully inlined within the views buffer and can be read without dereferencing the data buffer. This increases memory locality as the typical cache miss of accessing the data buffer is avoided, increasing performance.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Since StringViews store a small (four bytes) prefix with the view object, string comparisons can fail-fast and, in many cases, avoid accessing the data buffer. This property speeds up common operations such as highly selective filters and sorting.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">StringView gives developers more flexibility on how string data is laid out in memory. For example, it allows for certain common string operations, such as </span><span style="font-weight: 400;">trim()</span><span style="font-weight: 400;"> and </span><span style="font-weight: 400;">substr()</span><span style="font-weight: 400;">, to be executed zero-copy by only updating the view object.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Since StringView’s view object has a fixed size (16 bytes), StringViews can be written out of order (e.g., first writing StringView at position 2, then 0 and 1). </span></li>
</ol>
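<p>Property 2 above, the prefix fail-fast comparison, can be sketched in plain Python (a simplified model, with views as (size, prefix) pairs and a hypothetical load_full callback standing in for the data-buffer lookup):</p>

```python
# Toy sketch of prefix fail-fast: because every view carries its size and a
# 4-byte prefix inline, most unequal comparisons never touch the data buffer.
def views_equal(a, b, load_full):
    """a, b: (size, prefix) pairs; load_full lazily fetches the full string."""
    size_a, prefix_a = a
    size_b, prefix_b = b
    if size_a != size_b or prefix_a != prefix_b:
        return False                      # fail fast: no data-buffer access
    return load_full(a) == load_full(b)   # only now dereference the buffer

a = (5, b"worl")
b = (5, b"hell")
print(views_equal(a, b, load_full=lambda v: None))  # False; buffer never read
```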
<p><span style="font-weight: 400;">Besides these properties, we have found that other modern processing engines and libraries like </span><span style="font-weight: 400;">Umbra</span><span style="font-weight: 400;"> and DuckDB follow a similar string representation approach and, consequently, had also deviated from Arrow. In Arrow 15, StringView has been added as a supported layout and can now be used to efficiently transfer string batches across these systems.</span></p>
<h3><span style="font-weight: 400;">ListView – variable-sized containers</span></h3>
<p><span style="font-weight: 400;">Variable-size containers like arrays and maps are</span> <span style="font-weight: 400;">represented in Arrow</span><span style="font-weight: 400;"> using one buffer containing the flattened elements from all rows, and one </span><span style="font-weight: 400;">offsets</span><span style="font-weight: 400;"> buffer marking where the container on each row starts, similar to the original string representation. The number of elements a container on row </span><span style="font-weight: 400;">i</span><span style="font-weight: 400;"> stores can be obtained by subtracting </span><span style="font-weight: 400;">offsets[i]</span><span style="font-weight: 400;"> from </span><span style="font-weight: 400;">offsets[i+1]</span><span style="font-weight: 400;">:</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21005" src="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-4.png?w=883" alt="" width="883" height="277" srcset="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-4.png 883w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-4.png?resize=768,241 768w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-4.png?resize=96,30 96w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-4.png?resize=192,60 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Arrow original list representation.</p>
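<p>The same offsets idea applied to lists, again as a toy Python model rather than the actual Arrow code: a flattened elements buffer and an offsets buffer, where list i spans elements[offsets[i]:offsets[i+1]].</p>

```python
# Toy model of Arrow's classic list layout.
elements = [1, 2, 3, 4, 5, 6]      # all rows' elements, flattened contiguously
offsets = [0, 2, 2, 6]             # equal adjacent offsets => empty list (row 1)

def list_at(i):
    return elements[offsets[i]:offsets[i + 1]]

print([list_at(i) for i in range(3)])  # [[1, 2], [], [3, 4, 5, 6]]
```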
<p><span style="font-weight: 400;">To efficiently support execution of </span><span style="font-weight: 400;">vectorized conditionals</span><span style="font-weight: 400;"> (e.g., IF and SWITCH operations), the Velox Vectors layout has to allow developers to write columns out of order. This means that developers can, for example, first write all even row records then all odd row records without having to reorganize elements that have already been written.</span></p>
<p><span style="font-weight: 400;">Primitive types can always be written out of order since the element size is constant and known beforehand. Likewise, strings can also be written out of order using StringView because the string metadata objects have a constant size (16 bytes), and string contents do not need to be written contiguously. To increase flexibility and support out-of-order writes for the remaining variable-sized types in Velox, we decided to keep both </span><span style="font-weight: 400;">lengths</span><span style="font-weight: 400;"> and </span><span style="font-weight: 400;">offsets</span><span style="font-weight: 400;"> buffers:</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21006" src="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-5.png?w=954" alt="" width="954" height="319" srcset="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-5.png 954w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-5.png?resize=916,306 916w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-5.png?resize=768,257 768w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-5.png?resize=96,32 96w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-5.png?resize=192,64 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>New ListView representation in Arrow 15.</p>
<p><span style="font-weight: 400;">To bridge the gap, a new format called ListView has been added to Arrow 15. It allows the representation of variable-sized elements that have both lengths and offsets buffers.</span></p>
<p><span style="font-weight: 400;">Beyond allowing for efficient execution of conditionals, ListView gives developers more flexibility to slice and rearrange containers (e.g., operations like </span><span style="font-weight: 400;">slice()</span><span style="font-weight: 400;"> and </span><span style="font-weight: 400;">trim_array()</span><span style="font-weight: 400;"> can be implemented zero-copy), as well as allowing for containers with overlapping ranges of elements.</span></p>
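<p>The out-of-order write that motivates ListView can be sketched in plain Python (an illustrative model, not the Velox or Arrow implementation): because each row stores both an offset and a size, rows can be written in any order without reshuffling elements already placed in the buffer.</p>

```python
# Toy sketch of the ListView layout: separate offsets and sizes buffers
# let rows be written out of order, e.g., even rows first, then odd rows.
elements, offsets, sizes = [], [0, 0, 0, 0], [0, 0, 0, 0]

def write_row(row, values):
    offsets[row] = len(elements)   # wherever the element buffer currently ends
    sizes[row] = len(values)
    elements.extend(values)

rows = [[10, 11], [20, 21, 22], [30], []]
for row in (0, 2, 1, 3):           # deliberately out-of-order write order
    write_row(row, rows[row])

def list_at(i):
    return elements[offsets[i]:offsets[i] + sizes[i]]

print([list_at(i) for i in range(4)])  # [[10, 11], [20, 21, 22], [30], []]
```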
<h3><span style="font-weight: 400;">REE – more encodings</span></h3>
<p><span style="font-weight: 400;">We have also added two additional encoding formats commonly found in data warehouse workloads into Velox: constant encoding, to represent that all values in a column are the same, typically used to represent literals and partition keys; and RLE, to compactly represent consecutive runs of the same element.</span></p>
<p><span style="font-weight: 400;">Upon discussion with the community, it was decided to add the REE format to Arrow. The REE format is a slight variation of RLE that, instead of storing the lengths of each run, stores the offset in which each run ends, providing better random-access support. With REEs it is also possible to represent constant encoded values by encoding them as a single run whose size is the entire batch.</span></p>
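<p>A minimal Python sketch of REE random access (illustrative only): because run ends are sorted, looking up a logical row is a binary search over the run_ends buffer.</p>

```python
import bisect

# Toy sketch of run-end encoding (REE): store each run's *end offset* rather
# than its length, so random access becomes a binary search over run_ends.
run_ends = [3, 5, 9]               # logical rows 0-2, 3-4, and 5-8
run_values = ["a", "b", "a"]       # nine logical values held in three runs

def value_at(i):
    """Find the first run whose end offset exceeds logical row i."""
    return run_values[bisect.bisect_right(run_ends, i)]

print([value_at(i) for i in range(9)])
# ['a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'a']
```

As the text notes, a constant column falls out for free: run_ends = [9] with a single run value represents nine identical rows.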
<h2><span style="font-weight: 400;">Composability is the future of data management</span></h2>
<p><span style="font-weight: 400;">Converging Arrow and Velox’s memory layout is an important step towards making data management systems more composable. It enables systems to combine the power of Velox’s state-of-the-art execution with the widespread industry adoption of Arrow’s standard, resulting in more efficient and seamless cooperation. The new extensions are already seeing adoption in libraries like </span><span style="font-weight: 400;">PyArrow</span><span style="font-weight: 400;"> and </span><span style="font-weight: 400;">Polars</span><span style="font-weight: 400;"> and within Meta. In the future, it will allow more efficient interplay between projects like </span><span style="font-weight: 400;">Apache Gluten</span><span style="font-weight: 400;"> (which uses Velox internally) and </span><span style="font-weight: 400;">PySpark</span><span style="font-weight: 400;"> (which consumes Arrow), for example.</span></p>
<p><span style="font-weight: 400;">We envision that fragmentation and duplication of work can be reduced by decomposing data systems into reusable components which are open source and built based on open standards and APIs. Ultimately, we hope this work will help provide the foundation required to accelerate the pace of innovation in data management.</span></p>
<h2><span style="font-weight: 400;">Acknowledgments</span></h2>
<p><span style="font-weight: 400;">This format alignment was only possible due to a broad collaboration across different groups. A special thank you to Masha Basmanova, Orri Erling, Xiaoxuan Meng, Krishna Pai, Jimmy Lu, Kevin Wilfong, Laith Sakka, Wei He, Bikramjeet Vig, and Sridhar Anumandla from the Velox team at Meta; Felipe Carvalho, Ben Kietzman, Jacob Wujciak-Jens, Srikanth Nadukudy, Wes McKinney, and Keith Kraus from Voltron Data; and the entire Apache Arrow community for the insightful discussions, feedback, and receptivity to new ideas.</span></p>The post <a href="https://dailyzsocialmedianews.com/aligning-velox-and-apache-arrow-in-the-direction-of-composable-knowledge-administration/">Aligning Velox and Apache Arrow: Towards composable data management</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Meta loves Python &#8211; Engineering at Meta</title>
		<link>https://dailyzsocialmedianews.com/meta-loves-python-engineering-at-meta/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Mon, 12 Feb 2024 16:58:40 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Engineering]]></category>
		<category><![CDATA[loves]]></category>
		<category><![CDATA[Meta]]></category>
		<category><![CDATA[Python]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24687</guid>

					<description><![CDATA[<p>By now you’re already aware that Python 3.12 has been released. But did you know that several of its new features were developed by Meta? Meta engineer Pascal Hartig (@passy) is joined on the Meta Tech Podcast by Itamar Oren and Carl Meyer, two software engineers at Meta, to discuss their teams’ contributions to the [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/meta-loves-python-engineering-at-meta/">Meta loves Python – Engineering at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<p></p>
<p>By now you’re already aware that Python 3.12 has been released. But did you know that several of its new features were developed by Meta?</p>
<p>Meta engineer Pascal Hartig (@passy) is joined on the Meta Tech Podcast by Itamar Oren and Carl Meyer, two software engineers at Meta, to discuss their teams’ contributions to the latest Python release, including new hooks that allow for custom JITs like Cinder, Immortal Objects, improvements to the type system, faster comprehensions, and more.</p>
<p>Learn how and why they built these new features for Python and how they worked with and engaged with the Python community.</p>
<p>Download or listen to the episode below:</p>
<p><iframe loading="lazy" style="border: none;" title="Libsyn Player" src="https://html5-player.libsyn.com/embed/episode/id/29730333/height/90/theme/custom/thumbnail/yes/direction/forward/render-playlist/no/custom-color/000000/" width="100%" height="90" scrolling="no" allowfullscreen="allowfullscreen"></iframe></p>
<p>You can also find the episode wherever you get your podcasts, including:</p>
<p>Spotify<br />Apple Podcasts<br />PocketCasts<br />Castro<br />Overcast</p>
<p>The Meta Tech Podcast, brought to you by Meta, highlights the work Meta’s engineers are doing at every level – from low-level frameworks to end-user features.</p>
<p>Send us feedback on Instagram, Threads, or X.</p>
<p>And if you’re interested in learning more about career opportunities at Meta visit the Meta Careers page.</p>The post <a href="https://dailyzsocialmedianews.com/meta-loves-python-engineering-at-meta/">Meta loves Python – Engineering at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
