<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>learning | DAILY ZSOCIAL MEDIA NEWS</title>
	<atom:link href="https://dailyzsocialmedianews.com/tag/learning/feed/" rel="self" type="application/rss+xml" />
	<link>https://dailyzsocialmedianews.com</link>
	<description>ALL ABOUT DAILY ZSOCIAL MEDIA NEWS</description>
	<lastBuildDate>Thu, 21 Mar 2024 01:34:52 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.7.1</generator>

<image>
	<url>https://dailyzsocialmedianews.com/wp-content/uploads/2020/12/cropped-DAILY-ZSOCIAL-MEDIA-NEWS-e1607166156946-32x32.png</url>
	<title>learning | DAILY ZSOCIAL MEDIA NEWS</title>
	<link>https://dailyzsocialmedianews.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Optimizing RTC bandwidth estimation with machine learning</title>
		<link>https://dailyzsocialmedianews.com/optimizing-rtc-bandwidth-estimation-with-machine-studying/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Thu, 21 Mar 2024 01:34:51 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[bandwidth]]></category>
		<category><![CDATA[estimation]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[Machine]]></category>
		<category><![CDATA[Optimizing]]></category>
		<category><![CDATA[RTC]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24934</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="1023" height="576" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Optimizing RTC bandwidth estimation with machine learning" decoding="async" fetchpriority="high" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning.png 1023w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning-300x169.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning-768x432.png 768w" sizes="(max-width: 1023px) 100vw, 1023px" /></div><p>Bandwidth estimation (BWE) and congestion control play an important role in delivering high-quality real-time communication (RTC) across Meta’s family of apps. We’ve adopted a machine learning (ML)-based approach that allows us to solve networking problems holistically across cross-layers such as BWE, network resiliency, and transport. We’re sharing our experiment results from this approach, some of [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/optimizing-rtc-bandwidth-estimation-with-machine-studying/">Optimizing RTC bandwidth estimation with machine learning</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="1023" height="576" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Optimizing RTC bandwidth estimation with machine learning" decoding="async" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning.png 1023w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning-300x169.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/21013450/Optimizing-RTC-bandwidth-estimation-with-machine-learning-768x432.png 768w" sizes="(max-width: 1023px) 100vw, 1023px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Bandwidth estimation (BWE) and congestion control play an important role in delivering high-quality real-time communication (RTC) across Meta’s family of apps.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We’ve adopted a machine learning (ML)-based approach that allows us</span><span style="font-weight: 400;"> to solve networking problems holistically across layers such as BWE, network resiliency, and transport.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We’re sharing our experiment results from this approach, some of the challenges we encountered during execution, and learnings for new adopters.</span></li>
</ul>
<p><span style="font-weight: 400;">Our existing bandwidth estimation (BWE) module at Meta is</span> <span style="font-weight: 400;">based on WebRTC’s Google Congestion Controller (GCC)</span><span style="font-weight: 400;">. We have made several improvements through parameter tuning, but this has resulted in a more complex system, as shown in Figure 1.</span></p>
<p>Figure 1: BWE module’s system diagram for congestion control in RTC.</p>
<p><span style="font-weight: 400;">One challenge with the tuned congestion control (CC)/BWE algorithm was that it had multiple parameters and actions that were dependent on network conditions. For example, there was a trade-off between quality and reliability; improving quality for high-bandwidth users often led to reliability regressions for low-bandwidth users, and vice versa, making it challenging to optimize the user experience for different network conditions.</span></p>
<p><span style="font-weight: 400;">Additionally, we noticed some inefficiencies in improving and maintaining the complex BWE module:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Due to the absence of realistic network conditions during our experimentation process, fine-tuning the parameters for user clients necessitated several attempts.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Even after the rollout, it wasn’t clear if the optimized parameters were still applicable for the targeted network types.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">This resulted in complex code logic and branches for engineers to maintain.</span></li>
</ol>
<p><span style="font-weight: 400;">To solve these inefficiencies, we developed a machine learning (ML)-based, network-targeting approach that offers a cleaner alternative to hand-tuned rules. This approach also allows us to solve networking problems holistically across layers such as BWE, network resiliency, and transport.</span></p>
<h2><span style="font-weight: 400;">Network characterization</span></h2>
<p><span style="font-weight: 400;">An ML model-based approach leverages time series data to improve the bandwidth estimation by using offline parameter tuning for characterized network types. </span></p>
<p><span style="font-weight: 400;">For an RTC call to be completed, the endpoints must be connected to each other through network devices. The optimal configs that have been tuned offline are stored on the server and can be updated in real-time. During the call connection setup, these optimal configs are delivered to the client. During the call, media is transferred directly between the endpoints or through a relay server. Depending on the network signals collected during the call, an ML-based approach characterizes the network into different types and applies the optimal configs for the detected type.</span></p>
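<p>To make the flow above concrete, here is a minimal Python sketch of the server-side config lookup. The network-type names and parameter values are illustrative assumptions, not Meta’s actual configuration.</p>

```python
# Hypothetical store of offline-tuned configs, keyed by characterized
# network type. Type names and parameter values are illustrative.
OPTIMAL_CONFIGS = {
    "random_loss": {"loss_tolerance_pct": 10, "fec_ratio": 0.2},
    "bursty_loss": {"loss_tolerance_pct": 2, "fec_ratio": 0.1},
    "default": {"loss_tolerance_pct": 5, "fec_ratio": 0.0},
}

def configs_for_call(detected_type: str) -> dict:
    """Return the tuned parameters for the characterized network type,
    falling back to the default config for unknown types."""
    return OPTIMAL_CONFIGS.get(detected_type, OPTIMAL_CONFIGS["default"])
```

<p>During call setup the client would receive these values and, as the model re-characterizes the network mid-call, switch to the config for the newly detected type.</p>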
<p><span style="font-weight: 400;">Figure 2 illustrates an example of an RTC call that’s optimized using the ML-based approach. </span><span style="font-weight: 400;"> </span></p>
<p><img decoding="async" class="size-large wp-image-21120" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-2.png?resize=192,108 192w" sizes="(max-width: 992px) 100vw, 62vw"/>Figure 2: An example RTC call configuration with optimized parameters delivered from the server and based on the current network type.</p>
<h2><span style="font-weight: 400;">Model learning and offline parameter tuning</span></h2>
<p><span style="font-weight: 400;">On a high level, network characterization consists of two main components, as shown in Figure 3. The first component is offline ML model learning using ML to categorize the network type (random packet loss versus bursty loss). The second component uses offline simulations to tune parameters optimally for the categorized network type. </span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21121" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-3.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 3: Offline ML-model learning and parameter tuning.</p>
<p><span style="font-weight: 400;">For model learning, we leverage the time series data (network signals and non-personally identifiable information, see Figure 6, below) from production calls and simulations. Compared to the aggregate metrics logged after the call, time series captures the time-varying nature of the network and dynamics. We use</span><span style="font-weight: 400;"> FBLearner</span><span style="font-weight: 400;">, our internal AI stack, for the training pipeline and deliver the PyTorch model files on demand to the clients at the start of the call.</span></p>
<p><span style="font-weight: 400;">For offline tuning, we use simulations to run network profiles for the detected types and choose the optimal parameters for the modules based on improvements in technical metrics (such as quality, freezes, and so on).</span></p>
<h2><span style="font-weight: 400;">Model architecture</span></h2>
<p><span style="font-weight: 400;">From our experience, we’ve found it necessary to combine time series features with non-time series features (i.e., metrics derived from the time window) for highly accurate modeling.</span></p>
<p><span style="font-weight: 400;">To handle both time series and non-time series data, we’ve designed a model architecture that can process input from both sources.</span></p>
<p><span style="font-weight: 400;">The time series data will pass through a</span> <span style="font-weight: 400;">long short-term memory (LSTM) layer</span><span style="font-weight: 400;"> that will convert time series input into a one-dimensional vector representation, such as 16×1. The non-time series data or dense data will pass through a dense layer (i.e., a fully connected layer). Then the two vectors will be concatenated, to fully represent the network condition in the past, and passed through a fully connected layer again. The final output from the neural network model will be the predicted output of the target/task, as shown in Figure 4. </span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21122" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-4.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 4: Combined-model architecture with LSTM and Dense Layers</p>
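<p>Since the article notes that PyTorch model files are delivered to clients, this combined architecture can be sketched in PyTorch as follows. Only the 16-dimensional LSTM output vector is stated in the text; all other layer sizes, the binary output, and the ReLU choice are illustrative assumptions.</p>

```python
import torch
import torch.nn as nn

class CombinedNet(nn.Module):
    """LSTM branch for time series data plus a dense branch for derived
    (non-time-series) metrics, concatenated and passed through a final
    fully connected layer. Sizes other than the 16-dim LSTM output are
    illustrative assumptions, not the production model's."""

    def __init__(self, ts_features: int, dense_features: int, num_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(ts_features, hidden_size=16, batch_first=True)
        self.dense = nn.Linear(dense_features, 16)
        self.head = nn.Linear(16 + 16, num_classes)

    def forward(self, ts: torch.Tensor, dense: torch.Tensor) -> torch.Tensor:
        # ts: (batch, time, ts_features) -> last hidden state (batch, 16)
        _, (h_n, _) = self.lstm(ts)
        ts_vec = h_n[-1]
        dense_vec = torch.relu(self.dense(dense))
        # Concatenate both views of the network's recent past.
        return self.head(torch.cat([ts_vec, dense_vec], dim=1))

model = CombinedNet(ts_features=8, dense_features=4)
logits = model(torch.randn(2, 10, 8), torch.randn(2, 4))
```

<p>Here a 10-second window at one sample per second yields the (batch, 10, features) time series input, mirroring the N = 10 window used in the classification task below.</p>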
<h2><span style="font-weight: 400;">Use case: Random packet loss classification</span></h2>
<p><span style="font-weight: 400;">Let’s consider the use case of categorizing packet loss as either random or congestion-induced. The former is caused by unreliable network components, while the latter is caused by limits in queue length (which are delay dependent). Here is the ML task definition:</span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;">Given the network conditions in the past N seconds (here, N = 10), and that the network is currently incurring packet loss, the goal is to characterize the packet loss at the current timestamp as RANDOM or not.</span></p>
<p><span style="font-weight: 400;">Figure 5 illustrates how we leverage the architecture to achieve that goal:</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21123" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-5.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 5: Model architecture for a random packet loss classification task.</p>
<h3><span style="font-weight: 400;">Time series features</span></h3>
<p><span style="font-weight: 400;">We leverage the following time series features gathered from logs:</span></p>
<p><img loading="lazy" decoding="async" class="wp-image-21136 size-large" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png 2500w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=916,515 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=2048,1152 2048w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-6b.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 6: Time series features used for model training.</p>
<h3><span style="font-weight: 400;">BWE optimization</span></h3>
<p><span style="font-weight: 400;">When the ML model detects random packet loss, we perform local optimization on the BWE module by:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Increasing the tolerance to random packet loss in the loss-based BWE (holding the bitrate).</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Increasing the ramp-up speed, depending on the link capacity on high bandwidths.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Increasing the network resiliency by sending additional forward-error correction packets to recover from packet loss.</span></li>
</ul>
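<p>Sketched as code, the three adjustments above might look like this. The parameter names and magnitudes are hypothetical; the article does not state the actual values used in production.</p>

```python
def apply_random_loss_optimizations(bwe_params: dict) -> dict:
    """Hypothetical local BWE adjustments applied when the ML model
    detects random (non-congestion) packet loss; names and factors
    are illustrative assumptions."""
    tuned = dict(bwe_params)
    # 1. Tolerate more random loss before the loss-based BWE cuts bitrate.
    tuned["loss_tolerance_pct"] *= 2
    # 2. Ramp up faster toward the link capacity on high bandwidths.
    tuned["rampup_factor"] *= 1.5
    # 3. Send extra forward-error-correction packets for resiliency.
    tuned["fec_ratio"] = min(tuned["fec_ratio"] + 0.1, 0.5)
    return tuned

base = {"loss_tolerance_pct": 5, "rampup_factor": 1.0, "fec_ratio": 0.1}
tuned = apply_random_loss_optimizations(base)
```

<p>Returning a copy rather than mutating the input keeps the baseline config intact, so the client can revert when the detected network type changes mid-call.</p>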
<h2><span style="font-weight: 400;">Network prediction</span></h2>
<p><span style="font-weight: 400;">The network characterization problem discussed in the previous sections focuses on classifying network types based on past information using time series data. Such simple classification tasks can also be achieved with hand-tuned rules, albeit with some limitations. The real power of leveraging ML for networking, however, comes from using it to predict future network conditions.</span></p>
<p><span style="font-weight: 400;">We have applied ML for solving congestion-prediction problems for optimizing low-bandwidth users’ experience.</span></p>
<h2><span style="font-weight: 400;">Congestion prediction</span></h2>
<p><span style="font-weight: 400;">From our analysis of production data, we found that low-bandwidth users often incur congestion due to the behavior of the GCC module. By predicting this congestion, we can improve reliability for those users. To that end, we addressed the following problem statement using round-trip time (RTT) and packet loss:</span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;">Given the historical time-series data from production/simulation (“N” seconds), the goal is to predict packet loss due to congestion or the congestion itself in the next “N” seconds; that is, a spike in RTT followed by a packet loss or a further growth in RTT.</span></p>
<p><span style="font-weight: 400;">Figure 7 shows an example from a simulation where the bandwidth alternates between 500 Kbps and 100 Kbps every 30 seconds. As we lower the bandwidth, the network incurs congestion and the ML model predictions fire the green spikes even before the delay spikes and packet loss occur. This early prediction of congestion is helpful in faster reactions and thus improves the user experience by preventing video freezes and connection drops.</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21137" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png 2500w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=916,515 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=2048,1152 2048w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-7b.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 7: Simulated network scenario with alternating bandwidth for congestion prediction</p>
<h2><span style="font-weight: 400;">Generating training samples</span></h2>
<p><span style="font-weight: 400;">The main challenge in modeling is generating training samples for a variety of congestion situations. With simulations, it’s harder to capture different types of congestion that real user clients would encounter in production networks. As a result, we used actual production logs for labeling congestion samples, following the RTT-spikes criteria in the past and future windows according to the following assumptions:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Absent past RTT spikes, packet losses in the past and future are independent.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Absent past RTT spikes, we cannot predict future RTT spikes or fractional losses (i.e., flosses).</span></li>
</ul>
<p><span style="font-weight: 400;">We split the time window into past (4 seconds) and future (4 seconds) for labeling.</span><span style="font-weight: 400;"><br /></span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21126" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-8.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 8: Labeling criteria for congestion prediction</p>
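<p>A pure-Python sketch of the labeling rule implied by these assumptions follows. The RTT-spike threshold and the one-sample-per-second layout are illustrative; the article only specifies the 4-second past/future split.</p>

```python
def label_congestion_sample(rtt_ms, spike_threshold_ms=300, window=4):
    """Split an RTT trace (assumed one sample per second) into a past
    and a future window, and label the sample positive when a past RTT
    spike is followed by a spike in the future window. The 300 ms
    threshold is an illustrative assumption."""
    past, future = rtt_ms[:window], rtt_ms[window:2 * window]
    past_spike = max(past) > spike_threshold_ms
    future_spike = max(future) > spike_threshold_ms
    # Per the assumptions above, absent a past spike the future cannot
    # be predicted, so only past-spike samples can be labeled positive.
    return past_spike and future_spike

trace = [80, 90, 450, 500, 600, 700, 650, 400]  # RTT in ms, 1 Hz
```
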
<h2><span style="font-weight: 400;">Model performance</span></h2>
<p><span style="font-weight: 400;">Unlike network characterization, where ground truth is unavailable, for congestion prediction we can obtain ground truth by examining the future time window after it has passed and then comparing it with the prediction made four seconds earlier. With this logging information gathered from real production clients, we compared the performance in offline training to online data from user clients:</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21127" src="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?w=1024" alt="" width="1024" height="576" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=916,516 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=768,432 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=1024,576 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=1536,864 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Optimizing-BWE-with-ML-Hero_Figure-9.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 9: Offline versus online model performance comparison.</p>
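<p>The comparison itself reduces to scoring each delayed ground-truth label against the prediction made four seconds earlier, roughly as in this sketch (plain accuracy is used here for illustration; the article does not name the metric):</p>

```python
def delayed_accuracy(predictions, outcomes):
    """Score congestion predictions against the ground truth observed
    once each 4-second future window has elapsed (illustrative sketch)."""
    assert len(predictions) == len(outcomes)
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(predictions)
```
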
<h2><span style="font-weight: 400;">Experiment results</span></h2>
<p><span style="font-weight: 400;">Here are some highlights from our deployment of various ML models to improve bandwidth estimation:</span></p>
<h3><span style="font-weight: 400;">Reliability wins for congestion prediction</span></h3>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></span> <span style="font-weight: 400;">connection_drop_rate -0.326371 +/- 0.216084<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> last_minute_quality_regression_v1 -0.421602 +/- 0.206063<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> last_minute_quality_regression_v2 -0.371398 +/- 0.196064<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> bad_experience_percentage -0.230152 +/- 0.148308<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> transport_not_ready_pct -0.437294 +/- 0.400812</span></p>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></span><span style="font-weight: 400;"> peer_video_freeze_percentage -0.749419 +/- 0.180661<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> peer_video_freeze_percentage_above_500ms -0.438967 +/- 0.212394</span></p>
<h3><span style="font-weight: 400;">Quality and user engagement wins for random packet loss characterization in high bandwidth</span></h3>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></span><span style="font-weight: 400;"> peer_video_freeze_percentage -0.379246 +/- 0.124718<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> peer_video_freeze_percentage_above_500ms -0.541780 +/- 0.141212<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> peer_neteq_plc_cng_perc -0.242295 +/- 0.137200</span></p>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> total_talk_time 0.154204 +/- 0.148788</span></p>
<h3><span style="font-weight: 400;">Reliability and quality wins for cellular low bandwidth classification</span></h3>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> connection_drop_rate -0.195908 +/- 0.127956<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> last_minute_quality_regression_v1 -0.198618 +/- 0.124958<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> last_minute_quality_regression_v2 -0.188115 +/- 0.138033</span></p>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> peer_neteq_plc_cng_perc -0.359957 +/- 0.191557<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> peer_video_freeze_percentage -0.653212 +/- 0.142822</span></p>
<h3><span style="font-weight: 400;">Reliability and quality wins for cellular high bandwidth classification</span></h3>
<p><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> avg_sender_video_encode_fps 0.152003 +/- 0.046807<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> avg_sender_video_qp -0.228167 +/- 0.041793<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> avg_video_quality_score 0.296694 +/- 0.043079<br /></span><span style="font-weight: 400;"><img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> avg_video_sent_bitrate 0.430266 +/- 0.092045</span></p>
<h2><span style="font-weight: 400;">Future plans for applying ML to RTC</span></h2>
<p><span style="font-weight: 400;">From our project execution and experimentation on production clients, we noticed that an ML-based approach is more efficient in targeting, end-to-end monitoring, and updating than traditional hand-tuned rules for networking. However, the efficiency of ML solutions largely depends on data quality and labeling (using simulations or production logs). By applying ML-based solutions to network prediction problems – congestion in particular – we fully leveraged the power of ML. </span></p>
<p><span style="font-weight: 400;">In the future, we will consolidate all the network characterization models into a single model using a multi-task approach, removing the redundancy in model download, inference, and so on. We will build a shared representation model for the time series that solves different network characterization tasks (e.g., bandwidth classification, packet loss classification, etc.). We will focus on building realistic production network scenarios for model training and validation, which will enable us to use ML to identify optimal network actions given the network conditions. And we will continue refining our learning-based methods to enhance network performance using existing network signals.</span></p>The post <a href="https://dailyzsocialmedianews.com/optimizing-rtc-bandwidth-estimation-with-machine-studying/">Optimizing RTC bandwidth estimation with machine learning</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Improving machine learning iteration speed with faster application build and packaging</title>
		<link>https://dailyzsocialmedianews.com/bettering-machine-studying-iteration-velocity-with-sooner-utility-construct-and-packaging/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Mon, 29 Jan 2024 20:26:43 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Application]]></category>
		<category><![CDATA[Build]]></category>
		<category><![CDATA[faster]]></category>
		<category><![CDATA[Improving]]></category>
		<category><![CDATA[iteration]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[Machine]]></category>
		<category><![CDATA[packaging]]></category>
		<category><![CDATA[speed]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24615</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="812" height="800" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Improving machine learning iteration speed with faster application build and packaging" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and.png 812w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and-300x296.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and-768x757.png 768w" sizes="auto, (max-width: 812px) 100vw, 812px" /></div><p>Slow build times and inefficiencies in packaging and distributing execution files were costing our ML/AI engineers a significant amount of time while working on our training stack. By addressing these issues head-on, we were able to reduce this overhead by double-digit percentages.  In the fast-paced world of AI/ML development, it’s crucial to ensure that our [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/bettering-machine-studying-iteration-velocity-with-sooner-utility-construct-and-packaging/">Improving machine learning iteration speed with faster application build and packaging</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="812" height="800" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Improving machine learning iteration speed with faster application build and packaging" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and.png 812w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and-300x296.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and-768x757.png 768w" sizes="auto, (max-width: 812px) 100vw, 812px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Slow build times and inefficiencies in packaging and distributing execution files were costing our ML/AI engineers a significant amount of time while working on our training stack.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">By addressing these issues head-on, we were able to reduce this overhead by double-digit percentages. </span></li>
</ul>
<p><span style="font-weight: 400;">In the fast-paced world of AI/ML development, it’s crucial to ensure that our infrastructure can keep up with the increasing demands and needs of our ML engineers, whose workflows include checking out code, writing code, building, packaging, and verification.</span></p>
<p><span style="font-weight: 400;">In our efforts to maintain efficiency and productivity while empowering our ML/AI engineers to deliver cutting-edge solutions, we found two major challenges that needed to be addressed head-on: slow builds and inefficiencies in packaging and distributing executable files.</span></p>
<p><span style="font-weight: 400;">The frustrating problem of slow builds often arises when ML engineers work on older (“cold”) revisions for which our build infrastructure doesn’t maintain a high cache hit rate, requiring us to repeatedly rebuild and relink many components. Moreover, build non-determinism further contributes to rebuilding by introducing inefficiencies and producing different outputs for the same source code, making previously cached results unusable.</span></p>
<p><span style="font-weight: 400;">Executable packaging and distribution were another significant challenge because, historically, most ML Python executables were represented as</span> <span style="font-weight: 400;">XAR files</span><span style="font-weight: 400;"> (self-contained executables), and it is not always possible to leverage OSS layer-based solutions efficiently (see more details below). Unfortunately, creating such executables can be computationally costly, especially when dealing with a large number of files or substantial file sizes. Even if a developer modifies only a few Python files, a full XAR reassembly and distribution is often required, delaying execution on remote machines.</span></p>
<p><span style="font-weight: 400;">Our goal in improving build speed was to minimize the need for extensive rebuilding. To accomplish this, we streamlined the build graph by reducing dependency counts, mitigated the challenges posed by build non-determinism, and maximized the utilization of built artifacts.</span></p>
<p><span style="font-weight: 400;">Simultaneously, our efforts in packaging and distribution aimed to introduce incrementality support, thereby eliminating the time-consuming overhead associated with XAR creation and distribution.</span></p>
<h2><span style="font-weight: 400;">How we improved build speeds</span></h2>
<p><span style="font-weight: 400;">To make builds faster, we wanted to ensure that we built as little as possible by addressing non-determinism and eliminating unused code and dependencies.</span></p>
<p><span style="font-weight: 400;">We identified two sources of build non-determinism:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1">Non-determinism in tooling.<span style="font-weight: 400;"> Some compilers, such as Clang, Rustc, and NVCC, can produce different binary files for the same input, leading to non-deterministic results. Tackling these tooling non-determinism issues proved challenging, as they often required extensive root cause analysis and time-consuming fixes.</span></li>
<li style="font-weight: 400;" aria-level="1">Non-determinism in source code and build rules.<span style="font-weight: 400;"> Developers, whether intentionally or unintentionally, introduced non-determinism by incorporating things like temporary directories, random values, or timestamps into build rules code. Addressing these issues posed a similar challenge, demanding a substantial investment of time to identify and fix.</span></li>
</ul>
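To see why the second category breaks caching, consider a toy packaging step (the `package` function is illustrative, not one of our actual build rules): mixing a timestamp into the output, or depending on unspecified file ordering, yields a different artifact for identical sources, while sorting inputs and omitting the timestamp makes the output reproducible.

```python
import hashlib

def package(files, timestamp=None):
    """Digest a toy archive built from {path: bytes}."""
    h = hashlib.sha256()
    if timestamp is not None:
        h.update(str(timestamp).encode())  # non-deterministic input: every build differs
    for path in sorted(files):             # sorted: directory iteration order cannot leak in
        h.update(path.encode())
        h.update(files[path])
    return h.hexdigest()

sources = {"b.py": b"print('b')", "a.py": b"print('a')"}
# Identical sources now always produce an identical, cacheable artifact.
reproducible = package(sources) == package(sources)
```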
<p><span style="font-weight: 400;">Thanks to</span> <span style="font-weight: 400;">Buck2</span><span style="font-weight: 400;">,</span><span style="font-weight: 400;"> which sends nearly all of the build actions to the</span> <span style="font-weight: 400;">Remote Execution (RE) service</span><span style="font-weight: 400;">, we have been able to successfully implement non-determinism mitigation within RE. Now we provide consistent outputs for identical actions, paving the way for the adoption of a warm and stable revision for ML development. In practice, this approach eliminates rebuilding entirely in many cases.</span></p>
<p><span style="font-weight: 400;">Though removing the build process from the critical path of ML engineers might not be possible in all cases, we understand how important managing dependencies is for controlling build times. As dependencies naturally increased, we enhanced our tools for managing them. These improvements helped us find and remove many unnecessary dependencies, significantly improving build graph analysis and overall build times. For example, we removed GPU code from the final binary when it wasn’t needed, identified which Python modules are actually used, and cut unused native code using linker maps.</span></p>
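One lightweight way to approximate "which Python modules are actually used" is to diff `sys.modules` around a run of the workload; this is a simplified sketch of the idea, not the tooling described above.

```python
import sys

def modules_used_by(workload):
    """Run `workload` and report which modules its imports pulled in."""
    before = set(sys.modules)
    workload()
    return set(sys.modules) - before

def toy_workload():
    import colorsys                    # stands in for the application's real imports
    colorsys.rgb_to_hsv(1.0, 0.0, 0.0)

sys.modules.pop("colorsys", None)      # ensure a cold start for the demo
used = modules_used_by(toy_workload)   # contains "colorsys"
```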
<h2><span style="font-weight: 400;">Adding incrementality for executable distribution</span></h2>
<p><span style="font-weight: 400;">A typical self-executable Python binary, when unarchived, is represented by thousands of Python files (.py and/or .pyc), substantial native libraries, and the Python interpreter. The cumulative result is a multitude of files, often numbering in the hundreds of thousands, with a total size reaching tens of gigabytes.</span></p>
<p><span style="font-weight: 400;">Engineers </span><span style="font-weight: 400;">spend a significant amount of time</span><span style="font-weight: 400;"> dealing with incremental builds where packaging and fetching overhead of such a large executable surpasses the build time. In response to this challenge, we implemented a new solution for the packaging and distribution of Python executables – the Content Addressable Filesystem (CAF).</span></p>
<p><span style="font-weight: 400;">The primary strength of CAF lies in its ability to operate incrementally during content addressable file packaging and fetching stages:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1">Packaging<span style="font-weight: 400;">: By adopting a content-aware approach, CAF can intelligently skip redundant uploads of files already present in Content Addressable Storage (CAS), whether as part of a different executable or the same executable with a different version.</span></li>
<li style="font-weight: 400;" aria-level="1">Fetching<span style="font-weight: 400;">: CAF maintains a cache on the destination host, ensuring that only content not already present in the cache needs to be downloaded.</span></li>
</ul>
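The two bullets above can be sketched in a few lines (the class and method names here are hypothetical, not CAF's actual API): keying every blob by its digest lets the packager skip uploads of content the store already holds, and lets the destination host skip downloads of content already in its cache.

```python
import hashlib

class ToyCAS:
    """Content Addressable Storage: blobs keyed by SHA-256 digest."""
    def __init__(self):
        self.blobs, self.uploads = {}, 0

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:        # packaging: skip redundant uploads
            self.blobs[digest] = data
            self.uploads += 1
        return digest

class ToyHostCache:
    """Destination-host cache: only missing digests are downloaded."""
    def __init__(self, cas):
        self.cas, self.local, self.fetches = cas, {}, 0

    def fetch(self, digest):
        if digest not in self.local:        # fetching: download only new content
            self.local[digest] = self.cas.blobs[digest]
            self.fetches += 1
        return self.local[digest]

cas = ToyCAS()
d1 = cas.put(b"libinterpreter.so")
d2 = cas.put(b"libinterpreter.so")          # same file in another executable: no upload
cache = ToyHostCache(cas)
cache.fetch(d1)
cache.fetch(d1)                             # second fetch is served locally
```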
<p><span style="font-weight: 400;">To optimize the efficiency of this system, we deploy a CAS daemon on the majority of Meta’s data center hosts. The CAS daemon assumes multiple responsibilities, including maintaining the local cache on the host (materialization into the cache and cache GC) and organizing a P2P network with other CAS daemon instances using</span> <span style="font-weight: 400;">Owl</span><span style="font-weight: 400;">, our high-fanout distribution system for large data objects. This P2P network enables direct content fetching from other CAS daemon instances, significantly reducing latency and the demand on storage bandwidth.</span></p>
<p><span style="font-weight: 400;">In the case of CAF, an executable is defined by a flat manifest file detailing all symlinks, directories, hard links, and files, along with their digest and attributes. This manifest implementation allows us to deduplicate all unique files across executables and implement a smart affinity/routing mechanism for scheduling, thereby minimizing the amount of content that needs to be downloaded by maximizing local cache utilization.</span></p>
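A flat manifest of this kind might look roughly as follows (the entry fields are illustrative): because regular files are referenced by digest, identical files across executables collapse to a single stored blob.

```python
import hashlib

def build_manifest(files):
    """files: {path: bytes}. Return (manifest entries, deduplicated blobs)."""
    manifest, blobs = [], {}
    for path in sorted(files):
        digest = hashlib.sha256(files[path]).hexdigest()
        blobs[digest] = files[path]                       # stored once per digest
        manifest.append({"path": path, "type": "file", "digest": digest})
    return manifest, blobs

manifest, blobs = build_manifest({
    "exe_a/site-packages/numpy.so": b"\x7fELF-bytes",
    "exe_b/site-packages/numpy.so": b"\x7fELF-bytes",     # identical library, second executable
})
# Two manifest entries, one unique blob to upload and distribute.
```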
<p><span style="font-weight: 400;">While the concept may bear some resemblance to what</span> <span style="font-weight: 400;">Docker achieves with OverlayFS</span><span style="font-weight: 400;">, our approach differs significantly. Organizing proper layers is not always feasible in our case due to the number of executables with diverse dependencies; layering becomes less efficient and more complex to organize. Additionally, direct access to files is essential for P2P support.</span></p>
<p><span style="font-weight: 400;">We opted for</span> <span style="font-weight: 400;">Btrfs</span><span style="font-weight: 400;"> as our filesystem because of its</span> <span style="font-weight: 400;">compression</span><span style="font-weight: 400;"> support, its ability to write compressed data directly to extents (bypassing redundant decompression and compression), and its</span> <span style="font-weight: 400;">copy-on-write (COW)</span><span style="font-weight: 400;"> capabilities. These attributes allow us to maintain executables on block devices with a total size similar to those represented as XAR files, share the same files from cache across executables, and implement a highly efficient COW mechanism that, when needed, only copies affected file extents.</span></p>
<h2><span style="font-weight: 400;">LazyCAF and enforcing uniform revisions: Areas for further ML iteration improvements</span></h2>
<p><span style="font-weight: 400;">The improvements we implemented have proven highly effective, drastically reducing the overhead and significantly elevating the efficiency of our ML engineers. Faster build times and more efficient packaging and distribution of executables have reduced overhead by double-digit percentages.</span></p>
<p><span style="font-weight: 400;">Yet, our journey to slash build overhead doesn’t end here. We’ve identified several promising improvements that we aim to deliver soon. In our investigation into our ML workflows, we discovered that only a fraction of the entire executable content is utilized in certain scenarios. Recognizing that, we intend to start working on optimizations that fetch executable parts on demand, significantly reducing materialization time and minimizing the overall disk footprint.</span></p>
<p><span style="font-weight: 400;">We can also further accelerate the development process by enforcing uniform revisions. We plan to enable all our ML engineers to operate on the same revision, which will improve the cache hit ratios of our builds. This move will further increase the percentage of incremental builds, since most of the artifacts will be cached.</span></p>The post <a href="https://dailyzsocialmedianews.com/bettering-machine-studying-iteration-velocity-with-sooner-utility-construct-and-packaging/">Improving machine learning iteration speed with faster application build and packaging</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Lazy is the new fast: How Lazy Imports and Cinder accelerate machine learning at Meta</title>
		<link>https://dailyzsocialmedianews.com/lazy-is-the-brand-new-quick-how-lazy-imports-and-cinder-speed-up-machine-studying-at-meta/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Thu, 18 Jan 2024 20:16:31 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Accelerate]]></category>
		<category><![CDATA[Cinder]]></category>
		<category><![CDATA[Fast]]></category>
		<category><![CDATA[imports]]></category>
		<category><![CDATA[Lazy]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[Machine]]></category>
		<category><![CDATA[Meta]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24539</guid>

					<description><![CDATA[<p>At Meta, the quest for faster model training has yielded an exciting milestone: the adoption of Lazy Imports and the Python Cinder runtime. The outcome? Up to 40 percent time to first batch (TTFB) improvements, along with a 20 percent reduction in Jupyter kernel startup times. This advancement facilitates swifter experimentation capabilities and elevates the [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/lazy-is-the-brand-new-quick-how-lazy-imports-and-cinder-speed-up-machine-studying-at-meta/">Lazy is the new fast: How Lazy Imports and Cinder accelerate machine learning at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<p></p>
<ul>
<li><span style="font-weight: 400;">At Meta, the quest for faster model training has yielded an exciting milestone: the adoption of Lazy Imports and the Python Cinder runtime. </span></li>
<li><span style="font-weight: 400;">The outcome? Up to 40 percent time to first batch (TTFB) improvements, along with a 20 percent reduction in Jupyter kernel startup times.</span></li>
<li><span style="font-weight: 400;">This advancement facilitates swifter experimentation capabilities and elevates the ML developer experience (DevX).</span></li>
</ul>
<p><span style="font-weight: 400;">Time is of the essence in the realm of machine learning (ML) development. The milliseconds it takes for an ML model to transition from conceptualization to processing the initial training data can dramatically impact productivity and experimentation.</span></p>
<p><span style="font-weight: 400;">At Meta, we’ve been able to significantly improve our model training times, as well as our overall developer experience (DevX) by adopting </span><span style="font-weight: 400;">Lazy Imports</span><span style="font-weight: 400;"> and the </span><span style="font-weight: 400;">Python Cinder runtime</span><span style="font-weight: 400;">. </span></p>
<h2><span style="font-weight: 400;">The time to first batch challenge</span></h2>
<p><span style="font-weight: 400;">Batch processing has been a game changer in ML development. It handles large volumes of data in groups (or batches) and allows us to train models, optimize parameters, and perform inference more effectively and swiftly.</span></p>
<p><span style="font-weight: 400;">But ML training workloads are notorious for their sluggish starts. When we look to improve our batch processing speeds, time to first batch (TTFB) comes into focus. TTFB is the time elapsed from the moment you hit the “start” button on your ML model training to the point when the first batch of data enters the model for processing. It is a critical metric that determines the speed at which an ML model goes from idle to learning. TTFB can vary widely due to factors like infrastructure overhead and scheduling delays. Reducing TTFB means reducing the development waiting times that can feel like an eternity to engineers – waiting periods that quickly add up to expensive wasted resources.</span></p>
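Conceptually, TTFB is a stopwatch that spans everything from the “start” button to the first batch entering the model; a toy measurement looks like this (the batch source is a stand-in for a real data loader):

```python
import time

def first_batch_latency(batches):
    """Return seconds elapsed until the first batch is received."""
    start = time.monotonic()
    for batch in batches:
        return time.monotonic() - start    # time to first batch (TTFB)
    return None                            # empty loader: no batch ever arrived

ttfb = first_batch_latency(iter([[1, 2], [3, 4]]))
```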
<p><span style="font-weight: 400;">In the pursuit of faster TTFB, Meta set its sights on reducing this overhead, and Lazy Imports with Cinder emerged as a promising solution.</span></p>
<h2><span style="font-weight: 400;">The magic of Lazy Imports</span></h2>
<p><span style="font-weight: 400;">Previously, ML developers explored alternatives like the standard </span><span style="font-weight: 400; font-family: 'courier new', courier;">LazyLoader</span><span style="font-weight: 400;"> in </span><span style="font-weight: 400; font-family: 'courier new', courier;">importlib</span><span style="font-weight: 400;"> or </span><span style="font-weight: 400;">lazy-import</span><span style="font-weight: 400;"> to defer explicit imports until necessary. While promising, these approaches are limited by their much narrower scope and the need to manually select which dependencies will be lazily imported (often with suboptimal results). Using these approaches demands meticulous codebase curation and a fair amount of code refactoring.</span></p>
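For reference, the standard-library route looks roughly like the recipe from the `importlib` documentation; note that every module must be wrapped explicitly, which is exactly the manual-selection burden described above.

```python
import importlib.util
import sys

def lazy_import(name):
    """Return a module whose body executes only on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)             # defers the real execution
    return module

lazy_json = lazy_import("json")            # nothing in the json module has run yet
out = lazy_json.dumps({"lazy": True})      # first attribute access triggers the load
```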
<p><span style="font-weight: 400;">In contrast, </span><span style="font-weight: 400;">Cinder’s Lazy Imports</span><span style="font-weight: 400;"> approach is a comprehensive and aggressive strategy that goes beyond the limitations of other libraries and delivers significant enhancements to the developer experience. Instead of painstakingly handpicking imports to become lazy, Cinder simplifies and accelerates the startup process by transparently deferring all imports by default, resulting in a much broader and more powerful deferral of imports until the exact moment they’re needed. Once in place, this method ensures that developers no longer have to navigate the maze of selective import choices. With it, developers can bid farewell to typing-only imports and the use of </span><span style="font-weight: 400; font-family: 'courier new', courier;">TYPE_CHECKING</span><span style="font-weight: 400;">. A simple </span><span style="font-weight: 400; font-family: 'courier new', courier;">from __future__ import annotations</span><span style="font-weight: 400;"> declaration at the beginning of a file delays type evaluation, while Lazy Imports defer the actual import statements until required. The combined effect of these optimizations reduced costly runtime imports and further streamlined the development workflow.</span></p>
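The typing pattern being retired works like this: with `from __future__ import annotations`, annotations are stored as strings and never evaluated at runtime, so a module imported only under `TYPE_CHECKING` can still appear in signatures (a toy module; `decimal` stands in for a heavy dependency).

```python
from __future__ import annotations   # annotations stay as unevaluated strings

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from decimal import Decimal      # seen by type checkers, never imported at runtime

def total(prices: list[Decimal]) -> Decimal:
    """Sum a list of prices; the annotations above cost nothing at import time."""
    result = prices[0]
    for p in prices[1:]:
        result += p
    return result

# The annotation was never evaluated, so it is still a plain string.
return_annotation = total.__annotations__["return"]
```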
<p><span style="font-weight: 400;">The Lazy Imports solution delivers. Meta’s initiative to enhance ML development has involved rolling out Cinder with Lazy Imports to several workloads, including our ML frameworks and Jupyter kernels, producing lightning-fast startup times, improved experimentation capabilities, reduced infrastructure overhead, and code that is a breeze to maintain. We’re pleased to share that Meta’s key AI workloads have experienced noteworthy improvements, with TTFB wins reaching up to 40 percent. Resulting time savings can vary from seconds to minutes per run.</span></p>
<p><span style="font-weight: 400;">These impressive results translate to a substantial boost in the efficiency of ML workflows, since they mean ML developers can get to the model training phase more swiftly.</span></p>
<h2><span style="font-weight: 400;">The challenges of adopting Lazy Imports</span></h2>
<p><span style="font-weight: 400;">While Lazy Imports’ approach significantly improved ML development, it was not all a bed of roses. We encountered several hurdles that tested our resolve and creativity.</span></p>
<h3><span style="font-weight: 400;">Compatibility</span></h3>
<p><span style="font-weight: 400;">One of the primary challenges we grappled with was the compatibility of existing libraries with Lazy Imports. Libraries such as PyTorch, Numba, NumPy, and SciPy, among others, did not seamlessly align with the deferred module loading approach. These libraries often rely on import side effects and other patterns that do not play well with Lazy Imports. Because the order of Python imports could change, or imports could be postponed entirely, side effects often failed to register classes, functions, and operations correctly. This required painstaking troubleshooting to identify and address import cycles and discrepancies.</span></p>
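The side-effect pattern that breaks is easy to reproduce with a toy registry (the names here are illustrative, not any library's actual API): registration happens as a byproduct of executing the module body, so if that execution is deferred, lookups fail until something touches the module.

```python
# In real code the decorated functions live in separate modules; merely
# importing such a module is what fills the registry. Defer that import
# and REGISTRY stays empty at lookup time.
REGISTRY = {}

def register(name):
    def wrap(fn):
        REGISTRY[name] = fn              # import-time side effect
        return fn
    return wrap

@register("relu")
def relu(x):
    return max(0.0, x)
```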
<h3><span style="font-weight: 400;">Balancing performance versus dependability</span></h3>
<p><span style="font-weight: 400;">We also had to strike the right balance between performance optimization and code dependability. While Lazy Imports significantly reduced TTFB and enhanced resource utilization, it also introduced a considerable semantic change in the way Python imports work that could make the codebase less intuitive. Achieving the perfect equilibrium was a constant consideration, and was ensured by limiting the impact of semantic changes to only the relevant parts that could be thoroughly tested.</span></p>
<p><span style="font-weight: 400;">Ensuring seamless interaction with the existing codebase required meticulous testing and adjustments. The task was particularly intricate when dealing with complex, multifaceted ML models, where the implications of deferred imports needed to be thoroughly considered. We ultimately opted for enabling Lazy Imports only during the startup and preparation phases and disabling it before the first batch started.</span></p>
<h3><span style="font-weight: 400;">Learning curve</span></h3>
<p><span style="font-weight: 400;">Adopting new paradigms like Lazy Imports can introduce a learning curve for the development team. Training ML engineers, infra engineers, and system engineers to adapt to the new approach, understand its nuances, and implement it effectively is a process in itself.</span></p>
<h2><span style="font-weight: 400;">What is next for Lazy Imports at Meta?</span></h2>
<p><span style="font-weight: 400;">The adoption of Lazy Imports and Cinder represented a meaningful enhancement in Meta’s key AI workloads. It came with its share of ups and downs, but ultimately demonstrated that Lazy Imports can be a game changer in expediting ML development. The TTFB wins, DevX improvements, and reduced kernel startup times are all tangible results of this initiative. With Lazy Imports, Meta’s ML developers are now equipped to work more efficiently, experiment more rapidly, and achieve results faster.</span></p>
<p><span style="font-weight: 400;">While we’ve achieved remarkable success with the adoption of Lazy Imports, our journey is far from over. So, what’s next for us? Here’s a glimpse into our future endeavors:</span></p>
<h3><span style="font-weight: 400;">Streamlining developer onboarding</span></h3>
<p><span style="font-weight: 400;">The learning curve associated with Lazy Imports can be a challenge for newcomers. We’re investing in educational resources and onboarding materials to make it easier for developers to embrace this game-changing approach. </span></p>
<h3><span style="font-weight: 400;">Enhancing tooling</span></h3>
<p><span style="font-weight: 400;">Debugging code with deferred imports can be intricate. We’re working on developing tools and techniques that simplify the debugging and troubleshooting process, ensuring that developers can quickly identify and resolve issues.</span></p>
<h3><span style="font-weight: 400;">Community collaboration</span></h3>
<p><span style="font-weight: 400;">The power of Lazy Imports lies in its adaptability and versatility. We’re eager to collaborate with the Python community – sharing insights and best practices, and addressing challenges together. Building a robust community that helps support paradigms and patterns that play well with Lazy Imports is one of our future priorities.</span></p>The post <a href="https://dailyzsocialmedianews.com/lazy-is-the-brand-new-quick-how-lazy-imports-and-cinder-speed-up-machine-studying-at-meta/">Lazy is the new fast: How Lazy Imports and Cinder accelerate machine learning at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
