<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Meta | DAILY ZSOCIAL MEDIA NEWS</title>
	<atom:link href="https://dailyzsocialmedianews.com/tag/meta/feed/" rel="self" type="application/rss+xml" />
	<link>https://dailyzsocialmedianews.com</link>
	<description>ALL ABOUT DAILY ZSOCIAL MEDIA NEWS</description>
	<lastBuildDate>Tue, 12 Mar 2024 15:22:59 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.7.1</generator>

<image>
	<url>https://dailyzsocialmedianews.com/wp-content/uploads/2020/12/cropped-DAILY-ZSOCIAL-MEDIA-NEWS-e1607166156946-32x32.png</url>
	<title>Meta | DAILY ZSOCIAL MEDIA NEWS</title>
	<link>https://dailyzsocialmedianews.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Constructing Meta’s GenAI Infrastructure &#8211; Engineering at Meta</title>
		<link>https://dailyzsocialmedianews.com/constructing-metas-genai-infrastructure-engineering-at-meta/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Tue, 12 Mar 2024 15:22:58 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Building]]></category>
		<category><![CDATA[Engineering]]></category>
		<category><![CDATA[GenAI]]></category>
		<category><![CDATA[infrastructure]]></category>
		<category><![CDATA[Meta]]></category>
		<category><![CDATA[Metas]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24857</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="1023" height="733" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Building Meta’s GenAI Infrastructure - Engineering at Meta" decoding="async" fetchpriority="high" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta.png 1023w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta-300x215.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta-768x550.png 768w" sizes="(max-width: 1023px) 100vw, 1023px" /></div><p>Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters. We are sharing details on the hardware, network, storage, design, performance, and software that help us extract high throughput and reliability for various AI workloads. We use this cluster design for Llama 3 training. We are strongly committed to open [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/constructing-metas-genai-infrastructure-engineering-at-meta/">Constructing Meta’s GenAI Infrastructure – Engineering at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="1023" height="733" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Building Meta’s GenAI Infrastructure - Engineering at Meta" decoding="async" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta.png 1023w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta-300x215.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/03/12152256/Building-Metas-GenAI-Infrastructure-Engineering-at-Meta-768x550.png 768w" sizes="(max-width: 1023px) 100vw, 1023px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters. We are sharing details on the hardware, network, storage, design, performance, and software that help us extract high throughput and reliability for various AI workloads. We use this cluster design for Llama 3 training.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We are strongly committed to open compute and open source. We built these clusters on top of </span><span style="font-weight: 400;">Grand Teton</span><span style="font-weight: 400;">, </span><span style="font-weight: 400;">OpenRack</span><span style="font-weight: 400;">, and </span><span style="font-weight: 400;">PyTorch</span><span style="font-weight: 400;"> and continue to push open innovation across the industry.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">This announcement is one step in our ambitious infrastructure roadmap. By the end of 2024, we’re aiming to continue to grow our infrastructure build-out that will include 350,000 NVIDIA H100 GPUs as part of a portfolio that will feature compute power equivalent to nearly 600,000 H100s.</span></li>
</ul>
<p><span style="font-weight: 400;">To lead in developing AI means leading investments in hardware infrastructure. Hardware infrastructure plays an important role in AI’s future. Today, we’re sharing details on two versions of our </span><span style="font-weight: 400;">24,576-GPU data center scale cluster at Meta. These clusters support our current and next generation AI models, including Llama 3, the successor to</span> <span style="font-weight: 400;">Llama 2</span><span style="font-weight: 400;">, our publicly released LLM, as well as AI research and development across GenAI and other areas .</span></p>
<h2><span style="font-weight: 400;">A peek into Meta’s large-scale AI clusters</span></h2>
<p><span style="font-weight: 400;">Meta’s long-term vision is to build artificial general intelligence (AGI) that is open and built responsibly so that it can be widely available for everyone to benefit from. As we work towards AGI, we have also worked on scaling our clusters to power this ambition. The progress we make towards AGI creates new products,</span> <span style="font-weight: 400;">new AI features for our family of apps</span><span style="font-weight: 400;">, and new AI-centric computing devices. </span></p>
<p><span style="font-weight: 400;">While we’ve had a long history of building AI infrastructure, we first shared details on our </span><span style="font-weight: 400;">AI Research SuperCluster (RSC)</span><span style="font-weight: 400;">, featuring 16,000 NVIDIA A100 GPUs, in 2022. RSC has accelerated our open and responsible AI research by helping us build our first generation of advanced AI models. It played and continues to play an important role in the development of </span><span style="font-weight: 400;">Llama</span><span style="font-weight: 400;"> and </span><span style="font-weight: 400;">Llama 2</span><span style="font-weight: 400;">, as well as advanced AI models for applications ranging from computer vision, NLP, and speech recognition, to</span> <span style="font-weight: 400;">image generation</span><span style="font-weight: 400;">, and even</span> <span style="font-weight: 400;">coding</span><span style="font-weight: 400;">.</span></p>
<h2><span style="font-weight: 400;">Under the hood</span></h2>
<p><span style="font-weight: 400;">Our newer AI clusters build upon the successes and lessons learned from RSC. We focused on building end-to-end AI systems with a major emphasis on researcher and developer experience and productivity. The efficiency of the high-performance network fabrics within these clusters, some of the key storage decisions, combined with the 24,576 NVIDIA Tensor Core H100 GPUs in each, allow both cluster versions to support models larger and more complex than that could be supported in the RSC and pave the way for advancements in GenAI product development and AI research.</span></p>
<h3><span style="font-weight: 400;">Network</span></h3>
<p><span style="font-weight: 400;">At Meta, we handle hundreds of trillions of AI model executions per day. Delivering these services at a large scale requires a highly advanced and flexible infrastructure. Custom designing much of our own hardware, software, and network fabrics allows us to optimize the end-to-end experience for our AI researchers while ensuring our data centers operate efficiently. </span></p>
<p><span style="font-weight: 400;">With this in mind, we built one cluster with a remote direct memory access (RDMA) over converged Ethernet (RoCE) network fabric solution based on the </span><span style="font-weight: 400;">Arista 7800</span><span style="font-weight: 400;"> with </span><span style="font-weight: 400;">Wedge400</span><span style="font-weight: 400;"> and </span><span style="font-weight: 400;">Minipack2</span><span style="font-weight: 400;"> OCP rack switches. The other cluster features an </span><span style="font-weight: 400;">NVIDIA Quantum2 InfiniBand</span><span style="font-weight: 400;"> fabric. Both of these solutions interconnect 400 Gbps endpoints. With these two, we are able to assess the suitability and scalability of these </span><span style="font-weight: 400;">different types of interconnect for large-scale training,</span><span style="font-weight: 400;"> giving us more insights that will help inform how we design and build even larger, scaled-up clusters in the future. Through careful co-design of the network, software, and model architectures, we have successfully used both RoCE and InfiniBand clusters for large, GenAI workloads (including our ongoing training of Llama 3 on our RoCE cluster) without any network bottlenecks.</span></p>
<h3><span style="font-weight: 400;">Compute</span></h3>
<p><span style="font-weight: 400;">Both clusters are built using</span> <span style="font-weight: 400;">Grand Teton</span><span style="font-weight: 400;">, our in-house-designed, open GPU hardware platform that we’ve contributed to the Open Compute Project (OCP). Grand Teton builds on the many generations of AI systems that integrate power, control, compute, and fabric interfaces into a single chassis for better overall performance, signal integrity, and thermal performance. It provides rapid scalability and flexibility in a simplified design, allowing it to be quickly deployed into data center fleets and easily maintained and scaled. Combined with other in-house innovations like our</span> <span style="font-weight: 400;">Open Rack</span><span style="font-weight: 400;"> power and rack architecture, Grand Teton allows us to build new clusters in a way that is purpose-built for current and future applications at Meta.</span></p>
<p><span style="font-weight: 400;">We have been openly designing our GPU hardware platforms beginning with our </span><span style="font-weight: 400;">Big Sur platform in 2015</span><span style="font-weight: 400;">.</span></p>
<h3><span style="font-weight: 400;">Storage</span></h3>
<p><span style="font-weight: 400;">Storage plays an important role in AI training, and yet is one of the least talked-about aspects. As the GenAI training jobs become more multimodal over time, consuming large amounts of image, video, and text data, the need for data storage grows rapidly. The need to fit all that data storage into a performant, yet power-efficient footprint doesn’t go away though, which makes the problem more interesting.</span></p>
<p><span style="font-weight: 400;">Our storage deployment addresses the data and checkpointing needs of the AI clusters via a home-grown Linux Filesystem in Userspace (FUSE) API backed by a version of Meta’s </span><span style="font-weight: 400;">‘Tectonic’ distributed storage solution</span><span style="font-weight: 400;"> optimized for Flash media. This solution enables thousands of GPUs to save and load checkpoints in a synchronized fashion (a </span><span style="font-weight: 400;">challenge</span><span style="font-weight: 400;"> for any storage solution) while also providing a flexible and high-throughput exabyte scale storage required for data loading.</span></p>
<p><span style="font-weight: 400;">We have also partnered with </span><span style="font-weight: 400;">Hammerspace</span><span style="font-weight: 400;"> to co-develop and land a parallel network file system (NFS) deployment to meet the developer experience requirements for this AI cluster. Among other benefits, Hammerspace enables engineers to perform interactive debugging for jobs using thousands of GPUs as code changes are immediately accessible to all nodes within the environment. When paired together, the combination of our Tectonic distributed storage solution and Hammerspace enable fast iteration velocity without compromising on scale.     </span></p>
<p><span style="font-weight: 400;">The storage deployments in our GenAI clusters, both Tectonic- and Hammerspace-backed, are based on the </span><span style="font-weight: 400;">YV3 Sierra Point server platform</span><span style="font-weight: 400;">, upgraded with the latest high capacity E1.S SSD we can procure in the market today. Aside from the higher SSD capacity, the servers per rack was customized to achieve the right balance of throughput capacity per server, rack count reduction, and associated power efficiency. Utilizing the OCP servers as Lego-like building blocks, our storage layer is able to flexibly scale to future requirements in this cluster as well as in future, bigger AI clusters, while being fault-tolerant to day-to-day Infrastructure maintenance operations.</span></p>
<h3><span style="font-weight: 400;">Performance</span></h3>
<p><span style="font-weight: 400;">One of the principles we have in building our large-scale AI clusters is to maximize performance and ease of use simultaneously without compromising one for the other. This is an important principle in creating the best-in-class AI models. </span></p>
<p><span style="font-weight: 400;">As we push the limits of AI systems, the best way we can test our ability to scale-up our designs is to simply build a system, optimize it, and actually test it (while simulators help, they only go so far). In this design journey, we compared the performance seen in our small clusters and with large clusters to see where our bottlenecks are. In the graph below, AllGather collective performance is shown (as normalized bandwidth on a 0-100 scale) when a large number of GPUs are communicating with each other at message sizes where roofline performance is expected. </span></p>
<p><span style="font-weight: 400;">Our out-of-box performance for large clusters was initially poor and inconsistent, compared to optimized small cluster performance. To address this we made several changes to how our internal job scheduler schedules jobs with network topology awareness – this resulted in latency benefits and minimized the amount of traffic going to upper layers of the network. We also optimized our network routing strategy in combination with NVIDIA Collective Communications Library (NCCL) changes to achieve optimal network utilization. This helped push our large clusters to achieve great and expected performance just as our small clusters.</span></p>
<p><img decoding="async" class="size-large wp-image-21048" src="https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png?w=1024" alt="" width="1024" height="768" srcset="https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png?resize=916,687 916w, https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png?resize=768,576 768w, https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png?resize=1024,768 1024w, https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png?resize=1536,1152 1536w, https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png?resize=96,72 96w, https://engineering.fb.com/wp-content/uploads/2024/03/Meta-24K-GenAi-clusters-performance.png?resize=192,144 192w" sizes="(max-width: 992px) 100vw, 62vw"/>In the figure we see that small cluster performance (overall communication bandwidth and utilization) reaches 90%+ out of the box, but an unoptimized large cluster performance has very poor utilization, ranging from 10% to 90%. After we optimize the full system (software, network, etc.), we see large cluster performance return to the ideal 90%+ range.</p>
<p><span style="font-weight: 400;">In addition to software changes targeting our internal infrastructure, we worked closely with teams authoring training frameworks and models to adapt to our evolving infrastructure. For example, NVIDIA H100 GPUs open the possibility of leveraging new data types such as 8-bit floating point (FP8) for training. Fully utilizing larger clusters required investments in additional parallelization techniques and new storage solutions provided opportunities to highly optimize checkpointing across thousands of ranks to run in hundreds of milliseconds.</span></p>
<p><span style="font-weight: 400;">We also recognize debuggability as one of the major challenges in large-scale training. Identifying a problematic GPU that is stalling an entire training job becomes very difficult at a large scale. We’re building tools such as desync debug, or a distributed collective flight recorder, to expose the details of distributed training, and help identify issues in a much faster and easier way</span></p>
<p><span style="font-weight: 400;">Finally, we’re continuing to evolve PyTorch, the foundational AI framework powering our AI workloads, to make it ready for tens, or even hundreds, of thousands of GPU training. We have identified multiple bottlenecks for process group initialization, and reduced the startup time from sometimes hours down to minutes. </span></p>
<h2><span style="font-weight: 400;">Commitment to open AI innovation</span></h2>
<p><span style="font-weight: 400;">Meta maintains its commitment to open innovation in AI software and hardware. We believe open-source hardware and software will always be a valuable tool to help the industry solve problems at large scale.</span></p>
<p><span style="font-weight: 400;">Today, we continue to support</span> <span style="font-weight: 400;">open hardware innovation</span><span style="font-weight: 400;"> as a founding member of OCP, where we make designs like Grand Teton and Open Rack available to the OCP community. We also continue to be the largest and primary contributor to </span><span style="font-weight: 400;">PyTorch</span><span style="font-weight: 400;">, the AI software framework that is powering a large chunk of the industry.</span></p>
<p><span style="font-weight: 400;">We also continue to be committed to open innovation in the AI research community. We’ve launched the</span> <span style="font-weight: 400;">Open Innovation AI Research Community</span><span style="font-weight: 400;">, a partnership program for academic researchers to deepen our understanding of how to responsibly develop and share AI technologies – with a particular focus on LLMs.</span></p>
<p><span style="font-weight: 400;">An open approach to AI is not new for Meta. We’ve also launched the </span><span style="font-weight: 400;">AI Alliance</span><span style="font-weight: 400;">, a group of leading organizations across the AI industry focused on accelerating responsible innovation in AI within an open community. Our AI efforts are built on a philosophy of open science and cross-collaboration. An open ecosystem brings transparency, scrutiny, and trust to AI development and leads to innovations that everyone can benefit from that are built with safety and responsibility top of mind. </span></p>
<h2><span style="font-weight: 400;">The future of Meta’s AI infrastructure</span></h2>
<p><span style="font-weight: 400;">These two AI training cluster designs are a part of our larger roadmap for the future of AI. By the end of 2024, we’re aiming to continue to grow our infrastructure build-out that will include 350,000 NVIDIA H100s as part of a portfolio that will feature compute power equivalent to nearly 600,000 H100s.</span></p>
<p><span style="font-weight: 400;">As we look to the future, we recognize that what worked yesterday or today may not be sufficient for tomorrow’s needs. That’s why we are constantly evaluating and improving every aspect of our infrastructure, from the physical and virtual layers to the software layer and beyond. Our goal is to create systems that are flexible and reliable to support the fast-evolving new models and research.  </span></p>The post <a href="https://dailyzsocialmedianews.com/constructing-metas-genai-infrastructure-engineering-at-meta/">Constructing Meta’s GenAI Infrastructure – Engineering at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Meta loves Python &#8211; Engineering at Meta</title>
		<link>https://dailyzsocialmedianews.com/meta-loves-python-engineering-at-meta/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Mon, 12 Feb 2024 16:58:40 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Engineering]]></category>
		<category><![CDATA[loves]]></category>
		<category><![CDATA[Meta]]></category>
		<category><![CDATA[Python]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24687</guid>

					<description><![CDATA[<p>By now you’re already aware that Python 3.12 has been released. But did you know that several of its new features were developed by Meta? Meta engineer Pascal Hartig (@passy) is joined on the Meta Tech Podcast by Itamar Oren and Carl Meyer, two software engineers at Meta, to discuss their teams’ contributions to the [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/meta-loves-python-engineering-at-meta/">Meta loves Python – Engineering at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<p></p>
<p>By now you’re already aware that Python 3.12 has been released. But did you know that several of its new features were developed by Meta?</p>
<p>Meta engineer Pascal Hartig (@passy) is joined on the Meta Tech Podcast by Itamar Oren and Carl Meyer, two software engineers at Meta, to discuss their teams’ contributions to the latest Python release, including new hooks that allow for custom JITs like Cinder, Immortal Objects, improvements to the type system, faster comprehensions, and more.</p>
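<p>As a small, self-contained illustration of the “faster comprehensions” item, the snippet below times a list comprehension. On Python 3.12, list, dict, and set comprehensions are inlined into the enclosing frame (PEP 709), so the same code typically runs measurably faster than on 3.11; the snippet is ours and the numbers will vary by machine.</p>
<pre>
# Quick, unscientific look at comprehension overhead across Python versions.
# PEP 709 (Python 3.12) inlines comprehensions, removing the hidden one-shot
# function call that earlier versions performed for every comprehension.
import sys
import timeit

def squares(n=100):
    return [x * x for x in range(n)]

print(sys.version)
print(timeit.timeit(squares, number=100_000), "seconds for 100k calls")
</pre>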
<p>Learn how and why they built these new features for Python and how they worked with and engaged with the Python community.</p>
<p>Download or listen to the episode below:</p>
<p><iframe loading="lazy" style="border: none;" title="Libsyn Player" src="https://html5-player.libsyn.com/embed/episode/id/29730333/height/90/theme/custom/thumbnail/yes/direction/forward/render-playlist/no/custom-color/000000/" width="100%" height="90" scrolling="no" allowfullscreen="allowfullscreen"></iframe></p>
<p>You can also find the episode wherever you get your podcasts, including:</p>
<p>Spotify<br />Apple Podcasts<br />PocketCasts<br />Castro<br />Overcast</p>
<p>The Meta Tech Podcast is a podcast, brought to you by Meta, where we highlight the work Meta’s engineers are doing at every level – from low-level frameworks to end-user features.</p>
<p>Send us feedback on Instagram, Threads, or X.</p>
<p>And if you’re interested in learning more about career opportunities at Meta visit the Meta Careers page.</p>The post <a href="https://dailyzsocialmedianews.com/meta-loves-python-engineering-at-meta/">Meta loves Python – Engineering at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Easy Precision Time Protocol at Meta</title>
		<link>https://dailyzsocialmedianews.com/easy-precision-time-protocol-at-meta/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Wed, 07 Feb 2024 19:33:31 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Meta]]></category>
		<category><![CDATA[precision]]></category>
		<category><![CDATA[Protocol]]></category>
		<category><![CDATA[simple]]></category>
		<category><![CDATA[time]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24667</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="1024" height="562" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/07193329/Simple-Precision-Time-Protocol-at-Meta.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Simple Precision Time Protocol at Meta" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/07193329/Simple-Precision-Time-Protocol-at-Meta.png 1024w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/07193329/Simple-Precision-Time-Protocol-at-Meta-300x165.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/07193329/Simple-Precision-Time-Protocol-at-Meta-768x422.png 768w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></div><p>While deploying Precision Time Protocol (PTP) at Meta, we’ve developed a simplified version of the protocol (Simple Precision Time Protocol – SPTP), that can offer the same level of clock synchronization as unicast PTPv2 more reliably and with fewer resources. In our own tests, SPTP boasts comparable performance to PTP, but with significant improvements in [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/easy-precision-time-protocol-at-meta/">Easy Precision Time Protocol at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="1024" height="562" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/07193329/Simple-Precision-Time-Protocol-at-Meta.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Simple Precision Time Protocol at Meta" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/07193329/Simple-Precision-Time-Protocol-at-Meta.png 1024w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/07193329/Simple-Precision-Time-Protocol-at-Meta-300x165.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/07193329/Simple-Precision-Time-Protocol-at-Meta-768x422.png 768w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">While deploying Precision Time Protocol (PTP) at Meta, we’ve developed a simplified version of the protocol (Simple Precision Time Protocol – SPTP), that can offer the same level of clock synchronization as unicast PTPv2 more reliably and with fewer resources.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">In our own tests, SPTP boasts comparable performance to PTP, but with significant improvements in CPU, memory, and network utilization.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We’ve made the source code for the SPTP client and server available on</span> <span style="font-weight: 400;">GitHub</span><span style="font-weight: 400;">.</span></li>
</ul>
<p><span style="font-weight: 400;">We’ve previously spoken in great detail about</span> <span style="font-weight: 400;">how Precision Time Protocol is being deployed at Meta</span><span style="font-weight: 400;">, including </span><span style="font-weight: 400;">the protocol itself and Meta’s precision time architecture.</span></p>
<p><span style="font-weight: 400;">As we deployed PTP into one of our data centers, we were also evaluating and testing alternative PTP clients. In doing so, we soon realized that we could eliminate a lot of complexity in the PTP protocol itself that we experienced during data center deployments while still maintaining complete hardware compatibility with our existing equipment.</span></p>
<p><span style="font-weight: 400;">This is how the idea of Simple Precision Time Protocol (SPTP) was born. </span></p>
<p><span style="font-weight: 400;">But before we dive under the hood of SPTP we should explore why the IEEE 1588</span> <span style="font-weight: 400;">G8265.1</span><span style="font-weight: 400;"> and</span> <span style="font-weight: 400;">G8275.2</span><span style="font-weight: 400;"> unicast profiles (here, we just call them PTP) weren’t a perfect fit for our data center deployment.</span></p>
<h2><span style="font-weight: 400;">PTP and its limitations</span></h2>
<h3><span style="font-weight: 400;">Excessive network communication</span></h3>
<p><span style="font-weight: 400;">A typical IEEE 1588-2019 two-step PTPv2 unicast UDP flow consists of the following exchange:</span></p>
<p>Figure 1: Typical two-step PTPv2 exchange.</p>
<p><span style="font-weight: 400;">This sequence repeats either in full or in part depending on the negotiation result. The exchange shown is one of many possible combinations. It may involve additional steps such as grant cancellation, grand cancellation acknowledgements, and so on.</span></p>
<p><span style="font-weight: 400;">The frequency of these messages may vary depending on the implementation and configuration. After completing negotiation, the frequency of some messages can change dynamically.</span></p>
<p><span style="font-weight: 400;">This design allows for a lot of flexibility, especially for less powerful equipment where resources are limited. In combination with multicast, it allows us to support a relatively large number of clients using either very old or embedded devices. For example, a PTP server can reject the request or confirm a less frequent exchange if the resources are exhausted.</span></p>
<p><span style="font-weight: 400;">This design, however, leads to excessive network communication, which is particularly visible on a</span> <span style="font-weight: 400;">time appliance</span><span style="font-weight: 400;"> serving a large number of clients.</span></p>
<h3><span style="font-weight: 400;">State machine</span></h3>
<p><span style="font-weight: 400;">Due to the “subscription” model, both the PTP client and the server have to keep the state in memory. This approach comes with the tradeoffs such as:</span><span style="font-weight: 400;"><br /></span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Excessive usage of resources such as memory and CPU.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Strict capacity limits that mean multicast support is required for large numbers of clients.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Code complexity.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Fragile state transitions.</span></li>
</ul>
<p><span style="font-weight: 400;">These issues can manifest, for example, in so-called abandoned syncs – situations where the work of a PTP client is interrupted (either forcefully stopped or crashed). Because the PTP server didn’t receive a cancellation signaling message it will keep sending sync and followup packets until the subscription expires (which may take hours). This leads to additional complexity and fragility in the system. </span></p>
<p><span style="font-weight: 400;"> </span><span style="font-weight: 400;">There are additional protocol design side effects such as:</span><span style="font-weight: 400;"><br /></span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">An almost infinite Denial of Service Attack (DoS) amplification factor.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Server-driven communication with little control by the client.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Complete trust in the validity of server timestamps.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Asynchronous path delay calculations.</span></li>
</ul>
<p><span style="font-weight: 400;">In data centers, where communication is typically driven by hundreds of thousands of clients and multicast is not supported, these tradeoffs are very limiting. </span></p>
<h2><span style="font-weight: 400;">SPTP</span></h2>
<p><span style="font-weight: 400;">True to its name, SPTP significantly reduces the number of exchanges between a server and client, allowing for much more efficient network communication.</span></p>
<h3><span style="font-weight: 400;">Exchange</span></h3>
<p><img loading="lazy" decoding="async" class="size-large wp-image-20921" src="https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image2.jpg?w=1024" alt="" width="1024" height="1024" srcset="https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image2.jpg 1080w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image2.jpg?resize=916,916 916w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image2.jpg?resize=768,768 768w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image2.jpg?resize=1024,1024 1024w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image2.jpg?resize=96,96 96w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image2.jpg?resize=192,192 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 2: Typical SPTP exchange.</p>
<p><span style="font-weight: 400;">In a typical SPTP exchange:</span><span style="font-weight: 400;"><br /></span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The client sends a delay request.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The server responds with a sync.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The server sends a followup/announce.</span></li>
</ol>
<p><span style="font-weight: 400;">The number of network exchanges is drastically reduced. Instead of 11 different network exchanges as shown on Figure 1 and the requirement for client and server state machines for the duration of the subscription, there are only three packets exchanged and no state needs to be preserved on either side. In the simplified exchange, every packet has an important role:</span></p>
<h4><span style="font-weight: 400;">Delay request</span></h4>
<p><span style="font-weight: 400;">A delay request initiates the SPTP exchange. It’s interpreted by a server not only as a standard delay request containing the correction field (CF1) of the transparent clock, but also as a signal to respond with sync and followup packets. Just like in a two-step PTPv2 exchange, it generates T3 upon departure from the client side and T4 upon arrival on the server side.</span></p>
<p><span style="font-weight: 400;">To distinguish between a PTPv2 delay request and a SPTP delay request, the PTP profile Specific 1 flag must be set by the client.</span></p>
<h4><span style="font-weight: 400;">Sync</span></h4>
<p><span style="font-weight: 400;">In response to a delay request, a sync packet would be sent containing the T4 generated at an earlier stage. Just like in a regular two-step PTPv2 exchange, a sync packet will generate a T1 upon departure from the server side. While in transit, the correction field of the packet (CF2) is populated by the network equipment.</span></p>
<h4><span style="font-weight: 400;">Followup/announce</span></h4>
<p><span style="font-weight: 400;">Following the sync packet, an announce packet is immediately sent containing T1 generated at a previous stage. In addition, the correction filed from the Delay Request field is populated by the CF1 value collected at an earlier stage.  </span></p>
<p><span style="font-weight: 400;">The announce packet also contains typical PTPv2 information such as clock class, clock accuracy, and so on. On the client side, the arrival of the packet generates the T2 timestamp.</span></p>
<p><span style="font-weight: 400;">After a successful SPTP exchange, default two-step PTPv2 formulas for mean path delay and clock offset must be applied:</span></p>
<p style="text-align: center;"><span style="font-weight: 400;">mean_path_delay = ((T4 – T3) + (T2-T1) – CF1 -CF2)/2</span></p>
<p style="text-align: center;"><span style="font-weight: 400;">clock_offset = T2 – T1 – mean_path_delay</span></p>
<p><span style="font-weight: 400;">After every exchange the client has access to the announce message attributes such as time source, clock quality, etc., as well as the path delay and a calculated clock offset after every exchange with every server. And, because the exchange is client-driven, the offsets could be calculated at the exact same time. This avoids a situation where a client is following a faulty server and has no chance of detecting it.</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-20922" src="https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image3.png?w=1024" alt="" width="1024" height="553" srcset="https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image3.png 1408w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image3.png?resize=916,494 916w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image3.png?resize=768,415 768w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image3.png?resize=1024,553 1024w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image3.png?resize=96,52 96w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image3.png?resize=192,104 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 3: Client following faulty Time Server 2 based on announce.</p>
<h3><span style="font-weight: 400;">Reliability</span></h3>
<p><span style="font-weight: 400;">We can also provide stronger reliability guarantees by using multi-clock reliance.</span></p>
<p><span style="font-weight: 400;">In our implementation for precision time synchronization, we provide time as well as a window of uncertainty (WOU) to the consumer application via the fbclock API. As we described in a previous blog post on</span> <span style="font-weight: 400;">how PTP is being deployed at Meta</span><span style="font-weight: 400;"> the WOU is based on the observation of time sync errors for the minimum duration to have stationarity of the state of the system. </span></p>
<p><span style="font-weight: 400;">In addition, we’ve established a method based on a collection of clocks that each client can access for timing information that we call a </span><span style="font-weight: 400;">clock ensemble</span><span style="font-weight: 400;">. The clock ensemble operates in two modes, steady state and transient; where steady state is during normal operation and transient is in the case of holdover.</span></p>
<p><span style="font-weight: 400;">However, with a pool of N clocks, C, forming the clock ensemble, the question becomes which clocks to select for determining robustness and accurate timing information. Clocks that are not accurate are rejected (C_reject) and, thus, our ensemble size falls to N = C_total –  C_reject. We employ two stages, one that is based on each individual clock, and the second that acts on the collection of valid clocks in the ensemble. </span></p>
<p><span style="font-weight: 400;">The first stage observes the previous measurements of each individual clock, where the main criteria is to reject outliers in the previous states of the clock. Once this criterion threshold is exceeded, the entire clock is rejected from the valid clock ensemble pool. This is based off</span> <span style="font-weight: 400;">Chauvenet’s criterion</span><span style="font-weight: 400;">, where the criterion is a probability band that is centered on the mean of the clock outputs (assuming a normal distribution during steady state). Based on the stationarity tests, we use a sample size of 400 previous clock outputs and calculate a maximum allowable deviation. </span></p>
<p><span style="font-weight: 400;">For example:</span></p>
<p><span style="font-weight: 400;"><img decoding="async" src="https://latex.codecogs.com/png.latex?%5Cfn_cm%20D_%7Bmax%7D%5Cge%20%5Cfrac%7B%7CC%20-%20%5Cbar%7BC%7D%7C%7D%7BS_%7Bc%7D%7D"/>, where <img decoding="async" src="https://latex.codecogs.com/png.latex?%5Cfn_cm%20C"/> is the current clock output, <img decoding="async" src="https://latex.codecogs.com/png.latex?%5Cfn_cm%20%5Coverline%20C"/> </span><span style="font-weight: 400;">is the clock sample mean, and <img decoding="async" src="https://latex.codecogs.com/png.latex?%5Cfn_cm%20S_%7Bc%7D"/> </span><span style="font-weight: 400;">is the clock set standard deviation.</span></p>
<p><span style="font-weight: 400;">We find the probability that the current clock output is in disagreement with the previous 400 samples:</span></p>
<p><img decoding="async" src="https://latex.codecogs.com/png.latex?%5Cfn_cm%20%7BP_%7Bz%7D%20%3D%201%20-%20%5Cfrac%7B1%7D%7B4%28400%29%7D%20%5Capprox%200.9993%7D"/></p>
<p><span style="font-weight: 400;">Based on a window size of 400 previous samples, the maximum allowed deviation is:</span></p>
<p><img decoding="async" src="https://latex.codecogs.com/png.latex?%5Cfn_cm%20D_%7Bmax%7D%20%3D%203.2272"/></p>
<p><span style="font-weight: 400;">Now, the clock outputs are tested against this value. If they exceed the <img decoding="async" src="https://latex.codecogs.com/png.latex?%5Cfn_cm%20D_%7Bmax%7D"/> </span><span style="font-weight: 400;">they are rejected, an alert is raised, and a threshold counter is incremented. Once the rejection threshold is reached for an individual clock, this clock is entirely rejected.</span></p>
<p><span style="font-weight: 400;">Now, we enter the second stage of verifying the clock ensemble composed of the valid clocks. The second stage forms a weighted average of the non-rejected clocks in the valid clock ensemble, where each clock in the ensemble is reported as its sample size, mean, and variance. The average of the clocks’ means is the weighted average, where the weights are inversely proportional to the mean absolute deviations reported by each clock after applying Chauvenet’s criterion. </span></p>
<p><span style="font-weight: 400;">Now we can report the mean and variance of the clock ensemble, ensuring the clocks contained therewith are valid and not providing erroneous values. The confidence interval is scaled with the number of good clocks in the ensemble, where the higher the number of valid clocks out of the total clocks provides greater reliability.</span></p>
<p><span style="font-weight: 400;">For a number of hosts, we show that the distribution of clocks falls within the following heatmap:</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-20923" src="https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image4.png?w=640" alt="" width="640" height="480" srcset="https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image4.png 640w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image4.png?resize=96,72 96w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image4.png?resize=192,144 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 4: Offset distribution overlay of multiple clocks.</p>
<p><span style="font-weight: 400;">We calculate the variance, <img decoding="async" src="https://latex.codecogs.com/png.latex?%5Cfn_cm%20v_%7Bk%7D"/></span><span style="font-weight: 400;">, of each individual clock’s observations, then we calculate a weighted mean, <img decoding="async" src="https://latex.codecogs.com/png.latex?%5Cfn_cm%20w_%7Bk%7D"/></span><span style="font-weight: 400;">, taking into consideration the reciprocal of each clock’s variance as the weight.</span></p>
<p><img decoding="async" class="aligncenter" src="https://latex.codecogs.com/png.latex?%5Cfn_cm%20w_%7Bk%7D%20%3D%20%5Cfrac%7BC%7D%7B%5Csqrt%7B%5Cfrac%7Bv%7D%7Bk%7D%7D%7D%2C%20C%20%3D%20%5B%5Cfrac%7B1%7D%7Bk%7D%5Csum%20%5Cfrac%7B1%7D%7Bv_%7Bk%7D%7D%5D%5E%7B-1%7D"/></p>
<p><span style="font-weight: 400;">Due to independence of clocks, the variance of the weighted sum, <img decoding="async" src="https://latex.codecogs.com/png.latex?%5Cfn_cm%20w_%7Bk%7D"/></span><span style="font-weight: 400;">, is:</span></p>
<p><img decoding="async" class="aligncenter" src="https://latex.codecogs.com/png.latex?%5Cfn_cm%20%5Cfrac%7B1%7D%7Bk%7D%5Csum_%7B%7D%5E%7B%7D%5Cmathrm%7BW%7D_%7Bk%7D%5E%7B2%7Dv_%7Bk%7D%20%3D%20%5Csum_%7B%7D%5E%7B%7DC%5E%7B2%7D%20%3D%20N_%7Bw%7DC%5E%7B2%7D"/></p>
<p><span style="font-weight: 400;">In summary, we collect samples from a number of clock sources that form our clock ensemble. The overall precision and reliability of the provided data by SPTP is a function of the number of reliable and in distribution clocks forming the clock ensemble.</span></p>
<p><span style="font-weight: 400;">A future post will focus on this specifically. </span></p>
<h2><span style="font-weight: 400;">SPTP’s performance</span></h2>
<p><span style="font-weight: 400;">Let’s explore performance of the SPTP versus PTP.</span></p>
<p><span style="font-weight: 400;">Initial deployments to a single client confirmed no regression in the precision of the synchronization:</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-20970" src="https://engineering.fb.com/wp-content/uploads/2024/02/SPTP-Figure-5-updated.png?w=1024" alt="" width="1024" height="403" srcset="https://engineering.fb.com/wp-content/uploads/2024/02/SPTP-Figure-5-updated.png 1520w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP-Figure-5-updated.png?resize=916,360 916w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP-Figure-5-updated.png?resize=768,302 768w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP-Figure-5-updated.png?resize=1024,403 1024w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP-Figure-5-updated.png?resize=96,38 96w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP-Figure-5-updated.png?resize=192,76 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 5: Clock offset after switching from ptp4l and SPTP.</p>
<p><span style="font-weight: 400;">Repeating the same measurement after migration to SPTP produces a very similar result, only marginally different due to a statistical error:</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-20925" src="https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image6.png?w=1024" alt="" width="1024" height="574" srcset="https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image6.png 1999w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image6.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image6.png?resize=916,514 916w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image6.png?resize=768,431 768w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image6.png?resize=1024,574 1024w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image6.png?resize=1536,861 1536w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image6.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image6.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 6: P99.99 offset collected from over 100000 SPTP clients.</p>
<p><span style="font-weight: 400;">With large-scale deployment of our implementations, we can confirm resource utilization improvements.</span></p>
<p><span style="font-weight: 400;">We noticed that due to the difference in multi-server support, the performance gains vary significantly depending on the number of tracked time servers.</span></p>
<p><span style="font-weight: 400;">For example, with just a single time appliance serving the entire network there are significant improvements across the board. Most notably over 40 percent CPU, 70 percent memory, and 50 percent network utilization improvements:</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-20926" src="https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image7.png?w=1024" alt="" width="1024" height="574" srcset="https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image7.png 1234w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image7.png?resize=580,326 580w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image7.png?resize=916,514 916w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image7.png?resize=768,431 768w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image7.png?resize=1024,574 1024w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image7.png?resize=96,54 96w, https://engineering.fb.com/wp-content/uploads/2024/02/SPTP_image7.png?resize=192,108 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Figure 7: Packets per second with ptp4l (green) vs SPTP (blue).</p>
<h2><span style="font-weight: 400;">The next steps for SPTP at Meta</span></h2>
<p><span style="font-weight: 400;">Since SPTP can offer the exact same level of synchronization with a lot fewer resources consumed, we think it’s a reasonable alternative to the existing unicast PTP profiles.</span></p>
<p><span style="font-weight: 400;">In a large-scale data center deployment, it can help to combat frequently changing network paths and create savings in terms of network traffic, memory usage, and number of CPU cycles.</span></p>
<p><span style="font-weight: 400;">It will also eliminate a lot of complexity inherited from multicast PTP profiles, which is not necessarily useful in the trusted networks of the modern data centers.</span></p>
<p><span style="font-weight: 400;">It should be noted that SPTP may not be suitable for systems that still require subscription and authentication. But this could be solved by using PTP TLVs (type-length-value). </span></p>
<p><span style="font-weight: 400;">Additionally, by removing the need for subscriptions, it’s possible to observe multiple clocks – which allows us to provide higher reliability by comparing the time sync from multiple sources at the end node.</span></p>
<p><span style="font-weight: 400;">SPTP can offer significantly simpler, faster, and more reliable synchronization. Similar to G.8265.1 and G.8275.2 it provides excellent synchronization quality using a different set of parameters. Simplification comes with certain tradeoffs, such as missing signaling messages, that users need to be aware of and decide which profile is the best for them.</span></p>
<p><span style="font-weight: 400;">Having it standardized and assigned a unicast profile identifier will encourage wider support, adoption, and popularization of PTP as a default precise time synchronization protocol.</span></p>
<p><span style="font-weight: 400;">The source code for the SPTP client and the server can be accessed on our</span> <span style="font-weight: 400;">GitHub page</span><span style="font-weight: 400;">.</span></p>
<h2><span style="font-weight: 400;">Acknowledgements</span></h2>
<p>We would like to thank Alexander Bulimov, Vadim Fedorenko, and Mike Lambeta for their help implementing the code and the math for this article.</p>The post <a href="https://dailyzsocialmedianews.com/easy-precision-time-protocol-at-meta/">Easy Precision Time Protocol at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>DotSlash: Simplified executable deployment &#8211; Engineering at Meta</title>
		<link>https://dailyzsocialmedianews.com/dotslash-simplified-executable-deployment-engineering-at-meta/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Tue, 06 Feb 2024 15:15:06 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[deployment]]></category>
		<category><![CDATA[DotSlash]]></category>
		<category><![CDATA[Engineering]]></category>
		<category><![CDATA[executable]]></category>
		<category><![CDATA[Meta]]></category>
		<category><![CDATA[Simplified]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24656</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="643" height="1024" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/06151504/DotSlash-Simplified-executable-deployment-Engineering-at-Meta.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="DotSlash: Simplified executable deployment - Engineering at Meta" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/06151504/DotSlash-Simplified-executable-deployment-Engineering-at-Meta.png 643w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/06151504/DotSlash-Simplified-executable-deployment-Engineering-at-Meta-188x300.png 188w" sizes="auto, (max-width: 643px) 100vw, 643px" /></div><p>We’ve open sourced DotSlash, a tool that makes large executables available in source control with a negligible impact on repository size, thus avoiding I/O-heavy clone operations. With DotSlash, a set of platform-specific executables is replaced with a single script containing descriptors for the supported platforms. DotSlash handles transparently fetching, decompressing, and verifying the appropriate remote [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/dotslash-simplified-executable-deployment-engineering-at-meta/">DotSlash: Simplified executable deployment – Engineering at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="643" height="1024" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/06151504/DotSlash-Simplified-executable-deployment-Engineering-at-Meta.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="DotSlash: Simplified executable deployment - Engineering at Meta" decoding="async" loading="lazy" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/06151504/DotSlash-Simplified-executable-deployment-Engineering-at-Meta.png 643w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/06151504/DotSlash-Simplified-executable-deployment-Engineering-at-Meta-188x300.png 188w" sizes="auto, (max-width: 643px) 100vw, 643px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We’ve open sourced </span><span style="font-weight: 400;">DotSlash</span><span style="font-weight: 400;">, a tool that makes large executables available in source control with a negligible impact on repository size, thus avoiding I/O-heavy clone operations.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">With DotSlash, a set of platform-specific executables is replaced with a single script containing descriptors for the supported platforms. DotSlash handles transparently fetching, decompressing, and verifying the appropriate remote artifact for the current operating system and CPU.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">At Meta, the overwhelming majority of DotSlash files are generated and committed to source control via automation, so we are also releasing a complementary GitHub Action to assemble a comparable setup outside of Meta.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">DotSlash is written in Rust for performance and is cross-platform.</span></li>
</ul>
<p><span style="font-weight: 400;">At Meta, we have a vast array of first-party and third-party command line tools that need to be available across a diverse range of developer environments. Reliably getting the appropriate version of each tool to the right place can be a challenging task.</span></p>
<p><span style="font-weight: 400;">For example, the source code for many of our first-party tools lives alongside the projects that leverage them inside our </span><span style="font-weight: 400;">massive monorepo</span><span style="font-weight: 400;">. For such tools, the standard practice is to use </span><span style="font-weight: 400;">buck2 run</span><span style="font-weight: 400;"> to build and run executables from source, as necessary. This has the advantage that tools and the projects that use them can be updated atomically in a single commit.</span></p>
<p><span style="font-weight: 400;">While we use extensive caching and </span><span style="font-weight: 400;">remote execution</span><span style="font-weight: 400;"> to provide our developers with fast builds, there will always be cases where buck2 run is going to be considerably slower than running the prebuilt binary directly. While we leverage a </span><span style="font-weight: 400;">virtual filesystem</span><span style="font-weight: 400;"> that reduces the drawbacks of checking large binaries into source control compared to a traditional physical filesystem, there are still pathological cases that are best avoided by keeping such files out of the repository in the first place. (This practice also eliminates a large class of code provenance issues.)</span></p>
<p><span style="font-weight: 400;">Further, not everything we use is built from source, nor do all of our tools live in source control. For example, there is the case of buck2 </span><span style="font-weight: 400;">itself, which needs to be pre-built for developers and readily available on the $PATH</span><span style="font-weight: 400;"> for convenience. For core developer tools like </span><span style="font-weight: 400;">Buck2</span><span style="font-weight: 400;"> and </span><span style="font-weight: 400;">Sapling</span><span style="font-weight: 400;">, we use a </span><span style="font-weight: 400;">Chef recipe</span> to deploy new versions, installing them in /usr/local/bin<span style="font-weight: 400;"> (or somewhere within the appropriate %PATH$% </span><span style="font-weight: 400;">on Windows) across a variety of developer environments.</span></p>
<p><span style="font-weight: 400;">While this approach is reasonable for commonly-used executables, it is not a great fit for the long tail of tools. That is, while it might be convenient to install everything a developer might need in /usr/local/bin</span><span style="font-weight: 400;"> by default, this could easily add up to tens or hundreds of gigabytes of disk, very little of which will end up being executed, in practice. In turn, this makes Chef runs more expensive and prone to failure.</span></p>
<h2><span style="font-weight: 400;">Introducing DotSlash</span></h2>
<p><span style="font-weight: 400;">DotSlash attempts to solve many of the problems described in the previous section. While </span><span style="font-weight: 400;">we do not claim it is a silver bullet</span><span style="font-weight: 400;">, we have found it to be the right solution for many of our internal use cases. At Meta, DotSlash is executed </span><span style="font-weight: 400;">hundreds of millions of times per day</span><span style="font-weight: 400;"> to deliver a mix of first-party and third-party tools to end-user developers as well as hermetic build environments.</span></p>
<p><span style="font-weight: 400;">The idea is fairly simple: we replace the contents of a set of platform-specific, heavyweight executables with a single lightweight text file that can be read by the dotslash </span><span style="font-weight: 400;">command line tool (which must be installed on the user’s $PATH</span><span style="font-weight: 400;">). We call such a file a </span>DotSlash file<span style="font-weight: 400;">. It contains the information DotSlash needs to fetch and run the executable it replaces for the host platform. By convention, a DotSlash file maintains the name of the original file rather than calling attention to itself via a custom file extension. Instead, it aspires to be a transparent wrapper for the original executable. To that end, a DotSlash file is </span><span style="font-weight: 400;">required</span><span style="font-weight: 400;"> to start with #!/usr/bin/env dotslash</span><span style="font-weight: 400;"> (even on Windows) to help maintain this illusion.</span></p>
<p><span style="font-weight: 400;">The following is a hypothetical DotSlash file named node </span><span style="font-weight: 400;">that is designed to run v18.19.0 of Node.js. Note that users across x86 Linux, x86 macOS, and ARM macOS can all run the </span><span style="font-weight: 400;">same</span><span style="font-weight: 400;"> DotSlash file, as DotSlash will take care of doing the work to select the appropriate executable for the host on which it is being run. In this way, DotSlash simplifies the work of cross-platform releases: </span></p>
<p><span style="font-weight: 400;">In this example, the workflow DotSlash runs through when executing node</span><span style="font-weight: 400;"> looks like: </span></p>
<p><span style="font-weight: 400;">See the </span><span style="font-weight: 400;">How DotSlash Works</span><span style="font-weight: 400;"> documentation for details.</span></p>
<p><span style="font-weight: 400;">Because of how </span><span style="font-weight: 400; font-family: 'courier new', courier; color: #339966;">#!</span><span style="font-weight: 400;"> works on Mac and Linux, when a user runs ./node &#8211;version, </span><span style="font-weight: 400;">the invocation effectively becomes dotslash ./node &#8211;version</span><span style="font-weight: 400;">. DotSlash requires that its first argument is a file that starts with #!/usr/bin/env dotslash</span><span style="font-weight: 400;">, as mentioned above. Once it verifies the header, it uses a </span><span style="font-weight: 400;">lenient JSON parser</span><span style="font-weight: 400;"> to read the rest of the file. DotSlash finds the entry in the &#8220;platforms&#8221;</span><span style="font-weight: 400;"> </span><span style="font-weight: 400;">section that corresponds to the host it is running on.</span></p>
<p><span style="font-weight: 400;">DotSlash uses the information in this entry and hashes it to compute a corresponding file path (that doubles as a key) in the user’s local DotSlash cache. DotSlash attempts to exec </span><span style="font-weight: 400;">the corresponding file, replacing argv0 </span><span style="font-weight: 400;">with the path to the DotSlash file and forwarding the remaining command line arguments (&#8211;version</span><span style="font-weight: 400;">, in this example) to the exec </span><span style="font-weight: 400;">invocation.</span></p>
<p><span style="font-weight: 400;">If the target executable is in the cache, the user immediately runs Node.js as originally intended. In the event of a cache miss (indicated by exec </span><span style="font-weight: 400;">failing with ENOENT</span><span style="font-weight: 400;">), DotSlash uses the information from the DotSlash file to determine the URL it should use to fetch the artifact containing the executable as well as the size and digest information it should use to verify the contents. If this succeeds, the verified artifact is atomically mv‘d </span><span style="font-weight: 400;">into the appropriate location in the DotSlash cache and the exec </span><span style="font-weight: 400;">invocation is performed again. Note that DotSlash uses </span><span style="font-weight: 400;">advisory file locking</span><span style="font-weight: 400;"> to avoid making duplicate requests even if DotSlash files requiring the same artifact are run concurrently.</span></p>
<p><span style="font-weight: 400;">Note that it is common to have multiple DotSlash files refer to the same artifact, </span><span style="font-weight: 400;">such as a </span><span style="font-weight: 400;">.tar.zst</span><span style="font-weight: 400;"> file</span><span style="font-weight: 400;">, while each DotSlash file maps to a distinct entry within the archive. For example, suppose </span><span style="font-weight: 400;">node-v18.19.0-darwin-arm64.tar.gz</span><span style="font-weight: 400;"> is a compressed </span><span style="font-weight: 400;">tar</span><span style="font-weight: 400;"> file that contains many entries, including node , npm , and npx</span><span style="font-weight: 400;">. The DotSlash file for </span><span style="font-weight: 400;">node</span><span style="font-weight: 400;"> would be as follows:</span></p>
<p>#!/usr/bin/env dotslash</p>
<p>{<br />
  "name": "node-v18.19.0",<br />
  "platforms": {<br />
    "macos-aarch64": {<br />
      "size": 40660307,<br />
      "hash": "blake3",<br />
      "digest": "6e2ca33951e586e7670016dd9e503d028454bf9249d5ff556347c3d98c347c34",<br />
      // Note the difference from the previous example where "format": "zst" has been<br />
      // replaced with "format": "tar.gz", which specifies what type of decompression<br />
      // logic to use as well as the path within the decompressed archive to run when<br />
      // this DotSlash file is executed.<br />
      "format": "tar.gz",<br />
      // Assuming node-v18.19.0-darwin-arm64.tar.gz contains node, npm, and npx in the<br />
      // node-v18.19.0-darwin-arm64/bin/ folder within the archive, the following<br />
      // is the only line that has to change in the DotSlash file that represents<br />
      // those other executables.<br />
      "path": "node-v18.19.0-darwin-arm64/bin/node",<br />
      "providers": [<br />
        {<br />
          "url": "https://nodejs.org/dist/v18.19.0/node-v18.19.0-darwin-arm64.tar.gz"<br />
        }<br />
      ]<br />
    },<br />
    /* other platforms omitted for brevity */<br />
  }<br />
}</p>
<p><span style="font-weight: 400;">As noted in the comments, the only change in the DotSlash files for npm </span><span style="font-weight: 400;">and npx </span><span style="font-weight: 400;">would be the &#8220;path&#8221;</span><span style="font-weight: 400;"> entry. Because the artifact for all three DotSlash files would be the same, whichever DotSlash file was run first would fetch the artifact and put it in the cache whereas all subsequent runs of </span><span style="font-weight: 400;">any</span><span style="font-weight: 400;"> of the three DotSlash files would leverage the cached entry.</span></p>
<p><span style="font-weight: 400;">This technique is often used to ensure that a set of complementary executables is released together. Further, because the archive will be decompressed in its own directory, it may also contain resource files (or library files, such as .dll </span><span style="font-weight: 400;">files that need to live alongside .exe </span><span style="font-weight: 400;">files on Windows) that will be unpacked using the directory structure specified by the archive. This also makes DotSlash a good fit for distributing executables that are not binaries, but trees of script files, which is common for Node.js or Python.</span></p>
<h2><span style="font-weight: 400;">Generating DotSlash files</span></h2>
<p><span style="font-weight: 400;">At Meta, most DotSlash files are produced as part of an automated build pipeline. Our continuous integration (CI) system supports special configuration for DotSlash jobs where a user must specify:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">A set of builds to run (these can span multiple platforms).</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The resulting generated artifacts to publish to an internal blobstore.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The DotSlash files in source control to update with entries for the new artifacts.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The conditions under which the job should be triggered (this is analogous to </span><span style="font-weight: 400;">workflow triggers on GitHub</span><span style="font-weight: 400;">).</span></li>
</ul>
<p><span style="font-weight: 400;">The result of such a job is a proposed change to the codebase containing the updated DotSlash files. At Meta, we call such a change a </span><span style="font-weight: 400;">“diff,”</span><span style="font-weight: 400;"> though on GitHub, this is known as a </span><span style="font-weight: 400;">pull request</span><span style="font-weight: 400;">. Just like an ordinary human-authored diff at Meta, putting it up for review triggers a number of jobs that include linters, automated tests, and other tools that provide signal on the proposed change. For a DotSlash diff, if all of the signals come back clean, the diff is automatically committed to the codebase without further human intervention.</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-20910" src="https://engineering.fb.com/wp-content/uploads/2024/02/DotSlash_2.png?w=572" alt="" width="572" height="560" srcset="https://engineering.fb.com/wp-content/uploads/2024/02/DotSlash_2.png 572w, https://engineering.fb.com/wp-content/uploads/2024/02/DotSlash_2.png?resize=96,94 96w, https://engineering.fb.com/wp-content/uploads/2024/02/DotSlash_2.png?resize=192,188 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>See the Generating DotSlash Files at Meta documentation for details.</p>
<p><span style="font-weight: 400;">The script we use to generate DotSlash files injects metadata about the build job that makes it straightforward to trace the provenance of the underlying artifacts. The following is a hypothetical example of a generated DotSlash file for the </span><span style="font-weight: 400;">CodeCompose</span><span style="font-weight: 400;"> LSP built from source at a specific commit in clang-opt </span><span style="font-weight: 400;">mode. Note the &#8220;metadata&#8221; </span><span style="font-weight: 400;">entries in the DotSlash file will be ignored by the dotslash</span><span style="font-weight: 400;"> CLI, but we include them as structured data so they can be parsed by other tools to facilitate programmatic audits:</span></p>
<p>#!/usr/bin/env dotslash</p>
<p>// @generated SignedSource<<d8621e8ccbd7a595a3018e6a070be9c0>><br />
// https://yarnpkg.com/package?name=signedsource can be used to<br />
// generate and verify the above signature to flag tampering<br />
// in generated code.</p>
<p>{<br />
  "name": "code-compose-lsp",<br />
  // Added by automation.<br />
  "metadata": {<br />
    "build-info": {<br />
      "job-repo": "fbsource",<br />
      "job-src": "dotslash/code-compose-lsp.star",<br />
      // It is considered best practice to build the artifacts for<br />
      // all platforms from the same commit within a DotSlash file.<br />
      "commit": {<br />
        "repo": "fbsource",<br />
        "scm": "sapling",<br />
        "hash": "0f9e3d9e189bf393f7f9d0b6879361cd76fcdcd0",<br />
        "date": "2024-01-03 20:07:54 PST",<br />
        "timestamp": 1704341274<br />
      }<br />
    }<br />
  },<br />
  "platforms": {<br />
    "linux-x86_64": {<br />
      "size": 2740736,<br />
      "hash": "blake3",<br />
      "digest": "fc8a3ade56a97a6e73469ade1575e8f8e33fda99fbf6df429d555e480d6453d0",<br />
      "format": "zst",<br />
      "providers": [<br />
        {<br />
          "type": "meta-cas",<br />
          "key": "fc8a3ade56a97a6e73469ade1575e8f8e33fda99fbf6df429d555e480d6453d0:2740736"<br />
        }<br />
      ],<br />
      // Added by automation.<br />
      "metadata": {<br />
        "build-command": [<br />
          "buck2",<br />
          "build",<br />
          "--config-file",<br />
          "//buildconfig/clang-opt",<br />
          "//codecompose/lsp/cli:code-compose-lsp"<br />
        ]<br />
      }<br />
    },<br />
    // additional platforms...<br />
  }<br />
}</p>
<p><span style="font-weight: 400;">Without DotSlash, a developer would have to run buck2 build &#8211;config-file //buildconfig/clang-opt //codecompose/lsp/cli:code-compose-lsp</span><span style="font-weight: 400;"> to build and run the LSP from source, which could be a slow operation depending on the size of the build, the state of the build cache, etc. With DotSlash, the developer can run the optimized LSP as quickly as they can fetch and decompress it from the specified URL, which is likely much faster than doing a build.</span></p>
<p><span style="font-weight: 400;">Another thing you may have noticed about this example is that the &#8220;key&#8221;</span><span style="font-weight: 400;"> is not an ordinary URL, but an identifier that happens to be the concatenation of the BLAKE3 hash and the size of the specified artifact. This is because &#8220;type&#8221;: &#8220;meta-cas&#8221; </span><span style="font-weight: 400;">indicates that this artifact must be fetched via a </span><span style="font-weight: 400;">custom provider</span><span style="font-weight: 400;"> in DotSlash, which is specialized fetching logic built into DotSlash that has its own identifier scheme. In this case, the artifact would be fetched from Meta’s in-house content-addressable storage (CAS) system, which uses the artifact hash+size as a key.</span></p>
<p><span style="font-weight: 400;">While we do not provide the code for the meta-cas</span><span style="font-weight: 400;"> provider in the open source version of DotSlash, we do include one custom provider out-of-the-box beyond the default http </span><span style="font-weight: 400;">provider.</span></p>
<h2><span style="font-weight: 400;">Using DotSlash with GitHub releases</span></h2>
<p><span style="font-weight: 400;">While DotSlash is generally useful for fetching an executable from an arbitrary URL and running it, we have found the combination of DotSlash and CI to be particularly powerful. To that end, we include custom tooling to facilitate generating DotSlash files for GitHub releases. To ensure DotSlash can fetch artifacts from private GitHub repositories as well as GitHub Enterprise instances, DotSlash includes a custom provider for GitHub releases that includes an appropriate authentication token when fetching artifacts.</span></p>
<p><span style="font-weight: 400;">For example, suppose you have existing workflows for building your release artifacts and publish them via gh release upload</span><span style="font-weight: 400;">. For simplicity, let’s assume these are named linux-release</span><span style="font-weight: 400;">, macos-release</span><span style="font-weight: 400;">, and windows-release</span><span style="font-weight: 400;">. To create a single DotSlash file that includes the artifacts from all three platforms you would introduce a new </span><span style="font-weight: 400;">GitHub Action</span><span style="font-weight: 400;"> that leverages the workflow_run</span><span style="font-weight: 400;"> trigger so it fires whenever one of these release workflows succeeds. (Note that </span><span style="font-weight: 400;">GitHub’s documentation states</span><span style="font-weight: 400;">: “You can’t use workflow_run</span><span style="font-weight: 400;"> to chain together more than three levels of workflows,” so check the depth of your workflow graph if your workflow is not firing.)</span></p>
<p><span style="font-weight: 400;">The .yml</span><span style="font-weight: 400;"> file to define the new workflow would look like this:</span></p>
<p>name: Generate DotSlash File</p>
<p>on:<br />
  workflow_run:<br />
    # These must match the names of the workflows that publish<br />
    # artifacts to your GitHub Release.<br />
    workflows: [linux-release, macos-release, windows-release]<br />
    types:<br />
      - completed</p>
<p>jobs:<br />
  create-dotslash-file:<br />
    name: Generating DotSlash File<br />
    runs-on: ubuntu-latest<br />
    if: ${{ github.event.workflow_run.conclusion == 'success' }}<br />
    steps:<br />
      - uses: facebook/dotslash-publish-release@v1<br />
        env:<br />
          # This is necessary because the action uses<br />
          # `gh release upload` to publish the generated DotSlash file(s)<br />
          # as part of the release.<br />
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}<br />
        with:<br />
          # Additional file that lives in your repo that defines<br />
          # how your DotSlash file(s) should be generated.<br />
          config: .github/workflows/dotslash-config.json<br />
          # Tag for the release to target.<br />
          tag: ${{ github.event.workflow_run.head_branch }}</p>
<p><span style="font-weight: 400;">Because </span><span style="font-weight: 400;">inputs to GitHub Actions</span><span style="font-weight: 400;"> are limited to string values, facebook/dotslash-publish-release</span><span style="font-weight: 400;"> takes config</span><span style="font-weight: 400;">, which is a path to a JSON file in the repo that supports a rich set of configuration options for generating the DotSlash files. The other required input is the ID of the release, which in GitHub, </span><span style="font-weight: 400;">is defined by a Git tag</span><span style="font-weight: 400;">. When the action is run, it will check to see whether all of the artifacts specified in the config are present in the release, and if so, will generate the appropriate DotSlash files and add them to the release.</span></p>
<p><span style="font-weight: 400;">For example, consider an open source project like </span><span style="font-weight: 400;">Hermes</span><span style="font-weight: 400;"> where a </span><span style="font-weight: 400;">release</span><span style="font-weight: 400;"> includes a number of platform-specific .tar.gz </span><span style="font-weight: 400;">files, each containing a handful of executables (hermes</span><span style="font-weight: 400;">, hdb</span><span style="font-weight: 400;">, etc.). To create a separate an individual DotSlash file for each executable, the JSON configuration for the action would be:</span></p>
<p>{<br />
  "outputs": {</p>
<p>    "hermes": {<br />
      "platforms": {<br />
        "macos-x86_64": {<br />
          "regex": "^hermes-cli-darwin-",<br />
          "path": "hermes"<br />
        },<br />
        "macos-aarch64": {<br />
          "regex": "^hermes-cli-darwin-",<br />
          "path": "hermes"<br />
        },<br />
        "linux-x86_64": {<br />
          "regex": "^hermes-cli-linux-",<br />
          "path": "hermes"<br />
        },<br />
        "windows-x86_64": {<br />
          "regex": "^hermes-cli-windows-",<br />
          "path": "hermes.exe"<br />
        }<br />
      }<br />
    },</p>
<p>    "hdb": {<br />
      "platforms": {<br />
        "macos-x86_64": {<br />
          "regex": "^hermes-cli-darwin-",<br />
          "path": "hdb"<br />
        },<br />
        "macos-aarch64": {<br />
          "regex": "^hermes-cli-darwin-",<br />
          "path": "hdb"<br />
        },<br />
        "linux-x86_64": {<br />
          "regex": "^hermes-cli-linux-",<br />
          "path": "hdb"<br />
        },<br />
        "windows-x86_64": {<br />
          "regex": "^hermes-cli-windows-",<br />
          "path": "hdb.exe"<br />
        }<br />
      }<br />
    },</p>
<p>    // Additional entries for hvm, hbcdump, and hermesc...</p>
<p>  }<br />
}</p>
<p><span style="font-weight: 400;">Each entry in &#8220;outputs&#8221;</span><span style="font-weight: 400;"> corresponds to the name of a DotSlash file that will be added to the release. The &#8220;platforms&#8221;</span><span style="font-weight: 400;"> for each entry defines the &#8220;platforms&#8221;</span><span style="font-weight: 400;"> that should be present in the generated DotSlash file. The action uses the &#8220;regex&#8221;</span><span style="font-weight: 400;"> to identify the file in the GitHub release that should be used as the backing artifact for the entry. Assuming the artifact is an “archive” of some sort (.tar.gz</span><span style="font-weight: 400;">, .tar.zst</span><span style="font-weight: 400;">, etc.), the &#8220;path&#8221;</span><span style="font-weight: 400;"> indicates the path within the archive that the DotSlash file should run.</span></p>
<p><span style="font-weight: 400;">In this particular case, Hermes does not provide an ARM-specific binary for macOS, so the &#8220;macos-aarch64&#8221;</span><span style="font-weight: 400;"> entry is the same as the &#8220;macos-x86_64&#8221;</span><span style="font-weight: 400;">one. Though if that changes in the future, a simple update to &#8220;regex&#8221;</span><span style="font-weight: 400;"> to distinguish the two binaries is all that is needed.</span></p>
<p><span style="font-weight: 400;">Note that the action will take responsibility for computing the digest for each binary. In this example, the resulting DotSlash file for hermes </span><span style="font-weight: 400;">would be:</span></p>
<p>#!/usr/bin/env dotslash</p>
<p>{<br />
  "name": "hermes",<br />
  "platforms": {<br />
    "linux-x86_64": {<br />
      "size": 47099598,<br />
      "hash": "blake3",<br />
      "digest": "8d2c1bcefc2ce6e278167495810c2437e8050780ebb4da567811f1d754ad198c",<br />
      "format": "tar.gz",<br />
      "path": "hermes",<br />
      "providers": [<br />
        {<br />
          "url": "https://github.com/facebook/hermes/releases/download/v0.12.0/hermes-cli-linux-v0.12.0.tar.gz"<br />
        },<br />
        {<br />
          "type": "github-release",<br />
          "repo": "facebook/hermes",<br />
          "tag": "v0.12.0",<br />
          "name": "hermes-cli-linux-v0.12.0.tar.gz"<br />
        }<br />
      ]<br />
    },<br />
    // additional platforms...<br />
  }<br />
}</p>
<p><span style="font-weight: 400;">Note that there are two entries in the &#8220;providers&#8221;</span><span style="font-weight: 400;"> section for the Linux artifact. When DotSlash fetches an artifact, it will try the providers in order until one succeeds. Regardless of which provider is used, the downloaded binary will be verified against the specified &#8220;hash&#8221;</span><span style="font-weight: 400;">, &#8220;digest&#8221;</span><span style="font-weight: 400;">,  and &#8220;size&#8221;</span><span style="font-weight: 400;"> values.</span></p>
<p><span style="font-weight: 400;">In this case, the first provider is an ordinary, public URL that can be fetched using curl &#8211;location</span><span style="font-weight: 400;">, but the second is an example of a </span><span style="font-weight: 400;">custom provider</span><span style="font-weight: 400;"> discussed earlier. The &#8220;type&#8221;: &#8220;github-release&#8221; </span><span style="font-weight: 400;">line indicates that the </span><span style="font-weight: 400;">GitHub</span><span style="font-weight: 400;"> provider for DotSlash should be used, which shells out to the </span><span style="font-weight: 400;">GitHub CLI</span><span style="font-weight: 400;"> (gh</span><span style="font-weight: 400;">, which must be installed separately from DotSlash) to fetch the artifact instead of </span><span style="font-weight: 400;">curl</span><span style="font-weight: 400;">. Because facebook/hermes</span><span style="font-weight: 400;"> is a public GitHub repository, the first provider should be sufficient here. However, if the repository were private and the fetch required authentication, we would expect the first provider to fail and DotSlash would fallback to the GitHub provider. Assuming the user had run gh auth login </span><span style="font-weight: 400;">in advance to configure credentials for the specified repo, DotSlash would be able to fetch the artifact using gh release download</span><span style="font-weight: 400;">.</span></p>
<p><span style="font-weight: 400;">By publishing DotSlash files as part of GitHub releases, users can copy them to their own repositories to “vendor in” a specific version of your tool with minimal effect on their repository size, regardless of how large your releases might be.</span></p>
<h2><span style="font-weight: 400;">Try DotSlash Today </span></h2>
<p><span style="font-weight: 400;">Visit the </span><span style="font-weight: 400;">DotSlash site for</span><span style="font-weight: 400;"> more in-depth documentation and technical details. The site includes instructions on </span><span style="font-weight: 400;">Installing DotSlash</span><span style="font-weight: 400;"> so you can start playing with it firsthand. </span></p>
<p><span style="font-weight: 400;">We also encourage you to </span><span style="font-weight: 400;">check out the DotSlash source code</span><span style="font-weight: 400;"> and provide feedback via </span><span style="font-weight: 400;">GitHub issues</span><span style="font-weight: 400;">. We look forward to hearing from you!</span></p>The post <a href="https://dailyzsocialmedianews.com/dotslash-simplified-executable-deployment-engineering-at-meta/">DotSlash: Simplified executable deployment – Engineering at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Lazy is the new fast: How Lazy Imports and Cinder accelerate machine learning at Meta</title>
		<link>https://dailyzsocialmedianews.com/lazy-is-the-brand-new-quick-how-lazy-imports-and-cinder-speed-up-machine-studying-at-meta/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Thu, 18 Jan 2024 20:16:31 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Accelerate]]></category>
		<category><![CDATA[Cinder]]></category>
		<category><![CDATA[Fast]]></category>
		<category><![CDATA[imports]]></category>
		<category><![CDATA[Lazy]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[Machine]]></category>
		<category><![CDATA[Meta]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24539</guid>

					<description><![CDATA[<p>At Meta, the quest for faster model training has yielded an exciting milestone: the adoption of Lazy Imports and the Python Cinder runtime. The outcome? Up to 40 percent time to first batch (TTFB) improvements, along with a 20 percent reduction in Jupyter kernel startup times. This advancement facilitates swifter experimentation capabilities and elevates the [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/lazy-is-the-brand-new-quick-how-lazy-imports-and-cinder-speed-up-machine-studying-at-meta/">Lazy is the new fast: How Lazy Imports and Cinder accelerate machine learning at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<p></p>
<ul>
<li><span style="font-weight: 400;">At Meta, the quest for faster model training has yielded an exciting milestone: the adoption of Lazy Imports and the Python Cinder runtime. </span></li>
<li><span style="font-weight: 400;">The outcome? Up to 40 percent time to first batch (TTFB) improvements, along with a </span>20 percent<span style="font-weight: 400;"> reduction in Jupyter kernel startup times. </span></li>
<li><span style="font-weight: 400;">This advancement facilitates swifter experimentation capabilities and elevates the ML developer experience (DevX).</span></li>
</ul>
<p><span style="font-weight: 400;">Time is of the essence in the realm of machine learning (ML) development. The milliseconds it takes for an ML model to transition from conceptualization to processing the initial training data can dramatically impact productivity and experimentation.</span></p>
<p><span style="font-weight: 400;">At Meta, we’ve been able to significantly improve our model training times, as well as our overall developer experience (DevX) by adopting </span><span style="font-weight: 400;">Lazy Imports</span><span style="font-weight: 400;"> and the </span><span style="font-weight: 400;">Python Cinder runtime</span><span style="font-weight: 400;">. </span></p>
<h2><span style="font-weight: 400;">The time to first batch challenge</span></h2>
<p><span style="font-weight: 400;">Batch processing has been a game changer in ML development. It handles large volumes of data in groups (or batches) and allows us to train models, optimize parameters, and perform inference more effectively and swiftly.</span></p>
<p><span style="font-weight: 400;">But ML training workloads are notorious for their sluggish starts. When we look to improve our batch processing speeds, time to first batch (TTFB) comes into focus. TTFB is the time elapsed from the moment you hit the “start” button on your ML model training to the point when the first batch of data enters the model for processing. It is a critical metric that determines the speed at which an ML model goes from idle to learning. TTFB can vary widely due to factors like infrastructure overhead and scheduling delays. But reducing TTFB means reducing the development waiting times that can often feel like an eternity to engineers – waiting periods that can quickly amass as expensive resource wastage.</span></p>
<p><span style="font-weight: 400;">In the pursuit of faster TTFB, Meta set its sights on reducing this overhead, and Lazy Imports with Cinder emerged as a promising solution.</span></p>
<h2><span style="font-weight: 400;">The magic of Lazy Imports</span></h2>
<p><span style="font-weight: 400;">Previously, ML developers explored alternatives like the standard </span><span style="font-weight: 400; font-family: 'courier new', courier;">LazyLoader</span><span style="font-weight: 400;"> in </span><span style="font-weight: 400; font-family: 'courier new', courier;">importlib</span><span style="font-weight: 400;"> or </span><span style="font-weight: 400;">lazy-import</span><span style="font-weight: 400;">`, to defer explicit imports until necessary. While promising, these approaches are limited by their much narrower scope, and the need to manually select which dependencies will be lazily imported (often with suboptimal results). Using these approaches demands meticulous codebase curation and a fair amount of code refactoring.</span></p>
<p><span style="font-weight: 400;">In contrast, </span><span style="font-weight: 400;">Cinder’s Lazy Imports</span><span style="font-weight: 400;"> approach is a comprehensive and aggressive strategy that goes beyond the limitations of other libraries and delivers significant enhancements to the developer experience. Instead of painstakingly handpicking imports to become lazy, Cinder simplifies and accelerates the startup process by transparently deferring all imports as a default action, resulting in a much broader and more powerful deferral of imports until the exact moment they’re needed. Once in place, this method ensures that developers no longer have to navigate the maze of selective import choices. With it, developers can bid farewell to the need of typing-only imports and the use of </span><span style="font-weight: 400; font-family: 'courier new', courier;">TYPE_CHECKING</span><span style="font-weight: 400;">. It allows a simple </span><span style="font-weight: 400;"><span style="font-family: 'courier new', courier;">from __future__ import</span> annotations</span><span style="font-weight: 400;"> declaration at the beginning of a file to delay type evaluation, while Lazy Imports defer the actual import statements until required. The combined effect of these optimizations reduced costly runtime imports and further streamlined the development workflow.</span></p>
<p><span style="font-weight: 400;">The Lazy Imports solution delivers. Meta’s initiative to enhance ML development has involved rolling out Cinder with Lazy Imports to several workloads, including our ML frameworks and Jupyter kernels, producing lightning-fast startup times, improved experimentation capabilities, reduced infrastructure overhead, and code that is a breeze to maintain. We’re pleased to share that Meta’s key AI workloads have experienced noteworthy improvements, with TTFB wins reaching up to 40 percent. Resulting time savings can vary from seconds to minutes per run.</span></p>
<p><span style="font-weight: 400;">These impressive results translate to a substantial boost in the efficiency of ML workflows, since they mean ML developers can get to the model training phase more swiftly.</span></p>
<h2><span style="font-weight: 400;">The challenges of adopting Lazy Imports</span></h2>
<p><span style="font-weight: 400;">While Lazy Imports’ approach significantly improved ML development, it was not all a bed of roses. We encountered several hurdles that tested our resolve and creativity.</span></p>
<h3><span style="font-weight: 400;">Compatibility</span></h3>
<p><span style="font-weight: 400;">One of the primary challenges we grappled with was the compatibility of existing libraries with Lazy Imports. Libraries such as PyTorch, Numba, NumPy, and SciPy, among others, did not seamlessly align with the deferred module loading approach. These libraries often rely on import side effects and other patterns that do not play well with Lazy Imports. The order in which Python imports could change or be postponed, often led to side effects failing to register classes, functions, and operations correctly. This required painstaking troubleshooting to identify and address import cycles and discrepancies.</span></p>
<h3><span style="font-weight: 400;">Balancing performance versus dependability</span></h3>
<p><span style="font-weight: 400;">We also had to strike the right balance between performance optimization and code dependability. While Lazy Imports significantly reduced TTFB and enhanced resource utilization, it also introduced a considerable semantic change in the way Python imports work that could make the codebase less intuitive. Achieving the perfect equilibrium was a constant consideration, and was ensured by limiting the impact of semantic changes to only the relevant parts that could be thoroughly tested.</span></p>
<p><span style="font-weight: 400;">Ensuring seamless interaction with the existing codebase required meticulous testing and adjustments. The task was particularly intricate when dealing with complex, multifaceted ML models, where the implications of deferred imports needed to be thoroughly considered. We ultimately opted for enabling Lazy Imports only during the startup and preparation phases and disabling it before the first batch started.</span></p>
<h3><span style="font-weight: 400;">Learning curve</span></h3>
<p><span style="font-weight: 400;">Adopting new paradigms like Lazy Imports can introduce a learning curve for the development team. Training ML engineers, infra engineers, and system engineers to adapt to the new approach, understand its nuances, and implement it effectively is a process in itself.</span></p>
<h2><span style="font-weight: 400;">What is next for Lazy Imports at Meta?</span></h2>
<p><span style="font-weight: 400;">The adoption of Lazy Imports and Cinder represented a meaningful enhancement in Meta’s AI key workloads. It came with its share of ups and downs, but ultimately demonstrated that Lazy Imports can be a game changer in expediting ML development. The TTFB wins, DevX improvements, and reduced kernel startup times are all tangible results of this initiative. With Lazy Imports, Meta’s ML developers are now equipped to work more efficiently, experiment more rapidly, and achieve results faster.</span></p>
<p><span style="font-weight: 400;">While we’ve achieved remarkable success with the adoption of Lazy Imports, our journey is far from over. So, what’s next for us? Here’s a glimpse into our future endeavors:</span></p>
<h3><span style="font-weight: 400;">Streamlining developer onboarding</span></h3>
<p><span style="font-weight: 400;">The learning curve associated with Lazy Imports can be a challenge for newcomers. We’re investing in educational resources and onboarding materials to make it easier for developers to embrace this game-changing approach. </span></p>
<h3><span style="font-weight: 400;">Enhancing tooling</span></h3>
<p><span style="font-weight: 400;">Debugging code with deferred imports can be intricate. We’re working on developing tools and techniques that simplify the debugging and troubleshooting process, ensuring that developers can quickly identify and resolve issues.</span></p>
<h3><span style="font-weight: 400;">Community collaboration</span></h3>
<p><span style="font-weight: 400;">The power of Lazy Imports lies in its adaptability and versatility. We’re eager to collaborate with the Python community – sharing insights, best practices, and addressing challenges together. Building a robust community that helps supporting paradigms and patterns that play well with Lazy Imports is one of our future priorities.</span></p>The post <a href="https://dailyzsocialmedianews.com/lazy-is-the-brand-new-quick-how-lazy-imports-and-cinder-speed-up-machine-studying-at-meta/">Lazy is the brand new quick: How Lazy Imports and Cinder speed up machine studying at Meta</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How Meta is advancing GenAI</title>
		<link>https://dailyzsocialmedianews.com/how-meta-is-advancing-genai/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Thu, 11 Jan 2024 17:36:56 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Advancing]]></category>
		<category><![CDATA[GenAI]]></category>
		<category><![CDATA[Meta]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24496</guid>

					<description><![CDATA[<p>What’s going on with generative AI (GenAI) at Meta? And what does the future have in store? In this episode of the Meta Tech Podcast, Meta engineer Pascal Hartig (@passy) speaks with Devi Parikh, an AI research director at Meta. They cover a wide range of topics, including the history and future of GenAI and the most [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/how-meta-is-advancing-genai/">How Meta is advancing GenAI</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<p></p>
<p>What’s going on with generative AI (GenAI) at Meta? And what does the future have in store?</p>
<p>In this episode of the Meta Tech Podcast, Meta engineer Pascal Hartig (@passy) speaks with Devi Parikh, an AI research director at Meta. They cover a wide range of topics, including the history and future of GenAI and the most interesting research papers that have come out recently.</p>
<p>And, of course, they discuss some of Meta’s latest GenAI innovations, including:</p>
<ul>
<li>Audiobox, a foundational model for generating sound and soundscapes using natural language prompts.</li>
<li>Emu, Meta’s first foundational model for image generation.</li>
<li>Purple Llama, a suite of tools to help developers safely and responsibly deploy GenAI models.</li>
</ul>
<p>Download or listen to the episode below:</p>
<p><iframe loading="lazy" style="border: none;" title="Libsyn Player" src="https://html5-player.libsyn.com/embed/episode/id/29182733/height/90/theme/custom/thumbnail/yes/direction/forward/render-playlist/no/custom-color/000000/" width="100%" height="90" scrolling="no" allowfullscreen="allowfullscreen"></iframe></p>
<p>You can also find the episode on various podcast platforms:</p>
<p>Spotify<br />PocketCasts<br />Apple Podcasts<br />Google Podcasts</p>
<p>The Meta Tech Podcast is a podcast, brought to you by Meta, where we highlight the work Meta’s engineers are doing at every level – from low-level frameworks to end-user features.</p>
<p>Send us feedback on Instagram, Threads, or X.</p>
<p>And if you’re interested in AI career opportunities at Meta visit the Meta Careers page.</p>The post <a href="https://dailyzsocialmedianews.com/how-meta-is-advancing-genai/">How Meta is advancing GenAI</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
