<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>faster | DAILY ZSOCIAL MEDIA NEWS</title>
	<atom:link href="https://dailyzsocialmedianews.com/tag/faster/feed/" rel="self" type="application/rss+xml" />
	<link>https://dailyzsocialmedianews.com</link>
	<description>ALL ABOUT DAILY ZSOCIAL MEDIA NEWS</description>
	<lastBuildDate>Mon, 29 Jan 2024 20:26:43 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.7.1</generator>

<image>
	<url>https://dailyzsocialmedianews.com/wp-content/uploads/2020/12/cropped-DAILY-ZSOCIAL-MEDIA-NEWS-e1607166156946-32x32.png</url>
	<title>faster | DAILY ZSOCIAL MEDIA NEWS</title>
	<link>https://dailyzsocialmedianews.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Improving machine learning iteration speed with faster application build and packaging</title>
		<link>https://dailyzsocialmedianews.com/bettering-machine-studying-iteration-velocity-with-sooner-utility-construct-and-packaging/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Mon, 29 Jan 2024 20:26:43 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Application]]></category>
		<category><![CDATA[Build]]></category>
		<category><![CDATA[faster]]></category>
		<category><![CDATA[Improving]]></category>
		<category><![CDATA[iteration]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[Machine]]></category>
		<category><![CDATA[packaging]]></category>
		<category><![CDATA[speed]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24615</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="812" height="800" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Improving machine learning iteration speed with faster application build and packaging" decoding="async" fetchpriority="high" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and.png 812w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and-300x296.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and-768x757.png 768w" sizes="(max-width: 812px) 100vw, 812px" /></div><p>Slow build times and inefficiencies in packaging and distributing execution files were costing our ML/AI engineers a significant amount of time while working on our training stack. By addressing these issues head-on, we were able to reduce this overhead by double-digit percentages.  In the fast-paced world of AI/ML development, it’s crucial to ensure that our [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/bettering-machine-studying-iteration-velocity-with-sooner-utility-construct-and-packaging/">Improving machine learning iteration speed with faster application build and packaging</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="812" height="800" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Improving machine learning iteration speed with faster application build and packaging" decoding="async" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and.png 812w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and-300x296.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/01/29202641/Improving-machine-learning-iteration-speed-with-faster-application-build-and-768x757.png 768w" sizes="(max-width: 812px) 100vw, 812px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Slow build times and inefficiencies in packaging and distributing execution files were costing our ML/AI engineers a significant amount of time while working on our training stack.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">By addressing these issues head-on, we were able to reduce this overhead by double-digit percentages. </span></li>
</ul>
<p><span style="font-weight: 400;">In the fast-paced world of AI/ML development, it’s crucial to ensure that our infrastructure can keep up with the increasing demands and needs of our ML engineers, whose workflows include checking out code, writing code, building, packaging, and verification.</span></p>
<p><span style="font-weight: 400;">In our efforts to maintain efficiency and productivity while empowering our ML/AI engineers to deliver cutting-edge solutions, we found two major challenges that needed to be addressed head-on: slow builds and inefficiencies in packaging and distributing executable files.</span></p>
<p><span style="font-weight: 400;">The frustrating problem of slow builds often arises when ML engineers work on older (“cold”) revisions for which our build infrastructure doesn’t maintain a high cache hit rate, requiring us to repeatedly rebuild and relink many components. Moreover, build non-determinism causes further rebuilding: it produces different outputs for the same source code, making previously cached results unusable.</span></p>
<p><span style="font-weight: 400;">Executable packaging and distribution was another significant challenge because, historically, most ML Python executables were represented as</span> <span style="font-weight: 400;">XAR files</span><span style="font-weight: 400;"> (self-contained executables) and it is not always possible to leverage OSS layer-based solutions efficiently (see more details below). Unfortunately, creating such executables can be computationally costly, especially when dealing with a large number of files or substantial file sizes. Even if a developer modifies only a few Python files, a full XAR file reassembly and distribution is often required, delaying execution on remote machines.</span></p>
<p><span style="font-weight: 400;">Our goal in improving build speed was to minimize the need for extensive rebuilding. To accomplish this, we streamlined the build graph by reducing dependency counts, mitigated the challenges posed by build non-determinism, and maximized the utilization of built artifacts.</span></p>
<p><span style="font-weight: 400;">Simultaneously, our efforts in packaging and distribution aimed to introduce incrementality support, thereby eliminating the time-consuming overhead associated with XAR creation and distribution.</span></p>
<h2><span style="font-weight: 400;">How we improved build speeds</span></h2>
<p><span style="font-weight: 400;">To make builds faster we wanted to ensure that we built as little as possible by addressing non-determinism and eliminating unused code and dependencies.</span></p>
<p><span style="font-weight: 400;">We identified two sources of build non-determinism:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1">Non-determinism in tooling.<span style="font-weight: 400;"> Some compilers, such as Clang, Rustc, and NVCC, can produce different binary files for the same input, leading to non-deterministic results. Tackling these tooling non-determinism issues proved challenging, as they often required extensive root cause analysis and time-consuming fixes.</span></li>
<li style="font-weight: 400;" aria-level="1">Non-determinism in source code and build rules.<span style="font-weight: 400;"> Developers, whether intentionally or unintentionally, introduced non-determinism by incorporating things like temporary directories, random values, or timestamps into build rules code. Addressing these issues posed a similar challenge, demanding a substantial investment of time to identify and fix.</span></li>
</ul>
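<p><span style="font-weight: 400;">To illustrate the second source, here is a minimal Python sketch of a packaging step made deterministic by removing timestamps, builder identity, and ordering effects. The function names (deterministic_tar, digest) are hypothetical and are not part of Buck2 or any Meta tooling; the sketch only assumes the standard library.</span></p>

```python
import hashlib
import io
import tarfile


def deterministic_tar(files: dict[str, bytes]) -> bytes:
    """Pack files into a tar archive whose bytes depend only on names and contents.

    Hypothetical helper for illustration; not an API of any real build system.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed order, independent of insertion order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0               # no wall-clock timestamp
            info.uid = info.gid = 0      # no builder identity
            info.uname = info.gname = ""
            info.mode = 0o644            # fixed permissions
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def digest(blob: bytes) -> str:
    """SHA-256 hex digest, standing in for a cache key."""
    return hashlib.sha256(blob).hexdigest()
```

<p><span style="font-weight: 400;">Because the archive bytes are a pure function of the inputs, two builds of the same sources yield the same digest, so a cached result can be reused.</span></p>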
<p><span style="font-weight: 400;">Thanks to</span> <span style="font-weight: 400;">Buck2</span><span style="font-weight: 400;">,</span><span style="font-weight: 400;"> which sends nearly all of the build actions to the</span> <span style="font-weight: 400;">Remote Execution (RE) service</span><span style="font-weight: 400;">, we have been able to successfully implement non-determinism mitigation within RE. Now we provide consistent outputs for identical actions, paving the way for the adoption of a warm and stable revision for ML development. In practice, this approach will eliminate build times in many cases.</span></p>
<p><span style="font-weight: 400;">Though removing the build process from the critical path of ML engineers might not be possible in all cases, we understand how important it is to manage dependencies to keep build times under control. As dependencies naturally increased, we enhanced our tools for managing them. These improvements helped us find and remove many unnecessary dependencies, speeding up build graph analysis and reducing overall build times. For example, we removed GPU code from the final binary when it wasn’t needed, and we found ways to identify which Python modules are actually used and to cut unused native code using linker maps.</span></p>
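<p><span style="font-weight: 400;">As a rough illustration of dependency pruning, the sketch below statically collects the top-level modules a set of Python sources imports and reports declared dependencies that are never imported. This is an assumed, simplified technique (static import scanning with the standard ast module), not the tooling described above, and the function names are hypothetical. It ignores dynamic imports.</span></p>

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return top-level module names imported by absolute imports in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they refer to the package itself.
            if node.module and node.level == 0:
                found.add(node.module.split(".")[0])
    return found


def unused_dependencies(declared: set[str], sources: list[str]) -> set[str]:
    """Declared dependencies that no source file imports (candidates for removal)."""
    used = set()
    for src in sources:
        used |= imported_modules(src)
    return declared - used
```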
<h2><span style="font-weight: 400;">Adding incrementality for executable distribution</span></h2>
<p><span style="font-weight: 400;">A typical self-executable Python binary, when unarchived, is represented by thousands of Python files (.py and/or .pyc), substantial native libraries, and the Python interpreter. The cumulative result is a multitude of files, often numbering in the hundreds of thousands, with a total size reaching tens of gigabytes.</span></p>
<p><span style="font-weight: 400;">Engineers </span><span style="font-weight: 400;">spend a significant amount of time</span><span style="font-weight: 400;"> dealing with incremental builds where packaging and fetching overhead of such a large executable surpasses the build time. In response to this challenge, we implemented a new solution for the packaging and distribution of Python executables – the Content Addressable Filesystem (CAF).</span></p>
<p><span style="font-weight: 400;">The primary strength of CAF lies in its ability to operate incrementally during content addressable file packaging and fetching stages:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1">Packaging<span style="font-weight: 400;">: By adopting a content-aware approach, CAF can intelligently skip redundant uploads of files already present in Content Addressable Storage (CAS), whether as part of a different executable or the same executable with a different version.</span></li>
<li style="font-weight: 400;" aria-level="1">Fetching<span style="font-weight: 400;">: CAF maintains a cache on the destination host, ensuring that only content not already present in the cache needs to be downloaded.</span></li>
</ul>
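<p><span style="font-weight: 400;">The two stages above can be sketched with an in-memory toy model. Everything here is an assumption for illustration: ContentStore stands in for CAS, a plain dict stands in for the host cache, and package and fetch are hypothetical names rather than CAF APIs.</span></p>

```python
import hashlib


def digest(data: bytes) -> str:
    """Content address of a blob (SHA-256 chosen for the sketch)."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy stand-in for Content Addressable Storage: digest -> bytes."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}


def package(files: dict[str, bytes], cas: ContentStore) -> tuple[dict[str, str], int]:
    """Upload only blobs the store lacks; return (path -> digest manifest, upload count)."""
    manifest: dict[str, str] = {}
    uploaded = 0
    for path, data in files.items():
        key = digest(data)
        manifest[path] = key
        if key not in cas.blobs:  # already stored by this or another executable
            cas.blobs[key] = data
            uploaded += 1
    return manifest, uploaded


def fetch(manifest: dict[str, str], cas: ContentStore, cache: dict[str, bytes]) -> int:
    """Download only blobs missing from the host cache; return download count."""
    downloaded = 0
    for key in set(manifest.values()):
        if key not in cache:
            cache[key] = cas.blobs[key]
            downloaded += 1
    return downloaded
```

<p><span style="font-weight: 400;">In this model, changing one file out of many between two versions of an executable results in a single upload and a single download, regardless of the executable's total size.</span></p>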
<p><span style="font-weight: 400;">To optimize the efficiency of this system, we deploy a CAS daemon on the majority of Meta’s data center hosts. The CAS daemon assumes multiple responsibilities, including maintaining the local cache on the host (materialization into the cache and cache GC) and organizing a P2P network with other CAS daemon instances using</span> <span style="font-weight: 400;">Owl</span><span style="font-weight: 400;">, our high-fanout distribution system for large data objects. This P2P network enables direct content fetching from other CAS daemon instances, significantly reducing latency and the storage bandwidth capacity required.</span></p>
<p><span style="font-weight: 400;">In the case of CAF, an executable is defined by a flat manifest file detailing all symlinks, directories, hard links, and files, along with their digest and attributes. This manifest implementation allows us to deduplicate all unique files across executables and implement a smart affinity/routing mechanism for scheduling, thereby minimizing the amount of content that needs to be downloaded by maximizing local cache utilization.</span></p>
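<p><span style="font-weight: 400;">The actual CAF manifest format is not described here, so the following is only a sketch of what a flat manifest could look like: one entry per path with its type, mode, and either a content digest or a symlink target. The field names and the build_manifest and unique_digests helpers are hypothetical, and hard links are omitted for brevity.</span></p>

```python
import hashlib
import os
import stat


def build_manifest(root: str) -> list[dict]:
    """Describe every path under `root` as one flat entry (hypothetical format)."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: describe symlinks, do not follow them
            entry = {
                "path": os.path.relpath(full, root),
                "mode": stat.S_IMODE(st.st_mode),
            }
            if stat.S_ISLNK(st.st_mode):
                entry["type"] = "symlink"
                entry["target"] = os.readlink(full)
            elif stat.S_ISDIR(st.st_mode):
                entry["type"] = "dir"
            elif stat.S_ISREG(st.st_mode):
                entry["type"] = "file"
                with open(full, "rb") as fh:
                    entry["digest"] = hashlib.sha256(fh.read()).hexdigest()
            else:
                entry["type"] = "other"  # devices, FIFOs, etc. are not hashed
            entries.append(entry)
    return sorted(entries, key=lambda e: e["path"])


def unique_digests(manifest: list[dict]) -> set[str]:
    """Distinct file contents referenced by a manifest (what must actually be stored)."""
    return {e["digest"] for e in manifest if e["type"] == "file"}
```

<p><span style="font-weight: 400;">Since identical files share a digest, the set of unique digests across many manifests is what a scheduler could compare against a host's cache when choosing where to run an executable.</span></p>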
<p><span style="font-weight: 400;">While the concept may bear some resemblance to what</span> <span style="font-weight: 400;">Docker achieves with OverlayFS</span><span style="font-weight: 400;">, our approach differs significantly. Organizing proper layers is not always feasible in our case due to the number of executables with diverse dependencies. In this context, layering becomes less efficient and harder to organize. Additionally, direct access to files is essential for P2P support.</span></p>
<p><span style="font-weight: 400;">We opted for</span> <span style="font-weight: 400;">Btrfs</span><span style="font-weight: 400;"> as our filesystem because of its</span> <span style="font-weight: 400;">compression</span><span style="font-weight: 400;">, its ability to write compressed storage data directly to extents (which bypasses redundant decompression and compression), and its</span> <span style="font-weight: 400;">Copy-on-write (COW)</span><span style="font-weight: 400;"> capabilities. These attributes allow us to maintain executables on block devices with a total size similar to those represented as XAR files, share the same files from cache across executables, and implement a highly efficient COW mechanism that, when needed, only copies affected file extents.</span></p>
<h2><span style="font-weight: 400;">LazyCAF and enforcing uniform revisions: Areas for further ML iteration improvements</span></h2>
<p><span style="font-weight: 400;">The improvements we implemented have proven highly effective, drastically reducing the overhead and significantly elevating the efficiency of our ML engineers. Faster build times and more efficient packaging and distribution of executables have reduced overhead by double-digit percentages.</span></p>
<p><span style="font-weight: 400;">Yet, our journey to slash build overhead doesn’t end here. We’ve identified several promising improvements that we aim to deliver soon. In our investigation into our ML workflows, we discovered that only a fraction of the entire executable content is utilized in certain scenarios. Recognizing that, we intend to start working on optimizations to fetch executable parts on demand, thereby significantly reducing materialization time and minimizing the overall disk footprint.</span></p>
<p><span style="font-weight: 400;">We can also further accelerate the development process by enforcing uniform revisions. We plan to enable all our ML engineers to operate on the same revision, which will improve the cache hit ratios of our build. This move will further increase the percentage of incremental builds since most of the artifacts will be cached.</span></p>]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
