<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Velox | DAILY ZSOCIAL MEDIA NEWS</title>
	<atom:link href="https://dailyzsocialmedianews.com/tag/velox/feed/" rel="self" type="application/rss+xml" />
	<link>https://dailyzsocialmedianews.com</link>
	<description>ALL ABOUT DAILY ZSOCIAL MEDIA NEWS</description>
	<lastBuildDate>Tue, 20 Feb 2024 17:13:37 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.7.1</generator>

<image>
	<url>https://dailyzsocialmedianews.com/wp-content/uploads/2020/12/cropped-DAILY-ZSOCIAL-MEDIA-NEWS-e1607166156946-32x32.png</url>
	<title>Velox | DAILY ZSOCIAL MEDIA NEWS</title>
	<link>https://dailyzsocialmedianews.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Aligning Velox and Apache Arrow: Towards composable data management</title>
		<link>https://dailyzsocialmedianews.com/aligning-velox-and-apache-arrow-in-the-direction-of-composable-knowledge-administration/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Tue, 20 Feb 2024 17:13:37 +0000</pubDate>
				<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Aligning]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[Arrow]]></category>
		<category><![CDATA[composable]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[Management]]></category>
		<category><![CDATA[Velox]]></category>
		<guid isPermaLink="false">https://dailyzsocialmedianews.com/?p=24723</guid>

					<description><![CDATA[<div style="margin-bottom:20px;"><img width="913" height="427" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Aligning Velox and Apache Arrow: Towards composable data management" decoding="async" fetchpriority="high" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management.png 913w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management-300x140.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management-768x359.png 768w" sizes="(max-width: 913px) 100vw, 913px" /></div><p>We’ve partnered with Voltron Data and the Arrow community to align and converge Apache Arrow with Velox, Meta’s open source execution engine. Apache Arrow 15 includes three new format layouts developed through this partnership: StringView, ListView, and Run-End-Encoding (REE). This new convergence helps Meta and the larger community build data management systems that are unified, [&#8230;]</p>
The post <a href="https://dailyzsocialmedianews.com/aligning-velox-and-apache-arrow-in-the-direction-of-composable-knowledge-administration/">Aligning Velox and Apache Arrow: Towards composable data management</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></description>
										<content:encoded><![CDATA[<div style="margin-bottom:20px;"><img width="913" height="427" src="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Aligning Velox and Apache Arrow: Towards composable data management" decoding="async" srcset="https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management.png 913w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management-300x140.png 300w, https://social-media-news.s3.amazonaws.com/wp-content/uploads/2024/02/20171335/Aligning-Velox-and-Apache-Arrow-Towards-composable-data-management-768x359.png 768w" sizes="(max-width: 913px) 100vw, 913px" /></div><p></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We’ve partnered with</span> <span style="font-weight: 400;">Voltron Data</span><span style="font-weight: 400;"> and the Arrow community to align and converge Apache Arrow with </span><span style="font-weight: 400;">Velox</span><span style="font-weight: 400;">, Meta’s open source execution engine.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Apache Arrow 15 includes three new format layouts developed through this partnership: StringView, ListView, and Run-End-Encoding (REE).</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">This new convergence helps Meta and the larger community build data management systems that are unified, more efficient, and composable.</span></li>
</ul>
<p><span style="font-weight: 400;">Meta’s Data Infrastructure teams have been</span> <span style="font-weight: 400;">rethinking how data management systems are designed</span><span style="font-weight: 400;">. We want to</span> <span style="font-weight: 400;">make our data management systems more composable</span><span style="font-weight: 400;"> – meaning that instead of individually developing systems as monoliths we identify common components, factor them out as reusable libraries, and leverage common APIs and standards to increase the interoperability between them. </span></p>
<p><span style="font-weight: 400;">As we decompose our large, monolithic systems into a more modular stack of reusable components, open standards, such as</span> <span style="font-weight: 400;">Apache Arrow</span><span style="font-weight: 400;">, </span><span style="font-weight: 400;">play an important role for interoperability of these components. To further our efforts in creating a more unified data landscape for our systems as well as those in the larger community, we’ve partnered with</span> <span style="font-weight: 400;">Voltron Data</span><span style="font-weight: 400;"> and the Arrow community to converge Apache Arrow’s open source columnar layouts with Velox, Meta’s open source execution engine.</span></p>
<p><span style="font-weight: 400;">The result combines the efficiency and agility offered by Velox with the widely-used Apache standard.  </span></p>
<h2><span style="font-weight: 400;">Why we need a composable data management system</span></h2>
<p><span style="font-weight: 400;">Meta’s data engines support large-scale workloads that include processing large datasets offline (ETL), interactive dashboard generation, ad hoc data exploration, and stream processing. More recently, a variety of feature engineering, data preprocessing, and training systems were built to support our rapidly expanding AI/ML infrastructure. To ensure our engineering teams can efficiently maintain and enhance these engines as our products evolve, Meta has started a series of projects aimed at increasing our engineering efficiency by minimizing the duplication of work, improving the experience of internal data users through more consistent semantics across these engines, and, ultimately, accelerating the pace of innovation in data management. </span></p>
<h2><span style="font-weight: 400;">An introduction to Velox</span></h2>
<p><span style="font-weight: 400;">Velox</span><span style="font-weight: 400;"> is the first project in our composable data management system program. It’s a unified execution engine, implemented as a C++ library, aimed at replacing the very processing core of many of these data management systems – their execution engine.</span></p>
<p><span style="font-weight: 400;">Velox improves the efficiency of these systems by providing a unified, state-of-the-art implementation of features and optimizations that were previously only available in individual engines. It also improves the engineering efficiency of our organization since these features can now be written once, in a single library, and be (re-)used everywhere.</span></p>
<p><span style="font-weight: 400;">Velox is currently in different stages of integration in more than 10 of Meta’s data systems. We have observed</span> <span style="font-weight: 400;">3-10x efficiency improvements</span><span style="font-weight: 400;"> in integrations with well-known systems in the industry like Apache Spark and Presto. </span></p>
<p><span style="font-weight: 400;">We </span><span style="font-weight: 400;">open-sourced Velox in 2022</span><span style="font-weight: 400;">. Today, it is developed in collaboration with more than 200 individual contributors around the world from more than 20 companies. </span></p>
<h2><span style="font-weight: 400;">Open standards and Apache Arrow</span></h2>
<p><span style="font-weight: 400;">In order to enable interoperability with other components, a composable data management system has to understand common storage (file) formats, network serialization protocols, table APIs, and have a unified way of expressing computation. Oftentimes these components have to directly share in-memory datasets with each other, for example, when transferring data across language boundaries (C++ to Java or Python) for efficient UDF support.</span></p>
<p><span style="font-weight: 400;">Our focus is to use open standards in these APIs as often as possible.</span> <span style="font-weight: 400;">Apache Arrow</span><span style="font-weight: 400;"> is an open source in-memory layout standard for columnar data that has been widely adopted in the industry. In a way, Arrow can be seen as the layer underneath Velox: Arrow describes how columnar data is represented in memory; Velox provides a series of execution and resource management primitives to process this data.</span></p>
<p><span style="font-weight: 400;">Although the Arrow format predates Velox, we made a conscious design decision while creating Velox to extend and deviate from the Arrow format, creating a layout we call</span> <span style="font-weight: 400;">Velox Vectors</span><span style="font-weight: 400;">. The purpose was to accelerate the data processing operations commonly found in our workloads in ways that were not possible using Arrow. Velox Vectors provided the efficiency and agility we need to move fast, but in return created a fragmented space with limited component interoperability. </span></p>
<p><span style="font-weight: 400;">To bridge this gap and create a more unified data landscape for our systems and the community, we partnered with</span> <span style="font-weight: 400;">Voltron Data</span><span style="font-weight: 400;"> and the Arrow community to align and converge these two formats. After a year of work, the new Apache Arrow release,</span> <span style="font-weight: 400;">Apache Arrow 15.0.0</span><span style="font-weight: 400;">, includes three new format layouts inspired by Velox Vectors: StringView, ListView, and Run-End-Encoding (REE).</span></p>
<p><span style="font-weight: 400;">Arrow 15 not only enables efficient (zero-copy) in-memory communication across components using Velox and Arrow, but also increases Arrow’s applicability in modern execution engines, unlocking a variety of use cases across the industry. </span></p>
<h2><span style="font-weight: 400;">Details of the Arrow and Velox layout</span></h2>
<p><span style="font-weight: 400;">Both Arrow and Velox Vectors are columnar layouts whose purpose is to represent batches of data in memory. A column is usually composed of a sequential buffer where row values are stored contiguously and an optional bitmask to represent the nullability/validity of each value: </span></p>
<p>(a) Logical and (b) physical representation of an example dataset.</p>
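<p><span style="font-weight: 400;">As a rough illustration of the values-plus-validity idea described above, the following minimal Python sketch splits a column with missing entries into a contiguous values buffer and a bitmask. The helper names (</span><span style="font-weight: 400;">build_column</span><span style="font-weight: 400;">, </span><span style="font-weight: 400;">get</span><span style="font-weight: 400;">) are hypothetical, and the least-significant-bit-first ordering follows Arrow’s convention; this is not code from either library.</span></p>

```python
from array import array


def build_column(values):
    """Split a list with None entries into a values buffer and a validity bitmap."""
    # Nulls still occupy a slot in the values buffer; the slot content is arbitrary.
    data = array("q", (0 if v is None else v for v in values))
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # least-significant-bit-first numbering
    return data, bitmap


def get(data, bitmap, i):
    """Return the value at row i, or None when its validity bit is unset."""
    if bitmap[i // 8] >> (i % 8) & 1:
        return data[i]
    return None


data, bitmap = build_column([10, None, 30, 40, None])
print([get(data, bitmap, i) for i in range(5)])
```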
<p><span style="font-weight: 400;">The Arrow and Velox Vectors formats already had compatible layout representations for scalar fixed-size data types (such as integers, floats, and booleans) and dictionary-encoded data. However, there were incompatibilities in string representation and container types such as arrays and maps, and a lack of support for constant and run-length-encoded (RLE) data.</span></p>
<h3><span style="font-weight: 400;">StringView – strings</span></h3>
<p><span style="font-weight: 400;">Arrow’s typical string representation uses the</span> <span style="font-weight: 400;">variable-sized element layout</span><span style="font-weight: 400;">, which consists of one contiguous buffer containing the string contents (the data), and one buffer marking where each string starts (the offsets). The size of a string </span><span style="font-weight: 400;">i</span><span style="font-weight: 400;"> can be obtained by subtracting </span><span style="font-weight: 400;">offsets[i]</span><span style="font-weight: 400;"> from </span><span style="font-weight: 400;">offsets[i+1]</span><span style="font-weight: 400;">. This is equivalent to representing strings as an array of characters:</span><span style="font-weight: 400;"> </span></p>
<p><img decoding="async" class="size-large wp-image-21003" src="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-2.png?w=1018" alt="" width="1018" height="436" srcset="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-2.png 1018w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-2.png?resize=916,392 916w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-2.png?resize=768,329 768w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-2.png?resize=96,41 96w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-2.png?resize=192,82 192w" sizes="(max-width: 992px) 100vw, 62vw"/>Arrow original string representation.</p>
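<p><span style="font-weight: 400;">The offsets-based layout can be sketched in a few lines of plain Python. The function names below are hypothetical and the sketch only mirrors the layout described above (one data buffer, one offsets buffer with one more entry than there are strings).</span></p>

```python
def encode_strings(strings):
    """Concatenate all strings into one data buffer and record where each one starts."""
    data = bytearray()
    offsets = [0]
    for s in strings:
        data += s.encode("utf-8")
        offsets.append(len(data))  # end of this string is the start of the next
    return bytes(data), offsets


def get_string(data, offsets, i):
    """Read string i; its size is offsets[i + 1] - offsets[i]."""
    start, end = offsets[i], offsets[i + 1]
    return data[start:end].decode("utf-8")


data, offsets = encode_strings(["velox", "", "arrow"])
print(offsets, get_string(data, offsets, 2))
```

<p><span style="font-weight: 400;">Because every string’s position depends on the sizes of all strings before it, rows in this layout have to be written in order.</span></p>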
<p><span style="font-weight: 400;">While Arrow’s representation stands out in simplicity, we found through a series of experiments that the following alternate string representation (which is now referred to as </span><span style="font-weight: 400;">StringView</span><span style="font-weight: 400;">) provides compelling properties that are important for efficient string processing:</span><span style="font-weight: 400;"> </span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21004" src="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-3.png?w=1024" alt="" width="1024" height="326" srcset="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-3.png 1048w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-3.png?resize=916,292 916w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-3.png?resize=768,245 768w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-3.png?resize=1024,326 1024w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-3.png?resize=96,31 96w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-3.png?resize=192,61 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>New StringView representation in Arrow 15.</p>
<p><span style="font-weight: 400;">In the</span> <span style="font-weight: 400;">new representation</span><span style="font-weight: 400;">, the first four bytes of the </span><span style="font-weight: 400;">view</span><span style="font-weight: 400;"> object always contain the string size. If the string is short (up to 12 bytes), the contents are stored inline in the view structure. Otherwise, a prefix of the string is stored in the next four bytes, followed by the buffer ID (StringViews can contain multiple data buffers) and the offset in that data buffer.</span></p>
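<p><span style="font-weight: 400;">A minimal sketch of this 16-byte view, using Python’s </span><span style="font-weight: 400;">struct</span><span style="font-weight: 400;"> module, might look as follows. It assumes little-endian 32-bit fields for the size, buffer ID, and offset, and zero padding for inlined strings; the helpers </span><span style="font-weight: 400;">make_view</span><span style="font-weight: 400;"> and </span><span style="font-weight: 400;">read_view</span><span style="font-weight: 400;"> are hypothetical names for illustration, not Velox or Arrow APIs.</span></p>

```python
import struct

INLINE_LIMIT = 12  # bytes left in a 16-byte view after the 4-byte size field


def make_view(value, buffers):
    """Pack one string into a 16-byte view; long strings go to the last data buffer."""
    raw = value.encode("utf-8")
    if len(raw) <= INLINE_LIMIT:
        # size, then the contents zero-padded to 12 bytes
        return struct.pack("<i12s", len(raw), raw)
    buffer_id = len(buffers) - 1
    offset = len(buffers[buffer_id])
    buffers[buffer_id] += raw
    # size, 4-byte prefix, buffer ID, offset within that buffer
    return struct.pack("<i4sii", len(raw), raw[:4], buffer_id, offset)


def read_view(view, buffers):
    """Decode a view, touching the data buffers only for long strings."""
    (size,) = struct.unpack_from("<i", view, 0)
    if size <= INLINE_LIMIT:
        return view[4:4 + size].decode("utf-8")
    _prefix, buffer_id, offset = struct.unpack_from("<4sii", view, 4)
    return bytes(buffers[buffer_id][offset:offset + size]).decode("utf-8")


buffers = [bytearray()]
views = [make_view(s, buffers) for s in ["velox", "apache arrow 15", "ree"]]
print([read_view(v, buffers) for v in views])
```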
<p><span style="font-weight: 400;">The benefits of this layout are:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Small strings of up to 12 bytes are fully inlined within the views buffer and can be read without dereferencing the data buffer. This increases memory locality as the typical cache miss of accessing the data buffer is avoided, increasing performance.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Since StringViews store a small (four bytes) prefix with the view object, string comparisons can fail-fast and, in many cases, avoid accessing the data buffer. This property speeds up common operations such as highly selective filters and sorting.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">StringView gives developers more flexibility on how string data is laid out in memory. For example, it allows for certain common string operations, such as </span><span style="font-weight: 400;">trim()</span><span style="font-weight: 400;"> and </span><span style="font-weight: 400;">substr()</span><span style="font-weight: 400;">, to be executed zero-copy by only updating the view object.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Since StringView’s view object has a fixed size (16 bytes), StringViews can be written out of order (e.g., first writing StringView at position 2, then 0 and 1). </span></li>
</ol>
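<p><span style="font-weight: 400;">Two of the benefits listed above, fail-fast comparisons and zero-copy substrings, can be illustrated with a simplified model. The sketch below ignores inlining and represents each view as a small tuple; </span><span style="font-weight: 400;">View</span><span style="font-weight: 400;">, </span><span style="font-weight: 400;">views_equal</span><span style="font-weight: 400;">, </span><span style="font-weight: 400;">substr</span><span style="font-weight: 400;">, and the </span><span style="font-weight: 400;">stats</span><span style="font-weight: 400;"> counter are all hypothetical names introduced for this example.</span></p>

```python
from typing import NamedTuple


class View(NamedTuple):
    size: int
    prefix: bytes  # first (up to) four bytes of the string
    buffer_id: int
    offset: int


def make_view(buffers, buffer_id, offset, size):
    """Create a view over an existing range of a data buffer."""
    data = buffers[buffer_id]
    return View(size, bytes(data[offset:offset + min(4, size)]), buffer_id, offset)


def views_equal(a, b, buffers, stats):
    """Compare two strings, reading the data buffers only if size and prefix match."""
    if a.size != b.size or a.prefix != b.prefix:
        return False  # decided from the views alone
    stats["buffer_reads"] += 1
    left = buffers[a.buffer_id][a.offset:a.offset + a.size]
    right = buffers[b.buffer_id][b.offset:b.offset + b.size]
    return left == right


def substr(view, buffers, start, length):
    """Zero-copy substring: a new view is produced, the data buffer is untouched."""
    length = max(0, min(length, view.size - start))
    return make_view(buffers, view.buffer_id, view.offset + start, length)


buffers = [bytearray(b"composable data management")]
whole = make_view(buffers, 0, 0, len(buffers[0]))
data_part = substr(whole, buffers, 11, 4)
other = make_view(buffers, 0, 16, 10)
stats = {"buffer_reads": 0}
mismatch = views_equal(whole, other, buffers, stats)
match = views_equal(data_part, make_view(buffers, 0, 11, 4), buffers, stats)
print(mismatch, match, stats)
```

<p><span style="font-weight: 400;">In this model the mismatching comparison never touches the data buffer, which is the effect the prefix is meant to achieve for selective filters.</span></p>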
<p><span style="font-weight: 400;">Besides these properties, we have found that other modern processing engines and libraries like </span><span style="font-weight: 400;">Umbra</span><span style="font-weight: 400;"> and DuckDB follow a similar string representation approach and, consequently, also deviated from Arrow. In Arrow 15, StringView has been added as a supported layout and can now be used to efficiently transfer string batches across these systems.</span></p>
<h3><span style="font-weight: 400;">ListView – variable-sized containers</span></h3>
<p><span style="font-weight: 400;">Variable-size containers like arrays and maps are</span> <span style="font-weight: 400;">represented in Arrow</span><span style="font-weight: 400;"> using one buffer containing the flattened elements from all rows, and one </span><span style="font-weight: 400;">offsets</span><span style="font-weight: 400;"> buffer marking where the container on each row starts, similar to the original string representation. The number of elements a container on row </span><span style="font-weight: 400;">i</span><span style="font-weight: 400;"> stores can be obtained by subtracting </span><span style="font-weight: 400;">offsets[i]</span><span style="font-weight: 400;"> from </span><span style="font-weight: 400;">offsets[i+1]</span><span style="font-weight: 400;">:</span><span style="font-weight: 400;"> </span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21005" src="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-4.png?w=883" alt="" width="883" height="277" srcset="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-4.png 883w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-4.png?resize=768,241 768w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-4.png?resize=96,30 96w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-4.png?resize=192,60 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>Arrow original list representation.</p>
<p><span style="font-weight: 400;">To efficiently support execution of </span><span style="font-weight: 400;">vectorized conditionals</span><span style="font-weight: 400;"> (e.g., IF and SWITCH operations), the Velox Vectors layout has to allow developers to write columns out of order. This means that developers can, for example, first write all even row records then all odd row records without having to reorganize elements that have already been written.</span></p>
<p><span style="font-weight: 400;">Primitive types can always be written out of order since the element size is constant and known beforehand. Likewise, strings can also be written out of order using StringView because the string metadata objects have a constant size (16 bytes), and string contents do not need to be written contiguously. To increase flexibility and support out-of-order writes for the remaining variable-sized types in Velox, we decided to keep both </span><span style="font-weight: 400;">lengths</span><span style="font-weight: 400;"> and </span><span style="font-weight: 400;">offsets</span><span style="font-weight: 400;"> buffers:</span></p>
<p><img loading="lazy" decoding="async" class="size-large wp-image-21006" src="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-5.png?w=954" alt="" width="954" height="319" srcset="https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-5.png 954w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-5.png?resize=916,306 916w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-5.png?resize=768,257 768w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-5.png?resize=96,32 96w, https://engineering.fb.com/wp-content/uploads/2024/02/Velox-Arrow-Convergence-5.png?resize=192,64 192w" sizes="auto, (max-width: 992px) 100vw, 62vw"/>New ListView representation in Arrow 15.</p>
<p><span style="font-weight: 400;">To bridge the gap, a new format called ListView has been added to Arrow 15. It allows the representation of variable-sized elements that have both lengths and offsets buffers.</span></p>
<p><span style="font-weight: 400;">Beyond allowing for efficient execution of conditionals, ListView gives developers more flexibility to slice and rearrange containers (e.g., operations like </span><span style="font-weight: 400;">slice()</span><span style="font-weight: 400;"> and </span><span style="font-weight: 400;">trim_array()</span><span style="font-weight: 400;"> can be implemented zero-copy), and it also allows for containers with overlapping ranges of elements.</span></p>
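<p><span style="font-weight: 400;">To make the out-of-order property concrete, here is a minimal Python sketch of a builder that keeps separate offsets and sizes per row. </span><span style="font-weight: 400;">ListViewBuilder</span><span style="font-weight: 400;"> and its methods are hypothetical and only model the layout; they are not the Arrow or Velox builder APIs.</span></p>

```python
class ListViewBuilder:
    """Rows can be filled in any order because each row has its own offset and size."""

    def __init__(self, num_rows):
        self.values = []                # flattened elements from all rows
        self.offsets = [0] * num_rows   # where each row's elements start
        self.sizes = [0] * num_rows     # how many elements each row holds

    def write(self, row, elements):
        # Append wherever the values buffer currently ends; no reshuffling needed.
        self.offsets[row] = len(self.values)
        self.sizes[row] = len(elements)
        self.values.extend(elements)

    def get(self, row):
        start = self.offsets[row]
        end = start + self.sizes[row]
        return self.values[start:end]

    def slice_row(self, row, start, length):
        # Zero-copy slice: only this row's offset and size change.
        length = max(0, min(length, self.sizes[row] - start))
        self.offsets[row] += start
        self.sizes[row] = length


rows = [[1, 2], [3], [4, 5, 6], []]
builder = ListViewBuilder(len(rows))
for row in (0, 2, 1, 3):  # even rows first, then odd rows
    builder.write(row, rows[row])
print(builder.values, builder.offsets, builder.sizes)
```

<p><span style="font-weight: 400;">With an offsets-only layout, the same write order would require moving elements that were already written, since each row’s end has to coincide with the next row’s start.</span></p>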
<h3><span style="font-weight: 400;">REE – more encodings</span></h3>
<p><span style="font-weight: 400;">We have also added two additional encoding formats commonly found in data warehouse workloads into Velox: constant encoding, to represent that all values in a column are the same, typically used to represent literals and partition keys; and RLE, to compactly represent consecutive runs of the same element.</span></p>
<p><span style="font-weight: 400;">Upon discussion with the community, it was decided to add the REE format to Arrow. The REE format is a slight variation of RLE that, instead of storing the length of each run, stores the offset at which each run ends, providing better random-access support. With REEs it is also possible to represent constant-encoded values by encoding them as a single run whose size is the entire batch.</span></p>
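<p><span style="font-weight: 400;">The random-access benefit of storing run ends can be sketched as follows: because run ends are sorted, the run containing a given row can be found with a binary search instead of summing run lengths. The functions </span><span style="font-weight: 400;">ree_encode</span><span style="font-weight: 400;"> and </span><span style="font-weight: 400;">ree_get</span><span style="font-weight: 400;"> are hypothetical helpers written for this illustration.</span></p>

```python
from bisect import bisect_right


def ree_encode(values):
    """Collapse consecutive equal values into (run_ends, run_values)."""
    run_ends, run_values = [], []
    for i, v in enumerate(values):
        if run_values and run_values[-1] == v:
            run_ends[-1] = i + 1  # extend the current run
        else:
            run_values.append(v)
            run_ends.append(i + 1)  # exclusive end of the new run
    return run_ends, run_values


def ree_get(run_ends, run_values, i):
    """Random access: the run holding row i is the first whose end is greater than i."""
    return run_values[bisect_right(run_ends, i)]


column = ["a", "a", "a", "b", "c", "c"]
run_ends, run_values = ree_encode(column)
print(run_ends, run_values)

# A constant column becomes a single run that spans the whole batch.
const_ends, const_values = ree_encode(["x"] * 1000)
print(const_ends, const_values)
```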
<h2><span style="font-weight: 400;">Composability is the future of data management</span></h2>
<p><span style="font-weight: 400;">Converging Arrow and Velox’s memory layout is an important step towards making data management systems more composable. It enables systems to combine the power of Velox’s state-of-the-art execution with the widespread industry adoption of Arrow’s standard, resulting in a more efficient and seamless cooperation. The new extensions are already seeing adoption in libraries like </span><span style="font-weight: 400;">PyArrow</span><span style="font-weight: 400;"> and </span><span style="font-weight: 400;">Polars</span><span style="font-weight: 400;"> and within Meta. In the future, it will allow more efficient interplay between projects like </span><span style="font-weight: 400;">Apache Gluten</span><span style="font-weight: 400;"> (which uses Velox internally) and </span><span style="font-weight: 400;">PySpark</span><span style="font-weight: 400;"> (which consumes Arrow), for example.</span></p>
<p><span style="font-weight: 400;">We envision that fragmentation and duplication of work can be reduced by decomposing data systems into reusable components which are open source and built based on open standards and APIs. Ultimately, we hope this work will help provide the foundation required to accelerate the pace of innovation in data management.</span></p>
<h2><span style="font-weight: 400;">Acknowledgments</span></h2>
<p><span style="font-weight: 400;">This format alignment was only possible due to a broad collaboration across different groups. A special thank you to Masha Basmanova, Orri Erling, Xiaoxuan Meng, Krishna Pai, Jimmy Lu, Kevin Wilfong, Laith Sakka, Wei He, Bikramjeet Vig, and Sridhar Anumandla from the Velox team at Meta; Felipe Carvalho, Ben Kietzman, Jacob Wujciak-Jens, Srikanth Nadukudy, Wes McKinney, and Keith Kraus from Voltron Data; and the entire Apache Arrow community for the insightful discussions, feedback, and receptivity to new ideas.</span></p>The post <a href="https://dailyzsocialmedianews.com/aligning-velox-and-apache-arrow-in-the-direction-of-composable-knowledge-administration/">Aligning Velox and Apache Arrow: Towards composable data management</a> first appeared on <a href="https://dailyzsocialmedianews.com">DAILY ZSOCIAL MEDIA NEWS</a>.]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
