A fine-grained network traffic analysis with Millisampler
What the research is:
Millisampler is one of Meta’s newest characterization tools. It lets us efficiently observe, characterize, and debug network performance at fine timescales. This lightweight network traffic characterization tool is designed for continuous monitoring and works at fine, configurable timescales. It collects time series of inbound and outbound traffic volume, number of active flows, inbound ECN marks, and inbound and outbound retransmissions. Millisampler can also distinguish traffic within a region from traffic between regions (longer RTT). Millisampler runs on our server fleet and collects short, periodic snapshots of this data at time granularities of 100 μs, 1 ms, and 10 ms, stores them on local disk, and makes them available for on-demand analysis for several days. Because the data is only flow-level aggregated header information, it does not contain any personally identifiable information (PII). Even with this minimal amount of information, Millisampler data has proven very useful in practice, especially when combined with existing coarser-grained data: we can clearly see how, for example, switch buffers or host NICs might be unable to handle the inbound traffic pattern.
How it works:
Millisampler consists of userspace code that schedules runs, stores data, and serves data, plus an eBPF-based tc filter that runs in the kernel to collect fine-timescale data. The userspace code attaches the tc filter and enables data collection. A tc filter is one of the first programmable steps when a packet is received and nearly the last step when a packet is transmitted. On ingress, this means that the eBPF code runs on the CPU core that handles the softirq (bottom half) when the packet is routed to the owning socket. Because processing happens on many CPU cores, we avoid locking by using per-CPU variables, which increase the memory footprint but eliminate the risk of contention. To minimize overhead, we sample periodically and for short periods of time. Userspace therefore configures two parameters in Millisampler: the sampling interval and the number of samples. We schedule runs with three sampling intervals (10 ms, 1 ms, and 100 μs), with the number of samples fixed at 2,000 for all sampling intervals. This means that our observation periods range from 200 ms (100 μs sampling interval) to 20 s (10 ms sampling interval), allowing us to observe events on sub-RTT to cross-region RTT timescales while fixing the memory footprint of each run at 2,000 64-bit counters per CPU core for each value we measure.
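The arithmetic behind the observation periods and the per-CPU memory footprint can be sketched in a few lines. The following is a userspace simulation only, not the eBPF filter itself; the names RunConfig and PerCpuSeries are hypothetical and chosen for illustration, and the bucket-index formula is an assumption about how a packet timestamp could be mapped to a sample.

```python
from dataclasses import dataclass

COUNTER_BYTES = 8  # one 64-bit counter per sample


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the two run parameters described above."""
    interval_us: int          # sampling interval in microseconds
    num_samples: int = 2000   # fixed for every sampling interval

    @property
    def window_us(self) -> int:
        # Observation period = sampling interval x number of samples.
        return self.interval_us * self.num_samples

    def bytes_per_cpu_per_metric(self) -> int:
        # Memory used by one measured value on one CPU core.
        return self.num_samples * COUNTER_BYTES


class PerCpuSeries:
    """One counter array per CPU, so cores never write to shared memory."""

    def __init__(self, cfg: RunConfig, num_cpus: int, start_us: int):
        self.cfg = cfg
        self.start_us = start_us
        self.counters = [[0] * cfg.num_samples for _ in range(num_cpus)]

    def record(self, cpu: int, now_us: int, nbytes: int) -> bool:
        # Assumed mapping: elapsed time divided by the sampling interval.
        idx = (now_us - self.start_us) // self.cfg.interval_us
        if 0 <= idx < self.cfg.num_samples:
            self.counters[cpu][idx] += nbytes
            return True
        return False  # packet falls outside the run; nothing is recorded

    def merged(self) -> list:
        # Userspace sums the per-CPU arrays into a single time series.
        return [sum(column) for column in zip(*self.counters)]


if __name__ == "__main__":
    for interval_us in (100, 1_000, 10_000):
        cfg = RunConfig(interval_us)
        print(interval_us, "us ->", cfg.window_us / 1e6, "s,",
              cfg.bytes_per_cpu_per_metric(), "bytes per CPU per metric")
```

With 2,000 samples, each measured value costs 16,000 bytes per CPU core regardless of the sampling interval, which is why fixing the sample count also fixes the memory footprint.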
Millisampler collects a variety of metrics. It computes the total inbound and outbound bytes and the ECN-marked inbound bytes from the lengths and CE bits of the packets. Millisampler also counts TTLd-marked retransmissions. Millisampler uses a 128-bit sketch to estimate the number of active (incoming and outgoing) connections. The sketch yields an approximate connection count that is accurate up to about a dozen connections and saturates at about 500 connections per sampling interval. Although there is room for additional precision, in practice the exact number of connections mattered less than the qualitative difference between a few connections and tens or hundreds of connections, which helped us distinguish traffic patterns with many connections (heavy incast) from patterns with more traffic over fewer connections.
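To give a feel for how a 128-bit sketch can approximate the number of active connections, here is a minimal sketch using a linear-counting bitmap. This is an assumption: the text does not say which estimator Millisampler uses, and the function names and the hash choice below are purely illustrative.

```python
import hashlib
import math

SKETCH_BITS = 128


def flow_bit(flow) -> int:
    """Hash a flow key (for example a 5-tuple) to one of 128 bit positions."""
    digest = hashlib.blake2b(repr(flow).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % SKETCH_BITS


def add_flow(sketch: int, flow) -> int:
    """Return the sketch with the bit for this flow set (idempotent)."""
    return sketch | (1 << flow_bit(flow))


def estimate_flows(sketch: int):
    """Linear-counting estimate: n ~ -m * ln(fraction of zero bits)."""
    zero_bits = SKETCH_BITS - bin(sketch).count("1")
    if zero_bits == 0:
        return None  # saturated: every bit is set, no finite estimate
    return -SKETCH_BITS * math.log(zero_bits / SKETCH_BITS)


if __name__ == "__main__":
    sketch = 0
    for port in range(10):
        sketch = add_flow(sketch, ("10.0.0.1", port, "10.0.0.2", 443, "tcp"))
    print(estimate_flows(sketch))
```

With this estimator, a handful of connections rarely collide, so small counts are estimated closely, while the largest finite estimate is 128 · ln(128) ≈ 621. That is the same order of magnitude as the saturation point quoted above, although the actual sketch may differ.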
Why it matters:
Millisampler is a powerful troubleshooting and performance analysis tool. Two confounding network performance bugs that we resolved at Meta over the past few years motivated our need for a fine-grained view of traffic. The first problem consisted of synchronized bursts of traffic at fine timescales, and seeing this motivated us to develop and deploy Millisampler to quickly catch the problem if it recurs. The second, which an early prototype of Millisampler helped pinpoint, involved a NIC driver bug that caused the NIC to stop delivering packets for milliseconds, proving the value of Millisampler in complex investigations. While Millisampler (or Millisampler-like data) played an important role in these investigations, it did so only as part of our rich ecosystem of data collection tools, which track a dizzying array of metrics across hosts and the network.
Aside from such incidents, Millisampler data has also proven useful in characterizing and analyzing the traffic of services, allowing us to develop and deploy a range of solutions to improve their performance. For example, we were able to characterize the nature of bursts across a range of services to understand the intensity of incast and tune transport performance accordingly. We were also able to study complex interactions between short-RTT and long-RTT flows and understand how bursts from one of the two affect fairness to the other. In a follow-up post we will look at an extension of Millisampler, Syncmillisampler, in which we run Millisampler synchronously on all hosts in a rack and use this data to identify buffer contention in the top-of-rack ASICs.
Read the full paper:
Ehab Ghabashneh, Christian Lumezanu, Raghu Nallamothu, and Rob Sherwood also contributed to the design and implementation of Millisampler.