below: a time-traveling resource monitor
In this blog post we present below, an Apache 2.0 licensed resource monitor for modern Linux systems. below was designed and developed by Facebook’s resource control team to view and record historical Linux system data. The resource control team, as the name suggests, is responsible for resource management of Linux systems at scale.
Background
One of the kernel’s primary responsibilities is mediating access to resources. Sometimes this means dividing up physical memory so that multiple processes can share the same host. In other cases, it means ensuring a fair distribution of CPU time. In all of these contexts, the kernel provides the mechanism and leaves the policy to a runtime such as systemd or dockerd. The runtime takes input from a scheduler or end user (what to run and how to run it) and turns the right knobs and pulls the right levers on the kernel so that the workload can start executing.
In a perfect world, that would be the end of the story. The reality, however, is that resource management is a complex and opaque amalgam of technologies that has evolved over decades of computing. Although some of these technologies have various warts and dead ends, the end result, a container, works well. While the user typically doesn’t have to worry about the details, it is critical for infrastructure operators to have visibility into this stack. Visibility and debuggability are essential for detecting and investigating misconfigurations, bugs, and systemic issues.
To make matters worse, resource problems are often difficult to reproduce. It’s not uncommon to wait weeks for a problem to recur so that the cause can be investigated. Scale further exacerbates this problem: you can’t run a custom script on every host in the hope of logging important bits of state if the error occurs again. Therefore, a more sophisticated tool such as below is required.
Why below?
Historically, Facebook has been a heavy user of atop. atop is a performance monitor for Linux that can report the activity of all processes as well as various system-level activities. One of the most compelling features atop offers over tools like htop is the ability to record historical data as a daemon. This sounds like a simple feature, but in practice it has made it possible to debug countless production problems. If the data is kept long enough, it is possible to rewind in time and view the host state before, during, and after the failure.
Unfortunately, over the years it became clear that atop had certain flaws. First, cgroups have emerged as the de facto method of controlling and monitoring resources on a Linux machine. atop still lacks support for this basic building block. Second, atop stores data on disk with custom delta compression. This works fine under normal circumstances, but under heavy resource pressure the host is likely to lose data points. Because delta compression is used, large amounts of data can be lost for exactly the periods when the data is most important. Third, the user experience has a steep learning curve. We have often heard from atop power users that they love the dense layout and numerous keyboard shortcuts. However, this is a double-edged sword. When someone new to atop wants to debug a production problem, they are solving two problems at once: the problem at hand and learning how to use atop.
Since we saw an opportunity for a next-generation system monitor, we designed below with input from production users and with the following goals in mind:
- Ease of use: below needs to be both intuitive for new users and powerful for everyday users.
- Opinionated statistics: below displays accurate and useful statistics. We try to avoid collecting and dumping statistics just because we can.
- Flexibility: If the default settings are not enough, we allow the user to customize their experience. Examples include configurable keybindings, configurable default views, and a scripting interface (the default is a terminal user interface).
Installation
To install the package:
# dnf install -y below
To turn on the recording daemon:
# systemctl enable --now below
Quick tour
The most common mode for below is replay mode. As the name suggests, replay mode replays previously recorded data. Assuming you’ve already started the recording daemon, start a session by running:
$ below replay --time "5 minutes ago"
You should then be greeted with the cgroup view.
If you get stuck or forget a keybinding, press ? to access the help menu.
At the very top of the screen is the status bar. The status bar shows information about the current sample. You can move forward and backward through the samples by pressing t and T, respectively. The middle section is the system overview. The system overview contains statistics about the whole system that are generally always useful to see. The third and lowest section is the multipurpose view. In addition to the cgroup view that is displayed, there is a process view and a system view, which you can access by pressing p and s, respectively.
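To make the idea of stepping through recorded samples concrete, here is a minimal Python sketch. It is not how below is implemented; the `Replay` class and its method names are hypothetical, and it assumes that seeking to a time lands on the latest sample taken at or before that time.

```python
import bisect


class Replay:
    """Toy cursor over a sorted list of sample timestamps (in seconds)."""

    def __init__(self, timestamps):
        if not timestamps:
            raise ValueError("need at least one recorded sample")
        self.timestamps = sorted(timestamps)
        self.index = len(self.timestamps) - 1  # start at the newest sample

    def current(self):
        """Return the timestamp of the sample under the cursor."""
        return self.timestamps[self.index]

    def seek(self, when):
        """Jump to the latest sample taken at or before `when` (assumed policy)."""
        pos = bisect.bisect_right(self.timestamps, when) - 1
        self.index = max(pos, 0)  # clamp to the oldest sample
        return self.current()

    def forward(self):
        """Step one sample forward, stopping at the newest sample."""
        self.index = min(self.index + 1, len(self.timestamps) - 1)
        return self.current()

    def backward(self):
        """Step one sample backward, stopping at the oldest sample."""
        self.index = max(self.index - 1, 0)
        return self.current()


if __name__ == "__main__":
    # Hypothetical recording: one sample every 5 seconds.
    replay = Replay(range(1000, 1301, 5))
    print(replay.seek(1300 - 300))  # roughly "5 minutes ago"
    print(replay.forward())
    print(replay.backward())
```

Because the timestamps are kept sorted, seeking is a binary search and stepping is a bounded index change, which is why rewinding through a long recording can stay cheap.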
Press ↑ and ↓ to move the list selection. Press
Press z again to return to the cgroup view. The cgroup view can sometimes be long. If you have a vague idea of what you’re looking for, you can filter on cgroup names by pressing / and entering a filter.
At this point, you may have noticed a tab system that we haven’t explored yet. To scroll back and forth through the tabs, press
Other features
Under the hood, below has a robust design and architecture. Facebook is constantly upgrading to newer kernels, so we never assume that a data source is available. This tacit assumption enables full backward and forward compatibility between kernels and below versions. In addition, each data point is compressed with Zstandard (zstd) and stored in full. This avoids the delta compression problems we saw with atop at scale. Based on our tests, per-sample compression achieves a compression ratio of about 5x on average.
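The difference between the two storage strategies can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the on-disk format of either tool: the sample fields are made up, and `zlib` from the standard library stands in for zstd.

```python
import json
import zlib

# Hypothetical samples: one dict of counters per collection interval.
SAMPLES = [
    {"ts": 0, "cpu_usec": 1000, "mem_bytes": 5000},
    {"ts": 5, "cpu_usec": 1400, "mem_bytes": 5200},
    {"ts": 10, "cpu_usec": 2100, "mem_bytes": 9800},
    {"ts": 15, "cpu_usec": 2150, "mem_bytes": 5100},
]


def store_full(samples):
    """Compress every sample on its own, so each frame is self-contained."""
    return [zlib.compress(json.dumps(s).encode()) for s in samples]


def load_full(frames):
    """Decode whichever frames survived; a missing frame (None) is skipped."""
    return [json.loads(zlib.decompress(f)) for f in frames if f is not None]


def store_delta(samples):
    """Store the first sample in full and every later one as a difference."""
    frames = [dict(samples[0])]
    for prev, cur in zip(samples, samples[1:]):
        frames.append({k: cur[k] - prev[k] for k in cur})
    return frames


def load_delta(frames):
    """Rebuild samples by summing deltas; stop at the first missing frame."""
    out = []
    for frame in frames:
        if frame is None:
            break  # everything after the gap can no longer be reconstructed
        if not out:
            out.append(dict(frame))
        else:
            out.append({k: out[-1][k] + frame[k] for k in frame})
    return out


def drop(frames, index):
    """Simulate a lost write by replacing one frame with None."""
    return [None if i == index else f for i, f in enumerate(frames)]


if __name__ == "__main__":
    print(len(load_full(drop(store_full(SAMPLES), 1))))    # samples recovered
    print(len(load_delta(drop(store_delta(SAMPLES), 1))))  # samples recovered
```

In this model, losing one self-contained frame costs exactly one sample, while losing one delta frame makes every later sample unrecoverable. The price of self-contained frames is that redundancy between neighboring samples is not exploited.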
below also uses eBPF to collect information about short-lived processes (processes that live for less than the data collection interval). In contrast, atop implements this feature with BSD process accounting, a kernel interface known to be slow and prone to priority inversion.
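A small simulation shows why periodic polling alone misses such processes. This is an illustrative sketch only: it does not use eBPF, and the process names, lifetimes, and 5-second interval are hypothetical.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    name: str
    start: float  # seconds since the start of the observation window
    end: float    # time at which the process exited


def seen_by_polling(procs, interval, duration):
    """Return names of processes alive at one or more polling instants."""
    ticks = [i * interval for i in range(int(duration // interval) + 1)]
    return {p.name for p in procs if any(p.start <= t < p.end for t in ticks)}


def seen_by_events(procs):
    """Return names of processes when every start or exit emits an event."""
    return {p.name for p in procs}


# Hypothetical workload observed for 60 seconds.
PROCS = [
    Proc("long-running-service", 0.0, 60.0),
    Proc("cron-job", 4.0, 12.0),
    Proc("short-lived-helper", 6.0, 8.5),  # starts and exits between two polls
]

if __name__ == "__main__":
    polled = seen_by_polling(PROCS, interval=5.0, duration=60.0)
    print(sorted(seen_by_events(PROCS) - polled))  # processes polling missed
```

A collector that only looks at the process table every few seconds never sees a process that starts and exits between two looks, so some event-driven source is needed to account for it.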
For the user, below also supports live mode and a dump interface. Live mode combines the recording daemon and the TUI session in one process. This is useful for browsing system state without committing to a long-running daemon or to disk space for data storage. The dump interface is a scriptable interface to all of the data that below stores. Dump is both powerful and flexible: detailed data is available in CSV, JSON, and human-readable formats.
below offers compelling advantages over existing resource monitoring tools. We (the below developers) have put a lot of effort into preparing below for open source use. We are excited that readers and the community have the chance to try below, and we hope that it gives you an interactive and easy-to-use system monitor. If you have any feedback, feature requests, or bug reports, please let us know in the GitHub issue tracker.
To learn more about Facebook Open Source, visit our open source site, subscribe to our YouTube channel, or follow us on Twitter and Facebook.