Enhancing debugging within the developer lifecycle
This article was written by Lee Howes and Kenny Yu from Facebook.
This is the final post in a series covering how we have used C++ coroutines at Facebook to regain stack traces for dependent chains of asynchronous waiting tasks. In the previous posts in this series, we described how we implement async stack traces in folly on top of C++ coroutines. In this final post, we talk about the tooling we’ve implemented to hook into this functionality and make debugging and profiling of coroutine code possible.
Most C++ developers at Facebook work on a team that operates a service. Operating a service in production requires developers to solve problems in a variety of situations.
- During development and testing we might need to know who calls a given function and how we reach a particular point in the code.
- While monitoring a service in production we may want to know what errors are being thrown and where they are coming from.
- In failure scenarios, where there are crashes or production bugs, we need to investigate what threads were doing when the crash happened.
Different tools for stack walking are suitable for each situation, and we have found it useful to provide async stack trace tooling for each:
- Development & testing:
  - Provide helpers to easily print the async stack trace of the current thread on demand in code.
- Monitoring the service in production:
  - Provide helpers to print the async stack trace where exceptions are thrown.
- Debugging crashes and other bugs in production:
  - If an async operation is active, print the async stack trace by default when the program crashes.
  - Provide a gdb extension to print the async stack trace from a core dump or when attached to a running program.
Below, we discuss how we built these async stack trace tools and the challenges we discovered when doing so. Much of this work is open source and available in the folly repo on github.
Production tracing
The most common use for stack traces, and now async stack traces, across Facebook infrastructure is in profiling services.
To give service owners, and engineers more broadly, a good idea of what is running and where bottlenecks appear, we constantly take sampled stack traces in production. For normal synchronous stack traces this works well, but for the wide range of asynchronous code built on folly’s Futures library it provided only limited information.
As coroutine use has grown in production, we added the ability to hook into async stack traces when profiling. The machinery described earlier in this series is carefully designed so that it can be processed by BPF programs running during sampling: BPF programs can walk asynchronous stacks from outside the process and log the sampled traces. This gives a clear idea not only of what code is running, but also, across the fleet of machines, of what asynchronously depends on what.
Printing async stack traces when the program crashes
Another time where developers commonly see stack traces is when programs crash. The folly library already provides a signal handler that prints the stack trace of the thread that is causing the program to crash, e.g., because of a SIGSEGV or SIGABRT signal. We extended this signal handler to also print the async stack trace if there is an ongoing coroutine call on the current thread.
As an example, suppose we have the following buggy program:
#include <folly/executors/CPUThreadPoolExecutor.h>
#include <folly/experimental/coro/Task.h>
#include <folly/init/Init.h>

void crash() {
  char* ptr = nullptr;
  *ptr = 'c'; // Trigger a crash
}

folly::coro::Task<void> co_funcC() {
  crash();
  co_return;
}

folly::coro::Task<void> co_funcB() {
  co_await co_funcC();
}

folly::coro::Task<void> co_funcA() {
  co_await co_funcB();
}

int main(int argc, char** argv) {
  folly::init(&argc, &argv);
  folly::CPUThreadPoolExecutor executor(1);
  co_funcA().scheduleOn(&executor).start().get();
  return 0;
}
When the program crashes, the user now sees the async stack trace along with the normal stack trace, providing more context on the asynchronous operations that led to the crash. See the example output below (addresses, file names, and some functions omitted for brevity):
*** Aborted at … ***
*** Signal 11 (SIGSEGV) … stack trace: ***
@ folly::symbolizer::(anonymous namespace)::innerSignalHandler(int, siginfo_t*, void*)
@ …
@ crash()
@ co_funcC() [clone .resume]
@ …
@ folly::CPUThreadPoolExecutor::threadRun(std::shared_ptr<…>)
@ …
@ execute_native_thread_routine
@ start_thread
*** Check failure async stack trace: ***
…
@ folly::symbolizer::(anonymous namespace)::innerSignalHandler(int, siginfo_t*, void*)
@ …
@ crash()
@ co_funcC() [clone .resume]
@ co_funcB() [clone .resume]
@ co_funcA() [clone .resume]
@ main
@ folly::detached_task()
Handling signals is tricky; signal handlers have many restrictions in order to be async-signal-safe. For example, signal handlers cannot acquire any locks, cannot allocate memory, and must only call reentrant functions. In order to print the normal stack trace and async stack trace within a signal handler, our signal handler during program initialization preallocates memory to hold a fixed number of stack frames. During signal handling, the handler avoids dynamically allocating memory by using the preallocated buffer to hold the normal stack trace, and then reuses the buffer again when walking the async stack trace.
Printing async stack traces on demand
During development, a frequent question developers have is: What is the series of function calls that led to this function being called? Folly provides a convenience function to easily print the async stack trace on demand, helping developers quickly see how a function or coroutine is called, and what is dependent on it. Below is an example of how to use the convenience function:
#include <folly/experimental/symbolizer/Symbolizer.h>
#include <iostream>
folly::coro::Task<void> co_foo() {
  std::cerr << folly::symbolizer::getAsyncStackTraceStr() << std::endl;
}

Note that symbolizing stack traces is potentially expensive to do repeatedly, so this convenience function should only be used for debugging.

GDB extension to print async stack traces

C++ developers often need to use debuggers like GDB to debug crashes after the fact or to investigate buggy behavior in running programs. In addition to printing the async stack trace when the program crashes, we recently implemented a GDB extension to easily print the async stack trace for the current thread from within the debugger. Below is example output from a GDB session where we are investigating a core dump generated from a crash of the buggy program from the previous section (addresses and file names omitted for brevity):

# Load the folly::coro gdb extension
(gdb) source -v folly/experimental/coro/scripts/gdb.py
Pretty printers for folly::coro. Available commands:

co_bt [async_stack_root_addr]  Prints async stack trace for the current thread.
                               If an async stack root address is provided,
                               prints the async stack starting from this root.
co_async_stack_roots           Prints all async stack roots.

# Print the async stack trace for the current thread
(gdb) co_bt
0x… in crash() ()
0x… in co_funcC() [clone .resume] ()
0x… in co_funcB() [clone .resume] ()
0x… in co_funcA() [clone .resume] ()
0x… in main ()
0x… in folly::detached_task() ()

# Print the async stack trace for all threads
(gdb) thread apply all co_bt

The source command loads the GDB extension, which provides new commands at the GDB command line. The co_bt command prints the async stack trace for the current thread if there is a coroutine call in progress. It can also be composed with other GDB commands, such as thread apply all co_bt, which prints the async stack trace for all threads. The extension also provides the co_async_stack_roots command to print all async stack roots, for debugging the async stack framework itself.

We used the GDB Python API to build the async stack trace extension. To be most useful, the extension must work both with running programs and with core dumps after programs have crashed. As a result, the extension cannot make C++ function calls to obtain async stack trace information, because function calls do not work on core dumps. Instead, it has to examine pthread and async stack frame internals directly to extract the information it needs.

To collect the async stack trace information within GDB, the first step is to obtain the async stack root pointer for the current thread. In the glibc Native POSIX Threads Library (NPTL), pthread_t is an alias for struct pthread*, where struct pthread is the internal implementation type for a thread. NPTL represents thread-local storage as a two-level lookup table rooted at the specific field within struct pthread. For example, given a pthread_key_t, we can find the corresponding pthread_key_data for the key at the following location:

// The key (an integer) for thread-local storage
pthread_key_t key = …;

// The thread we want to look up. pthread_t is
// an alias for struct pthread*
pthread_t thread = …;

// Find the data associated with `key` within the
// 2-level lookup table
void* data = ((struct pthread*)thread)->specific
    [(int)key / PTHREAD_KEY_2NDLEVEL_SIZE]
    [(int)key % PTHREAD_KEY_2NDLEVEL_SIZE].data;

Note that most programs use very few pthread_key_data entries. As an optimization, NPTL inlines the first block of entries directly into struct pthread as the specific_1stblock field, to avoid dynamically allocating memory for pthread_key_data entries in most programs. This means that, given a struct pthread* pd, pd->specific[0] == pd->specific_1stblock. Putting it all together, the gdb extension collects the async stack trace for the current thread by reading the thread's struct pthread, looking up folly's async stack key in this thread-local storage table to find the current AsyncStackRoot, and then walking the chain of async stack frames it points to, symbolizing each return address.

Tracking where exceptions are thrown

When C++ developers monitor their services in production, one important question they often investigate is: what exceptions are being thrown, and where are they being thrown from? The C++ standard library currently provides no facilities to track throw sites for exceptions. To meet this need, folly provides an exception tracer library that hooks into the Itanium C++ ABI for exception handling to track exception information. Folly uses this library to provide helpers that make it easy to track where exceptions are thrown, including both normal and async stack traces.

Below is an example program that uses these helpers:

#include <folly/executors/CPUThreadPoolExecutor.h>
#include <folly/experimental/coro/Task.h>
#include <folly/experimental/exception_tracer/SmartExceptionTracer.h>
#include <folly/init/Init.h>
#include <iostream>

[[noreturn]] void bad() {
  throw std::runtime_error("test");
}

folly::coro::Task<void> co_funcC() {
  bad();
  co_return;
}

folly::coro::Task<void> co_funcB() {
  co_await co_funcC();
}

folly::coro::Task<void> co_funcA() {
  try {
    co_await co_funcB();
  } catch (const std::exception& ex) {
    // Prints where the exception was thrown
    std::cerr << "what(): " << ex.what() << ", "
              << folly::exception_tracer::getAsyncTrace(ex)
              << std::endl;
  }
}

int main(int argc, char** argv) {
  folly::init(&argc, &argv);
  folly::CPUThreadPoolExecutor executor(1);
  co_funcA().scheduleOn(&executor).start().get();
  return 0;
}

Below is example output from the above program (addresses omitted for brevity):

what(): test, Exception type: std::runtime_error
@ folly::exception_tracer::(anonymous namespace)::throwCallback(void*, std::type_info*, void (**)(void*))
@ __cxa_throw
@ bad()
@ co_funcC() [clone .resume]
@ co_funcB() [clone .resume]
@ co_funcA() [clone .resume]
@ main
@ folly::detached_task()

In this output, we can see that the exception was thrown from bad, and the output includes the async stack trace showing how bad was called.

To understand how this works, we first need to understand the Itanium C++ ABI, which specifies how exceptions are processed. For example, whenever an exception is thrown in C++, the compiler generates a call to the following function:

// Called when an exception is thrown:
// – `thrown_exception` is a pointer to the thrown object
// – `tinfo` contains type information about the thrown object,
//   to match against potential catch sites
// – `dest` is a function pointer used to eventually destroy the
//   thrown object
void __cxa_throw(
    void* thrown_exception,
    std::type_info* tinfo,
    void (*dest) (void*));

The Itanium C++ ABI also specifies how exceptions are caught and rethrown, and what happens when exceptions are not caught. The folly exception tracer library allows users to register custom callbacks to be invoked as part of exception processing. For example, users can register a custom function to invoke whenever an exception is thrown. To implement this, the library provides its own implementation of __cxa_throw, which first invokes the user-supplied callbacks and then uses dlsym to invoke the real implementation of __cxa_throw for normal C++ exception handling.

To track normal and async stack traces where exceptions are thrown, SmartExceptionTracer registers a callback that captures the current normal stack trace and, if a coroutine call is in progress, the current async stack trace, and associates both with the thrown exception object so that they can be retrieved later, as getAsyncTrace does in the example above.

This approach adds a slight runtime overhead at the point where exceptions are thrown. We believe this is acceptable: exceptions should only be thrown in exceptional situations, so the cost is rarely paid, and when a service does throw frequently, the traces themselves help find and fix the underlying problem.
Summary
That is the end of this series of posts. It’s a lot of information, but we hope it both helps document what folly is doing and provides inspiration about interesting ways we can use C++ coroutines to provide facilities that were challenging to provide otherwise.
There is future work in the implementation: some of the costs can be mitigated by changing the compiler, and the lessons we’ve learned from this implementation will influence that work. However, this work is not experimental. Coroutines are used heavily in production across Facebook infrastructure, and were rapidly adopted by our developers for their ease of use and increased readability even before we improved debugging and tracing. We are actively using this tracing infrastructure to learn about and debug async workloads in production.
To learn more about Facebook Open Source, visit our open source site, subscribe to our YouTube channel, or follow us on Twitter and Facebook. Interested in working with open source technologies at Facebook? Check out our open source-related job postings on our career page.