Analyzing Android and Java app security in depth
We share details about Mariana Trench (MT), a tool we use to detect and prevent security and privacy bugs in Android and Java applications. As part of our effort to scale security by building automation, we recently open-sourced MT to support security engineers at Facebook and across the industry.
This post is the third in our series of deep dives into the static and dynamic analysis tools we rely on. MT is the latest system, following Zoncolan and Pysa, which were built for Hack and Python code, respectively.
Facebook’s mobile applications, including Facebook, Instagram, and WhatsApp, run on millions of lines of code and are constantly evolving to enable new features and improve our services. To handle this volume of code, we build sophisticated systems that help our security engineers detect and review code for potential issues, rather than relying solely on manual code review. In the first half of 2021, more than 50 percent of the security vulnerabilities we found across our family of apps were detected using automated tools.
We designed MT to focus on Android applications in particular. Patching and ensuring the adoption of code updates work differently for mobile and web applications, so they require different approaches. While server-side code for web apps can be updated almost instantly, mitigating a security bug in an Android application relies on each user updating the application on their device in a timely manner. This makes it all the more important for any app developer to put systems in place that help prevent vulnerabilities from reaching mobile releases wherever possible.
MT is designed to scan large mobile codebases and flag potential issues on pull requests before they reach production. It was built through a close collaboration between security and software engineers at Facebook, who train MT to look at code and analyze how data flows through it. Analyzing data flows is useful because many security and privacy issues can be modeled as data flowing to a place where it should not go.
You can find MT on GitHub, and we have published a binary distribution on PyPI. We have also written short instructions to get you started. Our teams actively develop and improve MT, and we look forward to your feedback. If you are interested in working with us, please open an issue or contact us on GitHub.
How Mariana Trench works
MT works much like Zoncolan and Pysa. The main difference is that MT is optimized for analyzing Android and Java applications. We briefly cover the basics in this blog post and encourage readers to check our previous write-ups for a more in-depth technical explanation.
Security engineers often think of vulnerabilities in terms of data flows that they do not want to see in their applications. For example, an application should not log sensitive data, and it should not be exposed to vulnerabilities that allow attackers to inject malicious code.
A data flow can be described in MT by:
- Source: a starting point. This can be a user-controlled string that is entered into the app via `Intent.getData`.
- Sink: a destination. In Android, this can be a call to `Log.w` or `Runtime.exec`.
A large codebase can contain many different kinds of sources and corresponding sinks. We can instruct MT to show us specific flows by defining rules. For example, to find intent redirections (issues that allow attackers to intercept sensitive data), we could define a rule that shows us all traces from “user-controlled” sources to an “intent redirection” sink.
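The relationship between sources, sinks, and rules can be sketched in a few lines of Python. This is only an illustration of the idea, not MT's actual rule format: the `Flow` and `Rule` classes, the kind names, and the methods `Router.forward` and `Settings.getId` are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical, simplified data model; MT's real rule format is not shown here.
@dataclass(frozen=True)
class Flow:
    source_kind: str  # label of the source, e.g. "UserControlled"
    sink_kind: str    # label of the sink, e.g. "IntentRedirection"
    trace: tuple      # method names visited from source to sink


@dataclass(frozen=True)
class Rule:
    name: str
    source_kinds: frozenset
    sink_kinds: frozenset

    def matches(self, flow: Flow) -> bool:
        # A rule fires when both ends of the flow carry kinds it cares about.
        return (flow.source_kind in self.source_kinds
                and flow.sink_kind in self.sink_kinds)


def select_flows(rule, flows):
    """Return only the flows that the given rule asks to see."""
    return [flow for flow in flows if rule.matches(flow)]


INTENT_REDIRECTION = Rule(
    name="User-controlled data reaches an intent redirection sink",
    source_kinds=frozenset({"UserControlled"}),
    sink_kinds=frozenset({"IntentRedirection"}),
)

# Made-up flows, for illustration only.
EXAMPLE_FLOWS = [
    Flow("UserControlled", "IntentRedirection",
         ("Intent.getData", "Router.forward", "Context.startActivity")),
    Flow("UserControlled", "Logging", ("Intent.getData", "Log.w")),
    Flow("DeviceId", "Logging", ("Settings.getId", "Log.w")),
]

for flow in select_flows(INTENT_REDIRECTION, EXAMPLE_FLOWS):
    print(" -> ".join(flow.trace))
```

Separating kinds from rules means the same source or sink label can be reused by several rules, which is one plausible reason to organize flows this way.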
MT finds possible paths from each source to its corresponding sinks. It does this by computing a model for each Java method it sees in the codebase. The models are computed using a static analysis technique called abstract interpretation.
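To give a feel for what "one model per method" can mean, here is a drastically simplified sketch: a fixpoint iteration that computes, for every method, the set of sink kinds its argument may reach. It is not MT's algorithm. It assumes every method has a single argument that is forwarded unchanged to its callees, and the call graph, the `Helper` methods, and the sink kind names are hypothetical.

```python
def compute_models(calls, sinks):
    """Compute a toy model (summary) per method.

    calls: method name -> list of methods its argument is forwarded to.
    sinks: method name -> sink kind (these methods are sinks themselves).
    Returns: method name -> set of sink kinds its argument may reach.
    """
    methods = set(calls) | set(sinks)
    for callees in calls.values():
        methods.update(callees)

    # Start from the empty set and seed the known sinks.
    models = {method: set() for method in methods}
    for method, kind in sinks.items():
        models[method].add(kind)

    # Propagate callee models into callers until nothing changes.
    # The sets only grow and are bounded, so the loop terminates,
    # even when the call graph contains recursion.
    changed = True
    while changed:
        changed = False
        for method, callees in calls.items():
            for callee in callees:
                new_kinds = models[callee] - models[method]
                if new_kinds:
                    models[method] |= new_kinds
                    changed = True
    return models


# Hypothetical call graph; Helper.process calls itself recursively.
CALLS = {
    "MainActivity.onCreate": ["Helper.process"],
    "Helper.process": ["Helper.run", "Helper.process"],
    "Helper.run": ["Runtime.exec"],
    "Helper.debug": ["Log.w"],
}
SINKS = {"Runtime.exec": "CodeExecution", "Log.w": "Logging"}

MODELS = compute_models(CALLS, SINKS)
print(sorted(MODELS["MainActivity.onCreate"]))
```

Once each method has such a summary, a caller can be checked against the summary of its callee instead of re-analyzing the callee's body, which is the property that lets this style of analysis scale to large codebases.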
How security engineers use Mariana Trench
MT is one way security engineers scale their work as part of Facebook’s in-depth application security efforts. In a typical scenario, a security engineer starts by broadly outlining the data flows they want to search the codebase for. For example, to find SQL injections, they need to specify where user-controlled data enters the code (e.g., intents in Android, the file system) and where it must not go (e.g., APIs for building SQL queries). This is only the beginning, however: defining a rule that connects the two is not enough. Engineers also need to review the issues found and refine the rules until the results are sufficiently meaningful.
As with any engineering effort, any tool that automatically scans code has inherent trade-offs. Traditionally, static analysis research has focused heavily on minimizing false positives. For security, the calculation can be very different. When using MT at Facebook, we prioritize finding more potential issues, even if that means showing more false positives. This is because we care about edge cases: data flows that are theoretically possible and exploitable but rarely occur in production.
To help security engineers manage and triage the output, we designed MT so that they can quickly determine whether an issue is a true positive by filtering results on criteria such as the length of a trace or the specific functions that appear on a trace.
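The two criteria mentioned above, trace length and the functions on a trace, can be illustrated with a small filter. This is a sketch over a made-up issue format, not MT's or SAPP's real output schema; the field names and the methods `QueryBuilder.where`, `A.a` through `D.d`, and `Database.rawQuery` are assumptions for the example.

```python
def filter_issues(issues, max_trace_length=None, must_call=None):
    """Keep issues whose trace is short enough and passes through a method.

    issues: list of dicts with a "trace" key (list of method names).
    max_trace_length: drop issues with longer traces, if given.
    must_call: keep only issues whose trace contains this method, if given.
    """
    selected = []
    for issue in issues:
        trace = issue["trace"]
        if max_trace_length is not None and len(trace) > max_trace_length:
            continue
        if must_call is not None and must_call not in trace:
            continue
        selected.append(issue)
    return selected


# Made-up issues, for illustration only.
EXAMPLE_ISSUES = [
    {"id": 1, "rule": "SQL injection",
     "trace": ["Intent.getData", "QueryBuilder.where", "Database.rawQuery"]},
    {"id": 2, "rule": "SQL injection",
     "trace": ["Intent.getData", "A.a", "B.b", "C.c", "D.d", "Database.rawQuery"]},
    {"id": 3, "rule": "Logging",
     "trace": ["Intent.getData", "Log.w"]},
]

short = filter_issues(EXAMPLE_ISSUES, max_trace_length=3)
print([issue["id"] for issue in short])
```

Short traces are often easier to confirm by hand, so sorting or filtering on length is a reasonable first pass when the tool is tuned to report more candidates.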
Once a rule has been written and shown to be effective, it is run on every pull request. If MT detects a flow that violates the rule, the flow can be routed either to an on-call security engineer or directly to the software engineer who authored the pull request.
Rather than relying on MT as a panacea, we use it as part of a broader defense-in-depth approach. As Facebook invests in improving the fidelity of the signals MT generates, security engineers continually refine rules and diagnose MT’s limitations in collaboration with the software engineers who build our apps.
Navigating the results: Static Analysis Post Processor
In addition to building the static analysis systems themselves, we have created open source tools to review and analyze the results generated by MT (and Pysa). We call our stand-alone processing tool the Static Analysis Post Processor (SAPP).
We first shared our work on SAPP, and how to use its command line interface (CLI) to navigate Pysa results, at DefCon 2020. SAPP is designed to support multiple static analysis tools, and it supports MT out of the box.
SAPP takes the raw output of MT and makes it easier to triage the results. It is designed to show visually how data can potentially flow from source to sink, so that experts can quickly assess whether they agree with the tool’s assessment.
The SAPP trace view shows the data flow step by step. It highlights the relevant lines of code and lets the security engineer explore the possible paths that eventually reach the same sink location in the code.
To give you an idea of what this looks like, here’s a quick demo of how MT runs in a sample app:
As you can see, SAPP presents a list of issues, each of which is a potential vulnerability. Each issue contains one or more traces. If several traces are essentially similar, they are grouped under the same issue, so that the issue as a whole can be judged valid or not. SAPP supports extensive filtering and search functions so that security engineers can focus on the results they want to investigate in each list.
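Grouping similar traces under one issue can be pictured as follows. The criteria SAPP actually uses are not described in this post, so the grouping key here (same rule, same first frame, same last frame) is purely an assumption, as are the `Router` methods.

```python
from collections import defaultdict


def group_traces(traces):
    """Group traces that share a rule and the same two endpoints.

    traces: iterable of (rule name, tuple of method names) pairs.
    Returns: dict mapping (rule, first frame, last frame) -> list of traces.
    """
    groups = defaultdict(list)
    for rule, frames in traces:
        # Assumed similarity key: rule plus the source and sink frames.
        key = (rule, frames[0], frames[-1])
        groups[key].append(frames)
    return dict(groups)


# Made-up traces, for illustration only.
EXAMPLE_TRACES = [
    ("Intent redirection",
     ("Intent.getData", "Router.forward", "Context.startActivity")),
    ("Intent redirection",
     ("Intent.getData", "Router.retry", "Router.forward", "Context.startActivity")),
    ("Logging", ("Intent.getData", "Log.w")),
]

GROUPED = group_traces(EXAMPLE_TRACES)
for (rule, source, sink), members in GROUPED.items():
    print(f"{rule}: {source} -> {sink} ({len(members)} trace(s))")
```

With this kind of grouping, a reviewer makes one decision per issue instead of one per trace.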
Getting started with Mariana Trench
MT is available on GitHub, and we have published a binary distribution on PyPI. We have also written short instructions to get you started.
Our teams are actively developing MT to improve it further. If you have any feedback or are interested in working with us, please open an issue or contact us on GitHub.
We would like to thank Maxime Arthaud, Amar Bhosale, Tanning Janssen van Doorn, Yuh Shin Ong, Chenguang Shen, Simran Virk, Shannon Zhu, and everyone else who worked on Mariana Trench.