Pysa 101: Overview of Facebook’s open-source Python code analysis tool
Introduction to Pyre and Pysa
Pyre is a performant type checker created by Facebook for the Python programming language. It is designed to rapidly identify type errors within Python applications.
The Python Static Analyzer (Pysa) is a static code analysis tool that ships as part of Pyre. Pysa performs Python source code analysis and uses taint analysis to identify potentially exploitable vulnerabilities within Python applications.
Taint analysis with Pysa
Taint analysis is designed to trace untrusted data through an application. It looks for cases where this untrusted data is used in potentially exploitable functionality within an application, such as SQL queries.
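To make the idea concrete, here is a minimal sketch of the kind of flow taint analysis looks for. It is not taken from Pysa itself: the function names, the table layout and the payload are hypothetical, and the standard library's sqlite3 module stands in for a real database. The first function lets untrusted input become part of the SQL text, while the second passes it as a bound parameter.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory database with two users.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, username):
    # `username` plays the role of untrusted input (a source).
    # String formatting lets it become part of the SQL text (a sink).
    query = f"SELECT name FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, username):
    # Parameter binding keeps the untrusted value out of the SQL text.
    query = "SELECT name FROM users WHERE name = ?"
    return conn.execute(query, (username,)).fetchall()


if __name__ == "__main__":
    db = make_db()
    payload = "x' OR '1'='1"  # classic injection string
    print(len(find_user_unsafe(db, payload)))  # matches every row
    print(len(find_user_safe(db, payload)))    # matches no row
```

The unsafe variant is the pattern a rule connecting user-controlled sources to SQL sinks is meant to surface.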
Defining sources, sinks and rules
Taint analysis detects flows of data from user-controlled inputs (sources) to potentially exploitable functionality (sinks). Pysa is designed to use a collection of default and user-defined sources and sinks for Python taint analysis.
The taint.config file is a JSON file that stores the primary definitions for Pysa taint analysis: the names of the taint sources and sinks, and the rules that connect them. Sources and sinks are declared there by name and are then attached to specific functions and attributes in model files (described below), using a syntax similar to Python 3 type annotations.
In Pysa, a rule defines a flow from one or more sources to one or more sinks that is of interest. For example, a Pysa taint.config file may contain a rule for SQL injection that specifies sources of untrusted input and an SQL query as a sink.
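As an illustration of such a rule, the sketch below builds a small taint.config as a Python dictionary and serializes it to JSON. The key names follow the layout used in Pysa's documentation as best recalled here, so check them against the current documentation before relying on them. The source and sink names, the rule code 9001 and the message text are placeholders chosen for this example.

```python
import json

# Hypothetical minimal taint.config content. Names, rule code and
# message are placeholders, not values shipped with Pysa.
TAINT_CONFIG = {
    "sources": [
        {"name": "UserControlled", "comment": "data controlled by the user"}
    ],
    "sinks": [
        {"name": "SQL", "comment": "arguments executed as SQL"}
    ],
    "features": [],
    "rules": [
        {
            "name": "Possible SQL injection",
            "code": 9001,
            "sources": ["UserControlled"],
            "sinks": ["SQL"],
            "message_format": "User-controlled data may reach a SQL query",
        }
    ],
}


def render_taint_config(config=TAINT_CONFIG):
    # Serialize the dictionary to the JSON text stored in taint.config.
    return json.dumps(config, indent=2)


def undeclared_names(config=TAINT_CONFIG):
    # Return source or sink names that a rule uses but that were never declared.
    declared_sources = {entry["name"] for entry in config["sources"]}
    declared_sinks = {entry["name"] for entry in config["sinks"]}
    missing = []
    for rule in config["rules"]:
        missing += [n for n in rule["sources"] if n not in declared_sources]
        missing += [n for n in rule["sinks"] if n not in declared_sinks]
    return missing


if __name__ == "__main__":
    print(render_taint_config())
```

The small consistency check is only a convenience for this sketch. It reflects the point above that a rule refers to sources and sinks by the names declared in the same file.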
The other important type of Pysa configuration file is the model file. A Pysa model file (.pysa file extension) is used to annotate Python code with sources, sinks, sanitizers (functions that remove taint from data, such as hash algorithms) and features (additional metadata attached to taint flows). Pysa ships with a number of built-in model files, and users can define additional custom models. When annotating a Python source code file, Pysa uses the union of all applicable model files.
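A model file might look roughly like the text embedded in the sketch below. The annotation forms (TaintSource[...], TaintSink[...] and the @Sanitize decorator) follow the style shown in Pysa's documentation, but the myapp module and its three functions are hypothetical, as is the myapp.pysa filename. The model text is held in a Python string because its dotted function names are not valid Python on their own.

```python
from pathlib import Path

# Hypothetical model text for an imaginary `myapp` package.
# The names UserControlled and SQL must match names declared in taint.config.
MODEL_TEXT = """\
# Source: the return value is treated as user-controlled.
def myapp.request.get_param(name) -> TaintSource[UserControlled]: ...

# Sink: tainted data reaching `query` is reported.
def myapp.db.run_query(query: TaintSink[SQL]): ...

# Sanitizer: data passing through this function is treated as clean.
@Sanitize
def myapp.db.quote_literal(value): ...
"""


def write_model(directory, filename="myapp.pysa"):
    # Write the model text into the given directory and return the file path.
    path = Path(directory) / filename
    path.write_text(MODEL_TEXT)
    return path
```

In a real project, the parameter names in each model are expected to match the signature of the function being annotated, so models are usually written with the target source code open.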
Pysa scope of analysis
Pysa is a Python source code analysis tool. As a result, it has some of the same limitations as other source code analysis tools.
One of the major limitations of source code analysis is that it can only see a subset of the code within a particular application. In the case of Pysa, this analysis is limited to the code in the code repository where Pysa is run and the directories explicitly specified within a .pyre_configuration file.
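The scope described above is set in the .pyre_configuration file, which is JSON. The sketch below builds one as a Python dictionary. The keys source_directories, taint_models_path and search_path are given as recalled from Pyre's documentation, and every path value is a hypothetical placeholder.

```python
import json

# Hypothetical .pyre_configuration content; all paths are placeholders.
PYRE_CONFIGURATION = {
    # Directories whose code is analyzed.
    "source_directories": ["."],
    # Where Pysa looks for taint.config and .pysa model files.
    "taint_models_path": ["taint_models"],
    # Extra locations for modules or stubs outside the repository.
    "search_path": ["vendor/some_dependency"],
}


def render_pyre_configuration(config=PYRE_CONFIGURATION):
    # Serialize the dictionary to the JSON text stored in .pyre_configuration.
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(render_pyre_configuration())
```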
This means that Pysa is largely blind to the dependencies within a Python application. This is problematic because a dependency may contain unknown sinks or other functionality that impacts taint analysis (such as sanitizers).
Like most taint analysis tools, Pysa assumes that any function it lacks visibility into returns tainted output whenever it receives tainted input. While it is possible to explicitly label individual functions as not transmitting taint, this approach does not scale to large numbers of dependencies. As a result, Pysa can generate a number of false positive detections.
Additionally, all attributes of a tainted object are also considered tainted in Pysa. While this is useful in some cases, it can also generate false positives. For example, if you have a tainted object named tainted, then a reference to tainted.__class__ will also be labeled as tainted, even though the class is not user-controlled.
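The two sources of false positives described above can be pictured with ordinary Python code. The sketch below is illustrative only: the Upload class and both functions are hypothetical, and whether Pysa actually reports either pattern depends on the models in use. In both cases the returned value is not really attacker-controlled, yet an analysis that over-labels taint would still treat it as tainted.

```python
class Upload:
    # Hypothetical object built from user input, so an analyzer would taint it.
    def __init__(self, filename, content):
        self.filename = filename
        self.content = content


def describe(upload):
    # The class name is fixed by the program, not by the user. An analysis
    # that taints every attribute of a tainted object still marks it tainted.
    return upload.__class__.__name__


def opaque_length(value):
    # Stand-in for a dependency function the analyzer cannot see into.
    # It returns only an integer, but with no visibility the analyzer must
    # assume that tainted input produces tainted output.
    return len(value)


if __name__ == "__main__":
    upload = Upload("report.txt", "user supplied text")
    print(describe(upload))
    print(opaque_length(upload.content))
```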
This potential for false positives means that Pysa results should not be trusted blindly and require further analysis. However, a bias toward false positives (from over-labeling taint) is preferable to false negatives, where potentially exploitable vulnerabilities may be overlooked.
Securing Python applications
Facebook’s Pysa is a valuable tool for Python source code analysis. Its taint analysis functionality can help to rapidly detect potentially exploitable vulnerabilities.
However, using Pysa effectively requires a deep understanding of a Python application’s internals. Developers must be able to accurately identify the sources and sinks within the application before Pysa can find potentially vulnerable data flows between them.
Additionally, Pysa cannot detect all potential vulnerabilities within an application, making it necessary to use other analysis techniques, such as dynamic code analysis tools, as well. Combining multiple static code analysis tools (like Pysa and pylint), dynamic code analysis and penetration testing helps to dramatically reduce the cybersecurity risk and exploitability of Python applications.