Analyzing Trends in Python Package Vulnerabilities

Zach Marks
Ochrona Security
Aug 30, 2021


Overview

Securing the software supply chain has been a hot topic lately (see: Malicious PyPI Packages Steal Credit Cards and Inject Code; or the White House’s Executive Order to Protect Software Supply Chains). The need for proactive open-source package management has never been greater, and it starts with a continuously updated inventory of known package vulnerabilities. For those less familiar with this concept, I break down the importance in layman’s terms here.

Ochrona, an open-source Python dependency management project, recently unveiled its publicly available database of 1300+ Python package vulnerabilities. It pulls from multiple sources and includes in-depth data per entry, such as CVE ID, CWE ID, attack vector string, exploit description, affiliated license, exploitability and impact scores, and more. In the spirit of transparency, and to derive some insight from it all, I analyzed trends in the vulnerability data for these popular Python packages.

Some terms to point out:

1) CVE = Common Vulnerabilities and Exposures → it’s the industry standard for identifying publicly disclosed vulnerabilities in packages or devices on the internet.

2) CWE = Common Weakness Enumeration → it’s the industry standard for classifying software weaknesses and the ways in which packages can be exploited.

3) CVSS score = Common Vulnerability Scoring System score → it’s the industry standard for scoring CVEs. Scores range from 0 to 10, with 10 being the most severe.
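To make the scoring scale concrete, here is a minimal sketch that maps a score to the qualitative severity bands defined by CVSS v3.x. The function name is my own, and I am assuming v3.x bands apply; the database entries may have been scored under a different CVSS version.

```python
def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS scores range from 0.0 to 10.0, got {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"       # 0.1 to 3.9
    if score < 7.0:
        return "Medium"    # 4.0 to 6.9
    if score < 9.0:
        return "High"      # 7.0 to 8.9
    return "Critical"      # 9.0 to 10.0


for example in (3.9, 7.5, 9.8):
    print(example, cvss_v3_severity(example))
```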

A few considerations before we start:

1) I went with the top 5 packages by sheer CVE count amongst Ochrona’s vulnerability database, though there are many, many more that are tracked.

2) I only counted CVEs with a CVSS score of 7.5 or higher, which is generally considered a high or critical vulnerability.

3) I extracted the top 3 CWEs affiliated with each package’s CVEs to shed light on weaknesses in how these packages are built.
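The steps above can be sketched in a few lines of Python. This is an illustration only: the record layout, field names, and sample values are hypothetical and do not reflect Ochrona’s actual schema, and the sketch treats the 7.5 cutoff as inclusive, which is an assumption on my part (some averages reported below are exactly 7.5).

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records; field names and values are made up for illustration.
RECORDS = [
    {"package": "alpha", "cve": "CVE-0000-0001", "cwe": "CWE-787", "cvss": 9.0},
    {"package": "alpha", "cve": "CVE-0000-0002", "cwe": "CWE-787", "cvss": 8.0},
    {"package": "alpha", "cve": "CVE-0000-0003", "cwe": "CWE-369", "cvss": 7.5},
    {"package": "alpha", "cve": "CVE-0000-0004", "cwe": "CWE-20", "cvss": 5.0},
    {"package": "beta", "cve": "CVE-0000-0005", "cwe": "CWE-89", "cvss": 9.8},
]


def summarize(records, threshold=7.5, top_packages=5, top_cwes=3):
    """Rank packages by severe-CVE count, then rank each package's CWEs."""
    # Assumption: the cutoff is inclusive (score >= threshold).
    severe = [r for r in records if r["cvss"] >= threshold]
    cve_counts = Counter(r["package"] for r in severe)

    summary = {}
    for package, count in cve_counts.most_common(top_packages):
        scores_by_cwe = defaultdict(list)
        for r in severe:
            if r["package"] == package:
                scores_by_cwe[r["cwe"]].append(r["cvss"])
        # Most frequent CWEs first.
        ranked = sorted(
            scores_by_cwe.items(), key=lambda item: len(item[1]), reverse=True
        )
        summary[package] = {
            "cve_count": count,
            "top_cwes": [
                (cwe, len(scores), round(mean(scores), 1))
                for cwe, scores in ranked[:top_cwes]
            ],
        }
    return summary


SUMMARY = summarize(RECORDS)
for package, stats in SUMMARY.items():
    print(package, stats)
```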

Let’s dive in.

Analysis

1. TensorFlow

Downloads (last month): 10,785,623

Description: TensorFlow is an open-source artificial intelligence library that uses data flow graphs to build models. It allows developers to create large-scale neural networks with many layers.

CVE Count: 67

Top 3 CWEs by count:

CWE-787 → Out-of-bounds write (Buffer Overflow)

  • 19 affiliated CVEs with this CWE, with average CVSS score of 8.1

CWE-369 → Divide by zero, which results in a crash (Denial of Service)

  • 15 affiliated CVEs with this CWE, with average CVSS score of 7.8

CWE-119 → Improper restriction of operations within the bounds of a memory buffer (Buffer Overflow)

  • 8 affiliated CVEs with this CWE, with average CVSS score of 7.8

Takeaway: Attackers seem to exploit flaws in TensorFlow by targeting the ways in which the package handles the data it processes, whether through memory buffers or numeric operations. All in all, this checks out, as TensorFlow typically processes lots of numbers at scale for ML purposes.
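As a rough illustration of the kind of checks these weaknesses call for, here is a sketch in plain Python. It is not TensorFlow code (the affected kernels are native code), and both function names are hypothetical; the point is simply that indices and divisors derived from input get validated before use.

```python
def gather(values: list[float], indices: list[int]) -> list[float]:
    """Return values[i] for each index, rejecting out-of-range indices.

    Negative indices are rejected as well: Python would silently wrap them,
    and native code would read or write outside the buffer.
    """
    for i in indices:
        if not 0 <= i < len(values):
            raise IndexError(f"index {i} is outside [0, {len(values)})")
    return [values[i] for i in indices]


def mean_pool(values: list[float], window: int) -> list[float]:
    """Average consecutive, non-overlapping windows of `values`."""
    if window <= 0:
        # Guards the division below against a zero (or negative) divisor.
        raise ValueError("window must be a positive integer")
    if len(values) % window != 0:
        raise ValueError("len(values) must be a multiple of window")
    return [
        sum(values[start:start + window]) / window
        for start in range(0, len(values), window)
    ]
```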

2. Salt

Downloads (last month): 81,494

Description: Salt is a package for event-driven IT automation, remote task execution, and configuration management. It provides several entry points for interfacing with Python applications.

CVE Count: 28

Top 3 CWEs by count:

CWE-77 → Improper neutralization of special elements used in a command (Command injection)

  • 4 affiliated CVEs with this CWE, with average CVSS score of 8.8

CWE-287 → Improper authentication

  • 3 affiliated CVEs with this CWE, with average CVSS score of 9.5

CWE-22 → Improper limitation of a pathname to a restricted directory (Path Traversal)

  • 3 affiliated CVEs with this CWE, with average CVSS score of 9.6

Takeaway: Given that Salt can be used as a central hub for carrying out critical functionality (e.g. provisioning new servers, making changes to existing ones, hybrid control of cloud environments), it makes sense that the most common weaknesses pertain to unsanitized input being executed (command injection) and path traversal into locations that were never meant to be exposed. Naturally, broken authentication would have a serious impact on any administrative functionality tied to Salt’s core uses.
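To show what defending against the first and third weaknesses can look like, here is a minimal sketch using only the Python standard library. It does not reflect Salt’s code; the function names and paths are hypothetical. The first function passes untrusted text as a single argument so no shell ever parses it, and the second rejects paths that resolve outside an allowed base directory.

```python
import subprocess
import sys
from pathlib import Path


def run_echo(user_text: str) -> str:
    """Echo untrusted text via a child process without involving a shell."""
    result = subprocess.run(
        # A list of arguments (no shell=True) keeps `user_text` as data,
        # so characters like `;` or `|` are never interpreted as commands.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_text],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve `user_path` under `base_dir`, refusing anything that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise PermissionError(f"{user_path!r} escapes {base}")
    return candidate
```

With this in place, `resolve_inside("/tmp/base", "../../etc/passwd")` raises `PermissionError` instead of returning a path outside the base directory.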

3. Pillow

Downloads (last month): 30,727,609

Description: Pillow is used for image handling and supports opening and manipulating many different image file formats (e.g. png, jpeg, and everyone’s favorite, gif).

CVE Count: 27

Top 3 CWEs by count:

CWE-400 → Uncontrolled Resource Consumption (Denial of Service)

  • 7 affiliated CVEs with this CWE, with average CVSS score of 7.5

CWE-125 → Out-of-bounds read (Buffer Over-read)

  • 7 affiliated CVEs with this CWE, with average CVSS score of 7.6

CWE-120 → Buffer copy without checking size of input (Buffer Overflow)

  • 4 affiliated CVEs with this CWE, with average CVSS score of 9.3

Takeaway: It’s no surprise to see issues pertaining to resource consumption and memory buffers when it comes to Pillow. Given its widespread use in image processing, it’s important that limits on resource consumption are set; otherwise, attackers can leverage the core file processing component to cause buffer overflows and downtime in applications that use Pillow.
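One way to set such a limit is to check the dimensions an image declares before decoding it. The sketch below reads the width and height from a PNG header using only the standard library and rejects anything over a pixel budget. It is not Pillow’s implementation (Pillow exposes a comparable guard through `Image.MAX_IMAGE_PIXELS`), and the function names and the budget value are my own assumptions.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_PIXELS = 50_000_000  # hypothetical budget; tune for your application


def declared_png_size(header: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def check_pixel_budget(header: bytes, max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Reject images whose declared size exceeds the pixel budget."""
    width, height = declared_png_size(header)
    if width == 0 or height == 0:
        raise ValueError("PNG declares an empty image")
    if width * height > max_pixels:
        raise ValueError(f"{width}x{height} exceeds the budget of {max_pixels} pixels")
    return width, height
```

Only the first 24 bytes of the file are needed, so the check can run before any large allocation happens.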

4. Django

Downloads (last month): 6,550,773

Description: Django is a collection of Python libraries that enables users to quickly build and scale quality Python web applications. It’s the most popular web framework for Python, and one of the oldest at that.

CVE Count: 22

Top 3 CWEs by count:

CWE-89 → Improper neutralization of Special Elements used in a SQL command (SQL Injection)

  • 4 affiliated CVEs with this CWE, with average CVSS score of 9.6

CWE-399 → Resource management errors (e.g. resource exhaustion)

  • 3 affiliated CVEs with this CWE, with average CVSS score of 8.4

CWE-20 → Improper input validation (XSS & other types of injection)

  • 3 affiliated CVEs with this CWE, with average CVSS score of 7.5

Takeaway: It makes sense that the top attack vectors for a web framework pertain to user-supplied input, since end users of web apps built with Django can submit malicious input at any time. As most web apps connect to a backend database, I wasn’t surprised to see SQL injection as the top vector here as well. Secure development practices like output encoding and allowlisting or denylisting characters help to prevent these types of injection attacks.
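For SQL injection specifically, the most reliable defense is to keep user input out of the SQL text altogether by using query parameters. Here is a minimal sketch using Python’s built-in `sqlite3` module rather than Django itself; the table and function names are hypothetical.

```python
import sqlite3


def find_user_ids(conn: sqlite3.Connection, name: str) -> list[int]:
    """Look up user ids by name using a parameterized query."""
    # The `?` placeholder sends `name` to the database as data, so quotes
    # inside it cannot change the structure of the SQL statement.
    rows = conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user_ids(conn, "alice"))             # [1]
print(find_user_ids(conn, "alice' OR '1'='1"))  # [] (treated as a literal name)
```

Had the query been built with string formatting, the second call would have matched every row in the table.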

5. Ansible

Downloads (last month): 4,135,675

Description: Ansible is a package for provisioning, configuration management, and application deployment that enables infrastructure as code; in other words, it’s a very simple IT automation engine.

CVE Count: 22

Top 3 CWEs by count:

CWE-20 → Improper input validation (XSS & injection flaws)

  • 6 affiliated CVEs with this CWE, with average CVSS score of 8.7

CWE-74 → Improper neutralization of special elements in output used by a downstream component (Injection)

  • 3 affiliated CVEs with this CWE, with average CVSS score of 9.8

CWE-200 → Exposure of sensitive information to an unauthorized actor (Information Disclosure)

  • 2 affiliated CVEs with this CWE, with average CVSS score of 7.5

Takeaway: CWE-20 (input validation) was one of the most common weaknesses across the board and seems to be a problem regardless of the package’s purpose. Information disclosure to unauthorized users is a common problem when it comes to configuration management platforms, so it’s no surprise to see that present with Ansible.

The Bottom Line

It’s no surprise that the most common attack vectors for exploiting vulnerabilities in these packages tie directly to the underlying functionality for which each package is used. Maintainers of open-source libraries should pay close attention to these components and be sure to follow secure coding practices when iterating and pushing new versions. If you’re a Python developer and are wondering if the packages you’re using are secure, I’d recommend giving Ochrona a try.
