For a lot of programmers, Python is their “love language.” Easy to learn and use, Python is perfect for building cutting-edge machine learning and cloud computing projects. Unfortunately, knowing that programmers love Python, malicious actors have started targeting the Python Package Index (PyPI) as part of supply chain attacks. 

As researchers identify more malicious Python packages, understanding how attackers are infiltrating PyPI and how to protect your application is critical. 

The Supply Chain Attack Threat

Although depositing malicious packages in software repositories is nothing new, attackers have started focusing on PyPI as the language becomes more popular. Attackers can infiltrate legitimate applications by targeting commonly used open-source software packages. 

A brief history of open-source malicious packages looks like this:

  • November 2022: 29 malicious packages detected in PyPI
  • January 2023: malicious ‘Lolip0p’ packages installing infostealer malware identified
  • February 2023: 451 nearly identical malicious packages identified in PyPI
  • March 2023: new malicious packages that obfuscate activity identified
  • May 2023: PyPI suspends new user and project signups in the face of more malicious users and packages
  • June 2023: attack taking advantage of Python bytecode files being directly executed 

Understanding New Threats

As attackers continue to use PyPI to distribute malware, their techniques have become harder to spot. Developers should understand how malicious actors use these open-source packages and how that use can compromise security.

Infiltration and Trickery

As attackers gain more experience, they evolve how they infiltrate the repository. To trick developers into installing their packages, they use:

  • Typosquatting: changing the package name slightly so developers mistake it for a legitimate package
  • Complete project descriptions that make the package appear legitimate
  • Automation that floods the PyPI ecosystem with packages

Malware Behaviors

Some basic attack types found in these malicious packages include the following:

  • Installing malware on developers’ machines to steal information like files, passwords, browser cookies, system metadata
  • Attempting to run PowerShell on the device as a way to fetch the executable file that launches the infostealer
  • Evasive functions that detect whether the code is running inside a virtual machine or that attempt to bypass antivirus software
  • Remote access trojans (RATs) for collecting data, terminating applications, taking desktop screenshots, stealing cryptocurrency, and spying through the device’s webcam
  • Hiding malware in compiled bytecode rather than source code to hide from code scanning technologies

Protecting Your Software from Malicious Python Packages

PyPI has responded to this barrage of attacks with new security measures, such as hiring new staff and requiring all accounts to implement two-factor authentication by the end of 2023. Although these measures will limit attackers’ ability to poison the repository, you should still build security processes into your own development workflow.

Check sources

Attackers have become more adept at making their malicious Python packages look legitimate. When using an open-source repository, you should:

  • Verify the upstream repository
  • Read the package names carefully to identify typos
  • Choose well-maintained, regularly updated repositories
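
Reading names by eye is easy to get wrong, so a small script can help with the typo check above. The following is only a sketch: the KNOWN_PACKAGES list and the 0.85 similarity threshold are assumptions for illustration, not values from PyPI or any scanner, and a real check would compare against the packages your team has actually vetted.

```python
from difflib import SequenceMatcher

# Hypothetical allowlist: replace with the packages your team has vetted.
KNOWN_PACKAGES = ["requests", "numpy", "pandas", "urllib3", "colorama"]


def normalize(name: str) -> str:
    """Lowercase and unify separators so 'Foo_Bar' and 'foo-bar' compare equal."""
    return name.lower().replace("_", "-").replace(".", "-")


def possible_typosquats(candidate, known=KNOWN_PACKAGES, threshold=0.85):
    """Return vetted names that the candidate resembles without matching exactly."""
    cand = normalize(candidate)
    lookalikes = []
    for name in known:
        ref = normalize(name)
        if cand == ref:
            return []  # exact match with a vetted name: nothing to flag
        # ratio() is a similarity score between 0.0 and 1.0
        if SequenceMatcher(None, cand, ref).ratio() >= threshold:
            lookalikes.append(name)
    return lookalikes
```

Calling possible_typosquats("reqeusts") flags the name as a near miss for "requests", while an exact or unrelated name returns an empty list. A similarity score will miss some tricks and flag some honest packages, so treat a hit as a prompt to look closer rather than a verdict.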

Know your components

To monitor for new vulnerabilities, you need to know what you have. With an intelligent software composition analysis (SCA) solution, you can automate the scanning process to get continuous visibility into your application’s components and changes in their security status. 
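
As a rough illustration of what "knowing what you have" means, the sketch below builds a basic inventory of the packages installed in the current Python environment using the standard library, then compares two snapshots. It is not a substitute for an SCA tool: it does not look up vulnerabilities, and the function names are ours.

```python
from importlib.metadata import distributions


def installed_components() -> dict:
    """Map each installed distribution name to its version."""
    inventory = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata is missing or broken
            inventory[name] = dist.version
    return inventory


def changed_components(previous: dict, current: dict) -> dict:
    """Report components added, removed, or re-versioned between two snapshots.

    Each value is a (before, after) pair; None means the component was absent.
    """
    changes = {}
    for name in previous.keys() | current.keys():
        before, after = previous.get(name), current.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes
```

Saving the output of installed_components() on each build and diffing it against the previous snapshot gives a simple signal when a dependency appears or changes unexpectedly.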

Limit all access

When you implement the principle of least privilege, you grant a person or system the least amount of access necessary to complete their job function. For example, you should:

  • Understand and limit the permissions required for the API calls that the application must make
  • Remove duplicate sets of permissions
  • Apply only the least privileged set of permissions
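
One way to picture the checks above is as a comparison between what an application has been granted and what it needs. The sketch below assumes permissions can be represented as plain strings; the permission names in the usage note are hypothetical and do not correspond to any particular cloud provider.

```python
def audit_permissions(granted, required):
    """Compare granted permissions with the ones the application needs."""
    granted = list(granted)
    granted_set, required_set = set(granted), set(required)
    return {
        # Granted but never needed: candidates for removal
        "unused": sorted(granted_set - required_set),
        # Needed but not granted: the application would fail without these
        "missing": sorted(required_set - granted_set),
        # Listed more than once in the grant
        "duplicates": sorted({p for p in granted if granted.count(p) > 1}),
    }
```

For example, auditing a grant of ["storage:read", "storage:write", "storage:read"] against a requirement of ["storage:read"] reports "storage:write" as unused and "storage:read" as duplicated.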

Automate code-scanning

Because manual code reviews are time-consuming, you may only do them right before pushing the application to production. Deferring reviews this way ends up just as time-consuming and more costly, because you have to go back and locate each vulnerability before you can remediate it.

With automated code scanning, you can build the review into the development workflow. To protect your application, you should scan both the source and compiled code, especially as attackers seek to hide malicious packages in bytecode.
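
As one small example of looking at compiled code, the sketch below lists bytecode (.pyc) files that have no matching source file next to them. This is only a heuristic under our own assumptions: it does not inspect what the bytecode does, and a source-less .pyc is not proof of malice, since some projects legitimately ship bytecode only.

```python
from pathlib import Path


def expected_source(pyc: Path) -> Path:
    """Return the .py path a bytecode file would normally be compiled from."""
    if pyc.parent.name == "__pycache__":
        # Cache layout: pkg/__pycache__/mod.cpython-311.pyc -> pkg/mod.py
        module = pyc.name.split(".")[0]
        return pyc.parent.parent / f"{module}.py"
    # Legacy layout: pkg/mod.pyc sits next to pkg/mod.py
    return pyc.with_suffix(".py")


def orphaned_bytecode(root: str) -> list:
    """List .pyc files under root that have no matching source file."""
    return sorted(
        p for p in Path(root).rglob("*.pyc") if not expected_source(p).exists()
    )
```

Pointing orphaned_bytecode() at a virtual environment's site-packages directory gives a short list of files worth a closer look with a dedicated scanner.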

Prioritize activities based on reachability

Scanning will tell you the vulnerabilities in your application. However, you can’t remediate them all. 

To secure your application, you should prioritize your remediation activities based on reachability: whether attackers can actually reach and exploit a vulnerability to compromise the application. When you know how an application’s inputs relate to the sensitive data it stores, transmits, and processes, you can focus your limited time and staffing on the activities with the most security impact. 
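
To make the idea of reachability concrete, here is a toy sketch that treats the application as a call graph and checks which flagged functions can be reached from its entry points. The graph shape, function names, and finding format are all hypothetical, and this is not how any particular product computes reachability; real analysis also has to follow data flow, not just calls.

```python
from collections import deque


def reachable(call_graph: dict, entry_points: list) -> set:
    """Breadth-first search: every function reachable from the entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        current = queue.popleft()
        for callee in call_graph.get(current, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def prioritize(findings: list, call_graph: dict, entry_points: list) -> list:
    """Order findings so those in reachable functions come first."""
    live = reachable(call_graph, entry_points)
    # False sorts before True, so reachable findings lead; ties keep input order
    return sorted(findings, key=lambda f: f["function"] not in live)
```

Given a graph in which the request handler eventually calls one flagged function and only an unused admin tool calls the other, prioritize() moves the first finding to the top of the list.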

Qwiet AI: Analyzing your Python code for you

With Qwiet AI’s preZero platform, you can identify and remediate your application’s most critical and impactful vulnerabilities. Our Code Property Graph (CPG) breaks down code into its fundamental parts so that you have a comprehensive component inventory. Our lightning-fast scans can help you discover source code vulnerabilities quickly. To understand how malicious actors evolve their supply chain attacks, you can use Qwiet Blacklight, the only threat intelligence feed focused on application security. 

Try Qwiet AI’s preZero platform for free to see how it can help you mitigate malicious Python package risks. 

About Qwiet AI

Qwiet AI empowers developers and AppSec teams to dramatically reduce risk by quickly finding and fixing the vulnerabilities most likely to reach their applications and ignoring reported vulnerabilities that pose little risk. Industry-leading accuracy allows developers to focus on security fixes that matter and improve code velocity while enabling AppSec engineers to shift security left.

A unified code security platform, Qwiet AI scans for attack context across custom code, APIs, OSS, containers, internal microservices, and first-party business logic by combining results of the company’s and Intelligent Software Composition Analysis (SCA). Using its unique graph database that combines code attributes and analyzes actual attack paths based on real application architecture, Qwiet AI then provides detailed guidance on risk remediation within existing development workflows and tooling. Teams that use Qwiet AI ship more secure code, faster. Backed by SYN Ventures, Bain Capital Ventures, Blackstone, Mayfield, Thomvest Ventures, and SineWave Ventures, Qwiet AI is based in Santa Clara, California. For information, visit: https://qwiet.ai
