
Secure Coding: Arming Apache Maven against Cache Poisoning Attacks

18/01/2025

Cache poisoning is a specific type of attack that targets the way Apache Maven manages caches, packages, and dependencies in the software development process. Before going into the details of cache poisoning, it is important to understand how dependency management works with Maven.

Sven has been programming Java in industrial projects since 1996 and has spent over 15 years working around the world in industries such as automotive, aerospace, insurance and banking, as well as for the United Nations and the World Bank. For over 10 years he has been a speaker at conferences and community events from the US to New Zealand. He has worked as a developer advocate for JFrog and Vaadin and regularly writes articles for IT magazines and technology portals. In addition to his core topic, Core Java, he focuses on TDD and secure coding practices.

Overview of Maven and its cache

Apache Maven is a widely used build management tool, primarily for Java projects. Maven automates dependency management, the build process, and the deployment of applications. Several of Maven's basic mechanisms require developers to think carefully about the security of repositories and dependencies.

Maven uses repositories to manage libraries and dependencies. A distinction should be made between two types:

local: A copy of all downloaded libraries and dependencies is saved on the local machine.

Remote: Maven can access various remote repositories, such as the central Maven repository (Maven Central) or a company's own internal repository.

Maven stores all the dependencies in the local repository (cache) after downloading them from the remote repository. This means that dependencies that are needed multiple times can be loaded more quickly because they do not require repeated access to the remote repository each time.
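By default, Maven keeps this local repository under `~/.m2/repository` and files each artifact under a path derived from its coordinates: the groupId with its dots turned into directories, then the artifactId, the version, and finally `artifactId-version.jar`. The following minimal sketch computes that path. The class and method names are hypothetical, and the layout is simplified (no classifiers, packaging assumed to be JAR).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper mirroring Maven's default local repository layout (simplified). */
final class LocalRepoLayout {

    private LocalRepoLayout() {}

    /**
     * Returns where Maven would cache a JAR artifact inside the given local
     * repository root. Assumption: no classifier, packaging is "jar".
     */
    static Path artifactPath(Path repoRoot, String groupId,
                             String artifactId, String version) {
        Path groupDir = repoRoot;
        // "com.google.guava" becomes the directories com/google/guava
        for (String segment : groupId.split("\\.")) {
            groupDir = groupDir.resolve(segment);
        }
        return groupDir.resolve(artifactId)
                       .resolve(version)
                       .resolve(artifactId + "-" + version + ".jar");
    }

    public static void main(String[] args) {
        Path root = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        System.out.println(artifactPath(root, "com.google.guava", "guava", "33.0.0-jre"));
    }
}
```

Every later build that needs the same coordinates reads this file instead of downloading it again, which is why a poisoned file at this location keeps being reused until it is removed.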

What is cache poisoning?

Cache poisoning describes a class of attacks in which an attacker manages to fill a system’s cache (in this case the Maven cache) with manipulated or malicious content. This may result in legitimate requests to the cache not receiving the true original data, but instead receiving data injected by an attacker. In the case of Maven, cache poisoning refers to when an attacker manages to inject malicious artifacts into the cache of a developer or build by exploiting a vulnerability in the Maven build process or repository server.

The goal of the attack is to distribute malicious dependencies that are then integrated into software projects. These toxic dependencies may contain malicious code designed to steal sensitive data, take control of the system, or cause damage to the project.


Types of cache poisoning attacks on Maven cache

There are several scenarios for performing cache poisoning attacks on a Maven repository:

Man-in-the-middle (MITM) cache poisoning

In a man-in-the-middle attack, an attacker intercepts and manipulates the network traffic between a development machine and a remote Maven repository. If the communication is not encrypted, the attacker can swap in compromised artifacts, which then end up in the local Maven cache. Developers continue to believe that they are using a dependency from a trusted repository when in reality it has been tampered with.

Such an attack is particularly promising if Maven communicates with the repository via an insecure HTTP connection. The central Maven repository (Maven Central) now accepts only HTTPS to prevent such attacks, but there are still private or legacy repositories that use HTTP.
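A first line of defense is to make sure that no repository is reached over plain HTTP. Since Maven 3.8.1, Maven blocks HTTP repositories (other than localhost) by default through a built-in mirror, but older versions and customized settings may not. The following minimal sketch flags insecure repository URLs. It assumes the URLs have already been extracted from `pom.xml` and `settings.xml`; the class name and the sample URLs are hypothetical.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical checker that flags repository URLs not using HTTPS. */
final class RepositoryUrlCheck {

    private RepositoryUrlCheck() {}

    /** Returns every URL whose scheme is missing or is not "https" (case-insensitive). */
    static List<String> insecureUrls(List<String> repositoryUrls) {
        return repositoryUrls.stream()
                .filter(url -> {
                    String scheme = URI.create(url).getScheme();
                    return scheme == null
                            || !scheme.toLowerCase(Locale.ROOT).equals("https");
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> repos = List.of(
                "https://repo.maven.apache.org/maven2",
                "http://legacy-repo.example.com/maven2");   // hypothetical legacy server
        System.out.println("Insecure repositories: " + insecureUrls(repos));
    }
}
```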

Exploiting repository vulnerabilities

If an attacker gains access to the remote repository, they can upload arbitrary artifacts or alter the versions already stored there. This can happen, for example, if the repository is poorly secured or if a vulnerability in the repository manager (such as Sonatype Nexus or JFrog Artifactory) can be exploited. The attacker can then place malware directly in the repository, and every developer or build server that downloads the compromised artifact stores it in its Maven cache.
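Checksum files served by the same compromised repository offer little protection here, because an attacker who can replace an artifact can usually replace its checksum file as well. One mitigation is to compare a cached artifact against a hash recorded earlier from a trusted source. The following minimal sketch computes the SHA-256 of a file in the local cache and compares it with such a pinned value; the class name and the separately maintained pinned hashes are hypothetical. It requires Java 17 for `HexFormat`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical check of a cached artifact against a hash pinned from a trusted source. */
final class PinnedHashCheck {

    private PinnedHashCheck() {}

    /** Streams the file through SHA-256 and returns the lowercase hex digest. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return HexFormat.of().formatHex(digest.digest());
    }

    /** True only if the cached file still matches the pinned SHA-256 (hex, any case). */
    static boolean matchesPinnedHash(Path cachedArtifact, String pinnedSha256)
            throws IOException, NoSuchAlgorithmException {
        return sha256Hex(cachedArtifact).equalsIgnoreCase(pinnedSha256);
    }
}
```

A build step could call `matchesPinnedHash` for each critical dependency in the local repository and abort on a mismatch. The pinned values are only as trustworthy as the channel they were obtained through.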

Dependency confusion

One particularly dangerous attack vector that has attracted a lot of attention in recent years is so-called "dependency confusion". This attack is based on the fact that many modern software projects obtain dependencies both from internal, private repositories and from public repositories such as Maven Central. The main goal of a dependency confusion attack is to use a publicly accessible repository to inject malicious packages into a company or project that "believes" it is using its own internal or private dependencies.

Basics of dependency confusion

Many companies and projects maintain internal Maven repositories in which they store their own libraries and dependencies that are not publicly accessible. These internal libraries may implement company-specific functionality or complement public libraries. Developers often define the names and versions of dependencies in the Maven configuration (pom.xml) without realizing that, unless explicitly configured otherwise, the way Maven resolves dependencies can favor public repositories such as Maven Central over internal repositories.

A dependency confusion attack exploits exactly this behavior. The attacker publishes a package with the same name as the internal library to a public Maven repository, often with a higher version number than the one used internally. When Maven looks for that dependency, it may then pick the publicly available package instead of the private internal version. It downloads the malicious package and stores it in the developer's Maven cache, from where it is used in subsequent builds.
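Why a higher version number is so effective can be illustrated with a deliberately simplified model. Roughly speaking, when a dependency is declared with a version range, Maven collects the available versions from the configured repositories and picks the highest one that fits. The sketch below merges version lists from a hypothetical internal and a hypothetical public repository and selects the highest version, regardless of where it comes from. It only handles purely numeric versions such as `1.4.2` and is not Maven's actual version-comparison logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Simplified model of "highest version wins" across repositories (not Maven's real algorithm). */
final class HighestVersionWins {

    private HighestVersionWins() {}

    /** A version offered by a particular repository. */
    record Candidate(String repository, String version) {}

    /** Compares purely numeric dotted versions, so "1.10.0" is higher than "1.9.3". */
    static final Comparator<String> NUMERIC_VERSION = (a, b) -> {
        int[] x = Arrays.stream(a.split("\\.")).mapToInt(Integer::parseInt).toArray();
        int[] y = Arrays.stream(b.split("\\.")).mapToInt(Integer::parseInt).toArray();
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;   // missing segments count as 0
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    };

    /** Merges all offered versions and returns the highest one, whichever repository offers it. */
    static Optional<Candidate> resolve(Map<String, List<String>> versionsByRepository) {
        List<Candidate> all = new ArrayList<>();
        versionsByRepository.forEach((repo, versions) ->
                versions.forEach(v -> all.add(new Candidate(repo, v))));
        return all.stream().max(Comparator.comparing(Candidate::version, NUMERIC_VERSION));
    }

    public static void main(String[] args) {
        Map<String, List<String>> offered = Map.of(
                "internal", List.of("1.2.0", "1.3.0"),   // genuine internal library
                "public", List.of("99.0.0"));            // attacker's upload with the same name
        System.out.println(resolve(offered).orElseThrow());
    }
}
```

In this model the attacker's `99.0.0` from the public repository wins over the genuine `1.3.0`, simply because nothing ties the internal name to the internal repository.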

How was dependency confusion discovered?

The security researcher Alex Birsan popularized this attack in 2021 by demonstrating how easy it was to poison dependencies in projects of major tech companies. By publishing packages whose names matched internal libraries of large companies such as Apple, Microsoft, and Tesla, he successfully carried out dependency confusion attacks against these companies.

Birsan did not use malicious content in his attacks, only harmless code, simply to prove that the systems were vulnerable. He was able to show that in many cases the companies' build systems had downloaded and used the supposedly malicious (in his case harmless) packages instead of the genuine internal libraries. This revelation raised awareness of the risks of dependency confusion in the security community.

Why does Dependency Confusion work so effectively?

The success of a dependency confusion attack lies in the default configuration of many build systems and the way Maven resolves dependencies. There are several reasons why this attack vector is so effective:

Automatic prioritization of public repositories
Trust in version numbers
Missing signature verification
Trust in external code
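A common countermeasure against the first of these points is to tie internal namespaces to the internal repository, so that an artifact with an internal groupId is never accepted from a public source. Repository managers typically let you configure such rules; the following Java sketch only models the rule itself. The `com.example.internal` prefix, the repository names, and the class are hypothetical, and the prefixes are assumed not to overlap.

```java
import java.util.Map;

/** Hypothetical model of "reserved namespace" routing for dependency resolution. */
final class NamespaceRouting {

    private NamespaceRouting() {}

    /**
     * Returns true if an artifact with the given groupId may be taken from the given repository.
     * groupIds under a reserved prefix are accepted only from the repository owning that prefix.
     * Assumption: reserved prefixes do not overlap, so map iteration order does not matter.
     */
    static boolean isAllowed(String groupId, String repository, Map<String, String> reservedPrefixes) {
        for (Map.Entry<String, String> entry : reservedPrefixes.entrySet()) {
            String prefix = entry.getKey();
            // match the prefix itself or a sub-namespace, but not e.g. "com.example.internalx"
            boolean inNamespace = groupId.equals(prefix) || groupId.startsWith(prefix + ".");
            if (inNamespace) {
                return entry.getValue().equals(repository);
            }
        }
        return true;   // not a reserved namespace: any configured repository may serve it
    }

    public static void main(String[] args) {
        Map<String, String> reserved = Map.of("com.example.internal", "internal-repo");
        System.out.println(isAllowed("com.example.internal.billing", "maven-central", reserved));
        System.out.println(isAllowed("com.example.internal.billing", "internal-repo", reserved));
    }
}
```

With such a rule in place, a higher version number published under the internal name on a public repository is simply rejected instead of being preferred.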

Typosquatting

Typosquatting exploits user inattention by targeting common typos made when typing package names in software development, for example in Maven. Attackers publish packages with slightly misspelled names that closely resemble legitimate libraries. The differences may be minor, such as missing letters, additional letters, or alternative spellings. If developers accidentally enter an incorrect package name in their dependency definitions, or if automated tools resolve such a name, they end up downloading a malicious package. Typosquatting is one of the best-known attack methods against package managers and registries that host publicly available libraries, such as Maven, npm, PyPI, and others.
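The "missing letter" and "additional letter" variants described above are easy to generate mechanically, which is one reason typosquatting is cheap for attackers and also why defenders can enumerate likely look-alikes in advance. The following minimal sketch produces all one-character deletions and duplications of a name; the class is hypothetical and covers only these two kinds of typo.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper listing simple one-character typo variants of a package name. */
final class TypoVariants {

    private TypoVariants() {}

    /** Variants produced by dropping one character or doubling one character. */
    static Set<String> variants(String name) {
        Set<String> result = new LinkedHashSet<>();   // keeps insertion order, drops duplicates
        for (int i = 0; i < name.length(); i++) {
            // missing letter: remove the character at position i
            result.add(name.substring(0, i) + name.substring(i + 1));
            // additional letter: repeat the character at position i
            result.add(name.substring(0, i + 1) + name.charAt(i) + name.substring(i + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> v = variants("com.google.common");
        System.out.println(v.contains("com.gooogle.common"));   // doubled "o"
    }
}
```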

Typical typosquatting techniques

Misspelled package names: One of the simplest techniques is to change or add a letter in the name of a well-known library. An example would be the frequently used com.google.common package. An attacker could publish a package named com.gooogle.common (with an extra "o"), which is easily overlooked.

Different spellings: Attackers can also use alternative spellings of well-known libraries or names. For example, an attacker could release a package called com.apache.loggin, which closely resembles the popular com.apache.logging but lacks the final "g" of "logging", a difference that is easy to miss.

Using prefixes or suffixes: Another option is to add prefixes or suffixes that make a package look like a valid one. For example, an attacker could publish packages such as com.google.common-utils or com.google.commonx, which look deceptively similar to the legitimate package com.google.common.

Similarity in naming: Attackers can also take advantage of naming conventions in the open source community by publishing packages that contain common words or abbreviations that are often used in combination with other libraries. An example would be releasing a package like commons-lang3-utils, which is reminiscent of the popular Apache Commons library commons-lang3.
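On the defensive side, near-miss names like these can be caught mechanically. The following minimal sketch compares a requested artifact name against a hypothetical allowlist of well-known names using the Levenshtein edit distance and flags any name that is close to, but not exactly, a known one. The allowlist, the class, and the threshold of two edits are assumptions; real tooling would need a much larger list and would also have to compare groupIds.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical near-miss detector based on Levenshtein edit distance. */
final class TyposquatDetector {

    private TyposquatDetector() {}

    /** Classic dynamic-programming Levenshtein distance (insert, delete, substitute). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;   // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;   // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the known name that {@code requested} suspiciously resembles, if any. */
    static Optional<String> suspiciousMatch(String requested, List<String> knownNames, int maxDistance) {
        if (knownNames.contains(requested)) {
            return Optional.empty();   // exact match with a known name: not a near miss
        }
        return knownNames.stream()
                .filter(known -> levenshtein(requested, known) <= maxDistance)
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> known = List.of("commons-logging", "commons-lang3", "guava");   // hypothetical allowlist
        System.out.println(suspiciousMatch("commons-login", known, 2));
    }
}
```

A name such as commons-login is two edits away from commons-logging and would therefore be flagged for review rather than silently resolved.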

Dangers of typosquatting

The threat of typosquatting is particularly serious because it is difficult to detect. Developers often rely on build tools like Maven to download and integrate packages into their projects reliably. If you enter the wrong package name, you may not immediately realize that you have included a malicious dependency. Typosquatting is a form of social engineering because it exploits people's susceptibility to errors.

A successful typosquatting attack can have serious consequences:

Data loss
Introduction of malware
Loss of trust

Known cases of Maven typosquatting

There have also been incidents of typosquatting in the Maven community. In one case, a package called commons-login was released that closely resembled the legitimate Apache Commons Logging package commons-logging. Developers who mistyped the package name downloaded and integrated the malicious package, creating a potential security risk.

Typosquatting is a sophisticated attack method that is often difficult to detect and that targets human error. Attackers take advantage of the widespread use of package managers and registries such as Maven, npm, and PyPI by publishing slightly misspelled or similar-sounding packages containing malicious code. Developers and organizations should be aware of this threat and take appropriate protective measures to ensure that only legitimate and trusted packages are included in their projects.

