Verifying integrity of Python packages

Python's ecosystem is one of its greatest strengths, but every time you install a package with pip, you're pulling code from the internet and executing it in your environment. That makes package integrity verification a critical (and often skipped) security step.

This post covers:

  • How to download packages with pip
  • How to verify their integrity
  • Why integrity verification matters

How pip Downloads Packages

The most common way developers install a Python package is by running a command like this:

pip install requests

By default, pip:

  1. Resolves dependencies
  2. Downloads package files from the Python Package Index (PyPI)
  3. Installs them into your environment

pip uses HTTPS behind the scenes, which protects against basic man-in-the-middle attacks, but HTTPS alone does not guarantee that the package itself hasn’t been tampered with or replaced upstream (for example, through compromised accounts or malicious uploads).

That’s where integrity verification comes in.

How to Verify Package Integrity with pip

Perhaps the best option is to use hash verification. Here, you tell pip exactly which file is allowed to be installed by giving it a cryptographic fingerprint (hash). If the file is different in any way, pip refuses to install it.
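
To make the idea of a fingerprint concrete, here is a minimal Python sketch of the kind of check involved: hash the file with SHA-256 and compare the result to the value you expected. The function names are made up for this post, and this illustrates the concept rather than pip's actual implementation.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large package files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True only if the file's digest equals the expected fingerprint."""
    return sha256_of_file(path) == expected_hex.lower()
```

Changing even a single byte of the file produces a completely different digest, which is why a mismatch is a reliable signal that the file is not the one you approved.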

  1. Install the package normally (once). This lets pip resolve dependencies and identify the exact versions. pip may also install additional packages that the one you asked for depends on.
pip install requests
  2. Generate a requirements file with pinned versions.
pip freeze > requirements.txt
  3. View the generated requirements.txt file. It lists every installed package and its exact version, but nothing in it verifies the integrity of those packages yet. The file may look something like this:
certifi==2023.11.17
charset-normalizer==3.3.2
idna==3.6
requests==2.32.5
urllib3==2.1.0
  4. Download all packages in the requirements.txt file to a packages/ subfolder so that you have the exact files you want pip to allow.
pip download -r requirements.txt -d packages/
  5. Get the exact filename of the downloaded package you want to hash. For example, on Windows the command below returns "requests-2.32.5-py3-none-any.whl" (on macOS or Linux, use ls packages/requests* instead).
dir packages\requests*
  6. Generate the hash for this file. pip hash accepts one or more file paths, so you can also pass several files in a single command.
pip hash packages/requests-2.32.5-py3-none-any.whl

The output lists the file and its hash, like this:

c:\Temp>pip hash packages/requests-2.32.5-py3-none-any.whl

packages/requests-2.32.5-py3-none-any.whl:
--hash=sha256:2462f94637a34fd532264295e186976db0f5d453d1cdd31473c85a6a161affb6
  7. Update requirements.txt and add the hash to the matching line. Repeat this for every package: once hash checking is on, pip rejects any requirement that has no hash. The example below shows the real hash for requests and placeholders (in angle brackets) for the others.
certifi==2023.11.17 \
    --hash=sha256:<hash of the certifi file>
charset-normalizer==3.3.2 \
    --hash=sha256:<hash of the charset-normalizer file>
idna==3.6 \
    --hash=sha256:<hash of the idna file>
requests==2.32.5 \
    --hash=sha256:2462f94637a34fd532264295e186976db0f5d453d1cdd31473c85a6a161affb6
urllib3==2.1.0 \
    --hash=sha256:<hash of the urllib3 file>
  8. When you install packages in higher environments (for example, staging or production), enforce hash checking on all future installations as shown.
pip install --require-hashes -r requirements.txt

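pip will now refuse to install any file whose version and cryptographic hash do not match what requirements.txt lists. This helps protect you against tampered packages and dependency confusion, because a substituted file will not match the recorded hash. Keep in mind that a hash only proves the file is the same one you originally downloaded; it does not prove that file was safe in the first place.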
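
Hashing every file by hand gets tedious, so the step of adding hashes can be scripted. The sketch below is one possible approach, not an official pip feature: it walks a download folder, derives each package name and version from the filename, and builds requirements lines with a SHA-256 hash per file. The function names are hypothetical, and the filename parsing assumes standard wheel (.whl) and source archive (.tar.gz, .zip) naming.

```python
import hashlib
import re
from pathlib import Path

PACKAGE_SUFFIXES = (".whl", ".tar.gz", ".zip")


def name_and_version(filename):
    """Split a wheel or source archive filename into (name, version)."""
    if filename.endswith(".whl"):
        # Wheel names look like name-version-...-platform.whl
        name, version = filename.split("-")[:2]
    else:
        # Source archives look like name-version.tar.gz
        for suffix in (".tar.gz", ".zip"):
            if filename.endswith(suffix):
                name, version = filename[: -len(suffix)].rsplit("-", 1)
                break
        else:
            raise ValueError(f"unrecognized package file: {filename}")
    # Normalize the name so charset_normalizer becomes charset-normalizer.
    return re.sub(r"[-_.]+", "-", name).lower(), version


def hashed_requirements(package_dir):
    """Build requirements text with one --hash option per downloaded file."""
    hashes = {}
    for path in sorted(Path(package_dir).iterdir()):
        if path.is_file() and path.name.endswith(PACKAGE_SUFFIXES):
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            hashes.setdefault(name_and_version(path.name), []).append(digest)
    lines = []
    for (name, version), digests in sorted(hashes.items()):
        options = " \\\n".join(f"    --hash=sha256:{d}" for d in digests)
        lines.append(f"{name}=={version} \\\n{options}")
    return "\n".join(lines) + "\n"
```

Calling print(hashed_requirements("packages")) should produce text you can paste into requirements.txt. It is worth spot-checking a few values against pip hash before relying on it. Note that pip download fetches files for the machine it runs on, so if you install on a different platform you may need hashes for that platform's files as well.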

Why Verifying Package Integrity Is Important

  1. Supply chain attacks are increasing

Attackers increasingly target:

  • Maintainer accounts
  • CI pipelines
  • Dependency confusion vulnerabilities

Once malicious code is published, automated installs can carry it to thousands of systems before anyone notices.

  2. Transitive dependencies multiply risk

You may install one package, but pip may install dozens more automatically. If any dependency in the chain is compromised, your application is compromised.

Hash verification locks the entire dependency tree to known-good artifacts.
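
If you want to see how far the chain extends in your own environment, the standard library can list what each installed package declares as a dependency. This is a small sketch with a hypothetical function name; it reports declared requirements only and does not resolve versions the way pip does.

```python
from importlib import metadata


def declared_requirements():
    """Map each installed distribution to the requirements it declares."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            # dist.requires is None when a package declares no dependencies.
            result[name] = list(dist.requires or [])
    return result
```

Looking up declared_requirements().get("requests") in an environment where requests is installed should show the packages it pulls in, each of which needs its own pinned version and hash.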

  3. Builds must be reproducible

Without pinned versions and integrity checks, the same install command can produce a different result tomorrow than it does today, for example when a dependency publishes a new release.
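
One way to keep builds honest is a quick check, for example in CI, that every requirement is both pinned and hashed. The sketch below is a simplified illustration with a hypothetical function name. It handles backslash line continuations and comments but not every feature of the requirements file format, so treat it as a starting point.

```python
def unverifiable_requirements(text):
    """Return requirements that lack an exact pin or a sha256 hash."""
    # Join backslash-continued lines so each requirement is one logical line.
    logical_lines = text.replace("\\\n", " ").splitlines()
    problems = []
    for line in logical_lines:
        line = line.split(" #")[0].strip()  # drop trailing comments
        if not line or line.startswith(("#", "-")):
            continue  # skip blanks, comments, and options such as --index-url
        if "==" not in line or "--hash=sha256:" not in line:
            problems.append(line.split()[0])
    return problems
```

An empty result means every requirement in the file carries an exact version and at least one hash. Anything else in the list is a line that pip install --require-hashes would reject.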

What About Virtual Environments and Docker?

Virtual environments and containers do not solve integrity. They only isolate where the code runs. You still need to verify what goes into those environments, so the verification steps described above are still needed.

Example in Docker:

COPY requirements.txt .
RUN pip install --require-hashes -r requirements.txt

Summary

At a minimum, there are two practices you should consider adopting:

  1. Store hash values of your packages in requirements.txt
  2. Always use pip install --require-hashes

While pip makes installing software incredibly easy, it also makes it easy to run unverified code in production. With pinned versions in requirements.txt and hash verification during installs, you can reduce risk.