Coming from a background focused primarily on Java and the JVM, I decided to dig deeper into Python this year. One reason is the convenience of Python for scripting; the other is my interest in learning more about ML and AI.
One thing that surprised me is that the Python Packaging User Guide lists 20+ projects related to package and dependency management, while the Java world has just a few good options.
In this blog, I am going to walk you through tools like pip, venv, pipenv and poetry with a simple example. Along the way, I will explain which tools suit which kind of work to help you make the choice.
What is package and dependency management?
Modern software usually has many dependencies. This avoids reinventing the wheel, and well-written packages are both very useful and hard to create. Imagine having to write an HTTP request processor every time because no package like requests existed.
Therefore, software usually depends on other packages. In the Python world, if you don't have a dependency installed, you will usually get a "no module found" error (ModuleNotFoundError).
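To make the "no module found" situation concrete, here is a minimal sketch that checks whether a module can be imported before you try to use it. The helper name is_installed is my own, not part of any tool discussed here, and it only looks at top-level module names.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current import path."""
    # find_spec returns None when no importable module with that name exists
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("json", "requests"):
        if is_installed(name):
            print(f"{name}: available")
        else:
            print(f"{name}: missing (try: pip install {name})")
```

json ships with Python, so it is always found; requests is only found if you have installed it in the current environment.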
Dependency management is the part of package management that focuses on downloading dependencies, pinning their versions, installing transitive dependencies, and so on.
Package management also includes publishing the packages you create to a remote repository. Just imagine where you download packages from: someone has to publish them there before we can use them.
In this blog post, I will focus on dependency management, since I am currently mostly consuming packages rather than publishing new ones.
pip
If you have ever used Python, you have almost certainly used pip already. Whenever you search for an error like "no module found", you will probably find a Stack Overflow answer telling you to run pip install on the missing module.
pip is the package installer for Python. You can use it to install packages from the Python Package Index and other indexes.
pip gets packages from the Python Package Index (PyPI).
The Python Package Index (PyPI) is a repository of software for the Python programming language.
If you only use Python for occasional scripting, pip is probably a good enough solution for you.
However, pip alone is not enough if you use Python for more and bigger projects. This is because pip installs packages globally by default. This can result in conflicts between different projects that require different versions of the same package. It can also mean that you need administrative privileges to install packages, which can be problematic on shared systems.
Virtual Environment
A Python virtual environment is a self-contained and isolated environment in which you can install Python packages and dependencies for a specific project without affecting the system-wide Python installation or other projects.
Virtual environments are particularly useful when you’re working on multiple Python projects, each with its own set of dependencies, or when you want to ensure a clean and reproducible environment for your project. This solves the problem of installing packages globally with pip.
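If you are ever unsure whether your interpreter is running inside a virtual environment, a small check like the following can help. This is a sketch that assumes an environment created by the standard venv module; the function name in_virtual_environment is my own.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("prefix:     ", sys.prefix)
    print("base prefix:", sys.base_prefix)
    print("in a virtual environment:", in_virtual_environment())
```

Running this before and after activating an environment shows the two prefixes diverge once the environment is active.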
We are going to use the venv module to create a virtual environment in a simple example.
1. Create a project with virtual environment.
# Create a new directory for your project (if not already created)
mkdir venv-demo
cd venv-demo
# Create a virtual environment named 'venv-demo'
python -m venv venv-demo
This will create a directory named venv-demo within your project directory, containing the isolated Python environment.
2. Activate the virtual environment.
source venv-demo/bin/activate
3. Install packages with pip and work on your project.
# install package `requests`
pip install requests
# use package `requests` to fetch your ip address
echo """
import requests
response = requests.get('https://httpbin.org/ip')
print('Your IP is {0}'.format(response.json()['origin']))
""" > ip.py
Then run the script using python3 ip.py to print your IP address.
Instead of using pip install to install each dependency individually, you can also put all your dependencies into a requirements.txt file and run pip install -r requirements.txt.
4. Deactivate the virtual environment when you are done with the project.
deactivate
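The same environment can also be created from Python itself, because venv is a standard-library module. The following is a minimal sketch, not a replacement for the commands above: the helper name create_environment is my own, and I pass with_pip=False only to keep the example fast (the command-line tool bootstraps pip by default).

```python
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir: Path) -> Path:
    """Create a virtual environment and return the path to its pyvenv.cfg file."""
    # with_pip=False skips bootstrapping pip, which keeps this sketch quick
    venv.create(env_dir, with_pip=False)
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    # Use a throwaway directory so the sketch leaves nothing behind
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_environment(Path(tmp) / "venv-demo")
        print(cfg.read_text())
```

The pyvenv.cfg file is what marks a directory as a virtual environment; it records which base Python installation the environment was created from.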
pipenv
pipenv combines both dependency management (similar to pip) and environment management (similar to venv), making it a one-stop solution for project setup.
pipenv generates a Pipfile.lock that ensures deterministic builds by locking the versions of dependencies. This means every developer working on the project will be using the same versions of the packages, which avoids the "works on my machine" problem.
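Pipfile.lock is a JSON file, so you can inspect the pinned versions with a few lines of Python. The sketch below uses a hand-written, heavily trimmed sample rather than a real lock file (a real one also carries hashes and more metadata), and the function name pinned_versions is my own.

```python
import json

# Hand-written, trimmed approximation of a Pipfile.lock; real files contain more fields
SAMPLE_LOCK = """
{
  "_meta": {"requires": {"python_version": "3.8"}},
  "default": {
    "requests": {"version": "==2.31.0"},
    "urllib3": {"version": "==2.0.6"}
  },
  "develop": {}
}
"""


def pinned_versions(lock_text: str, section: str = "default") -> dict:
    """Map each package in the given section to its exact pinned version."""
    data = json.loads(lock_text)
    return {
        name: entry["version"].lstrip("=")  # "==2.31.0" becomes "2.31.0"
        for name, entry in data.get(section, {}).items()
        if "version" in entry
    }


if __name__ == "__main__":
    for name, version in pinned_versions(SAMPLE_LOCK).items():
        print(f"{name} is pinned to {version}")
```

Because every version is an exact "==" pin, two developers installing from the same lock file end up with the same packages.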
pipenv also provides simpler and more consistent commands. We can install pipenv with pip install pipenv. The following steps create the same example as above.
- Initialize a new virtual environment: pipenv --python 3.8
- Install packages: pipenv install requests. This will add requests to the Pipfile and lock its version in Pipfile.lock.
- Create the same ip.py file.
- Activate the virtual environment with pipenv shell.
- Run the script with python3 ip.py.
- Deactivate the environment with exit.
You can check which packages are installed by running pipenv graph.
╰─$ pipenv graph
requests==2.31.0
├── certifi [required: >=2017.4.17, installed: 2023.7.22]
├── charset-normalizer [required: >=2,<4, installed: 3.3.0]
├── idna [required: >=2.5,<4, installed: 3.4]
└── urllib3 [required: >=1.21.1,<3, installed: 2.0.6]
In this example, 4 extra packages are installed as dependencies of requests.
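You can see the same declared requirements from inside Python with the standard-library importlib.metadata module. This is a rough sketch, not how pipenv builds its graph: it only lists what one installed package declares, and the helper name declared_requirements is my own.

```python
from importlib import metadata


def declared_requirements(dist_name: str) -> list:
    """Return the requirement strings an installed distribution declares."""
    try:
        # requires() returns None when the package declares no dependencies
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        # The package is not installed in the current environment
        return []


if __name__ == "__main__":
    for requirement in declared_requirements("requests"):
        print(requirement)
```

Run inside the environment where requests is installed, this prints its requirement strings, including optional extras; anywhere else it prints nothing.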
poetry
poetry is a unified tool that is gaining popularity. It provides dependency management, packaging, and publishing in a single tool. It uses the pyproject.toml file (PEP 518) for configuration, which is seen as a modern standard for Python projects.
To create the same example of fetching the IP address, we first install poetry with the following command from the official documentation.
curl -sSL https://install.python-poetry.org | python3 -
Create a new project and initialize it with poetry:
mkdir poetry-demo
cd poetry-demo
poetry init
You can then add the dependency like below:
poetry add requests
This will update pyproject.toml with the direct dependency and create a poetry.lock file to lock the version of requests and its transitive dependencies.
Similar to venv and pipenv, you need to activate the environment.
poetry shell
Then you can add your code. I created a file named ip.py under the poetry_demo folder.
import requests

def ip():
    # Ask httpbin for the caller's public IP address and print it
    response = requests.get('https://httpbin.org/ip')
    print('Your IP is {0}'.format(response.json()['origin']))
Then I added this script to pyproject.toml. This is what the file looks like:
╰─$ cat pyproject.toml
[tool.poetry]
name = "poetry-demo"
version = "0.1.0"
description = ""
authors = []
readme = "README.md"
[tool.poetry.dependencies]
python = "^3.11"
requests = "^2.13.0"
[tool.poetry.scripts]
script = "poetry_demo.ip:ip"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
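The ^ in python = "^3.11" and requests = "^2.13.0" is a caret constraint: it allows updates that do not change the leftmost non-zero version component. The sketch below spells that rule out for plain numeric versions. It is my own illustration of the rule as I understand it, not poetry's implementation, and caret_range is a made-up name.

```python
def caret_range(constraint: str) -> str:
    """Expand a caret constraint such as "^2.13.0" into an explicit version range."""
    version = constraint.lstrip("^")
    parts = [int(p) for p in version.split(".")]

    upper = None
    for i, part in enumerate(parts):
        if part != 0:
            # Bump the leftmost non-zero component and zero out everything after it
            upper = parts[:i] + [part + 1] + [0] * (len(parts) - i - 1)
            break
    if upper is None:
        # Every component is zero (for example "^0.0"): bump the last one
        upper = parts[:-1] + [parts[-1] + 1]

    return f">={version},<{'.'.join(str(p) for p in upper)}"


if __name__ == "__main__":
    for constraint in ("^2.13.0", "^3.11", "^0.2.3"):
        print(constraint, "means", caret_range(constraint))
```

So requests = "^2.13.0" accepts any 2.x release from 2.13.0 onwards, which is why the 2.31.0 shown in the dependency tree satisfies it.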
Then we are ready to run the script with:
poetry run script
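The value "poetry_demo.ip:ip" follows the usual module:function entry-point notation. Roughly speaking, running the script means importing the module on the left of the colon and calling the attribute on the right. Here is a minimal sketch of that lookup; it uses json:dumps from the standard library as a stand-in so it runs anywhere, and load_entry_point is my own helper, not a poetry API.

```python
import importlib


def load_entry_point(spec: str):
    """Resolve a "module:attribute" string to the object it names."""
    module_name, _, attribute = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


if __name__ == "__main__":
    # Stand-in for "poetry_demo.ip:ip", chosen because json is always available
    dumps = load_entry_point("json:dumps")
    print(dumps({"resolved": True}))
```

With the project installed in the active environment, load_entry_point("poetry_demo.ip:ip") would return the ip function defined earlier.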
Similar to pipenv, a lock file poetry.lock is created to lock the exact version of all dependencies (requests) and transitive dependencies (what is required by requests). poetry can also show the dependency graph:
╰─$ poetry show --tree
requests 2.31.0 Python HTTP for Humans.
├── certifi >=2017.4.17
├── charset-normalizer >=2,<4
├── idna >=2.5,<4
└── urllib3 >=1.21.1,<3
venv, pipenv, poetry: which one should I use?
If you are not building a complex project, I think any of these tools is good enough. However, pipenv and poetry are better at pinning dependency versions and handling transitive dependencies than venv with plain pip.
Transitive dependencies are the indirect dependencies introduced by your direct dependencies. If your project depends on A and B, and both depend on C but on different versions, there might be conflicts. The lock file shows you exactly which versions are used.
A -> C-1.0
B -> C-2.0
Both pipenv and poetry attempt to solve this by choosing a version of C that's compatible with both A and B, if possible. If not possible, they raise an error indicating the conflict, allowing the developer to take action.
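To illustrate the idea (and only the idea), here is a toy sketch that picks the newest version of C satisfying every constraint, or raises an error when none does. Real resolvers in pipenv and poetry are far more involved. The function names, the version list, and the (minimum, maximum) constraint format are all assumptions made up for this example.

```python
def parse(version: str) -> tuple:
    """Turn "1.5" into (1, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_version(available: list, constraints: list) -> str:
    """Pick the newest available version that satisfies every constraint.

    Each constraint is a (minimum_inclusive, maximum_exclusive) pair.
    All versions are assumed to have the same number of components.
    """
    candidates = [
        version
        for version in available
        if all(parse(low) <= parse(version) < parse(high) for low, high in constraints)
    ]
    if not candidates:
        raise ValueError(f"no version satisfies all constraints: {constraints}")
    return max(candidates, key=parse)


if __name__ == "__main__":
    versions_of_c = ["1.0", "1.5", "2.0"]  # hypothetical releases of C
    # A needs C >=1.0,<2.0 and B needs C >=1.2,<3.0: 1.5 works for both
    print(pick_version(versions_of_c, [("1.0", "2.0"), ("1.2", "3.0")]))
    # A needs C >=1.0,<2.0 and B needs C >=2.0,<3.0: nothing fits
    try:
        pick_version(versions_of_c, [("1.0", "2.0"), ("2.0", "3.0")])
    except ValueError as error:
        print("conflict:", error)
```

The second call mirrors the A -> C-1.0, B -> C-2.0 situation above: no single version fits, so the developer has to step in.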
Comparing pipenv and poetry, I personally enjoy using poetry more, because it feels more like the packaging and dependency tools I have used in other languages. By configuring scripts in pyproject.toml, it is also easy to run them without remembering the exact Python file name.
Conclusion
Packaging and dependency management is a complex topic in Python, because there are so many packages and the tools did not keep up with the requirements at first. But with tools like pipenv and poetry, dependency management has improved a lot. If you are just starting to learn these tools, either pipenv or poetry should be a good choice.