
ProTip: Try to avoid sudo pip install

We've all read those blog posts and seen documentation that advocates using sudo pip install to install a package globally on your system. The goal of this blog post is to convince you that you should avoid this at all costs. Please note that this also applies to any non-system package manager (e.g. gem, pear, npm/yarn, etc.), and everything covered in this post, while pip/Python specific, can typically be applied to those other systems as well. We will likely post follow-up articles to cover some of them, too.

I know, I know, you're thinking, "But the documentation for the software explicitly told me to use sudo pip install!", or maybe "But *Well Respected Tech Thought Leader* wrote a blog post with sudo pip install all over it". True, you will find these instructions all over the internet. But read on, and maybe by the end of this post I can convince you to try a different approach.

Conflicts with System Package Manager

My first point is that most systems already have a package manager for installing system wide packages. Using a second one system wide can lead to some strange and unintended consequences. Some of you may have already experienced this: you install a package system wide with, say, sudo pip, and the next thing you know, other system packages break. Maybe your system's own package manager breaks! It's really hard to fix broken system dependencies when the system package manager itself doesn't work. Or even worse, you could break one of the packages involved in booting your system.

Obviously, these are extreme cases, and you will not always break your system. You may also be thinking, "But I'm just installing some tool not used by the system, because if it were used by the system it would already be installed!" True, and that will not necessarily break anything. HOWEVER, don't forget about dependencies. The package you are installing may not conflict with anything on the system, BUT it may have a dependency that conflicts with a system package, or that requires a newer version of a package some system package depends on. And think about it: how often have you upgraded some dependency only to have it break everything? Do you really want to chance breaking a dependency needed by a system tool? The sketch below shows one way this plays out.
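To make the dependency hazard concrete, here is a hedged sketch of how a pip-installed copy of a library can shadow the distro-managed one (Debian/Ubuntu file layout assumed; requests is just a stand-in for any shared dependency):

sudo apt-get install python3-requests  # distro copy: /usr/lib/python3/dist-packages
sudo pip install --upgrade requests    # pip copy: /usr/local/lib/python3.X/dist-packages
# the /usr/local copy sorts earlier on sys.path, so every system tool that
# imports requests now gets the pip version, whether it was tested with it or not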

Another issue: system package managers don't just let you install software, they let you uninstall it too. And they don't just track what gets installed, but also the dependencies. When you uninstall a package with a system package manager, it knows that the dependencies for that package are no longer needed, which makes it possible to "clean up" and remove those unneeded dependencies as well. Non-system package managers don't always allow or make it easy to uninstall a package, and those that DO support uninstallation don't always make it easy to also remove unneeded dependencies. For example, pip can uninstall a package, but it leaves behind all of the packages that were installed as its dependencies. Those leftovers can then create dependency conflicts with other packages, including system packages.
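A quick demonstration of that pip behavior (requests is just an arbitrary example of a package with several dependencies):

pip install --user requests      # also pulls in urllib3, certifi, idna, etc.
pip uninstall -y requests        # removes only requests itself
pip list --user                  # the dependencies are all still installed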

Security concerns

Let's be honest. As good as, say, the group running PyPI is, there is bound to be a malicious package here or there that gets through. Sometimes it's a typosquatted package name (this is more and more rare on something like PyPI, but pretty common on other indexes), or a compromised credential allowing a bad actor to replace a known package with a malicious one. Now, I'm not saying this is extremely common. It's actually pretty rare, but it DOES happen (see references 1 and 2).

Also, third party package indexes rely on developers uploading their own packages. When that is the case, the number of developers uploading will always be far larger than the number of moderators looking for security issues. Contrast that with operating system repositories, where uploads are strictly controlled, only allowed by trusted individuals, and each package is vetted before it lands. This is why compromised system repository packages happen so infrequently compared to third party indexes.

Now, you may be thinking, "But I only install specific, known packages, like ansible, which is distributed under the watchful eye of giants like Red Hat." I'm sure they are very vigilant about making sure their packages are always safe, BUT what about (again) the package's dependencies? Even something like ansible has quite a few dependencies, not all of which are under the control of an organization with the resources of Red Hat. And while you originally typed sudo pip install ansible, all of the packages that get installed (including all of the dependencies) can potentially run setup scripts as...you guessed it, the root user.

So while you could personally vet each package and all of its dependencies and sub-dependencies for security issues, isn't it simpler to just NOT install them as root? Personally, I would say this goes double for packages from the node world, where there are massive numbers of dependencies and sub-dependencies (and, let's be honest, a far higher frequency of malicious packages being published).

But...but...Containers!

No!

OK, in all seriousness, yes, inside of a container my first point is more or less moot. Even if you break all of the system tools, you will likely not need them to run your one process in the container. It is still possible that you break, say, the package manager, and then subsequent calls to it in your container build process will fail. But you'll know pretty quickly if that is the case.

So is it ok in a container? I still say no, mainly because of my second point. You're probably thinking, "But this is in a container, it's secure!" No, not really. With the current state of containers, at least on Linux, it's best to just assume that if something has root access inside of a container, it can gain root on the container host as well (see references 3 and 4). This may not be the case for container systems that have been around much longer, like BSD jails for example (but I'm not as familiar with those implementations, so I will not comment). Linux container implementations are maturing pretty rapidly, so it may be relatively safe one day soon. But for now, my personal recommendation is to avoid letting untrusted things have root, even when run in containers.

It's really just not necessary.

My final point is that in almost all cases, it's just not necessary to install globally or run pip install as root.

Ask yourself: does this thing I'm installing have to be installed globally? Likely, the answer is no. Most of the time (say, on your work/development computer), you just need it installed for you. The great thing about most systems (hey, even Windows) is that they all implement this idea of a path. That is, when you run a command, a configurable set of directories is searched for the command you are trying to run.

Let's be honest. Usually when you're installing software, you don't need it to be available globally. You really just want it available somewhere on the system so that you can use it. If it's a command line tool, you want to be able to type its name in your shell and have it just work. And since PATH is typically just a variable, you can update it to add an additional place to search for the software you just installed.
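You can see this search order for yourself (a hedged illustration; the exact directories will vary per system):

echo "$PATH"         # e.g. /usr/local/bin:/usr/bin:/bin
command -v python3   # prints the first match found on PATH, if any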

So what can we do instead?

I'm going to use the awesome Ansible as an example here. The documentation for installing from source actually includes a sudo pip install command. That would install it system wide, meaning it would be on your path and available for anyone on the system to use. But what if the system is your laptop and you're the only user? Why install it system wide? Did you know pip has a nifty feature to let you install to your home directory? You could, for example, do the following:

# First, upgrade pip, because why not?
# Notice the --user flag. It installs things to your home directory!
pip install --upgrade --user pip
# Make sure the per-user bin directory is searched first
export PATH="${HOME}/.local/bin:${PATH}"
pip install --upgrade --user ansible
ansible --version

If you add that PATH adjustment to your login settings (like .bashrc or .profile), it will be applied every time you log in.
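For example (assuming bash; adjust for your shell of choice):

echo 'export PATH="${HOME}/.local/bin:${PATH}"' >> ~/.bashrc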

Ok, so what if you actually DO need it system wide for other users to use? I would recommend setting up a system wide Python virtual environment (say, in /opt) and installing it there. Then you can update the PATH variable in the system wide /etc/profile or similar, so that everyone on the system gets their path updated. You could do something like this:

# create a dir for the venv and adjust the owner
sudo mkdir -p /opt/ansible
sudo chown $(id -u):$(id -g) /opt/ansible
# create the venv and install into it; no sudo needed from here on
python3 -m venv /opt/ansible
/opt/ansible/bin/pip install --upgrade ansible
export PATH="/opt/ansible/bin:${PATH}"

The chown command above sets the ownership of the directory to the account running the command, which is why the subsequent pip install works without sudo. If you then add the PATH adjustment to the system wide login scripts, like /etc/profile or /etc/bash.bashrc, all users on the system should be able to use the shared Python virtual environment where you installed ansible. You could also make the directory group writable if you want others to be able to update the versions of things installed, and you could name it something more generic if you want to install more than a single tool (maybe you're installing tools like pipenv, poetry, and black for everyone on the team to use, so maybe name it /opt/devtools). A sketch of that variant follows.
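Here's a hedged sketch of that shared variant. It assumes a devtools group already exists and that your account is a member of it; none of the names here are mandatory:

# group owned and group writable (setgid keeps new files in the group)
sudo mkdir -p /opt/devtools
sudo chown root:devtools /opt/devtools
sudo chmod 2775 /opt/devtools
# from here on, any member of devtools can manage the venv without sudo
python3 -m venv /opt/devtools
/opt/devtools/bin/pip install --upgrade pipenv poetry black
# make the venv's bin dir part of everyone's PATH at login
echo 'export PATH="/opt/devtools/bin:${PATH}"' | sudo tee /etc/profile.d/devtools.sh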

Ok, so what about all of the instructions that tell you to install things system wide with sudo, or to run pip install as root in your docker containers? Well, we can work around all of those as well, using either of the two methods mentioned previously in this post. No matter which method you choose, one important step is to create an unprivileged user to execute pip as inside the container. I would also recommend adding something like gosu or su-exec: simple tools that let you run commands as a specific user, and typically much easier to use in docker builds than su or sudo.

For example:

FROM python:3.8-slim
# as root: install build deps plus gosu, create an unprivileged user,
# then run the pip install as that user
RUN apt-get update && apt-get install -yq build-essential gosu && \
    useradd -s /bin/bash -m builder && \
    gosu builder:builder pip install --upgrade --user ansible
ENV PATH=/home/builder/.local/bin:${PATH}
USER builder
WORKDIR /home/builder

So, as you can see, you do any tasks you need to as root, like using the system package manager to install build dependencies and the gosu utility, but then you add an unprivileged user. Once that user exists, you can use gosu to execute the pip install task as the unprivileged user. This makes sure that fetching dependencies (and executing downloaded code) happens without root privileges. A quick way to try it is shown below.
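Assuming the Dockerfile above is saved in an otherwise empty directory, you can build and verify it like so (the image name is arbitrary):

docker build -t unprivileged-ansible .
docker run --rm unprivileged-ansible ansible --version
docker run --rm unprivileged-ansible whoami   # prints: builder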

While this is no guarantee of safety against malicious code, it at least prevents possibly malicious code from running with elevated privileges inside the container. I would also highly recommend dropping all capabilities and possibly adding a stricter seccomp profile to your docker containers. That is, however, out of the scope of this blog post (maybe a later one though).

References
  1. https://www.bleepingcomputer.com/news/security/python-package-installation-can-trigger-malicious-code/
  2. https://www.helpnetsecurity.com/2019/07/18/malicious-python-packages/
  3. https://threatpost.com/hack-allows-escape-of-play-with-docker-containers/140831/
  4. https://www.eweek.com/security/researchers-warn-of-malicious-container-escape-vulnerability
  5. gosu, typically available in Debian/Ubuntu repositories (apt install gosu)
  6. su-exec, typically available in Alpine repositories (apk add su-exec)
Tags: protip, python, ansible, security