As computers become more connected, distributing programs has never been easier. Over the last few decades, updates have gone from something distributed in the mail on physical media to data which is constantly available on the internet. This has led to an explosion of different mechanisms which distribute full programs to end users, and fragments of programs (libraries) to software developers. These fragments of programs allow modern developers to compose the work of hundreds or thousands of people into something powerful and easily digestible. These programs and program fragments can be collectively referred to as “packages”, and the mechanisms for reasoning about them and shipping them around can be referred to as “package managers”. With this post, I’m kicking off a series on package management. I’ll endeavor to lay out the common strategies, discuss specific implementation decisions, and describe how I’ve seen this space change over time.
Static vs Dynamic Dependencies
No matter how you slice it, there are two ways for programs to access the fragments they depend on:
- Providing their own copy of each dependency. (static)
- Relying on a version of their dependency provided by the environment in which they’re executing. (dynamic)
The specifics of whether dependencies are static or dynamic change a lot based on the technology you’re using. A classic and demonstrative example can be seen when programming with C or C++.
When you invoke gcc (a very popular C compiler), you can produce a program that uses either of the strategies above.
When “statically linking”, the program itself contains all of the instructions needed to run it. As the program is
compiled, the code from its dependencies will be copied in. It makes it easy to distribute the program, as you don’t
need to spend much time reasoning about the computer it will run on. Your only concern may be whether the computer it
will run on uses Linux, Windows, macOS or something else. But this approach has downsides:
- The program you’re distributing is bigger than if you hadn’t copied in the instructions from all of your dependencies.
- Every change to the program, or to any of the dependencies copied into it, requires it to be recompiled and redistributed.
On the other hand, “dynamically linking” a program allows it to rely on code fragments that it finds on the machine when it runs. This allows several programs to share the same set of dependencies, therefore taking up less space on a user’s computer, and using less bandwidth to download. It also allows for separation between code used to support the platform and code used exclusively for the program being distributed.
For example, OpenSSL is an implementation of TLS. It is a library that allows computers to communicate securely with one another. Over time, security vulnerabilities are discovered and improvements are made. It is crucial to stay up-to-date with the most recent version of OpenSSL. By dynamically linking OpenSSL into your program, you’re allowing the owner of the computer your program is running on to ensure they’re safe without needing to download a new copy of every program on their computer. OpenSSL even describes this on their release strategy page: “…an application compiled and dynamically linked with 1.1.0 does not need to be recompiled when the shared library is updated to 1.1.1.”
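To make the dynamic idea a little more concrete, here is a minimal Python sketch that borrows a function from whatever C math library the host machine happens to provide. It uses the standard ctypes module, which loads a shared library while the program is running. That is not quite the same thing as the linking gcc sets up at build time, so treat it as an analogy. The function name cos_from_shared_library is my own, and the library it finds will differ from machine to machine, which is exactly the point.

```python
import ctypes
import ctypes.util


def cos_from_shared_library(x):
    """Call cos() from the C math library found on this machine.

    Returns None when no suitable shared library can be located,
    which is the risk every dynamically linked program accepts.
    """
    # Ask the host system where its math library lives.
    # The answer depends entirely on the environment we run in.
    path = ctypes.util.find_library("m")
    if path is None:
        return None
    try:
        library = ctypes.CDLL(path)
        cos = library.cos
    except (OSError, AttributeError):
        return None
    # Describe the C signature: double cos(double)
    cos.argtypes = [ctypes.c_double]
    cos.restype = ctypes.c_double
    return cos(x)


if __name__ == "__main__":
    print(cos_from_shared_library(0.0))
```

Nothing from the math library is copied into this script. If the machine’s library is patched tomorrow, the script picks up the new code on its next run without being redistributed.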
Traditional Package Managers
No doubt, the biggest weakness of using dynamically linked libraries is reasoning about what is going to be available on the machine your program eventually runs on, and programming in a way that can accommodate as many variations of that as possible. As hinted at in the OpenSSL example above, responsible library authors try their best to protect their users by not creating a situation where programmers have to choose between two incompatible versions. However, even with these authors’ best efforts, as a programmer you often need to pick a lowest supported version of a library and assert that at least that is present on the machine.
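Picking a lowest supported version and checking for it can be sketched in a few lines of Python. The helper names and the 1.1.0 floor below are assumptions made for illustration. The only real API used is the standard ssl module, whose OPENSSL_VERSION_INFO tuple reports the OpenSSL library the interpreter loaded.

```python
MINIMUM_OPENSSL = (1, 1, 0)  # hypothetical floor chosen for this example


def parse_version(text):
    """Turn a dotted version string such as '1.1.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(found, minimum):
    """Return True when the found version is at least the minimum."""
    # Pad the shorter tuple with zeros so (1, 1) compares equal to (1, 1, 0).
    width = max(len(found), len(minimum))
    padded_found = tuple(found) + (0,) * (width - len(found))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_found >= padded_minimum


def runtime_openssl_is_supported():
    """Check the OpenSSL that this interpreter found on the machine."""
    import ssl  # imported here so the helpers above work without it

    return meets_minimum(ssl.OPENSSL_VERSION_INFO, MINIMUM_OPENSSL)


if __name__ == "__main__":
    print(meets_minimum(parse_version("1.1.1"), MINIMUM_OPENSSL))
```

A check like this only tells you the floor is met. It says nothing about whether a much newer version has changed behavior you rely on, which is why library authors’ compatibility promises matter so much.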
This problem is one of the primary factors that led to the development of popular Linux package managers like the Yellowdog Updater, Modified, its successor Dandified YUM, and the Advanced Package Tool. Each of these programs allows a computer user to choose a program they would like to install, then fetches any uninstalled libraries needed to run that program and puts them in a predictable place ready for re-use. Historically speaking, a lot of the libraries written for Linux computers have been written in C or C++, but these package managers are agnostic to the language the library is written in. They can just as readily distribute programs/program fragments written in Python, Go, Rust, or anything else.
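The core job described above, working out which libraries are still missing and installing them before the program that needs them, can be sketched as a walk over a dependency graph. This is a toy model rather than how YUM, DNF, or APT are actually implemented. The package names and the dependency table are hypothetical, and real package managers also have to handle versions, conflicts, and much more.

```python
# Hypothetical dependency table: package name -> packages it needs.
DEPENDS_ON = {
    "curl": ["libcurl"],
    "libcurl": ["openssl", "zlib"],
    "openssl": [],
    "zlib": [],
}


def packages_to_install(target, depends_on, installed):
    """List the missing packages needed for target, dependencies first."""
    order = []
    seen = set(installed)  # already-installed packages are never revisited

    def visit(name):
        if name in seen:
            return
        seen.add(name)  # mark before recursing so cycles cannot loop forever
        for dependency in depends_on.get(name, []):
            visit(dependency)
        order.append(name)  # appended only after everything it needs

    visit(target)
    return order


if __name__ == "__main__":
    print(packages_to_install("curl", DEPENDS_ON, installed={"zlib"}))
```

Because zlib is already on the machine in this example, it is skipped and shared, which is the re-use benefit the dynamic model is after.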
But how do they choose which versions of programs to ship? There are different philosophies, and that has caused a wide range of different “distributions” of Linux to blossom over the years. Some distributions like Arch use a “rolling release” strategy where the most recent version of every package is shipped. All programs must be constantly maintained to stay compatible with the most recent versions of the packages they depend on. But obviously, it’s not total anarchy. “Maintainers” of packages in Arch and other rolling release platforms know what programs depend on them, and make sure they have a transition plan in place before making widespread breaking changes. This model has picked up steam in recent years, and now many Linux distributions have a separate rolling release offering.
Despite inroads by rolling release distributions, the “standard” release model remains the most popular choice, and is the model used by flagship versions of Ubuntu, Debian, Fedora, Red Hat Enterprise Linux, SUSE Linux Enterprise, and most others. Their strategy is to pick a window of time, and to lock all packages into a compatibility band for the duration of that window. Small changes like bug fixes and security patches will be accepted, but breaking and additive changes must wait until after the time window elapses. This allows users (especially enterprise server customers) to adopt new environments at a pace easy for them to control and test in big increments, instead of constantly tending to maintenance of their systems.
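One way to picture a compatibility band is as a rule about which version bumps are allowed during the window. The sketch below assumes versions shaped like major.minor.patch, where only the last number changes for bug fixes and security patches. That numbering scheme is my assumption, not something every package follows, and distributions often backport fixes without adopting the upstream version number at all. The function name is hypothetical.

```python
def allowed_in_release_window(locked, candidate):
    """Decide whether an update fits inside the locked compatibility band.

    Both arguments are (major, minor, patch) tuples. Only patch-level
    updates that move forward (or stay put) are accepted.
    """
    same_band = candidate[:2] == locked[:2]
    not_a_downgrade = candidate[2] >= locked[2]
    return same_band and not_a_downgrade


if __name__ == "__main__":
    locked = (1, 1, 0)
    for candidate in [(1, 1, 4), (1, 2, 0), (2, 0, 0)]:
        print(candidate, allowed_in_release_window(locked, candidate))
```

A rolling release distribution effectively drops this gate and accepts every new version as it arrives.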
Traditional package managers have really stood the test of time, and distribute both statically and dynamically linked code well.
So that’s the end of it, right? Package managers are perfect, blog-post complete!
Not quite. The package managers listed above have some common pain points:
- You can only have one version of any given package installed at one time.
- There’s some non-negligible overhead to shape your program into a package that they can work with.
- They’re not ubiquitous.
Some of these problems are pretty easy to work around. For years, Fedora and Ubuntu distributed both Python 2 and Python 3 by calling one python and the other python3. The overhead of creating a .deb package for apt or an .rpm package for dnf can be frustrating until you know what you’re doing, but that work is often done by someone other than the author of the program.
But the fact that there are so many different package managers that use many different formats, and are usually tied to just one or two operating systems is a real hassle. Further compounding the problem, major platforms like macOS and Windows only have unofficial community-supported package managers like Homebrew and Chocolatey. (There is some hope on this front; at time of writing, Microsoft is in the process of building one called Windows Package Manager.)
In light of these hurdles, and with how cheap hard drive space and bandwidth have become, lots of alternatives have presented themselves. I hope to explore many of those options in upcoming posts, but some of the trends are:
- Bespoke, independent install wizards (looking at you, Windows and macOS).
- App stores, which essentially follow the static model, distributing each app with independent copies of all of their dependencies. Think of the Apple App Store, Google’s Play Store, the Microsoft Store, etc.
- A new generation of package managers, which are similar to app stores, but help manage the project by rebuilding and distributing when dependencies change. Examples include Canonical’s Snapcraft and Flatpak. Arguably, Docker containers belong in this category.
- Package managers provided by and/or focused on a particular language. Cargo provides Rust packages, Go modules have dramatically reshaped the Go ecosystem, and NuGet distributes .NET programs.
All in all, traditional package managers have been a mainstay of computing for decades. They provide a framework for computer users to efficiently curate the programs and libraries needed on a particular computer, and have ushered in an era of massive choice.