Project versioning and tagging
Drew Leske
Lately we’ve been discussing project versions and tagging for both releases and packaging. This has come up for tags on container images, and we also have developed a couple of Python libraries which we may want to publish to a public repository. While tags on container images have fairly relaxed restrictions, the Python Package Index for example follows a very prescriptive specification for versioning, and ideally we can find a solution that will work for both and is reasonably meaningful and intuitive.
A few words on nomenclature. The term “tag” has dual meanings here: there is container tagging, where tags are essentially container versions, and there is tagging in Git, where a commit is given an addressable name of some significance and possibly other metadata. Hopefully it’s clear to the reader which one I mean in this article.
Versioning systems
Our starting approach has been to loosely follow semantic versioning. By “loosely” I mean we follow the basics and may or may not adhere to a strict interpretation. The basics are that a version string has the format “MAJOR.MINOR.PATCH”: a MAJOR update is when API compatibility changes; a MINOR update is new features without breaking compatibility; a PATCH update fixes bugs. This is a good system, and clear; I only plan to diverge from it if we have major updates that don’t break API compatibility from previous versions but nonetheless represent significant new value and updates, warranting a major version update.
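As a rough sketch of those basics, here is how the next version number could be derived from the kind of change. The function name bump and the change labels are made up for illustration, and pre-release or build suffixes are not handled.

```python
def bump(version: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH string for the given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":   # API compatibility changes
        return f"{major + 1}.0.0"
    if change == "minor":   # new features, compatibility preserved
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # bug fixes only
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


print(bump("0.4.1", "patch"))  # 0.4.2
print(bump("0.4.1", "minor"))  # 0.5.0
print(bump("0.4.1", "major"))  # 1.0.0
```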
I like semantic versioning: it’s simple, makes sense, and so it’s easy to follow.
A complementary system comes from the Python ecosystem, which introduced PEP 440 and its successor in the Python Packaging User Guide. This is similar to semantic versioning but both more restrictive in its format and less restrictive in what the initial numeric components signify, though “abiding by these aspects is encouraged”. This system supports but does not enforce semantic versioning.
Neither of these systems precludes the other for what we want to represent. We want to (somewhat loosely) follow semantic versioning and in some cases we must follow the Python packaging guidelines, so we’re looking for the intersection here.
Where it gets hairy
These systems work well for releases, where we decide the software is at a deliverable state and we assign major, minor and patch numbers based on the last release version, the changes to the software since then, and the semantic versioning system. The trouble starts, as it always does, with trying to do more.
We sometimes make mistakes in packaging. As such I like the convention of package release versioning by appending a hyphen and sequence number starting at 1. For example, the package for my firetruck app might be 0.4.1, but I had a bug in my CI pipeline that built an empty container image. This is obviously useless. A quick fix and I can rebuild and push 0.4.1-1 to the repository, and this is clearly an update over 0.4.1 (in which -0 is implied and never… explied?).
I could be wrong about this: it’s clear to me but I’ve also spent quite some time as a Unix sysadmin, and Linux distributions often use -N to signify updated packages of the same code.
Semantic versioning provides for the inclusion of build metadata, but specifically states there is no precedence applied to it, so 0.4.1+1 would not necessarily be chosen over 0.4.1 if the user requested the latest.
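To make that concrete, here is a minimal sketch of a sort key that follows the semantic versioning rule of ignoring build metadata when comparing versions. The name precedence_key is mine, and the sketch only handles plain MAJOR.MINOR.PATCH with an optional +BUILD suffix, not pre-release identifiers.

```python
def precedence_key(version: str) -> tuple[int, int, int]:
    """Sort key for MAJOR.MINOR.PATCH[+BUILD]; the build part is discarded."""
    core, _, _build = version.partition("+")
    major, minor, patch = (int(part) for part in core.split("."))
    return (major, minor, patch)


# The rebuilt package and the original have the same precedence...
print(precedence_key("0.4.1+1") == precedence_key("0.4.1"))  # True
# ...while a real patch release sorts after both.
print(precedence_key("0.4.2") > precedence_key("0.4.1+1"))   # True
```

A tool asked for "the latest" using a key like this has no basis for preferring 0.4.1+1 over 0.4.1.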
I will probably have to let go of this. Anyway, the packaging is generally tied up in the source code, unlike a Linux distro’s packaging of software from another source, or a project using a non-code build system like Jenkins. An update to the packaging therefore occurs in the same repository, with the same history and so on, which makes it hard to differentiate between functional and packaging changes.
The other complication is that we sometimes want or even need to assign meaningful version numbers to development builds. This is where the difficulties arise and what we’re pulling apart in the rest of this article.
Production versus development
Our basic development workflow shouldn’t be alien to many:
- We have a main branch which is normally the target for merging branches
- Features and fixes are developed on a branch
- When ready the branch is merged to main
- When ready we tag a commit on main with a release number
Typically updates to the main branch are deployed to our dev environment, and a release will be deployed to the prod environment. Developers test the feature branches locally on their workstations or VMs, and automated testing is also in play.
Release tagging
Git supports two types of tagging: lightweight and annotated. Annotated
tags have messages associated with them and are intended for releases
(according to the man page). The message
for the release tag will be the same as the release’s entry in the CHANGELOG
file. (Lightweight tags are sort of an alias for a commit, like assigning it
a name, and don’t really come into play here.)
Package tagging
Packages (container images, Python packages, other build outcomes) are tagged with the release tag, if the build is for a tagged commit. So, if the code is ready to go with release 1.0.4, we’ll tag it as such and build a package firetruck-1.0.4.tgz or an image firetruck:1.0.4 or whatever. If we’re publishing our thing to a package repository such as PyPI or a container registry such as Docker Hub, this tagged build artifact will be pushed there.
In-between versioning
We don’t tag every commit on the main branch, obviously, but if we’re working on a web app and, as mentioned above, we want to have the latest on main to be running on the development platform, we’ll need to build a container image tagged appropriately, and we also might want to make that tag available somewhere in the app we can see it, so we’re sure of what iteration we’re looking at.
The first option we’ll look at is git describe --tags. For the current state of this website’s repository, for example:
$ git describe --tags
1.0.8-232-g66fd264
This indicates:
- the last tag on this branch is 1.0.8;
- there have been 232 commits to this branch since then (we’ve abandoned the use of tags in this repo—never mind);
- this is a Git repository (that’s the “g”); and
- the commit represented here has the short hash 66fd264.
This is really useful in a lot of ways. It’s clearly in development, not a release, and you get an idea how far from the last release this is. Finally, if you want to track down the code, you can see exactly which commit this is running.
There are some drawbacks apart from the superfluous “g”. For one, it depends on there being tags. Well, too bad. We want to use tags meaningfully, so this is moot except for the first development of a project, and we’ve simply made “tag at 0.0.0” one of the steps of setting up a new repository.
A bigger problem is that in CI pipelines, we generally don’t clone the entire repository. To save time, space and bandwidth, pipelines by default use a shallow clone, so that only the last X commits from the current branch are pulled to the executor. Running the above on a shallow clone of our website’s repository won’t find the tag, let alone get the correct number of commits since. So if our CI pipeline handles the container build or whatnot, this will be a problem.
Another issue we’ll have to deal with is that the above format doesn’t conform to the version standards we’ll want to follow. This will be the case with whatever we choose, though, and we’ll have to write a simple converter. For example, the above could be made compliant by translating it to 1.0.8+232.66fd264 (yes, I ditched the “g”).
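Such a converter might look something like the following. This is only a sketch: the name describe_to_version is made up, and it assumes the input is either a bare tag or the tag-count-ghash form shown earlier.

```python
import re

# <tag>-<commits since tag>-g<abbreviated hash>, as in the example output above
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def describe_to_version(described: str) -> str:
    """Translate git describe output to <tag>+<count>.<hash>, dropping the 'g'."""
    described = described.strip()
    match = DESCRIBE_RE.match(described)
    if match is None:
        # No count/hash suffix: assume we are exactly on a tag and pass it through.
        return described
    return f"{match['tag']}+{match['count']}.{match['sha']}"


print(describe_to_version("1.0.8-232-g66fd264"))  # 1.0.8+232.66fd264
print(describe_to_version("1.0.8"))               # 1.0.8
```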
Branch versioning
We sometimes want a build outcome from a development branch. Maybe we want to try out our container image locally or try an in-development Python package on a VM. The above is available, but it does not indicate the branch and so at a glance the build could be confused with a build from the main branch. The above tag for this branch is actually 1.0.8-233-gd08095b, while the branch name is i88-tagging-and-versioning.
We could add the branch name to the above, or some kind of indicator:
- the entire branch name, translated for compliance to versioning standards. For example, 1.0.8+232.66fd264.i88.tagging.and.versioning.
- if we always follow the convention of prefixing the branch name with the issue number, we could take this first part: 1.0.8+232.66fd264.i88.
- we could just mention it’s a branch without specifying it using some static indicator for any branch other than main: 1.0.8+232.66fd264.br.
My preference is the second one, because it’s more compact and I also like that it would enforce the convention of including the issue number in the branch.
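For comparison, the three options could be sketched as below. All three function names are hypothetical, and the issue prefix pattern (an “i” followed by digits) is an assumption based on the one branch name shown here.

```python
import re


def full_branch_suffix(branch: str) -> str:
    """Option 1: the whole branch name, with other characters turned into dots."""
    return re.sub(r"[^a-zA-Z0-9]+", ".", branch).strip(".")


def issue_suffix(branch: str) -> str:
    """Option 2: only the leading issue prefix, e.g. 'i88' (assumed convention)."""
    match = re.match(r"i\d+", branch)
    if match is None:
        raise ValueError(f"branch {branch!r} does not start with an issue number")
    return match.group(0)


def static_suffix(branch: str) -> str:
    """Option 3: a fixed marker for any branch other than main."""
    return "" if branch == "main" else "br"


branch = "i88-tagging-and-versioning"
print(full_branch_suffix(branch))  # i88.tagging.and.versioning
print(issue_suffix(branch))        # i88
print(static_suffix(branch))       # br
```

Note that the second option fails loudly on a branch without an issue prefix, which is the enforcement effect mentioned above.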
We’re not doing any of these, though, because of the issue with the cloning depth, so the number of commits since the most recent release will be unavailable. So we’re eliminating that, and we have to build the tagging string ourselves, and git describe --tags might not work anyway.
Wait but hang on, you said…
The astute reader, such as me on the third pass of writing this article, may notice the contradiction in using the syntax for build metadata in development releases when it was made clear that so far as semantic versioning is concerned, 1.0.8+232.66fd264 is equivalent to 1.0.8 and there is no implication that the former is more recent or more valid or more or less anything than the latter. Well, there are two counterarguments to this. First, tough: we need something. Second, development builds are for our use only: we want them to conform as much as possible to the spirit of the law, but we need them to conform to the letter just so we can build the packages with Python tools.
So whether it’s a necessary evil or a weak excuse for a local hack, it’s what we’ve got.
Where we got to
Our final determination on versioning and tagging is as follows:
- We will tag a new project with 0.0.0 as part of the repository initialization, along with committing a blank or nearly blank README.
- We will continue to loosely follow semantic versioning as we have been.
- Versioning for packages, container images and other build outcomes of a release will use that version as the tag or in the package name.
- Versioning for build outcomes of a non-release on the main branch will have the format <last-release>+<short hash>.
- Versioning for build outcomes on a feature branch will have the format <last-release>+<short hash>.<branch> where <branch> is the branch name normalized to use the character set [a-zA-Z0-9\.].
Examples:
- firetruck-0.1.4 or firetruck:0.1.4 (release)
- firetruck-0.1.4+66fd264 (interim build from main)
- firetruck-0.1.4+66fd264.i32.new.feature.wow.wow.wow (build from feature branch)
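A minimal sketch of how the non-release formats above might be assembled follows. The function name build_version and its parameters are made up for illustration, and it assumes the last release and the short hash have already been obtained by whatever means the pipeline allows.

```python
import re
from typing import Optional


def build_version(last_release: str, short_hash: str,
                  branch: Optional[str] = None) -> str:
    """Build <last-release>+<short hash>[.<branch>] for a non-release build."""
    if branch is None or branch == "main":
        return f"{last_release}+{short_hash}"
    # Collapse anything outside [a-zA-Z0-9] into single dots so no segment is empty.
    normalized = re.sub(r"[^a-zA-Z0-9]+", ".", branch).strip(".")
    return f"{last_release}+{short_hash}.{normalized}"


print(build_version("0.1.4", "66fd264"))
# 0.1.4+66fd264
print(build_version("0.1.4", "66fd264", "i32-new-feature-wow-wow-wow"))
# 0.1.4+66fd264.i32.new.feature.wow.wow.wow
```

Release builds need none of this: they simply use the release tag as is.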