Project versioning and tagging

Drew Leske

Lately we’ve been discussing project versions and tagging for both releases and packaging. This has come up for tags on container images, and we have also developed a couple of Python libraries which we may want to publish to a public repository. While tags on container images have fairly relaxed restrictions, the Python Package Index, for example, follows a very prescriptive specification for versioning. Ideally we can find a solution that works for both and is reasonably meaningful and intuitive.

A few words on nomenclature. The term “tag” has dual meanings here: there is container tagging, where tags are essentially container image versions, and there is tagging in Git, where a commit is given an addressable name of some significance and possibly other metadata. Hopefully it’s clear to the reader which one I mean in this article.

Versioning systems

Our starting approach has been to loosely follow semantic versioning. By “loosely” I mean we follow the basics and may or may not adhere to a strict interpretation. The basics are that a version string has the format “MAJOR.MINOR.PATCH”: a MAJOR update is when API compatibility changes; a MINOR update is new features without breaking compatibility; a PATCH update fixes bugs. This is a good system, and clear; I only plan to diverge from it if we have major updates that don’t break API compatibility from previous versions but nonetheless represent significant new value and updates, warranting a major version update.
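As a minimal sketch of those basics, assuming a plain MAJOR.MINOR.PATCH string with no suffixes, bumping a version looks like this. The function name `bump` is my own invention for illustration, not part of any tool.

```python
def bump(version: str, part: str) -> str:
    """Return the next version after a change of the given kind.

    Assumes `version` is exactly MAJOR.MINOR.PATCH with numeric parts.
    """
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":   # API compatibility changes
        return f"{major + 1}.0.0"
    if part == "minor":   # new features, no breaking changes
        return f"{major}.{minor + 1}.0"
    if part == "patch":   # bug fixes only
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("0.4.1", "patch"))
print(bump("0.4.1", "minor"))
print(bump("0.4.1", "major"))
```

Lower-order components reset to zero when a higher-order one is incremented, which is what the semantic versioning specification prescribes.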

I like semantic versioning: it’s simple, makes sense, and so it’s easy to follow.

A complementary system comes from the Python ecosystem, which introduced PEP 440 and its successor in the Python Packaging User Guide. This is similar to semantic versioning but is both more restrictive in its format and less restrictive in what the initial numeric components signify, though “abiding by these aspects is encouraged”. This system supports but does not enforce semantic versioning.

Neither of these systems precludes the other for what we want to represent. We want to (somewhat loosely) follow semantic versioning and in some cases we must follow the Python packaging guidelines, so we’re looking for the intersection here.
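To make “the intersection” a little more concrete, here is a rough sketch of a check for version strings that, as far as I can tell, are acceptable to both schemes. It is deliberately a conservative subset rather than the exact intersection: three numeric components without leading zeros, optionally followed by a plus sign and dot-separated alphanumeric segments. The pattern and the name `in_both_schemes` are my own assumptions, not taken from either specification.

```python
import re

# Conservative subset: MAJOR.MINOR.PATCH, optionally "+seg.seg...".
# Leading zeros are excluded because semantic versioning forbids them.
_NUM = r"(0|[1-9][0-9]*)"
_COMMON = re.compile(
    rf"{_NUM}\.{_NUM}\.{_NUM}"
    r"(\+[0-9A-Za-z]+(\.[0-9A-Za-z]+)*)?"
)


def in_both_schemes(version: str) -> bool:
    """True if `version` falls inside the conservative common subset."""
    return _COMMON.fullmatch(version) is not None


for candidate in ("1.0.8", "1.0.8+232.66fd264", "01.0.0", "1.0"):
    print(candidate, in_both_schemes(candidate))
```

Anything this check rejects is not necessarily invalid in one scheme or the other; it just falls outside the part I am confident both accept.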

Where it gets hairy

These systems work well for releases, where we decide the software is at a deliverable state and we assign major, minor and patch numbers based on the last release version, the changes to the software since then, and the semantic versioning system. The trouble starts, as it always does, with trying to do more.

We sometimes make mistakes in packaging. As such I like the convention of package release versioning by appending a hyphen and sequence number starting at 1. For example, the package for my firetruck app might be 0.4.1, but I had a bug in my CI pipeline that built an empty container image. This is obviously useless. A quick fix and I can rebuild and push 0.4.1-1 to the repository, and this is clearly an update over 0.4.1 (in which -0 is implied and never… explied?).

I could be wrong about this: it’s clear to me but I’ve also spent quite some time as a Unix sysadmin, and Linux distributions often use -N to signify updated packages of the same code.

Semantic versioning provides for the inclusion of build metadata, but specifically states there is no precedence applied to it, so 0.4.1+1 would not necessarily be chosen over 0.4.1 if the user requested the latest.
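The precedence problem can be sketched with a small sort key. This is my own simplified reading of the precedence rules in the semantic versioning specification, not a full implementation: build metadata (after “+”) is dropped before comparing, and a hyphenated suffix is treated as a pre-release, which ranks below the bare version. The name `precedence_key` is hypothetical.

```python
def precedence_key(version: str):
    """Sort key approximating semantic versioning precedence."""
    # Build metadata plays no part in precedence, so discard it.
    without_build = version.split("+", 1)[0]
    core, _, pre = without_build.partition("-")
    numbers = tuple(int(n) for n in core.split("."))
    if not pre:
        # No pre-release: ranks above any pre-release of the same core.
        return (numbers, 1, ())
    # Numeric identifiers compare as numbers and rank below alphanumeric ones.
    identifiers = tuple(
        (0, int(i), "") if i.isdigit() else (1, 0, i)
        for i in pre.split(".")
    )
    return (numbers, 0, identifiers)


print(precedence_key("0.4.1+1") == precedence_key("0.4.1"))
print(precedence_key("0.4.1-1") < precedence_key("0.4.1"))
```

So under these rules the hyphenated form fares even worse than the plus form: 0.4.1+1 merely ties with 0.4.1, while 0.4.1-1 would sort before it.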

I will probably have to let go of this. Anyway, the packaging is generally tied up in the source code, unlike a Linux distro’s packaging of software from another source, or a project using a non-code build system like Jenkins. The update to the packaging occurs in the same repository, with the same history and so on, so there is little to gain from differentiating between functional and packaging changes.

The other complication is that we sometimes want or even need to assign meaningful version numbers to development builds. This is where the difficulties arise and what we’re pulling apart in the rest of this article.

Production versus development

Our basic development workflow shouldn’t be alien to many:

  1. We have a main branch which is normally the target for merging branches
  2. Features and fixes are developed on a branch
  3. When ready the branch is merged to main
  4. When ready we tag a commit on main with a release number

Typically updates to the main branch are deployed to our dev environment, and a release will be deployed to the prod environment. Developers test the feature branches locally on their workstations or VMs, and automated testing is also in play.

Release tagging

Git supports two types of tagging: lightweight and annotated. Annotated tags have messages associated with them and are intended for releases (according to the man page). The message for the release tag will be the same as the release’s entry in the CHANGELOG file. (Lightweight tags are sort of an alias for a commit, like assigning it a name, and don’t really come into play here.)

Package tagging

Packages (container images, Python packages, other build outcomes) are tagged with the release tag, if the build is for a tagged commit. So, if the code is ready to go with release 1.0.4, we’ll tag it as such and build a package firetruck-1.0.4.tgz or an image firetruck:1.0.4 or whatever. If we’re publishing our thing to a package repository such as PyPI or a container registry such as Docker Hub, this tagged build artifact will be pushed there.

In-between versioning

We don’t tag every commit on the main branch, obviously. But if we’re working on a web app and, as mentioned above, we want the latest on main running on the development platform, we’ll need to build a container image tagged appropriately. We also might want to expose that tag somewhere in the app where we can see it, so we’re sure of which iteration we’re looking at.

The first option we’ll look at is git describe --tags. For the current state of this website’s repository, for example:

$ git describe --tags
1.0.8-232-g66fd264

This indicates:

  • the last tag on this branch is 1.0.8;
  • there have been 232 commits to this branch since then (we’ve abandoned the use of tags in this repo—never mind);
  • this is a Git repository (that’s the “g”); and
  • the commit represented here has the short hash 66fd264.
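Pulling those pieces apart programmatically is straightforward. The following is a sketch, assuming the output has the shape shown above, or is a bare tag name, which is what git describe prints when the commit itself is tagged. The names `Described` and `parse_describe` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class Described(NamedTuple):
    tag: str
    commits_since: int
    short_hash: Optional[str]


# <tag>-<commits since tag>-g<abbreviated hash>
_DESCRIBE = re.compile(r"(?P<tag>.+)-(?P<count>[0-9]+)-g(?P<hash>[0-9a-f]+)")


def parse_describe(text: str) -> Described:
    """Split `git describe --tags` output into its components."""
    text = text.strip()
    match = _DESCRIBE.fullmatch(text)
    if match is None:
        # No suffix: treat the whole string as a tag on the commit itself.
        return Described(text, 0, None)
    return Described(match["tag"], int(match["count"]), match["hash"])


print(parse_describe("1.0.8-232-g66fd264"))
```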

This is really useful in a lot of ways. It’s clearly in development, not a release, and you get an idea how far from the last release this is. Finally, if you want to track down the code, you can see exactly which commit this is running.

There are some drawbacks apart from the superfluous “g”. For one, it depends on there being tags. Well, too bad. We want to use tags meaningfully, so this is moot except for the first development of a project, and we’ve simply made “tag at 0.0.0” one of the steps of setting up a new repository.

A bigger problem is that in CI pipelines, we generally don’t clone the entire repository. To save time, space and bandwidth, pipelines by default use a shallow clone, so that only the last X commits from the current branch are pulled to the executor. Running the above on a shallow clone of our website’s repository won’t find the tag, let alone get the correct number of commits since. So if our CI pipeline handles the container build or whatnot, this will be a problem.

Another issue we’ll have to deal with is that the above format doesn’t conform to the version standards we’ll want to follow: the Python rules reject it outright, and semantic versioning would read everything after the first hyphen as a pre-release, which sorts before 1.0.8. This will be the case with whatever we choose, though, so we’ll have to write a simple converter. For example, the above could be made compliant by translating it to 1.0.8+232.66fd264 (yes, I ditched the “g”).
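Such a converter really can be simple. Here is one possible sketch, assuming the input is git describe --tags output as shown earlier; the function name `describe_to_version` is hypothetical.

```python
import re


def describe_to_version(described: str) -> str:
    """Rewrite '<tag>-<count>-g<hash>' as '<tag>+<count>.<hash>'.

    A bare tag (no '-<count>-g<hash>' suffix) is returned unchanged.
    """
    return re.sub(r"-([0-9]+)-g([0-9a-f]+)$", r"+\1.\2", described.strip())


print(describe_to_version("1.0.8-232-g66fd264"))
```

Using a single substitution anchored at the end of the string means tags that themselves contain hyphens are left alone.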

Branch versioning

We sometimes want a build outcome from a development branch. Maybe we want to try out our container image locally or try an in-development Python package on a VM. The above is available, but it does not indicate the branch, so at a glance the build could be confused with a build from the main branch. The git describe --tags output for this branch is actually 1.0.8-233-gd08095b, while the branch name is i88-tagging-and-versioning.

We could add the branch name to the above, or some kind of indicator:

  • the entire branch name, translated for compliance to versioning standards. For example, 1.0.8+232.66fd264.i88.tagging.and.versioning.
  • if we always follow the convention of prefixing the branch name with the issue number, we could take this first part: 1.0.8+232.66fd264.i88.
  • we could just mention it’s a branch without specifying it using some static indicator for any branch other than main: 1.0.8+232.66fd264.br.

My preference is the second one, because it’s more compact and I also like that it would enforce the convention of including the issue number in the branch.
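For comparison, the three candidates can be generated like so. This is only a sketch: the helper names are mine, and `issue_prefix` assumes the convention that the branch name starts with the issue number followed by a hyphen.

```python
import re


def normalize_branch(branch: str) -> str:
    """Replace each run of non-alphanumeric characters with a single dot."""
    return re.sub(r"[^0-9A-Za-z]+", ".", branch).strip(".")


def issue_prefix(branch: str) -> str:
    """Take everything before the first hyphen (assumed to be the issue)."""
    return branch.split("-", 1)[0]


base = "1.0.8+232.66fd264"
branch = "i88-tagging-and-versioning"

candidates = [
    f"{base}.{normalize_branch(branch)}",  # entire branch name
    f"{base}.{issue_prefix(branch)}",      # issue number only
    f"{base}.br",                          # static indicator
]
for candidate in candidates:
    print(candidate)
```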

We’re not using any of these as written, though. Because of the issue with the cloning depth, the number of commits since the most recent release will be unavailable, so we’re dropping it. And since git describe --tags might not work anyway, we have to build the version string ourselves.

Wait but hang on, you said…

The astute reader, such as me on the third pass of writing this article, may notice the contradiction in using the syntax for build metadata in development releases. It was made clear that, so far as semantic versioning is concerned, 1.0.8+232.66fd264 is equivalent to 1.0.8, and there is no implication that the former is more recent or more valid or more or less anything than the latter. Well, there are two counterarguments to this. First, tough: we need something. Second, development builds are for our use only: we want them to conform as much as possible to the spirit of the law, but we need them to conform to the letter just so we can build the packages with Python tools.

So whether it’s a necessary evil or a weak excuse for a local hack, it’s what we’ve got.

Where we got to

Our final determination on versioning and tagging is as follows:

  1. We will tag a new project with 0.0.0 as part of the repository initialization, along with committing a blank or nearly blank README.
  2. We will continue to loosely follow semantic versioning as we have been.
  3. Versioning for packages, container images and other build outcomes of a release will use that version as the tag or in the package name.
  4. Versioning for build outcomes of a non-release on the main branch will have the format <last-release>+<short hash>.
  5. Versioning for build outcomes on a feature branch will have the format <last-release>+<short hash>.<branch> where <branch> is the branch name normalized to use the character set [a-zA-Z0-9\.].

Examples:

  • firetruck-0.1.4 or firetruck:0.1.4 (release)
  • firetruck-0.1.4+66fd264 (interim build from main)
  • firetruck-0.1.4+66fd264.i32.new.feature.wow.wow.wow (build from feature branch)
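Rules 3 to 5 fit in one small function. What follows is a sketch rather than our actual pipeline code: the name `build_version` and its parameters are hypothetical, and it assumes the caller already knows the last release, the short hash and the branch name (in CI these would probably come from the pipeline’s environment rather than from git describe).

```python
import re
from typing import Optional


def build_version(
    last_release: str,
    short_hash: Optional[str] = None,
    branch: Optional[str] = None,
    main_branch: str = "main",
) -> str:
    """Compose a version string following rules 3 to 5 above."""
    if short_hash is None:
        # Rule 3: a release build uses the release version as is.
        return last_release
    # Rule 4: non-release build on main.
    version = f"{last_release}+{short_hash}"
    if branch and branch != main_branch:
        # Rule 5: append the branch name, normalized to letters,
        # digits and dots.
        normalized = re.sub(r"[^0-9A-Za-z]+", ".", branch).strip(".")
        version = f"{version}.{normalized}"
    return version


print(build_version("0.1.4"))
print(build_version("0.1.4", "66fd264"))
print(build_version("0.1.4", "66fd264", "i32-new-feature-wow-wow-wow"))
```

The three calls correspond to the three examples above, minus the firetruck prefix.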

References

  1. Semantic version specification (SemVer)
  2. Python Packaging User Guide; its chapter on Versioning