Migrating STRAP's Terraform to object storage

Drew Leske

Up until now, using local storage for Terraform state has served STRAP development well: it’s the simplest to set up and, should you ever need to, it’s easy to pick apart. It’s completely adequate for many uses–if you have local storage and you don’t need to share the state between multiple editors, then there is no advantage to storing Terraform state remotely. But Strapper doesn’t meet those criteria anymore, so we need to go remote.

For Strapper development, local storage has been good enough while Archie has been building that up on his development node. However, we’re now ready to make the management app for STRAP self-hosting, which means deploying it to Kubernetes, and that means we can’t rely on the default anymore: in Kubernetes, applications run in pods, and pods are essentially ephemeral. They’re assumed to be running on unreliable hardware, the application is expected to run on whatever node it’s scheduled onto, and if that node disappears, the app will be told to run on another node.

Too bad if that new node doesn’t happen to have the same disk storage as the ex-node. Which it won’t.

Kubernetes provides a way around this: persistent volumes. There is nothing wrong with persistent volumes, and with them we’d be able to keep going as we have been. Nice, except:

  1. STRAP doesn’t currently support persistent volumes. This is a blocker, because we wouldn’t be able to manage Strapper using Strapper, and we’d be hacking it into the STRAP platform by directly adding and editing the various resources.

  2. We aren’t making STRAP support persistent volumes at this time.

At this stage, and for the foreseeable future, Strapper will not be so popular that it needs multiple concurrent instances, which would render “local” storage inadequate as the different instances attempt to mangle the state simultaneously. Even if we did have multiple STRAP instances, each would almost certainly host its own Strapper instance.

(I also tend to view decoupling from standard filesystem storage as a necessary aspect of containerization, but that’s not really the case: it’s just easy to swing too far one way once you have momentum.)

What I’ve used in the past for remote state handling in Terraform has been Minio, an object storage service with an S3-compatible API that’s pretty powerful and scalable and can also be easily installed on a single Linux VM. While I’m not a power user managing huge server farms or paradimensional AI matrices or whatever the kids are into these days, I’ve used it a bunch without any issues.

So that’s what I’m going to do.

Install Minio

I use Ubuntu for most development work, and I’m currently most comfortable with 20.04, though with Ubuntu I haven’t found the versions to be as important as with Red Hat and its variants (with the exception of Snaps, which–oh, never mind). For the basic install I followed a Digital Ocean article. (I tend to find Digital Ocean articles reliably excellent, and a definite service to the public.) I also at some point referred to Minio’s own documentation for details or clarification.

Assuming those links aren’t dead, it’s pointless to try and reword those articles here. Basic steps:

  1. Launch a new VM with Terraform. I used a 10GB root disk and a separate 40GB data disk. Among other things, my Terraform includes some basic cloud-init that sets up essential accounts (mine) and updates the OS automatically.

  2. Use LVM to make the data disk a physical volume and create a volume group. Create a logical volume called “data” and mount it at, oh I dunno, /data. Or /minio. It doesn’t matter and I forget. I gave it 10GB to start, but Terraform state is all text files, so if that’s all I put there I’d better be Terraforming Skynet to use that up. (There’s a sketch of the commands after this list.)

  3. Download the MinIO Debian package and install.

  4. Create a user and group for Minio. The default for both the user and the group is “minio-user”. I’ll spare the reader the rant on this, but I named the user and group minio, even though this will add a bit of work later on:

    $ sudo groupadd -r minio
    $ sudo useradd -r -g minio minio
    

    The -r flag in both marks them as a system group and user, and in useradd’s case means the user’s home directory will not automatically be created.

  5. Create or obtain a certificate and key for the service (the Digital Ocean article describes how to create a self-signed certificate) and place them in a directory in the Minio user’s home directory using the target names given below:

    $ sudo mkdir -p /home/minio/certs
    $ sudo mv newcert.pem /home/minio/certs/public.crt
    $ sudo mv newkey.pem /home/minio/certs/private.key
    $ sudo chown -R minio:minio /home/minio
    $ sudo chmod 400 /home/minio/certs/private.key
    
  6. Edit /etc/default/minio to reflect what has been set up (a sketch of the finished file follows this list):

    • Update MINIO_VOLUMES to be the path you mounted (/data in our example)
    • Update MINIO_OPTS with the certificates directory you’re using
    • Give an admin user name and password for logging into the web console.
  7. This file is world-readable by default (!) and now holds the admin credentials, so update the ownership and permissions such that the Minio user can read it but the world cannot:

    $ sudo chgrp minio /etc/default/minio
    $ sudo chmod 640 /etc/default/minio
    
  8. Edit /etc/systemd/system/minio.service to reflect the user and group names I elected to use, which may kick me in the rear when I update Minio–we’ll see.

  9. Set up a firewall on the host or update your ACL or other protections to open up ports 9000 (API access) and 9001 (console access) to necessary users.

  10. Set Minio to start on bootup and then start it up:

    $ sudo systemctl enable minio
    $ sudo systemctl start minio
    
  11. If you use monitoring and backups, set up both now.

  12. At this point I like to reboot the system to ensure everything comes up as expected. Like backups, it’s best to do this now instead of after you find out the setup is inadequate.
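For steps 2 and 6 above, here’s roughly the shape of what I ended up with–treat it as a sketch rather than a recipe, since the device name, volume group name and credentials are placeholders from my own setup rather than anything canonical:

$ # step 2: carve up the data disk; /dev/vdb and the sizes are examples
$ sudo pvcreate /dev/vdb
$ sudo vgcreate minio /dev/vdb
$ sudo lvcreate -n data -L 10G minio
$ sudo mkfs.ext4 /dev/minio/data
$ sudo mkdir /data
$ echo '/dev/minio/data /data ext4 defaults 0 2' | sudo tee -a /etc/fstab
$ sudo mount /data

The environment file then ends up looking something like this (the variable names are the ones shipped with the MinIO package; the values are mine, and obviously pick a better password):

# step 6: /etc/default/minio -- values are examples, not package defaults
MINIO_VOLUMES="/data"
MINIO_OPTS="--certs-dir /home/minio/certs --console-address :9001"
MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=use-a-long-random-string-here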

At this point, visit https://yourhost:9001 and you should be able to log in.
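If you’d rather poke the API port from the command line first, MinIO exposes an unauthenticated liveness endpoint; something like the following should come back with a 200 (the -k is only needed if you went with a self-signed certificate):

$ # /minio/health/live is MinIO's documented liveness check
$ curl -sk -o /dev/null -w '%{http_code}\n' https://yourhost:9000/minio/health/live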

Configure bucket and access for remote state

Using the admin credentials created earlier, log in to the Minio service and explore a little bit. I did because the web interface has come a long way since I last used it and I was pretty impressed. I know the company’s working hard at this because I’m on their mailing list for some reason and I get a monthly update (twice from two different individuals, not sure why) which I will at least skim.

First, create a bucket for storing state. I called mine strap-tf. Enable versioning; this is recommended by Terraform and will allow you to recover previous Terraform state.
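If you prefer the command line to the console for this, MinIO’s mc client can do the same thing. The alias name (strapminio) is arbitrary, the credentials are the admin ones from /etc/default/minio, and older mc releases spell the first command mc config host add instead:

$ # alias name and credentials below are examples
$ mc alias set strapminio https://minio.example.org:9000 minioadmin use-a-long-random-string-here
$ mc mb strapminio/strap-tf
$ mc version enable strapminio/strap-tf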

Next, create a policy. This one works:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::strap-tf"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::strap-tf/*"
            ]
        }
    ]
}

Create a group for access to this bucket if you like–optional. Make sure you assign the group to the policy you created above. Creating a group is useful if you will have more than one account accessing the bucket, such as if you’re part of a team. Or maybe you just like the extra organization. You do you.

Create a user and either assign them to the new group, if you created one, or assign them to the policy.

Select that user account and then “Service Accounts”. Click the button at the right to create an access key. This is what you should use for S3 credentials in the backend instead of your user account.
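The user/group/policy dance is also scriptable with mc, if that’s more your thing–with the caveat that the admin subcommands have been renamed over the years (newer releases use policy create and policy attach rather than policy add and policy set), so check against your version. The names here (strap-tf-rw, strap-user, strap-group) are just examples, and the access key itself I still created in the console as described above:

$ # names are examples; assumes the policy JSON above is saved as strap-tf-policy.json
$ mc admin policy add strapminio strap-tf-rw strap-tf-policy.json
$ mc admin user add strapminio strap-user 'a-long-secret-for-this-user'
$ mc admin group add strapminio strap-group strap-user
$ mc admin policy set strapminio strap-tf-rw group=strap-group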

Configure remote state in Terraform

At this stage, Terraform state is still recorded in local files. We need to migrate the state from each of these files to Minio via Terraform’s S3 backend.

As it is now, the state for each app is recorded in its own file, which is addressed via the -state command-line argument. Once we switch backends, however, this argument is ignored, and the “key” (the equivalent here of a filename) is set in the Terraform backend definition; it cannot be defined using a configuration or environment variable.

(This is unfortunate, but it makes sense: the storage is initialized as part of the terraform init process, as it is vital to maintaining state, and whenever you change the storage configuration you need to re-run the initialization. Given that, an environment variable or command-line argument is inappropriately transient. It would be handy if -state could map to the key in this context, but that would be difficult to make consistent across storage backends.)

We could get around this by using separate backend definitions for each app, such that each app is isolated by endpoint, bucket or key. Because this would be a .tf file, it can’t be optionally included on the command line like a .tfvars file, so we’re looking at either constantly renaming it (:sick_emoji:) or making the rest of this a module–which we should probably do anyway–and keeping each app isolated at the source.

The other way is to use Terraform workspaces. This is how I first implemented it, but now I’m not sure that’s how it should be done: I find workspaces non-intuitive, so far the facility does not behave quite as I expect when migrating state, and not being confident about what’s going on with the state is not a very nice feeling. So: a module it is. Going that route has some advantages:

  • STRAP’s base Terraform code will be reusable and nicely isolated from other components.

  • Strapper can simply reference the source repository when importing the module; we don’t even need to fuss with Git submodules.

  • Easier for others to use STRAP without Strapper, perhaps.

  • Easier to isolate Terraformed STRAP apps from each other, including state and backend, variables, STRAP versions, or even additional configs, though we’re not designing for any use cases needing that.

And as it turns out, we’re already pretty much there anyway. Every Terraform configuration defines at least one module: the root module. That’s already our STRAP module, and we’ve almost been using it that way by separating out the individual app variables.

Modularizing the Terraform code

Okay, so I did that. I created a directory adjacent to the current Terraform directory and moved anything not part of the newly isolated module there. In my case I guess I’d anticipated this, or just been lucky, or been guided by Terraform’s structure, but this was everything I needed to do to make it a module.

The new directory is just for instances, and I’ve created the first app’s subdirectory under that and renamed the app variables and state files to just be terraform.tfvars and terraform.tfstate, respectively. In app.tf I’ve called the STRAP module:

module "strap" {
  source = "../../strap-tf"
}

I realize/remember that the variables defined in the .tfvars file are unknown to this root module, which also doesn’t know I want them passed along as module variables. So one way to handle this is to migrate these variable definitions into the module block:

module "strap" {
  source = "../../strap-tf"

  app_name = "info"
  ...

There are quite a few of them and they include both STRAP platform definitions (previously held in strap.tfvars) and application definitions (myapp.tfvars, etc.). This TF file will need to be generated from configurations stored in Strapper, one way or another.

The other way to address this is to essentially copy the variables file from the STRAP module, refer to the variables as they were used before, and then, in the module call, assign all the module variables from the root module’s variables:

module "strap" {
  source = "../../strap-tf"

  app_name = var.app_name
  ...

I can’t define module.strap.app_name in a variables file, either, in case you were wondering.

I don’t love either solution. As currently implemented with Strapper, variables have been provided using JSON (info.tfvars.json) without having to generate any Terraform code (all the .tf files are used as they are), which seems less clunky than what we’re doing in solution #1. Solution #2 replicates the entire STRAP module variables definition, which is awkward in a different way.

I went with solution #1. It’ll all be generated by the Strapper program anyway, so may as well make it slightly simpler. It’s really meep vs. morp.

Once tested, I pushed the changes–there were none outside of the README, a license file and removal of a convenience Makefile–and changed the source line in the module call to use the address of the Git repository instead of the local filesystem.
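For reference, a Git-flavoured source line looks something like the following–the URL and ref here are placeholders, not the actual STRAP repository–and Terraform fetches the module at init time, so changing the ref means re-running terraform init (or terraform get -update):

module "strap" {
  # placeholder URL and ref; point this at the real strap-tf repository
  source = "git::https://gitlab.example.org/strap/strap-tf.git?ref=main"
}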

As I rediscovered later on, using this code as a module also means the outputs are consumed by the root module and aren’t available to the user unless the root module re-exports them. So I had to add a basic output-mirroring block to the code that we’ll need to generate for each app:

output "postgres" {
  value = module.strap.postgres
}

That’s it. This does make me think I want to make STRAP’s output value structure richer and pack more in it so I don’t have to update this block for every change to the STRAP outputs.

At this stage I’m not convinced using this as a module is the right way to go, but the two approaches feel evenly split on technique and convenience, and best practices give the module the edge.

One final note on this: when Terraform code is migrated to a module, the resource addresses change to reflect the module structure, so if a resource was blarb_whatever_v2.mything it will now be module.strap.blarb_whatever_v2.mything. Terraform has a command for dealing with this such that your existing state isn’t useless–you can issue terraform state mv <old-name> <new-name>. To move everything over, I did the following:

$ terraform state list > resources
$ while read thing; do terraform state mv "$thing" "module.strap.$thing"; done < resources
$ terraform plan  # found no changes!

Should I have been surprised that was relatively painless?

Back to the backend

As it is now, with the app definition invoking the module and the output definition relaying the provisioned specifics, state will still be stored locally. To use remote storage, we define a backend as part of our Terraform configuration.

Minio provides an S3-compatible API, and there are a couple of things we’ll do to make it fit. For more details on the following configuration see the Terraform documentation on backends and the S3 backend in particular.

A STRAP backend definition for an app myapp looks like this:

terraform {
  backend "s3" {
    endpoint                    = "minio.example.org:9000"
    region                      = "east-of-nowhere"
    bucket                      = "strap-tf"
    key                         = "myapp.tfstate"

    # necessary because we're not using AWS S3
    skip_region_validation      = true
    force_path_style            = true
    skip_credentials_validation = true
  }
}

The endpoint is the Minio server and the port where it’s serving the API. The region is arbitrary in our case but I defined it in Minio for no real reason and now it has to match. In the past I haven’t defined it at all because there was no option in Minio.

The bucket and key are the location and name of the stored object, analogous but not identical to the directory and filename in a filesystem. These variables are required and cannot be defined by environment variables. Fortunately, Strapper will be generating this for us. The bucket will not change, but the key will change from app to app to keep their states isolated.

The following three lines are important because we’re not using “real” S3. First, our region name is made up and can’t be validated. Second, we want to use the URL s3://endpoint/bucket rather than s3://bucket.endpoint because the latter requires DNS shenanigans I don’t want to bother with. Finally, we skip credentials validation because we’re not making use of AWS IAM. (Minio may well be able to handle the first and third of these now and I know how to handle the second, but none of these matter particularly here.)

There are two variables I define in the environment for the backend to work:

  • AWS_ACCESS_KEY_ID is the access key ID defined in the Minio server, usually randomly generated. This is like the username, except for some reason it’s recommended to be a base64 nonsense string. (Maybe because this way its purpose, as well as its relationship to any person, is obscured.)
  • AWS_SECRET_KEY is the associated secret key, like a password.

Define these to be the access credentials set up earlier.
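Putting it together: migrating an app’s existing local state into the bucket amounts to exporting the credentials and re-running init with the backend block above in place, at which point Terraform offers to copy the local state into the new backend. The key values here are placeholders for the service account created earlier:

$ # placeholders: use the service account's access key and secret key
$ export AWS_ACCESS_KEY_ID='<access key id>'
$ export AWS_SECRET_KEY='<secret key>'
$ terraform init -migrate-state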

A few random notes about remote state in Terraform

  1. If you do use workspaces with S3, note that as of this writing Terraform’s own documentation gives an incorrect access policy for using S3, possibly because it wasn’t written with workspaces in mind–that’s where it fails. Where Terraform’s documentation mentions ListBucket access to /bucket/, you actually need access to /bucket/* because workspaces are stored as /bucket/workspace/terraform.tfstate. This part of the ACL:
    "Action": [
        "s3:ListBucket"
    ],
    "Resource": [
        "arn:aws:s3:::strap-tf"
    ]
    
    This works for our case, because we’re using different state files in the “root” of the bucket (the “bottom”?) but for workspaces, use arn:aws:s3:::strap-tf/*.
  2. In case you tried to use it, -state is silently ignored when using S3 as a backend, instead of giving an error or warning. This may be a bug in Terraform, but I’m guessing it’s due to the command-line usage validation happening before code and variables are all parsed. It’s still too bad -state doesn’t let you override the key configuration, which basically winds up meaning “filename inside the bucket”.

Moving the Helm chart online

STRAP uses a Helm chart for its application deployments, fed variables by other Terraform resources and Strapper. Currently this Helm chart is also local. (These are things I meant to come back to when I originally developed them–guess I’m coming back to them now.)

This isn’t directly related to remote state, but it follows from the same reasoning that led us to remote state: we don’t want to keep the Helm chart locally when we can break things up into reusable components. We could have baked the Helm chart into an image we use for Terraform deployment, as we could have done with the Terraform module, but that doesn’t seem right. So here we are.

The Helm chart, as well as its version, is already a parameter to the STRAP Terraform, so apparently I was thinking ahead way back when. The tricky part is that, unlike Terraform, Helm does not support Git repositories as chart sources, at least not without plugins. (I did not look for such plugins. One thing about automating stuff is that it’s great until you have to constantly tweak things to keep the automation working, so keeping dependencies down is important.)

GitLab, bless ’em, supports Helm charts in their package registry although at this time they caution it’s not ready for production use. Since STRAP isn’t either, I’m not too concerned.

I shamelessly cribbed a CI job from the internet for packaging and pushing the chart to the registry; it worked pretty much right off the bat, and using it was almost trivial once I’d made the repository public.
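For the curious, that job boils down to a helm package and a curl against GitLab’s Helm registry API, something along these lines–the chart directory, chart filename and channel are placeholders, and I’m assuming CI job token authentication is allowed for the registry:

$ helm package chart/
$ # the tgz filename depends on the chart's name and version
$ curl --fail --request POST \
    --form 'chart=@strap-0.1.0.tgz' \
    --user "gitlab-ci-token:${CI_JOB_TOKEN}" \
    "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/helm/api/stable/charts"

Consumers then add https://gitlab.example.org/api/v4/projects/<project-id>/packages/helm/stable as a chart repository and pull the chart from there.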