LocalDev on a budget: Ephemeral Build Resources

In my last post, I described an interesting way to combine earthly with the golang tests module for an isolated and repeatable build environment. That setup works great for local development and even works well for running automated jobs from a scheduler such as cron. But earthly requires a buildkit daemon, and those of us on a budget may not always have access to one. Or we may be able to run buildkit locally, but our laptop just doesn't have the performance we need for whatever cool build steps we've got. To cover that scenario, I wrote a few small scripts that provision a temporary cloud virtual machine for use during the build process. I also wrote a script that detects when the virtual machine is idle and destroys it. My destroy script runs in cron every 15 minutes, so I'm only paying for the virtual machine while I'm actually using it. In this post, I'll walk through a broad overview of the steps I took to make this all work.

For my purposes, I've got this all running as an extension of my account over at sdf.org. Sdf has been around since long before THE CLOUD was even a thing. Gaining access to sdf means having a user account on a shared unix host. I don't get root access and I'm generally restricted to my home directory and whatever software the admins allow (in my case, I get access to a personal crontab and a personal systemd). Sdf isn't the only service like this; there is a small community of tilde servers who usually offer free unix shell access as well as various online services. It's not possible for me to install buildkit onto my sdf space, mainly because I don't have root and cannot set the cgroups options needed to run buildkit in rootless (user) mode. I am also restricted by the resources my unix user is allocated. If I want greater control over cpu, memory, and disk i/o, then I'm going to have to move my build to a different machine. The same idea applies if you run a tiny virtual machine in a cloud for something like $5/month: you will get root access, but you will likely be restricted to limited cpu, memory, and disk i/o. As they say... you get what you pay for. I pay $36/year (USD) for my sdf account.

When it came to deciding which cloud provider would host my virtual machine, I wanted to skip the large tech behemoths and use one of the mid-tier services such as Digital Ocean, Linode, or Hetzner. I picked Vultr, mainly because they have data centers here in Australia, and Australia is notorious for having shit internet speeds. For your case, it doesn't much matter which vendor you choose, as long as it has a decent api and supports provisioning full operating system linux virtual machines (so a firecracker or gvisor vm probably won't work). And keep in mind that if you plan to manage the ephemeral virtual machine from a host where you do have root access, you can simplify the configuration below greatly by running a wireguard network on both your root host and the ephemeral virtual machine. I think the easiest way to get that done is to use Tailscale, and the last time I checked, Tailscale had a free tier for hobbyists.

Secrets Management

In my opinion, the number one thing you have to figure out before working on any project is how to store, retrieve, and keep your secrets secure. Here I mean things like passwords, api keys, and such. In a perfect world, you are not writing those values directly into your scripts. There are a number of options available for secrets management, with varying levels of complexity to configure and maintain. But because my setup will only be used by me, I chose to keep things extremely simple and use pass. With pass, I basically need to configure a gpg key on my host so I can decrypt the values stored in pass. And retrieving a pass value in a bash script is as easy as including the pass ls command. When a user runs the provision script manually, they are prompted to unlock their gpg key. But when a user wants to automate running the provision script (for example, as a cron job), pass needs another way to unlock the gpg key. Luckily pass includes the environment variable PASSWORD_STORE_GPG_OPTS, which allows us to customize the gpg command used by pass. Unfortunately, gpg doesn't have an incredibly cool way to retrieve the user's passphrase other than adding it directly to the PASSWORD_STORE_GPG_OPTS variable or reading it from a file on the host. For my purposes, I chose to add the passphrase to a file like so:

export PASSWORD_STORE_GPG_OPTS="--batch --pinentry-mode=loopback --passphrase-file=< path to file containing gpg passphrase >"

I have the above line at the top of all my scripts. And to retrieve a value, I include another line such as export VULTR_API_KEY=$(pass ls vultr/token). I admit it's not ideal to copy my gpg passphrase to a file on my host. I spent quite a bit of time pondering other options. The best I could come up with was to use a gpg-agent and have a small application running on a different cloud vendor (such as Fly) that injects the passphrase into the gpg-agent on a regular cadence. The dilemma I am describing here is what I like to think of as the great secret zero conundrum (the more complicated secret stores handle this problem pretty well). For now, the setup I have is good enough for my purposes and I am well aware of the drawbacks of this design.
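To make that concrete, here's a minimal sketch of how the top of one of my unattended scripts pulls a secret out of pass (the passphrase file location and the pass entry name are just illustrative):

#!/usr/bin/env bash
set -euo pipefail

# let gpg read the passphrase from a file so cron can run this unattended
export PASSWORD_STORE_GPG_OPTS="--batch --pinentry-mode=loopback --passphrase-file=${HOME}/.gpg-passphrase"

# decrypt and export whatever secrets this script needs
export VULTR_API_KEY=$(pass ls vultr/token)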

Service Discovery

Since we are now dealing with more than one host, it's important to have some sort of plan for how they will know about each other. It doesn't have to be elegant or fancy, and it may be dictated by the quality of the resources available to you. In my case, while I was building this project, the DNS server on sdf suddenly stopped querying other DNS servers for new entries. And because I get what I pay for, there's no customer support SLA, just a small group of sysadmins who appear from time to time. I simply had to wait for them to fix it. So my strategy is super basic: I grab the public IP address of the newly provisioned host and store it, along with the Vultr ID of the host, in two files on my sdf space. I still update a record on mel.sh with the newly provisioned host's IP address, but my scripts don't actually use this record. I think it would be fairly easy to take this concept and expand it by running a small Consul or Redis server and storing the host information in its key value store.
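As a rough sketch, that file-based approach is nothing more than a couple of lines like these (the file paths and the INSTANCE_ID variable are illustrative):

# after provisioning, record where the new host lives
echo "${INSTANCE_ID}" > ${HOME}/etc/dockerd/instance-id
echo "${IP}" > ${HOME}/etc/dockerd/instance-ip

# later, any other script can rediscover the host by reading the files back
IP=$(cat ${HOME}/etc/dockerd/instance-ip)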

Provisioning, Let's Use Docker

Recent versions of Docker already run buildkit. There are other options available, such as Podman or running vanilla buildkit with containerd. I chose to stick with Docker because it currently has the most options available and is very widely used. People are generally already familiar with it, and since my setup will be used primarily as a development environment, I want to ensure I can run as many things as possible. Now, since I don't have root access on my sdf host, I cannot install the Docker client via a package manager like apt. I instead extract the client binary from the latest Docker tarball and place the cli somewhere in my PATH. The same idea applies if I also want to install plugins such as compose or scan.
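If you've never done that extraction before, it goes roughly like this (the version number and install directory are just examples; pick whatever is current and a directory already on your PATH):

# grab the static bundle and pull out only the client binary
curl -fsSLO https://download.docker.com/linux/static/stable/x86_64/docker-24.0.7.tgz
tar -xzf docker-24.0.7.tgz docker/docker
mv docker/docker ${HOME}/bin/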

To set up secure access to the remote Docker daemon, I take advantage of contexts. I create a new context where connectivity occurs over ssh like so:

docker context create --docker host=ssh://root@${IP} --description="remote docker daemon" remote
docker context use remote

I'm choosing root as the user here because a newly provisioned Vultr Debian host only includes the root user. That works fine for me at this time; at some point I expect to switch all of this to run as a non-privileged user. And ${IP} is the variable in my script for the Vultr host's IP address. With ssh, the first time I access this host, I will be prompted to accept the host's ssh key. I avoid this step by running ssh-keyscan ${IP} >> ${HOME}/.ssh/known_hosts. I also want to ensure I can run this provisioning script unattended, so I have a special ssh key configured that does not have a passphrase. For my script to use it, I inject a new entry into my ssh config:

cat << EOF > ${HOME}/.ssh/.remote
Host ${IP}
  User root
  IdentityFile "${HOME}/.ssh/< ssh key with no passphrase >"
EOF

And I have the line Include .remote in my ${HOME}/.ssh/config file. The above setup allows my script to successfully use ssh when triggered from a scheduler like cron. The Vultr Debian host has ssh enabled and open to the world by default.

At the time of this writing, my provisioning script is located here. Keep in mind that this script is a template that gets rendered on the host via chezmoi.
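For a rough idea of what the script does, the heart of it is one call to the Vultr api to create the instance and another to find out which IP address it received. A minimal sketch against v2 of the api, where REGION, PLAN, and OS_ID are placeholders and jq handles the JSON parsing:

# create the instance; valid OS_ID values can be listed via the /v2/os endpoint
INSTANCE_ID=$(curl -fsSL -X POST "https://api.vultr.com/v2/instances" \
  -H "Authorization: Bearer ${VULTR_API_KEY}" \
  -H "Content-Type: application/json" \
  --data "{\"region\":\"${REGION}\",\"plan\":\"${PLAN}\",\"os_id\":${OS_ID},\"label\":\"ephemeral-dockerd\"}" \
  | jq -r '.instance.id')

# the public IP isn't assigned instantly, so poll this until main_ip is populated
IP=$(curl -fsSL "https://api.vultr.com/v2/instances/${INSTANCE_ID}" \
  -H "Authorization: Bearer ${VULTR_API_KEY}" | jq -r '.instance.main_ip')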

User Data

After the Vultr host boots, I run some customizations on that host to prepare it to become a Docker daemon. Mainly, I install Docker, tighten up ssh access via fail2ban, and add a Docker pull-through registry cache and a Prometheus instance. I also copy over an ssl wildcard key for the mel.sh domain which, at the time of this writing, I only use to force https when querying Prometheus on this Vultr host with curl.
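The skeleton of those customizations isn't much more than a couple of package installs. One way to do it, sketched here with Docker's convenience install script:

# tighten up ssh, then install the Docker engine
apt-get update && apt-get install -y fail2ban
curl -fsSL https://get.docker.com | sh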

Now why am I installing Prometheus? Because Docker has experimental support for exporting metrics, and I'm using those metrics to determine whether Docker has been idle for the past 15 minutes. This is the main method I use to decide if a host should be destroyed. I check in 15 minute intervals because Vultr bills per hour, not per second. So once I provision a host, I've already paid for it for that next hour, which gives me some breathing room to let it sit idle without worrying that I'm breaking my budget.
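Enabling those daemon metrics (and pointing the daemon at the local pull-through cache) comes down to a couple of keys in /etc/docker/daemon.json, shown here as a heredoc for the sake of the sketch. The address it binds to is the dummy IP I describe a little further down, and the ports are the usual defaults:

cat << EOF > /etc/docker/daemon.json
{
  "experimental": true,
  "metrics-addr": "169.254.32.1:9323",
  "registry-mirrors": ["http://169.254.32.1:5000"]
}
EOF
systemctl restart docker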

At the time of this writing, my user data script is located here. Keep in mind that this script is a template that gets rendered on the host via chezmoi.

If you are wondering where I get the environment variable for ${CLIENTIP}, I create it in the provisioning script and send the variable to the Vultr host when the provisioning script executes the user data script:

ssh root@${IP} CLIENTIP=$(curl -fsSL -4 icanhazip.com) ./user-data.sh

So CLIENTIP is the public IP address of the sdf host.

And I like to take advantage of a dummy IP address, 169.254.32.1, which works great when a container on a docker network needs to reach another resource on the Vultr host (or vice versa). In this case, the Docker daemon uses this IP to reach the pull-through cache registry, and Prometheus uses it to find the Docker daemon so it can scrape metrics.
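Setting that up amounts to a dummy interface on the Vultr host plus a scrape job in the Prometheus config. Roughly (the interface name is arbitrary, /etc/prometheus/prometheus.yml is just the conventional location, and I'm leaving out the tls bits):

# a link-local address on a dummy interface, reachable from the host and from containers
ip link add dockerhost type dummy
ip addr add 169.254.32.1/32 dev dockerhost
ip link set dockerhost up

# and a minimal Prometheus config pointed at the daemon's metrics endpoint
cat << EOF > /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: docker
    static_configs:
      - targets: ["169.254.32.1:9323"]
EOF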

Destroying the Host

Besides running the host destroy command against the Vultr api, my destroy script basically backtracks the additions made by the provisioning script. So I remove the docker context and reset myself back to the default (the local unix socket):

docker context use default
docker context rm -f remote

I also remove the ssh-specific files and entries for the Vultr host's IP address:

truncate -s 0 ${HOME}/.ssh/.remote
sed -i "/${IP}/d" ${HOME}/.ssh/known_hosts

I also delete my host-specific “service discovery” files containing the Vultr host ID and IP address, and remove the host's DNS entry from the mel.sh domain.
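And the teardown on the Vultr side is a single api call. A sketch against v2 of the api, using the instance id that was saved at provision time:

# destroy the instance using the id recorded during provisioning
curl -fsSL -X DELETE "https://api.vultr.com/v2/instances/${INSTANCE_ID}" \
  -H "Authorization: Bearer ${VULTR_API_KEY}"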

But before I do any of the above, I need to feel confident that the Vultr host is in fact doing absolutely nothing. I curl the Prometheus instance to get metrics for cpu, memory, and network activity from the Docker daemon. I use the Prometheus increase function which, from my understanding, returns a positive number when a counter has grown over the queried time period. For the window queried (in our case, the past 15 minutes), activity effectively starts from a baseline of zero, and the query result tells me whether that baseline moved upwards at all during the timespan. So to inspect cpu activity, the curl command is:

curl --insecure -fsSL -g "https://${IP}:9090/api/v1/query?query=increase(process_cpu_seconds_total[15m])"

The insecure flag is there because my ssl cert has a san for the mel.sh domain and not the host's IP, and the -g flag stops curl from treating the square brackets in the query as a URL glob. I'm not using this cert to verify the host's identity; I'm only using it to ensure my traffic to the Prometheus host is encrypted.

So if the result of that curl command is a value greater than 0, the cpu did something in the last 15 minutes. This query should only cover cpu metrics of the Docker daemon, not the Vultr host as a whole. I've been running this destroy script from cron for about two weeks now and things have been working as I expect. I do think I will probably need to tweak these values a bit as I add more functionality to this development environment, but for now things appear to be working well.
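Turning that response into a keep-or-destroy decision only takes a few lines. A sketch with jq, assuming the usual single-vector shape Prometheus returns for an instant query:

# query the cpu counter's increase over the last 15 minutes; fall back to 0 if there is no result yet
CPU=$(curl --insecure -fsSL -g \
  "https://${IP}:9090/api/v1/query?query=increase(process_cpu_seconds_total[15m])" \
  | jq -r '.data.result[0].value[1] // "0"')

# anything above zero means the daemon did work recently, so leave the host alone
if awk -v v="${CPU}" 'BEGIN { exit (v + 0 > 0) ? 0 : 1 }'; then
  echo "docker daemon was busy in the last 15 minutes, skipping the destroy"
  exit 0
fi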

At the time of this writing, my destroy script is located here. Keep in mind that this script is a template that gets rendered on the host via chezmoi.

At the time of this writing, the configuration files for the pull-through cache registry are here and the configuration files for Prometheus are here.

And finally, to really ensure I'm staying on budget and not letting unused resources dangle out in THE CLOUD, I have a small cron job that calls my destroy script every 15 minutes. Since this cron job is unattended, I want to make sure I receive any errors in a timely manner and via a communication platform I actually check regularly. For me, that platform is Matrix. I have a private channel I subscribe to and the Element client on my laptop and phone (plus gomuks on my sdf host). The little go cli that could, trix, is very useful here for sending these errors, and I rely on the exit code from my destroy script to determine whether my logs should be sent to Matrix or deleted. Simply put, I call the destroy script with CD=$(${HOME}/etc/dockerd/ddest > ${HOME}/tmp/dkill.log 2>&1; echo $?) and check the value of the ${CD} variable for the exit code. If the code is not zero, something went wrong, and the trix cli sends the logs to my Matrix channel, where Element pings me with a cute sound.
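Put together, the cron wrapper is only a handful of lines. A sketch (the trix invocation here is a stand-in; check trix's own help for the exact flags it expects):

#!/usr/bin/env bash

# run the destroy script, keep its output, and capture the exit code
CD=$(${HOME}/etc/dockerd/ddest > ${HOME}/tmp/dkill.log 2>&1; echo $?)

if [ "${CD}" -ne 0 ]; then
  # something failed: push the log to my Matrix channel (flags here are illustrative)
  trix -m "$(cat ${HOME}/tmp/dkill.log)"
else
  rm -f ${HOME}/tmp/dkill.log
fi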

At the time of this writing, my destroy cron script is located here. Keep in mind that this script is a template that gets rendered on the host via chezmoi.

What's Next

Right now, I'm using this setup to update a few custom docker images on a weekly basis via a cron job on the sdf host. Moving forward, I'm working on an automated k3d setup that will eventually let me run the Temporal workflow manager and open up the possibility of installing other kubernetes native apps in my development environment. I would eventually like to get more proactive in the not-at-all lucrative field of data journalism, so I'm taking the time now to build a cheap-as-possible environment for my experiments. And cheap is correct: during the time it took me to build and test this setup, I spent about 60 cents (USD) on the Vultr platform.

#localDev #virtual #budget #pass #gpg #vultr #buildkit #docker #ssh #cron #metrics #prometheus #earthly #chezmoi #infrastructure #ephemeral #development #dev #webdev #tech #matrix #trix #cli


RSS feed: mel.sh/feed

This blog is on the fediverse: @mel@mel.sh


I'm also on mastodon: @mel@social.sdf.org (but I admit I'm not very active)

My projects: codeberg.org/meh

keyoxide proofs: https://keyoxide.org/6EA5985B857ED15E1630424AC3E29D39C06F2B70

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.