Last Week
Last week was about configuration management and release control. The App Store review fixes were mostly done, but the next iOS submission was still blocked by a practical release-management problem: unfinished work inside the mobile binary cannot be cleanly hidden without feature flags. Android is already live on Google Play. iOS is still waiting on the third production submission, and I still expect to get that submission out by the end of next week once the remaining rough edges are cleaned up.
That release-control work still matters, but this week the urgent problem moved one layer lower. The infrastructure that verifies every change started failing for a very ordinary reason: my self-hosted GitHub runner ran out of disk space. That turned into a deeper overhaul of how I run CI jobs locally, and the result is a self-hosted runner setup that is both ephemeral and capable of handling multiple jobs at once, without pulling Kubernetes into a one-person project.
The Runner Became The Bottleneck
This week was about self-hosted, ephemeral GitHub runners.
Shokken’s CI/CD pipeline has become fairly serious. Pull requests run tests. Merges run another set of checks. The project also does build verification and runtime verification, and I want that set to grow over time into more emulator- and simulator-based integration testing. That is the right direction, because the more the app depends on store releases, backend changes, paid access, messaging, and mobile platform behavior, the more I need the pipeline to catch bad changes before they escape.
The tradeoff is cost and capacity. GitHub-hosted runners are convenient, but hosted minutes disappear quickly when every meaningful change runs a large test and build matrix. The free tier gives a limited number of minutes, the lowest paid tier gives more, and macOS minutes are much more expensive than Linux minutes. Even without leaning heavily on macOS runners, my normal development rhythm can burn through the monthly allowance.
So a while ago I moved much of that work onto a self-hosted runner. I had an idle machine available, put Proxmox on it, created a virtual machine, assigned roughly 100 GB of storage, and registered that VM as a GitHub Actions runner. For a while, that worked well enough.
Then the runner filled its disk.
Digging into the failure turned up nothing mysterious. Docker images, containers, temporary layers, and runner state had been accumulating. The files looked temporary in spirit, but they were not temporary unless something explicitly removed them. Once enough of that material built up, the CI jobs started failing.
Manual cleanup got the pipeline moving again, but it exposed the real problem: a persistent self-hosted runner is not a clean foundation. It works only as long as every job, every container, and every workflow leaves the machine in a healthy state. That is a fragile contract.
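For the record, the recovery was nothing more exotic than Docker's own tooling. A minimal sketch of what that kind of manual cleanup looks like, assuming a default actions-runner install layout for the work directory:

```bash
# Where is the space going? Shows usage for images, containers,
# volumes, and build cache.
docker system df

# Reclaim it. -a removes all unused images, not just dangling layers;
# --volumes also deletes unused volumes, so caches get rebuilt on demand.
docker system prune -a --volumes

# The runner's own work directory collects checkouts and tool downloads
# too (path assumes the default actions-runner install layout).
du -sh ~/actions-runner/_work
```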
What does it mean in English?
A GitHub runner is the worker machine that actually runs the checks after I push code.
When I use GitHub’s hosted runners, GitHub provides a fresh worker. That is convenient, but the minutes cost money once usage grows. When I use my own machine, I save those hosted minutes, but now I am responsible for keeping that worker healthy.
My old setup reused the same worker over and over. That meant junk could build up between jobs. Eventually enough Docker leftovers accumulated that the machine ran out of disk space and the pipeline failed.
The better model is disposable workers. A job starts, it gets a fresh environment, it runs the checks, and then that environment is thrown away. The next job starts from the same clean baseline. That makes failures easier to reason about and keeps yesterday’s leftovers from breaking today’s build.
Nerdy Details
Persistent runners carry hidden state
The first version of my self-hosted runner was persistent. That means the environment stayed alive between jobs.
That sounds harmless until you think through what a runner actually does. It checks out code. It pulls Docker images. It creates containers. It builds mobile artifacts. It downloads dependencies. It writes caches. It may mount volumes, produce reports, copy artifacts, and tear pieces down again afterward.
If every workflow is perfectly disciplined, a persistent runner can survive that. Each job does its setup, performs the work, removes anything it created, and leaves the machine ready for the next job.
In practice, that is a lot to trust. Some cleanup can happen from inside the workflow. Some cleanup belongs to Docker itself. Some cleanup touches the host environment. In my setup, the workflow runs inside a containerized boundary, so it should not have unrestricted access to the host just because a job wants to tidy up after itself. That is a good security property, but it means the job cannot always reach the layer where the mess is accumulating.
The result is slow drift. One run leaves behind an image layer. Another leaves behind a volume. Another downloads dependencies. None of those things is a crisis individually. Together, over enough runs, they become a full disk.
That is exactly what happened.
Ephemeral runners make cleanliness the default
An ephemeral runner changes the contract.
Instead of asking each job to perfectly clean a shared machine, each job gets an environment that is meant to disappear. Once the job is done, the runner is torn down. The next job does not inherit whatever the previous job happened to leave behind.
That matters for two reasons.
The first is reproducibility. If every job starts from the same known baseline, failures become easier to debug. A failure is less likely to be caused by an old package, a leftover image, or a stale file from a different branch. The environment can still be wrong, but at least it is wrong consistently.
The second is operational simplicity. I do not need to keep adding more and more workflow-level cleanup steps just to compensate for the runner being long-lived. The runner lifecycle becomes the cleanup mechanism.
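Mechanically, the new contract is one flag at registration time: the stock actions/runner `config.sh` accepts `--ephemeral`, which tells GitHub to hand the runner exactly one job and then deregister it. A sketch, with the URL and token as placeholders:

```bash
# Register for a single job, then deregister. <REG_TOKEN> is a short-lived
# registration token generated from the repository or org settings.
./config.sh \
  --url https://github.com/OWNER/REPO \
  --token <REG_TOKEN> \
  --ephemeral \
  --unattended

# With --ephemeral, run.sh processes one job and then exits instead of
# waiting around for more work.
./run.sh
```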
For Shokken, that is a better fit. The project is already complex enough: Kotlin Multiplatform, Android, iOS, Supabase, store submissions, subscriptions, messaging, and guest-facing web surfaces. I do not want the CI machine to become another fragile subsystem that needs constant gardening.
Autoscaling solves a different problem
Ephemeral and autoscaling are related, but they are not the same thing.
Ephemeral answers the cleanliness question: does a job start from a fresh environment?
Autoscaling answers the capacity question: how many workers can the system provide when several jobs need to run?
My old setup had one runner. That meant only one job could actively run at a time, which is fine when the pipeline is strictly linear but painful when multiple workflows or dependent jobs pile up. The practical failure mode is runner starvation: one job occupies the only worker while it waits on work that can only run on a second worker. Eventually jobs time out, and the failure looks like a pipeline problem even though the underlying issue is capacity.
There are workflow-level ways to reduce that risk. A job can exit instead of blocking in place and let a later event wake it back up. GitHub concurrency groups can serialize related work so two jobs that should not overlap do not fight for the same limited worker, as sketched below. Those tools are useful, and I still think the pipeline should use them where they make sense.
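The concurrency-group half of that is plain workflow YAML. A sketch of the pattern, placed at the top level of a workflow file, with the group key chosen purely for illustration:

```yaml
# One run of this workflow per branch holds the group at a time; a newer
# run waits as pending instead of competing for a runner mid-flight.
concurrency:
  group: ci-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: false
```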
But there is a simpler capacity answer too: have more than one runner available.
Why I did not add Kubernetes
The standard serious answer for ephemeral, autoscaling runners is an orchestrator. Kubernetes or k3s can watch for work, provision workers, route jobs, tear environments down, and scale capacity based on demand.
That is the right answer for some teams.
It is not the right answer for this setup.
Shokken is still a one-person project. The CI/CD pipeline is already heavier than most small indie projects need, because I am trying to protect the app as it moves toward real production use. Adding Kubernetes just to manage a handful of local runners would trade one infrastructure problem for a larger one.
Kubernetes is powerful, but it is not free in attention. It has its own installation model, networking model, resource model, update process, failure modes, and debugging surface. k3s reduces some of that weight, but it does not erase the operational category. I would still be adopting an orchestrator and then owning that decision.
For a large organization, that might be the correct move. For me, the practical question is narrower: can I get clean runner environments and enough parallel capacity without making the infrastructure stack much larger?
The answer was yes.
The setup I landed on
The new setup still starts with the same basic physical foundation: an idle machine running Proxmox.
Inside Proxmox, I provisioned a larger Ubuntu 24.04 LTS virtual machine. That VM has 24 vCPUs and 80 GB of memory. Docker runs inside the VM, and systemd brings the runner setup up when the VM boots.
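The boot wiring is a small systemd template unit, one instance per runner. A sketch under my own naming; the unit name and script path are choices, not anything GitHub or Docker mandates:

```ini
# /etc/systemd/system/gh-runner@.service  (hypothetical name and path)
[Unit]
Description=Ephemeral GitHub Actions runner %i
After=docker.service
Requires=docker.service

[Service]
# start-runner.sh does a `docker run --rm` of the runner image; when the
# container exits after its one job, Restart=always brings up a fresh one.
ExecStart=/usr/local/bin/start-runner.sh %i
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```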
Instead of one persistent runner, the machine now starts four Dockerized GitHub runners. Each runner is sized for the kind of work this project does: eight vCPUs and 16 GB of memory. The memory matters because Kotlin Multiplatform builds, Android tooling, dependency resolution, and test execution are not tiny workloads. The CPU count matters for the same reason. Compilation and test work can take advantage of parallelism, and starving those jobs would just move the bottleneck somewhere else.
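The sizing is enforced with ordinary Docker resource flags in that start script. The image name and env-file path are placeholders for whatever wraps the stock runner; note that `--cpus` is a ceiling, not a reservation, which is why four runners at eight vCPUs each can coexist on a 24 vCPU VM:

```bash
#!/usr/bin/env bash
# start-runner.sh (hypothetical): launch one disposable runner container.
# $1 is the instance number handed in by the systemd template unit.
set -euo pipefail

# --rm deletes the container filesystem when the job is done;
# the image name is a placeholder for an image wrapping actions/runner.
exec docker run --rm \
  --name "gh-runner-$1" \
  --cpus 8 \
  --memory 16g \
  --env-file /etc/gh-runner/env \
  my-runner-image:latest
```

Bringing up the fixed pool is then a single command: `systemctl enable --now gh-runner@{1..4}`.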
The four runners register with GitHub and advertise themselves as available. When workflow jobs arrive, GitHub can assign work to more than one runner at a time. That lets the pipeline process concurrent work instead of forcing everything through a single slot.
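On the workflow side, jobs opt into the pool through runner labels. `self-hosted`, `linux`, and `x64` are labels a Linux runner advertises by default; the build step here is purely illustrative:

```yaml
name: checks
on: [pull_request]

jobs:
  build:
    # Any idle runner in the pool carrying these labels can take the job.
    runs-on: [self-hosted, linux, x64]
    steps:
      - uses: actions/checkout@v4
      # Illustrative only; Shokken's real jobs vary.
      - run: ./gradlew check
```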
This is not true cloud-style autoscaling. It does not scale down to zero, and it does not scale up indefinitely. The ceiling is four active runners because that is what I chose to run on this machine.
For my purposes, that is fine. I am not paying a cloud provider for idle resident memory. The machine is already mine, and the runners do not allocate their full workload footprint until they are actually doing work. When jobs arrive, I can process up to four at once. When no jobs are running, the overhead is acceptable.
That is enough capacity for the shape of my pipeline today. Most of the time, I do not expect more than four meaningful jobs to need local runner capacity at the same time.
Docker gives me the disposable layer
The ephemeral part comes from treating the runner container as disposable.
After a runner finishes a job, the container exits. The next run starts from a clean container state rather than continuing from whatever the previous job left behind. Docker still needs normal host-level maintenance over time, but the job environment itself is no longer a long-lived workspace that keeps collecting project residue.
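That host-level maintenance can itself be scheduled rather than remembered. A sketch of a weekly prune, with the path hypothetical and the retention window a matter of taste:

```bash
#!/usr/bin/env bash
# /etc/cron.weekly/docker-prune (hypothetical path): routine maintenance
# for the long-lived Docker daemon underneath the disposable runners.

# Drop images that have gone unused for more than a week; -a includes
# tagged images, not just dangling layers.
docker image prune -af --filter "until=168h"

# Trim accumulated build cache on the same schedule.
docker builder prune -af --filter "until=168h"
```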
That distinction is important. I am not claiming this is the same as a fully managed elastic runner fleet. It is a pragmatic middle ground:
- The physical machine is mine.
- The VM is long-lived.
- Docker is long-lived.
- The runner job environments are disposable.
- The pool size is fixed at four.
- The operational surface stays much smaller than Kubernetes.
That is the trade I wanted. I get fresh environments and parallel job processing without turning the runner host into a miniature platform team.
The immediate result
I finished the setup yesterday, and the difference was visible immediately.
Jobs that used to queue behind a single runner can now run in parallel. The pipeline feels less congested, and the failure mode that pushed me into this work (disk exhaustion from persistent runner buildup) is much less likely to return in the same form.
There will still be maintenance. Self-hosting always means I own the box. I still need to watch disk usage, keep the base VM updated, keep Docker healthy, and make sure the runner images stay compatible with GitHub’s expectations.
But this is a much better shape than the old setup. The important properties are now built into the architecture instead of depending on every workflow being perfectly polite.
For a one-person project, that matters. I need infrastructure that is strong enough to protect the product, but not so elaborate that maintaining the infrastructure becomes the product.
Next Week
Next week needs to return to the iOS submission path. The runner work should make the development loop faster and less fragile, but the product goal has not changed: clean up the remaining rough edges, prepare the third App Store submission, and keep moving Shokken toward full production availability on both platforms.
I will also keep an eye on the new runner setup as it handles normal development traffic. If the four-runner pool stays stable, it becomes the foundation for the next round of heavier checks, including the emulator and simulator work I still want to add.