Once the virtual machines are provisioned, I use Ansible to configure them. I moved from Puppet to Ansible because it is lightweight, agentless, and widely adopted, which makes it easier to automate and maintain in a smaller homelab setup.

All my Ansible playbooks, roles, and inventory are stored in a monorepo, structured like this:

.
├── ansible.cfg
├── inventory/
│   ├── group_vars/       # Group-wide variables (e.g., all monitoring nodes)
│   ├── host_vars/        # Per-host overrides
│   ├── hosts/            # Inventory files (static or dynamic)
│   └── recovery_vars/    # Used for restoring nodes
├── playbooks/
│   ├── general/
│   ├── recovery/
│   └── site.yaml         # Main entry point
├── roles/
│   ├── common/
│   ├── gitea/
│   ├── prometheus/
│   ├── grafana/
│   ├── vault/
│   └── ...
└── renovate.json         # Keeps dependency versions up to date

Each role is self-contained and reusable. The main playbook (site.yaml) orchestrates which roles run on which nodes by group.
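
To make that concrete, here is a minimal sketch of how site.yaml can map groups to roles. The plays are illustrative, the grouping is an assumption about my inventory, and the real file covers every service:

- name: Base configuration for every node
  hosts: all
  become: true
  roles:
    - role: common
      tags: common

- name: Monitoring nodes
  hosts: prometheus
  become: true
  roles:
    - role: prometheus
      tags: prometheus

Tagging each role lets the CI pipeline deploy a single role with --tags, as shown in the workflow below.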

GitOps Workflow with Gitea Actions

To make Ansible changes automatic and repeatable, I’ve set up Gitea Actions here too. Each role has its own workflow file that gets triggered when either:

  • The role code is updated
  • Its group variables are changed

This ensures that only the relevant role gets deployed, avoiding unnecessary reconfigurations across the fleet. Example: if I change something in the Prometheus role or its vars, only Prometheus hosts are reconfigured.

Here’s a simplified workflow example:

on:
  push:
    paths:
      - 'roles/prometheus/**'
      - 'inventory/group_vars/prometheus.yaml'

jobs:
  Deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Ansible and yamllint
        run: pip3 install ansible yamllint

      - name: Lint
        run: yamllint .

      - name: Set up SSH
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/id_rsa
          chmod 600 ~/.ssh/id_rsa
          # Skip host key verification for ephemeral CI runners
          echo "StrictHostKeyChecking no" >> ~/.ssh/config

      - name: Run playbook
        run: ansible-playbook playbooks/site.yaml --tags 'prometheus' -b -l 'prometheus'
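
The --tags 'prometheus' flag runs only the tasks tagged for that role, and -l 'prometheus' limits execution to the matching inventory group. For reference, a minimal YAML inventory defining that group could look like this (the host name is hypothetical):

all:
  children:
    prometheus:
      hosts:
        prometheus-01.lab.local: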

Manual Role Triggers for Host-Specific Vars

One current limitation: if I change host-specific variables (in host_vars/), no workflow is triggered. For now, I run these manually when needed — which is rare, since most configuration is managed at the group level.
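
When it does come up, a one-off run limited to the affected host is enough (host name hypothetical):

ansible-playbook playbooks/site.yaml -b -l 'prometheus-01.lab.local'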

Eventually, I might improve this by creating a matrix-based workflow or adding logic to detect host var changes and trigger the correct role, but it’s not critical at the moment.
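
For the curious, here is a rough sketch of what that detection could look like, assuming host_vars files are named after their host. The diff pipeline is illustrative rather than battle-tested, and the SSH setup steps from the earlier workflow are omitted for brevity:

on:
  push:
    paths:
      - 'inventory/host_vars/**'

jobs:
  Deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so git diff can see the previous commit

      - name: Detect changed hosts
        id: hosts
        run: |
          # Turn changed host_vars paths into a comma-separated host list
          hosts=$(git diff --name-only ${{ github.event.before }} ${{ github.sha }} \
            | grep '^inventory/host_vars/' \
            | cut -d/ -f3 | sed 's/\.yaml$//' | sort -u | paste -sd, -)
          echo "limit=$hosts" >> "$GITHUB_OUTPUT"

      - name: Run playbook against changed hosts
        run: ansible-playbook playbooks/site.yaml -b -l '${{ steps.hosts.outputs.limit }}'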

Recovery Logic

Despite all the automation in place, it's handy to have playbooks that can recover the nodes with backups, namely Gitea, Vault, and Jenkins. These services all have backup scripts that run every night and ship the results to my NAS. To handle restores, I've created dedicated Ansible recovery playbooks that can bring each of these key services back up independently when needed.

These playbooks live under:

playbooks/
├── general/
├── recovery/      # Recovery-specific flows
│   ├── gitea.yaml
│   ├── jenkins.yaml
│   └── vault.yaml

Each recovery playbook focuses on reapplying just the essential role(s) needed to restore a given service from scratch. These are designed to run even when the GitOps pipeline isn't available, for example if Gitea itself is down and CI is blocked.
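
As a rough sketch, the Gitea one could look like this; the recovery_vars file and the restore script are stand-ins for whatever the backups on the NAS actually require:

- name: Recover Gitea from the latest backup
  hosts: gitea
  become: true
  vars_files:
    - ../../inventory/recovery_vars/gitea.yaml  # hypothetical: NAS path, backup file name
  roles:
    - common  # base system configuration
    - gitea   # reinstall and configure the service
  post_tasks:
    - name: Restore the most recent backup from the NAS  # hypothetical restore step
      ansible.builtin.command: /usr/local/bin/restore-gitea.sh "{{ gitea_backup_file }}"

Running it is a single local command, ansible-playbook playbooks/recovery/gitea.yaml, with no CI involved.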

These playbooks are:

  • Standalone: They don’t rely on the CI pipeline
  • Minimal: Only what’s needed to bring the service back online
  • Idempotent: Can be safely rerun if something fails mid-recovery

They’re especially useful during:

  • Bare-metal reinstallations
  • Hardware migrations
  • Disaster recovery testing