With the main automation services set up, it was time to look at fully automating the process of creating a new node. The main goal was to have a central repository that manages my infrastructure. This repository uses Terraform and Ansible for staging, building, and deployment.

Terraform (OpenTofu)

I decided to use OpenTofu instead of Terraform. It is an open-source fork that uses the exact same commands, with the terraform command substituted by tofu. The first step was to create the basic file structure for the project, where I defined the variables, the provider, and my main file.
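
For reference, the day-to-day workflow is identical to Terraform's; only the binary changes:

tofu init      # initialize the working directory and install the providers
tofu plan      # preview the planned changes
tofu apply     # apply the changes
tofu destroy   # tear everything down again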

Variables

The main goal was to have a dynamic way to install, delete, or modify nodes in my homelab. I covered my basic needs (for now) using a map(object) in which each node is defined with an IP, VM ID, cores, sockets, and the amount of allocated memory. I will expand this in the future if I want it to be even more modular.

variables.tf:

variable "node_info" {
  type = map(object({
    ip      = string
    vmid    = string
    cores   = string
    sockets = string
    memory  = string
  }))
  default = {
    "y"   = { ip = "x", vmid = "201", cores = "4", sockets = "2", memory = "8192"}
    "y"   = { ip = "x", vmid = "202", cores = "2", sockets = "1", memory = "4096"}
    "y"   = { ip = "x", vmid = "203", cores = "2", sockets = "2", memory = "2048"}
  }
}

variable "pm_api_token_id" {
  type    = string
}

variable "pm_api_token_secret" {
  type    = string
}

variable "target_node" {
  type    = string
  default = ""
}

variable "cloudinit_template" {
  type    = string
  default = "alma-9.1"
}

variable "ssh_key_main" {
  type    = string
  default = ""
}

variable "ssh_key_jenkins" {
  type    = string
  default = ""
}

Provider

As mentioned earlier, a community provider is needed to use Terraform with Proxmox. The provider uses the Proxmox API to perform all the required actions.

provider.tf:

provider "proxmox" {
  pm_api_url          = "https://x:8006/api2/json"
  pm_api_token_id     = var.pm_api_token_id
  pm_api_token_secret = var.pm_api_token_secret
  pm_tls_insecure     = true
}

Main

Once the variables and provider were set, the main file could be built. I created a resource with all the necessary configuration and settings. The variables created in the previous step could now be used inside the main file. I made sure that certain changes to the nodes (like network changes) won't trigger a rebuild of the node. That way I can easily make small manual changes without Terraform noticing a change and potentially deleting my data (I still have backups just in case).

main.tf:

terraform {
  required_providers {
    proxmox = {
        source        = "telmate/proxmox"
        version       = "2.9.14"
    }
  }
}

resource "proxmox_vm_qemu" "homelab_node" {
  # General VM settings
  for_each            = var.node_info

  name                = each.key
  target_node         = var.target_node
  vmid                = each.value.vmid

  clone               = var.cloudinit_template
  full_clone          = true

  agent               = 1
  os_type             = "cloud-init"
  cores               = each.value.cores
  sockets             = each.value.sockets
  cpu                 = "host"
  memory              = each.value.memory
  scsihw              = "virtio-scsi-pci"
  bootdisk            = "scsi0"

  # Cloud Init Settings
  ipconfig0           = "ip=${each.value.ip}/23,gw=x"
  nameserver          = "x"
  ciuser              = "rein"

  sshkeys = <<-EOF
  ${var.ssh_key_main}
  ${var.ssh_key_jenkins}
  EOF

  disk {
    slot     = 0
    ssd      = 1
    size     = "20G"
    type     = "scsi"
    storage  = "local-lvm"
    iothread = 0
  }

  network {
    model  = "virtio"
    bridge = "vmbr0"
  }

  lifecycle {
    ignore_changes = [
      network,
      qemu_os,
      bootdisk,
      hostpci
    ]
  }
}
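
The Jenkins pipeline later in this post reads a node_info output (tofu output -json node_info) that maps each node name to its IP address. That file isn't shown here; a minimal outputs.tf that matches the way the pipeline consumes the output could look like this:

outputs.tf:

output "node_info" {
  value = { for name, node in var.node_info : name => node.ip }
}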

Building a pipeline

With the Terraform code in place and ready to be used, I could start building a Jenkins pipeline that builds the infrastructure and runs the initial configuration using Ansible. This was done with a lot of trial and error because I'm still quite new to Jenkins. The pipeline is in no way perfect or final, but it does the job for now.
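
The snippets below are the individual stages; the surrounding declarative pipeline, including the environment variables the stages reference, is left out. A rough skeleton (with placeholder values instead of my real repository and puppetserver) looks like this:

pipeline {
    agent any

    environment {
        // Placeholders only; the real values point at my Gitea repo and puppetserver
        REPO_URL    = 'ssh://git@gitea.example.com/homelab/infrastructure.git'
        SERVER_HOST = 'puppet.example.com'
        SERVER_USER = 'root'
        SERVER_PORT = '22'
    }

    stages {
        stage('Stages') {
            steps {
                echo 'Checkout, Terraform init, Apply / Destroy and Initial setup go here'
            }
        }
    }
}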

Stage 1

The first stage simply checks out the repository from my Gitea instance and loads all the info:

stage('Checkout') {
    steps {
        script {
            git branch: 'main', credentialsId: 'jenkins_ssh', url: "${env.REPO_URL}"
        }
    }
}

Stage 2

Next up is the initialization of tofu (Terraform). This initializes the working directory and installs the required provider.

stage('Terraform init') {
    steps{
        script {
            sh "cd terraform; tofu init"
        }
    }
}

Stage 3

I wanted a way to manually decide whether to Apply or Destroy the pushed code. This way I could use one pipeline for both setting up and tearing down my setup. The pipeline will first ask for user input:

  1. If Apply is chosen, the pipeline simply runs tofu apply -auto-approve and updates the current setup to match the code-defined one.

  2. If Destroy is chosen, the pipeline runs tofu destroy -auto-approve and removes the destroyed nodes' certificates from the puppetserver (more on the puppetserver in the next stage).

stage('Apply / Destroy') {
    steps {
        script {
            def userInput = input(
                id: 'userInput',
                message: 'Select an option:',
                parameters: [
                    choice(name: 'Select an option:', choices: ['Apply', 'Destroy'], description: 'Choose one option')
                ]
            )
            
            echo "User selected: ${userInput}"

            withCredentials([
                string(credentialsId: 'pm_api_token_id', variable: 'TF_VAR_pm_api_token_id'),
                string(credentialsId: 'pm_api_token_secret', variable: 'TF_VAR_pm_api_token_secret')
            ]) {
                if (userInput == 'Apply') {
                    sh "cd terraform; tofu apply -auto-approve"
                } else if (userInput == 'Destroy') {
                    def tf_command = "cd terraform; tofu destroy -auto-approve"
                    // Retrieve Terraform output variables
                    def node_info_json = sh(script: 'cd terraform; tofu output -json node_info', returnStdout: true).trim()
                    // Parse JSON and convert LazyMap to HashMap
                    def nodeInfoMap = new HashMap<String, String>()
                    new groovy.json.JsonSlurper().parseText(node_info_json).each { key, value ->
                        nodeInfoMap.put(key, value)
                    }
                    // Remove certs from puppetserver
                    for (Map.Entry<String, String> entry : nodeInfoMap.entrySet()) {
                        def nodeName = entry.key
                        def nodeIp = entry.value
                        sshagent(['jenkins_ssh']) {
                            sh "ssh -o StrictHostKeyChecking=no -p ${env.SERVER_PORT} ${env.SERVER_USER}@${env.SERVER_HOST} 'puppetserver ca clean --certname ${nodeName}.imrein.com 2>/dev/null || true'"
                        }
                    }
                    def exitCode = sh(script: tf_command, returnStatus: true)
                    env.tf_destroy_code = exitCode.toString()
                }
            }
        }
    }
}

Stage 4

After the nodes have been set up, I wanted a way to also do some initial configuration: install a puppet-agent pointing to the right puppetserver by default and register its certificate. I used Ansible for this and included the playbook inside the same repository. If Destroy was chosen in the previous stage, this will of course not run.

stage('Initial setup') {
    steps {
        script {
            if (env.tf_destroy_code != '0') {
                // Retrieve Terraform output variables
                def node_info_json = sh(script: 'cd terraform; tofu output -json node_info', returnStdout: true).trim()
                echo "${node_info_json}"
                // Parse JSON and convert LazyMap to HashMap
                def nodeInfoMap = new HashMap<String, String>()
                new groovy.json.JsonSlurper().parseText(node_info_json).each { key, value ->
                    nodeInfoMap.put(key, value)
                }
                // Build hosts file for ansible
                for (Map.Entry<String, String> entry : nodeInfoMap.entrySet()) {
                    def nodeName = entry.key
                    def nodeIp = entry.value
                    sh "cd ansible; echo '${nodeName} ansible_host=${nodeIp} ansible_user=rein' >> hosts.ini"
                }
                // Run ansible playbook
                sh "cd ansible; ansible-playbook main.yml -i hosts.ini --become"
            } else {
                echo "Destroy done."
            }
        }
    }
}

The Ansible playbook:

### Playbook
- name: Install puppet-agent
  hosts: all
  vars_files:
    - defaults/main.yml
  
  vars:
    ansible_ssh_private_key_file: /var/lib/jenkins/.ssh/id_rsa

  tasks:
    - name: Set fqdn
      hostname:
        name: "{{ inventory_hostname }}.imrein.com"

    - include_tasks: tasks/setup-RedHat.yml
      when: ansible_os_family == 'RedHat'

    - include_tasks: tasks/setup-Debian.yml
      when: ansible_os_family == 'Debian'

    - name: Install puppet-agent by default
      package:
        name: "{{ puppet_agent_package }}"
        state: present

    - name: Set puppet server
      blockinfile:
        path: /etc/puppetlabs/puppet/puppet.conf
        block: |
          [agent]
          environment=production
          server={{ fqdn_puppetserver }}
          ca_server={{ fqdn_puppetserver }}
        state: present

    - name: Set puppet PATH
      lineinfile:
        path: /etc/sudoers
        regexp: '^Defaults'
        line: 'Defaults	secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/puppetlabs/bin"'
        state: present

    - name: Puppetrun
      puppet:
      ignore_errors: true
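
The playbook reads puppet_agent_package and fqdn_puppetserver from defaults/main.yml (and the OS-specific task files), which aren't shown here. A hypothetical defaults/main.yml with example values could be as small as:

defaults/main.yml:

# Example values only; the package name and puppetserver FQDN depend on your environment
puppet_agent_package: puppet-agent
fqdn_puppetserver: puppet.example.com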

Puppet profiles

Now that the nodes are set up automatically and come with a puppet-agent by default, I could start writing my Puppet profiles for the services and Docker stacks I'm running. This was also pretty straightforward and is still a big WIP, but the main things are ready. In my hieradata I can define which node will run certain services or containers (a rough sketch follows the site.pp below).

site.pp:

node 'a.imrein.com' {
  include profile_base
}

node 'b.imrein.com' {
  include profile_base
  include profile_docker
}

node 'c.imrein.com' {
  include profile_base
  include profile_gitea
  include profile_jenkins
}

node 'd.imrein.com' {
  include profile_base
  include profile_docker
  include profile_prometheus
  include profile_grafana
}

node 'e.imrein.com' {
  include profile_base
  include profile_docker
}
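
As mentioned above, hieradata decides what those profiles actually deploy on each node. A rough, hypothetical per-node example (the class parameters are made up for illustration) could look like this:

hieradata/nodes/d.imrein.com.yaml:

---
# Hypothetical parameters for the profiles included on this node
profile_docker::stacks:
  - monitoring
profile_grafana::admin_user: rein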

My homelab is now mostly automated and requires little to no manual intervention. The only things I still need to do are define new nodes in the Terraform variables and include the right Puppet profiles for those nodes. Et voilà. 🎉