
K3s cluster

CRDs

| Name | Description | Operator | Prometheus integration |
|---|---|---|---|
| Traefik | Kubernetes Ingress Controller | No | Configured |
| Prometheus | Metrics scraping | Yes | Configured |
| ArgoCD | Declarative GitOps CD | No | Configured |
| Longhorn | Distributed block storage | No | Not configured |
| MetalLB | Bare-metal load balancer | No | Not configured |
| CloudNativePG | PostgreSQL operator | Yes | Not configured |
| SOPS | Secret management | Yes | Not configured |

Services

| Name | Usage | Accessibility | Host | DB type | Additional data | Backup configuration | Loki integration | Prometheus integration | Secret management | Status | Standalone migration |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Traefik | Reverse proxy and load balancer | Public & Private | [All] | - | - | - | Configured | Configured | - | Completed⁵ | Backbone |
| ArgoCD | Declarative GitOps CD | Private | [Workers] | - | - | - | Configured | Configured | - | Completed | Backbone |
| Vaultwarden | Password manager | Public | [Workers] | PostgreSQL | - | - | Configured | Not available | Configured | Completed | Completed |
| Gitea | Version control system | Public | [Workers] | PostgreSQL | User-created content | Configured⁹ | Configured | Configured | Configured | Completed⁴ | Completed |
| Synapse | Matrix server (message centralizer) | Public | [Workers] | PostgreSQL | User files | Configured⁹ | Configured | Configured | Configured | Completed | Completed |
| Grafana | Graph visualizer | Private | [Workers] | - | - | - | Configured | Configured | Configured | Completed | Completed⁸ |
| Prometheus | Metrics aggregator | Private | [Workers] | - | - | Configured⁹ | Configured | Configured | - | Completed | Completed⁸ |
| Loki | Log aggregator | Private | [Workers] | - | - | Configured⁹ | Configured | Configured | - | Completed | Completed⁸ |
| AdGuard | DNS ad blocker and custom DNS server | Private | [Egress] | - | - | - | Configured | Configured | Configured | Completed | Completed |
| Home Assistant | Home automation and monitoring | Private | [Workers] | PostgreSQL | Additional data | Configured⁹ | Configured | Configured | Configured | Completed | Completed |
| ownCloud Infinite Scale | File hosting web UI | Public | [Workers] | ? | Drive files | Not configured | Configured | Not configured | Configured | Pending configuration | Awaiting |
| therbron.com | Personal website | Public | [Workers] | - | - | - | Not configured | Not configured | - | Awaiting configuration | Awaiting |
| Radarr | Movie collection manager | Private | [Workers] | PostgreSQL | - | - | Configured | Not configured | Not configured | Partial | Awaiting |
| FlareSolverr | Cloudflare proxy | Private | [Workers] | - | - | - | - | - | - | Completed | Awaiting |
| Sonarr | TV show collection manager | Private | [Workers] | SQLite | - | Not configured | Configured | Not configured | Not configured | Partial | Awaiting |
| Prowlarr | Torrent indexer | Private | [Workers] | PostgreSQL | - | Not configured | Configured | Not available | Not configured | Partial | Awaiting |
| Jellyfin | Media streaming | Public | Archimedes | SQLite** | - | - | Configured | Not configured | Configured⁶ | Completed | Awaiting |
| Jellyseerr | Media requesting web UI | Public | [Workers] | - | - | - | Not configured | Not available | Configured⁷ | Awaiting configuration | Awaiting |
| Minecraft | Vanilla Minecraft server for friends | Public | Archimedes | - | Game map | Not configured | Not configured | Not configured | - | Awaiting configuration | Awaiting |
| Satisfactory | Satisfactory server for friends | Public | Archimedes | - | Game map | Not configured | Not configured | Not configured | - | Not needed for v1 | Awaiting |
| Space Engineers | Space Engineers server for friends | Public | Archimedes | - | Game map | Not configured | Not configured | Not configured | - | Not needed for v1 | Awaiting |
| Raspsnir | Bachelor memorial website | Public | [Workers] | PostgreSQL | - | Not configured | Not configured | Not configured | - | Not needed for v1 | Awaiting |
| Vikunja | To-do and Kanban boards | Public | [Workers] | - | - | - | Not configured | Not configured | - | Migrate to Gitea | Awaiting |
| Wiki | Documentation manager | Public | [Workers] | - | - | - | Not configured | Not configured | - | Migrate to VuePress and Gitea | Awaiting |
| PaperlessNG | PDF viewer and organiser | Public | [Workers] | PostgreSQL | - | - | Not configured | Not configured | - | Research migration into OCIS | Awaiting |

* Configuration panel only available internally
** The current implementation only supports SQLite, making manual backups a necessity
⁴ Configuration completed, awaiting data migration from GitLab
⁵ Missing dashboard configuration
⁶ Done through a volume backup, as it is not possible otherwise
⁷ Done, but needs to be reimplemented with Kustomize to separate the secret from the ConfigMap
⁸ Done, but included in the grouped Monitoring project
⁹ Handled by Longhorn

Backup management

Databases

// To complete

Additional data

All additional data that needs to be backed up is mounted on a Longhorn volume, so that it also benefits from Longhorn's scheduled backups.

Example:

longhorn
└───backups
    ├───vaultwarden
    │   ├───<backup_date>.sql
    │   └───...
    └───gitlab
        ├───<backup_date>.sql
        └───...
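As a sketch of how a database dump could land in this layout (related to the TODO about automated DB backups to a Longhorn volume), the hypothetical function below renders a Kubernetes CronJob that runs pg_dump into a Longhorn-backed PVC mounted at /backups. The PVC name longhorn-backups, the postgres:16 image, the 03:00 schedule and the credentials secret (assumed to expose the libpq variables PGUSER, PGPASSWORD and PGDATABASE) are all assumptions, not part of the current setup.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: render a CronJob that dumps one PostgreSQL database into
# /backups/<service>/<date>.sql on a Longhorn-backed PVC.
# Assumptions: the PVC "longhorn-backups" exists in the target namespace, and the
# referenced secret provides PGUSER, PGPASSWORD and PGDATABASE (libpq env vars).
render_db_backup_cronjob() {
  local service="$1" namespace="$2" db_host="$3" secret="$4"
  # \$(date ...) below stays literal here and is evaluated inside the container
  cat <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ${service}-db-backup
  namespace: ${namespace}
spec:
  schedule: "0 3 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              # pg_dump's major version must be >= the server's
              image: postgres:16
              envFrom:
                - secretRef:
                    name: ${secret}
              env:
                - name: PGHOST
                  value: ${db_host}
              command: ["/bin/sh", "-c"]
              args:
                - 'mkdir -p /backups/${service} && pg_dump > /backups/${service}/\$(date +%Y-%m-%d).sql'
              volumeMounts:
                - name: backups
                  mountPath: /backups
          volumes:
            - name: backups
              persistentVolumeClaim:
                claimName: longhorn-backups
EOF
}

# Usage (hypothetical names):
#   render_db_backup_cronjob vaultwarden vaultwarden vaultwarden-db-rw vaultwarden-db-credentials \
#     | kubectl apply -f -
```

PVCs are namespaced, so a longhorn-backups claim would have to exist in each service's namespace. A CronJob is used here rather than a sidecar because it needs no change to the application pods.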

TODO

  • Add AntiAffinities to outsider nodes
  • Migrate Home Assistant to PostgreSQL instead of MariaDB
  • Move Prometheus connection management to ServiceMonitors instead of a ConfigMap
  • Configure Alertmanager with a basic webhook (Discord)
  • Configure Prometheus alerts
  • Schedule Longhorn S3 backups
  • Schedule CloudNativePG S3 backups
  • Restrict the metrics endpoint on public services (see the Gitea repository for an example)
  • Move from NFS to S3 mounts for NAS volumes
  • Migrate Vaultwarden to PostgreSQL instead of MariaDB
  • Deploy a PostgreSQL cluster using an operator for database HA and easy maintenance (to be tested properly)
  • Change host/deployment-specific variables to use environment variables (using Kustomize)
  • Write a CI/CD pipeline to create environment-loaded files (done with the Kustomize migration)
  • Write a CI/CD pipeline to deploy the cluster (done with ArgoCD)
  • Set up an internal Traefik with a NodePort as reverse proxy for internal-only services (done through a double ingress class and load balancer)
  • Set up DB container sidecars for automated backups to a Longhorn volume
  • Set up secrets configuration through CI/CD variable injection using Kustomize (approach changed by the SOPS implementation)
  • Figure out SOPS secret injection for absent namespaces
  • Explore permission issues when issuing OVH API keys (not working for the wildcard and beta.halia.dev subdomain; supposedly done)
  • Set up default users for deployments
  • Set up log and metric monitoring
  • Define namespaces through YAML files
  • Look into CockroachDB for a redundant database (judged too complicated; moving to a one-to-one relationship between services and databases)
  • Configure IP range accessibility through Traefik, internal vs external services (impossible because of Flannel ip-masq)
  • Move secrets to a separate, private Git repository (done with SOPS)
  • Configure the NFS connection for the media library
  • Research IPv6 configuration for the outsider node (impossible for now in Denmark with YouSee as ISP, which has no IPv6 support)
  • Write a small script to automate the cluster installation, splitting API calls into two stages (solves the MetalLB "API not found" error)
  • Migrate ingresses to the Traefik kind (IngressRoute) instead of the Kubernetes Ingress kind
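For the ServiceMonitor item, a minimal sketch of what could replace a ConfigMap scrape entry, assuming the Prometheus Operator CRDs (monitoring.coreos.com/v1) are installed and that the target Service carries an app.kubernetes.io/name label and exposes a named metrics port. The function name, the /metrics path and the 30s interval are hypothetical choices.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: render a ServiceMonitor selecting a Service by its
# app.kubernetes.io/name label and scraping its named metrics port.
render_service_monitor() {
  local name="$1" namespace="$2" port_name="$3"
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ${name}
  namespace: ${namespace}
  # Depending on the Prometheus serviceMonitorSelector, an extra label
  # (e.g. "release: <prometheus-release>") may be required here.
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ${name}
  endpoints:
    - port: ${port_name}
      path: /metrics
      interval: 30s
EOF
}

# Usage (hypothetical names):
#   render_service_monitor gitea gitea metrics | kubectl apply -f -
```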

Notes

Cluster base setup

Set up the cluster's backbone:

make dev
# Include the SOPS master secret (AGE key) in the cluster
# ($HOME is used because the shell does not expand ~ after "--from-file=")
kubectl create secret generic age-key --from-file="$HOME/.sops/key.txt" -n sops

NOTE: The MetalLB IP range and the Traefik LoadBalancerIPs may need to be updated to match the local network.
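A minimal sketch of the values involved, assuming MetalLB 0.13 or newer (CRD-based configuration) in Layer 2 mode and the default metallb-system namespace: render_metallb_pool renders the address pool to edit when the range changes, and ip_in_range checks that a Traefik LoadBalancer IP still falls inside it. Function names, the pool name and the addresses are hypothetical, and only the "start-end" range form is handled, not CIDR notation.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: MetalLB address pool + L2 advertisement (MetalLB >= 0.13).
render_metallb_pool() {
  local pool="$1" range="$2"
  cat <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ${pool}
  namespace: metallb-system
spec:
  addresses:
    - ${range}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: ${pool}-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - ${pool}
EOF
}

# Convert a dotted IPv4 address to a 32-bit integer
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<<"$1"
  echo "$(( (a << 24) | (b << 16) | (c << 8) | d ))"
}

# Succeeds if IPv4 address $1 lies in the inclusive range $2 ("start-end")
ip_in_range() {
  local ip start end
  ip="$(ip_to_int "$1")"
  start="$(ip_to_int "${2%-*}")"
  end="$(ip_to_int "${2#*-}")"
  (( ip >= start && ip <= end ))
}

# Usage (hypothetical addresses):
#   render_metallb_pool lan-pool 192.168.1.240-192.168.1.250 | kubectl apply -f -
#   ip_in_range 192.168.1.245 192.168.1.240-192.168.1.250 && echo "Traefik IP OK"
```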

Convert a Helm chart to a K3s manifest

helm template <release> <repo>/<chart> --output-dir ./<chart>
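helm template --output-dir writes one file per template. If a single file is preferred, for example for the K3s auto-deploy directory, the hypothetical function below concatenates the rendered YAML files into one multi-document manifest. It is a sketch under that assumption; it only adds a "---" separator when a file does not already start with one, since Helm's rendered files usually do.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: merge the YAML files produced by
#   helm template <release> <repo>/<chart> --output-dir <dir>
# into a single multi-document manifest on stdout.
merge_rendered_chart() {
  local rendered_dir="$1" file
  [[ -d "$rendered_dir" ]] || { echo "not a directory: $rendered_dir" >&2; return 1; }
  # Sorted traversal keeps the document order deterministic between runs
  while IFS= read -r file; do
    # Add a document separator only if the file does not start with one
    [[ "$(head -n 1 "$file")" == "---" ]] || echo "---"
    cat "$file"
    echo    # guard against files lacking a trailing newline
  done < <(find "$rendered_dir" -type f \( -name '*.yaml' -o -name '*.yml' \) | LC_ALL=C sort)
}

# Usage (hypothetical chart name, default K3s manifests path):
#   merge_rendered_chart ./chart > /var/lib/rancher/k3s/server/manifests/chart.yaml
```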

GitLab backup process

Because GitLab does not offer a way to back up a container's data from an external container, a cron job was implemented in the custom image used for deployment. NOTE: This no longer applies, as a migration to Gitea is planned.

VPN configuration for Deluge

Instead of adding an extra networking layer to the whole cluster, it seems better to integrate a WireGuard connection directly into the Deluge image and build everything ourselves in the GitLab registry. This image could use Kubernetes secrets, including a "torrent-vpn" secret produced by the initial WireGuard configuration done via Ansible. The Ansible script could create one (or more) additional client(s) depending on the inventory configuration, and store the "torrent-vpn" configuration as a K3s-formatted manifest in the auto-deploy directory on the control plane (CP).
See: https://docs.k3s.io/advanced#auto-deploying-manifests
After further reflection, it does not make sense for Deluge to be part of the cluster. It will be moved to the NAS, since it can only run while the NAS is running. This also simplifies the whole VPN configuration.

Development domains

When developing, a service that must be reachable publicly should use a domain name under beta.halia.dev (e.g. <service>.beta.halia.dev). A service that must only be exposed internally should use a domain name under beta.entos (e.g. <service>.beta.entos).

Ingresses

To separate external and internal services, two Traefik ingress classes are used, selected through the ingress class annotation: traefik-external allows external (public) access to a service, while traefik-internal restricts it to internal-only access.
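A minimal sketch tying the ingress classes to the development domains, assuming the leading part of the development domain is the service name and that the app's Service shares that name. The function name and the default port 80 are hypothetical. It uses the legacy kubernetes.io/ingress.class annotation described above; spec.ingressClassName is the newer equivalent.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: render an Ingress handled by either the external or the
# internal Traefik, following the development domain convention:
#   public   -> <service>.beta.halia.dev  (traefik-external)
#   internal -> <service>.beta.entos      (traefik-internal)
render_dev_ingress() {
  local service="$1" exposure="$2" port="${3:-80}"
  local class host
  case "$exposure" in
    public)   class="traefik-external"; host="${service}.beta.halia.dev" ;;
    internal) class="traefik-internal"; host="${service}.beta.entos" ;;
    *) echo "exposure must be 'public' or 'internal'" >&2; return 1 ;;
  esac
  # Assumes a Service with the same name as the app, listening on $port
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${service}
  annotations:
    kubernetes.io/ingress.class: ${class}
spec:
  rules:
    - host: ${host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ${service}
                port:
                  number: ${port}
EOF
}

# Usage (hypothetical service):
#   render_dev_ingress grafana internal 3000 | kubectl apply -n monitoring -f -
```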

Secret management

All secrets are encrypted using SOPS and stored in a private secret repository. They are decrypted on the fly by the SOPS Operator when applied to the cluster.

Inject the AGE key into the cluster so the operator can decrypt secrets:

kubectl create secret generic age-key --from-file=<path_to_file> -n sops
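Because kubectl create fails when the secret already exists, a re-runnable variant could render the manifest and pipe it to kubectl apply. The hypothetical function below mirrors the command above (secret age-key in namespace sops, with the data key taken from the file's basename, as --from-file does) and also renders the sops Namespace, which is an assumption that may help with the absent-namespace issue noted in the TODO list.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: render the "sops" Namespace and the "age-key" Secret as one
# YAML stream, equivalent to:
#   kubectl create secret generic age-key --from-file=<path_to_file> -n sops
render_age_key_secret() {
  local key_file="$1"
  [[ -r "$key_file" ]] || { echo "cannot read $key_file" >&2; return 1; }
  local key_name encoded
  key_name="$(basename "$key_file")"
  # base64 without line wrapping, portable across GNU and BSD base64
  encoded="$(base64 < "$key_file" | tr -d '\n')"
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: sops
---
apiVersion: v1
kind: Secret
metadata:
  name: age-key
  namespace: sops
type: Opaque
data:
  ${key_name}: ${encoded}
EOF
}

# Usage (key path as in the base setup):
#   render_age_key_secret "$HOME/.sops/key.txt" | kubectl apply -f -
```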