K3s cluster

| Name | Usage | Accessibility | Host | DB type | Additional data | Backup configuration | Loki integration | Prometheus integration | Secret management | Status | Standalone migration |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Traefik | Reverse proxy and load balancer | Public & Private | Socrates & Pythagoras-b | - | - | - | Configured | Configured | - | Completed⁵ | Backbone |
| ArgoCD | Declarative GitOps CD | Private | Pythagoras-b | - | - | - | Configured | Configured | - | Completed | Backbone |
| Vaultwarden | Password manager | Public | Pythagoras-b | PostgreSQL | - | 4AM K8s CronJob | Configured | Not available | Configured | Completed | Completed |
| Gitea | Version control system | Public | Pythagoras-b | PostgreSQL | User created content | Not configured | Configured | Not configured | Configured | Partial⁴ | Awaiting |
| Grafana | Graph visualizer | Public | Pythagoras-b | - | - | Not configured | Configured | Not configured | Configured | Partial | Awaiting |
| Prometheus | Metrics aggregator | Private | Pythagoras-b | TBD | - | Not configured | Configured | Not configured | Not configured | Partial | Awaiting |
| Loki | Log aggregator | Private | Pythagoras-b | TBD | - | Not configured | Configured | Not configured | Not configured | Partial | Awaiting |
| Adguard | DNS ad blocker and custom DNS server | Private | Socrates | - | - | - | Not configured | Not configured | Not configured | Pending configuration¹ | Awaiting |
| Synapse | Matrix server - Message centralizer | Public | Pythagoras-b | PostgreSQL | User medias | 4AM K8s CronJob | Configured | Not configured | Not configured | Pending configuration³ | Awaiting |
| Home assistant | Home automation and monitoring | Private | Pythagoras-a | MariaDB | - | Not configured | Not configured | Not configured | Not configured | Awaiting configuration | Awaiting |
| therbron.com | Personal website | Public | Socrates | - | - | - | Not configured | Not configured | - | Awaiting configuration | Awaiting |
| Owncloud Infinity Scale | File hosting webUI | Public | Plato | ? | Drive files | Not configured | Configured | Not available | Not configured | Pending configuration² | Awaiting |
| Radarr | Movie collection manager | Private | Plato | PostgreSQL | - | - | Configured | Not configured | Not configured | Partial | Awaiting |
| Flaresolverr | Cloudflare proxy | Private | Plato | - | - | - | - | - | - | Completed | Awaiting |
| Sonarr | TV shows collection manager | Private | Plato | SQLite | - | Not configured | Configured | Not configured | Not configured | Partial | Awaiting |
| Prowlarr | Torrent indexer | Private | Plato | PostgreSQL | - | Not configured | Configured | Not available | Not configured | Partial | Awaiting |
| Jellyfin | Media streaming | Public | Archimedes | SQLite** | - | - | Configured | Not configured | Configured⁶ | Completed | Awaiting |
| Jellyseerr | Media requesting WebUI | Public | Pythagoras-b | - | - | - | Not configured | Not available | Configured⁷ | Awaiting configuration | Awaiting |
| Deluge | Torrent client | Private | Plato | - | ? | - | Not configured | Not configured | Not configured | Awaiting configuration | Awaiting |
| Minecraft | Vanilla minecraft server for friends | Public | Archimedes | - | Game map | Not configured | Not configured | Not configured | - | Awaiting configuration | Awaiting |
| Satisfactory | Satisfactory server for friends | Public | Archimedes | - | Game map | Not configured | Not configured | Not configured | - | Not needed for v1 | Awaiting |
| Space engineers | Space engineers server for friends | Public | Archimedes | - | Game map | Not configured | Not configured | Not configured | - | Not needed for v1 | Awaiting |
| Raspsnir | Bachelor memorial website | Public | Pythagoras-b | PostgreSQL | - | Not configured | Not configured | Not configured | - | Not needed for v1 | Awaiting |
| Vikunja | To-do and Kanban boards | Public | Pythagoras-b | - | - | - | Not configured | Not configured | - | Migrate to Gitea | Awaiting |
| Wiki | Documentation manager | Public | Pythagoras-b | - | - | - | Not configured | Not configured | - | Migrate to VuePress and Gitea | Awaiting |
| PaperlessNG | PDF viewer and organiser | Public | Pythagoras-b | PostgreSQL | - | - | Not configured | Not configured | - | Research migration into OCIS | Awaiting |

* Configuration panel only available internally
** Current implementation only supports SQLite, making manual backups a necessity
1 Missing automated configuration pipeline for environment variable injection
2 Missing configuration for NAS volume mounting (over network)
3 Missing Longhorn scheduling for saving media_store and secret management
4 Currently migrating from Gitlab installation
5 Missing dashboard configuration
6 Done through volume backup, because not possible otherwise
7 Done, but needs a reimplementation using kustomize for secret separation from configmap

Backup management

Databases

All services that need a database come with a sidecar pod running a crontab to automate individual database backups. These backups are saved to a Longhorn volume, so they can benefit from general snapshots later on. Each sidecar pod can only mount the backup folder it has been linked with, and cannot see other services' backups.
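
As an illustration of what such a sidecar's cron entry might run, here is a minimal sketch. It is not the cluster's actual script: the function name `backup_database`, the `BACKUP_ROOT` and `DUMP_CMD` variables, the mount point, and the use of `pg_dump` are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sidecar backup helper; names and defaults are assumptions.

# Root of the mounted Longhorn volume (assumed mount point).
BACKUP_ROOT="${BACKUP_ROOT:-/longhorn/backups}"
# Command that writes a SQL dump to stdout. pg_dump reads its connection
# settings from the standard PG* environment variables.
DUMP_CMD="${DUMP_CMD:-pg_dump}"

# backup_database SERVICE
# Writes $BACKUP_ROOT/SERVICE/<YYYY-MM-DD>.sql and prints the file path.
backup_database() {
    service="$1"
    target_dir="$BACKUP_ROOT/$service"
    target="$target_dir/$(date +%Y-%m-%d).sql"

    mkdir -p "$target_dir" || return 1
    # Dump to a temporary file first so a failed dump never leaves a
    # truncated backup under the final name.
    if $DUMP_CMD > "$target.tmp"; then
        mv "$target.tmp" "$target"
        echo "$target"
    else
        rm -f "$target.tmp"
        return 1
    fi
}
```

A crontab line such as `0 4 * * * . /scripts/backup.sh && backup_database vaultwarden` (hypothetical path) would then produce one dated dump per day in the service's own folder.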

Additional data

All additional data that needs to be backed up is stored on a Longhorn volume, so that it also benefits from scheduled backups.

Example:

longhorn
└───backups
    └───vaultwarden
    │   └───<backup_date>.sql
    │   │   ...
    └───gitlab
        └───<backup_date>.sql
        │   ...

TODO

  • Add AntiAffinities to outsider nodes
  • Migrate Homeassistant to PostgreSQL instead of MariaDB
  • Move Prometheus connection management to ServiceMonitors instead of ConfigMap
  • Schedule longhorn S3 backups
  • Migrate Vaultwarden to PostgreSQL instead of MariaDB
  • Deploy PostgreSQL cluster using operator for database HA and easy maintenance - To be tested properly
  • Change host/deployment specific variables to use environment variables (using Kustomize)
  • Write CI/CD pipeline to create environment loaded files - Done with Kustomize migration
  • Write CI/CD pipeline to deploy cluster - Done with ArgoCD
  • Setup internal traefik with nodeport as reverse proxy for internal only services - Done through double ingress class and LB
  • Setup DB container sidecars for automated backups to Longhorn volume
  • Setup secrets configuration through CI/CD variable injection (using Kustomize)
  • Explore permission issues when issuing OVH API keys (not working for wildcard and beta.halia.dev subdomain)
  • Setup default users for deployments
  • Setup log and metric monitoring
  • Define namespaces through yaml files
  • Look into CockroachDB for redundant database - Judged too complicated, moving to a 1 to 1 relationship between services and databases
  • Configure IP range accessibility through Traefik (Internal vs external services) - Impossible because of flannel ip-masq
  • Move secrets to separate, private Git repository? - Done with SOPS
  • Configure NFS connection for media library
  • Research IPv6 configuration for outsider node - Impossible in Denmark while using YouSee as an ISP for now (no IPv6 support)
  • Write small script for auto installation of the cluster, to split API calls into 2 stages (solves MetalLB API not found error)
  • Migrate ingresses to traefik kind instead of k8s kind

Notes

Cluster base setup

Set up the cluster's backbone:

kubectl apply -k environment/dev

Taint the outsider node so that nothing is scheduled on it unless explicitly set up to do so:

kubectl taint nodes outsider type=services:NoSchedule
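
For a workload that should still be allowed onto the tainted node, a matching toleration is needed. The sketch below prints a strategic merge patch for a Deployment; the helper name `toleration_patch` and the deployment and namespace names in the usage comment are hypothetical, and only the taint key, value and effect come from the command above.

```shell
#!/bin/sh
# Sketch only: emits a strategic merge patch adding a toleration for the
# type=services:NoSchedule taint.

# toleration_patch
# Prints the patch as YAML on stdout.
toleration_patch() {
    cat <<'EOF'
spec:
  template:
    spec:
      tolerations:
        - key: type
          operator: Equal
          value: services
          effect: NoSchedule
EOF
}

# Example (hypothetical deployment name and namespace):
#   toleration_patch > /tmp/toleration.yaml
#   kubectl patch deployment my-service -n my-namespace \
#       --patch-file /tmp/toleration.yaml
```

Note that a toleration only permits scheduling on the outsider node; pinning a workload there would additionally need a node selector or affinity rule.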

DO NOT FORGET TO INSTALL THE SOPS PART

NOTE: The MetalLB IP range and the Traefik LoadBalancerIPs might need to be updated as well

Convert helm chart to k3s manifest

helm template chart stable/chart --output-dir ./chart

Gitlab backup process

Because GitLab does not offer a way to back up a container's data from an external container, a cronjob has been implemented in the custom image used for deployment.
NOTE: This no longer applies, as a migration to Gitea is planned.

VPN configuration for Deluge

Instead of adding an extra networking layer to the whole cluster, it seems like a better idea to integrate a WireGuard connection directly into the Deluge image, and self-build everything within the GitLab registry. This image could use Kubernetes secrets, including a "torrent-vpn" secret produced by the initial WireGuard configuration done via Ansible. The Ansible script could create one (or more) additional client(s) depending on the inventory configuration, and keep the "torrent-vpn" configuration in a k3s-formatted file inside the auto-applied directory on the CP.
Cf: https://docs.k3s.io/advanced#auto-deploying-manifests
After further reflection, it doesn't make sense for Deluge to be part of the cluster. It will be moved to the NAS, as it can only run when the NAS is running. This will also ease the whole VPN configuration.

Development domains

To access a service publicly when developing, use a domain name matching *.beta.halia.dev
To expose a service internally only, use a domain name matching *.beta.entos

Ingresses

To split external and internal services, two Traefik ingress classes are implemented and selected through the ingressclass annotation. traefik-external only allows external access to a given service, while traefik-internal restricts it to internal-only access.
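
To make the link between development domains and ingress classes concrete, here is a small sketch that picks the class from the hostname and renders a plain Kubernetes Ingress carrying the annotation. The helper names `ingress_class_for` and `render_ingress`, the service name and the port are illustrative assumptions; the repository's real manifests may be laid out differently.

```shell
#!/bin/sh
# Sketch only: helper names, service name and port are illustrative.

# ingress_class_for HOST
# Maps a development hostname to the matching ingress class.
ingress_class_for() {
    case "$1" in
        *.beta.halia.dev) echo "traefik-external" ;;
        *.beta.entos)     echo "traefik-internal" ;;
        *) echo "unknown development domain: $1" >&2; return 1 ;;
    esac
}

# render_ingress NAME HOST PORT
# Prints an Ingress manifest routing HOST to the service NAME on PORT.
render_ingress() {
    name="$1"; host="$2"; port="$3"
    class=$(ingress_class_for "$host") || return 1
    cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: $name
  annotations:
    kubernetes.io/ingress.class: $class
spec:
  rules:
    - host: $host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: $name
                port:
                  number: $port
EOF
}

# Example: render_ingress adguard adguard.beta.entos 80 | kubectl apply -f -
```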

Secret management

All secrets are encrypted using SOPS and stored in a private secret repository. Secrets are decrypted on the fly by the SOPS Operator when applied to the cluster.

Inject the AGE key into the cluster to allow the operator to decrypt secrets:

kubectl create secret generic age-key --from-file=<path_to_file> -n sops
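
On the encryption side, a typical SOPS setup keeps a `.sops.yaml` file next to the secrets so that `sops` knows which age recipient to encrypt for. The sketch below is an assumption about how that could look, not a description of the private repository: the helper name `sops_config`, the file names, and the choice to encrypt only `data` and `stringData` are illustrative, and the operator in use may expect a different resource format.

```shell
#!/bin/sh
# Sketch only: generates a .sops.yaml for a single age recipient.

# sops_config AGE_PUBLIC_KEY
# Prints a SOPS configuration that encrypts only the data/stringData
# fields of YAML files for the given age public key.
sops_config() {
    cat <<'EOF'
creation_rules:
  - path_regex: '.*\.yaml$'
    encrypted_regex: '^(data|stringData)$'
EOF
    printf '    age: %s\n' "$1"
}

# Example workflow (file names are hypothetical):
#   age-keygen -o age.agekey            # prints the public key
#   sops_config <public_key> > .sops.yaml
#   sops --encrypt --in-place secret.yaml
```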