feat(monitoring): Move to Prometheus operator

Changes the implementation of Prometheus to use its Operator instead of the regular configmap implementation.
This allows the deployment of metrics through independent ServiceMonitors instead of a centralized ConfigMap.
This update is reflated in Grfana's datasources.
This commit is contained in:
Tanguy Herbron 2023-05-14 21:47:31 +02:00
parent 6068686d30
commit 8b755928a2
11 changed files with 38521 additions and 316 deletions

View File

@ -2,22 +2,22 @@
| Name | Usage | Accessibility | Host | DB type | Additional data | Backup configuration | Loki integration | Prometheus integration | Secret management | Status | Standalone migration |
|-------------------------|--------------------------------------|------------------|-------------------------|------------|----------------------|----------------------|------------------|------------------------|------------------------|-----------------------------------|----------------------|
| Traefik | Reverse proxy and load balancer | Public & Private | Socrates & Pythagoras-b | - | - | - | Configured | Configured | - | Completed<sup>5</sup> | Backbone |
| Traefik | Reverse proxy and load balancer | Public & Private | Socrates & Pythagoras-b | - | - | - | Configured | Not configured | - | Completed<sup>5</sup> | Backbone |
| ArgoCD | Declarative GitOPS CD | Private | Pythagoras-b | - | - | - | Configured | Configured | - | Completed | Backbone |
| Vaultwarden | Password manager | Public | Pythagoras-b | PostgreSQL | - | 4AM K8s CronJob | Configured | Not available | Configured | Completed | Completed |
| Gitlab | Version control system | Public | Pythagoras-b | PostgreSQL | User created content | 5AM internal CronJob | Configured | Configured | Not configured | Partial<sup>4</sup> | Awaiting |
| Radarr | Movie collection manager | Private | Plato | PostgreSQL | - | - | Configured | Configured | Not configured | Partial | Awaiting |
| Radarr | Movie collection manager | Private | Plato | PostgreSQL | - | - | Configured | Not configured | Not configured | Partial | Awaiting |
| Gitlab | Version control system | Public | Pythagoras-b | PostgreSQL | User created content | 5AM internal CronJob | Configured | Not configured | Not configured | Partial<sup>4</sup> | Awaiting |
| Flaresolverr | Cloudflare proxy | Private | Plato | - | - | - | - | - | - | Completed | Awaiting |
| Prometheus | Metrics aggregator | Private | Pythagoras-b | TBD | - | Not configured | Configured | Configured | Not configured | Partial | Awaiting |
| Loki | Log aggregator | Private | Pythagoras-b | TBD | - | Not configured | Configured | Configured | Not configured | Partial | Awaiting |
| Grafana | Graph visualizer | Public | Pythagoras-b | - | - | Not configured | Configured | Configured | Configured | Partial | Awaiting |
| Sonarr | TV shows collection manager | Private | Plato | SQLite | - | Not configured | Configured | Configured | Not configured | Partial | Awaiting |
| Prometheus | Metrics aggregator | Private | Pythagoras-b | TBD | - | Not configured | Configured | Not configured | Not configured | Partial | Awaiting |
| Loki | Log aggregator | Private | Pythagoras-b | TBD | - | Not configured | Configured | Not configured | Not configured | Partial | Awaiting |
| Grafana | Graph visualizer | Public | Pythagoras-b | - | - | Not configured | Configured | Not configured | Configured | Partial | Awaiting |
| Sonarr | TV shows collection manager | Private | Plato | SQLite | - | Not configured | Configured | Not configured | Not configured | Partial | Awaiting |
| Prowlarr | Torrent indexer | Private | Plato | PostgreSQL | - | Not configured | Configured | Not available | Not configured | Partial | Awaiting |
| Jellyfin | Media streaming | Public | Archimedes | SQLite** | - | - | Configured | Configured | Configured<sup>6</sup> | Completed | Awaiting |
| Jellyfin | Media streaming | Public | Archimedes | SQLite** | - | - | Configured | Not configured | Configured<sup>6</sup> | Completed | Awaiting |
| Jellyseerr | Media requesting WebUI | Public | Pythagoras-b | - | - | - | Not configured | Not available | Configured<sup>7</sup> | Awaiting configuration | Awaiting |
| Adguard | DNS ad blocker and custom DNS server | Private | Socrates | - | - | - | Not configured | Not configured | Not configured | Pending configuration<sup>1</sup> | Awaiting |
| Owncloud Infinity Scale | File hosting webUI | Public | Plato | ? | Drive files | Not configured | Configured | Not available | Not configured | Pending configuration<sup>2</sup> | Awaiting |
| Synapse | Matrix server - Message centralizer | Public | Pythagoras-b | PostgreSQL | User medias | 4AM K8s CronJob | Configured | Configured | Not configured | Pending configuration<sup>3</sup> | Awaiting |
| Synapse | Matrix server - Message centralizer | Public | Pythagoras-b | PostgreSQL | User medias | 4AM K8s CronJob | Configured | Not configured | Not configured | Pending configuration<sup>3</sup> | Awaiting |
| therbron.com | Personal website | Public | Socrates | - | - | - | Not configured | Not configured | - | Awaiting configuration | Awaiting |
| Home assistant | Home automation and monitoring | Private | Pythagoras-a | MariaDB | - | Not configured | Not configured | Not configured | Not configured | Awaiting configuration | Awaiting |
| Vikunja | To-do and Kanban boards | Public | Pythagoras-b | - | - | - | Not configured | Not configured | - | Migrate to Gitlab | Awaiting |
@ -65,7 +65,7 @@ longhorn
## TODO
- Add AntiAffinities to `outsider` nodes
- Migrate Homeassistant to PostgreSQL instead of MariaDB
- Move Prometheus connection management to ServiceMonitors instead of ConfigMap
- ~~Move Prometheus connection management to ServiceMonitors instead of ConfigMap~~
- Schedule longhorn S3 backups
- ~~Migrate Vaultwarden to PostgreSQL instead of MariaDB~~
- ~~Deploy PostgresQL cluster using operator for database HA and easy maintenance~~ - To be tested properly

View File

@ -24,6 +24,6 @@ data:
- name: Prometheus
type: prometheus
access: proxy
url: "http://prometheus-svc:9090"
url: "http://prometheus-operated:9090"
version: 1
isDefault: false

File diff suppressed because it is too large Load Diff

View File

@ -6,29 +6,19 @@ rules:
- apiGroups: [""]
resources:
- nodes
- nodes/proxy
- nodes/metrics
- services
- endpoints
- pods
verbs: ["get", "list","watch"]
- apiGroups: [""]
resources:
- configmaps
verbs: ["get"]
- apiGroups:
- extensions
- networking.k8s.io
resources:
- ingresses
verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: default
namespace: monitoring

View File

@ -0,0 +1,12 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: monitoring

View File

@ -1,227 +0,0 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-server-conf
labels:
name: prometheus-server-conf
namespace: monitoring
data:
prometheus.rules: |-
groups:
- name: devopscube demo alert
rules:
- alert: High Pod Memory
expr: sum(container_memory_usage_bytes) > 1
for: 1m
labels:
severity: slack
annotations:
summary: High Memory Usage
prometheus.yml: |-
global:
scrape_interval: 5s
evaluation_interval: 5s
rule_files:
- /etc/prometheus/prometheus.rules
alerting:
alertmanagers:
- scheme: http
static_configs:
- targets:
- "alertmanager.monitoring.svc:9093"
scrape_configs:
- job_name: 'gitlab-node_metrics'
static_configs:
- targets:
- gitlab-svc.gitlab.svc.cluster.local:9100
- job_name: 'gitlab-workhorse_metrics'
static_configs:
- targets:
- gitlab-svc.gitlab.svc.cluster.local:9229
- job_name: 'gitlab-exporter-database_metrics'
metrics_path: "/database"
static_configs:
- targets:
- gitlab-svc.gitlab.svc.cluster.local:9168
- job_name: 'gitab-exporter-sidekiq_metrics'
metrics_path: "/sidekiq"
static_configs:
- targets:
- gitlab-svc.gitlab.svc.cluster.local:9168
- job_name: 'gitlab-sidekiq_metrics'
static_configs:
- targets:
- gitlab-svc.gitlab.svc.cluster.local:8082
- job_name: 'gitlab-redis_metrics'
static_configs:
- targets:
- gitlab-svc.gitlab.svc.cluster.local:9121
- job_name: 'gitlab-postgres_metrics'
static_configs:
- targets:
- gitlab-svc.gitlab.svc.cluster.local:9187
- job_name: 'gitlab-gitaly_metrics'
static_configs:
- targets:
- gitlab-svc.gitlab.svc.cluster.local:9236
- job_name: 'gitlab-rails_metrics'
metrics_path: "/-/metrics"
scheme: https
static_configs:
- targets:
- gitlab-svc.gitlab.svc.cluster.local
- job_name: 'synapse'
scrape_interval: 15s
metrics_path: "/_synapse/metrics"
static_configs:
- targets:
- synapse-svc.synapse.svc.cluster.local:9000
- job_name: 'sonarr'
static_configs:
- targets:
- sonarr-svc.torrent.svc.cluster.local:9707
- job_name: 'radarr'
static_configs:
- targets:
- radarr-svc.torrent.svc.cluster.local:9707
- job_name: 'prowlarr'
static_configs:
- targets:
- prowlarr-svc.torrent.svc.cluster.local:9707
- job_name: 'jellyfin'
metrics_path: "/metrics"
static_configs:
- targets:
- jellyfin-svc.streaming.svc.cluster.local:80
- job_name: 'node-exporter'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_endpoints_name]
regex: 'node-exporter'
action: keep
- job_name: 'argocd'
metrics_path: "/metrics"
static_configs:
- targets:
- argocd-metrics.argocd.svc.cluster.local:8082
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
- job_name: 'kubernetes-nodes'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
- job_name: 'kube-state-metrics'
static_configs:
- targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
- job_name: 'kubernetes-cadvisor'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name

View File

@ -1,45 +0,0 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus
namespace: monitoring
labels:
app: prometheus
spec:
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
spec:
containers:
- name: prometheus
image: prom/prometheus
args:
- "--storage.tsdb.retention.time=12h"
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/prometheus/"
ports:
- containerPort: 9090
resources:
requests:
cpu: 500m
memory: 500M
limits:
cpu: 1
memory: 1Gi
volumeMounts:
- name: prometheus-config-volume
mountPath: /etc/prometheus
- name: prometheus-storage-volume
mountPath: /prometheus/
volumes:
- name: prometheus-config-volume
configMap:
defaultMode: 420
name: prometheus-server-conf
- name: prometheus-storage-volume
emptyDir: {}

View File

@ -1,8 +1,11 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: monitoring
resources:
- bundle.yaml
- serviceaccount.yaml
- clusterrole.yaml
- configmap.yaml
- deployment.yaml
- service.yaml
- clusterrolebinding.yaml
- prometheus.yaml

View File

@ -0,0 +1,14 @@
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
name: prometheus
spec:
serviceAccountName: prometheus
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector:
matchLabels:
team: core
resources:
requests:
memory: 400Mi
enableAdminAPI: false

View File

@ -1,13 +0,0 @@
apiVersion: v1
kind: Service
metadata:
name: prometheus-svc
namespace: monitoring
spec:
ports:
- name: http
port: 9090
protocol: TCP
targetPort: 9090
selector:
app: prometheus

View File

@ -0,0 +1,4 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus