Accessing rook-ceph in prometheus

Instructions for getting Rook Ceph data into Prometheus.


If you want to access Rook Ceph data, such as the actual size of the data inside persistent volumes, you first need to create a cluster with kubeopsctl, using clean values for Rook Ceph and Prometheus:

This example uses the rook-ceph namespace for Rook Ceph and the monitoring namespace for Prometheus. Other namespaces are possible, but then you need to adjust the namespace values accordingly.

...
rook-ceph: true
harbor: true
opensearch: true
opensearch-dashboards: true
logstash: true
filebeat: true
prometheus: true
opa: false
kubeops-dashboard: false
certman: false
ingress: false
keycloak: false
velero: false

# rookValues:

harborValues: 
  harborpass: "password" # change to your desired password
  databasePassword: "Postgres_Password" # change to your desired password
  redisPassword: "Redis_Password" 
  externalURL: http://10.2.10.110:30002 # change to the IP address of master1

prometheusValues:
  namespace: monitoring
  grafanaUsername: "admin"
  grafanaPassword: "password"
...

Without Prometheus, the Rook Ceph monitors cannot function correctly.

Once the cluster has been created with these values, you can enable the monitors:


rook-ceph: true
harbor: false
opensearch: false
opensearch-dashboards: false
logstash: false
filebeat: false
prometheus: true
opa: false
kubeops-dashboard: false
certman: false
ingress: false
keycloak: false
velero: false

rookValues:
  operator:
    advancedParameters:
      serviceMonitor:
        enabled: true
      monitoring:
        enabled: true
      csi:
        enableMetadata: true
        csiRBDPluginVolumeMount:
          - name: plugin-dir
            mountPath: /csi
        serviceMonitor:
          enabled: true
        enableLiveness: true
  cluster:
    advancedParameters:
      monitoring:
        enabled: true
        createPrometheusRules: true
      cephBlockPoolsVolumeSnapshotClass:
        enabled: true
        name: ceph-block
        isDefault: true

prometheusValues:
  namespace: monitoring
  grafanaUsername: "admin"
  grafanaPassword: "password"
  advancedParameters:
    prometheus-node-exporter:
      extraArgs:
        - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+)($|/)
        - --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
    kubelet:
      serviceMonitor:
        enabled: true
        scheme: https
        path: /metrics/cadvisor
        bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
        tlsConfig:
          insecureSkipVerify: true
    prometheus:
      additionalServiceMonitors:
        - name: rook-ceph-csi
          namespace: rook-ceph
          selector:
            matchLabels:
              app: csi-metrics
          endpoints:
            - interval: 30s
              path: /metrics
              port: 9080
              scheme: http
              honorLabels: true

Prometheus needs to communicate with the kubelet itself, which is why it has to be configured as shown above.
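To see why this extra configuration is needed, you can check that the kubelet rejects unauthenticated requests to its metrics endpoints. A small sketch, assuming one of the node IPs from this example:

```shell
NODE_IP=10.2.10.110   # replace with one of your node IPs

# Without a bearer token the kubelet rejects the request
# (anonymous access is disabled by default), so this usually
# prints an HTTP status of 401 or 403:
curl -sk -o /dev/null -w '%{http_code}\n' \
  "https://${NODE_IP}:10250/metrics/cadvisor"
```

Prometheus therefore has to present its service-account token (the bearerTokenFile setting above) and be authorized via RBAC, which the following manifests provide.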

After that, you need to create a few additional manifests:

ClusterRole and ClusterRoleBinding for accessing the kubelet:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-kubelet
rules:
  # allows access via the API
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
  # allows direct access to the kubelet metrics endpoints
  - nonResourceURLs:
      - /metrics
      - /metrics/cadvisor
      - /metrics/resource
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-kubelet
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-kubelet
subjects:
- kind: ServiceAccount
  name: prometheus-kube-prometheus-prometheus
  namespace: monitoring
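Once this binding is applied, you can confirm the permissions took effect by impersonating the Prometheus service account with `kubectl auth can-i` (service account name as used in this example):

```shell
# Should print "yes" after the ClusterRoleBinding above is applied:
kubectl auth can-i get nodes/proxy \
  --as=system:serviceaccount:monitoring:prometheus-kube-prometheus-prometheus
```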

ClusterRole and ClusterRoleBinding for accessing the CSI metrics:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-metrics-access
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - nodes/metrics
  - nodes/stats
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  - /metrics/cadvisor
  - /metrics/resource
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-metrics-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: node-metrics-access
subjects:
- kind: Group
  name: system:nodes
  apiGroup: rbac.authorization.k8s.io

Endpoints object so that Prometheus can reach the kubelets:

apiVersion: v1
kind: Endpoints
metadata:
  name: prometheus-kube-prometheus-kubelet
  namespace: kube-system
subsets:
  - addresses:
      - ip: 10.2.10.110
        nodeName: node-01-rhel9
      - ip: 10.2.10.120
        nodeName: node-02-rhel9
      - ip: 10.2.10.130
        nodeName: node-03-rhel9
    ports:
      - name: https-metrics
        port: 10250
        protocol: TCP

ServiceAccount for RBAC:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-kubelet
  namespace: monitoring

ServiceMonitor for the kubelet:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet-metrics
  namespace: monitoring
  labels:
    release: prometheus-kube-prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
  namespaceSelector:
    matchNames:
      - kube-system
  endpoints:
    - port: https-metrics      # port 10250 on the kubelet service
      scheme: https
      interval: 30s
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true

ServiceMonitor for the Rook Ceph CSI:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rook-ceph-csi
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: csi-metrics  # label on the service or pod
  namespaceSelector:
    matchNames:
      - rook-ceph      # namespace of the CSI pods
  endpoints:
  - port: csi-http-metrics
    interval: 30s
    path: /metrics
    targetPort: 9080
    scheme: http
    honorLabels: true
  # optional: scrape hostNetwork pods directly via pod IPs
  podTargetLabels:
    - app
    - contains

Place these files in a folder, e.g. rook-prometheus, and then apply them:

kubectl apply -f rook-prometheus/
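After applying, you can verify that the objects exist and that the Prometheus service account can now scrape a kubelet. A sketch, assuming kubectl v1.24+ (for `kubectl create token`) and the node IP from this example:

```shell
# Check that the ServiceMonitors and the Endpoints object were created:
kubectl get servicemonitors -n monitoring
kubectl get endpoints prometheus-kube-prometheus-kubelet -n kube-system

# Request a short-lived token for the Prometheus service account
# and scrape a kubelet with it; this should now return metrics:
TOKEN=$(kubectl -n monitoring create token prometheus-kube-prometheus-prometheus)
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  "https://10.2.10.110:10250/metrics/cadvisor" | head
```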

You may need to restart the kubelet and the container runtime:

systemctl stop kubelet
systemctl stop containerd
systemctl restart containerd
systemctl restart kubelet

Run these commands on every node.
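If you have SSH access to the nodes, the restart can be scripted. A hypothetical helper, assuming the node IPs from this example and passwordless root SSH:

```shell
# Restart containerd and kubelet on each node in turn.
# The IPs and the root@ login are assumptions; adapt to your setup.
for node in 10.2.10.110 10.2.10.120 10.2.10.130; do
  ssh root@"${node}" \
    'systemctl stop kubelet containerd && systemctl restart containerd kubelet'
done
```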

After that you can, for example, query the kubelet_volume_stats_used_bytes metric in the Prometheus dashboard, and you can change alerts from the kubelet_volume_stats_inodes metric to kubelet_volume_stats_used_bytes.
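The same metric can also be queried from the command line through the Prometheus HTTP API. A sketch, assuming the kube-prometheus-stack service is named prometheus-kube-prometheus-prometheus in the monitoring namespace (the default for a release named "prometheus"):

```shell
# Forward the Prometheus web port to localhost in the background:
kubectl -n monitoring port-forward \
  svc/prometheus-kube-prometheus-prometheus 9090:9090 &
sleep 2

# Query the per-volume usage metric via the HTTP API:
curl -s 'http://localhost:9090/api/v1/query?query=kubelet_volume_stats_used_bytes'

# Stop the port-forward again:
kill %1
```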