
2026-02-07 Kubespray HA & Upgrade - 5: Monitoring Setup

Goal: configure etcd metrics collection for monitoring.

Installing the NFS subdir external provisioner

  • Install the NFS subdir external provisioner: an NFS server (/srv/nfs/share) is already configured on admin-lb (see the quick check below)
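
Before installing, it is worth confirming that the export on admin-lb is actually visible (a quick sanity check; assumes the showmount client utility is installed):

# The export list should include /srv/nfs/share
showmount -e 192.168.10.10
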
kubectl create ns nfs-provisioner
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner -n nfs-provisioner \
    --set nfs.server=192.168.10.10 \
    --set nfs.path=/srv/nfs/share \
    --set storageClass.defaultClass=true

Check the StorageClass

kubectl get sc

# Check the pods
kubectl get pod -n nfs-provisioner -owide
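
To verify dynamic provisioning end to end, a throwaway PVC can be created against the new default StorageClass and checked for binding (a minimal sketch; the name test-pvc is arbitrary):

# Create a small test PVC (uses the default StorageClass since none is specified)
cat <<EOT | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Mi
EOT

# Should reach Bound within a few seconds, then clean up
kubectl get pvc test-pvc
kubectl delete pvc test-pvc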

Installing kube-prometheus-stack and adding dashboards

  • Add the helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

# Create the values file
cat <<EOT > monitor-values.yaml
prometheus:
  prometheusSpec:
    scrapeInterval: "20s"
    evaluationInterval: "20s"
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
    additionalScrapeConfigs:
      - job_name: 'haproxy-metrics'
        static_configs:
          - targets:
              - '192.168.10.10:8405'
    externalLabels:
      cluster: "myk8s-cluster"
  service:
    type: NodePort
    nodePort: 30001

grafana:
  defaultDashboardsTimezone: Asia/Seoul
  adminPassword: prom-operator
  service:
    type: NodePort
    nodePort: 30002

alertmanager:
  enabled: false
defaultRules:
  create: false
kubeProxy:
  enabled: false
prometheus-windows-exporter:
  prometheus:
    monitor:
      enabled: false
EOT
  • Deploy
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 80.13.3 \
-f monitor-values.yaml --create-namespace --namespace monitoring
  • Verify
helm list -n monitoring
kubectl get pod,svc,ingress,pvc -n monitoring
kubectl get prometheus,servicemonitors,alertmanagers -n monitoring
kubectl get crd | grep monitoring
  • Open each web UI: access via NodePort
open http://192.168.10.14:30001 # prometheus
open http://192.168.10.14:30002 # grafana : login admin / prom-operator

# Check the Prometheus version
kubectl exec -it sts/prometheus-kube-prometheus-stack-prometheus -n monitoring -c prometheus -- prometheus --version

# Check the Grafana version
kubectl exec -it -n monitoring deploy/kube-prometheus-stack-grafana -- grafana --version
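
The haproxy-metrics job defined in additionalScrapeConfigs can also be confirmed through the Prometheus HTTP API (a quick check; assumes jq is installed):

# List active scrape targets with their job name and health
curl -s http://192.168.10.14:30001/api/v1/targets | \
  jq '.data.activeTargets[] | {job: .labels.job, health: .health}'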

curl -o 12693_rev12.json https://grafana.com/api/dashboards/12693/revisions/12/download
curl -o 15661_rev2.json https://grafana.com/api/dashboards/15661/revisions/2/download
curl -o k8s-system-api-server.json https://raw.githubusercontent.com/dotdc/grafana-dashboards-kubernetes/refs/heads/master/dashboards/k8s-system-api-server.json

# Replace the datasource uid in bulk with sed: use the default datasource uid 'prometheus'
sed -i -e 's/${DS_PROMETHEUS}/prometheus/g' 12693_rev12.json
sed -i -e 's/${DS__VICTORIAMETRICS-PROD-ALL}/prometheus/g' 15661_rev2.json
sed -i -e 's/${DS_PROMETHEUS}/prometheus/g' k8s-system-api-server.json

# Create the my-dashboard ConfigMap: the sidecar container in the Grafana pod watches for the grafana_dashboard="1" label!
kubectl create configmap my-dashboard --from-file=12693_rev12.json --from-file=15661_rev2.json --from-file=k8s-system-api-server.json -n monitoring
kubectl label configmap my-dashboard grafana_dashboard="1" -n monitoring

# Confirm the dashboards were added under the dashboard directory
kubectl exec -it -n monitoring deploy/kube-prometheus-stack-grafana -- ls -l /tmp/dashboards
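
The imported dashboards can also be listed through the Grafana HTTP API (assumes the NodePort and admin credentials set in monitor-values.yaml above, and jq):

# Search all dashboards by title via the Grafana API
curl -s -u admin:prom-operator "http://192.168.10.14:30002/api/search?query=" | jq '.[].title'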

Configuring etcd metrics collection

ssh k8s-node1 ss -tnlp | grep etcd
ssh k8s-node1 ps -ef | grep etcd
cat roles/etcd/templates/etcd.env.j2 | grep -i metric
cat << EOF >> inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
etcd_metrics: true
etcd_listen_metrics_urls: "http://0.0.0.0:2381"
EOF
tail -n 5 inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
  • Monitoring
    • k8s-node1
watch -d "etcdctl.sh member list -w table"
    • admin-lb
while true; do echo ">> k8s-node1 <<"; ssh k8s-node1 etcdctl.sh endpoint status -w table; echo; echo ">> k8s-node2 <<"; ssh k8s-node2 etcdctl.sh endpoint status -w table; echo; echo ">> k8s-node3 <<"; ssh k8s-node3 etcdctl.sh endpoint status -w table; sleep 1; done
  • Restart etcd
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "etcd" --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "etcd" --limit etcd -e kube_version="1.32.9"
  • Verify
ssh k8s-node1 etcdctl.sh member list -w table
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i etcdctl.sh endpoint status -w table; echo; done
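
To confirm the new variables actually landed in the rendered etcd configuration (Kubespray renders etcd.env.j2 to /etc/etcd.env in the default host deployment):

# Each node's rendered config should now contain the metrics listen URL
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i grep -i metrics /etc/etcd.env; echo; done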

  • Check the backups
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i tree /var/backups; echo; done
ssh k8s-node1 ss -tnlp | grep etcd

curl -s http://192.168.10.11:2381/metrics
curl -s http://192.168.10.12:2381/metrics
curl -s http://192.168.10.13:2381/metrics
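
The raw /metrics output is long; filtering for a few standard etcd leadership metrics is enough to confirm the endpoints are healthy:

# Spot-check well-known etcd server metrics on each member
for i in {1..3}; do
  echo ">> 192.168.10.1$i <<"
  curl -s http://192.168.10.1$i:2381/metrics | grep -E '^etcd_server_(has_leader|is_leader|leader_changes_seen_total)'
done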
  • Add the scrape config
cat <<EOF > monitor-add-values.yaml
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: 'etcd'
        metrics_path: /metrics
        static_configs:
          - targets:
              - '192.168.10.11:2381'
              - '192.168.10.12:2381'
              - '192.168.10.13:2381'
EOF
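
Note that Helm replaces list-valued settings rather than merging them, so even with --reuse-values this file's additionalScrapeConfigs will override the haproxy-metrics job defined at install time. If both jobs are still needed, list them together in one file (a sketch combining the two jobs defined earlier):

cat <<EOF > monitor-add-values.yaml
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: 'haproxy-metrics'
        static_configs:
          - targets:
              - '192.168.10.10:8405'
      - job_name: 'etcd'
        metrics_path: /metrics
        static_configs:
          - targets:
              - '192.168.10.11:2381'
              - '192.168.10.12:2381'
              - '192.168.10.13:2381'
EOF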
  • Apply with helm upgrade
helm get values -n monitoring kube-prometheus-stack
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 80.13.3 \
--reuse-values -f monitor-add-values.yaml --namespace monitoring
  • Verify
helm get values -n monitoring kube-prometheus-stack
  • (Optional) Remove the unneeded etcd ServiceMonitor: takes some time to take effect
kubectl get servicemonitors.monitoring.coreos.com -n monitoring kube-prometheus-stack-kube-etcd -o yaml
kubectl delete servicemonitors.monitoring.coreos.com -n monitoring kube-prometheus-stack-kube-etcd
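
Once the etcd targets show up, a quick PromQL query through the API confirms metrics are flowing (assumes jq; etcd_server_has_leader should be 1 on all three members):

# Each etcd member should report has_leader = 1
curl -s 'http://192.168.10.14:30001/api/v1/query?query=etcd_server_has_leader' | \
  jq '.data.result[] | {instance: .metric.instance, value: .value[1]}'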