Configuring Alertmanager to Send Alerts to a DingTalk Robot



Continued from:

kube-prometheus-stack deployment



1. Install the DingTalk webhook plugin



1.1 Create a DingTalk robot

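Open the DingTalk desktop client, create a group, and add a custom robot to it: Group Settings -> Smart Group Assistant -> Add Robot -> Custom -> Add

Reference: https://developers.dingtalk.com/document/app/custom-robot-access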

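Once the robot is created, click it, open Robot Settings, and note its 'Webhook' URL and 'Security Settings' values. Both are needed for the plugin configuration in section 1.2.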
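As a rough illustration of what the 'Security Settings' secret is for, the sketch below follows the signing scheme described in DingTalk's custom robot documentation: an HMAC-SHA256 over the string "timestamp\nsecret", keyed by the secret, then Base64-encoded and URL-encoded. The function name sign_dingtalk_url and the sample token and secret in the usage note are made up for this example. The webhook plugin used in this post takes the secret in its own config file, so this code is presumably not needed for the deployment itself.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse
from typing import Optional


def sign_dingtalk_url(webhook_url: str, secret: str,
                      timestamp_ms: Optional[int] = None) -> str:
    """Return webhook_url with timestamp and sign query parameters appended.

    Hypothetical helper: assumes webhook_url already carries
    ?access_token=... so the extra parameters are joined with '&'.
    """
    if timestamp_ms is None:
        # The timestamp is expressed in milliseconds.
        timestamp_ms = int(time.time() * 1000)
    string_to_sign = f"{timestamp_ms}\n{secret}".encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), string_to_sign,
                      hashlib.sha256).digest()
    # Base64-encode the raw digest, then URL-encode it for the query string.
    sign = urllib.parse.quote_plus(base64.b64encode(digest))
    return f"{webhook_url}&timestamp={timestamp_ms}&sign={sign}"
```

For example, sign_dingtalk_url("https://oapi.dingtalk.com/robot/send?access_token=TOKEN", "SECexample") returns the same URL with &timestamp=...&sign=... appended; TOKEN and SECexample are placeholders, not real credentials.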




1.2 Install the DingTalk webhook plugin

GitHub repository: https://github.com/timonwong/prometheus-webhook-dingtalk

  • Adjust dingtalk-webhook.yaml to your needs (the quoted 'EOF' keeps the shell from expanding anything inside the manifest)
cat > dingtalk-webhook.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: dingtalk-template
  namespace: monitoring
  labels:
    app: dingtalk
data:
  template.tmpl: |
    {{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}

    {{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }}

    {{ define "__text_alert_list" }}{{ range . }}
    **Labels**
    {{ range .Labels.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
    {{ end }}
    **Annotations**
    {{ range .Annotations.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
    {{ end }}
    **Source:** [{{ .GeneratorURL }}]({{ .GeneratorURL }})
    {{ end }}{{ end }}

    {{/* Firing */}}

    {{ define "default.__text_alert_list" }}{{ range . }}

    **Trigger Time:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}

    **Summary:** {{ .Annotations.summary }}

    **Description:** {{ .Annotations.description }}

    **Graph:** [📈 ]({{ .GeneratorURL }})

    **Details:**
    {{ range .Labels.SortedPairs }}{{ if and (ne (.Name) "severity") (ne (.Name) "summary") }}> - {{ .Name }}: {{ .Value | markdown | html }}
    {{ end }}{{ end }}
    {{ end }}{{ end }}

    {{/* Resolved */}}

    {{ define "default.__text_resolved_list" }}{{ range . }}

    **Trigger Time:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}

    **Resolved Time:** {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}

    **Summary:** {{ .Annotations.summary }}

    **Graph:** [📈 ]({{ .GeneratorURL }})

    **Details:**
    {{ range .Labels.SortedPairs }}{{ if and (ne (.Name) "severity") (ne (.Name) "summary") }}> - {{ .Name }}: {{ .Value | markdown | html }}
    {{ end }}{{ end }}
    {{ end }}{{ end }}


    {{/* Default */}}
    {{ define "default.title" }}{{ template "__subject" . }}{{ end }}
    {{ define "default.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
    {{ if gt (len .Alerts.Firing) 0 -}}


    **Alerts Firing**
    {{ template "default.__text_alert_list" .Alerts.Firing }}
    {{- end }}
    {{ if gt (len .Alerts.Resolved) 0 -}}


    **Alerts Resolved**
    {{ template "default.__text_resolved_list" .Alerts.Resolved }}
    {{- end }}
    {{- end }}


    {{/* Legacy */}}
    {{ define "legacy.title" }}{{ template "__subject" . }}{{ end }}
    {{ define "legacy.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
    {{ template "__text_alert_list" .Alerts.Firing }}
    {{- end }}

    {{/* Following names for compatibility */}}
    {{ define "ding.link.title" }}{{ template "default.title" . }}{{ end }}
    {{ define "ding.link.content" }}{{ template "default.content" . }}{{ end }}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: dingtalk-config
  namespace: monitoring
  labels:
    app: dingtalk
data:
  config.yaml: |
    templates:
    - /config/template.tmpl
    targets:
      webhook:
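        # Replace url and secret with the 'Webhook' and 'Security Settings' values of your own robot (section 1.1)
        url: "https://oapi.dingtalk.com/robot/send?access_token=6d234cf06a3a3b13d20cf84fd44654f"
        # secret for signature
        secret: "SECf84a548ebc0112e095d9aba3b5bc51a1948292191"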
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager-webhook-dingtalk
  namespace: monitoring
  labels:
    app: webhook-dingtalk
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webhook-dingtalk
  template:
    metadata:
      labels:
        app: webhook-dingtalk
    spec:
      containers:
      - image: timonwong/prometheus-webhook-dingtalk:v2.1.0
        name: prometheus-webhook-dingtalk
        args:
        - --web.listen-address=:8060
        - --config.file=/etc/prometheus-webhook-dingtalk/config.yaml
        volumeMounts:
        - name: webdingtalk-configmap
          mountPath: /etc/prometheus-webhook-dingtalk/
        - name: webdingtalk-template
          mountPath: /config
        ports:
        - containerPort: 8060
          protocol: TCP
          name: http
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
      volumes:
      - name: webdingtalk-configmap
        configMap:
          name: dingtalk-config
      - name: webdingtalk-template
        configMap:
          name: dingtalk-template
---
apiVersion: v1
kind: Service
metadata:
  name: webhook-dingtalk
  namespace: monitoring
  labels:
    app: webhook-dingtalk
spec:
  ports:
  - name: http
    port: 8060
    protocol: TCP
    targetPort: 8060
  selector:
    app: webhook-dingtalk
  type: ClusterIP
EOF
  • Deploy the resources to the cluster
kubectl apply -f dingtalk-webhook.yaml
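
Before wiring up Alertmanager, it can help to check the plugin on its own. The sketch below builds a payload in the shape Alertmanager uses for webhook receivers (version "4") and posts it to the plugin. It assumes the service has been made reachable locally, for instance with kubectl -n monitoring port-forward svc/webhook-dingtalk 8060:8060. The alert name, annotations, the example.invalid URLs, and the helper names build_test_payload and send_alert are all invented for this test.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_payload(alertname: str = "DingTalkSmokeTest") -> dict:
    """Build a minimal firing-alert payload in Alertmanager's webhook format."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    labels = {"alertname": alertname, "severity": "info",
              "namespace": "monitoring"}
    annotations = {"summary": "Smoke test alert",
                   "description": "Sent by hand to verify the DingTalk webhook."}
    return {
        "version": "4",
        "groupKey": '{}:{alertname="%s"}' % alertname,
        "status": "firing",
        "receiver": "webhook1",
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "http://alertmanager.example.invalid",  # placeholder
        "alerts": [{
            "status": "firing",
            "labels": labels,
            "annotations": annotations,
            "startsAt": now,
            "endsAt": "0001-01-01T00:00:00Z",  # zero time: not resolved yet
            "generatorURL": "http://prometheus.example.invalid/graph",  # placeholder
        }],
    }


def send_alert(payload: dict, url: str) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

With the port-forward running, send_alert(build_test_payload(), "http://localhost:8060/dingtalk/webhook/send") should deliver a real message to the DingTalk group if the URL and secret in dingtalk-config are valid, so use it sparingly.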



2. Alertmanager configuration



2.1 Modify the helm chart's values.yaml

  • Edit values.yaml
vim values.yaml

#### With several clusters in play, add a cluster-name label to the default alert rules so alerts can be told apart by cluster
defaultRules:
  additionalRuleLabels:
    clusterName: Cluster-test

#### Configure Alertmanager
alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname', 'alertstate', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 2h
      receiver: 'webhook1'
    receivers:
    - name: 'webhook1'
      webhook_configs:
      - url: 'http://webhook-dingtalk:8060/dingtalk/webhook/send'  # the 'webhook' path segment must match a key under targets in the dingtalk-config ConfigMap; change it if your target has a different name
        send_resolved: true
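
The relationship between the receiver URL and the plugin's targets is easy to get wrong when a target is renamed. As a small sanity check, the sketch below pulls the target name out of a URL of the form /dingtalk/<target>/send and compares it with the names configured under targets. The helper names target_from_receiver_url and is_known_target are hypothetical, and the set of target names is passed in by hand rather than read from the ConfigMap.

```python
from urllib.parse import urlparse


def target_from_receiver_url(url: str) -> str:
    """Extract <target> from a .../dingtalk/<target>/send receiver URL."""
    parts = [part for part in urlparse(url).path.split("/") if part]
    if len(parts) != 3 or parts[0] != "dingtalk" or parts[2] != "send":
        raise ValueError(f"unexpected webhook path in {url!r}")
    return parts[1]


def is_known_target(url: str, configured_targets: set) -> bool:
    """Return True if the URL's target appears among the configured names."""
    return target_from_receiver_url(url) in configured_targets
```

With the configuration from this post, is_known_target("http://webhook-dingtalk:8060/dingtalk/webhook/send", {"webhook"}) holds, while a URL ending in /dingtalk/webhook2/send would not match.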
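  • Upgrade the release
helm upgrade prometheus . -n monitoring
#### Check that the pods are healthy
kubectl get pod -n monitoring
#### Check the Alertmanager logs for errors (the pod name depends on the release name; take it from the output of the previous command)
kubectl logs -n monitoring alertmanager-prometheus-kube-prometheus-alertmanager-0 -c alertmanager -f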
  • View the alerts: once an alert fires, the robot's message should appear in the DingTalk group.



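Copyright notice: This is an original article by cl18707602767, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.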