I. Overview

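Starting with version 4.0, Grafana ships with an Alerting module that extends what Grafana can do. Alerting used to require Alertmanager, but with Grafana's built-in alerting module you can do without it. Grafana can still integrate with Alertmanager, so the setup stays flexible and convenient, and it saves you the maintenance and resource overhead of running one extra component.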

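The figure below outlines how Grafana Alerting works and introduces the key concepts that work together to form the core of its flexible and powerful alerting engine.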

[Figure: how Grafana Alerting works]

Features:

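  • All alerts on one page: a single Grafana Alerting page consolidates Grafana-managed alerts and alerts residing in Prometheus-compatible data sources in one place.
  • Multi-dimensional alerts: an alert rule can create multiple individual alert instances (known as multi-dimensional alerts), giving you a powerful and flexible way to gain visibility into your entire system with a single rule.
  • Routing alerts: each alert instance is routed to a specific contact point based on labels you define. Notification policies are the set of rules that determine where, when, and how alerts are routed to contact points.
  • Silences: silences let you stop receiving persistent notifications from one or more alert rules. You can also partially pause an alert based on certain criteria.
  • Mute timings: with mute timings you can specify time intervals during which you don't want new notifications to be generated or sent. You can also freeze alert notifications for recurring periods, such as during a maintenance window.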
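To make "multi-dimensional alerts" concrete, here is a minimal Python sketch (not Grafana's implementation): one rule compares every series returned by a query against a threshold and yields one alert instance per distinct label set. The evaluate_rule function, the AlertInstance type and the sample CPU data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertInstance:
    labels: tuple   # sorted (name, value) pairs identifying the instance
    value: float
    firing: bool


def evaluate_rule(series, threshold):
    """Evaluate one rule (value > threshold) against every series.

    Each series is a (labels_dict, latest_value) pair. Every distinct label
    set becomes its own alert instance, which is what makes the rule
    multi-dimensional.
    """
    instances = []
    for labels, value in series:
        instances.append(AlertInstance(
            labels=tuple(sorted(labels.items())),
            value=value,
            firing=value > threshold,
        ))
    return instances


if __name__ == "__main__":
    # Hypothetical query result: CPU usage (%) for three hosts
    series = [
        ({"instance": "host-a"}, 92.0),
        ({"instance": "host-b"}, 35.5),
        ({"instance": "host-c"}, 81.2),
    ]
    for inst in evaluate_rule(series, threshold=80):
        print(dict(inst.labels), inst.value, "FIRING" if inst.firing else "Normal")
```

With the sample data, the single rule yields two firing instances (host-a and host-c) and one normal instance (host-b); in Grafana, each instance is then routed and notified on its own.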

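Official documentation: grafana.com/docs/grafan…
For an introduction to Grafana's other modules, see my earlier article: [Cloud Native] Grafana Introduction and Hands-On

The overall alert configuration process is shown below:

[Figure: end-to-end alert configuration process]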

II. Grafana Alerting Module Overview

[Figure: main components of Grafana Alerting]

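  • Alert rules: set the evaluation criteria that determine whether an alert instance fires. An alert rule consists of one or more queries and expressions, a condition, an evaluation frequency, and, optionally, the duration for which the condition must be met.
  • Contact points (i.e., alert channels): define how contacts are notified when an alert fires. Many channel types are supported, such as email, webhook, Alertmanager, DingTalk, and more.
  • Notification policies: set where, when, and how alerts are routed. Each notification policy specifies a set of label matchers that indicate which alerts it is responsible for, and is assigned a contact point made up of one or more notifiers.
  • Silences: suppress notifications for a given time window, for example during a system upgrade or maintenance.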
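Notification policies form a tree that is matched against each alert instance's labels. The minimal sketch below roughly mimics the Alertmanager-style routing that Grafana's policies follow: depth-first, the first matching child wins unless it has "continue matching" enabled, and if no child matches, the parent policy handles the alert. It supports only equality matchers, and the Policy class, route function and contact point names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    contact_point: str
    matchers: dict = field(default_factory=dict)  # label -> required value (equality only)
    continue_matching: bool = False  # also try later siblings after this policy matches
    children: list = field(default_factory=list)

    def matches(self, labels):
        return all(labels.get(k) == v for k, v in self.matchers.items())


def route(policy, labels):
    """Return the policies whose contact points should receive this alert."""
    if not policy.matches(labels):
        return []
    matched = []
    for child in policy.children:
        result = route(child, labels)
        matched.extend(result)
        # Stop at the first matching child unless it asks to continue
        if result and not child.continue_matching:
            break
    # If no child matched, this policy handles the alert itself
    return matched or [policy]


def example_tree():
    # Hypothetical tree: the root (default policy) has no matchers, so it matches everything
    return Policy("default-email", children=[
        Policy("dingtalk-ops", matchers={"team": "ops"}),
        Policy("webhook-db", matchers={"service": "mysql"}, continue_matching=True),
        Policy("alertmanager", matchers={"severity": "critical"}),
    ])


if __name__ == "__main__":
    root = example_tree()
    for labels in ({"team": "ops"}, {"service": "mysql", "severity": "critical"}, {"team": "dev"}):
        print(labels, "->", [p.contact_point for p in route(root, labels)])
```

In this tree, an alert labelled team=ops goes only to dingtalk-ops; one labelled service=mysql and severity=critical goes to both webhook-db and alertmanager because of the continue flag; anything else falls back to default-email.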

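III. Configure the Panel

For panel configuration, see my earlier article: [Cloud Native] Grafana Introduction and Hands-On

IV. Configure Alert Rules

Open the panel editor, either by clicking Edit as shown below or with the keyboard shortcut: select the panel, then press e.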

[Screenshots: creating an alert rule from the panel editor]
Fill in the rule details:
[Screenshot: alert rule details]
Configure the link. It is shown in the alert and lets you jump straight to the related monitoring panel.
[Screenshot: link configuration]

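The alert state changes Normal → Pending → Firing. The rule stays Pending until its condition has held for the configured pending period, and only then starts firing.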
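The Normal → Pending → Firing progression can be modelled as a small state machine. This is a simplified sketch under stated assumptions, not Grafana's scheduler: evaluations happen at a fixed interval, only the Normal, Pending and Firing states are modelled (Grafana's NoData and Error states are left out), and the State, next_state and simulate names are hypothetical.

```python
from enum import Enum


class State(Enum):
    NORMAL = "Normal"
    PENDING = "Pending"
    FIRING = "Firing"


def next_state(state, condition_met, pending_since, now, pending_period):
    """Apply one evaluation result; return (new_state, new_pending_since)."""
    if not condition_met:
        # Condition cleared: a pending alert is dropped, a firing one resolves
        return State.NORMAL, None
    if state == State.NORMAL:
        if pending_period <= 0:
            return State.FIRING, None
        return State.PENDING, now
    if state == State.PENDING and now - pending_since >= pending_period:
        return State.FIRING, None
    return state, pending_since


def simulate(results, interval, pending_period):
    """Run a series of evaluation results taken every `interval` seconds."""
    state, since = State.NORMAL, None
    history = []
    for i, met in enumerate(results):
        state, since = next_state(state, met, since, i * interval, pending_period)
        history.append(state.value)
    return history


if __name__ == "__main__":
    # Evaluate every 60 s with a 2-minute pending period
    print(simulate([False, True, True, True, True, False], interval=60, pending_period=120))
```

With a 60-second interval and a 120-second pending period, the sample run prints ['Normal', 'Pending', 'Pending', 'Firing', 'Firing', 'Normal']: the condition must hold across two intervals before the alert fires, and it returns to Normal (resolved) as soon as the condition clears.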

[Screenshots: alert state changing from Normal to Pending to Firing]

V. Configure Contact Points (Alert Channels)

1) Email

1. Configure SMTP (grafana.ini)

[smtp]
enabled = true
host = "smtp.qq.com:465"
user = "xxxxxx@qq.com"
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
password = "xxxxxx"
;cert_file =
;key_file =
;skip_verify = false
from_address = xxxxxx@qq.com
from_name = Grafana
# EHLO identity in SMTP dialog (defaults to instance_name)
;ehlo_identity = dashboard.example.com
# SMTP startTLS policy (defaults to 'OpportunisticStartTLS')
;startTLS_policy = NoStartTLS

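[Tip] Remember to replace the mailbox and password above with your own. For QQ Mail, the password is usually the SMTP authorization code generated in the mailbox settings rather than your login password.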
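Before restarting Grafana, it can save time to confirm that the SMTP credentials work outside Grafana. Below is a minimal sketch using Python's standard smtplib, assuming the same QQ Mail host and implicit-SSL port 465 as the config above; check_smtp is a hypothetical helper, and its factory parameter exists only so the function can be exercised without a real server.

```python
import smtplib


def check_smtp(host, port, user, password, factory=smtplib.SMTP_SSL, timeout=10):
    """Try to log in to the SMTP server; return (ok, message)."""
    try:
        with factory(host, port, timeout=timeout) as smtp:
            smtp.login(user, password)
        return True, "login succeeded"
    except smtplib.SMTPAuthenticationError as e:
        # Wrong user/password (or, for QQ Mail, not using the authorization code)
        return False, f"authentication failed: {e.smtp_code}"
    except OSError as e:
        # Connection problems; other SMTPException subclasses also derive from OSError
        return False, f"connection failed: {e}"


# Usage with the same values as the [smtp] section of grafana.ini:
# print(check_smtp("smtp.qq.com", 465, "xxxxxx@qq.com", "xxxxxx"))
```

If this reports a successful login, any remaining email problems are more likely in grafana.ini than in the mailbox settings.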

Restart Grafana:

systemctl restart grafana-server

2. Configure a message template

{{ define "myalert" }}
  [{{.Status}}] {{ .Labels.alertname }}
  Labels:
  {{ range .Labels.SortedPairs }}
    {{ .Name }}: {{ .Value }}
  {{ end }}
  {{ if gt (len .Annotations) 0 }}
  Annotations:
  {{ range .Annotations.SortedPairs }}
    {{ .Name }}: {{ .Value }}
  {{ end }}
  {{ end }}
  {{ if gt (len .SilenceURL ) 0 }}
    Silence alert: {{ .SilenceURL }}
  {{ end }}
  {{ if gt (len .DashboardURL ) 0 }}
    Go to dashboard: {{ .DashboardURL }}
  {{ end }}
{{ end }}
{{ define "mymessage" }}
  {{ if gt (len .Alerts.Firing) 0 }}
    {{ len .Alerts.Firing }} firing:
    {{ range .Alerts.Firing }} {{ template "myalert" .}} {{ end }}
  {{ end }}
  {{ if gt (len .Alerts.Resolved) 0 }}
    {{ len .Alerts.Resolved }} resolved:
    {{ range .Alerts.Resolved }} {{ template "myalert" .}} {{ end }}
  {{ end }}
{{ end }}
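The template above lists the firing alerts and then the resolved ones; for each alert it prints the status and alert name, the labels sorted by name, the annotations if there are any, and the silence and dashboard links when they are set. As a rough illustration of what it produces, here is a Python sketch that applies the same logic to the alerts of a Grafana webhook payload (see the sample JSON in the Webhook section). It is not Grafana's template engine, whitespace differs, and render_alert and render_message are hypothetical names.

```python
def render_alert(alert):
    """Mimic the "myalert" template for one alert dict from a webhook payload."""
    labels = alert.get("labels", {})
    lines = [f"[{alert['status']}] {labels.get('alertname', '')}", "Labels:"]
    lines += [f"  {k}: {v}" for k, v in sorted(labels.items())]
    annotations = alert.get("annotations", {})
    if annotations:
        lines.append("Annotations:")
        lines += [f"  {k}: {v}" for k, v in sorted(annotations.items())]
    if alert.get("silenceURL"):
        lines.append(f"Silence alert: {alert['silenceURL']}")
    if alert.get("dashboardURL"):
        lines.append(f"Go to dashboard: {alert['dashboardURL']}")
    return "\n".join(lines)


def render_message(alerts):
    """Mimic the "mymessage" template: firing alerts first, then resolved ones."""
    out = []
    for status in ("firing", "resolved"):
        group = [a for a in alerts if a["status"] == status]
        if group:
            out.append(f"{len(group)} {status}:")
            out += [render_alert(a) for a in group]
    return "\n".join(out)
```

In the contact point's message field, the Go template itself is typically pulled in with {{ template "mymessage" . }}.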

3. Configure the contact point

[Screenshot: email contact point configuration]
Once this is set up, just wait for an alert to fire. An example notification:
[Screenshot: example alert email]

2) Webhook

[Screenshot: webhook contact point configuration]

Example alert JSON payload:

{
  "receiver": "My Super Webhook",
  "status": "firing",
  "orgId": 1,
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "High memory usage",
        "team": "blue",
        "zone": "us-1"
      },
      "annotations": {
        "description": "The system has high memory usage",
        "runbook_url": "https://myrunbook.com/runbook/1234",
        "summary": "This alert was triggered for zone us-1"
      },
      "startsAt": "2021-10-12T09:51:03.157076+02:00",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "https://play.grafana.org/alerting/1afz29v7z/edit",
      "fingerprint": "c6eadffa33fcdf37",
      "silenceURL": "https://play.grafana.org/alerting/silence/new?alertmanager=grafana&matchers=alertname%3DT2%2Cteam%3Dblue%2Czone%3Dus-1",
      "dashboardURL": "",
      "panelURL": "",
      "valueString": "[ metric='' labels={} value=14151.331895396988 ]"
    },
    {
      "status": "firing",
      "labels": {
        "alertname": "High CPU usage",
        "team": "blue",
        "zone": "eu-1"
      },
      "annotations": {
        "description": "The system has high CPU usage",
        "runbook_url": "https://myrunbook.com/runbook/1234",
        "summary": "This alert was triggered for zone eu-1"
      },
      "startsAt": "2021-10-12T09:56:03.157076+02:00",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "https://play.grafana.org/alerting/d1rdpdv7k/edit",
      "fingerprint": "bc97ff14869b13e3",
      "silenceURL": "https://play.grafana.org/alerting/silence/new?alertmanager=grafana&matchers=alertname%3DT1%2Cteam%3Dblue%2Czone%3Deu-1",
      "dashboardURL": "",
      "panelURL": "",
      "valueString": "[ metric='' labels={} value=47043.702386305304 ]"
    }
  ],
  "groupLabels": {},
  "commonLabels": {
    "team": "blue"
  },
  "commonAnnotations": {},
  "externalURL": "https://play.grafana.org/",
  "version": "1",
  "groupKey": "{}:{}",
  "truncatedAlerts": 0,
  "title": "[FIRING:2]  (blue)",
  "state": "alerting",
  "message": "**Firing**\n\nLabels:\n - alertname = T2\n - team = blue\n - zone = us-1\nAnnotations:\n - description = This is the alert rule checking the second system\n - runbook_url = https://myrunbook.com\n - summary = This is my summary\nSource: https://play.grafana.org/alerting/1afz29v7z/edit\nSilence: https://play.grafana.org/alerting/silence/new?alertmanager=grafana&matchers=alertname%3DT2%2Cteam%3Dblue%2Czone%3Dus-1\n\nLabels:\n - alertname = T1\n - team = blue\n - zone = eu-1\nAnnotations:\nSource: https://play.grafana.org/alerting/d1rdpdv7k/edit\nSilence: https://play.grafana.org/alerting/silence/new?alertmanager=grafana&matchers=alertname%3DT1%2Cteam%3Dblue%2Czone%3Deu-1\n"
}
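A receiver usually needs only a few of these fields: the top-level title, plus each alert's name, status and start/end times. Below is a minimal sketch that pulls them out of a payload shaped like the sample above; summarize_payload is a hypothetical helper. It relies on the convention visible in the sample that a still-firing alert carries the zero timestamp 0001-01-01T00:00:00Z in endsAt.

```python
import json


def summarize_payload(raw):
    """Return (title, [(alertname, status, startsAt, endsAt_or_None), ...])."""
    payload = json.loads(raw) if isinstance(raw, (str, bytes)) else raw
    rows = []
    for alert in payload.get("alerts", []):
        ends = alert.get("endsAt", "")
        rows.append((
            alert.get("labels", {}).get("alertname", ""),
            alert.get("status", ""),
            alert.get("startsAt", ""),
            # Still-firing alerts carry the zero timestamp 0001-01-01T00:00:00Z
            None if ends.startswith("0001-") else ends,
        ))
    return payload.get("title", ""), rows
```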

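Here the webhook receiver is written in Python. Given my limited setup, it simply forwards the alerts to email again; in practice, companies usually use a webhook to forward alerts to DingTalk, WeChat, Zabbix, and so on.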
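For the DingTalk case, the core job is converting the Grafana payload into the robot's message format. DingTalk custom robots accept a markdown message shaped like {"msgtype": "markdown", "markdown": {"title": ..., "text": ...}}. The sketch below only builds that body and leaves out the HTTP call, the access token and any request signing. The function name is hypothetical, and the message shape should be checked against DingTalk's robot documentation.

```python
def to_dingtalk_markdown(payload):
    """Convert a Grafana webhook payload into a DingTalk robot markdown message body."""
    title = payload.get("title", "Grafana alert")
    lines = [f"### {title}"]
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append(f"- **{labels.get('alertname', '')}** ({alert.get('status', '')})")
        if "summary" in annotations:
            lines.append(f"  - {annotations['summary']}")
        lines.append(f"  - since {alert.get('startsAt', '')}")
    return {
        "msgtype": "markdown",
        "markdown": {"title": title, "text": "\n".join(lines)},
    }
```

The returned dict would then be POSTed as JSON to the robot's webhook URL, in place of the send_mail step in the script below.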

1. Write the webhook API service

#!/usr/bin/python3
# -*- coding: utf-8 -*-
# @Time     : 2022/12/24 11:03
# @Author   : liugp
# @Email    : liugp@163.com
# @File     : GrafanaWebHook.py
# Dependency: pip3 install flask
import smtplib
from email.mime.text import MIMEText
from email.header import Header
from flask import Flask, request
app = Flask(__name__)
class GrafanaWebHook:
    def __init__(self):
        # Third-party SMTP service settings (replace with your own)
        self.mail_host = "smtp.qq.com"
        self.mail_user = "xxxxxx@qq.com"
        self.mail_pass = "xxxxxx"
        self.sender = "xxxxxx@qq.com"
        self.receiver = "xxxxxx@163.com"  # Recipient: your QQ mailbox or any other address
    def send_mail(self, title, status, alerts):
        """Send one email per alert; return True only if every email was sent."""
        all_sent = True
        for alert in alerts:
            # Grafana builds URLs from its root_url (localhost:3000 by default);
            # rewrite them to an address the recipient can actually open
            panel_url = str(alert.get('panelURL', '')).replace("localhost:3000", "192.168.182.110:3000")
            description = str(alert.get('annotations', {}).get('description', 'N/A'))
            body = ('grafana alert: ' + title +
                    '\nAlert time: ' + str(alert.get('startsAt', '')) +
                    '\nStatus: ' + str(status) +
                    '\nDescription: ' + description +
                    '\nPanel: ' + panel_url +
                    '\nSilence: ' + str(alert.get('silenceURL', '')))
            mail = MIMEText(body, 'plain', 'utf-8')
            mail['From'] = self.sender
            mail['To'] = self.receiver
            mail['Subject'] = Header(title, 'utf-8')
            try:
                # Implicit SSL on port 465; the with-block closes the connection
                with smtplib.SMTP_SSL(self.mail_host, 465) as smtp:
                    smtp.login(self.mail_user, self.mail_pass)
                    smtp.sendmail(self.sender, [self.receiver], mail.as_string())
                print("Email sent:", alert.get('labels', {}).get('alertname'))
            except OSError as e:  # SMTPException and socket errors both subclass OSError
                print("Error: failed to send email", e)
                all_sent = False
        return all_sent
    def getAlertData(self):
        # Parse the JSON body that Grafana POSTs into a dict
        return request.get_json(force=True)
@app.route('/webhook', methods=["POST"])
def webhook_server():
    gw = GrafanaWebHook()
    alertData = gw.getAlertData()
    title = str(alertData.get('title', ''))
    status = alertData.get('status', '')
    alerts = alertData.get('alerts', [])
    if gw.send_mail(title, status, alerts):
        return {"status": "ok"}
    # A non-2xx response lets Grafana record the delivery as failed
    return {"status": "error"}, 500
if __name__ == "__main__":
    app.run(debug=False, host='0.0.0.0', port=18088)

[Tip] Before using it, remember to replace the mailbox addresses and password above with your own!
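Before wiring the service into Grafana, you can exercise it with a hand-made request that imitates Grafana's POST. Below is a minimal sketch using only the standard library, assuming the script above is running locally on port 18088. build_test_request and SAMPLE are hypothetical names, and the sample is a trimmed version of the example payload. Note that actually sending it will trigger a real email.

```python
import json
import urllib.request


def build_test_request(url, payload):
    """Build a POST request that imitates Grafana calling the webhook."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=data, method="POST",
        headers={"Content-Type": "application/json"},
    )


# Trimmed version of the example payload: only the fields the service reads
SAMPLE = {
    "title": "[FIRING:1] test",
    "status": "firing",
    "alerts": [{
        "status": "firing",
        "labels": {"alertname": "High memory usage"},
        "annotations": {"description": "The system has high memory usage"},
        "startsAt": "2021-10-12T09:51:03.157076+02:00",
        "panelURL": "",
        "silenceURL": "",
    }],
}

# With GrafanaWebHook.py running:
# with urllib.request.urlopen(build_test_request("http://127.0.0.1:18088/webhook", SAMPLE)) as resp:
#     print(resp.status, resp.read().decode())
```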

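[Screenshot: starting the webhook service]

2. Configure the contact point in the Grafana UI

[Screenshot: webhook contact point settings]

Once configured, just wait for an alert. An example:

[Screenshot: example alert email forwarded by the webhook]

3) Alertmanager

Configuration:

[Screenshot: Alertmanager contact point configuration]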
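When Grafana forwards alerts to an external Alertmanager, it helps to confirm the Alertmanager side works on its own. Below is a minimal sketch that builds a test alert for Alertmanager's HTTP API v2 (POST /api/v2/alerts, which accepts a JSON array of alerts with labels, annotations, startsAt and endsAt). The default port 9093, the helper name and the labels are assumptions for illustration, and this is not a byte-for-byte copy of what Grafana sends.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_alertmanager_request(base_url, alertname, labels=None, annotations=None, minutes=5):
    """Build a POST to Alertmanager's v2 API carrying one test alert."""
    now = datetime.now(timezone.utc)
    alert = {
        "labels": {"alertname": alertname, **(labels or {})},
        "annotations": annotations or {},
        "startsAt": now.isoformat(),
        # Alertmanager treats the alert as resolved once endsAt has passed
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps([alert]).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# With Alertmanager listening on its default port:
# req = build_alertmanager_request("http://localhost:9093", "GrafanaTest", {"team": "blue"})
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

If the test alert shows up in the Alertmanager UI and reaches its receivers, any remaining problem is on the Grafana contact point side.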
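This article covered three contact point types; you're welcome to try the others yourself. If you have any questions, feel free to leave me a comment. I'll keep publishing articles on cloud native and big data, so stay tuned!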
