Contents
Preface
1. Prometheus + Grafana deployment
2. Data source configuration
3. Monitoring service configuration
4. Alerting configuration
Summary
Preface
Prometheus: scrapes metrics data and stores it.
Grafana: displays the data through ready-made templates or custom dashboards.
1. Prometheus + Grafana deployment
- Create docker-compose.yml
version: '2'
networks:
  monitor:
    driver: bridge
# Application services
services:
  # Monitoring service
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /mydata/pan/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - /mydata/pan/node_down/node_down.yml:/etc/prometheus/node_down.yml
    ports:
      - "9090:9090"
    networks:
      - monitor
  # Alerting component
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /mydata/pan/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    networks:
      - monitor
  # Dashboard UI
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    ports:
      - "3000:3000"
    networks:
      - monitor
  # Collects host (server) metrics
  node-exporter:
    image: quay.io/prometheus/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
    networks:
      - monitor
  # Monitors Docker containers
  cadvisor:
    image: google/cadvisor:latest
    container_name: cadvisor
    hostname: cadvisor
    restart: always
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"
    networks:
      - monitor
- Create alertmanager.yml to configure the sending and receiving mailboxes
global:
  smtp_smarthost: 'smtp.qq.com:465'  # QQ Mail SMTP server
  smtp_from: 'xxxx@qq.com'  # sender address
  smtp_auth_username: 'xxxx@qq.com'  # sender account name, i.e. your own email address
  smtp_auth_password: 'xxxxx'  # sender account password
  smtp_require_tls: false  # do not require TLS
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: live-monitoring
receivers:
  - name: 'live-monitoring'
    email_configs:
      - to: 'xxxx@qq.com'  # recipient address
- Create prometheus.yml
global:
  scrape_interval: 15s  # scrape every 15s; the default is 1 minute
  evaluation_interval: 15s  # evaluate rules every 15s; the default is 1 minute
# Alertmanager settings
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['192.168.2.170:9093']
        # - alertmanager:9093
# Rule files are loaded once and then evaluated periodically according to the global 'evaluation_interval'.
rule_files:
  - "node_down.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"
# Scrape configuration:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['192.168.2.170:9090']
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['192.168.2.170:8080']
  - job_name: 'node'
    scrape_interval: 8s
    static_configs:
      - targets: ['192.168.2.170:9100']
- Add the alert rule file node_down.yml (node_down.yml monitors the Prometheus targets)
groups:
  - name: node_down
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          user: test
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
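To get a feel for how `for: 1m` plays together with the 15s `evaluation_interval` from prometheus.yml, here is a rough Python sketch of the inactive / pending / firing transitions. It is a simplified simulation under my own assumptions, not Prometheus's actual implementation; the function name `alert_states` and its parameters are made up for illustration.

```python
def alert_states(up_samples, eval_interval_s=15, for_s=60):
    """Simulate the state of an `up == 0` alert with a `for` duration.

    up_samples holds one value of `up` per rule evaluation (1 = reachable,
    0 = down). Returns the simulated alert state after each evaluation.
    """
    states = []
    down_since = None  # time at which the condition first became true
    for i, up in enumerate(up_samples):
        now = i * eval_interval_s
        if up != 0:
            # Condition is false: the alert resets completely.
            down_since = None
            states.append("inactive")
            continue
        if down_since is None:
            down_since = now
        # The alert only fires once the condition has held for `for_s` seconds.
        states.append("firing" if now - down_since >= for_s else "pending")
    return states
```

In this sketch a target that goes down stays pending for four evaluations and fires on the fifth, and a single successful scrape in between resets the timer.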
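- Run docker-compose
# Pull the images, then create and start the containers
docker-compose up -d
# Stop and remove the containers
docker-compose down
- Check the result and open the web UIs
Prometheus: http://192.168.2.170:9090/targets
Grafana: http://192.168.2.170:3000
(On first access the username and password are both admin; you can set a new password after the first login.)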
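Besides opening the /targets page in a browser, the same information should be available from the Prometheus HTTP API at `/api/v1/targets`. The sketch below lists the targets that are not healthy. The helper names `fetch_targets` and `unhealthy_targets` are my own, and the address is simply the one used in this article.

```python
import json
from urllib.request import urlopen

PROMETHEUS_URL = "http://192.168.2.170:9090"  # address used in this article


def fetch_targets(base_url=PROMETHEUS_URL):
    """Download the target list from the Prometheus HTTP API."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def unhealthy_targets(targets_response):
    """Return the scrape URLs of active targets whose health is not 'up'."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [t.get("scrapeUrl", "") for t in active if t.get("health") != "up"]
```

Calling `unhealthy_targets(fetch_targets())` on a machine that can reach Prometheus should return an empty list when every job in prometheus.yml is being scraped successfully.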
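2. Data source configuration
Open Grafana and create a Prometheus data source (other data sources such as MySQL or Elasticsearch can be configured as needed).
Select the Prometheus 2.0 Stats dashboard.
Enter the Prometheus address.
Save.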
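The data source can probably also be created without clicking through the UI, using Grafana's HTTP API (`POST /api/datasources`). The sketch below only builds the request and does not send it. It assumes the default admin/admin login mentioned earlier is still valid; the data source name `Prometheus` and the function name `build_datasource_request` are my own choices.

```python
import base64
import json
from urllib.request import Request


def build_datasource_request(grafana_url, prometheus_url,
                             user="admin", password="admin"):
    """Build (but do not send) a request that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",      # display name, chosen for this example
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",         # Grafana's backend queries Prometheus
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

To actually send it, pass the result to `urllib.request.urlopen`, for example with `http://192.168.2.170:3000` as the Grafana URL and `http://192.168.2.170:9090` as the Prometheus URL.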
3. Monitoring service configuration
Grafana dashboard templates can be browsed and downloaded from: Grafana: The open observability platform | Grafana Labs
- Server monitoring (template ID: 8919)
node-exporter must be installed on every server to be monitored:
version: '2'
networks:
  monitor:
    driver: bridge
services:
  node-exporter:
    image: quay.io/prometheus/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
    networks:
      - monitor
Configure prometheus.yml, adding the ip:port of each server to targets (the port is node-exporter's port):
scrape_configs:
  - job_name: 'node'
    scrape_interval: 8s
    static_configs:
      - targets: ['192.168.2.170:9100','192.168.2.169:9100','192.168.1.69:9100']
Restart the Prometheus service and open http://192.168.2.170:9090/targets to check that it is running correctly.
The result is shown below.
- Docker monitoring (template ID: 193)
The cadvisor service must be installed on the server:
version: '2'
networks:
  monitor:
    driver: bridge
services:
  cadvisor:
    image: google/cadvisor:latest
    container_name: cadvisor
    hostname: cadvisor
    restart: always
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"
    networks:
      - monitor
Configure prometheus.yml, adding the ip:port of each Docker host to targets (the port is cadvisor's port):
scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['192.168.2.170:8080','192.168.2.169:8080','192.168.1.69:8080']
Restart the Prometheus service and open http://192.168.2.170:9090/targets to check that it is running correctly.
Open Grafana, select the Docker monitoring template, configure it, and add filter conditions.
The result is shown below.
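- Monitoring a Spring Boot project (template ID: 4701)
Add the relevant dependencies to the Spring project's pom:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
Add the following configuration to the project's application file:
spring:
  application:
    name: test
server:
  port: 11111
management:
  endpoints:
    web:
      exposure:
        # Expose Actuator's /actuator/prometheus endpoint
        include: 'prometheus'
  metrics:
    tags:
      application: ${spring.application.name}
Note: if the project uses a security (permission) framework, add the path /actuator/prometheus to its whitelist.
Start the project; the following console output means it worked:
o.s.b.a.e.web.EndpointLinksResolver : Exposing 1 endpoint(s) beneath base path '/actuator'
Configure prometheus.yml:
scrape_configs:
  - job_name: 'custom-name'
    scrape_interval: 5s
    metrics_path: '/xxxx/actuator/prometheus'  # if the project sets server.servlet.context-path, prefix it here as /xxxx
    static_configs:
      - targets: ['192.168.1.69:18077','192.168.1.69:18078']
Restart the Prometheus service and open http://192.168.2.170:9090/targets to check that it is running correctly.
Check the result.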
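Two details in the Spring Boot setup are easy to get wrong: the `metrics_path` when a context path is set, and whether the `application` tag really ends up on the exported metrics. The sketch below covers both as small pure functions. The names `metrics_path` and `has_application_tag` are hypothetical helpers of mine, and the check is a plain substring match on the text the endpoint returns, not a full parser for the exposition format.

```python
def metrics_path(context_path=""):
    """Build the Prometheus metrics_path for an optional servlet context path."""
    prefix = context_path.strip("/")
    return f"/{prefix}/actuator/prometheus" if prefix else "/actuator/prometheus"


def has_application_tag(exposition_text, application):
    """Check that at least one metric line carries application="<name>"."""
    needle = f'application="{application}"'
    return any(
        needle in line
        for line in exposition_text.splitlines()
        if line and not line.startswith("#")  # skip HELP/TYPE comment lines
    )
```

For the configuration above, the text fetched from the /actuator/prometheus endpoint would be expected to pass `has_application_tag(text, "test")`, since `spring.application.name` is `test`.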
- Monitoring MySQL (template ID: 7362)
Install mysqld-exporter on the server whose MySQL is to be monitored by editing the docker-compose file:
services:
  mysqld-exporter:
    image: prom/mysqld-exporter
    container_name: mysqld-exporter
    hostname: mysqld-exporter
    restart: always
    ports:
      - "9104:9104"
    environment:
      - DATA_SOURCE_NAME=root:root@(192.168.2.169:3306)/  # username:password@(ip:port)
    networks:
      - monitor
Configure prometheus.yml:
scrape_configs:
  - job_name: 'mysql-exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['192.168.2.170:9104']
Restart the Prometheus service and open http://192.168.2.170:9090/targets to check that it is running correctly.
The result is shown below.
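A green target on the /targets page only says the exporter itself is reachable. mysqld-exporter also exports a `mysql_up` metric that is 1 when it can actually connect to MySQL, and this can be checked with an instant query against `/api/v1/query`. A minimal sketch, with helper names (`instant_query_url`, `down_instances`) that are my own:

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for a Prometheus instant query."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def down_instances(query_response):
    """Return the instances whose sample value in the query result is not '1'."""
    results = query_response.get("data", {}).get("result", [])
    # Each vector result looks like {"metric": {...}, "value": [timestamp, "1"]}.
    return [
        r["metric"].get("instance", "")
        for r in results
        if r["value"][1] != "1"
    ]
```

Fetching `instant_query_url("http://192.168.2.170:9090", "mysql_up")` and passing the decoded JSON to `down_instances` should give an empty list while the exporter can log in to the database.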
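4. Alerting configuration
To be added later.
Summary
The overall monitoring flow: the Prometheus configuration file defines how metrics are scraped and stored; Grafana is then pointed at Prometheus as a data source, and a suitable template is chosen to display the data.
This article is only a record of my own learning, kept for later review.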



