
Setting Up and Using Prometheus


Installing Prometheus:

1. Create the user and group: useradd -m -s /bin/false prometheus

2. Create the directories:

mkdir /etc/prometheus

mkdir /var/lib/prometheus

3. Set ownership: chown prometheus /var/lib/prometheus/

4. Download the release archive:

wget https://github.com/prometheus/prometheus/releases/download/v2.23.0/prometheus-2.23.0.linux-amd64.tar.gz

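5. Extract it: tar zxvf prometheus-2.23.0.linux-amd64.tar.gz

6. Enter the extracted directory: cd prometheus-2.23.0.linux-amd64

7. Copy the binaries into the PATH, and copy the console files that the service unit in step 12 points to:

cp prometheus /usr/local/bin

cp promtool /usr/local/bin

cp -r consoles console_libraries /etc/prometheus

8. Edit the configuration: vim /etc/prometheus/prometheus.yml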

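global:
  scrape_interval: 15s      # scrape targets every 15 seconds; the default is 1 minute
  evaluation_interval: 15s  # evaluate rules every 15 seconds; the default is 1 minute
  scrape_timeout: 15s       # per-scrape timeout; the default is 10s, and it must not exceed scrape_interval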

9. Open the port in the firewall: iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 9090 -j ACCEPT

10. Save the rule: service iptables save

11. Restart iptables: /bin/systemctl restart iptables.service

12. Create the service unit: vi /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus Time Series Collection and Processing Server
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target

13. Reload systemd: systemctl daemon-reload

14. Start the service and enable it at boot: systemctl start prometheus && systemctl enable prometheus

15. Check its status: systemctl status prometheus

16. Open the web UI: http://<server-ip>:9090
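The configuration from step 8 can be staged and checked before it goes live. The sketch below is one possible way to do that, not part of the original procedure: it writes the global block to a scratch directory (STAGE_DIR is a name invented for this example) and runs promtool against it only if promtool is on the PATH.

```shell
# Sketch: stage prometheus.yml in a scratch directory and validate it before
# copying it to /etc/prometheus. STAGE_DIR is a made-up name for this example.
STAGE_DIR="${STAGE_DIR:-$(mktemp -d)}"

# The quoted 'EOF' keeps the shell from expanding anything inside the file.
cat > "$STAGE_DIR/prometheus.yml" <<'EOF'
global:
  scrape_interval: 15s      # scrape targets every 15 seconds
  evaluation_interval: 15s  # evaluate rules every 15 seconds
  scrape_timeout: 15s       # must not exceed scrape_interval
EOF

# promtool is installed alongside prometheus in step 7; skip the check if absent.
if command -v promtool >/dev/null 2>&1; then
  promtool check config "$STAGE_DIR/prometheus.yml"
fi

echo "staged config: $STAGE_DIR/prometheus.yml"
```

Once the check passes, the staged file can be copied to /etc/prometheus/prometheus.yml and the service restarted.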

Installing node_exporter (the program that collects host metrics):

1. Create the user: useradd -m -s /bin/false node_exporter

2. Download the release archive:

wget https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz

3. Extract it: tar zxvf node_exporter-1.0.1.linux-amd64.tar.gz

4. Copy the binary into the PATH: cp node_exporter-1.0.1.linux-amd64/node_exporter /usr/local/bin

5. Set ownership: chown node_exporter:node_exporter /usr/local/bin/node_exporter

6. Create the service unit: vi /etc/systemd/system/node_exporter.service

[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

7. Reload systemd: systemctl daemon-reload

8. Start the service and enable it at boot: systemctl start node_exporter && systemctl enable node_exporter

9. Alternatively, do both in a single command instead of step 8: systemctl enable --now node_exporter.service

10. Check its status: systemctl status node_exporter

11. Open the port in the firewall: iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 9100 -j ACCEPT

12. Save the rule: service iptables save

13. Restart iptables: /bin/systemctl restart iptables.service

14. Add the exporter to the Prometheus configuration: vim /etc/prometheus/prometheus.yml

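scrape_configs:
  - job_name: 'prometheus'        # Prometheus scraping its own metrics
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'     # job name
    static_configs:
      - targets: ['192.168.6.160:9100']   # exporter IP and port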

15. Restart Prometheus: systemctl restart prometheus

16. Check the exporter in a browser: http://<server-ip>:9100/metrics
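The /metrics page is plain text, so the same check can be made from a terminal. As a minimal sketch (the helper name count_samples is invented here), the function below reads the exporter's output on standard input and counts the sample lines whose metric name starts with a given prefix, skipping the # HELP and # TYPE comment lines.

```shell
# count_samples PREFIX: read Prometheus text exposition format on stdin and
# print how many sample lines have a metric name starting with PREFIX.
# Lines beginning with '#' (HELP/TYPE metadata) are ignored.
count_samples() {
  awk -v p="$1" '!/^#/ && index($0, p) == 1 { n++ } END { print n + 0 }'
}

# Intended use against the exporter (address taken from step 14):
#   curl -s http://192.168.6.160:9100/metrics | count_samples node_cpu_seconds_total
```

A result of 0 for node_cpu_seconds_total would suggest the exporter is reachable but not exposing CPU metrics, which is worth knowing before the CPU alert rule is added.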

Installing mysql_exporter (database monitoring):

Installing from the release archive:

1. Download the release archive:

wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.12.1/mysqld_exporter-0.12.1.linux-amd64.tar.gz

2. Extract it: tar zxvf mysqld_exporter-0.12.1.linux-amd64.tar.gz

3. Move and rename the directory: mv mysqld_exporter-0.12.1.linux-amd64 /usr/local/mysql_exporter

4. Create the credentials file: vim /usr/local/mysql_exporter/.my.cnf

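[client]
user=<account>
password=<password>
host=<ip>
port=<port>

Note: the database must already have an account that has been granted access for the exporter.

5. Enter the directory: cd /usr/local/mysql_exporter/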

6. Open the port in the firewall: iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 9104 -j ACCEPT

7. Save the rule: service iptables save

8. Restart iptables: /bin/systemctl restart iptables.service

9. Start the exporter in the background: ./mysqld_exporter --config.my-cnf=.my.cnf &

10. Add the exporter to the Prometheus configuration: vim /etc/prometheus/prometheus.yml

  # append under scrape_configs:
  - job_name: '103.138.75.156'
    static_configs:
      - targets: ['103.138.75.156:9104']   # mysqld_exporter IP and port

11. Restart Prometheus: systemctl restart prometheus

12. Check the exporter in a browser: http://<server-ip>:9104/metrics
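The note in step 4 says the database needs an authorized account but does not show how to create one. A minimal sketch follows, using the privileges listed in the mysqld_exporter README. The account name exporter, the localhost host part, and the password placeholder are assumptions for this example; if the exporter runs on a different machine from the database, the host part has to match that machine instead.

```shell
# Sketch: write the SQL for a dedicated monitoring account to a file for review.
# 'exporter', 'localhost' and 'change-me' are placeholders, not values from the article.
SQL_FILE="${SQL_FILE:-$(mktemp)}"

cat > "$SQL_FILE" <<'EOF'
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'change-me' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
EOF

# Apply it on the database server after review, for example:
#   mysql -u root -p < "$SQL_FILE"
echo "staged SQL: $SQL_FILE"
```

The account and password created this way are the ones that go into the user and password fields of .my.cnf.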

Installing with Docker:

1. Create a network: docker network create my-mysql-network

2. Create the mysql_exporter container:

docker run -d -p 9104:9104 --network my-mysql-network --restart="always" --name test -e DATA_SOURCE_NAME="test:123456@(103.138.75.103:33007)/" prom/mysqld-exporter

In this command the database account is test, the password is 123456, the database IP is 103.138.75.103, and the port is 33007.

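3. Open the port in the firewall: iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 9104 -j ACCEPT

4. Save the rule: service iptables save

5. Restart iptables: /bin/systemctl restart iptables.service

6. Restart Docker so that it recreates its own iptables rules, which the iptables restart removes: systemctl restart docker

7. Start the container: docker start test

8. Add the exporter to the Prometheus configuration: vim /etc/prometheus/prometheus.yml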

  # append under scrape_configs:
  - job_name: '103.138.75.156'
    static_configs:
      - targets: ['103.138.75.156:9104']   # mysqld_exporter IP and port

9. Restart Prometheus: systemctl restart prometheus

10. Check the exporter in a browser: http://<server-ip>:9104/metrics

Installing blackbox-exporter (network monitoring; this example monitors ping and installs with Docker):

1. Pull the image: docker pull prom/blackbox-exporter

2. Create the container: docker run -d -p 9115:9115 --name blackbox-exporter prom/blackbox-exporter

3. Open the port in the firewall: iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 9115 -j ACCEPT

4. Save the rule: service iptables save

5. Restart iptables: /bin/systemctl restart iptables.service

6. Restart Docker: systemctl restart docker

7. Start the container: docker start blackbox-exporter

8. Add the probe job to the Prometheus configuration: vim /etc/prometheus/prometheus.yml

  # append under scrape_configs:
  - job_name: 'blackbox_ping'   # job name
    scrape_interval: 1s         # probe interval
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
          - 103.138.75.156      # address to ping
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 103.138.75.53:9115   # IP and port of the blackbox-exporter that sends the ping

9. Restart Prometheus: systemctl restart prometheus

10. Check the probe in a browser: http://103.138.75.53:9115/probe?target=103.138.75.156&module=icmp
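The relabel_configs in step 8 make Prometheus request exactly the kind of URL shown in step 10: the configured target becomes the target query parameter, and the request itself goes to the exporter's address. The sketch below makes that mapping explicit with two helpers invented for this example, probe_url and probe_ok; the latter reads the exporter's response and succeeds only if it contains probe_success 1.

```shell
# probe_url EXPORTER TARGET MODULE: print the URL Prometheus ends up requesting
# after relabeling (EXPORTER is host:port of the blackbox-exporter).
probe_url() {
  printf 'http://%s/probe?target=%s&module=%s\n' "$1" "$2" "$3"
}

# probe_ok: read the exporter's response on stdin; exit 0 only if the probe succeeded.
probe_ok() {
  grep -q '^probe_success 1$'
}

probe_url 103.138.75.53:9115 103.138.75.156 icmp
# prints http://103.138.75.53:9115/probe?target=103.138.75.156&module=icmp

# Intended use from a terminal:
#   curl -s "$(probe_url 103.138.75.53:9115 103.138.75.156 icmp)" | probe_ok && echo reachable
```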

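Installing Grafana:

1. Download the package: wget https://dl.grafana.com/oss/release/grafana-7.2.1-1.x86_64.rpm

2. Install it: yum install grafana-7.2.1-1.x86_64.rpm -y

3. Start the service and enable it at boot: systemctl start grafana-server && systemctl enable grafana-server

4. Open the port in the firewall: iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 3000 -j ACCEPT

5. Save the rule: service iptables save

6. Restart iptables: /bin/systemctl restart iptables.service

7. Open the web UI: http://<server-ip>:3000

8. The default username and password are both admin.

Dashboard templates can be downloaded from: https://grafana.com/grafana/dashboards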

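Installing the pie chart panel:

1. Download it: git clone https://github.com/grafana/piechart-panel.git --branch release-1.3.8

2. Edit the Grafana configuration: vim /etc/grafana/grafana.ini

# path to the cloned piechart-panel directory
[plugin.piechart]
path = /home/your/clone/dir/piechart-panel

3. Restart Grafana: service grafana-server restart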

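Adding dashboard templates (provided the exporters above are already configured, the templates work as soon as they are imported):

Host dashboard: https://grafana.com/grafana/dashboards/8307

MySQL dashboard: https://grafana.com/grafana/dashboards/7362

MySQL replication dashboard: https://grafana.com/grafana/dashboards/7371

Ping dashboard: https://grafana.com/grafana/dashboards/12275

1. Add a dashboard.

2. Optionally create a folder for it.

3. Import the template.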

Alerting:

1. Download the DingTalk webhook plugin:

wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz

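2. Extract it: tar -zxf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz -C /opt/

3. Rename the directory:

mv /opt/prometheus-webhook-dingtalk-0.3.0.linux-amd64 /opt/prometheus-webhook-dingtalk

4. Create a DingTalk robot.

Note: if messages with other wording should also trigger the robot, add those words as keywords in the robot's settings as well.

5. Create a service unit so the plugin is easy to start: vim /etc/systemd/system/prometheus-webhook-dingtalk.service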

[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/opt/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --ding.profile=ops_dingding=<your DingTalk robot webhook URL>

[Install]
WantedBy=multi-user.target

6. Reload systemd: systemctl daemon-reload

7. Start the service and enable it at boot:

systemctl start prometheus-webhook-dingtalk && systemctl enable prometheus-webhook-dingtalk

8. Check that it is listening: ss -tnl | grep 8060

9. Test the robot directly (the message text 告警 means "alert"):

curl -H "Content-Type: application/json" -d '{"msgtype":"text","text":{"content":"告警"}}' <your DingTalk robot webhook URL>

10. Install Alertmanager:

wget https://github.com/prometheus/alertmanager/releases/download/v0.19.0/alertmanager-0.19.0.linux-amd64.tar.gz

11. Extract it: tar xf alertmanager-0.19.0.linux-amd64.tar.gz

12. Move and rename the directory: mv alertmanager-0.19.0.linux-amd64 /opt/alertmanager

13. Edit the configuration: vim /opt/alertmanager/alertmanager.yml

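global:
  resolve_timeout: 1m       # an alert is treated as resolved if it has not been updated for 1 minute
route:
  group_by: ['alertname']   # label used to group alerts
  group_wait: 10s           # wait 10s after the first alert so others in the same group go out together
  group_interval: 10s       # wait before notifying about new alerts added to a group that was already notified
  repeat_interval: 1h       # wait before repeating a notification for alerts that are still firing
  receiver: 'warning'       # default receiver, usually named after the robot
  routes:
    - receiver: 'warning'   # receiver for this route
      group_wait: 10s
      match_re:
        alertname: 内存使用率|CPU使用率|磁盘使用率   # alert names from the rule files (the alert: field): memory, CPU and disk usage
receivers:
  - name: 'warning'
    webhook_configs:
      - url: 'http://localhost:8060/dingtalk/ops_dingding/send'   # ops_dingding is the profile name set in the webhook service unit
        send_resolved: true   # also notify when the alert is resolved
# Inhibition: while a critical alert is firing, matching warning alerts are muted,
# provided both alerts carry the same values for the labels listed under equal.
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']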

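14. Enter the directory: cd /opt/alertmanager/

15. Start it in the background:

./alertmanager --config.file=alertmanager.yml --cluster.advertise-address=0.0.0.0:9093 &

16. Check that it is listening: netstat -anput | grep 9093

17. Create the rules directory next to the Prometheus configuration: mkdir /etc/prometheus/rules

18. Define a rule: vim /etc/prometheus/rules/cpu_usage.yml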

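groups:
  - name: CPU告警规则   # group name ("CPU alert rules")
    rules:
      - alert: CPU使用率   # "CPU usage"; must match the alertname used in alertmanager.yml
        expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]))) * 100 > 80
        for: 1m   # the condition must hold for 1 minute before the alert fires
        labels:
          user: prometheus   # custom label; the author notes that root sometimes causes errors
          severity: warning
        annotations:
          description: "Server CPU usage is above 80% (current: {{ $value }}%)"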

19. Edit the Prometheus configuration: vim /etc/prometheus/prometheus.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 103.138.75.3:9093   # IP of the Prometheus host, where Alertmanager also runs, and the Alertmanager port

rule_files:
  - "rules/*.yml"   # rule path, relative to the directory that holds prometheus.yml

20. Restart Prometheus to apply the changes: systemctl restart prometheus
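The curl test in step 9 sends a hand-written JSON body. When the message text comes from a variable, quotes inside it would break that JSON, so a small helper can build the body instead. This is a sketch only: dingtalk_text_payload and WEBHOOK_URL are names invented here, and the helper escapes backslashes and double quotes but does not handle multi-line text.

```shell
# dingtalk_text_payload CONTENT: print the JSON body for a DingTalk text message.
# Only backslashes and double quotes are escaped; newlines are not handled.
dingtalk_text_payload() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"msgtype":"text","text":{"content":"%s"}}\n' "$escaped"
}

dingtalk_text_payload 'alert: CPU "high"'
# prints {"msgtype":"text","text":{"content":"alert: CPU \"high\""}}

# Intended use (WEBHOOK_URL stands for your robot's webhook address):
#   curl -H "Content-Type: application/json" -d "$(dingtalk_text_payload '告警 test')" "$WEBHOOK_URL"
```

As with the original test, the message text still has to contain one of the robot's keywords to be accepted.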

Alert rule templates: https://awesome-prometheus-alerts.grep.to/rules#prometheus-self-monitoring

Alert when CPU usage exceeds 80%:

groups:
  - name: CPU告警规则   # "CPU alert rules"
    rules:
      - alert: CPU使用率   # "CPU usage"
        expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]))) * 100 > 80
        for: 1m
        labels:
          user: prometheus
          severity: warning
        annotations:
          description: "Server CPU usage is above 80% (current: {{ $value }}%)"
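Rule files are indentation-sensitive, so it helps to validate one before Prometheus loads it. The sketch below is an assumption-laden example rather than part of the original steps: it writes the CPU rule to a scratch directory (RULES_DIR is a name invented here) and runs promtool check rules on it only when promtool is on the PATH. The quoted heredoc keeps the shell from touching {{ $value }}.

```shell
# Sketch: stage a rule file and validate it before moving it into the rules directory.
# RULES_DIR is a made-up name for this example.
RULES_DIR="${RULES_DIR:-$(mktemp -d)}"

# Quoting 'EOF' prevents the shell from expanding $value inside the template.
cat > "$RULES_DIR/cpu_usage.yml" <<'EOF'
groups:
  - name: CPU告警规则
    rules:
      - alert: CPU使用率
        expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]))) * 100 > 80
        for: 1m
        labels:
          user: prometheus
          severity: warning
        annotations:
          description: "Server CPU usage is above 80% (current: {{ $value }}%)"
EOF

# promtool ships in the Prometheus release archive; skip the check if it is absent.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$RULES_DIR/cpu_usage.yml"
fi

echo "staged rule: $RULES_DIR/cpu_usage.yml"
```

The same check applies to every rule file in this section; a file that fails it would also stop Prometheus from starting after the restart.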

Alert when memory usage exceeds 75%:

groups:
  - name: 内存告警规则   # "memory alert rules"
    rules:
      - alert: 内存使用率   # "memory usage"
        expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 75
        for: 1m
        labels:
          user: prometheus
          severity: warning
        annotations:
          description: "Memory usage is above 75% (current: {{ $value }}%)"

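Alert when disk usage exceeds 85% (the original heading said 90% while the rule used 85, and the expression compared free space rather than used space; the rule below uses 85% used throughout):

groups:
  - name: 磁盘告警规则   # "disk alert rules"
    rules:
      - alert: 磁盘使用率   # "disk usage"
        expr: 100 - (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes > 85 and ON (instance, device, mountpoint) node_filesystem_readonly == 0
        for: 1m
        labels:
          user: prometheus
          severity: warning
        annotations:
          description: "Disk partition usage is above 85%"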

Alert when ping latency exceeds 300 ms:

groups:
  - name: ping告警   # "ping alerts"
    rules:
      - alert: ping告警   # "ping alert"
        expr: avg_over_time(probe_icmp_duration_seconds[1m]) > 0.3
        for: 1m
        labels:
          user: prometheus
          severity: warning
        annotations:
          summary: Blackbox probe slow ping (instance {{ $labels.instance }})
          description: "Blackbox ping took more than 300ms\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert when a host goes down:

groups:
  - name: 主机告警规则   # "host alert rules"
    rules:
      - alert: 主机失去联系   # "host unreachable"
        expr: up == 0
        for: 1m
        labels:
          user: prometheus
          severity: warning
        annotations:
          description: "Host {{ $labels.instance }} has been unreachable for more than 1 minute"

MySQL replication alerts:

First rule (IO thread):

groups:
  - name: 主从告警规则   # "replication alert rules"
    rules:
      - alert: 主从告警   # "replication alert"
        expr: mysql_slave_status_master_server_id > 0 and ON (instance) mysql_slave_status_slave_io_running == 0
        for: 0m
        labels:
          user: prometheus
          severity: critical
        annotations:
          summary: MySQL Slave IO thread not running (instance {{ $labels.instance }})
          description: "MySQL Slave IO thread not running on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Second rule (SQL thread):

groups:
  - name: 主从告警规则   # "replication alert rules"
    rules:
      - alert: 主从告警   # "replication alert"
        expr: mysql_slave_status_master_server_id > 0 and ON (instance) mysql_slave_status_slave_sql_running == 0
        for: 0m
        labels:
          user: prometheus
          severity: critical
        annotations:
          summary: MySQL Slave SQL thread not running (instance {{ $labels.instance }})
          description: "MySQL Slave SQL thread not running on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert when the database is down:

groups:
  - name: 数据库告警规则   # "database alert rules"
    rules:
      - alert: 数据库失去联系   # "database unreachable"
        expr: mysql_up == 0
        for: 0m
        labels:
          user: prometheus
          severity: critical
        annotations:
          summary: MySQL down (instance {{ $labels.instance }})
          description: "MySQL instance is down on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert when more than 70% of MySQL connections are in use (threads_connected is a gauge, so the rule uses max_over_time rather than rate):

groups:
  - name: 连接告警规则   # "connection alert rules"
    rules:
      - alert: 连接告警   # "connection alert"
        expr: avg by (instance) (max_over_time(mysql_global_status_threads_connected[1m])) / avg by (instance) (mysql_global_variables_max_connections) * 100 > 70
        for: 1m
        labels:
          user: prometheus
          severity: warning
        annotations:
          summary: MySQL too many connections (> 70%) (instance {{ $labels.instance }})
          description: "More than 70% of MySQL connections are in use on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert when running threads exceed 60% of the MySQL connection limit (threads_running is a gauge, so the rule uses max_over_time rather than rate):

groups:
  - name: 高线程运行告警规则   # "high running threads alert rules"
    rules:
      - alert: 高线程运行告警   # "high running threads alert"
        expr: avg by (instance) (max_over_time(mysql_global_status_threads_running[1m])) / avg by (instance) (mysql_global_variables_max_connections) * 100 > 60
        for: 1m
        labels:
          user: prometheus
          severity: warning
        annotations:
          summary: MySQL high threads running (instance {{ $labels.instance }})
          description: "More than 60% of MySQL connections are in running state on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert when MySQL replication lag exceeds 30 s:

groups:
  - name: 数据库复制延迟告警规则   # "replication lag alert rules"
    rules:
      - alert: 数据库复制延迟告警   # "replication lag alert"
        expr: mysql_slave_status_master_server_id > 0 and ON (instance) (mysql_slave_status_seconds_behind_master - mysql_slave_status_sql_delay) > 30
        for: 1m
        labels:
          user: prometheus
          severity: critical
        annotations:
          summary: MySQL Slave replication lag (instance {{ $labels.instance }})
          description: "MySQL replication lag on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert on MySQL slow queries:

groups:
  - name: 数据库慢查询告警规则   # "slow query alert rules"
    rules:
      - alert: 慢查询告警   # "slow query alert"
        expr: increase(mysql_global_status_slow_queries[1m]) > 0
        for: 1m
        labels:
          user: prometheus
          severity: warning
        annotations:
          summary: MySQL slow queries (instance {{ $labels.instance }})
          description: "MySQL server mysql has some new slow query.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert when InnoDB log waits exceed 10 per second (the expression measures a per-second rate of waits, not a duration):

groups:
  - name: 数据库日志写入等待告警规则   # "InnoDB log wait alert rules"
    rules:
      - alert: 数据库日志写入等待告警   # "InnoDB log wait alert"
        expr: rate(mysql_global_status_innodb_log_waits[15m]) > 10
        for: 0m
        labels:
          user: prometheus
          severity: warning
        annotations:
          summary: MySQL InnoDB log waits (instance {{ $labels.instance }})
          description: "MySQL innodb log writes stalling\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Reprints should credit the source: www.mshxw.com
Original article: https://www.mshxw.com/it/837143.html