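Prometheus + Grafana Monitoring Quick Start

This site's monitoring has moved from Uptime Robot to Uptime Kuma and then to Nezha, and each of them felt too much like a toy to me: many features are not rigorous enough, and some carry hidden security risks. To settle the matter once and for all, I decided to switch my servers' monitoring to Prometheus + Grafana.

Prometheus is an open-source, industrial-grade monitoring framework that takes its data from generic or custom-built exporters. Grafana is an open-source data analysis and visualization front end that can render Prometheus data as charts.

This article walks through deploying Prometheus + Grafana on Docker as quickly as possible. For personal use, I think what is covered here is entirely sufficient.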

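0 Table of contents

1 Hardware requirements
2 Deploying Prometheus
3 Deploying Grafana
- 3.1 Starting the Grafana container
- 3.2 Adding the Prometheus data source
4 Adding exporters
- 4.1 node_exporter
- 4.2 blackbox_exporter
- 4.3 prometheus-pve-exporter
5 Configuring Prometheus to ingest data
- 5.1 Connecting node_exporter
- 5.2 Connecting blackbox_exporter
- 5.3 Connecting prometheus-pve-exporter
6 Configuring Grafana dashboards
- 6.1 node_exporter
- 6.2 blackbox_exporter
- 6.3 prometheus-pve-exporter
7 Securing exporter connections
- 7.1 Generating a TLS certificate
- 7.2 Generating an HTTP Basic Auth password
- 7.3 Configuring the exporter
- 7.4 Configuring Prometheus
8 Pushing with Pushgateway
- 8.1 Deploying Pushgateway
- 8.2 Pushing from the exporter
- 8.3 Scraping from Prometheus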

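1 Hardware requirements

A Google search for Prometheus hardware requirements turns up a minimum of 2 cores and 4 GB of RAM (2c4g). Digging further, though, those figures describe serious industrial deployments with thousands of data sources; the more sources there are, the more memory is needed.

After trying it myself, I recommend 1 core and 2 GB of RAM (1c2g) or more. I use a 2c2g Alibaba Cloud instance that costs 99 yuan per year; with the whole stack running, memory usage is about 827M/1.67G, which leaves plenty of headroom.

This tutorial deploys everything with Docker. Installing Docker should not be a problem for readers, but here are the commands anyway: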

curl -fsSL https://get.docker.com -o install-docker.sh
sudo sh install-docker.sh --mirror Aliyun

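2 Deploying Prometheus

Prometheus has only two paths that need to be persisted:

/etc/prometheus/prometheus.yml: the configuration file
/prometheus: the database

This article stores the database in a volume and bind-mounts the configuration file:

# Create the data volume
docker volume create prometheus-data
# Create the config file (required; otherwise Docker creates a directory at the bind-mount path)
mkdir -p /etc/prometheus
touch /etc/prometheus/prometheus.yml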
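
As a small safeguard for the step above, the following sketch refuses to continue when the bind-mount source already exists as a directory, which is what Docker leaves behind if the container was started before the file existed. The helper name ensure_config_file is made up for illustration and is not part of Docker or Prometheus.

```shell
#!/bin/sh
# Sketch: make sure the bind-mount source is a regular file before `docker run`.
# ensure_config_file is a hypothetical helper name.
ensure_config_file() {
    path="$1"
    if [ -d "$path" ]; then
        echo "error: $path is a directory; remove it and create a file instead" >&2
        return 1
    fi
    # Create the parent directory if needed, then an empty file (no-op if it exists).
    mkdir -p "$(dirname "$path")" && touch "$path"
}

# Example call with the path used in this article:
# ensure_config_file /etc/prometheus/prometheus.yml
```

Calling it repeatedly is harmless, since touch leaves the contents of an existing file untouched.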

Prometheus only needs one port exposed, 9090, which serves both the web UI and the API.

Putting it together, the container start command is:

docker run -d \
    --name prometheus \
    -p 9090:9090 \
    -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
    -v prometheus-data:/prometheus \
    --restart=always \
    prom/prometheus

Once it is up, open port 9090 in a browser to check that it works:

https://assets.zouht.com/img/blog/3888-01.webp

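Prometheus is now running. The actual contents of /etc/prometheus/prometheus.yml are covered later; next comes Grafana.

3 Deploying Grafana

3.1 Starting the Grafana container

Grafana has only one directory that needs to be persisted, /var/lib/grafana, which goes in a volume:

docker volume create grafana-data

Grafana only needs one port exposed, 3000, which serves the web UI. The container start command is: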

docker run -d \
    --name=grafana \
    -p 3000:3000 \
    -v grafana-data:/var/lib/grafana \
    --restart=always \
    grafana/grafana-enterprise

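Once it is up, open port 3000 in a browser to check that it works. The initial username and password are both admin, and logging in takes you to the home page. The interface language can be changed (to Chinese, for example) under the avatar in the top-right corner -> Profile.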

https://assets.zouht.com/img/blog/3888-02.webp

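3.2 Adding the Prometheus data source

In the left-hand menu, go to "Connections" -> "Data sources", click "Add data source", and choose "Prometheus". On the data source page, fill in the Prometheus address under "Connection". Nothing else needs changing; scroll to the bottom and click "Save & test".

Note that this address is used on the server side: the request is made by the Grafana container, not by your browser. In a Docker setup it should therefore be the Docker bridge IP, which is http://172.17.0.1:9090 on a default installation.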
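
If the bridge address on your host differs from the default, it can be read from Docker itself. The sketch below assumes the default network is named bridge and has a single IPv4 subnet; prometheus_url is a made-up helper name, and the script falls back to 172.17.0.1 when the Docker CLI is not available.

```shell
#!/bin/sh
# Sketch: derive the Grafana data source URL from the default bridge gateway.
# prometheus_url is a hypothetical helper; 9090 is the port published earlier.
prometheus_url() {
    printf 'http://%s:%s\n' "$1" "${2:-9090}"
}

# Ask Docker for the bridge gateway when the CLI is available;
# otherwise keep the usual default, 172.17.0.1.
gateway=""
if command -v docker >/dev/null 2>&1; then
    gateway=$(docker network inspect bridge \
        --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}' 2>/dev/null || true)
fi
prometheus_url "${gateway:-172.17.0.1}"
```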

https://assets.zouht.com/img/blog/3888-03.webp

Once added, Prometheus should appear on the data sources page.

https://assets.zouht.com/img/blog/3888-04.webp

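4 Adding exporters

Prometheus is only a monitoring framework: it collects and stores data, while getting the data out of each node is the job of an exporter. The Prometheus project provides some official exporters, there are many third-party ones, and if you need something highly customized you can write your own.

This section covers the following exporters:

node_exporter: monitors Linux node metrics.
blackbox_exporter: sends probe requests over HTTP, HTTPS, DNS, TCP, ICMP, or gRPC.
prometheus-pve-exporter: monitors Proxmox VE virtual environments.

This section only covers deploying the exporters. The Prometheus side is configured in the next section, so once the exporters you need are deployed you can move straight on.

Note also that every exporter must be directly reachable from Prometheus over the network. If an exporter cannot be reached directly (NAT, firewall, and so on), see the Pushgateway approach described later.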

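4.1 node_exporter

node_exporter monitors Linux node metrics, which broadly fall into four categories: CPU, memory, network, and disk.

First, get the download link for the latest release matching the node's architecture from the GitHub Releases page, then download and extract it on the node to be monitored:

mkdir -p /opt
cd /opt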
wget "https://github.com/prometheus/node_exporter/releases/download/v1.9.0/node_exporter-1.9.0.linux-amd64.tar.gz"
tar xvzf node_exporter-1.9.0.linux-amd64.tar.gz
rm -f node_exporter-1.9.0.linux-amd64.tar.gz
mv node_exporter-1.9.0.linux-amd64 node_exporter

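node_exporter can take a custom configuration, but the defaults are entirely sufficient, so it can be run as-is. systemd manages the service so that node_exporter starts automatically.

Create a new unit file, /etc/systemd/system/node-exporter.service, with the following contents: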

[Unit]
Description=node_exporter
After=network.target

[Service]
Type=simple
User=root
ExecStart=/opt/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target

Save it, then start the service:

systemctl enable node-exporter.service # start on boot
systemctl start node-exporter.service  # start the service
systemctl status node-exporter.service # check the status

If the service shows active (running), node_exporter is up.

https://assets.zouht.com/img/blog/3888-05.webp

Note that node_exporter listens on port 9100. Make sure the firewall allows it so that Prometheus can reach port 9100 on the node directly.
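
Before touching Prometheus, it can help to confirm that the exporter actually serves metrics. The sketch below picks a single unlabelled sample out of the text exposition format; metric_value is a made-up helper name, and the commented example assumes the exporter listens on localhost:9100 without TLS.

```shell
#!/bin/sh
# Sketch: read one sample from Prometheus text exposition format on stdin.
# metric_value is a hypothetical helper; it only handles metrics without labels.
metric_value() {
    # Comment lines start with "#", so their first field never equals the name.
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Example against a running exporter:
# curl -s http://localhost:9100/metrics | metric_value node_load1
```

An empty result suggests that the exporter is not reachable or that the metric name is wrong.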

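4.2 blackbox_exporter

blackbox_exporter sends probe requests over HTTP, HTTPS, DNS, TCP, ICMP, or gRPC to determine whether the target service is running. The most common use is an HTTP(S) request to check that a web page is alive, so this article only covers HTTP(S) probes.

First, get the download link for the latest release matching the node's architecture from the GitHub Releases page, then download and extract it on the node that will send the probes:

mkdir -p /opt
cd /opt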
wget "https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz"
tar xvzf blackbox_exporter-0.25.0.linux-amd64.tar.gz
rm -f blackbox_exporter-0.25.0.linux-amd64.tar.gz
mv blackbox_exporter-0.25.0.linux-amd64 blackbox_exporter

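The default blackbox_exporter configuration is likewise sufficient, so it can be run as-is. Again, systemd manages the service so that blackbox_exporter starts automatically.

Create a new unit file, /etc/systemd/system/blackbox-exporter.service, with the following contents: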

[Unit]
Description=blackbox_exporter
After=network.target

[Service]
Type=simple
User=root
ExecStart=/opt/blackbox_exporter/blackbox_exporter --config.file=/opt/blackbox_exporter/blackbox.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target

Save it, then start the service:

systemctl enable blackbox-exporter.service # start on boot
systemctl start blackbox-exporter.service  # start the service
systemctl status blackbox-exporter.service # check the status

If the service shows active (running), blackbox_exporter is up.

https://assets.zouht.com/img/blog/3888-06.webp

Note that blackbox_exporter listens on port 9115. Make sure the firewall allows it so that Prometheus can reach port 9115 on the node directly.

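4.3 prometheus-pve-exporter

prometheus-pve-exporter monitors Proxmox VE virtual environments. If the built-in PVE monitoring is not enough for you, it can serve as an additional source.

Unlike the two exporters above, prometheus-pve-exporter is a Python package, so there is no binary to download and run. It also differs in that it does not read system information directly: it collects data by calling the PVE API and then exposes that data to Prometheus.

It can therefore run on a different machine from PVE. For convenience, I run it in a Docker container on the PVE host itself.

First, generate an API token on PVE; prometheus-pve-exporter uses this token to collect data from the platform:

https://assets.zouht.com/img/blog/3888-07.webp

Then create the configuration file /etc/prometheus-pve-exporter/pve.yaml with the following contents, replacing the token fields with your own: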

default:
    user: root@pam
    token_name: "********"
    token_value: "********-****-****-****-************"
    verify_ssl: false

The Docker container can then be created:

docker run -d \
    --name=prometheus-pve-exporter \
    -p 9221:9221 \
    -v /etc/prometheus-pve-exporter/pve.yaml:/etc/prometheus/pve.yml \
    --restart=always \
    prompve/prometheus-pve-exporter:latest

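Note that prometheus-pve-exporter listens on port 9221. Make sure the firewall allows it so that Prometheus can reach port 9221 on the node directly.

5 Configuring Prometheus to ingest data

The Prometheus configuration file is /etc/prometheus/prometheus.yml; everything below goes into that file.

5.1 Connecting node_exporter

Suppose two servers run node_exporter:

Chocola: chocola.example.com
Vanilla: vanilla.example.com

To connect both, write: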

scrape_configs:
  - job_name: server
    scrape_interval: 10s # scrape interval
    static_configs:
      - targets: ["chocola.example.com:9100"]
        labels:
          instance: 'Chocola' # without an explicit instance label, it defaults to the address in targets
      - targets: ["vanilla.example.com:9100"]
        labels:
          instance: 'Vanilla'

To give different servers different scrape settings (for example, different intervals), put them in separate jobs:

scrape_configs:
  - job_name: Chocola
    scrape_interval: 10s
    static_configs:
      - targets: ["chocola.example.com:9100"]
        labels:
          instance: 'Chocola'
  - job_name: Vanilla
    scrape_interval: 30s
    static_configs:
      - targets: ["vanilla.example.com:9100"]
        labels:
          instance: 'Vanilla'

After editing, save the file and restart Prometheus for the configuration to take effect: docker restart prometheus
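
One thing to watch when combining the snippets in this section: each of them starts with its own scrape_configs: key, but in the final file all jobs belong under a single scrape_configs: list. To my understanding, Prometheus rejects a YAML file in which that top-level key is repeated. The sketch below is a rough pre-restart check; check_single_scrape_configs is a made-up helper name, not a Prometheus tool.

```shell
#!/bin/sh
# Sketch: succeed only when the file has exactly one top-level `scrape_configs:` key.
# check_single_scrape_configs is a hypothetical helper name.
check_single_scrape_configs() {
    # grep -c exits non-zero when the count is 0, hence the `|| true`.
    n=$(grep -c '^scrape_configs:' "$1" || true)
    if [ "$n" -ne 1 ]; then
        echo "expected one scrape_configs key in $1, found $n" >&2
        return 1
    fi
}

# Example:
# check_single_scrape_configs /etc/prometheus/prometheus.yml && docker restart prometheus
```

For a fuller validation, the prom/prometheus image also ships promtool, whose check config subcommand parses the file the way Prometheus does.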

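Then open the Prometheus UI and go to Status -> Target health in the navbar to check that the targets are up:

https://assets.zouht.com/img/blog/3888-08.webp

If a target does not come up, check that the firewall allows the port and that Prometheus has working network access.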

5.2 Connecting blackbox_exporter

Suppose the blackbox_exporter probe runs on hoshino.example.com, and the websites to monitor are:

https://www.baidu.com
https://www.bilibili.com
https://www.zouht.com

The configuration is then:

scrape_configs:
  - job_name: blackbox
    scrape_interval: 30s
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://www.baidu.com
        - https://www.bilibili.com
        - https://www.zouht.com
    relabel_configs:
      - source_labels: [__address__] 
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: hoshino.example.com:9115
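
The relabeling above moves each listed site into the target query parameter and points the actual scrape at the exporter. To see what that amounts to, the sketch below builds the equivalent probe URL for manual testing. probe_url is a made-up helper name, and it does not percent-encode the target, which is usually fine for simple URLs like these.

```shell
#!/bin/sh
# Sketch: the probe URL that the relabeling above amounts to, for manual testing.
# probe_url is a hypothetical helper; it does not percent-encode the target.
probe_url() {
    exporter="$1"
    target="$2"
    module="${3:-http_2xx}"
    printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

probe_url hoshino.example.com:9115 https://www.zouht.com

# Fetch it by hand and look at probe_success (1 means the probe passed):
# curl -s "$(probe_url hoshino.example.com:9115 https://www.zouht.com)" | grep '^probe_success'
```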

After editing, save the file and restart Prometheus for the configuration to take effect: docker restart prometheus

Then open the Prometheus UI and go to Status -> Target health in the navbar to check that the targets are up:

https://assets.zouht.com/img/blog/3888-09.webp

5.3 Connecting prometheus-pve-exporter

Suppose prometheus-pve-exporter runs on murasame.example.com. The configuration is then:

scrape_configs:
  - job_name: 'pve'
    static_configs:
      - targets:
        - murasame.example.com:9221
    metrics_path: /pve
    params:
      module: [default]
      cluster: ['1']
      node: ['1']

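After editing, save the file and restart Prometheus for the configuration to take effect: docker restart prometheus

6 Configuring Grafana dashboards

One way to build Grafana dashboards is to learn PromQL from scratch and hand-craft every panel. However, the Grafana community shares many ready-made dashboards, both official and personal, that work on import with no learning required. They are listed at: https://grafana.com/grafana/dashboards/

After picking a dashboard, download it to get a JSON file, then open the "Dashboards" page and choose "Import" under the "New" button in the top-right corner.

https://assets.zouht.com/img/blog/3888-10.webp

Import the downloaded JSON file and save to view the dashboard.

The dashboards mentioned below, along with my modified versions, are also uploaded to my network drive for easy download: https://run.sh.cn/grafana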

6.1 node_exporter

https://assets.zouht.com/img/blog/3888-11.webp

Note: if Network Traffic Basic shows a pile of Docker veth interfaces, you can modify the PromQL in the Network panel (or simply use my modified version):

# A
irate(node_network_receive_bytes_total{instance="$node",job="$job",device=~"enp.*|eth.*|wlo.*"}[$__rate_interval])*8
# B
irate(node_network_transmit_bytes_total{instance="$node",job="$job",device=~"enp.*|eth.*|wlo.*"}[$__rate_interval])*8
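
The reason this filter drops veth interfaces even though their names contain eth is that PromQL regex matchers are fully anchored. The sketch below emulates the matcher with grep, adding the anchors explicitly; matches_device_filter is a made-up helper name and the interface names are only examples.

```shell
#!/bin/sh
# Sketch: emulate the device=~"enp.*|eth.*|wlo.*" matcher with grep.
# PromQL regex matchers are fully anchored, hence the explicit ^( ... )$ here.
# matches_device_filter is a hypothetical helper name.
matches_device_filter() {
    printf '%s\n' "$1" | grep -Eq '^(enp.*|eth.*|wlo.*)$'
}

# Show which example interface names the panel would keep.
for dev in eth0 enp3s0 veth1a2b3c docker0; do
    if matches_device_filter "$dev"; then
        echo "$dev: kept"
    else
        echo "$dev: filtered out"
    fi
done
```

If your physical interfaces use other prefixes, extend the alternation in the PromQL accordingly.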

6.2 blackbox_exporter

https://assets.zouht.com/img/blog/3888-12.webp

https://assets.zouht.com/img/blog/3888-13.webp (the version in this screenshot includes my minor modifications)

6.3 prometheus-pve-exporter

https://assets.zouht.com/img/blog/3888-14.webp

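Note: this dashboard has a couple of small bugs. First, the CPU figures are converted incorrectly. Second, if you have multiple PVE nodes, some panels show only one of them and ignore the selector in the top-left corner. You can import my patched version (no guarantee that everything is fixed).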

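7 Securing exporter connections

If you do not want exporter data to be readable from the public internet, apply the configuration in this section.

The approach here is fairly robust: a TLS certificate plus HTTP Basic Auth. Where possible, also restrict the firewall so that only the Prometheus IP may connect, which is the safest setup.

7.1 Generating a TLS certificate

You could of course use a CA-issued TLS certificate, but for a monitoring system I do not want to rotate certificates every three months, which is too much hassle. I therefore issue my own.

Generate the certificate with OpenSSL. The contents of -subj can be changed or left alone; it makes little difference. Note that -days 365 makes the certificate valid for one year, so raise it if you want a longer lifetime: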

openssl req -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout tls.key -out tls.crt -subj "/C=CN/ST=Hubei/L=Wuhan/O=zouht.com/CN=localhost"

https://assets.zouht.com/img/blog/3888-15.webp
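
The certificate above carries CN=localhost, which does not match the hostname Prometheus connects to, so hostname verification cannot succeed against it. As a possible alternative, the sketch below puts the exporter's hostname into a subjectAltName. It assumes OpenSSL 1.1.1 or newer (for -addext); gen_cert is a made-up helper name, and this variant has not been tested against the setup described in this article.

```shell
#!/bin/sh
# Sketch: self-signed certificate whose subjectAltName carries the exporter's hostname.
# gen_cert is a hypothetical helper; -addext needs OpenSSL 1.1.1 or newer.
gen_cert() {
    host="$1"
    dir="${2:-.}"
    openssl req -new -newkey rsa:2048 -days 365 -nodes -x509 \
        -keyout "$dir/tls.key" -out "$dir/tls.crt" \
        -subj "/CN=$host" \
        -addext "subjectAltName=DNS:$host"
}

# Example: gen_cert chocola.example.com /opt/node_exporter
```

Each exporter host would need its own certificate in this scheme, or one certificate listing several DNS entries.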

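7.2 Generating an HTTP Basic Auth password

First install the tool: apt install apache2-utils

Then run: htpasswd -nBC 10 "" | tr -d ':\n'; echo

https://assets.zouht.com/img/blog/3888-16.webp

After entering and confirming the password, you get its bcrypt hash. Save this hash.

In this example the plaintext password is test, and the hash is $2y$10$HjoGeDDP6jM4WkLEpQcxVeIXWBbIhp6vuWAHK2cSyCHhUCFDufxBW .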

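7.3 Configuring the exporter

This method works with most exporters, so it can be applied as-is. This section uses node_exporter as the example.

First create and edit the configuration file /opt/node_exporter/config.yml: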

tls_server_config:
  cert_file: tls.crt
  key_file: tls.key
basic_auth_users:
  prometheus: $2y$10$HjoGeDDP6jM4WkLEpQcxVeIXWBbIhp6vuWAHK2cSyCHhUCFDufxBW

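Here cert_file is the certificate file and key_file is the certificate's private key. In basic_auth_users, each key is a username and each value is the hashed password. As far as I know, relative paths like the ones above are resolved against the directory containing config.yml, so tls.crt and tls.key should sit in /opt/node_exporter as well.

Then update the systemd unit so that node_exporter uses this configuration file: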

[Unit]
Description=node_exporter
After=network.target

[Service]
Type=simple
User=root
ExecStart=/opt/node_exporter/node_exporter --web.config.file=/opt/node_exporter/config.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target

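The main change is the ExecStart line, which gains the --web.config.file flag. After editing the unit, run systemctl daemon-reload and restart the service for the change to apply.

7.4 Configuring Prometheus

With TLS + Basic Auth enabled, Prometheus must be adjusted to match, or it can no longer reach the exporter. First copy the generated certificate file tls.crt to /etc/prometheus/tls.crt on the Prometheus server. Since Prometheus runs in a container in this setup, the file presumably also has to be visible inside the container, for example through an extra bind mount next to the one for prometheus.yml.

Edit the configuration file /etc/prometheus/prometheus.yml: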

scrape_configs:
  - job_name: server
    scrape_interval: 10s
    scheme: https  # use HTTPS
    tls_config:
      ca_file: tls.crt            # our certificate
      insecure_skip_verify: true  # skip certificate verification (our certificate is self-signed)
    basic_auth:
      username: prometheus  # Basic Auth username
      password: test        # Basic Auth password in plaintext
    static_configs:
      - targets: ["chocola.example.com:9100"]
        labels:
          instance: 'Chocola'
      - targets: ["vanilla.example.com:9100"]
        labels:
          instance: 'Vanilla'

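After editing, save the file and restart Prometheus for the configuration to take effect: docker restart prometheus

8 Pushing with Pushgateway

If an exporter's network cannot be reached directly by Prometheus, for example behind NAT or behind a firewall you have no control over, the Pushgateway approach can be used.

Normally Prometheus sends requests to the exporter to collect data. With Pushgateway, the exporter's data is pushed to the Pushgateway, and Prometheus periodically fetches it from there.

8.1 Deploying Pushgateway

I recommend deploying Pushgateway on the same server as Prometheus. This article uses Docker.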

docker run -d \
    --name=pushgateway \
    -p 9091:9091 \
    --restart=always \
    prom/pushgateway:latest

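Note that Pushgateway listens on port 9091. Make sure the firewall allows it so that both Prometheus and the exporter hosts can reach port 9091 on the node directly.

8.2 Pushing from the exporter

Exporters cannot push by themselves; the push is done by a script we write. The flow is to fetch data from the exporter's API and then send it to the Pushgateway API, so there are many ways to implement it. This article uses bash + curl, scheduled with crontab.

If the Pushgateway is at http://meguru.example.com:9091/, write the push script push.sh:

#!/bin/bash
# Replace <job-name> and <instance-name> with your own values; the URL is quoted so the shell does not treat < and > as redirections.
curl -s http://localhost:9100/metrics | curl --data-binary @- "http://meguru.example.com:9091/metrics/job/<job-name>/instance/<instance-name>"

The argument of curl -s is the local exporter's API endpoint. Once curl has fetched the data, the pipe passes it to the second command, which POSTs it to the Pushgateway. Note the two URL parameters:

job-name: the job name, equivalent to job_name in scrape_configs.
instance-name: the instance name, equivalent to the instance label set in scrape_configs.

Both can be chosen freely. For example, if some servers are scraped directly (pull) under the job name server, use the job name server as well when pushing from servers that cannot be scraped. Grafana will then show them together instead of splitting them across two jobs just because the data arrived differently.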
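
To make the role of the two URL parameters concrete, the sketch below parameterizes the push script. push_url and push_metrics are made-up helper names; the exporter address localhost:9100 and the Pushgateway address are the ones assumed in this article.

```shell
#!/bin/sh
# Sketch: build the Pushgateway URL from a base address, job name, and instance name.
# push_url and push_metrics are hypothetical helpers, not part of Pushgateway.
push_url() {
    # ${1%/} drops one trailing slash from the base address, if present.
    printf '%s/metrics/job/%s/instance/%s\n' "${1%/}" "$2" "$3"
}

# Pipe the local exporter's metrics to the Pushgateway (POST, like push.sh).
push_metrics() {
    curl -s http://localhost:9100/metrics |
        curl -s --data-binary @- "$(push_url "$@")"
}

push_url http://meguru.example.com:9091/ server Chocola

# Example push, reusing the pull job name so the data lines up in Grafana:
# push_metrics http://meguru.example.com:9091 server Chocola
```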

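The script then needs to run on a schedule. Crontab works directly; since cron's finest granularity is one minute, the example below uses staggered sleep calls to push every 10 seconds (it assumes the script is saved as /opt/push.sh and is executable):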

* * * * * ( sleep 00 ; /opt/push.sh )
* * * * * ( sleep 10 ; /opt/push.sh )
* * * * * ( sleep 20 ; /opt/push.sh )
* * * * * ( sleep 30 ; /opt/push.sh )
* * * * * ( sleep 40 ; /opt/push.sh )
* * * * * ( sleep 50 ; /opt/push.sh )

8.3 Scraping from Prometheus

If Pushgateway and Prometheus are on the same server, edit the configuration file:

scrape_configs:
  - job_name: pushgateway
    scrape_interval: 12s
    honor_labels: true # important: without it, Prometheus does not keep the job and instance labels you pushed
    static_configs:
      - targets:
        - "172.17.0.1:9091"

After editing, save the file and restart Prometheus for the configuration to take effect: docker restart prometheus

LiuShen

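What a professional-looking monitoring dashboard. I only ask that these things look good, haha, as long as they work. I keep one for others to see and one for my own use:
https://status.liushen.fun/status/main
https://listen.liushen.fun/

1 month ago

ChrisKim (author)

@LiuShen: Mainly, the dashboards out there are all incomplete in one way or another and feel a bit like toys, so I had to switch to an industrial-grade one.

For example, Uptime Kuma still has no way to customize the display length on public pages, and its SQLite database performs very poorly: half a year of data is enough to bog it down, with reads and writes pinning the CPU at 100%, which is frankly a bit ridiculous.

Then I moved to Nezha, which stuffs WebShell, NAT traversal, and similar features into the agent, and has web requests dispatched from the host to the client so that the client does the probing. All of these are security risks. Even after I contributed a PR adding feature switches, it still felt too casual to serve as a professional monitoring system.

Switching to Prometheus did solve all of the above, and I am very satisfied so far. Professional jobs call for professional tools.

https://www.chriskim.top/

1 month ago

ryker

Have you considered adding front-end monitoring?

4 days ago

ChrisKim (author)

@ryker: What is front-end monitoring? Isn't monitoring usually about servers and back-end services?

3 days ago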

