Automatically Monitoring OpenStack VMs by Auto-Registering node-exporter with Consul + Prometheus


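1. The Problem

At work, the VMs in our OpenStack cluster need monitoring of their basic performance metrics. Installing node_exporter by hand on every VM as it boots and then editing prometheus.yml each time would be a nightmare for lazy programmers like us, so we designed a monitoring scheme based on Prometheus + Consul.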

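2. The Solution

1. Bake node_exporter into the image so that it is always deployed automatically.
2. Develop a small program that registers node_exporter with Consul automatically; like node_exporter, this program is baked into the image.
3. Configure Prometheus to discover node_exporter.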

3. Deploying the Consul Cluster

3.1 Cluster plan

OS           Hostname      IP
CentOS 7.7   compute-7-1   172.16.100.71
CentOS 7.7   compute-7-2   172.16.100.72
CentOS 7.7   compute-7-3   172.16.100.73

3.1 Download and install Consul yourself

Consul v1.7.2

3.1.1 Generate the master token

$ curl \
    --request PUT \
    http://172.16.100.71:8500/v1/acl/bootstrap

3.1.2 Configure the master token obtained above

compute-7-1:

{
    "bootstrap_expect": 1,
    "datacenter": "sibat_consul",
    "primary_datacenter": "sibat_consul",
    "data_dir": "/data/consul",
    "start_join": [
        "172.16.100.72",
        "172.16.100.73"
    ],
    "retry_join": [
        "172.16.100.72",
        "172.16.100.73"
    ],
    "connect": {
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-1",
    "bind_addr": "172.16.100.71",
    "advertise_addr": "172.16.100.71",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
        }
    }
}

compute-7-2

{
    "datacenter": "sibat_consul",
    "primary_datacenter": "sibat_consul",
    "data_dir": "/data/consul",
    "connect": {
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-2",
    "bind_addr": "172.16.100.72",
    "advertise_addr": "172.16.100.72",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "acl_datacenter": "sibat_consul",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
        }
    }
}

compute-7-3

{
    "datacenter": "sibat_consul",
    "primary_datacenter": "sibat_consul",
    "data_dir": "/data/consul",
    "connect": {
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-3",
    "bind_addr": "172.16.100.73",
    "advertise_addr": "172.16.100.73",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "acl_datacenter": "sibat_consul",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
        }
    }
}

Start Consul on all three nodes.

3.1.3 Run on all three nodes

$ sudo useradd consul
$ sudo vim /usr/lib/systemd/system/consul.service
[Unit]
Description=Consul agent
Documentation=https://www.consul.io/docs/
[Service]
User=consul
Group=consul
ExecStart=/usr/bin/consul agent -config-file /etc/consul.d/consul_config.json
KillMode=process
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
$ sudo systemctl daemon-reload

3.1.4 Run on compute-7-2 and compute-7-3

$ sudo systemctl restart consul && sudo systemctl enable consul

3.1.5 Run on compute-7-1

$ sudo systemctl restart consul && sudo systemctl enable consul

After startup, the server logs show permission-related errors. According to the official documentation, this happens because no agent token has been configured, so an agent token has to be created as well:

$ curl \
    --request PUT \
    --header "X-Consul-Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c" \
    --data \
    '{
      "Name": "Agent Token",
      "Type": "client",
      "Rules": "node \"\" { policy = \"write\" } service \"\" { policy = \"read\" }"
    }' \
    http://172.16.100.71:8500/v1/acl/create
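The escaped quotes in the Rules string are easy to get wrong when typed by hand. As a minimal sketch (the helper name legacy_acl_body is hypothetical and not part of Consul), the same request body can be generated and left to the JSON encoder to escape:

```python
import json


def legacy_acl_body(name: str, rules: dict) -> str:
    """Build the JSON body sent to the legacy /v1/acl/create endpoint above.

    `rules` maps a resource kind ("node", "service") to the policy applied to
    the empty prefix, which matches every node or service name.
    """
    hcl = " ".join(
        f'{kind} "" {{ policy = "{policy}" }}' for kind, policy in rules.items()
    )
    # json.dumps takes care of escaping the inner double quotes.
    return json.dumps({"Name": name, "Type": "client", "Rules": hcl})


body = legacy_acl_body("Agent Token", {"node": "write", "service": "read"})
print(body)
```

The printed string can be passed to curl with --data in place of the hand-written body.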

3.1.6 Configure the agent token obtained above

compute-7-1:

{
    "bootstrap_expect": 1,
    "datacenter": "sibat_consul",
    "primary_datacenter": "sibat_consul",
    "data_dir": "/data/consul",
    "start_join": [
        "172.16.100.72",
        "172.16.100.73"
    ],
    "retry_join": [
        "172.16.100.72",
        "172.16.100.73"
    ],
    "connect": {
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-1",
    "bind_addr": "172.16.100.71",
    "advertise_addr": "172.16.100.71",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
            "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
        }
    }
}

compute-7-2

{
    "datacenter": "sibat_consul",
    "primary_datacenter": "sibat_consul",
    "data_dir": "/data/consul",
    "connect": {
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-2",
    "bind_addr": "172.16.100.72",
    "advertise_addr": "172.16.100.72",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "acl_datacenter": "sibat_consul",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
            "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
        }
    }
}

compute-7-3

{
    "datacenter": "sibat_consul",
    "primary_datacenter": "sibat_consul",
    "data_dir": "/data/consul",
    "connect": {
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-3",
    "bind_addr": "172.16.100.73",
    "advertise_addr": "172.16.100.73",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "acl_datacenter": "sibat_consul",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
            "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
        }
    }
}

3.1.7 Run on compute-7-2 and compute-7-3

$ sudo systemctl restart consul && sudo systemctl enable consul

3.1.8 Run on compute-7-1

$ sudo systemctl restart consul && sudo systemctl enable consul

Once the cluster has stabilized, the UI is reachable at http://172.16.100.71:8500

4. Integrating Prometheus

$ sudo vim /etc/prometheus/prometheus.yml
...
  - job_name: 'OpenStack-vms'
    consul_sd_configs:
      - server: "172.16.100.71:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
      - server: "172.16.100.72:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
      - server: "172.16.100.73:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: ".*OpenStack-vms.*"
        replacement: OpenStack-vms
        action: keep
        target_label: env
      - regex: __meta_consul_service_metadata_(.+)
        action: labelmap
...
$ sudo systemctl restart prometheus

After the restart, the job_name configured above shows up in the Prometheus UI:
(screenshot: TIM图片20200611134431.png)
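To make the two relabel rules easier to reason about, here is a rough Python model of what they do. It is only a sketch of the semantics, not Prometheus code: the function names and the "project" metadata key are invented for illustration. It relies on two documented behaviours: Prometheus anchors relabel regexes at both ends, and __meta_consul_tags holds the tag list joined by the tag separator (a comma by default) with a separator on each side.

```python
import re


def consul_tags_label(tags, separator=","):
    """Render a tag list the way __meta_consul_tags is built: ",a,b,"."""
    return separator + separator.join(tags) + separator


def keep(labels, source_label, regex):
    """'keep' action: the target survives only if the anchored regex matches."""
    return re.fullmatch(regex, labels.get(source_label, "")) is not None


def labelmap(labels, regex):
    """'labelmap' action: copy each matching label to the name in group 1."""
    out = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            out[match.group(1)] = value
    return out


vm = {
    "__meta_consul_tags": consul_tags_label(["OpenStack-vms", "node-exporter"]),
    "__meta_consul_service_metadata_project": "demo",  # hypothetical metadata
}
other = {"__meta_consul_tags": consul_tags_label(["node-exporter"])}

print(keep(vm, "__meta_consul_tags", ".*OpenStack-vms.*"))     # True
print(keep(other, "__meta_consul_tags", ".*OpenStack-vms.*"))  # False
print(labelmap(vm, "__meta_consul_service_metadata_(.+)")["project"])  # demo
```

Note that under this keep rule a service registered with only the node-exporter tag would be dropped, so a registered service needs to carry a tag containing OpenStack-vms to be scraped by this job.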

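5. Automatic VM Registration

Problem: none of the stock components offers a particularly good solution for automatic registration. At first I registered each VM with curl from a shell script in rc.local, but VMs occasionally still ended up unregistered. I also found that Consul does not behave as a strongly consistent registry: sometimes the same service ID was registered on different nodes at the same time:
(screenshot: TIM图片20200611135436.png)
So I wrote a small program in Go that registers node_exporter automatically. It is started at boot by systemd, and it uses an algorithm to avoid duplicate registrations and to balance registrations across the Consul nodes.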
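The article does not describe the algorithm itself, so the following is only one possible approach, sketched in Python rather than taken from the Go program: hash the service ID to pick a Consul server deterministically. The same VM then always registers with the same agent (no duplicates across nodes), while different VMs spread over the three servers. The service ID scheme, the function names and the example IP 10.0.0.15 are all assumptions; the request body uses fields of Consul's /v1/agent/service/register endpoint.

```python
import hashlib
import json


def pick_server(service_id: str, servers: list) -> str:
    """Map a service ID to one server; stable across reboots and list order."""
    digest = hashlib.sha256(service_id.encode("utf-8")).digest()
    return sorted(servers)[int.from_bytes(digest[:8], "big") % len(servers)]


def registration_request(ip: str, port: int, tag: str, servers: list):
    """Return (url, body) for a PUT to /v1/agent/service/register."""
    service_id = f"node-exporter-{ip}"  # hypothetical scheme: one ID per VM
    body = {
        "ID": service_id,
        "Name": "node-exporter",
        "Tags": [tag],
        "Address": ip,
        "Port": port,
        "Check": {
            "TCP": f"{ip}:{port}",
            "Interval": "5s",
            "Timeout": "5s",
            "DeregisterCriticalServiceAfter": "1m",  # illustrative value
        },
    }
    url = f"http://{pick_server(service_id, servers)}/v1/agent/service/register"
    return url, json.dumps(body)


servers = ["172.16.100.71:8500", "172.16.100.72:8500", "172.16.100.73:8500"]
url, body = registration_request("10.0.0.15", 9100, "node-exporter", servers)
print(url)
print(body)
```

Actually sending the request would be an HTTP PUT of body to url with the X-Consul-Token header set, as in the curl calls earlier.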

$ wget https://github.com/FrankenFuncc/consul-registy-service/releases/download/202006161758/consulR.zip
$ unzip consulR.zip
$ wget https://github.com/prometheus/node_exporter/releases/download/v1.0.0/node_exporter-1.0.0.linux-amd64.tar.gz
$ tar -zxvf node_exporter-1.0.0.linux-amd64.tar.gz -C /usr/local/
$ mv /usr/local/node_exporter-1.0.0.linux-amd64 /usr/local/node_exporter

Installing node_exporter and enabling it at boot

$ vim
[Unit]
Description=node_exporter: the monitoring system
Documentation=http://prometheus.io/docs/
[Service]
ExecStart=/usr/local/node_exporter/node_exporter
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
$ systemctl daemon-reload && systemctl start node_exporter && systemctl enable node_exporter

Installing the Consul registration program and enabling it at boot

$ vim /etc/consul/consul.yaml
System:
  ServiceName: consul-registy-service
  ListenAddress: 0.0.0.0
  Port: 9984
  # This IP and port are used to look up the IP address of the outbound network interface
  FindAddress: 8.8.8.8:80
Logs:
  LogFilePath: /data/consul/consul.log
  LogLevel: info
Consul:
  Address: 172.16.100.71:8500,172.16.100.72:8500,172.16.100.73:8500
  Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c
  CheckTimeout: 5s
  CheckInterval: 5s
  CheckDeregisterCriticalServiceAfter: true
  CheckDeregisterCriticalServiceAfterTime: 5s
Service:
  Tag: node-exporter
  # If Address is empty, the outbound interface IP is looked up via FindAddress by default
  Address:
  Port: 9100
$ vim /usr/lib/systemd/system/consul.service
[Unit]
Description=Consul
After=network-online.target
[Service]
User=nobody
ExecStart=/usr/local/consul --confpath=/etc/consul/consul.yaml
Restart=on-failure
RestartSec=1
[Install]
WantedBy=multi-user.target
$ systemctl daemon-reload && systemctl start consul && systemctl enable consul
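The FindAddress setting in consul.yaml is probably based on a common trick: "connect" a UDP socket toward a public address and read back which local address the kernel chose. The sketch below shows that trick in Python; it is my assumption of how the lookup works, not code from the registration program, and the function name outbound_ip is made up.

```python
import socket


def outbound_ip(find_address: str = "8.8.8.8:80"):
    """Return the local IP the kernel would route toward find_address, or None."""
    host, port = find_address.rsplit(":", 1)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packet; it only selects a route
        # and therefore a local source address.
        sock.connect((host, int(port)))
        return sock.getsockname()[0]
    except OSError:
        return None  # no usable route, e.g. the network is not up yet
    finally:
        sock.close()


ip = outbound_ip()
print(ip if ip else "no route to FindAddress")
```

Because nothing is actually sent, 8.8.8.8 does not have to be reachable; the VM only needs a route that covers it, which is one reason the unit above waits for network-online.target.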

Once an image has been created from this setup, any VM launched from that image is discovered by Prometheus automatically.


Source: 智云一二三科技

Article URL: https://www.zhihuclub.com/6790.shtml

About the author: 智云科技
