Overview
1. InfluxDB
InfluxDB is an open-source database for time series, events, and metrics. It is written in Go and has no external dependencies.
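2. Telegraf
Telegraf is a plugin-driven server agent for collecting and reporting metrics, and it is the first component of the TICK Stack.
Telegraf input plugins can read metrics directly from the system the agent runs on, pull metrics from third-party APIs, or listen for metrics through statsd and Kafka consumer services. Its output plugins send metrics to a range of data stores, services, and message queues, including InfluxDB, Graphite, OpenTSDB, Datadog, Librato, Kafka, MQTT, and NSQ.
3. Grafana
Grafana is a cross-platform, open-source tool for metric analysis and visualization. It queries the collected data, displays it as dashboards, and can send alert notifications.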
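Simple architecture
The components below can run on different hosts. For example, telegraf can be deployed on the customer's machines while influxdb runs on your own company's public network, with a whitelist that only lets the telegraf servers connect. Grafana then reads its data from influxdb.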
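I. Environment preparation
1. Install docker and docker-compose. Guides for this are easy to find online, so the steps are skipped here.
2. Create the directories the environment needs. Run each deployment step below from its corresponding directory.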
mkdir influxdb telegraf grafana
II. Deploy influxdb
1. Prepare the compose file
version: "3.3"
services:
  influxdb:
    image: influxdb:1.6.3
    container_name: influxdb
    hostname: influxdb
    restart: always
    ports:
      - "20000:8086"                        # the host port is your choice
    volumes:
      - ./data:/var/lib/influxdb
    environment:
      - TZ=Asia/Shanghai
      - INFLUXDB_HTTP_AUTH_ENABLED=true     # require a username and password
      - INFLUXDB_DB=telegraf                # database name
      - INFLUXDB_ADMIN_USER=admin           # database account
      - INFLUXDB_ADMIN_PASSWORD=aaaa1111    # database password
    deploy:
      resources:
        limits:
          memory: 4g
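Once the container is up, it helps to confirm that authentication works and that the `telegraf` database exists before pointing telegraf at it. The following is a minimal sketch, assuming the InfluxDB 1.x HTTP `/query` endpoint with HTTP basic authentication and the example address and credentials from this article. The function names `build_query_request` and `show_databases` are hypothetical helpers, not part of any library.

```python
import base64
import urllib.parse
import urllib.request


def build_query_request(base_url, query, username, password):
    """Build a GET request for the InfluxDB 1.x /query endpoint with basic auth."""
    params = urllib.parse.urlencode({"q": query})
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req = urllib.request.Request(f"{base_url.rstrip('/')}/query?{params}")
    req.add_header("Authorization", f"Basic {token}")
    return req


def show_databases(base_url, username, password, timeout=5):
    """Run SHOW DATABASES and return the raw response body as text.

    This performs a real network call, so run it from a host that the
    influxdb whitelist allows.
    """
    req = build_query_request(base_url, "SHOW DATABASES", username, password)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode()
```

Calling `show_databases("http://111.111.111.111:20000", "admin", "aaaa1111")` from an allowed host should return a JSON document that lists `telegraf`. An HTTP 401 error suggests the credentials do not match the environment variables in the compose file.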
III. Deploy telegraf
1. Prepare the configuration file telegraf.conf
[global_tags]
  instance = "10.10.10.10"   # IP of this host

[agent]
  interval = "60s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  hostname = "10.10.10.10"   # IP of this host; shown in grafana
  omit_hostname = false

# This output pushes to a Prometheus database. This setup uses influxdb,
# so it is commented out.
# [[outputs.http]]
#   url = "http://10.10.10.10:9090/api/v1/write"
#   data_format = "prometheusremotewrite"
#   [outputs.http.headers]
#     Content-Type = "application/x-protobuf"
#     Content-Encoding = "snappy"
#     X-Prometheus-Remote-Write-Version = "0.1.0"

# Push metrics to influxdb
[[outputs.influxdb]]
  # IP and port of the database. Across networks, use the public IP and port of influxdb.
  urls = ["http://111.111.111.111:20000"]
  database = "telegraf"      # database name
  ## Retention policy to write to. Empty string writes to the default rp.
  retention_policy = ""
  ## Write consistency (clusters only), can be: "any", "one", "quorum", "all"
  write_consistency = "any"
  ## Write timeout (for the InfluxDB client), formatted as a string.
  ## If not provided, will default to 5s. 0s means no timeout (not recommended).
  timeout = "5s"
  username = "admin"         # influxdb account
  password = "aaaa1111"      # influxdb password
  ## Set the user agent for HTTP POSTs (can be useful for log differentiation)
  # user_agent = "telegraf"
  ## Set UDP payload size, defaults to InfluxDB UDP Client default (512 bytes)
  # udp_payload = 512

# Collect docker metrics
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  gather_services = false
  container_name_include = []
  container_name_exclude = []
  timeout = "5s"
  docker_label_include = []
  docker_label_exclude = []
  perdevice = true
  total = false
  [inputs.docker.tags]
    # Environment label attached to the collected metrics, used later for
    # filtering in grafana. Add it to essentially every input; more tags can
    # be added the same way. The convention here is <customer>-<service>.
    env = "kehu-admin"

# The inputs below collect hardware metrics. If something you need is
# missing, search online or check the official plugin list.

# Collect cpu metrics
[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## Comment this line if you want the raw CPU time metrics
  fielddrop = ["time_*"]
  [inputs.cpu.tags]
    env = "kehu-admin"

# Read metrics about disk usage by mount point
[[inputs.disk]]
  ## By default, telegraf gather stats for all mountpoints.
  ## Setting mountpoints will restrict the stats to the specified mountpoints.
  # mount_points = ["/"]
  ## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
  ## present on /run, /var/run, /dev/shm or /dev).
  ignore_fs = ["tmpfs", "devtmpfs"]
  [inputs.disk.tags]
    env = "kehu-admin"

# Read metrics about disk IO by device
[[inputs.diskio]]
  ## By default, telegraf will gather stats for all devices including
  ## disk partitions.
  ## Setting devices will restrict the stats to the specified devices.
  # devices = ["sda", "sdb"]
  ## Uncomment the following line if you need disk serial numbers.
  # skip_serial_number = false
  [inputs.diskio.tags]
    env = "kehu-admin"

# Get kernel statistics from /proc/stat
[[inputs.kernel]]
  [inputs.kernel.tags]
    env = "kehu-admin"

# Read metrics about memory usage
[[inputs.mem]]
  [inputs.mem.tags]
    env = "kehu-admin"

# Get the number of processes and group them by status
[[inputs.processes]]
  [inputs.processes.tags]
    env = "kehu-admin"

# Read metrics about swap memory usage
[[inputs.swap]]
  [inputs.swap.tags]
    env = "kehu-admin"

# Read metrics about system load & uptime
[[inputs.system]]
  [inputs.system.tags]
    env = "kehu-admin"

# Read metrics about network interface usage
[[inputs.nstat]]
  # collect data only about specific interfaces
  # interfaces = ["eth0"]
  [inputs.nstat.tags]
    env = "kehu-admin"

[[inputs.netstat]]
  [inputs.netstat.tags]
    env = "kehu-admin"

[[inputs.interrupts]]
  [inputs.interrupts.tags]
    env = "kehu-admin"

[[inputs.linux_sysctl_fs]]
  [inputs.linux_sysctl_fs.tags]
    env = "kehu-admin"
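To see how the tags in this configuration end up on each data point, the sketch below formats one metric in InfluxDB line protocol, the text format the influxdb output writes. It is an illustration only, not Telegraf's own serializer. The function `to_line_protocol` and the sample field value and timestamp are made up for the example. The `host` tag comes from `hostname` in `[agent]`, `instance` from `[global_tags]`, `env` from the per-input `tags` table, and `cpu` from the cpu input itself.

```python
def _escape(value, chars):
    """Backslash-escape the characters that line protocol treats as separators."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, or quoted strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value,... field=value,... timestamp."""
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys give a stable, readable order
        head += f",{_escape(key, ',= ')}={_escape(tags[key], ',= ')}"
    body = ",".join(
        f"{_escape(name, ',= ')}={_format_field(value)}"
        for name, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


# Sample point with the tags this telegraf.conf would attach to a cpu metric.
line = to_line_protocol(
    "cpu",
    {
        "host": "10.10.10.10",
        "instance": "10.10.10.10",
        "env": "kehu-admin",
        "cpu": "cpu-total",
    },
    {"usage_idle": 97.5},
    1700000000000000000,
)
print(line)
```

Because `env` is stored as a tag, Grafana can filter and group on it without scanning field values, which is why the configuration repeats it on every input.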
2. Prepare the compose file
version: "3.3"
services:
  telegraf:
    image: telegraf
    container_name: telegraf
    restart: always
    environment:
      HOST_PROC: /rootfs/proc
      HOST_SYS: /rootfs/sys
      HOST_ETC: /rootfs/etc
    user: telegraf:994    # look up the docker group id in /etc/group and change 994 to match
    volumes:
      - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro    # the configuration file prepared above
      - /var/run/docker.sock:/var/run/docker.sock         # for docker metrics; the mounts below are for system metrics
      - /sys:/rootfs/sys:ro
      - /proc:/rootfs/proc:ro
      - /etc:/rootfs/etc:ro
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
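The `user: telegraf:994` line only works if 994 is the docker group id on that host, because the container needs group access to `/var/run/docker.sock`. A small sketch for looking the value up, assuming the standard colon-separated `/etc/group` format. The helper names `find_group_id` and `docker_gid` are hypothetical.

```python
def find_group_id(group_file_text, group_name):
    """Return the numeric gid for group_name from /etc/group-style text, or None."""
    for line in group_file_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Format: name:password:gid:member1,member2
        fields = line.split(":")
        if len(fields) >= 3 and fields[0] == group_name:
            return int(fields[2])
    return None


def docker_gid(path="/etc/group"):
    """Read the host's group file and return the docker group id, or None."""
    with open(path, encoding="utf-8") as handle:
        return find_group_id(handle.read(), "docker")
```

Run `docker_gid()` on the host where telegraf will run and put the returned number after `telegraf:` in the compose file. A result of `None` means no `docker` group was found in that file.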
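IV. Deploy grafana
1. Prepare the compose file
# Web UI
version: "3.3"
services:
  grafana:
    image: grafana/grafana:9.5.18
    restart: "always"
    ports:
      - 10000:3000
    container_name: "grafana"
    volumes:
      # Copy the configuration file out of the image yourself: start a grafana
      # container with docker run, copy the file out with docker cp, then kill
      # the container started by docker run.
      - "./grafana/grafana.ini:/etc/grafana/grafana.ini"
      - "./grafana/grafana-storage:/var/lib/grafana"
      - "/etc/localtime:/etc/localtime:ro"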
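The comment on the `grafana.ini` volume describes a three-step procedure: run a throwaway container, copy the file out, remove the container. The sketch below spells those steps out as docker CLI invocations. It assumes the image keeps its default configuration at `/etc/grafana/grafana.ini`. The container name `grafana-tmp` and both function names are hypothetical choices for this example.

```python
import os
import subprocess


def grafana_ini_copy_commands(
    image="grafana/grafana:9.5.18",
    dest="./grafana/grafana.ini",
    temp_name="grafana-tmp",  # hypothetical name for the throwaway container
):
    """Return the docker commands that extract the default grafana.ini."""
    return [
        ["docker", "run", "-d", "--name", temp_name, image],
        ["docker", "cp", f"{temp_name}:/etc/grafana/grafana.ini", dest],
        ["docker", "rm", "-f", temp_name],
    ]


def copy_default_grafana_ini(dest="./grafana/grafana.ini"):
    """Create the target directory, then run the three commands in order."""
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    for command in grafana_ini_copy_commands(dest=dest):
        subprocess.run(command, check=True)
```

Passing each command as an argument list avoids shell quoting problems. `check=True` stops the sequence at the first failing step, so a failed `docker cp` does not go unnoticed.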
V. Start the services
docker-compose -f *****.yml up -d    # run once for each yml file
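VI. Configure grafana
1. Open grafana in a browser at 10.10.10.10:10000. Use whichever host port you mapped above.
2. Set the interface language to Chinese.
3. Add a data source that points at the influxdb database. The grafana server must be able to reach the influxdb server over the network.
4. Import a dashboard.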
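Download a monitoring template from the official Grafana site. I used template 10578.
Plugin URL:
After the import the dashboard will not render completely. Set its variables to the values configured in telegraf earlier.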
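Steps 3 and 4 are done in the Grafana UI, but it can be useful to see the same settings as data. The sketch below builds a data source definition in the shape accepted by Grafana's HTTP API (`POST /api/datasources`) for an InfluxQL data source, plus two InfluxQL queries of the kind a dashboard variable and panel would use with the `env` tag. The field layout is an assumption based on Grafana 9.x and should be checked against the documentation for your version. The data source name and all function names are hypothetical, and the actual variables used by template 10578 are not known here.

```python
import json


def influxdb_datasource_payload(url, database, user, password, name="InfluxDB-telegraf"):
    """Data source definition for an InfluxDB 1.x (InfluxQL) backend."""
    return {
        "name": name,          # hypothetical display name
        "type": "influxdb",
        "access": "proxy",     # the Grafana server, not the browser, calls influxdb
        "url": url,
        "database": database,
        "user": user,
        "secureJsonData": {"password": password},
    }


def tag_values_query(measurement, tag_key):
    """InfluxQL that lists the values of one tag, e.g. to fill a dashboard variable."""
    return f'SHOW TAG VALUES FROM "{measurement}" WITH KEY = "{tag_key}"'


def cpu_idle_query(env):
    """InfluxQL for mean idle CPU over the last hour, filtered by the env tag."""
    safe_env = env.replace("\\", "\\\\").replace("'", "\\'")
    return (
        'SELECT mean("usage_idle") FROM "cpu" '
        f"WHERE \"env\" = '{safe_env}' AND \"cpu\" = 'cpu-total' "
        "AND time > now() - 1h GROUP BY time(1m)"
    )


payload = influxdb_datasource_payload(
    "http://111.111.111.111:20000", "telegraf", "admin", "aaaa1111"
)
print(json.dumps(payload, indent=2))
print(tag_values_query("cpu", "env"))
print(cpu_idle_query("kehu-admin"))
```

If the `SHOW TAG VALUES` query returns `kehu-admin` in Grafana's Explore view, the data source and the telegraf tags are both working, and any remaining empty panels are likely a matter of dashboard variable settings.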