open-falcon

Date: March 22, 2019


Installing Open-Falcon

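Open-Falcon was open-sourced by Xiaomi. Because it was developed by a Chinese team, its documentation is in Chinese, so I won't introduce it at length here; see the project introduction.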

Environment preparation

See the environment preparation page of the official documentation.

$ yum install -y redis mariadb-server git
$ systemctl start mariadb
$ systemctl start redis
$ cd /tmp/ && git clone https://github.com/open-falcon/falcon-plus.git 
$ cd /tmp/falcon-plus/scripts/mysql/db_schema/
$ mysql -h 127.0.0.1 -u root -p < 1_uic-db-schema.sql
$ mysql -h 127.0.0.1 -u root -p < 2_portal-db-schema.sql
$ mysql -h 127.0.0.1 -u root -p < 3_dashboard-db-schema.sql
$ mysql -h 127.0.0.1 -u root -p < 4_graph-db-schema.sql
$ mysql -h 127.0.0.1 -u root -p < 5_alarms-db-schema.sql
$ rm -rf /tmp/falcon-plus/

To set the MySQL root password:

MariaDB [(none)]> set password=password('123456');

Download the binary package

Download page (GitHub releases):

$ wget https://github.com/open-falcon/falcon-plus/releases/download/v0.2.1/open-falcon-v0.2.1.tar.gz
$ tar -xzvf open-falcon-v0.2.1.tar.gz

Installation

Agent

The agent collects metrics and pushes them to transfer every 60 s over a long-lived connection. It also exposes an HTTP interface for receiving data that users push manually.

For configuration details, see the official documentation. The configuration file is agent/config/cfg.json.

Start the agent service:

$ ./open-falcon start agent
[falcon-agent] 4367

Check the agent log:

$ ./open-falcon monitor agent
2019/03/22 12:08:21 cfg.go:128: read config file: /root/agent/config/cfg.json successfully
2019/03/22 12:08:21 var.go:31: get local addr failed !
2019/03/22 12:08:21 http.go:74: listening :1988
2019/03/22 12:08:21 rpc.go:41: dial 0.0.0.0:6030 fail: dial tcp 0.0.0.0:6030: getsockopt: connection refused
2019/03/22 12:08:23 rpc.go:41: dial 0.0.0.0:6030 fail: dial tcp 0.0.0.0:6030: getsockopt: connection refused
2019/03/22 12:08:27 rpc.go:41: dial 0.0.0.0:6030 fail: dial tcp 0.0.0.0:6030: getsockopt: connection refused
2019/03/22 12:08:35 rpc.go:41: dial 0.0.0.0:6030 fail: dial tcp 0.0.0.0:6030: getsockopt: connection refused

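The repeated "connection refused" errors are expected at this stage: in the default agent configuration, port 6030 is the heartbeat (HBS) RPC address, and HBS has not been started yet.

Verification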

$ ./agent/bin/falcon-agent --check
ps aux   ... ok
kernel   ... ok
net.if   ... ok
cpustat  ... ok
ss -s    ... ok
netstat  ... ok
ss -tln  ... ok
du -bs   ... ok
df.bytes ... ok
loadavg  ... ok
disk.io  ... ok
memory   ... ok

The agent listens locally on port 1988.

Opening the web service on port 1988 in a browser shows the host's current status.

Push data directly to the agent:

$ ts=`date +%s`; curl -X POST -d "[{\"metric\": \"metric.demo\", \"endpoint\": \"qd-open-falcon-judge01.hd\", \"timestamp\": $ts,\"step\": 60,\"value\": 9,\"counterType\": \"GAUGE\",\"tags\": \"project=falcon,module=judge\"}]" http://127.0.0.1:1988/v1/push

transfer

Transfer receives the data reported by agents, distributes it according to a hashing rule, and pushes it to components such as graph and judge.

For configuration details, see the official documentation.

Start the service:

$ ./open-falcon start transfer
[falcon-transfer] 6036

Check the listening ports:

$ ss -nlpt | grep transfer
LISTEN     0      128         :::4444                    :::*                   users:(("falcon-transfer",pid=6036,fd=6))
LISTEN     0      128         :::6060                    :::*                   users:(("falcon-transfer",pid=6036,fd=3))
LISTEN     0      128         :::8433                    :::*                   users:(("falcon-transfer",pid=6036,fd=5))
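The hashing rule mentioned above is not spelled out here. As an illustration of the general idea only, the following Python sketch uses a consistent-hash ring with virtual nodes to map each series (endpoint, metric and tags) to one backend node, so the same series always lands on the same graph or judge instance. The node names graph-00 to graph-02, the replica count and the key format are hypothetical, and this is not transfer's actual code.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 32-bit hash from MD5; Python's built-in hash() of str is randomized per process.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            for i in range(replicas):
                self._ring.append((_hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    def get_node(self, key: str) -> str:
        # First virtual node clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


def series_key(endpoint: str, metric: str, tags: dict) -> str:
    # Hypothetical key format: endpoint/metric/tags sorted by tag name.
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{endpoint}/{metric}/{tag_str}"


if __name__ == "__main__":
    ring = HashRing(["graph-00", "graph-01", "graph-02"])
    key = series_key("qd-open-falcon-judge01.hd", "metric.demo",
                     {"project": "falcon", "module": "judge"})
    print(key, "->", ring.get_node(key))
```

Because removing a node only deletes that node's virtual nodes from the ring, series assigned to the other nodes keep their placement; only the removed node's series are remapped.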

graph

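Graph stores the data used for plotting. It receives this data from transfer and also answers queries from the api component, returning the plotting data.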

$ ./open-falcon start graph
[falcon-graph] 16267

For configuration details, see the official documentation.

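Note that the database connection must be configured; set the user and password in the DSN to match your MySQL setup: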

    "db": {
        "dsn": "root:123456@tcp(127.0.0.1:3306)/graph?loc=Local&parseTime=true",
        "maxIdle": 4
    },

Check the listening ports:

$ ss -nlpt | grep graph
LISTEN     0      128         :::6070                    :::*                   users:(("falcon-graph",pid=16267,fd=5))
LISTEN     0      128         :::6071                    :::*                   users:(("falcon-graph",pid=16267,fd=6))

Port 6070 receives data and port 6071 is the control port.
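To confirm that each component is actually listening, a small Python sketch can try a TCP connection to every expected port. The port list is an assumption built from the ports seen in this post (agent 1988, transfer 8433/6060/4444, graph 6070/6071, dashboard 8081), with everything on 127.0.0.1; EXPECTED_PORTS is a hypothetical name, so adjust it to your own cfg.json files.

```python
import socket

# Ports observed in this post; adjust to match your own configuration.
EXPECTED_PORTS = {
    "agent": [1988],
    "transfer": [8433, 6060, 4444],
    "graph": [6070, 6071],
    "dashboard": [8081],
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_all(host: str = "127.0.0.1", expected=EXPECTED_PORTS) -> dict:
    """Map each component name to {port: whether it accepted a connection}."""
    return {
        name: {port: is_listening(host, port) for port in ports}
        for name, ports in expected.items()
    }


if __name__ == "__main__":
    for name, ports in check_all().items():
        for port, ok in ports.items():
            print(f"{name:10s} {port:5d} {'OK' if ok else 'DOWN'}")
```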

API

The api component provides a RESTful API.

$ ./open-falcon start api

dashboard

mkdir /home/work
export HOME=/home/work
export WORKSPACE=$HOME/open-falcon
mkdir -p $WORKSPACE
cd $WORKSPACE
git clone https://github.com/open-falcon/dashboard.git
yum install -y python-virtualenv
yum install -y python-devel
yum install -y openldap-devel
yum install -y mysql-devel
yum groupinstall "Development tools"
cd $WORKSPACE/dashboard/
virtualenv ./env
# -i selects the Douban PyPI mirror, although this mirror seems to have problems
./env/bin/pip install -r pip_requirements.txt -i https://pypi.douban.com/simple  

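Edit the configuration file rrd/config.py.

Adjust the settings to match your environment. Each value is read from the environment variable of the same name (for example PORTAL_DB_PASS) and falls back to the default shown when that variable is not set: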

# portal database
# TODO: read from api instead of db
PORTAL_DB_HOST = os.environ.get("PORTAL_DB_HOST","127.0.0.1")
PORTAL_DB_PORT = int(os.environ.get("PORTAL_DB_PORT",3306))
PORTAL_DB_USER = os.environ.get("PORTAL_DB_USER","falcon")
PORTAL_DB_PASS = os.environ.get("PORTAL_DB_PASS","falcon")
PORTAL_DB_NAME = os.environ.get("PORTAL_DB_NAME","falcon_portal")

# alarm database
# TODO: read from api instead of db
ALARM_DB_HOST = os.environ.get("ALARM_DB_HOST","127.0.0.1")
ALARM_DB_PORT = int(os.environ.get("ALARM_DB_PORT",3306))
ALARM_DB_USER = os.environ.get("ALARM_DB_USER","root")
ALARM_DB_PASS = os.environ.get("ALARM_DB_PASS","")
ALARM_DB_NAME = os.environ.get("ALARM_DB_NAME","alarms")

Start the service:

$ bash control start
falcon-dashboard started..., pid=28399

Check the log:

$ bash control tail
[2019-03-22 16:29:05 +0000] [28399] [INFO] Starting gunicorn 19.9.0
[2019-03-22 16:29:05 +0000] [28399] [INFO] Listening at: http://0.0.0.0:8081 (28399)
[2019-03-22 16:29:05 +0000] [28399] [INFO] Using worker: sync
[2019-03-22 16:29:05 +0000] [28404] [INFO] Booting worker with pid: 28404
[2019-03-22 16:29:05 +0000] [28405] [INFO] Booting worker with pid: 28405
[2019-03-22 16:29:05 +0000] [28406] [INFO] Booting worker with pid: 28406
[2019-03-22 16:29:05 +0000] [28408] [INFO] Booting worker with pid: 28408

Users must sign up manually. Register the root user first; it is given super-administrator rights.

If you do not want to allow sign-up, set signup_disable to true in the api component's cfg.json and restart the api service.
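The signup_disable change can also be scripted. The Python sketch below assumes that signup_disable is a top-level key of the api component's cfg.json and that the file lives at api/config/cfg.json under the extraction directory (by analogy with agent/config/cfg.json in the agent log); check both against your installation. Rewriting the file keeps every key but not the original formatting.

```python
import json
from pathlib import Path


def disable_signup(cfg_path: str) -> dict:
    """Set signup_disable to true in the api cfg.json and return the updated config.

    Assumes signup_disable is a top-level key (an assumption; check your file).
    """
    path = Path(cfg_path)
    cfg = json.loads(path.read_text(encoding="utf-8"))
    cfg["signup_disable"] = True
    # Key order is preserved by json, but indentation and spacing are regenerated.
    path.write_text(json.dumps(cfg, indent=4) + "\n", encoding="utf-8")
    return cfg


if __name__ == "__main__":
    # Hypothetical path; adjust to where you extracted the release tarball.
    disable_signup("./api/config/cfg.json")
```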

Judge

Judge evaluates alarm conditions. It gets its data from transfer, which forwards what the agents report to both graph and judge.

Judge provides an HTTP endpoint, /count, that reports how much data the judge instance has processed.

For configuration details, see the official documentation.

$ ./open-falcon start judge
[falcon-judge] 1839

Alarm

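The alarm module processes alarm events: judge writes the events it generates to Redis, and alarm reads them from Redis and handles them.

The alarm logic lives in alarm, which can, for example, merge events. Alarms that have been sent are stored in MySQL and can be viewed in the dashboard.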

$ ./open-falcon start alarm
[falcon-alarm] 2856
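Event merging can be pictured with a simplified Python sketch that groups events by metric and status and produces one summary line per group. This only illustrates the idea; it is not the alarm module's actual merging algorithm, and the event fields endpoint, metric and status are hypothetical.

```python
from collections import defaultdict


def combine_events(events):
    """Group events by (metric, status) and build one summary line per group.

    Each event is a dict with the hypothetical keys endpoint, metric and status.
    """
    groups = defaultdict(set)
    for ev in events:
        groups[(ev["metric"], ev["status"])].add(ev["endpoint"])
    messages = []
    for (metric, status), endpoints in sorted(groups.items()):
        hosts = ",".join(sorted(endpoints))
        messages.append(f"[{status}] {metric} on {len(endpoints)} host(s): {hosts}")
    return messages
```

Sending one merged message per group instead of one per host keeps a cluster-wide problem from flooding the recipients.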

Alarm configuration

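Falcon defines a unified HTTP request format for the email, SMS, WeChat and phone sending interfaces.

SMS sending HTTP interface:

method: post
params:
  - content: SMS content
  - tos: comma-separated list of phone numbers

Email sending HTTP interface:

method: post
params:
  - content: email body
  - subject: email subject
  - tos: comma-separated list of email addresses

IM sending HTTP interface:

method: post
params:
  - content: IM message content
  - tos: comma-separated list of IM accounts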
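To see what such a request carries, here is a minimal Python sketch of a receiving endpoint built on the standard library. It assumes the parameters arrive as an application/x-www-form-urlencoded POST body. The listen address 127.0.0.1:4000 and the names parse_send_request and SenderHandler are hypothetical, and the handler only prints the message instead of delivering it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def parse_send_request(body: bytes) -> dict:
    """Decode a form-encoded body with content, tos and an optional subject."""
    fields = parse_qs(body.decode("utf-8"))
    return {
        "content": fields.get("content", [""])[0],
        "subject": fields.get("subject", [""])[0],
        # tos is a comma-separated list of recipients.
        "tos": [t.strip() for t in fields.get("tos", [""])[0].split(",") if t.strip()],
    }


class SenderHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        msg = parse_send_request(self.rfile.read(length))
        # A real gateway would call an SMS, mail or IM provider here.
        print(f"{self.path}: to={msg['tos']} subject={msg['subject']!r} content={msg['content']!r}")
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 4000), SenderHandler).serve_forever()
```

Pointing the alarm component's SMS, mail or IM URL at this server (any path is accepted) lets you watch the requests arrive before wiring up a real provider.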

HBS (Heartbeat Server)

HBS checks whether agents are alive.

$ ./open-falcon start hbs

Nodata

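Nodata detects anomalies in metric reporting and works together with the judge module: when a metric that has a nodata configuration stops reporting, nodata sends nodata values that trigger an alarm in judge.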
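The nodata mechanism can be sketched as follows: given the last report time of each series, find those that have been silent for longer than a threshold and emit a fill value for them, which judge can then alarm on. The function name find_nodata, the threshold of 3 steps, the fill value -1 and the series keys are illustrative assumptions, not nodata's actual defaults.

```python
def find_nodata(last_seen: dict, now: int, step: int = 60, max_missed: int = 3,
                fill_value: float = -1) -> list:
    """Return fill-value points for series silent for more than max_missed steps.

    last_seen maps a series key to the Unix timestamp of its last report.
    """
    threshold = step * max_missed
    points = []
    for key, ts in sorted(last_seen.items()):
        if now - ts > threshold:
            points.append({"series": key, "timestamp": now, "value": fill_value})
    return points
```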

Aggregator

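The cluster aggregation module aggregates a given metric across all hosts in a cluster, providing a cluster-level view of monitoring.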
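As a rough illustration of cluster aggregation, the Python sketch below combines one metric across the hosts of a cluster into a count, a sum and an average. It assumes the per-host values have already been fetched (the host names are made up) and does not reproduce aggregator's own configuration syntax.

```python
def aggregate(values_by_host: dict) -> dict:
    """Aggregate one metric across hosts, skipping hosts that reported no value."""
    values = [v for v in values_by_host.values() if v is not None]
    if not values:
        return {"count": 0, "sum": 0.0, "avg": None}
    total = float(sum(values))
    return {"count": len(values), "sum": total, "avg": total / len(values)}
```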

There are also various other auxiliary services.

Custom monitoring

You can also push your own data to Open-Falcon.
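A minimal Python sketch of a custom collector that pushes to the local agent's /v1/push endpoint, using the same fields as the curl example in the agent section. The metric name custom.load1, reading the 1-minute load average with os.getloadavg() (Unix only), the tag string and the helper names build_item and push are illustrative assumptions.

```python
import json
import os
import socket
import time
import urllib.request
from typing import Optional

AGENT_PUSH_URL = "http://127.0.0.1:1988/v1/push"


def build_item(metric: str, value: float, endpoint: str, tags: str = "",
               step: int = 60, counter_type: str = "GAUGE",
               ts: Optional[int] = None) -> dict:
    """Build one data point with the same fields as the curl example."""
    return {
        "endpoint": endpoint,
        "metric": metric,
        "timestamp": int(time.time()) if ts is None else ts,
        "step": step,
        "value": value,
        "counterType": counter_type,
        "tags": tags,
    }


def push(items: list, url: str = AGENT_PUSH_URL) -> str:
    """POST a JSON array of items to the agent and return the response body."""
    req = urllib.request.Request(
        url,
        data=json.dumps(items).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    load1 = os.getloadavg()[0]  # 1-minute load average (Unix only)
    item = build_item("custom.load1", load1, socket.gethostname(),
                      tags="project=falcon,module=custom")
    print(push([item]))
```

Running it from cron once per step (60 s here) keeps the series continuous.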