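Today I learned a new lightweight framework, Flume: a real-time log collection system originally developed by Cloudera.
The configuration is pasted below.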
## Extract data from completed log files and watch every file in the folder in real time ##
# define agent
a3.sources = r3
a3.channels = c3
a3.sinks = k3
# define sources
a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /opt/module/cdh/flume-1.5.0-cdh5.3.6/spool_logs
a3.sources.r3.fileSuffix = .completed
a3.sources.r3.ignorePattern = ^(.)*\\.tmp$
# define channels
a3.channels.c3.type = file
a3.channels.c3.checkpointDir = /opt/module/cdh/flume-1.5.0-cdh5.3.6/flume_file/checkpoint
a3.channels.c3.dataDirs = /opt/module/cdh/flume-1.5.0-cdh5.3.6/flume_file/data
# define sinks
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.useLocalTimeStamp = true
a3.sinks.k3.hdfs.path = /user/make/flume/hive_spool_log/%Y-%m-%d
a3.sinks.k3.hdfs.fileType = DataStream
a3.sinks.k3.hdfs.writeFormat = Text
a3.sinks.k3.hdfs.batchSize = 10
# bind
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3
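For the specific parameters above, and for the available types of the three core components (source, channel, sink), see the official user guide, which documents them in detail: http://flume.apache.org/FlumeUserGuide.html#hdfs-sink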
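To run the agent configured above, it has to be started with the `flume-ng` launcher. The sketch below only assembles that command line. The config file name `conf/flume-spool.conf` is a hypothetical placeholder (the post does not say where the file was saved), and the agent name passed to `--name` is assumed to be `a3`, matching the prefix used in the configuration.

```python
import shlex


def build_flume_command(flume_home, conf_file, agent_name="a3"):
    """Return the argument list for starting a Flume agent via flume-ng."""
    return [
        f"{flume_home}/bin/flume-ng", "agent",
        "--conf", f"{flume_home}/conf",      # directory holding Flume's own settings
        "--conf-file", conf_file,            # the agent properties file
        "--name", agent_name,                # must match the 'a3.' prefix in the file
        "-Dflume.root.logger=INFO,console",  # log to the console while testing
    ]


if __name__ == "__main__":
    cmd = build_flume_command(
        "/opt/module/cdh/flume-1.5.0-cdh5.3.6",
        "conf/flume-spool.conf",  # hypothetical file name, adjust to your setup
    )
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell on the machine where Flume is installed. If the name given to `--name` does not match the prefix in the properties file, the agent starts with no components configured.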
a3.channels.c3.checkpointDir = /opt/module/cdh/flume-1.5.0-cdh5.3.6/flume_file/checkpoint
a3.channels.c3.dataDirs = /opt/module/cdh/flume-1.5.0-cdh5.3.6/flume_file/data
These two parameters point to directories that I created myself.
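As a minimal sketch of that preparation step, the helper below creates both directories under a common base. The function name `ensure_channel_dirs` is made up for illustration; only the `checkpoint` and `data` sub-directory names come from the configuration.

```python
from pathlib import Path


def ensure_channel_dirs(base_dir):
    """Create the file channel's checkpoint and data directories if missing."""
    base = Path(base_dir)
    dirs = [base / "checkpoint", base / "data"]
    for d in dirs:
        # parents=True also creates base_dir; exist_ok=True makes reruns harmless
        d.mkdir(parents=True, exist_ok=True)
    return [str(d) for d in dirs]
```

For the configuration in this post the call would be `ensure_channel_dirs("/opt/module/cdh/flume-1.5.0-cdh5.3.6/flume_file")`, run as the same user that starts the agent so that the channel can write to both directories.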
a3.sources.r3.spoolDir = /opt/module/cdh/flume-1.5.0-cdh5.3.6/spool_logs
This is the directory being monitored.
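The spooling directory source expects a file to be complete and unchanged once it appears in the monitored directory. One way to respect that, sketched below, is to copy the log under a `.tmp` name (which the `ignorePattern` in the configuration skips) and rename it only when the copy is finished. The helper name `drop_into_spool` is hypothetical, and the Python regex is assumed to be equivalent to the Java pattern after properties-file unescaping.

```python
import os
import re
import shutil

# a3.sources.r3.ignorePattern as the regex engine sees it (\\. becomes \.)
IGNORE_PATTERN = re.compile(r"^(.)*\.tmp$")


def drop_into_spool(src_path, spool_dir):
    """Copy src_path into spool_dir so the source only ever sees a finished file."""
    final_path = os.path.join(spool_dir, os.path.basename(src_path))
    tmp_path = final_path + ".tmp"
    if os.path.exists(final_path):
        raise FileExistsError(final_path)  # avoid reusing a name already in the spool
    # The temporary name must be one the source ignores.
    assert IGNORE_PATTERN.match(os.path.basename(tmp_path))
    shutil.copyfile(src_path, tmp_path)  # the slow copy happens under the ignored name
    os.replace(tmp_path, final_path)     # rename inside one directory, then it is visible
    return final_path
```

After Flume has consumed the file it renames it with the configured `fileSuffix`, so `app.log` would become `app.log.completed`.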
a3.sinks.k3.hdfs.useLocalTimeStamp = true
a3.sinks.k3.hdfs.path = /user/make/flume/hive_spool_log/%Y-%m-%d
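To have folders created automatically according to the extraction time, set hdfs.useLocalTimeStamp to true. The sink then uses the current system time when expanding %Y-%m-%d, instead of requiring a timestamp header on every event.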
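To get a feel for which folder a given moment maps to, the expansion can be imitated in Python. This is only an illustration: it assumes that the three escapes used here (`%Y`, `%m`, `%d`) behave like their `strftime` counterparts, which is not true of every Flume escape sequence, and `resolve_hdfs_path` is a made-up helper name.

```python
from datetime import datetime


def resolve_hdfs_path(template, when=None):
    """Expand %Y, %m and %d in an hdfs.path template for the given local time."""
    when = when or datetime.now()  # mirrors useLocalTimeStamp = true
    return when.strftime(template)


if __name__ == "__main__":
    print(resolve_hdfs_path("/user/make/flume/hive_spool_log/%Y-%m-%d"))
```

Each new calendar day therefore produces a new directory under `/user/make/flume/hive_spool_log/`.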
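These are the folders produced after running the agent against the test logs. The 2018-6-19 folder appeared after we manually changed the system time, which is why there are two; seeing both makes the behavior easier to understand.
Next, look at the data extracted to HDFS: its content is identical to our log data. It can therefore be analyzed directly, or used for log anomaly monitoring, for example watching whether a particular value becomes abnormal.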
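As a rough sketch of the value-monitoring idea, the function below scans log lines for a numeric field that exceeds a threshold. The line format (whitespace-separated `name=value` tokens), the field name `latency_ms`, and the threshold are all hypothetical, since the post does not show the actual log contents.

```python
def find_anomalies(lines, field, threshold):
    """Return (value, line) pairs where `field=value` exceeds the threshold."""
    prefix = field + "="
    hits = []
    for line in lines:
        for token in line.split():
            if not token.startswith(prefix):
                continue
            try:
                value = float(token[len(prefix):])
            except ValueError:
                break  # malformed value: skip this line
            if value > threshold:
                hits.append((value, line.rstrip("\n")))
            break  # only the first occurrence of the field per line is checked
    return hits


if __name__ == "__main__":
    sample = [
        "2018-06-19 12:00:01 latency_ms=50 status=200",    # made-up sample lines
        "2018-06-19 12:00:02 latency_ms=1200 status=200",
    ]
    for value, line in find_anomalies(sample, "latency_ms", 1000):
        print(value, line)
```

In practice the lines would come from the files Flume wrote to HDFS, read with whatever HDFS client is available.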