Installing Flume and Integrating It with Kafka

Author: | Source: Internet | 2023-08-11 13:56


Official documentation: http://flume.apache.org/FlumeUserGuide.html
Reference book: Flume 构建高可用、可拓展的海量日志采集系统 (Flume: Building a Highly Available, Scalable Massive Log Collection System)
Reference article: http://www.aboutyun.com/forum.php?mod=viewthread&tid=20699

Kafka cluster deployment: https://blog.51cto.com/13323775/2063420

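Flume

Communication between Flume agents (from the reference book)

Flume has a built-in RPC sink-source pair dedicated to transferring data between agents. A source is the component that receives data into a Flume agent; the available sources include Avro Source, Thrift Source, HTTP Source, Spooling Directory Source, Syslog Source, Exec Source, and JMS Source. A channel is the buffer between a source and a sink, and it is the key to not losing data. A sink reads events from a channel. Each sink can read from only one channel, and every sink must be configured with a channel; otherwise it is removed from the agent.

Installing Flume

Download and install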

cd /data/
wget http://mirrors.hust.edu.cn/apache/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz
tar axf apache-flume-1.8.0-bin.tar.gz
cd apache-flume-1.8.0-bin

Set environment variables

vim /etc/profile

#FLUME
export FLUME_HOME=/data/apache-flume-1.8.0-bin
export PATH=$PATH:${FLUME_HOME}/bin
export HADOOP_HOME=/data/hadoop

source /etc/profile

Edit the configuration file

cd ${FLUME_HOME}/conf/
cp flume-env.sh.template flume-env.sh
Edit flume-env.sh

export JAVA_HOME=/usr/local/jdk
export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"
export HADOOP_HOME=/data/hadoop

Verify the installation
flume-ng version

Using Flume

Single-node agent: passing messages

cd ${FLUME_HOME}/conf/
Add a configuration file
vim avro.conf

# Name the components on this agent
agent.sources = avroSrc
agent.channels = avroChannel

# Describe/configure the source
agent.sources.avroSrc.type = netcat
agent.sources.avroSrc.bind = localhost
agent.sources.avroSrc.port = 62000

# Describe the sink
agent.sinks.avroSink.type = logger

# Use a channel that buffers events in memory
agent.channels.avroChannel.type = memory
agent.channels.avroChannel.capacity = 1000
agent.channels.avroChannel.transactionCapacity = 100

# Bind the source and sink to the channel
agent.sinks = avroSink
agent.sources.avroSrc.channels = avroChannel
agent.sinks.avroSink.channel = avroChannel

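Note: I also tested with agent.sources.avroSrc.type set to avro, which failed with:
org.apache.avro.AvroRuntimeException: Excessively large list allocation request detected: 1863125517 items! Connection closed
This is most likely because telnet sends plain text while an Avro source expects Avro RPC frames, so the netcat source is used here instead.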
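
Since a sink without a channel is removed from the agent, it can be worth checking a config before starting it. The following is only a rough sketch, not part of Flume: check_sink_channels is a hypothetical helper, and it assumes the config has one property per line and lists its sinks on a single "<agent>.sinks = ..." line.

```shell
# check_sink_channels CONF AGENT
# Hypothetical helper: prints each sink declared on the "<AGENT>.sinks = ..."
# line that has no "<AGENT>.sinks.<name>.channel = ..." line in CONF.
# Returns 0 when every declared sink has a channel, 1 otherwise.
check_sink_channels() {
    conf=$1
    agent=$2
    status=0
    # Pull the space-separated sink names out of the declaration line.
    sinks=$(sed -n "s/^$agent\.sinks[[:space:]]*=[[:space:]]*//p" "$conf")
    for sink in $sinks; do
        if ! grep -q "^$agent\.sinks\.$sink\.channel[[:space:]]*=" "$conf"; then
            echo "sink '$sink' has no channel"
            status=1
        fi
    done
    return "$status"
}
```

For the file above it would be called as: check_sink_channels /data/apache-flume-1.8.0-bin/conf/avro.conf agent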

Run the Flume agent
flume-ng agent -f /data/apache-flume-1.8.0-bin/conf/avro.conf -n agent -Dflume.root.logger=INFO,console

Test the connection with telnet
telnet localhost 62000
Check the agent console for the received events

Monitoring a local file with the exec source

cd ${FLUME_HOME}/conf/
Add a configuration file
vim exec.conf

# example.conf: A single-node Flume configuration
# Name the components on this agent
agentexec.sources = avroexec
agentexec.sinks = sinkexec
agentexec.channels = channelexec

# Describe/configure the source
# bind and port are not exec source properties (and 630000 is not a valid port), so they are commented out
# agentexec.sources.avroexec.bind = localhost
# agentexec.sources.avroexec.port = 630000
agentexec.sources.avroexec.type = exec
agentexec.sources.avroexec.command = tail -F /tmp/testexec.log

# Describe the sink
agentexec.sinks.sinkexec.type = logger

# Use a channel which buffers events in memory
agentexec.channels.channelexec.type = memory
agentexec.channels.channelexec.capacity = 100000
agentexec.channels.channelexec.transactionCapacity = 10000

# Bind the source and sink to the channel
agentexec.sources.avroexec.channels = channelexec
agentexec.sinks.sinkexec.channel = channelexec

Run the Flume agent
flume-ng agent -f /data/apache-flume-1.8.0-bin/conf/exec.conf --name agentexec -Dflume.root.logger=INFO,console

Test
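
The exec source in this section tails /tmp/testexec.log, so a test amounts to appending lines to that file while the agent is running. A minimal sketch of one way to generate such input; append_test_lines and the "test line N" format are made up for illustration.

```shell
# append_test_lines FILE COUNT
# Hypothetical helper: appends COUNT numbered lines to FILE, one per line.
append_test_lines() {
    file=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "test line $i" >> "$file"
        i=$((i + 1))
    done
}
```

For example: append_test_lines /tmp/testexec.log 10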

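Awkwardly, only part of the data came through (I have not found a fix yet). One thing worth checking is the logger sink's maxBytesToLog property: by default the logger sink prints only the first 16 bytes of each event body.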

Monitoring logs with spooldir and Kafka

Prerequisite: a Kafka cluster is installed
cd ${FLUME_HOME}/conf/
Add a configuration file
vim single_agent.conf

# agent name: a1
a1.sources = source1
a1.channels = channel1
a1.sinks = sink1

# set source
# For testing, the data is kept under /tmp; adjust the paths for real use
a1.sources.source1.type = spooldir
a1.sources.source1.spoolDir = /tmp/spooldir
a1.sources.source1.fileHeader = false

# set sink
a1.sinks.sink1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.sink1.kafka.bootstrap.servers = master:9092,slave1:9092,slave2:9092
a1.sinks.sink1.topic = spooldir

# set channel
# For testing, the data is kept under /tmp; adjust the paths for real use
a1.channels.channel1.type = file
a1.channels.channel1.checkpointDir = /tmp/flume_data/checkpoint
a1.channels.channel1.dataDirs = /tmp/flume_data/data

# bind
a1.sources.source1.channels = channel1
a1.sinks.sink1.channel = channel1

Create the directories for the files

mkdir -pv /tmp/spooldir
mkdir -pv /tmp/flume_data/checkpoint
mkdir -pv /tmp/flume_data/data

(All nodes) Start the Kafka cluster

kafka-server-start.sh /data/kafka_2.11-1.0.0/config/server.properties

Create the Kafka topic

kafka-topics.sh --zookeeper master:2181,slave1:2181,slave2:2181 --create --topic spooldir --replication-factor 1 --partitions 3

List the topics

kafka-topics.sh --list --zookeeper master:2181,slave1:2181,slave2:2181

Create a Kafka consumer

kafka-console-consumer.sh --zookeeper master:2181,slave1:2181,slave2:2181 --topic spooldir --from-beginning

(New window) Start the Flume agent

flume-ng agent -f /data/apache-flume-1.8.0-bin/conf/single_agent.conf --name a1 -Dflume.root.logger=INFO,console

Write test
[root@master conf]# echo "hello ,test flume spooldir source" >> /tmp/spooldir/spool.txt
[Screenshots omitted: flume-ng output; Kafka consumer output]
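
The Flume User Guide notes that a file must not be modified after it lands in the spooling directory and that file names must not be reused, so appending to the same file in /tmp/spooldir a second time may make the source report an error. A rough sketch of one way around this is to write each file elsewhere and move it in when complete. spool_message is a hypothetical helper, and the staging directory is assumed to be on the same filesystem as the spool directory.

```shell
# spool_message MESSAGE STAGING_DIR SPOOL_DIR
# Hypothetical helper: writes MESSAGE to a uniquely named file in STAGING_DIR,
# then moves the finished file into SPOOL_DIR and prints its final path.
spool_message() {
    message=$1
    staging_dir=$2
    spool_dir=$3
    # mktemp creates a new, uniquely named file in the staging directory.
    tmp=$(mktemp "$staging_dir/msg.XXXXXX") || return 1
    printf '%s\n' "$message" > "$tmp" || return 1
    dest="$spool_dir/$(basename "$tmp").txt"
    # On the same filesystem mv is a rename, so the spool directory
    # only ever sees the completed file.
    mv "$tmp" "$dest" || return 1
    echo "$dest"
}
```

For example, with a staging directory created beforehand: spool_message "hello ,test flume spooldir source" /tmp/spool_staging /tmp/spooldir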

Writing log data to HBase

Prerequisite: an HBase cluster is installed
cd ${FLUME_HOME}/conf/
mkdir hbase && cd hbase
Add the configuration files; two agents are needed here
hbase-back.conf collects the local data, and hbase-front.conf writes the data to HBase
vim hbase-back.conf

agent.sources = backsrc
agent.channels = memoryChannel
agent.sinks = remotesink

# Describe the sources
agent.sources.backsrc.type = exec
agent.sources.backsrc.command = tail -F /tmp/test/data/data.txt
agent.sources.backsrc.checkperiodic = 1000
agent.sources.backsrc.channels = memoryChannel

# Describe the channels
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.keep-alive = 30
agent.channels.memoryChannel.capacity = 1000
agent.channels.memoryChannel.transactionCapacity = 1000

# Describe the sinks
agent.sinks.remotesink.type = avro
agent.sinks.remotesink.hostname = master
agent.sinks.remotesink.port = 9999
agent.sinks.remotesink.channel = memoryChannel

vim hbase-front.conf

agent.sources = frontsrc
agent.channels = memoryChannel
agent.sinks = fileSink

# Describe the sources
agent.sources.frontsrc.type = avro
agent.sources.frontsrc.bind = master
agent.sources.frontsrc.port = 9999
agent.sources.frontsrc.channels = memoryChannel

# Describe the channels
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.keep-alive = 30
agent.channels.memoryChannel.capacity = 1000
agent.channels.memoryChannel.transactionCapacity = 1000

# Describe the sinks
agent.sinks.fileSink.type = hbase
agent.sinks.fileSink.channel = memoryChannel
agent.sinks.fileSink.table = access_log
agent.sinks.fileSink.columnFamily = t
agent.sinks.fileSink.batchSize = 50
agent.sinks.fileSink.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.fileSink.zookeeperQuorum = master:2181,slave1:2181,slave2:2181
agent.sinks.fileSink.znodeParent = /hbase
agent.sinks.fileSink.timeout = 90000

Create the local directory and file
mkdir -pv /tmp/test/data && touch /tmp/test/data/data.txt
Create the table in HBase
hbase shell
Create the table
create 'access_log','t'
List the tables
list
Start the back agent

flume-ng agent -f /data/apache-flume-1.8.0-bin/conf/hbase/hbase-back.conf --name agent -Dflume.root.logger=INFO,console

An error is reported after startup

18/01/22 22:29:28 WARN sink.AbstractRpcSink: Unable to create Rpc client using hostname: 192.168.3.58, port: 9999
org.apache.flume.FlumeException: NettyAvroRpcClient { host: master, port: 9999 }: RPC connection error

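This is because the Avro connection has not been established yet: only the sink side is running, and there is no source side for it to connect to. Once the front agent is started, the log shows that the connection is established.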
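
An alternative that avoids the warning is to start the front agent first and launch the back agent only once port 9999 accepts connections. The following is a sketch under assumptions rather than anything Flume provides: wait_for_port is a hypothetical helper, and it relies on bash's /dev/tcp redirection, so it needs bash rather than a plain POSIX sh.

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT [RETRIES] [DELAY_SECONDS]
# Hypothetical helper: returns 0 once a TCP connection to HOST:PORT succeeds,
# or 1 after RETRIES failed attempts. Relies on bash's /dev/tcp redirection.
wait_for_port() {
    local host=$1 port=$2 retries=${3:-30} delay=${4:-1}
    local attempt=1
    while [ "$attempt" -le "$retries" ]; do
        # The subshell opens the connection and closes it again on exit.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}
```

For example: wait_for_port master 9999 && flume-ng agent -f /data/apache-flume-1.8.0-bin/conf/hbase/hbase-back.conf --name agent -Dflume.root.logger=INFO,console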
Start the front agent

flume-ng agent -f /data/apache-flume-1.8.0-bin/conf/hbase/hbase-front.conf --name agent -Dflume.root.logger=INFO,console

Append content to the local file, then check it in HBase

echo "hello ,test flush to hbase">>/tmp/test/data/data.txt

Neither agent prints any log output while the data is being written
View the data in HBase

hbase shell
scan "access_log"

Flume writes logs to HBase with some delay

Writing logs to Hadoop

The principle is the same as writing to HBase; once the HBase write flow is understood, writing to other services is easy to follow. See the official documentation for detailed configuration.
Prerequisite: a Hadoop cluster is installed
cd ${FLUME_HOME}/conf/
mkdir hdfs && cd hdfs
Add the configuration files; two agents are needed here
hadoop-back.conf collects the local data, and hadoop-front.conf writes the data to Hadoop
vim hadoop-back.conf

# Name the components
hadoop.sources = backsrc
hadoop.sinks = fileSink
hadoop.channels = memoryChannel

# Source
hadoop.sources.backsrc.type = spooldir
hadoop.sources.backsrc.spoolDir = /tmp/data/hadoop
hadoop.sources.backsrc.channels = memoryChannel
hadoop.sources.backsrc.fileHeader = true

# Channel
hadoop.channels.memoryChannel.type = memory
hadoop.channels.memoryChannel.keep-alive = 30
hadoop.channels.memoryChannel.capacity = 1000
hadoop.channels.memoryChannel.transactionCapacity = 1000

# Sink
hadoop.sinks.fileSink.type = avro
hadoop.sinks.fileSink.hostname = master
hadoop.sinks.fileSink.port = 10000
hadoop.sinks.fileSink.channel = memoryChannel

vim hadoop-front.conf

# Name the components
hadoop.sources = frontsrc
hadoop.channels = memoryChannel
hadoop.sinks = remotesink

# Source
hadoop.sources.frontsrc.type = avro
hadoop.sources.frontsrc.bind = master
hadoop.sources.frontsrc.port = 10000
hadoop.sources.frontsrc.channels = memoryChannel

# Channel
hadoop.channels.memoryChannel.type = memory
hadoop.channels.memoryChannel.keep-alive = 30
hadoop.channels.memoryChannel.capacity = 1000
hadoop.channels.memoryChannel.transactionCapacity = 1000

# Sink
hadoop.sinks.remotesink.type = hdfs
hadoop.sinks.remotesink.hdfs.path = hdfs://master/flume
hadoop.sinks.remotesink.hdfs.rollInterval = 0
hadoop.sinks.remotesink.hdfs.idleTimeout = 10000
hadoop.sinks.remotesink.hdfs.fileType = DataStream
hadoop.sinks.remotesink.hdfs.writeFormat = Text
hadoop.sinks.remotesink.hdfs.threadsPoolSize = 20
hadoop.sinks.remotesink.channel = memoryChannel

Create the local directory and set its permissions

mkdir -pv /tmp/data/hadoop && chmod -R 777 /tmp/data/

Create the directory in HDFS and set its permissions

hadoop fs -mkdir /flume
hadoop fs -chmod 777 /flume
hadoop fs -ls /

Write files into the local directory

echo "hello, test hadoop" >> /tmp/data/hadoop/hadoop.log
echo "hello, test flume" >> /tmp/data/hadoop/flume.log
echo "hello, test helloworld" >> /tmp/data/hadoop/helloworld.log

View the files in HDFS and their contents

hadoop fs -ls /flume
hadoop fs -cat /flume/FlumeData.1516634328510.tmp

References:

Official documentation: http://flume.apache.org/FlumeUserGuide.html
Book: Flume 构建高可用、可拓展的海量日志采集系统 (Flume: Building a Highly Available, Scalable Massive Log Collection System)
Common Flume configurations: http://blog.csdn.net/sang1203/article/details/51474628
Installing and using Flume: http://www.aboutyun.com/forum.php?mod=viewthread&tid=20699

Reposted from: https://blog.51cto.com/13323775/2063751
