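About Flume NG
The official description of Flume NG: Flume NG is a branch of Flume whose goal is to be significantly simpler, smaller, and easier to deploy. In doing so, we do not commit to maintaining backward compatibility. We are currently soliciting feedback from those interested in testing this branch for correctness, ease of use, and potential integration with other systems.
What's changed?
Flume NG is derived from Flume, so most of the concepts are the same. If you are already familiar with Flume, here is what you need to know.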
1. You still have sources and sinks and they still do the same thing. They are now connected by channels.
2. Channels are pluggable and dictate durability. Flume NG ships with an in-memory channel for fast but non-durable event delivery and a JDBC-based channel for durable event delivery. We have recently added a file-based durable channel too.
3. There are no more logical or physical nodes. We call all physical nodes agents, and agents can run zero or more sources and sinks.
4. There's no master and no ZooKeeper dependency anymore. At this time, Flume runs with a simple file-based configuration system.
5. Just about everything is a plugin, some end user facing, some for tool and system developers. (Specifically, sources, sinks, channels, configuration providers, lifecycle management policies, input and output formats, compression, source and sink channel adapters, and the kitchen sink.)
6. Tons of things are not yet implemented. Please file JIRAs and/or vote for features you deem important.
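To make the file-based configuration in point 4 more concrete, here is a minimal Python sketch of what such a file boils down to: a flat list of properties that names an agent's sources, channels, and sinks and wires them together. The property layout follows the style used in the Flume user guide, but the agent name `a1`, the component names `r1`, `c1`, `k1`, and the helper `parse_agent_config` are illustrative assumptions. The parser is not Flume's own configuration provider.

```python
# Hypothetical sketch: turn Flume-style properties into a wiring plan.
# The names a1, r1, c1 and k1 are made up for illustration.
CONFIG = """
# one agent with one source, one channel and one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
"""


def parse_agent_config(text, agent):
    """Return {kind: {component_name: {property: value}}} for one agent."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()

    plan = {}
    for kind in ("sources", "channels", "sinks"):
        plan[kind] = {}
        # e.g. "a1.sources = r1" lists the active component names
        for name in props.get(f"{agent}.{kind}", "").split():
            prefix = f"{agent}.{kind}.{name}."
            plan[kind][name] = {
                key[len(prefix):]: value
                for key, value in props.items()
                if key.startswith(prefix)
            }
    return plan


plan = parse_agent_config(CONFIG, "a1")
```

Reading `plan` shows the shape of an agent at a glance: the source `r1` writes into channel `c1`, and the sink `k1` reads from that same channel.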
Main components of Flume NG
Event
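An event is the generic unit of data in Flume NG. Events are akin to messages in JMS and similar messaging systems, and are generally small (on the order of a few bytes to a few kilobytes). An event is commonly a single record in a larger dataset. An event is made up of headers and a body: the former is a key/value map and the latter an arbitrary byte array.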
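As a rough illustration of that structure, the following Python sketch models an event as a header map plus an opaque byte body. The class `Event` here is a stand-in written for this post, not Flume's own type, and the header names `host` and `timestamp` are assumed for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Event:
    """Illustrative stand-in for a Flume NG event: headers plus a byte body."""
    body: bytes = b""
    headers: Dict[str, str] = field(default_factory=dict)


# One access-log line treated as a single record from a larger dataset.
event = Event(
    body="GET /index.html 200".encode("utf-8"),
    headers={"host": "web01", "timestamp": "1325376000000"},
)
```

The body stays a plain byte array on purpose: the transport does not need to understand the payload, and anything it routes on can live in the headers.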
- Source
A source is a component from which Flume NG receives data. Sources can be pollable or event driven. Pollable sources, as the name suggests, are repeatedly polled by Flume NG source runners, whereas event driven sources are expected to be driven by some other force. An example of a pollable source is the sequence generator, which simply generates events whose body is a monotonically increasing integer. Event driven sources include the Avro source, which accepts Avro RPC calls and converts the RPC payload into a Flume event, and the netcat source, which mimics the nc command line tool running in server mode. Sources are a user accessible API extension point.
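The difference between the two kinds of source can be sketched in a few lines of Python. Both classes below are hypothetical models, not Flume classes: `SequenceGeneratorSource` is pulled by a loop that plays the role of a source runner, while `LineSource` only does work when something outside pushes a line into it.

```python
class SequenceGeneratorSource:
    """Pollable: a runner calls poll() repeatedly; each call yields one body."""

    def __init__(self):
        self._next = 0

    def poll(self):
        body = str(self._next).encode("ascii")
        self._next += 1  # monotonically increasing integer
        return body


class LineSource:
    """Event driven: an outside force calls on_line() when data arrives."""

    def __init__(self, deliver):
        self._deliver = deliver  # callback that hands the event body onward

    def on_line(self, line):
        self._deliver(line.rstrip("\n").encode("utf-8"))


received = []

# The loop stands in for a source runner driving the pollable source.
sequence = SequenceGeneratorSource()
for _ in range(3):
    received.append(sequence.poll())

# The event driven source is pushed to, for example by a network listener.
listener = LineSource(received.append)
listener.on_line("hello\n")
```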
- Sink
A sink is the counterpart to the source in that it is a destination for data in Flume NG. Some of the built-in sinks included with Flume NG are the Hadoop Distributed File System sink, which writes events to HDFS in various ways, the logger sink, which simply logs all events received, and the null sink, which is Flume NG's version of /dev/null. Sinks are a user accessible API extension point.
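To show how little a sink needs to do, here is a small Python sketch of the two simplest sinks mentioned above. The classes and the `process` method name are assumptions made for this example and only imitate the behaviour described, not Flume's actual sink interface.

```python
import logging

logging.basicConfig(level=logging.INFO)


class LoggerSink:
    """Logs every event body it receives, in the spirit of the logger sink."""

    def __init__(self):
        self._log = logging.getLogger("LoggerSink")
        self.handled = 0

    def process(self, event_body):
        self._log.info("Event body: %r", event_body)
        self.handled += 1


class NullSink:
    """Discards every event body, like writing to /dev/null."""

    def __init__(self):
        self.handled = 0

    def process(self, event_body):
        self.handled += 1  # counted, then dropped


events = [b"first", b"second"]
sinks = [LoggerSink(), NullSink()]
for sink in sinks:
    for body in events:
        sink.process(body)
```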
- Channel
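A channel is a conduit for events between a source and a sink. Channels also dictate the durability characteristics of event delivery between a source and a sink. For example, a channel may be in memory, which is fast but makes no guarantee against data loss, or it can be fully durable (and thus reliable), where every event is guaranteed to be delivered to the connected sink even in failure cases such as power loss. Channels are a user accessible API extension point.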
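The durability trade-off can be sketched with two toy channels in Python. Neither class is Flume's implementation: `MemoryChannel` keeps events in a queue that vanishes with the process, while the hypothetical `FileChannel` rewrites its queue to a JSON file on every change so that a new process can pick up where the old one stopped. Events are plain strings here to keep the file format trivial.

```python
import json
import os
import tempfile
from collections import deque


class MemoryChannel:
    """Fast, but everything queued is lost when the process goes away."""

    def __init__(self):
        self._queue = deque()

    def put(self, event):
        self._queue.append(event)

    def take(self):
        return self._queue.popleft() if self._queue else None


class FileChannel:
    """Durable sketch: the whole queue is rewritten to disk on every change.

    A real durable channel would also need fsync and transactions.
    """

    def __init__(self, path):
        self._path = path
        self._queue = deque()
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as handle:
                self._queue = deque(json.load(handle))

    def _sync(self):
        with open(self._path, "w", encoding="utf-8") as handle:
            json.dump(list(self._queue), handle)

    def put(self, event):
        self._queue.append(event)
        self._sync()

    def take(self):
        if not self._queue:
            return None
        event = self._queue.popleft()
        self._sync()
        return event


path = os.path.join(tempfile.mkdtemp(), "channel.json")
memory, durable = MemoryChannel(), FileChannel(path)
memory.put("event-1")
durable.put("event-1")

# Simulate a restart: fresh objects, same file on disk.
memory, durable = MemoryChannel(), FileChannel(path)
memory_after_restart = memory.take()
durable_after_restart = durable.take()
```

After the simulated restart the memory channel has nothing to hand to a sink, while the file-backed one still returns the queued event. The price is a disk write on every `put` and `take`.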
Agent
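Flume NG generalizes the notion of an agent. An agent is any physical JVM running Flume NG. Flume OG users should discard their previous notion of an agent and mentally connect the term to Flume OG's "physical node". NG no longer uses the physical/logical node terminology from Flume OG. A single NG agent can run any number of sources, sinks, and channels between them.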
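A compact way to picture an agent is as one process that owns several channels and the components attached to them. The Python sketch below is a simplified model under that assumption. The `Agent` class, its method names, and the channel names `c1` and `c2` are invented for illustration and do not mirror Flume's lifecycle or API.

```python
from collections import deque


class Agent:
    """One process hosting any number of sources, channels and sinks."""

    def __init__(self):
        self.channels = {}  # channel name -> queue of events
        self.sinks = []     # (channel name, delivery callback) pairs

    def add_channel(self, name):
        self.channels[name] = deque()

    def add_sink(self, channel_name, deliver):
        self.sinks.append((channel_name, deliver))

    def source_put(self, channel_name, event):
        """Called by a source to hand an event to one of the channels."""
        self.channels[channel_name].append(event)

    def drain(self):
        """Let every sink take everything currently in its channel."""
        for channel_name, deliver in self.sinks:
            queue = self.channels[channel_name]
            while queue:
                deliver(queue.popleft())


agent = Agent()
agent.add_channel("c1")
agent.add_channel("c2")

logged, archived = [], []
agent.add_sink("c1", logged.append)
agent.add_sink("c2", archived.append)

agent.source_put("c1", b"app log line")
agent.source_put("c2", b"audit record")
agent.source_put("c2", b"another audit record")
agent.drain()
```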
Client
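A client is not necessarily a Flume NG component so much as something that connects to Flume and sends data to a source. A popular and well-known example of a client would be a Log4j appender that sends events directly to a Flume Avro source. Another example might be a syslog daemon.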
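The client role can be tried out without Flume at all. The Python sketch below starts a tiny local TCP server that behaves roughly like a netcat-style source (one line in, one event, an `OK` acknowledgement back) and then connects to it as a client. The handler, the `send_lines` helper, and the use of an ephemeral local port are all assumptions for this example. Against a real agent, only the client half would be needed.

```python
import socket
import socketserver
import threading

received = []


class NetcatLikeHandler(socketserver.StreamRequestHandler):
    """Stand-in for a netcat-style source: each line becomes one event body."""

    def handle(self):
        for line in self.rfile:
            received.append(line.rstrip(b"\n"))
            self.wfile.write(b"OK\n")  # acknowledge each event


# Port 0 asks the OS for any free port on the loopback interface.
server = socketserver.TCPServer(("127.0.0.1", 0), NetcatLikeHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()


def send_lines(host, port, lines):
    """The client: connect, send one line per event, wait for each ack."""
    acks = []
    with socket.create_connection((host, port), timeout=5) as conn:
        reader = conn.makefile("rb")
        for line in lines:
            conn.sendall(line.encode("utf-8") + b"\n")
            acks.append(reader.readline().strip())
    return acks


acks = send_lines("127.0.0.1", port, ["first event", "second event"])
server.shutdown()
server.server_close()
```

The client knows nothing about channels or sinks. It only needs the address of a source and the wire format that source expects.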