Scribe, an open-source syslog log system

Author: Jack捷L | Source: Internet | 2023-06-21 12:11

Scribe project page

https://github.com/facebookarchive/scribe

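Introduction

Scribe is a log collection system open-sourced by Facebook. It has been used heavily inside Facebook and has also been widely adopted at other large internet companies.

It collects logs from a variety of log sources and stores them in a central storage system (NFS, a distributed file system, and so on) so that they can be aggregated and analyzed in one place. It provides a scalable, highly fault-tolerant solution for "distributed collection, unified processing" of logs.

Its most important feature is fault tolerance. When the network or the machines of the central storage system fail, scribe spools logs to local disk or to another location. Once the central storage system recovers, scribe retransmits the spooled logs to it.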
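The spool-and-replay behavior described above can be illustrated with a small Python model. This is a sketch only, not Scribe's actual implementation: the class names `BufferedStore` and `FlakyCentral` are hypothetical, and the in-memory list stands in for the local spool location.

```python
class BufferedStore:
    """Toy model of 'spool locally, replay later'; not Scribe's real code."""

    def __init__(self, primary_write):
        # primary_write is a callable that raises ConnectionError on failure.
        self.primary_write = primary_write
        self.spool = []  # stands in for the local (fallback) location

    def log(self, message):
        # Preserve ordering: while anything is spooled, spool new messages too.
        if self.spool:
            self.spool.append(message)
            return
        try:
            self.primary_write(message)
        except ConnectionError:
            self.spool.append(message)

    def retry(self):
        """Replay spooled messages in order; stop at the first failure."""
        while self.spool:
            try:
                self.primary_write(self.spool[0])
            except ConnectionError:
                return False
            self.spool.pop(0)
        return True


class FlakyCentral:
    """Hypothetical central store that can be switched off for the demo."""

    def __init__(self):
        self.up = True
        self.received = []

    def write(self, message):
        if not self.up:
            raise ConnectionError("central store unreachable")
        self.received.append(message)


central = FlakyCentral()
store = BufferedStore(central.write)
store.log("a")
central.up = False      # outage: "b" and "c" are spooled locally
store.log("b")
store.log("c")
central.up = True       # recovery: replay the spool, then resume normally
store.retry()
store.log("d")
print(central.received)  # ['a', 'b', 'c', 'd']
```

The point of the design is that an outage delays delivery but neither drops messages nor reorders them.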

It is commonly combined with Hadoop: scribe pushes logs into HDFS, and Hadoop processes them periodically with MapReduce jobs.

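Architecture:

The scribe architecture is fairly simple and has three main parts: the scribe agent, scribe itself, and the storage system.

(1) scribe agent

The scribe agent is really just a thrift client. A thrift client is the only way to send data to scribe: scribe defines a thrift interface, and users send data to the server through that interface.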
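A minimal Python sketch of such a client follows. It assumes the `thrift` and `scribe` Python packages are installed (the README below says `lib/` contains the Python package for Scribe). The host, the port 1463, and the helper names `send_with_retry` and `make_scribe_logger` are assumptions made for this illustration. The result codes are modeled as OK = 0 and TRY_LATER = 1, which is how the Thrift interface is understood here. The retry helper is kept independent of the network so it can be tried without a running server.

```python
import time

OK, TRY_LATER = 0, 1  # assumed values of ResultCode in Scribe's Thrift interface


def send_with_retry(log_fn, entries, retries=3, delay=0.0):
    """Call log_fn(entries) until it returns OK or the retries run out."""
    for attempt in range(retries):
        if log_fn(entries) == OK:
            return True
        time.sleep(delay * (attempt + 1))  # simple linear back-off
    return False


def make_scribe_logger(host="localhost", port=1463):
    """Open a framed Thrift connection and return (log_fn, close_fn).

    The thrift and scribe packages are imported lazily so that the rest of
    this sketch runs even where they are not installed.
    """
    from scribe import scribe
    from thrift.protocol import TBinaryProtocol
    from thrift.transport import TSocket, TTransport

    transport = TTransport.TFramedTransport(TSocket.TSocket(host, port))
    protocol = TBinaryProtocol.TBinaryProtocol(transport, False, False)
    client = scribe.Client(protocol)
    transport.open()

    def log_fn(entries):
        # entries is a list of (category, message) pairs.
        batch = [scribe.LogEntry(category=c, message=m) for c, m in entries]
        return client.Log(messages=batch)

    return log_fn, transport.close


# Demonstration with a stand-in server that is busy once, then accepts.
replies = [TRY_LATER, OK]


def fake_log(entries):
    return replies.pop(0)


print(send_with_retry(fake_log, [("web", "hello\n")]))  # True
```

Against a real server, `log_fn, close = make_scribe_logger()` would replace `fake_log`, and `close()` should be called when done.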

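(2) scribe

scribe receives the data sent by thrift clients and, according to its configuration file, routes data of different topics (called categories in scribe's configuration) to different targets. scribe provides a variety of stores, such as file and HDFS, and can load data into them.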
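The routing step can be sketched as a lookup from category to store. The Python below is an illustration under assumptions: the function name `pick_store` and the store names are hypothetical, and choosing the longest matching prefix is a rule picked for this sketch rather than a statement about scribe's exact matching order.

```python
def pick_store(category, stores):
    """Return the store for a category.

    stores maps a configured pattern to a store name. A pattern is an exact
    category, a prefix ending in '*', or the literal 'default'.
    """
    if category in stores:
        return stores[category]
    # Prefix patterns such as "web*"; prefer the most specific one.
    prefixes = [p for p in stores
                if p.endswith("*") and category.startswith(p[:-1])]
    if prefixes:
        return stores[max(prefixes, key=len)]
    return stores.get("default")


stores = {
    "payments": "hdfs_store",   # exact match
    "web*": "file_store",       # prefix match
    "default": "null_store",    # everything else
}
for name in ("payments", "web_access", "other"):
    print(name, "->", pick_store(name, stores))
```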

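(3) Storage system

The storage system is simply the set of stores inside scribe. scribe currently supports many store types: file (a file), buffer (two-tier storage with a primary store and a secondary store), network (another scribe server), bucket (contains multiple stores and uses a hash to distribute data among them), null (discards data), thriftfile (writes to a Thrift TFileTransport file), and multi (writes the same data to several stores at once).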
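To make the difference between bucket and multi concrete, here is a small Python sketch. It is a model, not scribe's code: the function names are hypothetical, plain lists stand in for stores, and CRC32 is used only because it is deterministic across runs, not because scribe uses it.

```python
import zlib


def bucket_for(key, num_buckets):
    """Map a key to a bucket index in [0, num_buckets)."""
    # zlib.crc32 is stable between runs, unlike Python's salted hash().
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def write_bucketed(messages, buckets):
    """bucket-style: each (key, message) goes to exactly one store."""
    for key, message in messages:
        buckets[bucket_for(key, len(buckets))].append(message)


def write_multi(message, stores):
    """multi-style: the same message goes to every store."""
    for store in stores:
        store.append(message)


buckets = [[], [], []]
write_bucketed([("user1", "m1"), ("user2", "m2"), ("user1", "m3")], buckets)
print(buckets)  # m1 and m3 share a bucket because they share a key

copies = [[], []]
write_multi("m4", copies)
print(copies)   # [['m4'], ['m4']]
```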

【CentOS-7】

Install prerequisite packages

sudo yum install git make bison libtool automake openssl-devel gcc-c++ python-devel

# libevent is a lightweight, open-source, high-performance event notification library written in C
# Install libevent and libevent-devel
sudo yum install libevent libevent-devel

# flex is a lexer generator: it turns regular expressions into C code that matches the corresponding strings
# Install flex
sudo yum install flex

# Install byacc
sudo yum install byacc

# Install openjdk
sudo yum install java-1.7.0-openjdk

# ant is a build tool that automates compiling code, running tests, packaging, and redeploying
# Install ant
sudo yum install ant

# Autoconf generates shell scripts that configure source packages to adapt to many Unix-like systems
# Install autoconf
sudo yum install autoconf

# Boost is the collective name for a set of C++ libraries that extend the C++ standard library
# Install boost
sudo yum install boost boost-devel

# Install libicu-devel
sudo yum install libicu-devel

# Install thrift
wget http://rpmfind.net/linux/epel/7/x86_64/Packages/t/thrift-0.9.1-15.el7.x86_64.rpm
sudo yum install thrift-0.9.1-15.el7.x86_64.rpm

# Install fb303
wget http://rpmfind.net/linux/epel/7/x86_64/Packages/f/fb303-0.9.1-15.el7.x86_64.rpm
sudo yum install fb303-0.9.1-15.el7.x86_64.rpm
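Before building, it can help to confirm that the command-line tools from the packages above are actually on the PATH. The following Python sketch does that with the standard library only. The tool list is an assumption derived from the package names above (for example, `g++` for the gcc-c++ package), and the check covers executables only, not the -devel header packages.

```python
import shutil

# Executables expected from the packages installed above (assumed names).
BUILD_TOOLS = ["git", "make", "bison", "flex", "libtool", "automake",
               "autoconf", "g++", "ant", "thrift"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools(BUILD_TOOLS)
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All build tools found.")
```

The `which` parameter exists only so the lookup can be replaced in tests. In normal use the default `shutil.which` is what you want.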

Refresh the dynamic linker cache

sudo /sbin/ldconfig

Download scribe

git clone https://github.com/facebookarchive/scribe.git

Readme

Archived Repo
=============

This is an archived project and is no longer supported or updated by Facebook. Please do not file issues or pull-requests against this repo. If you wish to continue to develop this code yourself, we recommend you fork it.

-------------

Introduction
============

Scribe is a server for aggregating log data that's streamed in real time from clients. It is designed to be scalable and reliable.

See the Scribe Wiki for documentation:
http://wiki.github.com/facebook/scribe

Keep up to date on Scribe development by joining the Scribe Discussion Group:
http://groups.google.com/group/scribe-server/

License (See LICENSE file for full license)
===========================================

Copyright 2007-2008 Facebook

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Hierarchy
=========

scribe/
  aclocal/
    Contains scripts for building/linking with Boost
  examples/
    Contains simple examples of using Scribe
  if/
    Contains Thrift interface for Scribe
  lib/
    Contains Python package for Scribe
  src/
    Contains Scribe source
  test/
    Contain php scripts for testing scribe

Requirements
============

[libevent] Event Notification library
[boost] Boost C++ library (version 1.36 or later)
[thrift] Thrift framework (version 0.5.0 or later)
[fb303] Facebook Bassline (included in thrift/contrib/fb303/)
        fb303 r697294 or later is required.
[hadoop] optional. version 0.19.1 or higher (http://hadoop.apache.org)

These libraries are open source and may be freely obtained, but they are not provided as a part of this distribution.

Helpful tips:
-Thrift, fb303, and scribe installation expects python to be installed under /usr. See PY_PREFIX option in 'configure --help' to change this path.
-Some python installs do not include python site-packages in the default python include path. If python cannot find the installed packages for scribe or fb303, try setting the environment variable PYTHONPATH to the location of the installed packages. This path gets output during 'make install'. (Eg: PYTHONPATH='/usr/lib/python2.5/site-packages').

To build
========

./bootstrap.sh
make

(If you have multiple versions of Boost installed, see Boost configure options below.)

Subsequent builds
=================

./bootstrap
make

OR

./configure
make

NOTE: After the first run with bootstrap.sh you can use "[ ./bootstrap | ./configure ]" followed by "make" to create builds with different configurations. "bootstrap" can be passed the same arguments as "configure".

Make sure that if you change configure.ac and|or add macros run "bootstrap.sh". to regenerate configure. In short whenever in doubt run "bootstrap.sh".

Configure options
=================

To find all available configure options run
./configure --help

Use *only* the listed options.

Examples:

# To disable optimized builds and turn on debug. [ default has been set to optimized]
./configure --disable-opt

# To disable static libraries and enable shared libraries. [ default has been set to static]
./configure --disable-static

# To build scribe with Hadoop support
./configure --enable-hdfs

# If the build process cannot find your Hadoop/Jvm installs, you may need to specify them manually:
./configure --with-hadooppath=/usr/local/hadoop --enable-hdfs CPPFLAGS="-I/usr/local/java/include -I/usr/local/java/include/linux" LDFLAGS="-ljvm -lhdfs"

# To set thrift home to a non-default location
./configure --with-thriftpath=/myhome/local/thrift

# If Boost is installed in a non-default location or there are multiple Boost versions
# installed, you will need to specify the Boost path and library names
./configure --with-boost=/usr/local --with-boost-system=boost_system-gcc40-mt-1_36 --with-boost-filesystem=boost_filesystem-gcc40-mt-1_36

Install
=======

as root:
make install

Run
===

See the examples directory to learn how to use Scribe.

Acknowledgements
================

The build process for Scribe uses autoconf macros to compile/link with Boost. These macros were written by Thomas Porschberg, Michael Tindal, and Daniel Casimiro. See the m4 files in the aclocal subdirectory for more information.

Run

# Inspect the script
cat bootstrap.sh

# Run the script
./bootstrap.sh --prefix=/usr/local/scribe --with-thriftpath=/usr/local/thrift/ --with-fb303path=/usr/local/fb303/ --with-boost=/usr/local/boost/

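Scribe's configuration file has two parts: global configuration and store configuration.

Global configuration

port: the port the scribe server listens on. The default is 0. It can be given on the command line with the -p option or set in the configuration file. In the source code the value is assigned to the variable port.

max_msg_per_second: the default is 0, and a value of 0 means the parameter is ignored. After recent changes this parameter is rarely relevant; max_queue_size is used for throttling instead. Used in scribeHandler::throttleDeny.

max_queue_size (bytes): the maximum size in bytes of the queue of received messages. The default is 5,000,000 bytes. Used in scribeHandler::Log.

check_interval (seconds): how often each store is checked. The default is 5.

new_thread_per_category (yes/no): if yes, a new thread is created for every category seen; otherwise a single thread is created for each store defined in the configuration file. For prefix stores or the default store, setting this parameter to "no" causes all messages matching that category to be handled by a single store; otherwise a new store is created for each unique category name. The default is "yes".

num_thrift_server_threads: the number of threads listening for incoming messages. The default is 3.

max_conn: the maximum number of connections.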
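As an illustration of how the global and store parts fit together, the Python sketch below parses a small scribe-style configuration into nested dictionaries. The sample values (port 1463, host central.example.com, the spool path) are assumptions for illustration, and `parse_conf` is a hypothetical helper, not part of scribe. The layout of `key=value` lines and nested `<store>` blocks follows the style of the files in the project's examples directory.

```python
def parse_conf(text):
    """Parse key=value lines and nested <name>...</name> blocks into dicts."""
    root = {}
    stack = [root]                      # innermost open block is stack[-1]
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if line.startswith("</"):       # closing tag: leave the block
            stack.pop()
        elif line.startswith("<"):      # opening tag: enter a child block
            child = {}
            stack[-1].setdefault(line[1:-1], []).append(child)
            stack.append(child)
        else:
            key, _, value = line.partition("=")
            stack[-1][key.strip()] = value.strip()
    return root


SAMPLE = """
# Global configuration (illustrative values)
port=1463
max_queue_size=5000000
check_interval=5
new_thread_per_category=yes

# Store configuration: buffer = primary store plus a fallback
<store>
category=default
type=buffer
<primary>
type=network
remote_host=central.example.com
remote_port=1463
</primary>
<secondary>
type=file
file_path=/tmp/scribe_spool
</secondary>
</store>
"""

conf = parse_conf(SAMPLE)
print(conf["port"], conf["store"][0]["primary"][0]["type"])  # 1463 network
```

All values stay strings here. A real loader would convert numbers and validate store types.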

Other open-source log systems

Scribe home page: https://github.com/facebook/scribe

Chukwa home page: http://incubator.apache.org/chukwa/

Kafka home page: http://sna-projects.com/kafka/

Flume home page: https://github.com/cloudera/flume/
