ELK 6.2.4 (latest version) study notes: Logstash and Filebeat explained

Author: 阳吉登 | Source: Internet | 2023-05-19 10:19

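Following on from the previous post, a complete guide to building an ELK + Filebeat + Log4j log integration environment with the latest version (6.2.4) on CentOS 7, this post continues with ELK.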

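The latest official Logstash documentation is at https://www.elastic.co/guide/en/logstash/current/index.html.
Suppose you have dozens of servers, and on each one you need to monitor syslog, tomcat logs, nginx logs, mysql logs and so on, watching for OOM errors, processes killed because memory ran low, nginx errors, mysql exceptions and the like. Doing that by hand is obviously time-consuming and laborious.
Logstash has a plugin-based architecture: almost every concrete feature is implemented as a plugin. The installed plugins can be listed with bin/logstash-plugin list --verbose, or you can browse https://www.elastic.co/guide/en/logstash/current/input-plugins.html and https://www.elastic.co/guide/en/logstash/current/output-plugins.html.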

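Logstash configuration file format

The file has three sections: input, filter, and output. Apart from proof-of-concept setups, virtually every real deployment needs filters to preprocess the logs, whether they are nginx logs or log4j logs. The same goes for stdout in the output section: it is mainly useful for a POC.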

input {
    log4j {
        port => "5400"
    }
    beats {
        port => "5044"
    }
}
filter {  # multiple filters run in the order they are declared
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    geoip {
        source => "clientip"
    }
}
output {
    elasticsearch {
        action => "index"
        # hosts also accepts a list such as ["IP Address 1:port1", "IP Address 2:port2", "IP Address 3"];
        # writes are then balanced across the listed ES nodes, which are usually non-master nodes
        hosts => "127.0.0.1:9200"
        index => "logstash-%{+YYYY-MM}"
    }
    stdout {
        codec => rubydebug
    }
    file {
        path => "/path/to/target/file"
    }
}

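Commonly used Logstash inputs include syslog (see RFC 3164), the console, file, redis, and beats.
Commonly used Logstash outputs include Elasticsearch, the console, and file.
Commonly used Logstash filters include grok, mutate, drop, clone, and geoip.

Viewing the Logstash command-line options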

[root@elk1 bin]# ./logstash --help
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
Usage:
bin/logstash [OPTIONS]

Options:
-n, --node.name NAME Specify the name of this logstash instance, if no value is given
it will default to the current hostname.
(default: "elk1")
-f, --path.config CONFIG_PATH Load the logstash config from a specific file
or directory. If a directory is given, all
files in that directory will be concatenated
in lexicographical order and then parsed as a
single config file. You can also specify
wildcards (globs) and any matched files will
be loaded in the order described above.
-e, --config.string CONFIG_STRING Use the given string as the configuration
data. Same syntax as the config file. If no
input is specified, then the following is
used as the default input:
"input { stdin { type => stdin } }"
and if no output is specified, then the
following is used as the default output:
"output { stdout { codec => rubydebug } }"
If you wish to use both defaults, please use
the empty string for the '-e' flag.
(default: nil)
--modules MODULES Load Logstash modules.
Modules can be defined using multiple instances
'--modules module1 --modules module2',
or comma-separated syntax
'--modules=module1,module2'
Cannot be used in conjunction with '-e' or '-f'
Use of '--modules' will override modules declared
in the 'logstash.yml' file.
-M, --modules.variable MODULES_VARIABLE Load variables for module template.
Multiple instances of '-M' or
'--modules.variable' are supported.
Ignored if '--modules' flag is not used.
Should be in the format of
'-M "MODULE_NAME.var.PLUGIN_TYPE.PLUGIN_NAME.VARIABLE_NAME=VALUE"'
as in
'-M "example.var.filter.mutate.fieldname=fieldvalue"'
--setup Load index template into Elasticsearch, and saved searches,
index-pattern, visualizations, and dashboards into Kibana when
running modules.
(default: false)
--cloud.id CLOUD_ID Sets the elasticsearch and kibana host settings for
module connections in Elastic Cloud.
Your Elastic Cloud User interface or the Cloud support
team should provide this.
Add an optional label prefix ':' to help you
identify multiple cloud.ids.
e.g. 'staging:dXMtZWFzdC0xLmF3cy5mb3VuZC5pbyRub3RhcmVhbCRpZGVudGlmaWVy'
--cloud.auth CLOUD_AUTH Sets the elasticsearch and kibana username and password
for module connections in Elastic Cloud
e.g. 'username:'
--pipeline.id ID Sets the ID of the pipeline.
(default: "main")
-w, --pipeline.workers COUNT Sets the number of pipeline workers to run.
(default: 1)
--experimental-java-execution (Experimental) Use new Java execution engine.
(default: false)
-b, --pipeline.batch.size SIZE Size of batches the pipeline is to work in.
(default: 125)
-u, --pipeline.batch.delay DELAY_IN_MS When creating pipeline batches, how long to wait while polling
for the next event.
(default: 50)
--pipeline.unsafe_shutdown Force logstash to exit during shutdown even
if there are still inflight events in memory.
By default, logstash will refuse to quit until all
received events have been pushed to the outputs.
(default: false)
--path.data PATH This should point to a writable directory. Logstash
will use this directory whenever it needs to store
data. Plugins will also have access to this path.
(default: "/usr/local/app/logstash-6.2.4/data")
-p, --path.plugins PATH A path of where to find plugins. This flag
can be given multiple times to include
multiple paths. Plugins are expected to be
in a specific directory hierarchy:
'PATH/logstash/TYPE/NAME.rb' where TYPE is
'inputs' 'filters', 'outputs' or 'codecs'
and NAME is the name of the plugin.
(default: [])
-l, --path.logs PATH Write logstash internal logs to the given
file. Without this flag, logstash will emit
logs to standard output.
(default: "/usr/local/app/logstash-6.2.4/logs")
--log.level LEVEL Set the log level for logstash. Possible values are:
- fatal
- error
- warn
- info
- debug
- trace
(default: "info")
--config.debug Print the compiled config ruby code out as a debug log (you must also have --log.level=debug enabled).
WARNING: This will include any 'password' options passed to plugin configs as plaintext, and may result
in plaintext passwords appearing in your logs!
(default: false)
-i, --interactive SHELL Drop to shell instead of running as normal.
Valid shells are "irb" and "pry"
-V, --version Emit the version of logstash and its friends,
then exit.
-t, --config.test_and_exit Check configuration for valid syntax and then exit.
(default: false)
-r, --config.reload.automatic Monitor configuration changes and reload
whenever it is changed.
NOTE: use SIGHUP to manually reload the config
(default: false)
--config.reload.interval RELOAD_INTERVAL How frequently to poll the configuration location
for changes, in seconds.
(default: 3000000000)
--http.host HTTP_HOST Web API binding host (default: "127.0.0.1")
--http.port HTTP_PORT Web API http port (default: 9600..9700)
--log.format FORMAT Specify if Logstash should write its own logs in JSON form (one
event per line) or in plain text (using Ruby's Object#inspect)
(default: "plain")
--path.settings SETTINGS_DIR Directory containing logstash.yml file. This can also be
set through the LS_SETTINGS_DIR environment variable.
(default: "/usr/local/app/logstash-6.2.4/config")
--verbose Set the log level to info.
DEPRECATED: use --log.level=info instead.
--debug Set the log level to debug.
DEPRECATED: use --log.level=debug instead.
--quiet Set the log level to info.
DEPRECATED: use --log.level=quiet instead.
-h, --help print help

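The meaning of each setting is also described at https://www.elastic.co/guide/en/logstash/current/logstash-settings-file.html
The most useful options in practice are:
-f filename.conf specifies the configuration file
--config.test_and_exit checks the configuration file for valid syntax and exits
--config.reload.automatic watches the configuration for changes and reloads it without a restart, much like nginx -s reload, which is very handy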

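The settings files of the ELK components (for example logstash.yml) are written in YAML (https://baike.baidu.com/item/YAML/1067697?fr=aladdin). The Logstash pipeline configuration shown above is the exception: it uses Logstash's own syntax.

JVM parameters are set in config/jvm.options.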

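Both the filter and output sections of the configuration file support the usual conditional expressions, such as if / else if, together with comparison and regular-expression operators.
The configuration file can also read environment variables using the ${HOME} form; see https://www.elastic.co/guide/en/logstash/current/environment-variables.html for details.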
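
The substitution can be pictured with a small Python sketch. This is only an illustration of the ${VAR} and ${VAR:default} reference forms, not Logstash's own implementation; the function name expand_env and the sample values are made up for the example, and raising an error for an unset variable without a default is this sketch's choice.

```python
import os
import re

# Matches ${VAR} and ${VAR:default}; a simplified model of the reference syntax.
_ENV_REF = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")


def expand_env(text, env=None):
    """Replace ${VAR} / ${VAR:default} references in text (hypothetical helper)."""
    env = os.environ if env is None else env

    def _sub(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        # Sketch behaviour: an unset variable with no default is an error.
        raise KeyError(f"environment variable {name} is not set")

    return _ENV_REF.sub(_sub, text)


line = 'path => "${HOME}/logs/app.log"'
print(expand_env(line, {"HOME": "/home/elk"}))
```

Passing an explicit dictionary instead of os.environ keeps the example reproducible; in a real pipeline the values come from the environment of the Logstash process.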

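Beats input plugin

Before looking at specific input plugins, let's see which options every plugin supports.
The most important one is id: if a single Logstash instance runs several plugins of the same type, id can be used to tell them apart.

Loading data through the Beats plugin is the main recommended approach in ELK 6.x, so let's look at its configuration in detail (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-beats.html).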

input {
  beats {
    port => 5044
  }
}

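The port option is required and has no default. Apart from the ssl settings, almost everything else is optional.
host defaults to "0.0.0.0", which listens on all network interfaces; unless you have special security requirements this is also the recommended setting.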

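The core parsing plugin: Grok filter

Log formats tend to be flexible and complicated (nginx access logs, for example), or are not strictly one event per line (Java exception stack traces, for example), and they are often not very friendly to most developers or operators. If logs can be parsed and sorted into separate fields before they are finally displayed, usability improves a great deal. The grok filter plugin does exactly this. Like the beats plugin, grok is available by default.
Leaving aside the log sources themselves, how good a logging system is depends largely on how well the filter rules at this step are written, so although it is tedious it has to be mastered, much like nginx rewrite rules.
Logstash ships with about 120 patterns; see https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns.
The grok syntax is: %{SYNTAX:SEMANTIC}
It is loosely analogous to the following Java: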

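import java.util.regex.Pattern;

String content = "I am from runoob.com";  // sample text, made up for illustration
String pattern = ".*runoob.*";
boolean isMatch = Pattern.matches(pattern, content);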

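Here pattern corresponds to SYNTAX, the name of the pattern the text must match. SEMANTIC has no direct counterpart in the Java snippet: it is the field name assigned to the piece of text that matched the pattern, and these fields are added to the event.
For example, take the following HTTP request log line:
55.3.244.1 GET /index.html 15824 0.043
Matching it with %{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration} adds the following fields to the event, in addition to the original message:
client: 55.3.244.1
method: GET
request: /index.html
bytes: 15824
duration: 0.043
A complete grok example looks like this: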

input {
  file {
    path => "/var/log/http.log"
  }
}
filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}
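
To make the SYNTAX/SEMANTIC idea concrete, here is a minimal Python sketch that expands %{SYNTAX:SEMANTIC} references into named capture groups. It is not how Logstash implements grok: the four regular expressions in PATTERNS are simplified stand-ins written for this example (the real definitions in logstash-patterns-core are stricter), and grok_to_regex and grok_match are hypothetical helper names.

```python
import re

# Simplified stand-ins for the grok patterns used above (assumptions for this sketch).
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "URIPATHPARAM": r"/\S*",
    "NUMBER": r"-?\d+(?:\.\d+)?",
}


def grok_to_regex(expr):
    """Turn each %{SYNTAX:SEMANTIC} reference into a named capture group."""
    def _sub(m):
        syntax, semantic = m.group(1), m.group(2)
        return f"(?P<{semantic}>{PATTERNS[syntax]})"

    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", _sub, expr))


def grok_match(expr, message):
    """Return the captured fields as a dict, or None if the line does not match."""
    m = grok_to_regex(expr).search(message)
    return m.groupdict() if m else None


expr = "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
fields = grok_match(expr, "55.3.244.1 GET /index.html 15824 0.043")
print(fields)
```

All captured values in this sketch are strings, including bytes and duration.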

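Note: how Logstash knows where in http.log it had read up to after a restart is covered in the Filebeat part below.
The main grok options are match and overwrite. match parses message into the corresponding fields; overwrite names fields whose existing value may be replaced by a captured value, so the original message can be overwritten. For many logs, storing a second copy of the raw message serves no purpose. See https://www.elastic.co/guide/en/logstash/6.2/plugins-filters-grok.html#plugins-filters-grok-overwrite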

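Although the grok filter can structure log lines, multiline events are not well suited to being handled in Logstash, whether in a filter or in the input (with the multiline codec; if you do want to handle multiline events in Logstash, see https://www.elastic.co/guide/en/logstash/current/multiline.html). ELK deployments usually ship logs through the beats input plugin, and handling multiline events in Logstash at that point can mix up the data streams. They therefore need to be dealt with before the events are sent to Logstash, that is, preprocessed in Filebeat.

For data coming from Filebeat modules, Logstash ships with matching parsing patterns; see https://www.elastic.co/guide/en/logstash/current/logstash-config-for-filebeat-modules.html. This is covered in more detail in the Filebeat part.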

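ES output plugin

The main options are:
action: defaults to index, which indexes a document (a Logstash event); see the reference on ES architecture and core concepts.
hosts: the address and port of the ES server(s).
index: the ES index that events are written to. The default is logstash-%{+YYYY.MM.dd}, which creates one index per day; indices are generally split by time. For the date format see http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html.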
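
As a rough illustration of how a time-based index name follows from an event's timestamp, the Python sketch below builds the daily default name and the monthly name used in the earlier configuration. The helper names are hypothetical, Python's strftime codes stand in for the Joda-style format, and the sketch assumes the date is taken from the event timestamp in UTC.

```python
from datetime import datetime, timezone


def daily_index(timestamp, prefix="logstash-"):
    """Mimic logstash-%{+YYYY.MM.dd}: one index per day (hypothetical helper)."""
    return prefix + timestamp.astimezone(timezone.utc).strftime("%Y.%m.%d")


def monthly_index(timestamp, prefix="logstash-"):
    """Mimic logstash-%{+YYYY-MM}: one index per month (hypothetical helper)."""
    return prefix + timestamp.astimezone(timezone.utc).strftime("%Y-%m")


# Sample timestamp, chosen only for the example.
ts = datetime(2018, 5, 19, 10, 19, tzinfo=timezone.utc)
print(daily_index(ts))
print(monthly_index(ts))
```

Daily indices make it easy to drop old data by deleting whole indices; monthly indices mean fewer, larger indices.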

Filebeat

Since ELK 6.x the log4j input plugin is no longer recommended; the recommended replacement is Filebeat.

How Filebeat works

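See https://www.elastic.co/guide/en/beats/filebeat/6.2/how-filebeat-works.html
Filebeat consists of two main components, prospectors and harvesters, which work together to tail files and send events to the configured output. A harvester reads a file line by line and sends the content to the output; each file is read by its own harvester. A prospector manages the harvesters and finds the files to read.
Filebeat currently supports two prospector types, log and stdin, and each type can be defined multiple times.
Filebeat records the state of each file in a registry (set with the filebeat.registry_file parameter, ${path.data}/registry by default); the state stores the offset the harvester last read up to. The prospector in turn keeps the state of every file it has found. Filebeat guarantees that every event is delivered at least once.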
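
The role of the registry can be sketched in Python as follows. This is a toy model, not Filebeat's actual code or registry format: the harvest function, the JSON layout, and the file names are assumptions made for the example. It reads only complete new lines starting at the saved offset, and persists the offset after the lines have been handed over, which is why a crash in between leads to re-reading (at-least-once) rather than loss.

```python
import json
import os
import tempfile


def harvest(log_path, registry_path):
    """Read complete new lines from log_path, resuming at the offset in the registry."""
    registry = {}
    if os.path.exists(registry_path):
        with open(registry_path) as f:
            registry = json.load(f)
    offset = registry.get(log_path, 0)

    events = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # incomplete line: leave it for the next run
            events.append(raw.decode("utf-8").rstrip("\n"))
            offset += len(raw)

    # The offset is saved only after the events were "sent" (returned here).
    registry[log_path] = offset
    with open(registry_path, "w") as f:
        json.dump(registry, f)
    return events


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "http.log")
registry_path = os.path.join(workdir, "registry")

with open(log_path, "wb") as f:
    f.write(b"line 1\nline 2\n")
first = harvest(log_path, registry_path)

with open(log_path, "ab") as f:
    f.write(b"line 3\n")
second = harvest(log_path, registry_path)  # simulates a run after a restart
print(first, second)
```

The second call returns only the line appended after the first run, because the saved offset tells it where to resume.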

Filebeat's configuration file is also written in YAML.

filebeat.prospectors:
- type: log
  paths:
    - /var/log/*.log  # absolute path(s) of the log files
  fields:
    type: syslog  # adds a type field with the value syslog to each event
output.logstash:
  hosts: ["localhost:5044"]
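Filebeat can send its output to Elasticsearch or to Logstash. The common practice is to send it to Logstash, so the ES output configuration is skipped here.
For Filebeat's command-line options see https://www.elastic.co/guide/en/beats/filebeat/6.2/command-line-options.html; for all configuration options see https://www.elastic.co/guide/en/beats/filebeat/6.2/filebeat-reference-yml.html.

When installed as a service, Filebeat runs in the background. To run it in the foreground with its logs written to stderr, run ./filebeat -e.

To use Filebeat, declare prospectors under filebeat.prospectors in the filebeat.yml configuration file; there can be more than one. For example: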

filebeat.prospectors:
- type: log
  paths:
    - /var/log/apache/httpd-*.log

- type: log
  paths:
    - /var/log/messages
    - /var/log/*.log

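Other useful options include include_lines (read only matching lines), exclude_lines (skip matching lines), exclude_files (exclude certain files), tags, fields, fields_under_root, close_inactive (how long a log file may go without changes before its harvester is closed automatically, 5 minutes by default), and scan_frequency (how often the prospector scans for new files to harvest; note that files closed because of close_inactive count as new files too; the default is 10s and it should not be set below 1s).
For details see https://www.elastic.co/guide/en/beats/filebeat/6.2/configuration-filebeat-options.html.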
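
The effect of include_lines and exclude_lines can be sketched in Python. This only models the filtering idea, with include patterns applied first and exclude patterns second; filter_lines and the sample log lines are made up for the example and are not part of Filebeat.

```python
import re


def filter_lines(lines, include_lines=None, exclude_lines=None):
    """Keep lines matching any include pattern, then drop lines matching any exclude pattern."""
    kept = []
    for line in lines:
        if include_lines and not any(re.search(p, line) for p in include_lines):
            continue
        if exclude_lines and any(re.search(p, line) for p in exclude_lines):
            continue
        kept.append(line)
    return kept


# Sample lines, invented for illustration.
sample = [
    "ERROR database connection lost",
    "WARN disk usage at 91%",
    "DEBUG heartbeat ok",
    "ERROR healthcheck probe failed",
]
print(filter_lines(sample, include_lines=["^ERROR", "^WARN"], exclude_lines=["healthcheck"]))
```

Dropping uninteresting lines at the shipper keeps them off the network and out of the Logstash pipeline entirely.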

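Parsing multiline messages

When ELK is used for application logs, displaying multiline messages properly is essential; without it the value of ELK drops considerably. To handle multiline messages correctly, set multiline rules in filebeat.yml that declare which lines belong to one event. This is mainly controlled by three parameters: multiline.pattern, multiline.negate, and multiline.match.
For Java logs, for example, you can use: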

multiline.pattern: '^\['
multiline.negate: true
multiline.match: after

Or:

multiline.pattern: '^[[:space:]]+(at|\.{3})\b|^Caused by:'
multiline.negate: false
multiline.match: after

With either rule, the following log lines are treated as a single event.

[beat-logstash-some-name-832-2015.11.28] IndexNotFoundException[no such index]
    at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.resolve(IndexNameExpressionResolver.java:566)
    at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:133)
    at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:77)
    at org.elasticsearch.action.admin.indices.delete.TransportDeleteIndexAction.checkBlock(TransportDeleteIndexAction.java:75)

For detailed configuration see https://www.elastic.co/guide/en/beats/filebeat/6.2/multiline-examples.html.
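
The way pattern, negate, and match: after interact can be modelled with a short Python sketch. It is an illustration of the grouping rule only, not Filebeat's implementation; group_multiline is a hypothetical helper, the last sample line is invented, and because Python's re module has no POSIX classes, [[:space:]] from the second rule is written as \s here.

```python
import re


def group_multiline(lines, pattern, negate, match="after"):
    """Group lines into events using pattern/negate with match: after semantics."""
    if match != "after":
        raise NotImplementedError("only match: after is sketched here")
    regex = re.compile(pattern)
    events, current = [], None
    for line in lines:
        matched = bool(regex.search(line))
        # negate: false -> matching lines continue the previous line
        # negate: true  -> non-matching lines continue the previous line
        is_continuation = matched != negate
        if is_continuation and current is not None:
            current.append(line)
        else:
            if current is not None:
                events.append("\n".join(current))
            current = [line]
    if current is not None:
        events.append("\n".join(current))
    return events


lines = [
    "[beat-logstash-some-name-832-2015.11.28] IndexNotFoundException[no such index]",
    "    at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:133)",
    "    at org.elasticsearch.action.admin.indices.delete.TransportDeleteIndexAction.checkBlock(TransportDeleteIndexAction.java:75)",
    "[another-log-line] made up for this example",
]

by_bracket = group_multiline(lines, r"^\[", negate=True)
by_stack = group_multiline(lines, r"^\s+(at|\.{3})\b|^Caused by:", negate=False)
print(len(by_bracket), len(by_stack))
```

Both rules yield two events for this input: the exception with its stack frames, and the following line that starts a new event.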

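The outputs Filebeat supports include Elasticsearch, Logstash, Kafka, Redis, File, and Console. They are all fairly simple; see for example https://www.elastic.co/guide/en/beats/filebeat/6.2/kafka-output.html.

Filebeat modules offer a more convenient way to handle common log formats such as apache2 and mysql. In spirit they resemble Spring Boot: convention over configuration. For details see https://www.elastic.co/guide/en/beats/filebeat/6.2/filebeat-modules-overview.html. Filebeat modules require Elasticsearch 5.2 or later.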
