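First, the logs collected here are container logs, not cluster-state logs.
Three things need to be done: collection, monitoring, and alerting. Collection is the foundation; monitoring and alerting can both be built on the collected logs. This post mainly covers collection.
If you would rather skip the text, go straight to the code. It is only a few lines and relies on local calls wherever possible; if you have time, you can rewrite it as a shell script.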
https://github.com/cclient/kubernetes-filebeat-collector
The official collection approach:
https://kubernetes.io/docs/tasks/debug-application-cluster/logging-elasticsearch-kibana/
A summary of the basic approaches (the article is fairly old):
https://jimmysong.io/kubernetes-handbook/practice/app-log-collection.html
The official approach has many limitations, and I gave up on it for various reasons.
The reasons given in that article are:
```
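Kubernetes officially provides an EFK log collection solution, but it does not suit every business scenario and has some limitations of its own, for example:
For the reasons above, we decided to use our own ELK cluster.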
```
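My reasons for not using the official approach are much the same: I need to collect both stdout output and multiple log files written to mounted directories. I also don't like Fluentd; it is too heavy and rather closed overall.
Basic implementation options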
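No. | Approach | Pros | Cons |
---|---|---|---|
1 | Bundle a log collection component into every app image | Easy to deploy; the Kubernetes YAML needs no special configuration; each app can have its own log collection configuration | Tightly coupled; makes upgrading and maintaining both the app and the collector awkward, and bloats the image |
2 | Run a separate log collection container alongside the app container in the same pod | Loosely coupled, highly extensible, easy to maintain and upgrade | Requires extra configuration in the Kubernetes YAML, which is somewhat tedious |
3 | Mount the logs of all pods onto the host and run one log collection pod per host | Fully decoupled, best performance, easiest to manage | Requires unified log collection rules, directories, and output conventions |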
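The article chose option 2 for its requirements.
Its requirements differ from mine, but regardless of the requirements:
Options 1 and 2 both introduce more complexity and require considerable extra work. They differ only in when that work is introduced: option 1 at image build time, option 2 at deploy time.
Options 1 and 2 can be ruled out by design. They are inelegant and, in programming terms, too invasive. Option 3 is AOP-like and reduces invasiveness, so 1 and 2 are worth considering only if 3 is hard to implement. In practice 3 is not hard: I deployed a similar scheme on a plain Docker cluster some time ago, and it works on Kubernetes with minor changes.
Moreover, all of these designs only consider collecting custom log files, that is, files under a directory mapped to the outside. Whether they read from inside or outside the container, they only watch files under that directory. None of them collects Docker's default json-file output, i.e. the data shown by `docker logs --tail 100 -f contain_name`.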
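`docker logs` is convenient to work with. When I first designed Docker log collection, Docker officially supported quite a few logging drivers, but I did not switch to any of them,
because `The docker logs command is not available for drivers other than json-file and journald.`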
https://docs.docker.com/config/containers/logging/configure/#supported-logging-drivers
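So I have always used the default json-file driver and collected by watching /var/lib/docker/containers/*/*-json.log directly.
For file-based log collection I used Logstash early on, since modifying data is easier with it than with Fluentd. Log collection itself does not need much rewriting or filtering, though, so it can be switched to the more lightweight Filebeat.
The drawback is also obvious: these are only the container's logs, and the log lines themselves carry no information about the container. One workaround is to have the application write its own identity into its logs; even without container information, that is enough to identify the source. But this, too, is invasive to application development.
In fact, Docker has kept basic container information from early on. We only need to attach this container information to the raw log at collection time.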
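As a rough illustration of "attach container information to the raw log", here is a minimal Python sketch. It relies on the json-file driver writing one JSON object per line with `log`, `stream`, and `time` keys. The function name `enrich_log_line`, the output key names, and the sample line are all hypothetical; in the setup described in this post, Filebeat does this job through its `fields` setting rather than custom code.

```python
import json


def enrich_log_line(raw_line, container_fields):
    """Parse one line of a Docker json-file log and attach container metadata."""
    record = json.loads(raw_line)
    return {
        # json-file stores the original output, trailing newline included.
        "message": record["log"].rstrip("\n"),
        "stream": record.get("stream"),
        "time": record.get("time"),
        # Metadata that the raw log line itself does not carry.
        "fields": dict(container_fields),
    }


# Hypothetical sample line in json-file format.
sample = '{"log":"hello\\n","stream":"stdout","time":"2018-07-23T00:51:31.821704515Z"}'
print(enrich_log_line(sample, {"pod_name": "consume-577cd986c7-8mk4h"}))
```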
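When it comes to attaching metadata, many people think of reading it from kube-api or etcd. That is too complex, and most give up on it.
In fact, Docker objects, containers included, can carry label information, and all of it can be read directly from local files.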
https://docs.docker.com/config/labels-custom-metadata/#key-format-recommendations
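Kubernetes does not support setting labels on containers yourself, but it adds some of its own, and those labels are largely sufficient.
That covers the requirements and background. The implementation follows.
A look at the container's config file makes the approach clear. The file name may differ across Docker versions; as I recall, early versions used config.json, with less content than today's. I can't guarantee that this post applies to those early versions.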
{ "StreamConfig": {}, "State": { "Running": true, "Paused": false, "Restarting": false, "OOMKilled": false, "RemovalInProgress": false, "Dead": false, "Pid": 15201, "ExitCode": 0, "Error": "", "StartedAt": "2018-07-23T00:51:31.821704515Z", "FinishedAt": "2018-07-23T00:51:31.573484432Z", "Health": null }, "ID": "caed8938198152778e3715f4bd3c00795f40c812d2cdb87dadfe8b0bb058390f", "Created": "2018-07-20T09:05:21.881001304Z", "Managed": false, "Path": "/app/app", "Args": [], "Config": { "Hostname": "consume-577cd986c7-8mk4h", "Domainname": "", "User": "0", "AttachStdin": false, "AttachStdout": false, "AttachStderr": false, "Tty": false, "OpenStdin": false, "StdinOnce": false, "Env": [ "KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443", "TUICE_PERSONAL_WECHAT_PORT_8883_TCP_PORT=8883", "KUBERNETES_SERVICE_HOST=10.96.0.1", "KUBERNETES_PORT=tcp://10.96.0.1:443", "TUICE_PERSONAL_WECHAT_PORT=tcp://10.109.89.240:8883", "TUICE_PERSONAL_WECHAT_PORT_8883_TCP=tcp://10.109.89.240:8883", "KUBERNETES_SERVICE_PORT=443", "KUBERNETES_SERVICE_PORT_HTTPS=443", "KUBERNETES_PORT_443_TCP_PORT=443", "KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1", "TUICE_PERSONAL_WECHAT_PORT_8883_TCP_ADDR=10.109.89.240", "KUBERNETES_PORT_443_TCP_PROTO=tcp", "TUICE_PERSONAL_WECHAT_SERVICE_HOST=10.109.89.240", "TUICE_PERSONAL_WECHAT_SERVICE_PORT=8883", "TUICE_PERSONAL_WECHAT_SERVICE_PORT_SERVER=8883", "TUICE_PERSONAL_WECHAT_PORT_8883_TCP_PROTO=tcp", "PATH=/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "GOLANG_VERSION=1.8", "GOLANG_SRC_URL=https://golang.org/dl/go1.8.src.tar.gz", "GOLANG_SRC_SHA256=406865f587b44be7092f206d73fc1de252600b79b3cacc587b74b5ef5c623596", "GOPATH=/app", "RUN=pro" ], "Cmd": null, "Healthcheck": { "Test": [ "NONE" ] }, "Image": "hub.docker.admaster.co/social_base/consume@sha256:483cd48b4b1e2ca53846f11fe953695bcceea59542b77f09044d30644cf3235f", "Volumes": null, "WorkingDir": "/app", "Entrypoint": [ "/app/app" ], "OnBuild": null, "Labels": { 
"annotation.io.kubernetes.container.hash": "e1a52ef6", "annotation.io.kubernetes.container.restartCount": "0", "annotation.io.kubernetes.container.terminationMessagePath": "/dev/termination-log", "annotation.io.kubernetes.container.terminationMessagePolicy": "File", "annotation.io.kubernetes.pod.terminationGracePeriod": "30", "io.kubernetes.container.logpath": "/var/log/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/consume/0.log", "io.kubernetes.container.name": "consume", "io.kubernetes.docker.type": "container", "io.kubernetes.pod.name": "consume-577cd986c7-8mk4h", "io.kubernetes.pod.namespace": "default", "io.kubernetes.pod.uid": "00bea8a1-8bfc-11e8-b709-f01fafd51338", "io.kubernetes.sandbox.id": "fda1d939f5b41e31ca5c5214b4d78b38691d438eeb52d22d4460d49a2a820c1f" } }, "Image": "sha256:85c9ca987b6fd310ce1019c28031b670da4d6705e70f1807f17727d48aa4aef4", "NetworkSettings": { "Bridge": "", "SandboxID": "", "HairpinMode": false, "LinkLocalIPv6Address": "", "LinkLocalIPv6PrefixLen": 0, "Networks": null, "Service": null, "Ports": null, "SandboxKey": "", "SecondaryIPAddresses": null, "SecondaryIPv6Addresses": null, "IsAnonymousEndpoint": false, "HasSwarmEndpoint": false }, "LogPath": "/var/lib/docker/containers/caed8938198152778e3715f4bd3c00795f40c812d2cdb87dadfe8b0bb058390f/caed8938198152778e3715f4bd3c00795f40c812d2cdb87dadfe8b0bb058390f-json.log", "Name": "/k8s_consume_consume-577cd986c7-8mk4h_default_00bea8a1-8bfc-11e8-b709-f01fafd51338_0", "Driver": "overlay", "MountLabel": "", "ProcessLabel": "", "RestartCount": 0, "HasBeenStartedBefore": true, "HasBeenManuallyStopped": false, "MountPoints": { "/app/conf": { "Source": "/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/volumes/kubernetes.io~configmap/conf", "Destination": "/app/conf", "RW": false, "Name": "", "Driver": "", "Type": "bind", "Relabel": "ro", "Propagation": "rprivate", "Spec": { "Type": "bind", "Source": 
"/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/volumes/kubernetes.io~configmap/conf", "Target": "/app/conf", "ReadOnly": true } }, "/dev/termination-log": { "Source": "/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/containers/consume/fc3f9ebf", "Destination": "/dev/termination-log", "RW": true, "Name": "", "Driver": "", "Type": "bind", "Propagation": "rprivate", "Spec": { "Type": "bind", "Source": "/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/containers/consume/fc3f9ebf", "Target": "/dev/termination-log" } }, "/etc/hosts": { "Source": "/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/etc-hosts", "Destination": "/etc/hosts", "RW": true, "Name": "", "Driver": "", "Type": "bind", "Propagation": "rprivate", "Spec": { "Type": "bind", "Source": "/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/etc-hosts", "Target": "/etc/hosts" } }, "/var/log/contain": { "Source": "/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/volumes/kubernetes.io~local-volume/pv-3", "Destination": "/var/log/contain", "RW": true, "Name": "", "Driver": "", "Type": "bind", "Propagation": "rprivate", "Spec": { "Type": "bind", "Source": "/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/volumes/kubernetes.io~local-volume/pv-3", "Target": "/var/log/contain" } }, "/tmp/nbbsdownload": { "Source": "/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/volumes/kubernetes.io~local-volume/pv-3-6", "Destination": "/tmp/nbbsdownload", "RW": true, "Name": "", "Driver": "", "Type": "bind", "Propagation": "rprivate", "Spec": { "Type": "bind", "Source": "/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/volumes/kubernetes.io~local-volume/pv-3-6", "Target": "/tmp/nbbsdownload" } }, "/var/run/secrets/kubernetes.io/serviceaccount": { "Source": "/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/volumes/kubernetes.io~secret/default-token-g6gr9", "Destination": "/var/run/secrets/kubernetes.io/serviceaccount", "RW": false, 
"Name": "", "Driver": "", "Type": "bind", "Relabel": "ro", "Propagation": "rprivate", "Spec": { "Type": "bind", "Source": "/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/volumes/kubernetes.io~secret/default-token-g6gr9", "Target": "/var/run/secrets/kubernetes.io/serviceaccount", "ReadOnly": true } } }, "SecretReferences": null, "AppArmorProfile": "", "HostnamePath": "/var/lib/docker/containers/fda1d939f5b41e31ca5c5214b4d78b38691d438eeb52d22d4460d49a2a820c1f/hostname", "HostsPath": "/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/etc-hosts", "ShmPath": "/var/lib/docker/containers/fda1d939f5b41e31ca5c5214b4d78b38691d438eeb52d22d4460d49a2a820c1f/shm", "ResolvConfPath": "/var/lib/docker/containers/fda1d939f5b41e31ca5c5214b4d78b38691d438eeb52d22d4460d49a2a820c1f/resolv.conf", "SeccompProfile": "unconfined", "NoNewPrivileges": false }
If the container was started by k8s, it carries the corresponding k8s labels. This is the part we use:
{ "annotation.io.kubernetes.container.hash": "e1a52ef6", "annotation.io.kubernetes.container.restartCount": "0", "annotation.io.kubernetes.container.terminationMessagePath": "/dev/termination-log", "annotation.io.kubernetes.container.terminationMessagePolicy": "File", "annotation.io.kubernetes.pod.terminationGracePeriod": "30", "io.kubernetes.container.logpath": "/var/log/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/consume/0.log", "io.kubernetes.container.name": "consume", "io.kubernetes.docker.type": "container", "io.kubernetes.pod.name": "consume-577cd986c7-8mk4h", "io.kubernetes.pod.namespace": "default", "io.kubernetes.pod.uid": "00bea8a1-8bfc-11e8-b709-f01fafd51338", "io.kubernetes.sandbox.id": "fda1d939f5b41e31ca5c5214b4d78b38691d438eeb52d22d4460d49a2a820c1f" }
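Pulling the useful labels out of the parsed config could look like the sketch below. The label keys are the ones shown in the config above; the function name `k8s_fields`, the target field names, and the decision to skip anything whose `io.kubernetes.docker.type` is not `container` are my assumptions, not necessarily what the linked repository does.

```python
# Map Kubernetes-provided Docker labels to short field names (names are illustrative).
LABEL_TO_FIELD = {
    "io.kubernetes.pod.namespace": "namespace",
    "io.kubernetes.container.name": "name",
    "io.kubernetes.pod.name": "pod_name",
}


def k8s_fields(config):
    """Return log fields for a k8s-started container, or None for anything else."""
    labels = (config.get("Config") or {}).get("Labels") or {}
    # Assumption: only entries labelled as application containers are collected.
    if labels.get("io.kubernetes.docker.type") != "container":
        return None
    return {
        field: labels[label]
        for label, field in LABEL_TO_FIELD.items()
        if label in labels
    }
```

For the config shown above this yields `namespace`, `name`, and `pod_name` with the values `default`, `consume`, and `consume-577cd986c7-8mk4h`.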
The raw Docker json-file log is the file named by the `LogPath` field in the config above.
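Collect this file, add the relevant data from the labels, and you have k8s container logs. It is simple once spelled out; many people just don't know these files exist.
What is collected here is the container's stdout log. Another common practice is to map a local path and have the application write its log files into that directory.
The collector then either runs inside the container and reads from the in-container path, or runs on the host and reads from the host-side path.
This should ideally be unified with how stdout logs are collected; maintaining two separate systems is unacceptable.
The implementation is also simple. The config above also contains this part: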
{ "/var/log/contain": { "Source": "/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/volumes/kubernetes.io~local-volume/pv-3", "Destination": "/var/log/contain", "RW": true, "Name": "", "Driver": "", "Type": "bind", "Propagation": "rprivate", "Spec": { "Type": "bind", "Source": "/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/volumes/kubernetes.io~local-volume/pv-3", "Target": "/var/log/contain" } } }
This is the mount mapping:
/var/log/contain is the path inside the container,
/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/volumes/kubernetes.io~local-volume/pv-3 is the path on the host.
Add the host directory
/var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/volumes/kubernetes.io~local-volume/pv-3 to the watched paths as well, and that's it.
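Turning `MountPoints` into host-side glob patterns might be sketched as follows. The post does not say how log mounts are told apart from other mounts (configmaps, secrets, /etc/hosts), so the rule used here, a writable mount whose in-container destination starts with a given prefix, is purely an assumption, as are the function name `log_globs` and its defaults.

```python
def log_globs(config, dest_prefix="/var/log/", pattern="*.log"):
    """Return host-side glob patterns for mounts that look like log directories."""
    globs = []
    mounts = config.get("MountPoints") or {}
    for destination, mount in sorted(mounts.items()):
        # Assumption: log directories are writable and mounted under dest_prefix.
        if destination.startswith(dest_prefix) and mount.get("RW"):
            globs.append(mount["Source"].rstrip("/") + "/" + pattern)
    return globs
```

With the sample config above, only the `/var/log/contain` mount matches, giving the `pv-3/*.log` pattern on the host.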
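The approach is now clear.
What we ultimately need is a config file like the following (whether or not the collector is Filebeat, this is the data required).
One more note; the output below should make it clear: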
filebeat.inputs:
- type: log
  paths:
    - /var/lib/docker/containers/caed8938198152778e3715f4bd3c00795f40c812d2cdb87dadfe8b0bb058390f/caed8938198152778e3715f4bd3c00795f40c812d2cdb87dadfe8b0bb058390f-json.log
    - /var/lib/kubelet/pods/00bea8a1-8bfc-11e8-b709-f01fafd51338/volumes/kubernetes.io~local-volume/pv-3/*.log
  fields:
    namespace: "default"
    name: "consume"
    pod_name: "consume-577cd986c7-8mk4h"
output.elasticsearch:
  hosts: ["localhost:9200"]
setup.kibana:
  host: "localhost:5601"
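Generating a config of this shape needs nothing more than string building. The sketch below is one possible way, using only the standard library; the function name `render_filebeat_config` and the shape of its `inputs` argument are hypothetical. It writes every scalar as a JSON string, which is also a valid YAML double-quoted scalar, so paths end up quoted.

```python
import json


def render_filebeat_config(inputs, es_hosts=("localhost:9200",), kibana_host="localhost:5601"):
    """Render a Filebeat config; each input is {"paths": [...], "fields": {...}}."""
    lines = ["filebeat.inputs:"]
    for item in inputs:
        lines.append("- type: log")
        lines.append("  paths:")
        lines.extend("    - " + json.dumps(path) for path in item["paths"])
        lines.append("  fields:")
        lines.extend(
            "    %s: %s" % (key, json.dumps(value))
            for key, value in item["fields"].items()
        )
    lines.append("output.elasticsearch:")
    lines.append("  hosts: " + json.dumps(list(es_hosts)))
    lines.append("setup.kibana:")
    lines.append("  host: " + json.dumps(kibana_host))
    return "\n".join(lines) + "\n"


print(render_filebeat_config([
    {"paths": ["/var/lib/docker/containers/abc/abc-json.log"],
     "fields": {"namespace": "default", "name": "consume"}},
]))
```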
Periodically scan /var/lib/docker/containers/, read each container's config.v2.json, build the config file, and start the collector process.
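The scan step could be sketched like this in Python. The directory layout and the `config.v2.json` name come from the post; the function name `scan_containers` and the choice to silently skip unreadable or half-written files are assumptions.

```python
import glob
import json
import os


def scan_containers(root="/var/lib/docker/containers"):
    """Return {container_id: parsed config} for every readable config.v2.json."""
    found = {}
    for path in sorted(glob.glob(os.path.join(root, "*", "config.v2.json"))):
        try:
            with open(path, encoding="utf-8") as handle:
                config = json.load(handle)
        except (OSError, ValueError):
            # The container may have just been removed, or the file is mid-write.
            continue
        # Fall back to the directory name if the config has no "ID" key.
        container_id = config.get("ID", os.path.basename(os.path.dirname(path)))
        found[container_id] = config
    return found
```

Reading these files normally requires root on the node, so the collector script has to run with sufficient privileges.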
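Because container IDs contain a hash, the chance of a name collision is practically zero, so conflicts need no special handling. Just deploy one script independently on each k8s node.
The current implementation uses a script plus a timer. It is slightly less real-time, but good enough; if you have time, you can optimize it to use file watching.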
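A timer-based loop only needs to act when the set of containers changes. The sketch below shows one way to structure that; it is not the linked repository's code, and the names `diff_containers`, `poll`, and the 30-second default interval are all hypothetical. The `list_ids` callback would return the container IDs found by a scan, and `on_change` would regenerate the config and restart the collector.

```python
import time


def diff_containers(previous_ids, current_ids):
    """Return (added, removed) container IDs between two scans, each sorted."""
    previous, current = set(previous_ids), set(current_ids)
    return sorted(current - previous), sorted(previous - current)


def poll(list_ids, on_change, interval_seconds=30, max_rounds=None, sleep=time.sleep):
    """Call on_change(added, removed) whenever the container set changes."""
    known = set()
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        current = set(list_ids())
        added, removed = diff_containers(known, current)
        if added or removed:
            on_change(added, removed)
        known = current
        rounds += 1
        # Sleep between rounds, but not after the final one.
        if max_rounds is None or rounds < max_rounds:
            sleep(interval_seconds)
```

Replacing the sleep with a file-system watch on the containers directory would give the more real-time behavior mentioned above, at the cost of an extra dependency or platform-specific code.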