Overall design (architecture diagram: Kubernetes monitoring architecture)
Design overview
Monitoring is divided into two parts.
kubelet runs on every node. Through its built-in cAdvisor it collects resource-usage information for all containers on the node, aggregates it into pod-level metrics, and exposes the result as an API for other components to consume. For details see the kubelet-api reference. Once kubelet is installed you can try calling it as follows:
root@master:/etc/kubernetes/cert# curl -s --cacert ./ca.pem --cert ./admin.pem --key ./admin-key.pem https://192.168.0.107:10250/metrics | head
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
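The pod-level resource data described above is served by kubelet's summary endpoint, the same endpoint that metrics-server scrapes. A minimal sketch of querying it, reusing the certificates from the /metrics call above and assuming jq is installed:
# Print the node name and the names of the pods it is reporting on
curl -s --cacert ./ca.pem --cert ./admin.pem --key ./admin-key.pem \
  https://192.168.0.107:10250/stats/summary \
  | jq '{node: .node.nodeName, pods: [.pods[].podRef.name]}'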
metrics-server keeps only the most recently collected round of metrics and does not store anything it has already served, so the early Kubernetes design called for an additional base storage component to which metrics-server would hand its data, for use by components such as the dashboard and vertical autoscaling.
User-defined HPA metrics are converted by an adapter into a form the HPA understands, so that they can take part in HPA decisions (see the sketch after this paragraph).
These were all ideas from the initial design; nowadays Prometheus instead installs a node exporter (node_exporter) on each node to collect node information and report it to Prometheus.
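To illustrate the adapter path just described: once an adapter (for example prometheus-adapter) publishes a custom pod metric through the custom metrics API, an HPA can consume it. A minimal sketch, assuming a hypothetical Deployment named demo, a hypothetical adapter-served metric http_requests_per_second, and a cluster version that offers autoscaling/v2beta2:
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo                           # hypothetical target Deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # hypothetical metric exposed by the adapter
      target:
        type: AverageValue
        averageValue: "10"
EOF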
Kubernetes introduced the Aggregation Layer mechanism, which lets users conveniently extend the cluster beyond the core APIs. Once the cluster is installed and configured, the aggregator runs inside the kube-apiserver process. The user creates an APIService object in the cluster, setting the corresponding URL (path /apis/{group}/{version}/...); when that path is accessed, the API server forwards the request to the designated backend service. Usually this backend is an extension-apiserver, which runs as a pod inside the cluster.
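As a concrete example, the metrics-server installation later in this article registers the following APIService (reproduced here from its deploy manifests), which makes the API server forward /apis/metrics.k8s.io/v1beta1/... to the metrics-server service in kube-system:
cat <<EOF | kubectl apply -f -
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server    # backend Service that the aggregator proxies to
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
EOF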
A user request reaches the extension-apiserver as follows.
The API server passes the certificate used for the access, together with the user information, to the extension-apiserver via the --requestheader-* options listed further below.
When the API server is started with those options, it generates a ConfigMap named extension-apiserver-authentication in the kube-system namespace containing the --requestheader-* settings. To authenticate requests coming from the API server, the extension-apiserver must first obtain the information in this ConfigMap, which can be arranged by binding the serviceAccount used by the extension-apiserver to the extension-apiserver-authentication-reader role in kube-system.
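Taking metrics-server as the example, this binding can be created with one command (a sketch; the metrics-server manifests used later create an equivalent RoleBinding named metrics-server-auth-reader):
# Bind the metrics-server serviceAccount to the reader role in kube-system
kubectl -n kube-system create rolebinding metrics-server-auth-reader \
  --role=extension-apiserver-authentication-reader \
  --serviceaccount=kube-system:metrics-server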
At present the API server generates this ConfigMap even when those options are absent; its content then looks like this:
root@master:/opt/k8s/work# kubectl get configmaps -n kube-system extension-apiserver-authentication -o yaml
apiVersion: v1
data:
  client-ca-file: |
    -----BEGIN CERTIFICATE-----
    MIIDmjCCAoKgAwIBAgIUMgmbH118p4mkwRHqgFl3bltHX1MwDQYJKoZIhvcNAQEL
    BQAwZTELMAkGA1UEBhMCQ04xEDAOBgNVBAgTB05hbkppbmcxEDAOBgNVBAcTB05h
    bkppbmcxDDAKBgNVBAoTA2s4czEPMA0GA1UECxMGc3lzdGVtMRMwEQYDVQQDEwpr
    NQIDAQABo0IwQDAOBgNVHQ8BAf8EBAMCAQYwDwYDVR0TAQH/BAUwAwEB/zAdBgNV
    HQ4EFgQUcyfeeyf0LulhElMz7x4YXC7FBXIwDQYJKoZIhvcNAQELBQADggEBAHvN
    18jceQ9BthnxFNoCZ5yjiQGQViVcaw76gEm/OrmxKGFUXJyDmZghP+gjJ8ZOADZ9
    Brw+F66ULWMBfFQrESUf3nnnaScFdrZ9TcoKDPPhzibOfEqGMf6RNFTjlWk11ZUl
    qPTPmkJlGqMGvRgPMPm2xwucE5+o762C94iLFBfmqaS/FHGsoR7hfGSEAn0q9by5
    SotQpHpAt5tzE8N7KEXFIDOr8LlbXOd/lLn1+G84NY8lWWcARFgvAuOFgKQqfenm
    ezrX/nv45OvuKBYVf7o+8CXfoTK7vc7RTtqWHA+zNbjly7IaYeaPyDxQqWSY6cBZ
    Fzh51DLVlbmTyeagMXo=
    -----END CERTIFICATE-----
kind: ConfigMap
metadata:
  creationTimestamp: "2020-02-09T12:04:05Z"
  name: extension-apiserver-authentication
  namespace: kube-system
  resourceVersion: "21"
  selfLink: /api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication
  uid: 6cb1a94e-78e5-49c9-8f5a-ae8183f8de96
To authorize the user of a request, the extension-apiserver must first send a SubjectAccessReview request to the API server; to be permitted to send it, the serviceAccount used by the extension-apiserver must additionally be bound to the system:auth-delegator cluster role.
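Again with metrics-server as the example, the equivalent of the metrics-server:system:auth-delegator binding created by the manifests below is (sketch):
# Allow the metrics-server serviceAccount to delegate authn/authz checks to the API server
kubectl create clusterrolebinding metrics-server:system:auth-delegator \
  --clusterrole=system:auth-delegator \
  --serviceaccount=kube-system:metrics-server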
Enabling aggregation requires appending the following options to the kube-apiserver configuration:
--requestheader-client-ca-file=
--requestheader-allowed-names=front-proxy-client
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=
--proxy-client-key-file=
If no kube-proxy process runs on the machine hosting the API server, one more option must be appended:
--enable-aggregator-routing=true
The CA conflict problem
After aggregation is enabled, the API server carries two CA options: --client-ca-file and --requestheader-client-ca-file.
If they are configured carelessly, the two CAs can conflict.
When both are set, the API server first checks whether the client certificate was signed by the CA in --requestheader-client-ca-file, and only if it was not does it fall back to --client-ca-file. The two CAs should therefore normally be different: if the same CA is used for both, a certificate that could previously access the API server may be rejected once the aggregator is enabled, because its CN may not appear in the --requestheader-allowed-names list. (That is the official documentation's account; in my own test with a single shared CA no error occurred, and I do not know why.)
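To check which CA actually signed a given client certificate, you can verify it against each CA bundle with openssl (a sketch; the file names follow the paths used elsewhere in this article):
openssl verify -CAfile /etc/kubernetes/cert/ca.pem /etc/kubernetes/cert/admin.pem
openssl verify -CAfile /etc/kubernetes/cert/client-ca.pem /etc/kubernetes/cert/proxy-client.pem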
Generate the client CA (this requires the cfssl toolset: cfssl)
Signing configuration file:
cd /opt/k8s/work
cat > client-ca-config.json <<EOF
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "kubernetes": {
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ],
        "expiry": "87600h"
      }
    }
  }
}
EOF
Certificate signing request file:
cd /opt/k8s/work
cat > client-ca-csr.json <<EOF
{
  "CN": "kubernetes",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "NanJing",
      "L": "NanJing",
      "O": "k8s",
      "OU": "system"
    }
  ],
  "ca": {
    "expiry": "87600h"
  }
}
EOF
Generate the client root certificate:
cd /opt/k8s/work
cfssl gencert -initca client-ca-csr.json | cfssljson -bare client-ca
ls client-ca*.pem
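Optionally, confirm the subject and validity period of the freshly generated CA (sketch):
openssl x509 -in client-ca.pem -noout -subject -issuer -dates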
Copy the certificates into the k8s certificate directory (with multiple API server nodes, distribute them to the other nodes as well):
cd /opt/k8s/work
cp client-ca*.pem client-ca-config.json /etc/kubernetes/cert/
Generate the proxy client certificate
Certificate signing request file:
cd /opt/k8s/work
cat > proxy-client-csr.json <<EOF
{
  "CN": "front-proxy-client",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "NanJing",
      "L": "NanJing",
      "O": "k8s",
      "OU": "system"
    }
  ]
}
EOF
Generate the certificate:
cfssl gencert -ca=/etc/kubernetes/cert/client-ca.pem \
-ca-key=/etc/kubernetes/cert/client-ca-key.pem \
-config=/etc/kubernetes/cert/client-ca-config.json \
-profile=kubernetes proxy-client-csr.json | cfssljson -bare proxy-client
ls proxy-client*.pem
Copy the certificates into the k8s certificate directory (with multiple API server nodes, distribute them to the other nodes as well):
cd /opt/k8s/work
cp proxy-client*.pem /etc/kubernetes/cert/
Append the following options to the API server configuration:
--requestheader-client-ca-file=/etc/kubernetes/cert/client-ca.pem
--requestheader-allowed-names=front-proxy-client
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=/etc/kubernetes/cert/proxy-client.pem
--proxy-client-key-file=/etc/kubernetes/cert/proxy-client-key.pem
Restart the API server:
systemctl daemon-reload
systemctl restart kube-apiserver
The corresponding content of extension-apiserver-authentication:
root@master:/opt/k8s/work# kubectl get configmaps -n kube-system extension-apiserver-authentication -o yaml
apiVersion: v1
data:
  client-ca-file: |
    -----BEGIN CERTIFICATE-----
    MIIDmjCCAoKgAwIBAgIUMgmbH118p4mkwRHqgFl3bltHX1MwDQYJKoZIhvcNAQEL
    BQAwZTELMAkGA1UEBhMCQ04xEDAOBgNVBAgTB05hbkppbmcxEDAOBgNVBAcTB05h
    ...
    Fzh51DLVlbmTyeagMXo=
    -----END CERTIFICATE-----
  requestheader-allowed-names: '["front-proxy-client"]'
  requestheader-client-ca-file: |
    -----BEGIN CERTIFICATE-----
    MIIDmjCCAoKgAwIBAgIULBdSC4QJy1MBYwGDb0b9g7YMDH0wDQYJKoZIhvcNAQEL
    ...
    QKHtdMypc3mPUO6sBcY=
    -----END CERTIFICATE-----
  requestheader-extra-headers-prefix: '["X-Remote-Extra-"]'
  requestheader-group-headers: '["X-Remote-Group"]'
  requestheader-username-headers: '["X-Remote-User"]'
kind: ConfigMap
metadata:
  creationTimestamp: "2020-02-09T12:04:05Z"
  name: extension-apiserver-authentication
  namespace: kube-system
  resourceVersion: "2930987"
  selfLink: /api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication
  uid: 6cb1a94e-78e5-49c9-8f5a-ae8183f8de96
Before metrics-server is installed, kubelet does collect system information, but it is only reachable through kubelet's own interface; calling kubectl top nodes fails with the following error:
$ kubectl top nodes
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
Pull the metrics-server image and push it to your own private registry:
docker pull gcr.azk8s.cn/google_containers/metrics-server-amd64:v0.3.6
docker tag gcr.azk8s.cn/google_containers/metrics-server-amd64:v0.3.6 192.168.0.107/k8s/metrics-server-amd64:v0.3.6
docker push 192.168.0.107/k8s/metrics-server-amd64:v0.3.6
Download the metrics-server deployment files:
$ cd /opt/k8s/work/
$ wget https://github.com/kubernetes-sigs/metrics-server/archive/master.zip
$ unzip master.zip
$ cd metrics-server-master/deploy/kubernetes
Edit metrics-server-deployment.yaml: add two command-line arguments for metrics-server and change the image name to point at the private registry:
$ diff metrics-server-deployment.yaml metrics-server-deployment.yaml.bak
32c32
< image: 192.168.0.107/k8s/metrics-server-amd64:v0.3.6
---
> image: k8s.gcr.io/metrics-server-amd64:v0.3.6
36,37d35
< - --metric-resolution=30s
< - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
--kubelet-preferred-address-types: prefer contacting kubelet by IP; otherwise the node's hostname is used, which the default CoreDNS installation cannot resolve. Alternatively, modify the CoreDNS configuration and append a hosts block:
...
  Corefile: |
    .:53 {
        errors
        health
        hosts {
            192.168.0.107 master
            192.168.0.114 slave
            fallthrough
        }
...
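To apply the hosts change, edit the coredns ConfigMap and let the pods pick it up (a sketch; in a standard install the ConfigMap is named coredns and the pods carry the label k8s-app=kube-dns, both of which may differ in a manual deployment):
kubectl -n kube-system edit configmap coredns
# If the Corefile does not include the reload plugin, restart the pods:
kubectl -n kube-system delete pod -l k8s-app=kube-dns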
Start metrics-server:
$ cd /opt/k8s/work/metrics-server-master/deploy/kubernetes
$ kubectl create -f .
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
Check that it is running:
$ kubectl -n kube-system get all -l k8s-app=metrics-server
NAME                                  READY   STATUS    RESTARTS   AGE
pod/metrics-server-857d7c4878-swpvk   1/1     Running   0          72s

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/metrics-server   1/1     1            1           72s

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/metrics-server-857d7c4878   1         1         1       72s
Inspect the metrics exposed by metrics-server:
$ kubectl get --raw https://192.168.0.107:6443/apis/metrics.k8s.io/v1beta1/nodes | jq .
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  },
  "items": [
    {
      "metadata": {
        "name": "master",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/master",
        "creationTimestamp": "2020-02-27T09:30:12Z"
      },
      "timestamp": "2020-02-27T09:29:35Z",
      "window": "30s",
      "usage": {
        "cpu": "414650216n",
        "memory": "6069004Ki"
      }
    },
    {
      "metadata": {
        "name": "slave",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/slave",
        "creationTimestamp": "2020-02-27T09:30:12Z"
      },
      "timestamp": "2020-02-27T09:29:35Z",
      "window": "30s",
      "usage": {
        "cpu": "80942639n",
        "memory": "2393408Ki"
      }
    }
  ]
}
$ kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master   427m         10%    5928Mi          76%
slave    92m          2%     2335Mi          62%
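Pod-level metrics can be queried the same way:
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods | jq '.items[].metadata.name'
kubectl top pods -n kube-system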
After startup everything appears healthy, yet no metrics can be retrieved
$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
The kube-apiserver log keeps showing the following error:
E0227 16:52:50.472445 19192 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.0.0.96:443/apis/metrics.k8s.io/v1beta1: bad status from https://10.0.0.96:443/apis/metrics.k8s.io/v1beta1: 403
The error suggests that kube-apiserver lacks permission when calling metrics-server, even though the proxy-client-* parameters were clearly provided. A final re-check of the kube-apiserver unit file revealed that the --proxy-client-cert-file line was missing its trailing line-continuation backslash, so the option after it was never passed to the process.
Wrong:
--proxy-client-cert-file=/etc/kubernetes/cert/proxy-client.pem
--proxy-client-key-file=/etc/kubernetes/cert/proxy-client-key.pem \
Correct:
--proxy-client-cert-file=/etc/kubernetes/cert/proxy-client.pem \
--proxy-client-key-file=/etc/kubernetes/cert/proxy-client-key.pem \
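After fixing the unit file, reload and restart kube-apiserver, then confirm that the aggregated API reports as available (sketch):
systemctl daemon-reload && systemctl restart kube-apiserver
kubectl get apiservice v1beta1.metrics.k8s.io   # the AVAILABLE column should now show True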