Preface
This post is a share from 秦牧羊 describing an upgrade from hammer to jewel and how the failures hit along the way were handled; it is a very detailed hands-on write-up.
Initial state
Pool status
root@demo:/home/demouser# rados lspools
rbd
.cn.rgw.root
.cn-zone1.rgw.root
.cn-zone1.rgw.domain
.cn-zone1.rgw.control
.cn-zone1.rgw.gc
.cn-zone1.rgw.buckets.index
.cn-zone1.rgw.buckets.extra
.cn-zone1.rgw.buckets
.cn-zone1.log
.cn-zone1.intent-log
.cn-zone1.usage
.cn-zone1.users
.cn-zone1.users.email
.cn-zone1.users.swift
.cn-zone1.users.uid
ceph.conf configuration
[client.radosgw.us-zone1]
rgw dns name = s3.ceph.work
rgw frontends = fastcgi
host = ceph.work
rgw region = cn
rgw region root pool = .cn.rgw.root
rgw zone = us-zone1
rgw zone root pool = .cn-zone1.rgw.root
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /home/ceph/var/run/ceph-client.radosgw.us-zone1.sock
log file = /home/ceph/log/radosgw.us-zone1.log
rgw print continue = false
rgw content length compat = true
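For comparison, after the move to jewel the same client section would normally be expressed in zonegroup terms rather than region terms. The following is only a hedged sketch of what the adjusted section might look like, reusing the values above; it is not the author's final configuration (the actual RGW repair is covered in the original article):
[client.radosgw.us-zone1]
rgw dns name = s3.ceph.work
rgw frontends = fastcgi
host = ceph.work
rgw zonegroup = cn                        # jewel renames "region" to "zonegroup"
rgw zonegroup root pool = .cn.rgw.root    # replaces "rgw region root pool"
rgw zone = us-zone1
rgw zone root pool = .cn-zone1.rgw.root
The remaining keys (keyring, socket path, log file and so on) would stay as they are.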
Metadata checks
root@demo:/home/demouser# radosgw-admin metadata list user --name client.radosgw.us-zone1
[
"en-user1",
···
]
root@demo:/home/demouser# radosgw-admin metadata list bucket --name client.radosgw.us-zone1
[
"cn-test1",
···
]
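Before a cross-version upgrade it is worth keeping a copy of this metadata. A minimal sketch, using the user and bucket listed above (the output file names under /tmp are arbitrary):
root@demo:/home/demouser# radosgw-admin metadata get user:en-user1 --name client.radosgw.us-zone1 > /tmp/user.en-user1.json      # dump the user metadata as JSON
root@demo:/home/demouser# radosgw-admin metadata get bucket:cn-test1 --name client.radosgw.us-zone1 > /tmp/bucket.cn-test1.json  # dump the bucket metadata as JSON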
Software version and cluster status
root@demo:/home/demouser# ceph -v
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
root@demo:/home/demouser# ceph -s
cluster 23d6f3f9-0b86-432c-bb18-1722f73e93e0
health HEALTH_OK
Upgrading Ceph to the latest jewel
One caveat up front: if the Ceph version is below 0.94.7, upgrading straight to 10.x can run into problems, because the osdmap data structures of the older releases are not compatible with the newer ones, so upgrade to the latest hammer first.
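A quick way to confirm what each daemon is actually running, before and after every step, is to ask the daemons themselves; a small sketch (osd.0 is just an example id):
root@demo:/home/demouser# ceph tell mon.ceph.work version   # version reported by the running mon
root@demo:/home/demouser# ceph tell osd.0 version           # version reported by a running osd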
root@demo:/home/demouser# vi /etc/apt/sources.list.d/ceph.list
deb http://mirrors.163.com/ceph/debian-hammer/ jessie main # use the 163 mirror to update to the latest hammer
root@demo:/home/demouser# apt-get update
root@demo:/home/demouser# apt-cache policy ceph
Upgrading to the latest hammer
root@demo:/home/demouser# ceph -v
ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af) # the installed package has been updated to this version
root@demo:/home/demouser# ceph -s
cluster 23d6f3f9-0b86-432c-bb18-1722f73e93e0
health HEALTH_OK
monmap e1: 1 mons at {ceph.work=10.63.48.19:6789/0}
election epoch 1, quorum 0 ceph.work
osdmap e43: 3 osds: 3 up, 3 in
pgmap v907873: 544 pgs, 16 pools, 2217 kB data, 242 objects
3120 MB used, 88994 MB / 92114 MB avail
544 active+clean
root@demo:/home/demouser# /etc/init.d/ceph status
=== mon.ceph.work ===
mon.ceph.work: running {"version":"0.94.5"} #mon和osd进程还是跑的旧版本
···
root@demo:/home/demouser# /etc/init.d/ceph restart # restart all services manually; in production restart the mon first and then the osds one by one to avoid the impact of a mass restart (a per-daemon sketch follows this transcript)
=== mon.ceph.work ===
···
Stopping Ceph osd.0 on ceph.work...kill 1082...kill 1082...done
=== osd.0 ===
Mounting xfs on ceph.work:/home/ceph/var/lib/osd/ceph-0
···
root@demo:/home/demouser# /etc/init.d/ceph status
=== mon.ceph.work ===
mon.ceph.work: running {"version":"0.94.10"} #mon和osd都全部更新到最新
=== osd.2 ===
osd.2: running {"version":"0.94.10"}
root@demo:/home/demouser# ceph -s
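For a production cluster, instead of restarting everything in one go as above, the same sysvinit script can restart one daemon at a time; a hedged sketch of restarting the mon first and then the osds one by one, checking ceph -s in between:
root@demo:/home/demouser# /etc/init.d/ceph restart mon.ceph.work   # restart the monitor first
root@demo:/home/demouser# ceph -s                                  # wait for HEALTH_OK before touching the osds
root@demo:/home/demouser# /etc/init.d/ceph restart osd.0           # then the osds, one at a time
root@demo:/home/demouser# /etc/init.d/ceph restart osd.1
root@demo:/home/demouser# /etc/init.d/ceph restart osd.2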
Upgrading to the latest jewel
root@demo:/home/demouser# vi /etc/apt/sources.list.d/ceph.list
deb http://mirrors.163.com/ceph/debian-jewel/ jessie main # use the 163 mirror to update to the latest jewel
root@demo:/home/demouser# apt-get update
...
Fetched 18.7 kB in 11s (1,587 B/s)
Reading package lists... Done
root@demo:/home/demouser# apt-cache policy ceph
ceph:
Installed: 0.94.10-1~bpo80+1 # the currently installed version
Candidate: 10.2.6-1~bpo80+1 # the latest jewel version that will be installed
Version table:
10.2.6-1~bpo80+1 0
500 http://mirrors.163.com/ceph/debian-jewel/ jessie/main amd64 Packages
*** 0.94.10-1~bpo80+1 0
100 /var/lib/dpkg/status
Setting system user ceph properties..usermod: user ceph is currently used by process 5312
dpkg: error processing package ceph-common (--configure): # the processes must be restarted before the configuration can be applied; ignore this error and the ones below
subprocess installed post-installation script returned error exit status 8
···
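If the packages are still left half-configured once the daemons have been restarted on the new binaries (done further below), one generic Debian way to let dpkg re-run the pending post-installation scripts is the following sketch; it is not part of the original article, and whether the usermod step then succeeds depends on nothing still running as the ceph user:
root@demo:/home/demouser# dpkg --configure -a      # re-run the pending post-installation scripts
root@demo:/home/demouser# apt-cache policy ceph    # confirm the package state afterwards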
root@demo:/home/demouser# /etc/init.d/ceph status
=== mon.ceph.work ===
mon.ceph.work: running {"version":"0.94.10"} #当前mon和osd版本还是旧版本
···
osd.2: running {"version":"0.94.10"}
root@demo:/home/demouser# ceph -s
cluster 23d6f3f9-0b86-432c-bb18-1722f73e93e0
health HEALTH_OK
root@demo:/home/demouser# /etc/init.d/ceph restart # restart all services manually; in production restart the mon first and then the osds one by one (as sketched earlier) to avoid the impact of a mass restart
=== mon.ceph.work ===
=== mon.ceph.work ===
···
root@demo:/home/demouser# ceph -s # a crushmap compatibility warning appears
cluster 23d6f3f9-0b86-432c-bb18-1722f73e93e0
health HEALTH_WARN
crush map has legacy tunables (require bobtail, min is firefly)
all OSDs are running jewel or later but the 'require_jewel_osds' osdmap flag is not set
monmap e1: 1 mons at {ceph.work=10.63.48.19:6789/0}
election epoch 2, quorum 0 ceph.work
osdmap e61: 3 osds: 3 up, 3 in
pgmap v907906: 544 pgs, 16 pools, 2217 kB data, 242 objects
3122 MB used, 88991 MB / 92114 MB avail
544 active+clean
root@demo:/home/demouser# /etc/init.d/ceph status # check that every service process is now on the latest version
=== mon.ceph.work ===
mon.ceph.work: running {"version":"10.2.6"}
=== osd.0 ===
osd.0: running {"version":"10.2.6"}
···
root@demo:/home/demouser# ceph osd set require_jewel_osds
set require_jewel_osds
root@demo:/home/demouser# ceph osd crush tunables optimal
adjusted tunables profile to optimal
root@demo:/home/demouser# ceph -s # back to HEALTH_OK after adjusting the crushmap compatibility settings
cluster 23d6f3f9-0b86-432c-bb18-1722f73e93e0
health HEALTH_OK
monmap e1: 1 mons at {ceph.work=10.63.48.19:6789/0}
election epoch 2, quorum 0 ceph.work
osdmap e63: 3 osds: 3 up, 3 in
flags require_jewel_osds
pgmap v907917: 544 pgs, 16 pools, 2217 kB data, 242 objects
3122 MB used, 88991 MB / 92114 MB avail
544 active+clean
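The effect of the two commands above can be verified directly from the osdmap flags and the active tunables profile; note that on a larger cluster switching the tunables to optimal usually triggers data rebalancing. A small sketch:
root@demo:/home/demouser# ceph osd dump | grep flags        # should now include require_jewel_osds
root@demo:/home/demouser# ceph osd crush show-tunables      # the reported profile should be the optimal (jewel) one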
Summary
In the old hammer release the RGW management model is a two-level region->zone structure, while the new release moves to realm->zonegroup->zone, and the naming rules for some pools also changed. Upgrading Ceph directly can therefore leave the RGW service unable to start; the causes fall into two categories, changes to pool names and changes to the rgw settings in ceph.conf. This post walked through a real case of switching from the old version to the new one; be cautious in your own environment, because an upgrade across major versions still carries considerable risk. -- by 秦牧羊
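As a starting point for the RGW-side inspection (the full repair is in the original article), the realm/zonegroup/zone layout that jewel sees can be listed with radosgw-admin; a sketch, assuming the same client name as above:
root@demo:/home/demouser# radosgw-admin realm list --name client.radosgw.us-zone1       # realms known after the upgrade
root@demo:/home/demouser# radosgw-admin zonegroup list --name client.radosgw.us-zone1   # the old "region" appears as a zonegroup
root@demo:/home/demouser# radosgw-admin zone list --name client.radosgw.us-zone1        # zones inside the zonegroup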
Appendix
Official upgrade guide: http://docs.ceph.com/docs/master/radosgw/upgrade_to_jewel/
Note
Due to the 20,000-character limit on WeChat public account posts, some content had to be cut; for the part on repairing the RGW service, please follow the original link, which contains the complete article.