



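The Hadoop master daemons obtain the rack ID of the cluster workers by invoking either an external script or a Java class, as specified by the configuration files. Whether a Java class or an external script is used for topology, the output must adhere to the Java org.apache.hadoop.net.DNSToSwitchMapping interface. The interface expects a one-to-one correspondence to be maintained, with topology information in the format '/myrack/myhost', where '/' is the topology delimiter, 'myrack' is the rack identifier, and 'myhost' is the individual host. Assuming a single /24 subnet per rack, one could use the format '/network_address/host_ip' (the subnet's network address followed by the host's IP) as a unique rack-host topology mapping.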
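To make the '/myrack/myhost' format concrete, here is a minimal Python sketch that assumes one /24 subnet per rack, as described above. It uses the standard library's ipaddress module. The functions topology_path, split_path and build_mapping are hypothetical helpers, not part of Hadoop. The sketch only models the path string and the one-to-one requirement; it does not implement the Java DNSToSwitchMapping interface.

```python
import ipaddress
from typing import Dict, List, Tuple

DELIMITER = "/"  # topology delimiter, as in '/myrack/myhost'


def topology_path(host_ip: str, prefix_len: int = 24) -> str:
    """Build '/<network address>/<host ip>' for one host.

    Assumes one /24 subnet per rack, so the subnet's network
    address doubles as the rack identifier.
    """
    ip = ipaddress.ip_address(host_ip)
    # strict=False accepts a host address and returns its enclosing network.
    net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return f"{DELIMITER}{net.network_address}{DELIMITER}{ip}"


def split_path(path: str) -> Tuple[str, str]:
    """Split '/rack/host' back into (rack, host)."""
    _, rack, host = path.split(DELIMITER)
    return rack, host


def build_mapping(hosts: List[str]) -> Dict[str, str]:
    """Map every host to exactly one topology path."""
    mapping = {h: topology_path(h) for h in hosts}
    # One-to-one: no two hosts may end up with the same path.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("topology mapping is not one-to-one")
    return mapping


if __name__ == "__main__":
    for host, path in build_mapping(["192.168.100.5", "192.168.101.7"]).items():
        print(host, "->", path, split_path(path))
```

Because the host's own IP is part of each path, distinct hosts always get distinct paths. Hosts in the same /24 still share the same rack component.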



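If neither net.topology.script.file.name nor net.topology.node.switch.mapping.impl is set, the rack ID '/default-rack' is returned for any IP address passed in. While this behavior may seem desirable, it can cause problems with HDFS block replication. The default behavior is to write one replicated block off-rack, which is impossible when there is only a single rack named '/default-rack'.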
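The replication problem can be illustrated with a deliberately simplified model. In this sketch, resolve_rack and off_rack_candidates are hypothetical helpers. The placement rule (any datanode whose rack differs from the writer's is an off-rack candidate) is an assumption for illustration, not HDFS's actual block placement policy.

```python
from typing import Dict, List, Optional

DEFAULT_RACK = "/default-rack"


def resolve_rack(ip: str, topology: Optional[Dict[str, str]]) -> str:
    """Return the rack for ip; with no topology configured, always '/default-rack'."""
    if topology is None:
        # Neither a script nor a mapping class is configured: every address
        # lands on the same rack.
        return DEFAULT_RACK
    return topology[ip]


def off_rack_candidates(writer: str, datanodes: List[str],
                        topology: Optional[Dict[str, str]]) -> List[str]:
    """Datanodes on a different rack from the writer (simplified model)."""
    writer_rack = resolve_rack(writer, topology)
    return [dn for dn in datanodes if resolve_rack(dn, topology) != writer_rack]


if __name__ == "__main__":
    nodes = ["10.0.1.11", "10.0.1.12", "10.0.2.21", "10.0.2.22"]
    # Hypothetical two-rack layout keyed on the third octet.
    racks = {ip: "/rack-" + ip.split(".")[2] for ip in nodes}
    print(off_rack_candidates("10.0.1.11", nodes, None))   # prints []
    print(off_rack_candidates("10.0.1.11", nodes, racks))  # prints ['10.0.2.21', '10.0.2.22']
```

With no topology configured, the candidate list is empty, so there is nowhere to put the off-rack copy.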


python Example

# this script makes assumptions about the physical environment.
# 1) each rack is its own layer 3 network with a /24 subnet, which
# could be typical where each rack has its own
# switch with uplinks to a central core router.
#           +-----------+
#           |core router|
#           +-----------+
#          /             \
# +-----------+        +-----------+
# |rack switch|        |rack switch|
# +-----------+        +-----------+
# | data node |        | data node |
# +-----------+        +-----------+
# | data node |        | data node |
# +-----------+        +-----------+
# 2) topology script gets a list of IPs as input, calculates each network address, and prints '/network_address' as the rack ID for each IP.
import netaddr
import sys
sys.argv.pop(0)  # discard name of topology script from argv list as we just want IP addresses
netmask = '24'  # set netmask to what's being used in your environment. The example uses a /24
for ip in sys.argv:  # loop over list of datanode IPs
    address = '{0}/{1}'.format(ip, netmask)  # format address string so it looks like 'ip/netmask' to make netaddr work
    try:
        network_address = netaddr.IPNetwork(address).network  # calculate and print network address
        print("/{0}".format(network_address))
    except netaddr.AddrFormatError:
        print("/rack-unknown")  # print catch-all value if unable to calculate network address
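If installing netaddr is inconvenient, the same logic can be sketched with Python's standard ipaddress module instead. This is a substitution; the script above uses netaddr. The hypothetical rack_ids function returns one rack ID per input address, in input order, and falls back to '/rack-unknown' just as the original does.

```python
import ipaddress
import sys
from typing import List


def rack_ids(ips: List[str], prefix_len: int = 24) -> List[str]:
    """Return one rack ID ('/<network address>') per input IP, in input order."""
    racks = []
    for ip in ips:
        try:
            # strict=False accepts a host address and returns its enclosing network.
            net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
            racks.append(f"/{net.network_address}")
        except ValueError:
            # Same catch-all as the netaddr script for unparseable input.
            racks.append("/rack-unknown")
    return racks


if __name__ == "__main__":
    # sys.argv[0] is the script name; the remaining arguments are IP addresses.
    for rack in rack_ids(sys.argv[1:]):
        print(rack)
```

Keeping the output in the same order as the arguments is what preserves the one-to-one correspondence between inputs and rack IDs.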

bash Example

#!/usr/bin/env bash
# Here's a bash example to show just how simple these scripts can be
# Assuming we have flat network with everything on a single switch, we can fake a rack topology.
# This could occur in a lab environment where we have limited nodes, like 2-8 physical machines on an unmanaged switch.
# This may also apply to multiple virtual machines running on the same physical hardware.
# What matters is not the number of machines but that we are trying to fake a network topology when there isn't one.
#  +----------+         +--------+
#  |jobtracker|         |datanode|
#  +----------+         +--------+
#                \      /
#  +--------+  +--------+  +--------+
#  |datanode|--| switch |--|datanode|
#  +--------+  +--------+  +--------+
#                /      \
#    +--------+         +--------+
#    |datanode|         |namenode|
#    +--------+         +--------+
# With this network topology, we are treating each host as a rack. This is being done by taking the last octet
# in the datanode's IP and prepending it with the word '/rack-'. The advantage of doing this is that HDFS
# can create its 'off-rack' block copy.
# 1) 'echo $@' will echo all ARGV values to xargs.
# 2) 'xargs' will enforce that we print a single argv value per line
# 3) 'awk' will split fields on dots and append the last field to the string '/rack-'. If awk
# fails to split on four dots, it will still print '/rack-' last field value
echo $@ | xargs -n 1 | awk -F '.' '{print "/rack-"$NF}'
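To see what rack IDs the bash one-liner produces, here is a small Python model of the awk step. fake_racks is a hypothetical helper. It ignores xargs quoting rules and simply splits each argument on '.' and keeps the last field, as awk's $NF does. It also shows why this trick only suits a flat network: hosts in different subnets that share a last octet collapse onto the same "rack".

```python
from typing import List


def fake_racks(args: List[str]) -> List[str]:
    """Model of: echo $@ | xargs -n 1 | awk -F '.' '{print "/rack-"$NF}'"""
    # awk's $NF is the last '.'-separated field; a value with no dots is kept whole.
    return ["/rack-" + arg.split(".")[-1] for arg in args]


if __name__ == "__main__":
    print(fake_racks(["192.168.1.21", "192.168.1.22", "node7"]))
    # prints ['/rack-21', '/rack-22', '/rack-node7']
    print(fake_racks(["10.0.1.5", "10.0.2.5"]))
    # prints ['/rack-5', '/rack-5']: different subnets, same fake rack
```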


Original link: https://hadoop.apache.org/docs/r3.2.0/













