Building a Multi-Node Environment on Hadoop 2.7.3 (YARN + HA)

Author: 书友59082326 | Source: Internet | 2023-08-04 08:24


Part 1: Environment

Parallels Desktop
CentOS-6.5-x86_64-bin-DVD1.iso
jdk-7u79-linux-x64.tar.gz
hadoop-2.7.3.tar.gz
The cluster has four nodes with the hostnames hadoopA, hadoopB, hadoopC, and hadoopD. hadoopA is the active NameNode. hadoopB is the standby NameNode and also a DataNode and a JournalNode. hadoopC and hadoopD are each a DataNode and a JournalNode.

Part 2: Operating system configuration

Grant the hadoop user sudo privileges

[root@hadoopa hadoop]# visudo

## Allow root to run any commands anywhere
root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL

Configure the hostnames in /etc/hosts

[hadoop@hadoopa hadoop-2.7.3]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.201 hadoopA
192.168.1.202 hadoopB
192.168.1.203 hadoopC
192.168.1.204 hadoopD
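
Every later step addresses the nodes by name, so it may be worth confirming that a node's hosts file lists all four names before going on. The following is a minimal sketch and not part of the original setup: check_hosts is a hypothetical helper name, and it only inspects the text of the file rather than performing a real name lookup.

```shell
# check_hosts FILE NAME...
# Prints "missing: NAME" for every NAME that has no entry in FILE
# and returns non-zero if at least one name is missing.
# (check_hosts is a hypothetical helper, not a Hadoop or system tool.)
check_hosts() {
  local file="$1" missing=0 name
  shift
  for name in "$@"; do
    # Skip comment lines; field 1 is the address, the rest are host names.
    if ! awk -v h="$name" '
          /^[ \t]*#/ { next }
          { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
          END { exit !found }' "$file"; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

On each node it could be called as: check_hosts /etc/hosts hadoopA hadoopB hadoopC hadoopD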

Part 3: Install and configure the JDK

Install the JDK on each of the four nodes: hadoopA, hadoopB, hadoopC, and hadoopD.

[hadoop@hadoopb ~]$ tar -zxvf jdk-7u79-linux-x64.tar.gz

Rename the JDK directory

[hadoop@hadoopb ~]$ mv jdk1.7.0_79/  jdk1.7

Part 4: Install and configure Hadoop

Unpack Hadoop on all four nodes: hadoopA, hadoopB, hadoopC, and hadoopD

[hadoop@hadoopb ~]$ tar -zxvf hadoop-2.7.3.tar.gz

Configure hadoop-env.sh on hadoopA

# The java implementation to use.
export JAVA_HOME=/home/hadoop/jdk1.7
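
A mistyped JAVA_HOME tends to surface only when a daemon is started, so a quick check right after editing hadoop-env.sh can save a round trip. This is a small sketch under the assumption that the JDK was unpacked and renamed as described in Part 3; java_home_ok is a hypothetical helper name.

```shell
# java_home_ok DIR: succeeds only if DIR contains an executable bin/java.
# (java_home_ok is a hypothetical helper name used for this sketch.)
java_home_ok() {
  [ -n "$1" ] && [ -x "$1/bin/java" ]
}
```

For example: java_home_ok /home/hadoop/jdk1.7 || echo "JAVA_HOME does not point at a JDK"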

Configure core-site.xml on hadoopA

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hadoopA:8020</value>
        </property>
</configuration>

Configure hdfs-site.xml on hadoopA

<configuration>

<property>
  <name>dfs.nameservices</name>
  <value>hadoop-test</value>
  <description>
    Comma-separated list of nameservices.
  </description>
</property>

<property>
  <name>dfs.ha.namenodes.hadoop-test</name>
  <value>nn1,nn2</value>
  <description>
    The prefix for a given nameservice, contains a comma-separated
    list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
  </description>
</property>

<property>
  <name>dfs.namenode.rpc-address.hadoop-test.nn1</name>
  <value>hadoopA:8020</value>
  <description>
    RPC address for namenode1 of hadoop-test
  </description>
</property>

<property>
  <name>dfs.namenode.rpc-address.hadoop-test.nn2</name>
  <value>hadoopB:8020</value>
  <description>
    RPC address for namenode2 of hadoop-test
  </description>
</property>

<property>
  <name>dfs.namenode.http-address.hadoop-test.nn1</name>
  <value>hadoopA:50070</value>
  <description>
    The address and the base port where the dfs namenode1 web ui will listen on.
  </description>
</property>

<property>
  <name>dfs.namenode.http-address.hadoop-test.nn2</name>
  <value>hadoopB:50070</value>
  <description>
    The address and the base port where the dfs namenode2 web ui will listen on.
  </description>
</property>

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/hadoop/hdfs/name</value>
  <description>Determines where on the local filesystem the DFS name node
      should store the name table(fsimage).  If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy. </description>
</property>

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://hadoopB:8485;hadoopC:8485;hadoopD:8485/hadoop-test</value>
  <description>A directory on shared storage between the multiple namenodes
  in an HA cluster. This directory will be written by the active and read
  by the standby in order to keep the namespaces synchronized. This directory
  does not need to be listed in dfs.namenode.edits.dir above. It should be
  left empty in a non-HA cluster.
  </description>
</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/hadoop/hdfs/data</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks.  If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices.
  Directories that do not exist are ignored.
  </description>
</property>

<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>false</value>
  <description>
    Whether automatic failover is enabled. See the HDFS High
    Availability documentation for details on automatic HA
    configuration.
  </description>
</property>

<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/hadoop/hdfs/journal/</value>
</property>

</configuration>

Configure mapred-site.xml on hadoopA

<configuration>

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoopB:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoopB:19888</value>
</property>
</configuration>

Configure yarn-site.xml on hadoopA

<configuration>

  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoopA</value>
  </property>

  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>

  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>

  <property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>

  <property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>

  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>

  <property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>

  <property>
    <description>fair-scheduler conf location</description>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/home/hadoop/hadoop-2.7.3/etc/hadoop/fairscheduler.xml</value>
  </property>

  <property>
    <description>List of directories to store localized files in. An
      application's localized file directory will be found in:
      ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
      Individual containers' work directories, called container_${contid}, will
      be subdirectories of this.
    </description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/hadoop/yarn/local</value>
  </property>

  <property>
    <description>Whether to enable log aggregation</description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>

  <property>
    <description>Amount of physical memory, in MB, that can be allocated
    for containers.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8720</value>
  </property>

  <property>
    <description>Number of CPU cores that can be allocated
    for containers.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>

  <property>
    <description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

</configuration>

Configure fairscheduler.xml on hadoopA

<allocations>

  <queue name="infrastructure">
    <minResources>102400 mb, 50 vcores</minResources>
    <maxResources>153600 mb, 100 vcores</maxResources>
    <maxRunningApps>200</maxRunningApps>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <weight>1.0</weight>
    <aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps>
  </queue>

   <queue name="tool">
      <minResources>102400 mb, 30 vcores</minResources>
      <maxResources>153600 mb, 50 vcores</maxResources>
   </queue>

   <queue name="sentiment">
      <minResources>102400 mb, 30 vcores</minResources>
      <maxResources>153600 mb, 50 vcores</maxResources>
   </queue>

</allocations>

Configure the slaves file on hadoopA


[root@hadoopa hadoop]# cat slaves
hadoopB
hadoopC
hadoopD
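
With all configuration files written on hadoopA, a quick read-back of a few key properties can catch typos before the files are copied elsewhere. Hadoop's own route for this is hdfs getconf -confKey, which reads the effective configuration. The sketch below is a dependency-free alternative that reads one file directly. It assumes each name and value element sits on a single line, and get_prop is a hypothetical helper name, not a Hadoop command.

```shell
# get_prop FILE KEY
# Prints the <value> that follows <name>KEY</name> in a Hadoop-style XML file.
# Assumes each <name>...</name> and <value>...</value> fits on one line.
# (get_prop is a hypothetical helper; it is not a real XML parser.)
get_prop() {
  awk -v key="$2" '
    /<name>/ {
      n = $0
      sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n)
      hit = (n == key)          # remember whether this property is the one wanted
    }
    hit && /<value>/ {
      v = $0
      sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$1"
}
```

For example: get_prop etc/hadoop/hdfs-site.xml dfs.nameservices
An empty result means the key was not found in that file.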

Copy the Hadoop configuration files from hadoopA to the other nodes


[hadoop@hadoopa hadoop-2.7.3]$ scp etc/hadoop/* hadoopB:/home/hadoop/hadoop-2.7.3/etc/hadoop/


[hadoop@hadoopa hadoop-2.7.3]$ scp etc/hadoop/* hadoopC:/home/hadoop/hadoop-2.7.3/etc/hadoop/


[hadoop@hadoopa hadoop-2.7.3]$ scp etc/hadoop/* hadoopD:/home/hadoop/hadoop-2.7.3/etc/hadoop/
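
The same copy has to be repeated whenever a configuration file changes, so it may be convenient to wrap it in a loop. This is only a sketch: push_conf and the DRY_RUN switch are hypothetical names introduced here, and the install path is the one used in this article.

```shell
# Install path from the article; override by exporting HADOOP_HOME first.
HADOOP_HOME=${HADOOP_HOME:-/home/hadoop/hadoop-2.7.3}

# push_conf HOST...
# Copies etc/hadoop/* to the same path on every HOST, stopping at the first failure.
# DRY_RUN=1 (a hypothetical switch for this sketch) prints the commands instead.
push_conf() {
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "scp $HADOOP_HOME/etc/hadoop/* $host:$HADOOP_HOME/etc/hadoop/"
    else
      scp "$HADOOP_HOME"/etc/hadoop/* "$host:$HADOOP_HOME/etc/hadoop/" || return 1
    fi
  done
}
```

For example: DRY_RUN=1 push_conf hadoopB hadoopC hadoopD
shows what would be copied; dropping DRY_RUN performs the copy.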

Part 5: Start Hadoop

On each JournalNode host, start the journalnode service with the following command

[hadoop@hadoopb hadoop-2.7.3]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@hadoopc hadoop-2.7.3]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@hadoopd hadoop-2.7.3]$ sbin/hadoop-daemon.sh start journalnode

On [nn1], format the NameNode and start it:

[root@hadoopa hadoop-2.7.3]# bin/hdfs namenode -format
[root@hadoopa hadoop-2.7.3]# sbin/hadoop-daemon.sh start namenode

On [nn2], sync the metadata from nn1

[hadoop@hadoopb hadoop-2.7.3]$ bin/hdfs namenode -bootstrapStandby

On [nn2], start the NameNode:

[hadoop@hadoopb hadoop-2.7.3]$ sbin/hadoop-daemon.sh start namenode
(After the four steps above, both nn1 and nn2 are in the standby state.)

On [nn1], switch the NameNode to active


[root@hadoopa hadoop-2.7.3]# bin/hdfs haadmin -transitionToActive nn1
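
Because failover is manual in this setup, it can help to confirm which NameNode is active after the transition. The hdfs haadmin -getServiceState command reports the state of one NameNode ID; the sketch below simply loops over several IDs. ha_summary and the HDFS_BIN override are hypothetical names introduced for this example.

```shell
# ha_summary ID...
# Prints "ID=state" for each NameNode ID, or "ID=unknown" if the query fails.
# HDFS_BIN is a hypothetical override; by default the hdfs launcher on PATH is used.
ha_summary() {
  local nn state
  for nn in "$@"; do
    state=$("${HDFS_BIN:-hdfs}" haadmin -getServiceState "$nn" 2>/dev/null) || state=unknown
    echo "$nn=$state"
  done
}
```

For example, from the Hadoop install directory: HDFS_BIN=bin/hdfs ha_summary nn1 nn2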

On [nn1], start all DataNodes


[root@hadoopa hadoop-2.7.3]# sbin/hadoop-daemons.sh start datanode

Start YARN: on [nn1], enter the following command

[root@hadoopa hadoop-2.7.3]# sbin/start-yarn.sh

To shut the cluster down later (run the verification in the next part first), on [nn1] enter the following commands

[root@hadoopa hadoop-2.7.3]# sbin/stop-dfs.sh
[root@hadoopa hadoop-2.7.3]# sbin/stop-yarn.sh

Part 6: Verify Hadoop

Run on hadoopA


[root@hadoopa jdk1.7]# /home/hadoop/jdk1.7/bin/jps
10747 -- process information unavailable
15583 Jps
16576 -- process information unavailable

Run on hadoopB

[hadoop@hadoopb hadoop-2.7.3]$ /home/hadoop/jdk1.7/bin/jps
15709 NodeManager
2405 JournalNode
11551 NameNode
12862 DataNode
15398 Jps

Run on hadoopC

[hadoop@hadoopc ~]$ /home/hadoop/jdk1.7/bin/jps
2388 JournalNode
13091 Jps
13553 DataNode
15214 NodeManager

Run on hadoopD

[hadoop@hadoopd hadoop-2.7.3]$ /home/hadoop/jdk1.7/bin/jps
13506 DataNode
12675 Jps
15334 NodeManager
2570 JournalNode
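
Reading the jps listings by eye works for four nodes; a small filter can make the same check repeatable. The sketch below reads jps output on standard input and prints every expected daemon that is absent. missing_daemons is a hypothetical helper name, and the expected daemon list per node follows the role assignment from Part 1.

```shell
# missing_daemons NAME...
# Reads `jps` output ("PID ClassName" per line) on stdin and prints each
# expected NAME that does not appear; prints nothing when all are running.
# (missing_daemons is a hypothetical helper, not part of the JDK or Hadoop.)
missing_daemons() {
  local out name
  out=$(cat)
  for name in "$@"; do
    # Exact match on the second field, so NameNode does not match SecondaryNameNode.
    printf '%s\n' "$out" |
      awk -v d="$name" '$2 == d { f = 1 } END { exit !f }' || echo "$name"
  done
}
```

For example, on hadoopC: /home/hadoop/jdk1.7/bin/jps | missing_daemons DataNode JournalNode NodeManager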

Open a browser and visit the following addresses:

http://192.168.1.201:50070/dfshealth.html#tab-overview
http://192.168.1.202:50070/dfshealth.html#tab-overview
http://192.168.1.201:8088/cluster/scheduler

Part 7: Shut down Hadoop

Shut down the Hadoop cluster: on [nn1], enter the following commands

[root@hadoopa hadoop-2.7.3]# sbin/stop-dfs.sh
[root@hadoopa hadoop-2.7.3]# sbin/stop-yarn.sh

Part 8: Special notes

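Notes:
Step 2: on [nn1], format the NameNode and start it:
bin/hdfs namenode -format
Step 3: on [nn2], sync the metadata from nn1:
bin/hdfs namenode -bootstrapStandby

These two steps are needed only the first time the cluster is built.
They must not be repeated when the nodes are restarted later; formatting again would erase the existing namespace.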
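
One way to make the "first time only" rule harder to break is to check the NameNode storage directory before formatting. A formatted NameNode keeps a current/VERSION file under dfs.namenode.name.dir, so its absence suggests the directory has not been formatted yet. This is a minimal sketch; name_dir_is_fresh is a hypothetical helper name, and the path in the example is the dfs.namenode.name.dir value used in this article.

```shell
# name_dir_is_fresh DIR
# Succeeds if DIR holds no formatted NameNode metadata yet,
# i.e. there is no current/VERSION file under it.
# (name_dir_is_fresh is a hypothetical helper name used for this sketch.)
name_dir_is_fresh() {
  [ ! -e "$1/current/VERSION" ]
}
```

For example: name_dir_is_fresh /home/hadoop/hdfs/name && bin/hdfs namenode -format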
