See first: "Hadoop-2.3.0-cdh5.1.0 Pseudo-Distributed Installation (on CentOS)" 2014-09/106372.htm
Note: this walkthrough is performed as the root user.
1. Environment
Operating system: CentOS 6.5, 64-bit
Note: Hadoop 2.x requires JDK 1.7. Uninstall the JDK that ships with the Linux distribution and install Oracle JDK 1.7 instead.
Download: http://www.oracle.com/technetwork/java/javase/downloads/index.html
Software versions: hadoop-2.3.0-cdh5.1.0.tar.gz, zookeeper-3.4.5-cdh5.1.0.tar.gz
Download: http://archive.cloudera.com/cdh5/cdh/5/
Cluster nodes:
c1:192.168.58.11
c2:192.168.58.12
c3:192.168.58.13
2. Install the JDK (omitted; see the reference article above)
3. Configure environment variables (for the JDK and Hadoop)
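The article does not spell the exports out; a minimal sketch for /etc/profile, assuming the JDK path used in yarn-env.sh below and a Hadoop install under /usr/local/hadoop (both paths are assumptions, adjust to your layout):

# /etc/profile additions (paths are assumptions based on this article)
export JAVA_HOME=/usr/local/java/jdk1.7.0_67
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Reload with source /etc/profile and verify with java -version.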
4. System configuration
1. Disable the firewall
chkconfig iptables off (permanent; takes effect from the next boot)
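chkconfig only changes the boot configuration; to stop the running firewall immediately as well (CentOS 6 service commands):

service iptables stop     # stop the firewall now
service iptables status   # confirm it is stopped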
Configure the hostname and the hosts file on every node.
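The article does not show these files; a sketch using the three nodes from section 1 (CentOS 6 conventions):

# /etc/sysconfig/network -- set HOSTNAME on each node to its own name
HOSTNAME=c1

# /etc/hosts -- identical on all three nodes
192.168.58.11 c1
192.168.58.12 c2
192.168.58.13 c3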
2. Passwordless SSH
Hadoop manages its daemons remotely: the NameNode connects to each DataNode over SSH to start and stop processes, so these connections must not prompt for a password. Configure key-based, passwordless SSH from the NameNode to every DataNode, and likewise from each DataNode back to the NameNode.
On every machine, open /etc/ssh/sshd_config and enable:
RSAAuthentication yes      # enable RSA authentication
PubkeyAuthentication yes   # enable public-key authentication
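Restart sshd so the change takes effect (CentOS 6 service name):

service sshd restart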
On the master, run ssh-keygen -t rsa -P '' and press Enter at each prompt (no passphrase). The key pair is written to /root/.ssh by default.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@master01 .ssh]# ls
authorized_keys id_rsa id_rsa.pub known_hosts
Copy the key to the other nodes:
scp authorized_keys c2:~/.ssh/
scp authorized_keys c3:~/.ssh/
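Before moving on, it is worth checking the permissions sshd requires and confirming that login really is passwordless (a precautionary sketch; the article itself skips this step):

chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys   # on each node
ssh c2 hostname   # should print "c2" with no password prompt
ssh c3 hostname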
5. Configuration files (identical on every node)
5.1 hadoop/etc/hadoop/hadoop-env.sh -- add:
# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest
# assuming your installation directory is /usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop
5.2 etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://c1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/cdh/hadoop/data/tmp</value>
  </property>
</configuration>
5.3 etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/cdh/hadoop/data/dfs/name</value>
    <description>Local directory where the NameNode stores the name table (fsimage); adjust to your layout.</description>
  </property>
  <property>
    <name>dfs.namenode.edits.dir</name>
    <value>${dfs.namenode.name.dir}</value>
    <description>Local directory where the NameNode stores the transaction file (edits); adjust to your layout.</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/cdh/hadoop/data/dfs/data</value>
    <description>Local directory where DataNodes store blocks; adjust to your layout.</description>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>
5.4 etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
5.5 etc/hadoop/yarn-env.sh
# some Java parameters
export JAVA_HOME=/usr/local/java/jdk1.7.0_67
5.6 etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>c1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>c1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>c1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>c1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>c1:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
5.7 etc/hadoop/slaves (one worker hostname per line)
c2
c3
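The configuration must be identical on every node; rather than editing each machine by hand, one approach (a sketch, assuming Hadoop is unpacked at the same path everywhere, here /usr/local/hadoop) is to edit on c1 and push the configuration directory out:

scp /usr/local/hadoop/etc/hadoop/* c2:/usr/local/hadoop/etc/hadoop/
scp /usr/local/hadoop/etc/hadoop/* c3:/usr/local/hadoop/etc/hadoop/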
6. Start the cluster and verify the installation
Format HDFS first (one-time step on c1; it erases any existing HDFS metadata):
bin/hdfs namenode -format
启动:
sbin/start-dfs.sh
sbin/start-yarn.sh
[root@c1 hadoop]# jps
3250 Jps
2491 ResourceManager
2343 SecondaryNameNode
2170 NameNode
On the DataNodes:
[root@c2 ~]# jps
4196 Jps
2061 DataNode
2153 NodeManager
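If any daemon is missing from jps, check the logs under the Hadoop logs/ directory. You can also confirm from c1 that both DataNodes have registered:

bin/hdfs dfsadmin -report   # should report 2 live datanodes (c2 and c3)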
1. Open the NameNode web UI in a browser:
NameNode - http://c1:50070/ (or http://localhost:50070/ on c1 itself)
2. Create the HDFS home directory for the current user (root in this walkthrough; the relative paths below resolve under it):
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/root
3. Copy the input files into HDFS:
$ bin/hdfs dfs -put etc/hadoop input
4. Run the example job:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0-cdh5.1.0.jar grep input output 'dfs[a-z.]+'
5. Fetch and examine the output:
$ bin/hdfs dfs -get output output
$ cat output/*
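When you are finished, the cluster can be stopped with the companion scripts to the start commands above:

sbin/stop-yarn.sh
sbin/stop-dfs.sh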