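1. Download the components
First, download the Hadoop package from the CDH archive.
URL: http://archive.cloudera.com/cdh5/cdh/5/
Make sure the version number matches the CDH version of your other components.
2. Environment setup
Set the hostname and user name
Configure a static IP
Configure passwordless SSH login
Install and configure the JDK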
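The passwordless SSH step can be sketched as follows. This is only a sketch, assuming the same login account (hadoop in this guide) exists on every node and that a key pair was already generated with ssh-keygen. The function name distribute_key and the SSH_COPY_ID override are hypothetical names introduced for illustration.

```shell
# Copy the current user's public key to each listed host so that later
# ssh/scp calls (and the start-*.sh scripts) do not prompt for a password.
# Assumption: the remote account is "hadoop", as used in this guide.
# Set SSH_COPY_ID=echo for a dry run that only prints the targets.
distribute_key() {
  for host in "$@"; do
    "${SSH_COPY_ID:-ssh-copy-id}" "hadoop@${host}" || return 1
  done
}

# Typical use on node1, after running ssh-keygen once:
#   distribute_key node1.sunny.cn node2.sunny.cn node3.sunny.cn
```

Including node1 itself in the list matters because the start scripts also connect to the local host over SSH.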
3. Configure Hadoop
1. Create a hadoop user and, as root, give it ownership of the directories under /opt. Run this on every node.
useradd -m hadoop -s /bin/bash
passwd hadoop
chown -R hadoop /opt/module/hadoop
chown -R hadoop /usr/sunny
Grant the hadoop user administrative (sudo) privileges
visudo
## Next comes the main part: which users can run what software on
## which machines (the sudoers file can be shared between multiple
## systems).
## Syntax:
##
## user MACHINE=COMMANDS
##
## The COMMANDS section may have other options added to it.
##
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
hadoop ALL=(ALL) ALL
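2. Installing Hadoop under /home/hadoop is not recommended; install it under /opt instead. Switch to the hadoop user, extract the archive, move the extracted directory to /opt/module, and rename it to hadoop.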
tar -zxvf hadoop-2.6.0-cdh5.12.0.tar.gz
mv hadoop-2.6.0-cdh5.12.0 /opt/module/hadoop
Fix the ownership of the hadoop directory
sudo chown -R hadoop:hadoop /opt/module/hadoop
3. Configure environment variables
vim ~/.bash_profile
export HADOOP_HOME=/opt/module/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source ~/.bash_profile
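After sourcing the profile, it can be worth confirming that the Hadoop directories really ended up on PATH. The helper below is a minimal sketch; path_has and check_hadoop_path are hypothetical names, and the /opt/module/hadoop fallback simply mirrors the HADOOP_HOME value set above.

```shell
# Succeed when directory $1 appears as an entry of the colon-separated list $2.
path_has() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report whether the Hadoop bin and sbin directories are on PATH.
check_hadoop_path() {
  home=${HADOOP_HOME:-/opt/module/hadoop}
  for d in "$home/bin" "$home/sbin"; do
    if path_has "$d" "$PATH"; then
      echo "ok: $d"
    else
      echo "missing: $d"
    fi
  done
}

# Usage: check_hadoop_path
```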
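4. Edit the configuration files
The configuration files are in hadoop-2.6.0-cdh5.12.0/etc/hadoop (/opt/module/hadoop/etc/hadoop after the rename above). The main files are:
File | Type | Description |
hadoop-env.sh | Bash script | Environment variables for running Hadoop |
core-site.xml | XML | Hadoop core settings, such as I/O settings shared by HDFS and MapReduce |
hdfs-site.xml | XML | Settings for the HDFS daemons: NameNode, SecondaryNameNode, DataNode, JournalNode, etc. |
yarn-env.sh | Bash script | Environment variables for running YARN |
yarn-site.xml | XML | Settings for the YARN daemons: ResourceManager, NodeManager, etc. |
mapred-site.xml | XML | Settings for the MapReduce framework |
capacity-scheduler.xml | XML | YARN scheduler properties |
container-executor.cfg | Cfg | YARN container settings |
mapred-queues.xml | XML | MapReduce queue settings |
hadoop-metrics.properties | Java properties | Controls how metrics are published in Hadoop |
hadoop-metrics2.properties | Java properties | Controls how metrics are published in Hadoop |
slaves | Plain text | Hosts that run a DataNode and a NodeManager, one per line |
exclude | Plain text | DataNode hosts to remove from the cluster |
log4j.properties | Java properties | Properties for system logs, the NameNode audit log, and task logs of DataNode child processes |
configuration.xsl | XSL | Stylesheet referenced by the XML configuration files |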
(1) Edit hadoop-env.sh and append the following environment variables to the end of the file
#--------------------Java Env------------------------------
export JAVA_HOME=/opt/module/jdk1.8.0_144
#--------------------Hadoop Env----------------------------
export HADOOP_HOME=/opt/module/hadoop
#--------------------Hadoop Daemon Options-----------------
# export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
# export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
#--------------------Hadoop Logs---------------------------
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER
#--------------------SSH PORT-------------------------------
#export HADOOP_SSH_OPTS="-p 6000" # If you changed the SSH login port, you must update this setting.
(2) Edit core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1.sunny.cn:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/sunny/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
This step sets the host name and port that serve HDFS; in other words, HDFS is served on port 9000 of the master (node1.sunny.cn here). The setting also identifies the node that runs the NameNode, i.e. the master node.
(3) Edit hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node1.sunny.cn:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/sunny/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/sunny/hadoop/hdfs/data</value>
  </property>
</configuration>
The following is a configuration found online, for reference only
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/sunny/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.namenode.edits.dir</name>
    <value>${dfs.namenode.name.dir}</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/data/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop-cluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.hadoop-cluster</name>
    <value>namenode1,namenode2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hadoop-cluster.namenode1</name>
    <value>namenode1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hadoop-cluster.namenode2</name>
    <value>namenode2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hadoop-cluster.namenode1</name>
    <value>namenode1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hadoop-cluster.namenode2</name>
    <value>namenode2:50070</value>
  </property>
  <property>
    <name>dfs.journalnode.http-address</name>
    <value>0.0.0.0:8480</value>
  </property>
  <property>
    <name>dfs.journalnode.rpc-address</name>
    <value>0.0.0.0:8485</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://namenode1:8485;namenode2:8485;namenode3:8485/hadoop-cluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/sunny/hadoop/hdfs/journal</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hadoop-cluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoopuser/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>Hadoop-DN-01:2181,Hadoop-DN-02:2181,Hadoop-DN-03:2181</value>
  </property>
  <property>
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>2000</value>
  </property>
</configuration>
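dfs.replication sets how many copies HDFS keeps of each file. HDFS replicates files automatically, and this value is the total number of copies: the default of 3 means two redundant copies, while the value 2 used above means one.
dfs.namenode.name.dir (formerly dfs.name.dir) sets the local filesystem path where the NameNode stores its metadata
dfs.datanode.data.dir (formerly dfs.data.dir) sets the local filesystem path where a DataNode stores its data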
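As a rough illustration of what the replication factor costs in disk space, the arithmetic can be written down directly. This is a minimal sketch; raw_usage and redundant_copies are hypothetical helper names, and the 268435456-byte figure is simply the dfs.blocksize value from the reference configuration above.

```shell
# Raw bytes consumed across DataNodes for $1 bytes of data
# stored with replication factor $2.
raw_usage() {
  echo $(( $1 * $2 ))
}

# Number of redundant copies (copies beyond the first) for replication factor $1.
redundant_copies() {
  echo $(( $1 - 1 ))
}

# Usage: one full 256 MiB block with dfs.replication=2
#   raw_usage 268435456 2
#   redundant_copies 2
```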
(4) Edit mapred-site.xml
The directory only contains mapred-site.xml.template, so copy it to mapred-site.xml first
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node1.sunny.cn:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node1.sunny.cn:19888</value>
  </property>
</configuration>
The following is a configuration found online, for reference only
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1000m</value>
    <final>true</final>
    <description>final=true prevents users from overriding the JVM heap size</description>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>0.0.0.0:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
  </property>
</configuration>
(5) Edit yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node1.sunny.cn</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
The following is a configuration found online, for reference only
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <description>Address where the localizer IPC is.</description>
    <name>yarn.nodemanager.localizer.address</name>
    <value>0.0.0.0:23344</value>
  </property>
  <property>
    <description>NM Webapp address.</description>
    <name>yarn.nodemanager.webapp.address</name>
    <value>0.0.0.0:23999</value>
  </property>
  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    <value>5000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>Hadoop-DN-01:2181,Hadoop-DN-02:2181,Hadoop-DN-03:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk.state-store.address</name>
    <value>Hadoop-DN-01:2181,Hadoop-DN-02:2181,Hadoop-DN-03:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>Hadoop-NN-01:23140</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>Hadoop-NN-02:23140</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>Hadoop-NN-01:23130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>Hadoop-NN-02:23130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>Hadoop-NN-01:23141</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>Hadoop-NN-02:23141</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>Hadoop-NN-01:23125</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>Hadoop-NN-02:23125</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>Hadoop-NN-01:23188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>Hadoop-NN-02:23188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.rm1</name>
    <value>Hadoop-NN-01:23189</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.rm2</name>
    <value>Hadoop-NN-02:23189</value>
  </property>
</configuration>
(6) Edit the slaves file
List only the DataNode hosts, one per line
node2.sunny.cn
node3.sunny.cn
5. Format the NameNode
hdfs namenode -format
6. Start Hadoop
start-dfs.sh
start-yarn.sh    # or run start-all.sh instead of the two scripts above
mr-jobhistory-daemon.sh start historyserver
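After startup, check the running processes on each node (for example with jps)
If a DataNode fails to start, delete the tmp directory under Hadoop on every node and reformat the NameNode; note that this wipes the data in HDFS
Web UI: http://192.168.2.11:50070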
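One possible cause of the DataNode startup failure mentioned above, which this post does not confirm for its own case, is that the NameNode was reformatted and now carries a different clusterID than the DataNodes. Both sides record the ID in a current/VERSION file under their storage directory. The sketch below compares two such files; cluster_id and same_cluster are hypothetical helper names, and the paths in the usage comment assume the directories configured in hdfs-site.xml above.

```shell
# Print the clusterID recorded in an HDFS storage VERSION file.
cluster_id() {
  sed -n 's/^clusterID=//p' "$1"
}

# Succeed only when both VERSION files carry the same, non-empty clusterID.
same_cluster() {
  nn_id=$(cluster_id "$1")
  dn_id=$(cluster_id "$2")
  [ -n "$nn_id" ] && [ "$nn_id" = "$dn_id" ]
}

# Usage, after copying the DataNode's VERSION file to the NameNode host
# (the /tmp path is only an example):
#   same_cluster /usr/sunny/hadoop/hdfs/name/current/VERSION /tmp/datanode-VERSION \
#     && echo "clusterID matches" || echo "clusterID differs"
```

If the IDs differ, that explains why clearing the old directories and reformatting brings the DataNodes back.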
7. Run a distributed example
(1) Create the user directory on HDFS
hdfs dfs -mkdir -p /user/hadoop
========== Heads-up: long troubleshooting detour ahead, skim it first ==========
A warning may appear during this step:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
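This means the native library, which lives in hadoop/lib/native, could not be loaded. To fix it, download the source package hadoop-2.6.0-cdh5.12.0-src.tar.gz and build it.
Upload the package to the server, extract it, and run the following command inside the extracted source directory
mvn package -Dmaven.javadoc.skip=true -Pdist,native -DskipTests -Dtar
The build fails with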
Detected JDK Version: 1.8.0-144 is not in the allowed range [1.7.0,1.7.1000}
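This means the source build requires JDK 1.7 but 1.8 is installed, so downgrade the JDK.
After running source ~/.bash_profile the JDK was still 1.8, and killing all processes did not change that either; a reboot finally fixed it.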
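Before restarting a build that runs this long, the active JDK can be checked against the range in the error message. The helper below is a minimal sketch that only handles legacy version strings of the form 1.x.y; jdk_minor and jdk_ok_for_build are hypothetical names.

```shell
# Extract the minor number from a legacy Java version string,
# e.g. 1.8.0_144 yields 8 and 1.7.0_80 yields 7.
jdk_minor() {
  rest=${1#*.}
  echo "${rest%%.*}"
}

# Succeed when the version falls in the 1.7.x range that the build demands.
jdk_ok_for_build() {
  [ "$(jdk_minor "$1")" = "7" ]
}

# Usage (java -version writes to stderr, hence the redirect):
#   v=$(java -version 2>&1 | sed -n 's/.*version "\(.*\)".*/\1/p')
#   jdk_ok_for_build "$v" && echo "JDK $v is fine" || echo "JDK $v will be rejected"
```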
Continue building the source. It takes 20, 30, 40, 50, 60... more than 70 minutes. Change the Maven mirror address; a mirror inside China is faster.
...
Well, the build failed again when it reached hadoop-common
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.6.0-cdh5.12.0:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version -> [Help 1]
A bit of research turned up the following:
protobuf (Protocol Buffers) is Google's mechanism for encoding structured data, and most of Google's RPC protocols are built on it. The RPC protocols between Hadoop's master and slaves are implemented on top of it as well. So go download it.
protoc needs to be installed, version protobuf-2.5.0, but the official Google link no longer works. A kind soul's download link: http://pan.baidu.com/s/1pJlZubT
Upload it to the server, extract it, and get ready to compile
Before compiling protoc, install gcc, gcc-c++, and make, or you will hit another pile of errors
yum install gcc
yum install gcc-c++
yum install make
tar -xvf protobuf-2.5.0.tar.bz2
cd protobuf-2.5.0
./configure --prefix=/opt/module/protoc/
make && make install
Once it is built, remember to set the environment variables
export PROTOC_HOME="/opt/module/protoc"
export PATH=$PATH:$PROTOC_HOME/bin
Then source the profile
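Since the Hadoop build failed precisely because 'protoc --version' returned nothing, it is worth checking that command before starting the long build again. A minimal sketch, assuming protoc 2.5.0 prints the line "libprotoc 2.5.0"; protoc_is_250 is a hypothetical helper name.

```shell
# Succeed when the given `protoc --version` output reports exactly 2.5.0.
protoc_is_250() {
  [ "$1" = "libprotoc 2.5.0" ]
}

# Usage:
#   protoc_is_250 "$(protoc --version 2>/dev/null)" \
#     && echo "protoc ready" || echo "protoc missing or wrong version"
```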
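Now the Hadoop source build can continue
OK
The build finally succeeded, after a very long time
The built package is at hadoop-src/hadoop-dist/target/hadoop-2.6.0-cdh5.12.0.tar.gz
Extract that tar package and copy the files from its native directory (lib/native) into the local hadoop/lib/native directory
Check that the native library resolves its dependencies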
ldd libhadoop.so.1.0.0
Then distribute these files to the other nodes
scp * root@node2.sunny.cn:/opt/module/hadoop/lib/native/
scp * root@node3.sunny.cn:/opt/module/hadoop/lib/native/
Next, edit ~/.bash_profile
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
Then source it
Verified OK
========== End of the detour ==========
Continuing
(2) Copy the configuration files in /opt/module/hadoop/etc/hadoop into the distributed file system as input files
hdfs dfs -mkdir input
hdfs dfs -put /opt/module/hadoop/etc/hadoop/*.xml input
hadoop jar /opt/module/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
Track progress: http://192.168.2.11:8088/cluster
View the output
hdfs dfs -cat output/*
To run the job again, delete the output directory first
hdfs dfs -rm -r output
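The cleanup can be made safe to repeat by checking whether the directory exists before removing it. The sketch below relies on hdfs dfs -test -e, which returns success when the path exists; clean_output is a hypothetical helper name.

```shell
# Remove a previous job's HDFS output directory, but only if it exists,
# so the call succeeds on a first run as well as on a rerun.
clean_output() {
  dir=${1:-output}
  if hdfs dfs -test -e "$dir"; then
    hdfs dfs -rm -r "$dir"
  fi
}

# Usage before resubmitting the grep example:
#   clean_output output
```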
Appendix:
Shutting down the Hadoop cluster
On node1, run:
stop-yarn.sh
stop-dfs.sh    # or run stop-all.sh instead of the two scripts above
mr-jobhistory-daemon.sh stop historyserver
Common operations:
Basic operations on Hadoop
hadoop fs -ls *** (hdfs dfs -ls ***)  List a directory
hadoop fs -mkdir *** (hdfs dfs -mkdir ***)  Create a directory
hadoop fs -rm -r *** (hdfs dfs -rm -r ***)  Delete a directory
hadoop fs -put *** (hdfs dfs -put ***)  Upload a file to HDFS
hadoop fs -get *** (hdfs dfs -get ***)  Download a file
hadoop fs -cp *** (hdfs dfs -cp ***)  Copy a file
hadoop fs -cat *** (hdfs dfs -cat ***)  Print a file
hadoop fs -touchz *** (hdfs dfs -touchz ***)  Create an empty file
List all commands supported by the Hadoop shell:
hadoop fs -help
Java example:
Working with HDFS from a Java program
First import the Hadoop jars
Create a new User Library
Click Window --> Preferences and type User Libraries into the search box