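In technical terms, this is called running a MapReduce job (distributed computation).
Hadoop ships with a jar of demo programs, hadoop-mapreduce-examples-2.4.1.jar. The demos it provides include:
pi (estimates the value of Pi) and wordcount (counts how many times each word occurs)
${hadoop_install_dir}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar
Estimating Pi (with the bundled jar)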
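Start a computation job and split it into five map tasks. The first argument to pi is the number of map tasks and the second is the number of samples per map.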
[root@weekend110 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar pi 5 5
06:30:39 INFO input.FileInputFormat: Total input paths to process : 5
06:30:39 INFO mapreduce.JobSubmitter: number of splits:5 ### the job is split into 5 map tasks
06:30:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local944404022_0001 ### job ID
....................
Job Finished in 14.069 seconds ### elapsed time
Estimated value of Pi is 3.68000000000000000000 ### result
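The estimate of 3.68 is coarse because the run uses very few sample points. Assuming the example computes Pi as 4 x (points inside the circle) / (total points), with total points = maps x samples per map, a run of 5 x 5 = 25 points can only produce multiples of 0.16, and 3.68 would correspond to 23 of the 25 points landing inside. The sketch below only reproduces that arithmetic; the helper name pi_estimate is hypothetical and not part of Hadoop.

```shell
# Hypothetical helper (not a Hadoop command):
# estimate = 4 * inside / (maps * samples_per_map)
pi_estimate() {
  awk -v inside="$1" -v maps="$2" -v samples="$3" \
    'BEGIN { printf "%.2f\n", 4 * inside / (maps * samples) }'
}

pi_estimate 23 5 5   # 23 of 25 points inside the circle; prints 3.68
```

Rerunning the demo with more samples per map (for example pi 5 1000) should bring the estimate much closer to 3.14.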
Word count demo
1. Create a file named text.txt
[root@weekend110 mapreduce]# cat text.txt
world
hello tom
hello job
hello name
wang ming
wang liang
wang world
2. Upload the file to HDFS
hadoop fs -mkdir /wordcount ### create the wordcount directory in HDFS
hadoop fs -mkdir /wordcount/input ### create the wordcount/input input directory in HDFS
hadoop fs -put text.txt /wordcount/input ### upload the file to HDFS
Note: the HDFS service address can be omitted.
Full form: hadoop fs -mkdir hdfs://127.0.0.1:9000/wordcount
Short form: hadoop fs -mkdir /wordcount
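The short form works because a path without a scheme is resolved against the default filesystem, which in Hadoop 2.x is configured through the fs.defaultFS property in core-site.xml. The sketch below only imitates that resolution for absolute paths; the helper name qualify_path is hypothetical, and the default of hdfs://127.0.0.1:9000 is an assumption taken from the address used in this walkthrough.

```shell
# Hypothetical helper (not a Hadoop command): show what an absolute path
# would expand to, given a default filesystem URI (assumed value below).
qualify_path() {
  default_fs="${2:-hdfs://127.0.0.1:9000}"
  case "$1" in
    *://*) printf '%s\n' "$1" ;;                      # already fully qualified
    *)     printf '%s%s\n' "${default_fs%/}" "$1" ;;  # prepend the default filesystem
  esac
}

qualify_path /wordcount/input   # prints hdfs://127.0.0.1:9000/wordcount/input
```

On a real installation, hdfs getconf -confKey fs.defaultFS should report the address that is actually in effect.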
3. Run the word count
hadoop jar hadoop-mapreduce-examples-2.4.1.jar wordcount /wordcount/input /wordcount/output
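Note: this processes every file under /wordcount/input in HDFS and writes the result files to /wordcount/output in HDFS. The output directory must not already exist, otherwise the job fails at submission.
4. View the result
Download the result file part-r-00000 and open it; it contains: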
hello 3
job 1
liang 1
ming 1
name 1
tom 1
wang 3
world 2
~ 1
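As a sanity check, the same counts can be reproduced locally without Hadoop. The sketch below is a plain shell pipeline, not the MapReduce implementation; the function name local_wordcount is hypothetical, and the input is the text.txt listing from step 1.

```shell
# Hypothetical helper: count whitespace-separated words read from stdin
# and print "word count" pairs sorted by word in byte order.
local_wordcount() {
  tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}

local_wordcount <<'EOF'
world
hello tom
hello job
hello name
wang ming
wang liang
wang world
EOF
```

This prints the eight rows from hello 3 through world 2. The extra ~ 1 row in the job output does not appear, which suggests the uploaded file contained a stray ~ that is not shown in the listing from step 1.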