Author: 过去丶真的過卜去 | Source: Internet | 2023-07-24 17:28
Developing and debugging Hadoop locally with IntelliJ IDEA
Software versions used

- IntelliJ IDEA: 2016.3.2
- Hadoop: 2.6.5

Dataset: **Hadoop: The Definitive Guide** (the book's max-temperature example over the two years of weather records, 1901 and 1902)
The source code is listed below.
hadooptestMapper
```java
package cn.zcs.zzuli;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * Created by 张超帅 on 2018/6/23.
 */
public class hadooptestMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final int MISSING = 9999;

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') {
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
```
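The substring offsets in the mapper follow the fixed-width NCDC record layout used in the book: offsets 15-18 hold the year, offset 87 the sign, 88-91 the temperature in tenths of a degree, and 92 the quality code. A plain-Java sketch of that parsing logic, run against a synthetic record (the filler characters and the class name `RecordParseDemo` are made up for illustration; only the offsets match the real format):

```java
// Plain-Java sketch of the mapper's parsing logic on a synthetic,
// padded record. The filler characters are made up; only the year,
// sign, temperature, and quality-code offsets matter here.
public class RecordParseDemo {
    static String extractYear(String line) {
        return line.substring(15, 19);
    }

    static int extractTemp(String line) {
        // A leading '+' at offset 87 is skipped; a '-' is kept so that
        // parseInt() produces a negative temperature.
        if (line.charAt(87) == '+') {
            return Integer.parseInt(line.substring(88, 92));
        }
        return Integer.parseInt(line.substring(87, 92));
    }

    public static void main(String[] args) {
        // Offsets 15-18: year, 87: sign, 88-91: temperature in tenths
        // of a degree, 92: quality code.
        String line = "0".repeat(15) + "1901" + "9".repeat(68) + "+0111" + "1";
        System.out.println(extractYear(line)); // 1901
        System.out.println(extractTemp(line)); // 111, i.e. 11.1 degrees C
    }
}
```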
hadooptestReducer
```java
package cn.zcs.zzuli;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * Created by 张超帅 on 2018/6/23.
 */
public class hadooptestReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
```
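For each year key, the reducer simply folds the grouped values into a running maximum. A minimal plain-Java sketch of that fold, with hypothetical temperature values (the class name `MaxFoldDemo` and the sample numbers are made up):

```java
// Minimal plain-Java sketch of the reducer's running-maximum fold,
// applied to hypothetical temperatures in tenths of a degree.
public class MaxFoldDemo {
    static int maxOf(int[] values) {
        int maxValue = Integer.MIN_VALUE;
        for (int v : values) {
            maxValue = Math.max(maxValue, v);
        }
        return maxValue;
    }

    public static void main(String[] args) {
        System.out.println(maxOf(new int[]{-11, 78, 111})); // 111
    }
}
```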
hadooptest
```java
package cn.zcs.zzuli;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.File;

public class hadooptest {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(hadooptest.class);
        job.setJobName("Max temperature");

        // Add every file under the input directory as an input path.
        File inputdir = new File(args[0]);
        File[] files = inputdir.listFiles();
        for (File file : files) {
            FileInputFormat.addInputPath(job, new Path(file.toString()));
        }
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(hadooptestMapper.class);
        job.setReducerClass(hadooptestReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
Create the project (File -> New -> Project)

Create a package and add the three classes above inside it.

The project structure is shown in the figure below:
Hadoop development configuration

- File -> Project Structure -> Modules
- Click the plus sign -> JARs or directories
- In Project Structure, open Artifacts.
- Click the plus sign to add the JARs.
- (1) Enter org.apache.hadoop.util.RunJar as the main class.
- (2) Fill in the program arguments as shown in the figure above: the first argument is the jar file path entered earlier in Project Structure, the second is the input directory, and the third is the output path.
- (3) Create an input directory in the project and put the input files into it (there is no need to create the output directory; Hadoop creates it itself).
FAQ
Class not found

- It turns out a parameter was left out where the arguments were filled in above; the path of the class containing the main function must be added:
Running a Hadoop 2.x MapReduce program in Eclipse produces the error: (null) entry in command string: null chmod 0700

Solution:

Download winutils.exe and libwinutils.lib from https://github.com/SweetInk/hadoop-common-2.7.1-bin and copy them into the %HADOOP_HOME%\bin directory.
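If you would rather not set %HADOOP_HOME% system-wide, Hadoop's shell-utility lookup also honors the `hadoop.home.dir` JVM system property, which can be set before the job is created. A minimal sketch (the `C:\hadoop` path and the class name `HadoopHomeSetup` are placeholders for your own setup):

```java
// Alternative to setting %HADOOP_HOME% globally: point the JVM property
// hadoop.home.dir at the directory whose bin\ folder holds winutils.exe.
// "C:\\hadoop" is a placeholder for your own unpacked directory.
public class HadoopHomeSetup {
    public static void main(String[] args) {
        System.setProperty("hadoop.home.dir", "C:\\hadoop");
        // ...create the Configuration and Job as usual after this point...
        System.out.println(System.getProperty("hadoop.home.dir"));
    }
}
```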
Running the program again produces the error:
```
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:609)
    at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
    at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:187)
    at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
    at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
    at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:125)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at com.jack.hadoop.temperature.MinTemperature.main(MinTemperature.java:37)
```
- Download hadoop.dll from https://github.com/SweetInk/hadoop-common-2.7.1-bin and copy it into the c:\windows\system32 directory.
Note: all paths in the program must now be Windows file-system paths, not HDFS paths.
Input-directory problems (permission errors)

Hadoop error: Failed to locate the winutils binary in the hadoop binary path