当前位置: 开发笔记 > 编程语言 > 正文

FileSplit简单使用

作者：mobiledu2502853623 | 来源：互联网 | 2023-06-28 11:35

hadoop的FileSplit简单使用FileSplit类继承关系：FileSplit类中的属性和方法：作业输入：[java]viewp

hadoop的FileSplit简单使用

FileSplit类继承关系：

FileSplit类中的属性和方法：

作业输入：

[java] view plaincopy

hadoop@hadoop:/home/hadoop/blb$ hdfs dfs -text /user/hadoop/libin/input/inputpath1.txt
hadoop a
spark a
hive a
hbase a
tachyon a
storm a
redis a
hadoop@hadoop:/home/hadoop/blb$ hdfs dfs -text /user/hadoop/libin/input/inputpath2.txt
hadoop b
spark b
kafka b
tachyon b
oozie b
flume b
sqoop b
solr b
hadoop@hadoop:/home/hadoop/blb$

[java] view plaincopy

hadoop@hadoop:/home/hadoop/blb$ hdfs dfs -text /user/hadoop/libin/input/inputpath1.txt
hadoop a
spark a
hive a
hbase a
tachyon a
storm a
redis a
hadoop@hadoop:/home/hadoop/blb$ hdfs dfs -text /user/hadoop/libin/input/inputpath2.txt
hadoop b
spark b
kafka b
tachyon b
oozie b
flume b
sqoop b
solr b
hadoop@hadoop:/home/hadoop/blb$

代码：

[java] view plaincopy

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.SplitLocationInfo;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class GetSplitMapReduce {
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if(otherArgs.length!=2){
System.err.println("Usage databaseV1 ");
}
Job job = Job.getInstance(conf, GetSplitMapReduce.class.getSimpleName() + "1");
job.setJarByClass(GetSplitMapReduce.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);
job.setMapperClass(MyMapper1.class);
job.setNumReduceTasks(0);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
job.waitForCompletion(true);
}
public static class MyMapper1 extends Mapper{
@Override
protected void map(LongWritable key, Text value, Mapper.Context context)
throws IOException, InterruptedException {
FileSplit fileSplit=(FileSplit) context.getInputSplit();
String pathname=fileSplit.getPath().getName(); //获取目录名字
int depth = fileSplit.getPath().depth(); //获取目录深度
Classextends FileSplit> class1 = fileSplit.getClass(); //获取当前类
long length = fileSplit.getLength(); //获取文件长度
SplitLocationInfo[] locationInfo = fileSplit.getLocationInfo(); //获取位置信息
String[] locations = fileSplit.getLocations(); //获取位置
long start = fileSplit.getStart(); //The position of the first byte in the file to process.
String string = fileSplit.toString();
//fileSplit.
context.write(new Text("===================================================================================="), NullWritable.get());
context.write(new Text("pathname--"+pathname), NullWritable.get());
context.write(new Text("depth--"+depth), NullWritable.get());
context.write(new Text("class1--"+class1), NullWritable.get());
context.write(new Text("length--"+length), NullWritable.get());
context.write(new Text("locationInfo--"+locationInfo), NullWritable.get());
context.write(new Text("locations--"+locations), NullWritable.get());
context.write(new Text("start--"+start), NullWritable.get());
context.write(new Text("string--"+string), NullWritable.get());
}
}
}

[java] view plaincopy

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.SplitLocationInfo;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class GetSplitMapReduce {
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if(otherArgs.length!=2){
System.err.println("Usage databaseV1 ");
}
Job job = Job.getInstance(conf, GetSplitMapReduce.class.getSimpleName() + "1");
job.setJarByClass(GetSplitMapReduce.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);
job.setMapperClass(MyMapper1.class);
job.setNumReduceTasks(0);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
job.waitForCompletion(true);
}
public static class MyMapper1 extends Mapper{
@Override
protected void map(LongWritable key, Text value, Mapper.Context context)
throws IOException, InterruptedException {
FileSplit fileSplit=(FileSplit) context.getInputSplit();
String pathname=fileSplit.getPath().getName(); //获取目录名字
int depth = fileSplit.getPath().depth(); //获取目录深度
Classextends FileSplit> class1 = fileSplit.getClass(); //获取当前类
long length = fileSplit.getLength(); //获取文件长度
SplitLocationInfo[] locationInfo = fileSplit.getLocationInfo(); //获取位置信息
String[] locations = fileSplit.getLocations(); //获取位置
long start = fileSplit.getStart(); //The position of the first byte in the file to process.
String string = fileSplit.toString();
//fileSplit.
context.write(new Text("===================================================================================="), NullWritable.get());
context.write(new Text("pathname--"+pathname), NullWritable.get());
context.write(new Text("depth--"+depth), NullWritable.get());
context.write(new Text("class1--"+class1), NullWritable.get());
context.write(new Text("length--"+length), NullWritable.get());
context.write(new Text("locationInfo--"+locationInfo), NullWritable.get());
context.write(new Text("locations--"+locations), NullWritable.get());
context.write(new Text("start--"+start), NullWritable.get());
context.write(new Text("string--"+string), NullWritable.get());
}
}
}

对应inputpath2.txt文件的输出：

[java] view plaincopy

hadoop@hadoop:/home/hadoop/blb$ hdfs dfs -text /user/hadoop/libin/out2/part-m-00000
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@4ff41ba0
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@2341ce62
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@35549603
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@4444ba4f
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@7c23bb8c
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@dee2400
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@d7d8325
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@2b2cf90e
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66

[java] view plaincopy

hadoop@hadoop:/home/hadoop/blb$ hdfs dfs -text /user/hadoop/libin/out2/part-m-00000
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@4ff41ba0
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@2341ce62
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@35549603
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@4444ba4f
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@7c23bb8c
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@dee2400
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@d7d8325
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66
====================================================================================
pathname--inputpath2.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--66
locationInfo--null
locations--[Ljava.lang.String;@2b2cf90e
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath2.txt:0+66

对应inputpath1.txt文件的输出：

[java] view plaincopy

hadoop@hadoop:/home/hadoop/blb$ hdfs dfs -text /user/hadoop/libin/out2/part-m-00001
====================================================================================
pathname--inputpath1.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--58
locationInfo--null
locations--[Ljava.lang.String;@4ff41ba0
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath1.txt:0+58
====================================================================================
pathname--inputpath1.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--58
locationInfo--null
locations--[Ljava.lang.String;@2341ce62
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath1.txt:0+58
====================================================================================
pathname--inputpath1.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--58
locationInfo--null
locations--[Ljava.lang.String;@35549603
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath1.txt:0+58
====================================================================================
pathname--inputpath1.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--58
locationInfo--null
locations--[Ljava.lang.String;@4444ba4f
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath1.txt:0+58
====================================================================================
pathname--inputpath1.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--58
locationInfo--null
locations--[Ljava.lang.String;@7c23bb8c
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath1.txt:0+58
====================================================================================
pathname--inputpath1.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--58
locationInfo--null
locations--[Ljava.lang.String;@dee2400
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath1.txt:0+58
====================================================================================
pathname--inputpath1.txt
depth--5
class1--class org.apache.hadoop.mapreduce.lib.input.FileSplit
length--58
locationInfo--null
locations--[Ljava.lang.String;@d7d8325
start--0
string--hdfs://hadoop:9000/user/hadoop/libin/input/inputpath1.txt:0+58
hadoop@hadoop:/home/hadoop/blb$

推荐阅读

go
Python 数据可视化实战指南

本文详细介绍如何使用 Python 进行数据可视化，涵盖从环境搭建到具体实例的全过程。 ... [详细]

蜡笔小新 2024-11-13 06:03:30
go
从0到1搭建大数据平台

从0到1搭建大数据平台 ... [详细]

蜡笔小新 2024-11-12 15:26:03
email
美团优选推荐系统架构师 L7/L8：算法与工程深度融合

美团优选推荐系统架构师 L7/L8：算法与工程深度融合 ... [详细]

蜡笔小新 2024-11-05 19:10:28
header
FastDFS Nginx 扩展模块的源代码解析与技术剖析

FastDFS Nginx 扩展模块的源代码解析与技术剖析 ... [详细]

蜡笔小新 2024-11-04 20:15:18
copy
HTML a 标签中 href 属性的多种用法

本文详细介绍了 HTML 中 a 标签的 href 属性的多种用法，包括实现超链接、锚点以及调用 JavaScript 方法。通过具体的示例和解释，帮助开发者更好地理解和应用这些技术。 ... [详细]

蜡笔小新 2024-11-14 09:07:08
copy
HDFS API

Hadoop的文件操作位于包org.apache.hadoop.fs里面，能够进行新建、删除、修改等操作。比较重要的几个类：(1)Configurati ... [详细]

蜡笔小新 2024-11-13 17:31:50
copy
解决Bootstrap DataTable Ajax请求重复问题

在最近的一个项目中，我们使用了JQuery DataTable进行数据展示，虽然使用起来非常方便，但在测试过程中发现了一个问题：当查询条件改变时，有时查询结果的数据不正确。通过FireBug调试发现，点击搜索按钮时，会发送两次Ajax请求，一次是原条件的请求，一次是新条件的请求。 ... [详细]

蜡笔小新 2024-11-12 13:59:27
copy
深入解析Java多线程同步机制与应用

本文深入探讨了Java多线程环境下的同步机制及其应用，重点介绍了`synchronized`关键字的使用方法和原理。`synchronized`关键字主要用于确保多个线程在访问共享资源时的互斥性和原子性。通过具体示例，如在一个类中使用`synchronized`修饰方法，展示了如何实现线程安全的代码块。此外，文章还讨论了`ReentrantLock`等其他同步工具的优缺点，并提供了实际应用场景中的最佳实践。 ... [详细]

蜡笔小新 2024-11-08 16:11:26
go
使用 Spark SQL 基于起始与终止时间生成时序数据表

本文介绍了如何使用 Spark SQL 生成基于起始与终止时间的时序数据表。通过 `SELECT DISTINCT goods_id, get_dt_date(start_time, i) as new_dt` 语句，根据不同的时间间隔 `i` 动态填充日期，从而构建出完整的时序数据记录。该方法能够高效地处理大规模数据集，并确保生成的数据表准确反映商品在不同时间段的状态变化。 ... [详细]

蜡笔小新 2024-11-08 15:57:47
copy
Linux 环境下多线程编程实战案例分析

在 Linux 环境下，多线程编程是实现高效并发处理的重要技术。本文通过具体的实战案例，详细分析了多线程编程的关键技术和常见问题。文章首先介绍了多线程的基本概念和创建方法，然后通过实例代码展示了如何使用 pthreads 库进行线程同步和通信。此外，还探讨了多线程程序中的性能优化技巧和调试方法，为开发者提供了宝贵的实践经验。 ... [详细]

蜡笔小新 2024-11-08 13:02:21
schema
Presto：高效即席查询引擎的深度解析与应用

本文深入解析了Presto这一高效的即席查询引擎，详细探讨了其架构设计及其优缺点。Presto通过内存到内存的数据处理方式，显著提升了查询性能，相比传统的MapReduce查询，不仅减少了数据传输的延迟，还提高了查询的准确性和效率。然而，Presto在大规模数据处理和容错机制方面仍存在一定的局限性。本文还介绍了Presto在实际应用中的多种场景，展示了其在大数据分析领域的强大潜力。 ... [详细]

蜡笔小新 2024-11-07 19:17:47
int
详解Android连接MySQL数据库的操作流程及技术要点

在Android应用开发中，实现与MySQL数据库的连接是一项重要的技术任务。本文详细介绍了Android连接MySQL数据库的操作流程和技术要点。首先，Android平台提供了SQLiteOpenHelper类作为数据库辅助工具，用于创建或打开数据库。开发者可以通过继承并扩展该类，实现对数据库的初始化和版本管理。此外，文章还探讨了使用第三方库如Retrofit或Volley进行网络请求，以及如何通过JSON格式交换数据，确保与MySQL服务器的高效通信。 ... [详细]

蜡笔小新 2024-11-07 19:11:13
byte
Scala学习指南：从零开始掌握基础

本指南从零开始介绍Scala编程语言的基础知识，重点讲解了Scala解释器REPL（读取-求值-打印-循环）的使用方法。REPL是Scala开发中的重要工具，能够帮助初学者快速理解和实践Scala的基本语法和特性。通过详细的示例和练习，读者将能够熟练掌握Scala的基础概念和编程技巧。 ... [详细]

蜡笔小新 2024-11-07 18:07:59
byte
Jeecg开源社区启动第12届架构技术培训班，现正式开放报名通道

Jeecg开源社区正式启动第12届架构技术培训班，现已开放报名。本次培训采用师徒制模式，深入探讨Java架构技术。类似于大学导师指导研究生的方式，特别适合在职人员。导师将为学员布置课题，提供丰富的视频资料，并进行一对一指导，帮助学员高效学习和完成任务。我们的教学方法注重实践与理论结合，旨在培养学员的综合技术能力。 ... [详细]

蜡笔小新 2024-11-06 10:35:24
copy
使用JavaScript生成Java兼容的UUID代码实现与优化技巧

本文介绍了UUID（通用唯一标识符）的概念及其在JavaScript中生成Java兼容UUID的代码实现与优化技巧。UUID是一个128位的唯一标识符，广泛应用于分布式系统中以确保唯一性。文章详细探讨了如何利用JavaScript生成符合Java标准的UUID，并提供了多种优化方法，以提高生成效率和兼容性。 ... [详细]

蜡笔小新 2024-11-05 18:19:54

mobiledu2502853623

这个家伙很懒，什么也没留下！

Tags | 热门标签

RankList | 热门文章