Alluxio

Author: 李太有才_905 | Source: Internet | 2023-09-01 18:49


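1. What is Alluxio
Alluxio (formerly named Tachyon) is the world's first memory-centric virtual distributed storage system. It unifies data access and bridges upper-layer compute frameworks and underlying storage systems. An application only needs to connect to Alluxio to access data stored in any of the underlying storage systems. In addition, Alluxio's memory-centric architecture can make data access orders of magnitude faster than existing conventional solutions.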

In the big data ecosystem, Alluxio sits between compute frameworks (such as Apache Spark, Apache MapReduce, Apache HBase, Apache Hive, and Apache Flink) and existing storage systems (such as Amazon S3, OpenStack Swift, GlusterFS, HDFS, MaprFS, Ceph, NFS, and OSS). Alluxio brings significant performance improvements to the big data software stack. Alluxio is compatible with Hadoop: existing data analytics applications, such as Spark and MapReduce programs, can run on Alluxio without code changes.

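2. Alluxio use cases
One example is using Alluxio as a distributed in-memory file system: files stored in Alluxio can be accessed from anywhere in the cluster at memory speed. Alluxio acts as middleware between the distributed file storage at the bottom layer and the various compute frameworks above it. Its predecessor was Tachyon.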
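The idea of a memory layer between compute and storage can be sketched in plain Java. This is a toy model only, not the Alluxio API: the class name `ReadThroughCacheSketch`, its fields, and the sample path are all hypothetical. It shows a read being served from memory after the first access, so the slower under-store is touched once.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a memory-centric layer in front of a slower under-store (not the Alluxio API). */
final class ReadThroughCacheSketch {
    private final Map<String, String> underStore;               // stands in for HDFS, S3, ...
    private final Map<String, String> memory = new HashMap<>(); // stands in for worker memory
    private int underStoreReads = 0;

    ReadThroughCacheSketch(Map<String, String> underStore) {
        this.underStore = underStore;
    }

    /** Returns the data for a path, going to the under-store only on a cache miss. */
    String read(String path) {
        String cached = memory.get(path);
        if (cached != null) {
            return cached;          // served from memory
        }
        String data = underStore.get(path);
        underStoreReads++;
        if (data != null) {
            memory.put(path, data); // keep a copy for later reads
        }
        return data;
    }

    int underStoreReads() {
        return underStoreReads;
    }

    public static void main(String[] args) {
        Map<String, String> hdfs = new HashMap<>();
        hdfs.put("/myFile", "hello");
        ReadThroughCacheSketch layer = new ReadThroughCacheSketch(hdfs);
        layer.read("/myFile"); // miss: loaded from the under-store
        layer.read("/myFile"); // hit: served from memory
        System.out.println("under-store reads: " + layer.underStoreReads());
    }
}
```

The real system adds distribution, eviction, and tiered storage on top of this idea; the sketch only captures the read path.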

3. Installing Alluxio
Installing with Docker

# Assumption: the network and volume used below already exist, e.g.
#   docker network create alluxio_nw
#   docker volume create ufs

# Launch the Alluxio master
$ docker run -d \
    -p 19999:19999 \
    --net=alluxio_nw \
    --name=alluxio_master \
    -v ufs:/opt/alluxio/underFSStorage \
    alluxio/alluxio master

# Launch the Alluxio worker
$ docker run -d \
    --net=alluxio_nw \
    --name=alluxio_worker \
    --shm-size=1G -e ALLUXIO_WORKER_MEMORY_SIZE=1G \
    -v ufs:/opt/alluxio/underFSStorage \
    -e ALLUXIO_MASTER_HOSTNAME=alluxio_master \
    alluxio/alluxio worker

4. Using Alluxio in a Spring Boot application

Dependencies: pom.xml

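<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.1.5.RELEASE</version>
    </parent>
    <groupId>com.citydo</groupId>
    <artifactId>alluxio</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>alluxio</name>
    <description>Demo project for Spring Boot</description>

    <properties>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.alluxio</groupId>
            <artifactId>alluxio-core-client-fs</artifactId>
            <version>1.8.1</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.alibaba.blink</groupId>
            <artifactId>flink-shaded-hadoop2</artifactId>
            <version>1.5.1</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>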

Controller layer

package com.citydo.alluxio.controller;

import alluxio.AlluxioURI;
import alluxio.client.file.FileInStream;
import alluxio.client.file.FileOutStream;
import alluxio.client.file.FileSystem;
import alluxio.exception.AlluxioException;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

@RestController
@RequestMapping("/v1/alluxio")
public class AlluxioController {

    private static final AlluxioURI PATH = new AlluxioURI("/myFile");

    /** Reads /myFile from Alluxio and returns its content as text. */
    @RequestMapping("/in")
    public String in() throws IOException, AlluxioException {
        FileSystem fs = FileSystem.Factory.get();
        // Open the file for reading; try-with-resources closes the stream and releases the lock
        try (FileInStream in = fs.openFile(PATH)) {
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            // Read data until end of file
            while ((n = in.read(buffer)) != -1) {
                content.write(buffer, 0, n);
            }
            return new String(content.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Creates /myFile in Alluxio and writes a sample payload to it. */
    @RequestMapping("/out")
    public String out() throws IOException, AlluxioException {
        FileSystem fs = FileSystem.Factory.get();
        // Sample payload for illustration
        byte[] data = "hello alluxio".getBytes(StandardCharsets.UTF_8);
        // Optional, as in the original comment:
        // CreateFileOptions options = CreateFileOptions.defaults().setBlockSize(128 * Constants.MB);
        // FileOutStream out = fs.createFile(path, options);
        // Create a file and get its output stream; closing the stream completes the file
        try (FileOutStream out = fs.createFile(PATH)) {
            // Write data
            out.write(data);
        }
        return "wrote " + data.length + " bytes to " + PATH;
    }

    // Reference: https://docs.alluxio.io/os/javadoc/stable/index.html
}

Utility class: AlluxioFsUitls

package com.citydo.alluxio.utils;

import alluxio.AlluxioURI;
import alluxio.client.ReadType;
import alluxio.client.WriteType;
import alluxio.client.file.FileSystem;
import alluxio.client.file.URIStatus;
import alluxio.client.file.options.CreateFileOptions;
import alluxio.client.file.options.OpenFileOptions;
import alluxio.exception.AlluxioException;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.util.ArrayList;
import java.util.List;

public class AlluxioFsUitls {

    // Shared FileSystem client
    private static final FileSystem fs = FileSystem.Factory.get();

    /**
     * Adds a mount point.
     *
     * @param alluxioFilePath     path in the Alluxio namespace
     * @param underFileSystemPath path in the under file system
     */
    public static void mount(String alluxioFilePath, String underFileSystemPath) {
        // 1. Build the AlluxioURI paths
        AlluxioURI apath = new AlluxioURI(alluxioFilePath);
        AlluxioURI upath = new AlluxioURI(underFileSystemPath);
        try {
            // 2. Add the mount point
            if (!fs.exists(apath)) {
                fs.mount(apath, upath);
            }
        } catch (IOException | AlluxioException e) {
            e.printStackTrace();
        }
    }

    /**
     * Removes a mount point.
     *
     * @param filePath file path
     */
    public static void unmount(String filePath) {
        // 1. Build the AlluxioURI path
        AlluxioURI path = new AlluxioURI(filePath);
        try {
            // 2. Remove the mount point
            if (fs.exists(path)) {
                fs.unmount(path);
            }
        } catch (IOException | AlluxioException e) {
            e.printStackTrace();
        }
    }

    /**
     * Creates a file and writes content to it with WriteType.ASYNC_THROUGH:
     * data is written synchronously to the Alluxio worker and asynchronously
     * to the under storage system. Experimental.
     *
     * @param filePath file path
     * @param contents lines to write to the file
     * @return whether the file was created
     */
    public static boolean createFileMustAsysncThroughWriteTpye(String filePath, List<String> contents) {
        return createFile(filePath, contents,
                CreateFileOptions.defaults().setWriteType(WriteType.ASYNC_THROUGH));
    }

    /**
     * Creates a file and writes content to it with WriteType.CACHE_THROUGH:
     * data is written synchronously to both the Alluxio worker and the under storage system.
     *
     * @param filePath file path
     * @param contents lines to write to the file
     * @return whether the file was created
     */
    public static boolean createFileMustCacheThroughWriteTpye(String filePath, List<String> contents) {
        return createFile(filePath, contents,
                CreateFileOptions.defaults().setWriteType(WriteType.CACHE_THROUGH));
    }

    /**
     * Creates a file and writes content to it with WriteType.THROUGH:
     * data is written synchronously to the under storage system but not to the Alluxio worker.
     *
     * @param filePath file path
     * @param contents lines to write to the file
     * @return whether the file was created
     */
    public static boolean createFileMustThroughWriteTpye(String filePath, List<String> contents) {
        return createFile(filePath, contents,
                CreateFileOptions.defaults().setWriteType(WriteType.THROUGH));
    }

    /**
     * Creates a file and writes content to it with WriteType.MUST_CACHE:
     * data is written synchronously to the Alluxio worker but not to the under storage system.
     * This is the default write type.
     *
     * @param filePath file path
     * @param contents lines to write to the file
     * @return whether the file was created
     */
    public static boolean createFileMustCacheWriteTpye(String filePath, List<String> contents) {
        return createFile(filePath, contents,
                CreateFileOptions.defaults().setWriteType(WriteType.MUST_CACHE));
    }

    /**
     * Creates a file and writes content to it.
     *
     * @param filePath file path
     * @param contents lines to write to the file
     * @param options  CreateFileOptions
     * @return whether the file was created
     */
    public static boolean createFile(String filePath, List<String> contents, CreateFileOptions options) {
        // 1. Build the AlluxioURI path
        AlluxioURI path = new AlluxioURI(filePath);
        try {
            if (!fs.exists(path)) {
                // 2. Open the output stream; try-with-resources closes the writer and releases resources
                try (BufferedWriter writer = new BufferedWriter(
                        new OutputStreamWriter(fs.createFile(path, options)))) {
                    // 3. Write the file content line by line
                    for (String line : contents) {
                        writer.write(line);
                        writer.newLine();
                    }
                }
            }
            // 4. If the file exists, the operation succeeded
            return fs.exists(path);
        } catch (IOException | AlluxioException e) {
            e.printStackTrace();
        }
        return false;
    }

    /**
     * Reads an Alluxio file with ReadType.CACHE_PROMOTE: if the data is already on a worker,
     * it is moved to the worker's top tier. If the data is not in the local worker's Alluxio
     * storage, a copy is added to the local Alluxio worker whenever a block is read in full.
     * This is the default read type.
     *
     * @param filePath file path
     * @return the file content
     */
    public static List<String> openFilePromoteCacheReadType(String filePath) {
        return openFile(filePath, OpenFileOptions.defaults().setReadType(ReadType.CACHE_PROMOTE));
    }

    /**
     * Reads an Alluxio file with ReadType.NO_CACHE: no copy is created.
     *
     * @param filePath file path
     * @return the file content
     */
    public static List<String> openFileNoCacheReadType(String filePath) {
        return openFile(filePath, OpenFileOptions.defaults().setReadType(ReadType.NO_CACHE));
    }

    /**
     * Reads an Alluxio file with ReadType.CACHE: if the data is not in the local worker's
     * Alluxio storage, a copy is added to the local Alluxio worker whenever a block is read in full.
     *
     * @param filePath file path
     * @return the file content
     */
    public static List<String> openFileCacheReadType(String filePath) {
        return openFile(filePath, OpenFileOptions.defaults().setReadType(ReadType.CACHE));
    }

    /**
     * Reads an Alluxio file with the default read type.
     *
     * @param filePath file path
     * @return the file content
     */
    public static List<String> openFileDefalutReadType(String filePath) {
        return openFile(filePath, OpenFileOptions.defaults());
    }

    /**
     * Reads an Alluxio file.
     *
     * @param filePath file path
     * @param options  file read options
     * @return the file content, one entry per line (empty if the file is missing or unreadable)
     */
    public static List<String> openFile(String filePath, OpenFileOptions options) {
        List<String> list = new ArrayList<>();
        // 1. Build the AlluxioURI path
        AlluxioURI path = new AlluxioURI(filePath);
        try {
            if (fs.exists(path)) {
                // 2. Open the input stream and read line by line; the reader is closed automatically
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(fs.openFile(path, options)))) {
                    for (String line; (line = reader.readLine()) != null; ) {
                        list.add(line);
                    }
                }
            }
        } catch (IOException | AlluxioException e) {
            e.printStackTrace();
        }
        return list;
    }

    /**
     * Frees a file or directory from Alluxio storage.
     *
     * @param filePath file path
     * @return whether the free request succeeded
     */
    public static boolean free(String filePath) {
        // 1. Build the AlluxioURI path
        AlluxioURI path = new AlluxioURI(filePath);
        try {
            // 2. Free the file. This evicts cached data from Alluxio storage;
            //    the path itself stays in the namespace, so exists() is not a success check here.
            if (fs.exists(path)) {
                fs.free(path);
                return true;
            }
        } catch (IOException | AlluxioException e) {
            e.printStackTrace();
        }
        return false;
    }

    /**
     * Deletes a file or directory.
     *
     * @param filePath file path
     * @return whether the delete succeeded
     */
    public static boolean delete(String filePath) {
        // 1. Build the AlluxioURI path
        AlluxioURI path = new AlluxioURI(filePath);
        try {
            // 2. Delete the file
            if (fs.exists(path)) {
                fs.delete(path);
            }
            // 3. If the path no longer exists, the delete succeeded
            return !fs.exists(path);
        } catch (IOException | AlluxioException e) {
            e.printStackTrace();
        }
        return false;
    }

    /**
     * Creates a directory.
     *
     * @param dirPath directory path
     * @return whether the directory was created
     */
    public static boolean createDirectory(String dirPath) {
        // 1. Build the AlluxioURI path
        AlluxioURI path = new AlluxioURI(dirPath);
        try {
            // 2. Create the directory
            if (!fs.exists(path)) {
                fs.createDirectory(path);
            }
            // 3. Check again that the directory exists to confirm it was created
            return fs.exists(path);
        } catch (IOException | AlluxioException e) {
            e.printStackTrace();
        }
        return false;
    }

    /**
     * Lists the status of the entries under a path.
     *
     * @param filePath file path
     * @return the list of statuses, or null if the path is missing or the call fails
     */
    public static List<URIStatus> listStatus(String filePath) {
        // 1. Build the AlluxioURI path
        AlluxioURI path = new AlluxioURI(filePath);
        try {
            // 2. Fetch the status information
            if (fs.exists(path)) {
                return fs.listStatus(path);
            }
        } catch (IOException | AlluxioException e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Gets the status of a file.
     *
     * @param filePath file path
     * @return the URIStatus, or null if the path is missing or the call fails
     */
    public static URIStatus getStatus(String filePath) {
        // 1. Build the AlluxioURI path
        AlluxioURI path = new AlluxioURI(filePath);
        try {
            // 2. Fetch the status information
            if (fs.exists(path)) {
                return fs.getStatus(path);
            }
        } catch (IOException | AlluxioException e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Checks whether a file exists.
     *
     * @param filePath file path
     * @return whether the file exists
     */
    public static boolean exists(String filePath) {
        // 1. Build the AlluxioURI path
        AlluxioURI path = new AlluxioURI(filePath);
        try {
            // 2. Check for existence
            return fs.exists(path);
        } catch (IOException | AlluxioException e) {
            e.printStackTrace();
        }
        return false;
    }

    /**
     * Renames a file.
     *
     * @param sourcePath source file path
     * @param distPath   destination file path
     * @return whether the rename succeeded
     */
    public static boolean rename(String sourcePath, String distPath) {
        // 1. Build the AlluxioURI paths
        AlluxioURI sourcepath = new AlluxioURI(sourcePath);
        AlluxioURI distpath = new AlluxioURI(distPath);
        try {
            // 2. Rename
            if (fs.exists(sourcepath)) {
                fs.rename(sourcepath, distpath);
            }
            // 3. The rename succeeded if the destination exists and the source no longer does
            return fs.exists(distpath) && !fs.exists(sourcepath);
        } catch (IOException | AlluxioException e) {
            e.printStackTrace();
        }
        return false;
    }
}
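The four write types described in the comments above differ only in where the data lands and when. The following plain-Java sketch models that behavior with two maps and a queue. It is a toy model under stated assumptions, not the Alluxio client: `WriteTypeSketch`, `ToyWriteType`, and `flushPending` are hypothetical names, and the real ASYNC_THROUGH persistence is handled by Alluxio itself rather than by an explicit flush call.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy model of the four write types (not the Alluxio client API). */
final class WriteTypeSketch {
    enum ToyWriteType { MUST_CACHE, CACHE_THROUGH, THROUGH, ASYNC_THROUGH }

    final Map<String, String> worker = new HashMap<>();     // stands in for Alluxio worker storage
    final Map<String, String> underStore = new HashMap<>(); // stands in for the under storage system
    private final Queue<String> pendingPersist = new ArrayDeque<>(); // paths awaiting async persist

    void write(String path, String data, ToyWriteType type) {
        switch (type) {
            case MUST_CACHE:        // worker only
                worker.put(path, data);
                break;
            case CACHE_THROUGH:     // worker and under store, both synchronously
                worker.put(path, data);
                underStore.put(path, data);
                break;
            case THROUGH:           // under store only
                underStore.put(path, data);
                break;
            case ASYNC_THROUGH:     // worker now, under store later
                worker.put(path, data);
                pendingPersist.add(path);
                break;
        }
    }

    /** Simulates the background persist step that ASYNC_THROUGH defers. */
    void flushPending() {
        String path;
        while ((path = pendingPersist.poll()) != null) {
            underStore.put(path, worker.get(path));
        }
    }

    public static void main(String[] args) {
        WriteTypeSketch fs = new WriteTypeSketch();
        fs.write("/async", "data", ToyWriteType.ASYNC_THROUGH);
        System.out.println("persisted before flush: " + fs.underStore.containsKey("/async"));
        fs.flushPending();
        System.out.println("persisted after flush: " + fs.underStore.containsKey("/async"));
    }
}
```

The trade-off this makes visible: MUST_CACHE is the fastest path but leaves no copy in the under store, while CACHE_THROUGH pays the under-store write latency up front in exchange for durability.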

5. Performance comparison
Table with tens of millions of rows
1.2 GB (20 million rows)

Presto SQL	HDFS (s)	Alluxio (s)
count	5	2
distinct + where	3~10	1
group by	2	1

Table with billions of rows
tableB, 900 GB (4.5 billion rows)

Presto SQL	HDFS (s)	Alluxio (s)
count	23~69	11~14
distinct + where	50~95	20~21
group by	76~157	77~90
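To read the two tables as speedup ratios, the timings can be divided directly. The sketch below does that arithmetic in Java. Taking the midpoint of each reported range is an assumption made here for illustration (the source gives only the ranges), and `SpeedupSketch` with its helper methods is a hypothetical name.

```java
import java.util.Locale;

/** Rough speedup ratios from the timing tables; range midpoints are an illustrative assumption. */
final class SpeedupSketch {
    static double midpoint(double low, double high) {
        return (low + high) / 2.0;
    }

    /** Ratio of HDFS time to Alluxio time; values above 1 mean Alluxio was faster. */
    static double speedup(double hdfsSeconds, double alluxioSeconds) {
        return hdfsSeconds / alluxioSeconds;
    }

    private static void print(String query, double hdfsSeconds, double alluxioSeconds) {
        System.out.println(String.format(Locale.ROOT, "%-28s %.2fx",
                query, speedup(hdfsSeconds, alluxioSeconds)));
    }

    public static void main(String[] args) {
        // 1.2 GB table (20 million rows)
        print("count (1.2 GB)", 5, 2);
        print("distinct + where (1.2 GB)", midpoint(3, 10), 1);
        print("group by (1.2 GB)", 2, 1);
        // tableB, 900 GB (4.5 billion rows)
        print("count (900 GB)", midpoint(23, 69), midpoint(11, 14));
        print("distinct + where (900 GB)", midpoint(50, 95), midpoint(20, 21));
        print("group by (900 GB)", midpoint(76, 157), midpoint(77, 90));
    }
}
```

Because the ranges are wide, a midpoint ratio is only a coarse summary; comparing best case to best case or worst case to worst case would give different figures.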

Reference: https://www.alluxio.io/download/
Reference: https://blog.csdn.net/lsshlsw/article/details/85690841
Reference: https://docs.alluxio.io/os/user/stable/cn/Getting-Started.html
