For distributed frameworks, a line we hear all the time is: move the computation, not the data. So how does Flink move computation? Let's take a close look at the ExecutionGraph.
Basic concepts
ExecutionJobVertex: corresponds to one computation vertex of the JobGraph; each ExecutionJobVertex may contain many parallel ExecutionVertex instances (see the sketch after this list)
ExecutionVertex: one parallel subtask
Execution: one execution attempt of an ExecutionVertex
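To make the nesting concrete, here is a minimal, self-contained sketch. It is only a mental model: the Sketch* classes and fields below are made up for illustration and are not Flink's real ExecutionGraph classes.

import java.util.ArrayList;
import java.util.List;

// One execution attempt of a subtask (Flink: Execution).
class SketchExecution {
    final int attemptNumber;
    SketchExecution(int attemptNumber) { this.attemptNumber = attemptNumber; }
}

// One parallel subtask (Flink: ExecutionVertex), holding its current attempt.
class SketchExecutionVertex {
    final int subtaskIndex;
    SketchExecution currentExecution = new SketchExecution(0);
    SketchExecutionVertex(int subtaskIndex) { this.subtaskIndex = subtaskIndex; }
}

// One JobGraph vertex expanded by its parallelism (Flink: ExecutionJobVertex).
class SketchExecutionJobVertex {
    final String name;
    final SketchExecutionVertex[] taskVertices;
    SketchExecutionJobVertex(String name, int parallelism) {
        this.name = name;
        this.taskVertices = new SketchExecutionVertex[parallelism];
        for (int i = 0; i < parallelism; i++) {
            taskVertices[i] = new SketchExecutionVertex(i);
        }
    }
}

// The whole job (Flink: ExecutionGraph).
public class SketchExecutionGraph {
    final List<SketchExecutionJobVertex> vertices = new ArrayList<>();

    public static void main(String[] args) {
        SketchExecutionGraph graph = new SketchExecutionGraph();
        graph.vertices.add(new SketchExecutionJobVertex("source", 2));
        graph.vertices.add(new SketchExecutionJobVertex("map", 4));
        for (SketchExecutionJobVertex ejv : graph.vertices) {
            System.out.println(ejv.name + " -> " + ejv.taskVertices.length + " subtasks");
        }
    }
}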
How the graph changes
When the JobGraph is turned into an ExecutionGraph, each JobVertex is expanded by its parallelism: it becomes one ExecutionJobVertex holding that many ExecutionVertex subtasks, and each IntermediateDataSet it produces becomes an IntermediateResult with one partition per subtask.
Source code
From 一文搞定 Flink Job 提交全流程 (the earlier post on the full Flink job submission flow) we know that the ExecutionGraph is created at the same time as the JobMaster. Tracing that path leads to the ExecutionGraphBuilder.buildGraph method:
......
List<JobVertex> sortedTopology = jobGraph.getVerticesSortedTopologicallyFromSources();
if (log.isDebugEnabled()) {
    log.debug("Adding {} vertices from job graph {} ({}).", sortedTopology.size(), jobName, jobId);
}
executionGraph.attachJobGraph(sortedTopology);
......
Step into attachJobGraph:
public void attachJobGraph(List<JobVertex> topologiallySorted) throws JobException {
    assertRunningInJobMasterMainThread();

    LOG.debug("Attaching {} topologically sorted vertices to existing job graph with {} " +
            "vertices and {} intermediate results.",
            topologiallySorted.size(), tasks.size(), intermediateResults.size());

    final ArrayList<ExecutionJobVertex> newExecJobVertices = new ArrayList<>(topologiallySorted.size());
    final long createTimestamp = System.currentTimeMillis();

    for (JobVertex jobVertex : topologiallySorted) {

        if (jobVertex.isInputVertex() && !jobVertex.isStoppable()) {
            this.isStoppable = false;
        }

        ExecutionJobVertex ejv = new ExecutionJobVertex(
                this,
                jobVertex,
                1,
                rpcTimeout,
                globalModVersion,
                createTimestamp);

        ejv.connectToPredecessors(this.intermediateResults);

        ExecutionJobVertex previousTask = this.tasks.putIfAbsent(jobVertex.getID(), ejv);
        if (previousTask != null) {
            throw new JobException(String.format("Encountered two job vertices with ID %s : previous=[%s] / new=[%s]",
                    jobVertex.getID(), ejv, previousTask));
        }

        for (IntermediateResult res : ejv.getProducedDataSets()) {
            IntermediateResult previousDataSet = this.intermediateResults.putIfAbsent(res.getId(), res);
            if (previousDataSet != null) {
                throw new JobException(String.format("Encountered two intermediate data set with ID %s : previous=[%s] / new=[%s]",
                        res.getId(), res, previousDataSet));
            }
        }

        this.verticesInCreationOrder.add(ejv);
        this.numVerticesTotal += ejv.getParallelism();
        newExecJobVertices.add(ejv);
    }

    terminationFuture = new CompletableFuture<>();
    failoverStrategy.notifyNewVertices(newExecJobVertices);
}
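The loop above follows a simple pattern: visit the vertices in topological order, connect each new vertex to the intermediate results its predecessors have already registered, then register its own produced results with a putIfAbsent duplicate check. The standalone sketch below isolates that pattern; the job layout (source -> map) and every name in it are hypothetical and are not Flink API.

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AttachSketch {
    public static void main(String[] args) {
        // Hypothetical job: source -> map, each vertex producing one intermediate result.
        Map<String, List<String>> producedBy = new LinkedHashMap<>();   // insertion order = topological order
        producedBy.put("source", List.of("ds-1"));
        producedBy.put("map", List.of("ds-2"));
        Map<String, String> consumes = Map.of("map", "ds-1");           // map reads the source's result

        Map<String, String> intermediateResults = new HashMap<>();      // resultId -> producing vertex
        for (String vertex : producedBy.keySet()) {
            // "connectToPredecessors": the upstream result must already be registered.
            String upstream = consumes.get(vertex);
            if (upstream != null) {
                System.out.println(vertex + " connects to result " + upstream
                        + " produced by " + intermediateResults.get(upstream));
            }
            // Register this vertex's produced results, rejecting duplicate IDs.
            for (String resultId : producedBy.get(vertex)) {
                if (intermediateResults.putIfAbsent(resultId, vertex) != null) {
                    throw new IllegalStateException("duplicate intermediate result " + resultId);
                }
            }
        }
    }
}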
The key call is new ExecutionJobVertex. Besides some basic field assignments, it "parallelizes" the IntermediateResult and the ExecutionVertex.
Put plainly, it creates one IntermediateResult per produced data set (each sized to the parallelism) and one near-identical ExecutionVertex per parallel subtask, as the constructor below shows:
public ExecutionJobVertex(
        ExecutionGraph graph,
        JobVertex jobVertex,
        int defaultParallelism,
        Time timeout,
        long initialGlobalModVersion,
        long createTimestamp) throws JobException {

    if (graph == null || jobVertex == null) {
        throw new NullPointerException();
    }

    this.graph = graph;
    this.jobVertex = jobVertex;

    int vertexParallelism = jobVertex.getParallelism();
    int numTaskVertices = vertexParallelism > 0 ? vertexParallelism : defaultParallelism;

    final int configuredMaxParallelism = jobVertex.getMaxParallelism();

    this.maxParallelismConfigured = (VALUE_NOT_SET != configuredMaxParallelism);

    setMaxParallelismInternal(maxParallelismConfigured ?
            configuredMaxParallelism : KeyGroupRangeAssignment.computeDefaultMaxParallelism(numTaskVertices));

    if (numTaskVertices > maxParallelism) {
        throw new JobException(
                String.format("Vertex %s's parallelism (%s) is higher than the max parallelism (%s). Please lower the parallelism or increase the max parallelism.",
                        jobVertex.getName(),
                        numTaskVertices,
                        maxParallelism));
    }

    this.parallelism = numTaskVertices;

    this.taskVertices = new ExecutionVertex[numTaskVertices];
    this.operatorIDs = Collections.unmodifiableList(jobVertex.getOperatorIDs());
    this.userDefinedOperatorIds = Collections.unmodifiableList(jobVertex.getUserDefinedOperatorIDs());

    this.inputs = new ArrayList<>(jobVertex.getInputs().size());

    this.slotSharingGroup = jobVertex.getSlotSharingGroup();
    this.coLocationGroup = jobVertex.getCoLocationGroup();

    if (coLocationGroup != null && slotSharingGroup == null) {
        throw new JobException("Vertex uses a co-location constraint without using slot sharing");
    }

    this.producedDataSets = new IntermediateResult[jobVertex.getNumberOfProducedIntermediateDataSets()];

    for (int i = 0; i < jobVertex.getProducedDataSets().size(); i++) {
        final IntermediateDataSet result = jobVertex.getProducedDataSets().get(i);

        this.producedDataSets[i] = new IntermediateResult(
                result.getId(),
                this,
                numTaskVertices,
                result.getResultType());
    }

    Configuration jobConfiguration = graph.getJobConfiguration();
    int maxPriorAttemptsHistoryLength = jobConfiguration != null ?
            jobConfiguration.getInteger(JobManagerOptions.MAX_ATTEMPTS_HISTORY_SIZE) :
            JobManagerOptions.MAX_ATTEMPTS_HISTORY_SIZE.defaultValue();

    for (int i = 0; i < numTaskVertices; i++) {
        ExecutionVertex vertex = new ExecutionVertex(
                this,
                i,
                producedDataSets,
                timeout,
                initialGlobalModVersion,
                createTimestamp,
                maxPriorAttemptsHistoryLength);

        this.taskVertices[i] = vertex;
    }

    for (IntermediateResult ir : this.producedDataSets) {
        if (ir.getNumberOfAssignedPartitions() != parallelism) {
            throw new RuntimeException("The intermediate result's partitions were not correctly assigned.");
        }
    }

    try {
        @SuppressWarnings("unchecked")
        InputSplitSource<InputSplit> splitSource = (InputSplitSource<InputSplit>) jobVertex.getInputSplitSource();

        if (splitSource != null) {
            Thread currentThread = Thread.currentThread();
            ClassLoader oldContextClassLoader = currentThread.getContextClassLoader();
            currentThread.setContextClassLoader(graph.getUserClassLoader());
            try {
                inputSplits = splitSource.createInputSplits(numTaskVertices);

                if (inputSplits != null) {
                    splitAssigner = splitSource.getInputSplitAssigner(inputSplits);
                }
            } finally {
                currentThread.setContextClassLoader(oldContextClassLoader);
            }
        }
        else {
            inputSplits = null;
        }
    }
    catch (Throwable t) {
        throw new JobException("Creating the input splits caused an error: " + t.getMessage(), t);
    }
}
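The net effect of the two loops in the constructor can be shown with a small standalone sketch: for a hypothetical vertex with parallelism 3 producing one data set, we end up with 3 subtasks and an intermediate result with 3 partitions, one written by each subtask (which is what the getNumberOfAssignedPartitions() != parallelism sanity check verifies). The classes and names below are invented for illustration and are not Flink's real types.

public class ParallelizeSketch {

    // Stand-in for an IntermediateResult: one partition per parallel subtask.
    static class ResultSketch {
        final String[] partitions;
        ResultSketch(String id, int numPartitions) {
            partitions = new String[numPartitions];
            for (int i = 0; i < numPartitions; i++) {
                partitions[i] = id + "-partition-" + i;
            }
        }
    }

    public static void main(String[] args) {
        int parallelism = 3;                                   // jobVertex.getParallelism()
        ResultSketch result = new ResultSketch("ds-1", parallelism);

        // One "subtask" per degree of parallelism, mirroring the taskVertices[] loop.
        String[] subtasks = new String[parallelism];
        for (int i = 0; i < parallelism; i++) {
            subtasks[i] = "subtask-" + i + " writes " + result.partitions[i];
        }
        for (String s : subtasks) {
            System.out.println(s);
        }
    }
}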
At this point, "moving the computation" should be clear: the ExecutionGraph expands each JobVertex into its parallel ExecutionVertex subtasks, and it is these subtasks that are later scheduled and deployed to the TaskManagers.