A Walkthrough of the SparkEnv Source Code

Author: 手机用户2602927977 | Source: Internet | 2023-08-28 15:24


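SparkEnv is Spark's execution environment object and holds many objects related to Executor execution. In local mode the Driver creates an Executor, and in cluster mode the CoarseGrainedExecutorBackend process launched by a Worker also creates an Executor, so a SparkEnv exists in either the Driver or the CoarseGrainedExecutorBackend process.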

[Figure: mind map of the SparkEnv source code walkthrough]

Below is the definition of SparkEnv's primary constructor:

/**
 * :: DeveloperApi ::
 * Holds all the runtime environment objects for a running Spark instance (either master or worker),
 * including the serializer, RpcEnv, block manager, map output tracker, etc. Currently
 * Spark code finds the SparkEnv through a global variable, so all the threads can access the same
 * SparkEnv. It can be accessed by SparkEnv.get (e.g. after creating a SparkContext).
 */
@DeveloperApi
class SparkEnv (
    val executorId: String,
    private[spark] val rpcEnv: RpcEnv,
    val serializer: Serializer,
    val closureSerializer: Serializer,
    val serializerManager: SerializerManager,
    val mapOutputTracker: MapOutputTracker,
    val shuffleManager: ShuffleManager,
    val broadcastManager: BroadcastManager,
    val blockManager: BlockManager,
    val securityManager: SecurityManager,
    val metricsSystem: MetricsSystem,
    val memoryManager: MemoryManager,
    val outputCommitCoordinator: OutputCommitCoordinator,
    val conf: SparkConf) extends Logging {
  ...
}

object SparkEnv extends Logging {
  // Holds the global SparkEnv; it is volatile so that every thread sees the same value
  @volatile private var env: SparkEnv = _

  /**
   * Returns the SparkEnv.
   */
  def get: SparkEnv = {
    env
  }
  ...
}
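The global-variable lookup described in the comment can be illustrated with a minimal sketch. `Env`, `EnvHolder`, and `EnvHolderDemo` are hypothetical stand-ins, not Spark classes; the sketch only shows why the field is marked `@volatile` and how another thread reads it.

```scala
// Hypothetical stand-in for SparkEnv; only the holder pattern matters here.
final case class Env(executorId: String)

object EnvHolder {
  // @volatile makes a write by one thread visible to later reads from other threads.
  @volatile private var env: Env = null

  def set(e: Env): Unit = { env = e }
  def get: Env = env
}

object EnvHolderDemo {
  // Sets the environment on the calling thread and reads it back from a second thread.
  def readFromOtherThread(): String = {
    EnvHolder.set(Env("driver"))
    var seen: String = null
    val t = new Thread(() => { seen = EnvHolder.get.executorId })
    t.start()
    t.join() // join() establishes happens-before, so reading `seen` afterwards is safe
    seen
  }
}
```

Any thread in the same JVM can call `EnvHolder.get` without being handed the object explicitly, which is the convenience the Spark comment describes.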

Security manager: SecurityManager

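It mainly configures permissions and accounts. When Hadoop YARN is the cluster manager, a secret key generated from credentials is needed to log in. Finally, it installs a default password Authenticator for the current JVM; the authenticator instance is implemented as an anonymous inner class.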

/**
 * Spark class responsible for security.
 *
 * In general this class should be instantiated by the SparkEnv and most components
 * should access it from that. There are some cases where the SparkEnv hasn't been
 * initialized yet and this class must be instantiated directly.
 *
 * This class implements all of the configuration related to security features described
 * in the "Security" document. Please refer to that document for specific features implemented
 * here.
 * (Security documentation: https://spark.apache.org/docs/latest/security.html)
 */
private[spark] class SecurityManager(
    sparkConf: SparkConf,
    val ioEncryptionKey: Option[Array[Byte]] = None,
    authSecretFileConf: ConfigEntry[Option[String]] = AUTH_SECRET_FILE)
  extends Logging with SecretKeyHolder {

  // Set our own authenticator to properly negotiate user/password for HTTP connections.
  // This is needed by the HTTP client fetching from the HttpServer. Put here so its
  // only set once.
  // (Communication between Spark nodes often needs the user name and password to be
  // negotiated dynamically.)
  if (authOn) {
    Authenticator.setDefault(
      new Authenticator() {
        override def getPasswordAuthentication(): PasswordAuthentication = {
          var passAuth: PasswordAuthentication = null
          val userInfo = getRequestingURL().getUserInfo()
          if (userInfo != null) {
            val parts = userInfo.split(":", 2)
            passAuth = new PasswordAuthentication(parts(0), parts(1).toCharArray())
          }
          return passAuth
        }
      }
    )
  }
  ...
}
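The user-info parsing inside the anonymous Authenticator can be tried in isolation. The sketch below uses only JDK classes. `UserInfoAuth` is a hypothetical helper name, and unlike the Spark code it returns `None` instead of failing when the user info contains no colon.

```scala
import java.net.{PasswordAuthentication, URL}

// Hypothetical helper, not part of Spark.
object UserInfoAuth {
  // Mirrors split(":", 2): everything after the first ':' is treated as the password.
  def fromUrl(url: URL): Option[PasswordAuthentication] =
    Option(url.getUserInfo).map(_.split(":", 2)).collect {
      case Array(user, pass) => new PasswordAuthentication(user, pass.toCharArray)
    }
}
```

For a URL such as `http://alice:s3cret@example.com/jar`, the helper yields user name `alice` and password `s3cret`; a URL without user info yields `None`.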

RPC communication environment: RpcEnv

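Used for distributed communication and data transfer between the Master and Workers. RpcEnv is the RPC environment object and manages the whole lifecycle of RpcEndpoints. Its main functions are registering endpoints by name or URI, handling the dispatch of messages, and stopping endpoints. An RpcEnv can only be created through NettyRpcEnvFactory.
[Figure: relationships among the RPC components]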

An RpcEndpoint is a communication endpoint; the Master and each Worker in a Spark cluster, for example, are RpcEndpoints. To communicate with an RpcEndpoint you must first obtain an RpcEndpointRef for it, and that RpcEndpointRef can only be obtained through an RpcEnv. When a client sends a message through an RpcEndpointRef, the RpcEnv handles the message first, determines which endpoint it is addressed to, and routes it to the actual RpcEndpoint.
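The relationship between the three roles can be sketched with a toy, in-process model. All names here (`ToyEndpoint`, `ToyRpcEnv`, `ToyEndpointRef`) are hypothetical; Spark's real RpcEnv is asynchronous and network-based, which this sketch deliberately omits.

```scala
import scala.collection.concurrent.TrieMap

// Toy model, not Spark's API.
trait ToyEndpoint { def receive(msg: Any): Any }

final class ToyRpcEnv {
  private val endpoints = TrieMap.empty[String, ToyEndpoint]

  // Registering returns a ref; callers never hold the endpoint itself.
  def setupEndpoint(name: String, endpoint: ToyEndpoint): ToyEndpointRef = {
    endpoints.put(name, endpoint)
    new ToyEndpointRef(name, this)
  }

  // Routing: find the endpoint the message is addressed to and deliver it.
  def dispatch(name: String, msg: Any): Any =
    endpoints.get(name) match {
      case Some(ep) => ep.receive(msg)
      case None     => throw new IllegalStateException(s"No endpoint named $name")
    }

  def stop(name: String): Unit = { endpoints.remove(name); () }
}

final class ToyEndpointRef(val name: String, env: ToyRpcEnv) {
  // The ref only knows the name; the environment resolves the target.
  def ask(msg: Any): Any = env.dispatch(name, msg)
}
```

The ref carries only an address, so the environment stays in control of registration, routing, and shutdown, which matches the lifecycle responsibilities listed for RpcEnv.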

Why Spark has used Netty instead of Akka as the underlying communication framework since Spark 1.6 (the implementation class is NettyRpcEnv):

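Different Akka versions cannot communicate with each other, so users had to use exactly the same Akka version as Spark and could not upgrade Akka themselves.
Spark's Akka configuration is tuned for Spark itself and may conflict with the Akka configuration in the user's own code.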

Serialization manager: SerializerManager

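Configures serialization, compression, and encryption for the various Spark components. The default serializer implementation is org.apache.spark.serializer.JavaSerializer; users can select another implementation, such as org.apache.spark.serializer.KryoSerializer, through the spark.serializer property.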

/**
 * Component which configures serialization, compression and encryption for various Spark
 * components, including automatic selection of which [[Serializer]] to use for shuffles.
 */
private[spark] class SerializerManager(
    defaultSerializer: Serializer,
    conf: SparkConf,
    encryptionKey: Option[Array[Byte]]) {

  // Whether to compress broadcast variables that are stored
  private[this] val compressBroadcast = conf.get(config.BROADCAST_COMPRESS)
  // Whether to compress shuffle output that are stored
  private[this] val compressShuffle = conf.get(config.SHUFFLE_COMPRESS)
  // Whether to compress RDD partitions that are stored serialized
  private[this] val compressRdds = conf.get(config.RDD_COMPRESS)
  // Whether to compress shuffle output temporarily spilled to disk
  private[this] val compressShuffleSpill = conf.get(config.SHUFFLE_SPILL_COMPRESS)
  ...
}

Closure serializer: closureSerializer

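The implementation class is fixed to org.apache.spark.serializer.JavaSerializer and cannot be chosen by the user. JavaSerializer serializes closures with Java's built-in ObjectOutputStream; at present, Task serialization supports only Java serialization.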

val closureSerializer = new JavaSerializer(conf)

/**
 * :: DeveloperApi ::
 * A Spark serializer that uses Java's built-in serialization.
 *
 * @note This serializer is not guaranteed to be wire-compatible across different versions of
 * Spark. It is intended to be used to serialize/de-serialize data within a single
 * Spark application.
 */
@DeveloperApi
class JavaSerializer(conf: SparkConf) extends Serializer with Externalizable {
  ...
}

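Before serializing a closure, Spark uses the utility class org.apache.spark.util.ClosureCleaner to try to clean out references to outer objects that the closure does not need. ClosureCleaner works at runtime, so it can remove unneeded references more precisely than the Scala compiler. This makes it more likely that the closure is serializable, and it reduces the size of the serialized closure, which helps network transfer.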

Broadcast manager: BroadcastManager

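Used to store configuration information and serialized RDDs, Jobs, ShuffleDependency objects, and similar data locally. For fault tolerance, the data may also be replicated to other nodes.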

// Called by SparkContext or Executor before using Broadcast
// BroadcastManager can be used only after initialize has been called
private def initialize() {
  synchronized {
    if (!initialized) {
      broadcastFactory = new TorrentBroadcastFactory
      broadcastFactory.initialize(isDriver, conf, securityManager)
      initialized = true
    }
  }
}

// An atomic long generates a unique id for each broadcast object
private val nextBroadcastId = new AtomicLong(0)

// newBroadcast delegates to TorrentBroadcastFactory.newBroadcast to create the broadcast object
def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean): Broadcast[T] = {
  broadcastFactory.newBroadcast[T](value_, isLocal, nextBroadcastId.getAndIncrement())
}

// unbroadcast delegates to TorrentBroadcastFactory.unbroadcast to remove the broadcast object
def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean) {
  broadcastFactory.unbroadcast(id, removeFromDriver, blocking)
}

Map output tracker: mapOutputTracker

Tracks the output status of map-stage tasks so that reduce-stage tasks can find the addresses of the intermediate output and fetch it.

On the Driver a MapOutputTrackerMaster is created; otherwise a MapOutputTrackerWorker is created. Through registerOrLookupEndpoint, the Driver sets trackerEndpoint to a MapOutputTrackerMasterEndpoint, while an Executor uses RpcUtils.makeDriverRef to obtain an RPC reference to the trackerEndpoint on the Driver. An Executor synchronizes map task statuses by sending a GetMapOutputStatuses message to the Driver.

val mapOutputTracker = if (isDriver) {
  new MapOutputTrackerMaster(conf, broadcastManager, isLocal)
} else {
  new MapOutputTrackerWorker(conf)
}

// Have to assign trackerEndpoint after initialization as MapOutputTrackerEndpoint
// requires the MapOutputTracker itself
mapOutputTracker.trackerEndpoint = registerOrLookupEndpoint(MapOutputTracker.ENDPOINT_NAME,
  new MapOutputTrackerMasterEndpoint(
    rpcEnv, mapOutputTracker.asInstanceOf[MapOutputTrackerMaster], conf))

def registerOrLookupEndpoint(
    name: String, endpointCreator: => RpcEndpoint):
  RpcEndpointRef = {
  if (isDriver) {
    logInfo("Registering " + name)
    rpcEnv.setupEndpoint(name, endpointCreator)
  } else {
    RpcUtils.makeDriverRef(name, conf, rpcEnv)
  }
}

// The Executor asks the Driver for the map task statuses of the given shuffleId
if (fetchedStatuses == null) {
  // We won the race to fetch the statuses; do so
  logInfo("Doing the fetch; tracker endpoint = " + trackerEndpoint)
  // This try-finally prevents hangs due to timeouts:
  try {
    val fetchedBytes = askTracker[Array[Byte]](GetMapOutputStatuses(shuffleId))
    fetchedStatuses = MapOutputTracker.deserializeMapStatuses(fetchedBytes)
    logInfo("Got the output locations")
    mapStatuses.put(shuffleId, fetchedStatuses)
  } finally {
    fetching.synchronized {
      fetching -= shuffleId
      fetching.notifyAll()
    }
  }
}

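MapOutputTrackerMaster uses shuffleStatuses to track the output status of every map task. The key is the shuffleId, and each ShuffleStatus holds the MapStatus of the individual map tasks. Because a MapStatus records the BlockManagerId, which is the address of the map output Blocks, reduce tasks know where to fetch the intermediate map output.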

// HashMap for storing shuffleStatuses in the driver.
// Statuses are dropped only by explicit de-registering.
// Exposed for testing
val shuffleStatuses = new ConcurrentHashMap[Int, ShuffleStatus]().asScala

/**
 * Helper class used by the [[MapOutputTrackerMaster]] to perform bookkeeping for a single
 * ShuffleMapStage.
 *
 * This class maintains a mapping from mapIds to `MapStatus`. It also maintains a cache of
 * serialized map statuses in order to speed up tasks' requests for map output statuses.
 *
 * All public methods of this class are thread-safe.
 */
private class ShuffleStatus(numPartitions: Int) {

  // All accesses to the following state must be guarded with `this.synchronized`.

  /**
   * MapStatus for each partition. The index of the array is the map partition id.
   * Each value in the array is the MapStatus for a partition, or null if the partition
   * is not available. Even though in theory a task may run multiple times (due to speculation,
   * stage retries, etc.), in practice the likelihood of a map output being available at multiple
   * locations is so small that we choose to ignore that case and store only a single location
   * for each output.
   */
  // Exposed for testing
  val mapStatuses = new Array[MapStatus](numPartitions)
  ...
}

/**
 * Result returned by a ShuffleMapTask to a scheduler. Includes the block manager address that the
 * task ran on as well as the sizes of outputs for each reducer, for passing on to the reduce tasks.
 */
private[spark] sealed trait MapStatus {
  /** Location where this task was run. */
  def location: BlockManagerId
  ...
}
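The bookkeeping described above, a map from shuffleId to a per-partition array of statuses, can be sketched as follows. `ToyLocation`, `ToyShuffleStatus`, and `ToyTrackerMaster` are hypothetical simplifications; Spark's MapStatus and BlockManagerId carry more fields, and this sketch assumes a shuffle is registered before it is queried.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for the BlockManagerId held by a MapStatus.
final case class ToyLocation(executorId: String, host: String, port: Int)

final class ToyShuffleStatus(numPartitions: Int) {
  // Index = map partition id; null means the output is not available yet.
  private val mapStatuses = new Array[ToyLocation](numPartitions)

  def add(mapId: Int, loc: ToyLocation): Unit = synchronized { mapStatuses(mapId) = loc }

  def locations: Seq[Option[ToyLocation]] = synchronized { mapStatuses.toSeq.map(Option(_)) }
}

final class ToyTrackerMaster {
  private val shuffleStatuses = new ConcurrentHashMap[Int, ToyShuffleStatus]()

  def registerShuffle(shuffleId: Int, numMaps: Int): Unit = {
    shuffleStatuses.putIfAbsent(shuffleId, new ToyShuffleStatus(numMaps))
    ()
  }

  // Assumes registerShuffle was called for this shuffleId first.
  def registerMapOutput(shuffleId: Int, mapId: Int, loc: ToyLocation): Unit =
    shuffleStatuses.get(shuffleId).add(mapId, loc)

  // What a reduce task would ask for: where each map output lives.
  def outputLocations(shuffleId: Int): Seq[Option[ToyLocation]] =
    shuffleStatuses.get(shuffleId).locations
}
```

Storing a single location per map partition mirrors the design choice stated in the ShuffleStatus comment: duplicate outputs from speculation or retries are simply ignored.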

Shuffle manager: ShuffleManager

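Responsible for shuffle operations on local and remote block data. By default, ShuffleManager is an instance of org.apache.spark.shuffle.sort.SortShuffleManager created through reflection. The property spark.shuffle.manager can also be set to tungsten-sort, which resolves to the same org.apache.spark.shuffle.sort.SortShuffleManager class.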

/**
 * Pluggable interface for shuffle systems. A ShuffleManager is created in SparkEnv on the driver
 * and on each executor, based on the spark.shuffle.manager setting. The driver registers shuffles
 * with it, and executors (or tasks running locally in the driver) can ask to read and write data.
 *
 * NOTE: this will be instantiated by SparkEnv so its constructor can take a SparkConf and
 * boolean isDriver as parameters.
 */
private[spark] trait ShuffleManager {
  ...
}

Memory manager: MemoryManager

The implementation class is UnifiedMemoryManager, a dynamic memory manager in which the execution and storage regions can borrow memory from each other.

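Overall, memory is divided into three parts: storageMemory, executionMemory, and a system reserve. storageMemory caches RDDs, unrolls partitions, holds direct task results and broadcast variables, and, in Spark Streaming receiver mode, holds the blocks of each batch. executionMemory serves as buffer space for shuffle, join, sort, and aggregation. Everything outside these two is reserved for the system. Initially storageMemory and executionMemory each take half of the shared region, but when one side runs short it can borrow from the other, which raises overall memory utilization. When storageMemory has borrowed from executionMemory and executionMemory then runs short, cached blocks are evicted from memory until enough borrowed memory is released to satisfy the execution request. Conversely, when executionMemory has borrowed from storageMemory and storageMemory runs short, execution memory is not evicted to give the space back.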

/**
 * A [[MemoryManager]] that enforces a soft boundary between execution and storage such that
 * either side can borrow memory from the other.
 *
 * The region shared between execution and storage is a fraction of (the total heap space - 300MB)
 * configurable through `spark.memory.fraction` (default 0.6). The position of the boundary
 * within this space is further determined by `spark.memory.storageFraction` (default 0.5).
 * This means the size of the storage region is 0.6 * 0.5 = 0.3 of the heap space by default.
 *
 * Storage can borrow as much execution memory as is free until execution reclaims its space.
 * When this happens, cached blocks will be evicted from memory until sufficient borrowed
 * memory is released to satisfy the execution memory request.
 *
 * Similarly, execution can borrow as much storage memory as is free. However, execution
 * memory is *never* evicted by storage due to the complexities involved in implementing this.
 * The implication is that attempts to cache blocks may fail if execution has already eaten
 * up most of the storage space, in which case the new blocks will be evicted immediately
 * according to their respective storage levels.
 *
 * @param onHeapStorageRegionSize Size of the storage region, in bytes.
 *                          This region is not statically reserved; execution can borrow from
 *                          it if necessary. Cached blocks can be evicted only if actual
 *                          storage memory usage exceeds this region.
 */
private[spark] class UnifiedMemoryManager(
    conf: SparkConf,
    val maxHeapMemory: Long,
    onHeapStorageRegionSize: Long,
    numCores: Int)
  extends MemoryManager(
    conf,
    numCores,
    onHeapStorageRegionSize,
    maxHeapMemory - onHeapStorageRegionSize) {
  ...
}
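The sizing rule in the comment can be worked through numerically. The sketch below follows the formula as stated there, (heap - 300MB) * `spark.memory.fraction` * `spark.memory.storageFraction`, with the default fractions; `UnifiedMemoryMath` is a hypothetical helper name, not a Spark class.

```scala
// Hypothetical helper that reproduces the arithmetic from the doc comment.
object UnifiedMemoryMath {
  // The 300MB reserve mentioned in the comment.
  val ReservedBytes: Long = 300L * 1024 * 1024

  // Size of the region shared by execution and storage.
  def maxMemory(heapBytes: Long, memoryFraction: Double = 0.6): Long =
    ((heapBytes - ReservedBytes) * memoryFraction).toLong

  // Initial storage region inside the shared region.
  def storageRegion(
      heapBytes: Long,
      memoryFraction: Double = 0.6,
      storageFraction: Double = 0.5): Long =
    (maxMemory(heapBytes, memoryFraction) * storageFraction).toLong
}
```

For a 4 GiB heap this gives a shared region of roughly 2.2 GiB and an initial storage region of roughly 1.1 GiB. Because the 300MB reserve is subtracted first, the storage share is somewhat below the "0.3 of the heap" quoted in the comment.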

Block transfer service: blockTransferService

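The concrete implementation is NettyBlockTransferService. It uses Netty's asynchronous, event-driven network application framework to provide a server and a client for fetching sets of Blocks from remote nodes.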

BlockManagerMaster

Responsible for managing and coordinating BlockManagers; the actual operations rely on BlockManagerMasterEndpoint. registerOrLookupEndpoint either registers or looks up the BlockManagerMasterEndpoint, and the Driver and Executors are handled differently:

If the current application is the Driver, a BlockManagerMasterEndpoint is created and registered in the Dispatcher under the name BlockManagerMaster;
If the current application is an Executor, a reference to the BlockManagerMasterEndpoint is looked up in the Dispatcher of the remote Driver's NettyRpcEnv.
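The register-or-look-up split between Driver and Executor can be sketched as below. `ToyDispatcher` and `RegisterOrLookup` are hypothetical names, and plain strings stand in for endpoint references. The point of interest is the by-name `endpointCreator` parameter, which mirrors `endpointCreator: => RpcEndpoint` in Spark's signature and means the endpoint is only constructed on the Driver.

```scala
import scala.collection.mutable

// Hypothetical toy registry standing in for the driver-side Dispatcher.
object ToyDispatcher {
  private val registered = mutable.Map.empty[String, AnyRef]

  def register(name: String, endpoint: AnyRef): String = synchronized {
    registered(name) = endpoint
    s"local:$name"
  }

  def isRegistered(name: String): Boolean = synchronized { registered.contains(name) }
}

object RegisterOrLookup {
  // `endpointCreator` is by-name: it is evaluated only in the driver branch.
  def apply(isDriver: Boolean, name: String, endpointCreator: => AnyRef): String =
    if (isDriver) ToyDispatcher.register(name, endpointCreator)
    else s"remote-driver:$name" // an executor only builds a reference to the driver's endpoint
}
```

On an Executor the creator expression is never evaluated, so no endpoint object is built there even though the call site passes `new ...Endpoint(...)` in both cases.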

val blockManagerMaster = new BlockManagerMaster(registerOrLookupEndpoint(
  BlockManagerMaster.DRIVER_ENDPOINT_NAME,
  new BlockManagerMasterEndpoint(rpcEnv, isLocal, conf, listenerBus)),
  conf, isDriver)

/**
 * BlockManagerMasterEndpoint is an [[ThreadSafeRpcEndpoint]] on the master node to track statuses
 * of all slaves' block managers.
 */
private[spark]
class BlockManagerMasterEndpoint(
    override val rpcEnv: RpcEnv,
    val isLocal: Boolean,
    conf: SparkConf,
    listenerBus: LiveListenerBus)
  extends ThreadSafeRpcEndpoint with Logging {
  ...
}

BlockManager

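Responsible for managing Blocks. The BlockManager object is created when SparkContext initialization creates the SparkEnv execution environment, and its initialize() method is called later during SparkContext initialization to finish setting it up.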

// NB: blockManager is not valid until initialize() is called later.
val blockManager = new BlockManager(executorId, rpcEnv, blockManagerMaster,
  serializerManager, conf, memoryManager, mapOutputTracker, shuffleManager,
  blockTransferService, securityManager, numUsableCores)

/**
 * Manager running on every node (driver and executors) which provides interfaces for putting and
 * retrieving blocks both locally and remotely into various stores (memory, disk, and off-heap).
 *
 * Note that [[initialize()]] must be called before the BlockManager is usable.
 */
private[spark] class BlockManager(
    executorId: String,
    rpcEnv: RpcEnv,
    val master: BlockManagerMaster,
    val serializerManager: SerializerManager,
    val conf: SparkConf,
    memoryManager: MemoryManager,
    mapOutputTracker: MapOutputTracker,
    shuffleManager: ShuffleManager,
    val blockTransferService: BlockTransferService,
    securityManager: SecurityManager,
    numUsableCores: Int)
  extends BlockDataManager with BlockEvictionHandler with Logging {
  ...
}

Metrics system: MetricsSystem

How the metrics system is created depends on whether the current instance is the Driver or an Executor.

val metricsSystem = if (isDriver) {
  // On the Driver, create the metrics system with the instance name "driver".
  // Don't start metrics system right now for Driver.
  // We need to wait for the task scheduler to give us an app ID.
  // Then we can start the metrics system.
  MetricsSystem.createMetricsSystem(MetricsSystemInstances.DRIVER, conf, securityManager)
} else {
  // On an Executor, create the metrics system with the instance name "executor".
  // We need to set the executor ID before the MetricsSystem is created because sources and
  // sinks specified in the metrics configuration file will want to incorporate this executor's
  // ID into the metrics they report.
  conf.set(EXECUTOR_ID, executorId)
  val ms = MetricsSystem.createMetricsSystem(MetricsSystemInstances.EXECUTOR, conf,
    securityManager)
  ms.start()
  ms
}

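The most important step in constructing a MetricsSystem is the call to MetricsConfig's initialize method. initialize mainly loads the metrics.properties file, whose location can be changed with the spark.metrics.conf parameter.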

/**
 * Load properties from various places, based on precedence
 * If the same property is set again later on in the method, it overwrites the previous value
 */
def initialize() {
  // Add default properties in case there's no properties file
  setDefaultProperties(properties)

  loadPropertiesFromFile(conf.get(METRICS_CONF))

  // Also look for the properties in provided Spark configuration
  val prefix = "spark.metrics.conf."
  conf.getAll.foreach {
    case (k, v) if k.startsWith(prefix) =>
      properties.setProperty(k.substring(prefix.length()), v)
    case _ =>
  }
  ...
}
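The precedence rule in initialize (defaults first, then the properties file, then `spark.metrics.conf.*` entries from the Spark configuration) can be sketched with plain maps. `MetricsProps.merge` is a hypothetical helper, and the property keys in the usage are illustrative only.

```scala
import java.util.Properties

// Hypothetical helper that mimics the override order of MetricsConfig.initialize.
object MetricsProps {
  val Prefix = "spark.metrics.conf."

  // Later sources overwrite earlier ones: defaults < file < prefixed Spark configuration.
  def merge(
      defaults: Map[String, String],
      fromFile: Map[String, String],
      sparkConf: Map[String, String]): Properties = {
    val props = new Properties()
    defaults.foreach { case (k, v) => props.setProperty(k, v) }
    fromFile.foreach { case (k, v) => props.setProperty(k, v) }
    sparkConf.foreach {
      case (k, v) if k.startsWith(Prefix) =>
        // Strip the prefix so the key matches the metrics.properties format.
        props.setProperty(k.substring(Prefix.length), v)
      case _ => // not a metrics setting; ignored
    }
    props
  }
}
```

With this ordering, a value passed as `spark.metrics.conf.<key>` wins over the same `<key>` in the file, which in turn wins over the built-in default.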

Output commit coordinator: OutputCommitCoordinator

When a Spark application uses Spark SQL (including Hive) or needs to save task output to HDFS, OutputCommitCoordinator decides whether a task may commit its output to HDFS, using a "first committer wins" policy.

val outputCommitCoordinator = mockOutputCommitCoordinator.getOrElse {
  new OutputCommitCoordinator(conf, isDriver)
}
val outputCommitCoordinatorRef = registerOrLookupEndpoint("OutputCommitCoordinator",
  new OutputCommitCoordinatorEndpoint(rpcEnv, outputCommitCoordinator))
outputCommitCoordinator.coordinatorRef = Some(outputCommitCoordinatorRef)

/**
 * Authority that decides whether tasks can commit output to HDFS. Uses a "first committer wins"
 * policy.
 *
 * OutputCommitCoordinator is instantiated in both the drivers and executors. On executors, it is
 * configured with a reference to the driver's OutputCommitCoordinatorEndpoint, so requests to
 * commit output will be forwarded to the driver's OutputCommitCoordinator.
 *
 * This class was introduced in SPARK-4879; see that JIRA issue (and the associated pull requests)
 * for an extensive design discussion.
 */
private[spark] class OutputCommitCoordinator(conf: SparkConf, isDriver: Boolean) extends Logging {
  ...
}
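The "first committer wins" policy can be sketched with an atomic put-if-absent. `ToyCommitCoordinator` is a hypothetical toy keyed by (stage, partition); Spark's real coordinator tracks more state, such as stage lifecycle and failed attempts, and treating a repeated request from the winning attempt as allowed is an assumption of this sketch.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical toy coordinator, not Spark's implementation.
final class ToyCommitCoordinator {
  // (stage, partition) -> attempt number that was authorized to commit
  private val winners = new ConcurrentHashMap[(Int, Int), Integer]()

  // True for the first attempt that asks for a given partition, and for that same
  // attempt asking again; false for every other attempt.
  def canCommit(stage: Int, partition: Int, attempt: Int): Boolean = {
    val previous = winners.putIfAbsent((stage, partition), Integer.valueOf(attempt))
    previous == null || previous.intValue == attempt
  }
}
```

Because `putIfAbsent` is atomic, two speculative attempts racing for the same partition cannot both be authorized, which is what prevents duplicate output from being committed.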

On both the Driver and the Executors, the coordinatorRef property of OutputCommitCoordinator ends up holding a reference to the OutputCommitCoordinatorEndpoint:

If the current instance is the Driver, an OutputCommitCoordinatorEndpoint is created and registered in the Dispatcher under the name OutputCommitCoordinator;
If the current instance is an Executor, a reference to the OutputCommitCoordinatorEndpoint is looked up in the Dispatcher of the remote Driver's NettyRpcEnv.