Author: 小薇虫虫_851_413 | Source: Internet | 2023-09-24 18:02
The following is mentioned in the Hadoop Definitive Guide:
"What qualifies as a small job? By default one that has less than 10 mappers, only one reducer, and the input size is less than the size of one HDFS block. "
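These thresholds correspond to MRv2's "uber job" settings (mapreduce.job.ubertask.maxmaps, mapreduce.job.ubertask.maxreduces, mapreduce.job.ubertask.maxbytes). A minimal sketch of the qualification check, assuming the usual defaults; the helper name is hypothetical:

```python
def qualifies_as_small_job(num_mappers, num_reducers, input_bytes,
                           max_maps=9,        # mapreduce.job.ubertask.maxmaps: fewer than 10 mappers
                           max_reduces=1,     # mapreduce.job.ubertask.maxreduces: at most 1 reducer
                           max_bytes=128 * 1024 * 1024):  # mapreduce.job.ubertask.maxbytes: one HDFS block
    """Return True if the job meets all three 'small job' (uber) thresholds."""
    return (num_mappers <= max_maps and
            num_reducers <= max_reduces and
            input_bytes <= max_bytes)

print(qualifies_as_small_job(5, 1, 64 * 1024 * 1024))   # within all thresholds
print(qualifies_as_small_job(20, 1, 64 * 1024 * 1024))  # too many mappers
```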
But how does it count the number of mappers in a job before executing it on YARN? In MR1 the number of mappers depends on the number of input splits. Does the same apply to YARN? Containers in YARN are flexible, so is there any way to compute the maximum number of map tasks that can run on a given cluster in parallel (some kind of tight upper bound, since that would give me a rough idea of how much data I can process in parallel)?
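For context: in MRv2 the mapper count is still driven by input splits, which FileInputFormat computes the same way it did in MR1. A simplified sketch of that split-size logic (it ignores the slop factor for the last split and multi-file inputs; the function names are illustrative, not Hadoop API):

```python
import math

def compute_split_size(block_size, min_split_size=1, max_split_size=float("inf")):
    # FileInputFormat's rule: max(minimumSize, min(maximumSize, blockSize))
    return max(min_split_size, min(max_split_size, block_size))

def estimate_num_mappers(file_size, block_size=128 * 1024 * 1024):
    """Rough mapper count for a single splittable file."""
    split = compute_split_size(block_size)
    return math.ceil(file_size / split)

# A 1 GB file with 128 MB blocks yields 8 splits, hence 8 map tasks.
print(estimate_num_mappers(1024 * 1024 * 1024))  # 8
```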
2 Answers
mapreduce.job.maps = MIN(yarn.nodemanager.resource.memory-mb / mapreduce.map.memory.mb, yarn.nodemanager.resource.cpu-vcores / mapreduce.map.cpu.vcores, number of physical drives x workload factor) x number of worker nodes

mapreduce.job.reduces = MIN(yarn.nodemanager.resource.memory-mb / mapreduce.reduce.memory.mb, yarn.nodemanager.resource.cpu-vcores / mapreduce.reduce.cpu.vcores, number of physical drives x workload factor) x number of worker nodes
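The two formulas above share the same shape, so one helper covers both; a minimal sketch with made-up example numbers (the function name and the sample cluster sizes are assumptions, not values from the source):

```python
def max_parallel_tasks(node_mem_mb, task_mem_mb,
                       node_vcores, task_vcores,
                       physical_drives, workload_factor,
                       worker_nodes):
    # Per node, take the tightest of the memory-bound, vcore-bound,
    # and disk-bound limits, then scale by the number of worker nodes.
    per_node = min(node_mem_mb // task_mem_mb,
                   node_vcores // task_vcores,
                   physical_drives * workload_factor)
    return int(per_node * worker_nodes)

# Example: 40 GB of container memory per node, 2 GB per mapper,
# 16 vcores with 1 vcore per mapper, 6 drives, workload factor 2.0,
# 10 worker nodes -> disk-bound at 12 tasks/node, 120 cluster-wide.
print(max_parallel_tasks(40960, 2048, 16, 1, 6, 2.0, 10))  # 120
```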
The workload factor can be set to 2.0 for most workloads. Consider a higher setting for CPU-bound workloads.
yarn.nodemanager.resource.memory-mb (memory available on a node for containers) = total system memory - reserved memory (e.g. 10-20% of memory for Linux and its daemon services) - memory for task buffers, such as the HDFS Sort I/O buffer - memory allocated to non-YARN services (HDFS DataNode, default 1024 MB; NodeManager; RegionServer; etc.)
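As a small sketch of that budget, assuming a 15% OS reservation and 1 GB each for the DataNode and the remaining daemons (the fractions and sizes are illustrative placeholders, not recommendations from the source):

```python
def nodemanager_memory_mb(total_mb, reserved_fraction=0.15,
                          datanode_mb=1024, other_services_mb=1024):
    """Memory left for YARN containers after OS and daemon reservations."""
    reserved = total_mb * reserved_fraction  # Linux + daemon services
    return int(total_mb - reserved - datanode_mb - other_services_mb)

# A 64 GB node: 15% reserved for the OS, 1 GB DataNode, 1 GB other daemons.
print(nodemanager_memory_mb(65536))  # 53657
```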
Hadoop is a disk I/O-centric platform by design. The number of independent physical drives (“spindles”) dedicated to DataNode use limits how much concurrent processing a node can sustain. As a result, the number of vcores allocated to the NodeManager should be the lesser of either:
[(total vcores) – (number of vcores reserved for non-YARN use)] or [ 2 x (number of physical disks used for DataNode storage)]
So
yarn.nodemanager.resource.cpu-vcores = min{ ((total vcores) – (number of vcores reserved for non-YARN use)), (2 x (number of physical disks used for DataNode storage))}
Available vcores on a node for containers = total number of vcores - vcores for the operating system (for calculating vcore demand, consider the number of concurrent processes or tasks each service runs as an initial guide; for the OS we take 2) - YARN NodeManager (default is 1) - HDFS DataNode (default is 1).
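The spindle rule above can be sketched in a few lines; the 24-core, 8-disk example node is a made-up illustration:

```python
def nodemanager_vcores(total_vcores, non_yarn_vcores, datanode_disks):
    # Lesser of: cores left after non-YARN use, or 2 x DataNode spindles.
    return min(total_vcores - non_yarn_vcores, 2 * datanode_disks)

# 24-core node: 2 vcores for the OS, 1 for the NodeManager daemon,
# 1 for the DataNode daemon; 8 disks dedicated to DataNode storage.
print(nodemanager_vcores(24, 2 + 1 + 1, 8))  # min(20, 16) = 16
```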
Note: mapreduce.map.memory.mb is the combination of mapreduce.map.java.opts.max.heap plus some headroom (a safety margin).
The settings for mapreduce.[map | reduce].java.opts.max.heap specify the default memory allotted for mapper and reducer heap size, respectively. The mapreduce.[map | reduce].memory.mb settings specify the memory allotted to their containers, and the value assigned should allow overhead beyond the task heap size. Cloudera recommends applying a factor of 1.2 to the mapreduce.[map | reduce].java.opts.max.heap setting. The optimal value depends on the actual tasks. Cloudera also recommends setting mapreduce.map.memory.mb to 1-2 GB and setting mapreduce.reduce.memory.mb to twice the mapper value. The ApplicationMaster heap size is 1 GB by default, and can be increased if your jobs contain many concurrent tasks.
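The 1.2 overhead factor ties the two settings together; a small sketch of deriving heap size from container size under that rule of thumb (the function name is hypothetical):

```python
def heap_from_container_mb(container_mb, overhead_factor=1.2):
    # Container memory ~= heap x 1.2, so heap ~= container / 1.2.
    return int(container_mb / overhead_factor)

# 2 GB mapper container; 4 GB reducer container (twice the mapper value).
print(heap_from_container_mb(2048))  # 1706
print(heap_from_container_mb(4096))  # 3413
```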
Reference –
- http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_yarn_tuning.html
- http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html