I've set up a Hadoop cluster on multiple physical nodes: one server for the NameNode, ResourceManager and JobHistory Server, and two servers for DataNodes. I followed this tutorial while configuring.
I tried to test the MapReduce example programs (WordCount, TeraSort, TeraGen, etc.), all of which I launch from hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar.
TeraGen and randomwriter launch and finish with a success status (because they have no Reduce tasks, only Map tasks), but when I try to launch WordCount or WordMean, the Map task completes (1 task) while Reduce stays at 0% the whole time; the job just stops making progress. In yarn-root-resourcemanager-yamaster.log, after the successful Map task I see only one row:
INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Null container completed...
I tried to find a solution and found a similar question on Stack Overflow, but there is no correct answer there; in fact, I don't know how to see free reducers in the ResourceManager. What I have:
UPDATE: I tried launching the wordcount example program without reduce tasks, using the option -D mapred.reduce.tasks=0:
hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount -D mapred.reduce.tasks=0 /bigtext.txt /bigtext_wc_1.txt
And it works: I get a word count result. The output is wrong, of course, since there is no reduce phase, but the program completes.
15/02/03 12:40:37 INFO mapreduce.Job: Running job: job_1422950901990_0004
15/02/03 12:40:52 INFO mapreduce.Job: Job job_1422950901990_0004 running in uber mode : false
15/02/03 12:40:52 INFO mapreduce.Job: map 0% reduce 0%
15/02/03 12:41:03 INFO mapreduce.Job: map 100% reduce 0%
15/02/03 12:41:04 INFO mapreduce.Job: Job job_1422950901990_0004 completed successfully
15/02/03 12:41:05 INFO mapreduce.Job: Counters: 30
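For what it's worth, mapred.reduce.tasks is a deprecated key in Hadoop 2.x; the current name for the same setting is mapreduce.job.reduces, so an equivalent run (with a placeholder output path) would be:

hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount -D mapreduce.job.reduces=0 /bigtext.txt /bigtext_wc_2.txt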
UPDATE #2:
More information from the application log:
2015-02-03 15:02:12,008 INFO [IPC Server handler 0 on 55452] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422959549820_0005_m_000000_0 is : 1.0
2015-02-03 15:02:12,025 INFO [IPC Server handler 1 on 55452] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1422959549820_0005_m_000000_0
2015-02-03 15:02:12,028 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1422959549820_0005_m_000000_0 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
2015-02-03 15:02:12,029 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1422959549820_0005_01_000002 taskAttempt attempt_1422959549820_0005_m_000000_0
2015-02-03 15:02:12,030 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1422959549820_0005_m_000000_0
2015-02-03 15:02:12,030 INFO [ContainerLauncher #1] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : slave102.hadoop.ot.ru:51573
2015-02-03 15:02:12,063 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1422959549820_0005_m_000000_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2015-02-03 15:02:12,084 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1422959549820_0005_m_000000_0
2015-02-03 15:02:12,087 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1422959549820_0005_m_000000 Task Transitioned from RUNNING to SUCCEEDED
2015-02-03 15:02:12,094 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2015-02-03 15:02:12,792 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:1 RackLocal:0
2015-02-03 15:02:12,794 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=
2015-02-03 15:02:12,794 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold reached. Scheduling reduces.
2015-02-03 15:02:12,795 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: All maps assigned. Ramping up all remaining reduces:1
2015-02-03 15:02:12,795 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:1 AssignedMaps:1 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:1 RackLocal:0
2015-02-03 15:02:13,805 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1422959549820_0005: ask=1 release= 0 newCOntainers=0 finishedCOntainers=1 resourcelimit= knownNMs=4
2015-02-03 15:02:13,806 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1422959549820_0005_01_000002
2015-02-03 15:02:13,808 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:1 AssignedMaps:0 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:1 RackLocal:0
2015-02-03 15:02:13,808 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1422959549820_0005_m_000000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Configuration files of the cluster:
hdfs-site.xml
dfs.namenode.name.dir = /grid/hadoop1/nn
    Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
dfs.namenode.hosts = /opt/current/hadoop/etc/hadoop/slaves
    List of permitted DataNodes. If necessary, use these files to control the list of allowable datanodes.
dfs.namenode.hosts.exclude = /opt/current/hadoop/etc/hadoop/excludes
    List of excluded DataNodes. If necessary, use these files to control the list of allowable datanodes.
dfs.blocksize = 268435456
    HDFS blocksize of 256MB for large file-systems.
dfs.namenode.handler.count = 100
    More NameNode server threads to handle RPCs from large number of DataNodes.
dfs.datanode.data.dir = /grid/hadoop1/dn
    Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
core-site.xml
fs.defaultFS = hdfs://master:8020
    Default HDFS filesystem on the NameNode host, like hdfs://host:port/
io.file.buffer.size = 131072
    Size of read/write buffer used in SequenceFiles.
mapred-site.xml
mapreduce.framework.name = yarn
    Execution framework set to Hadoop YARN.
mapreduce.map.memory.mb = 1536
    Larger resource limit for maps.
mapreduce.map.java.opts = -Xmx1024M
    Larger heap-size for child JVMs of maps.
mapreduce.reduce.memory.mb = 3072
    Larger resource limit for reduces.
mapreduce.reduce.java.opts = -Xmx2560M
    Larger heap-size for child JVMs of reduces.
mapreduce.task.io.sort.mb = 512
    Higher memory limit while sorting data for efficiency.
mapreduce.task.io.sort.factor = 100
    More streams merged at once while sorting files.
mapreduce.reduce.shuffle.parallelcopies = 50
    Higher number of parallel copies run by reduces to fetch outputs from a very large number of maps.
mapreduce.jobhistory.address = master:10020
    MapReduce JobHistory Server host:port. Default port is 10020.
mapreduce.jobhistory.webapp.address = master:19888
    MapReduce JobHistory Server Web UI host:port. Default port is 19888.
mapreduce.jobhistory.intermediate-done-dir = /mr-history/tmp
    Directory where history files are written by MapReduce jobs.
mapreduce.jobhistory.done-dir = /mr-history/done
    Directory where history files are managed by the MR JobHistory Server.
yarn-site.xml
yarn.acl.enable = yes
    Enable ACLs? Defaults to false.
yarn.admin.acl = false
    ACL to set admins on the cluster. ACLs are of the form comma-separated-users space comma-separated-groups. Defaults to the special value of *, which means anyone. The special value of just a space means no one has access.
yarn.log-aggregation-enable = false
    Configuration to enable or disable log aggregation.
yarn.resourcemanager.address = master:8050
    Value: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.scheduler.address = master:8030
    ResourceManager host:port for ApplicationMasters to talk to the Scheduler to obtain resources. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.resource-tracker.address = master:8025
    ResourceManager host:port for NodeManagers. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.admin.address = master:8141
    ResourceManager host:port for administrative commands. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.webapp.address = master:8088
    ResourceManager web UI host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.hostname = master
    ResourceManager host.
yarn.resourcemanager.scheduler.class = org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
    ResourceManager Scheduler class.
yarn.scheduler.maximum-allocation-mb = 6144
    Maximum limit of memory to allocate to each container request at the ResourceManager, in MB.
yarn.scheduler.minimum-allocation-mb = 2048
    Minimum limit of memory to allocate to each container request at the ResourceManager, in MB.
yarn.resourcemanager.nodes.include-path = /opt/current/hadoop/etc/hadoop/slaves
    List of permitted NodeManagers. If necessary, use these files to control the list of allowable NodeManagers.
yarn.resourcemanager.nodes.exclude-path = /opt/current/hadoop/etc/hadoop/excludes
    List of excluded NodeManagers. If necessary, use these files to control the list of allowable NodeManagers.
yarn.nodemanager.resource.memory-mb = 2048
    Resource, i.e. available physical memory, in MB, for a given NodeManager. Defines the total resources on the NodeManager to be made available to running containers.
yarn.nodemanager.vmem-pmem-ratio = 2.1
    Maximum ratio by which virtual memory usage of tasks may exceed physical memory. The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio.
yarn.nodemanager.local-dirs = /grid/hadoop1/yarn/local
    Comma-separated list of paths on the local filesystem where intermediate data is written. Multiple paths help spread disk I/O.
yarn.nodemanager.log-dirs = /var/log/hadoop-yarn/containers
    Where to store container logs.
yarn.nodemanager.log.retain-second = 10800
    Default time (in seconds) to retain log files on the NodeManager. Only applicable if log aggregation is disabled.
yarn.nodemanager.remote-app-log-dir = /logs
    HDFS directory where the application logs are moved on application completion. Needs appropriate permissions set. Only applicable if log aggregation is enabled.
yarn.nodemanager.remote-app-log-dir-suffix = logs
    Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log aggregation is enabled.
yarn.nodemanager.aux-services = mapreduce_shuffle
    Shuffle service that needs to be set for MapReduce applications.
And finally, /etc/hosts:
127.0.0.1 localhost
## BigData Hadoop Lab ##
#Name Node
172.25.28.100 master.hadoop.ot.ru master
172.25.28.101 secondary.hadoop.ot.ru secondary
#DataNodes on DL Servers
172.25.28.102 slave102.hadoop.ot.ru slave102
172.25.28.103 slave103.hadoop.ot.ru slave103
172.25.28.104 slave104.hadoop.ot.ru slave104
172.25.28.105 slave105.hadoop.ot.ru slave105
172.25.28.106 slave106.hadoop.ot.ru slave106
172.25.28.107 slave107.hadoop.ot.ru slave107
#DataNodes on ARM Servers
172.25.40.25 slave25.hadoop.ot.ru slave25
172.25.40.26 slave26.hadoop.ot.ru slave26
172.25.40.27 slave27.hadoop.ot.ru slave27
172.25.40.28 slave28.hadoop.ot.ru slave28
The answer is: not enough memory. Every task container (map or reduce) was too big for my machines: with yarn.nodemanager.resource.memory-mb at 2048, no NodeManager could ever fit the 3072 MB reduce container requested by mapreduce.reduce.memory.mb.
This error:
2015-02-03 15:02:13,808 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1422959549820_0005_m_000000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
told me about it.
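In hindsight, this could have been checked up front by comparing what each NodeManager advertises with what a reduce container asks for. A rough way to do that, as a sketch assuming the yarn CLI from this installation (the node ID below is a placeholder; take a real one from the -list output):

# list registered NodeManagers and their node IDs
yarn node -list
# show memory used / memory capacity for one node
yarn node -status slave102.hadoop.ot.ru:45454
# the same numbers are visible in the ResourceManager web UI at http://master:8088/cluster/nodes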
These turned out to be the optimal settings for most of my servers:
yarn.scheduler.minimum-allocation-mb=768
yarn.scheduler.maximum-allocation-mb=3072
yarn.nodemanager.resource.memory-mb=3072
mapreduce.map.memory.mb=768
mapreduce.map.java.opts=-Xmx512m
mapreduce.reduce.memory.mb=1536
mapreduce.reduce.java.opts=-Xmx1024m
yarn.app.mapreduce.am.resource.mb=768
yarn.app.mapreduce.am.command-opts=-Xmx512m
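The yarn.* keys above go into yarn-site.xml, and the mapreduce.* and yarn.app.mapreduce.am.* keys into mapred-site.xml. A rough sketch of rolling the change out, assuming Hadoop lives under /opt/current/hadoop as the config paths above suggest (the slave hostname and the output path are placeholders):

# push the edited configs to each slave (repeat per node)
scp /opt/current/hadoop/etc/hadoop/yarn-site.xml /opt/current/hadoop/etc/hadoop/mapred-site.xml slave102:/opt/current/hadoop/etc/hadoop/
# restart YARN so the NodeManagers pick up the new memory limits
/opt/current/hadoop/sbin/stop-yarn.sh
/opt/current/hadoop/sbin/start-yarn.sh
# re-run the job that used to hang at reduce 0%
hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount /bigtext.txt /bigtext_wc_fixed.txt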