Hadoop: Common Problems and Solutions


1. "Too many open files" errors
Sometimes, after MapReduce jobs have been running for a while, you will find that the datanodes have all suddenly died. Looking at the logs, you will see many "Too many open files" errors:

2008-09-11 20:20:22,836 ERROR org.apache.hadoop.dfs.DataNode: 192.168.1.34:50010:DataXceiver: java.io.IOException: Too many open files
         at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
         at sun.nio.ch.EPollArrayWrapper.&lt;init&gt;(EPollArrayWrapper.java:68)
         at sun.nio.ch.EPollSelectorImpl.&lt;init&gt;(EPollSelectorImpl.java:52)
         at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
         at sun.nio.ch.Util.getTemporarySelector(Util.java:123)
         at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:92)
         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1150)
         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:994)
         at java.lang.Thread.run(Thread.java:619)

This happens because many clients ask the datanode for data at the same time, which consumes too many file descriptors. Since the Linux system I'm using limits a single process to 1,024 open files by default, this is the result.

The fix is to add this line to /etc/security/limits.conf:

* - nofile 8192
This lets a single process open up to 8,192 files at the same time. After making the change, restart the datanode and the problem goes away.
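
Before restarting, it can help to confirm the limit that a fresh shell actually sees, since the limits.conf change only takes effect on a new login. This is a minimal sketch; bin/hadoop-daemon.sh is the stock Hadoop helper for restarting a single daemon, and your paths may differ on your installation:

# log in again (or reboot) so the new limit from /etc/security/limits.conf applies, then verify it
ulimit -n                            # should now report 8192 instead of 1024
# restart only the datanode on this machine using Hadoop's own daemon script
bin/hadoop-daemon.sh stop datanode
bin/hadoop-daemon.sh start datanode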


2. Steps to take when errors occur: check the logs, try single-node mode

  • If you are having problems, check the logs in the logs directory to see if there are any Hadoop errors or Java Exceptions (a quick way to scan them is sketched after this list).
  • Logs are named by machine and job they carry out in the cluster, and this can help you figure out which part of your configuration is giving you trouble.
  • Even if you were very careful, the problem is probably with your configuration. Try running the grep example from the QuickStart. If it doesn't run then you need to check your configuration.
  • If you can't get it to work on a real cluster, try it on a single node.
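
A quick way to do that first check is to scan the log directory for exceptions before reading individual files. This is a minimal sketch, assuming the default logs/ directory under your Hadoop installation (adjust the path if you relocated the logs):

# list the log files that mention a Java exception
grep -l "Exception" logs/*.log
# then read the end of a suspicious datanode log in detail
tail -n 50 logs/hadoop-*-datanode-*.log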

3. Common problems and solutions

Each entry below gives a symptom, the possible problem, and a possible solution.
Symptom: You get an error that your cluster is in "safe mode".
Possible problem: Your cluster enters safe mode when it hasn't been able to verify that all the data nodes necessary to replicate your data are up and responding. Check the documentation to learn more about safe mode.
Possible solution:
  1. First, wait a minute or two and then retry your command. If you just started your cluster, it's possible that it isn't fully initialized yet.
  2. If waiting a few minutes didn't help and you still get a "safe mode" error, check your logs to see if any of your data nodes didn't start correctly (either they have Java exceptions in their logs or they have messages stating that they are unable to contact some other node in your cluster). If this is the case you need to resolve the configuration issue (or possibly pick some new nodes) before you can continue. You can also check safe-mode status from the command line, as shown in the sketch below.
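You can check and control safe mode from the command line with the dfsadmin tool; a minimal sketch (only force the namenode out of safe mode if you understand why it entered it):

bin/hadoop dfsadmin -safemode get    # reports whether safe mode is ON or OFF
bin/hadoop dfsadmin -safemode wait   # blocks until the namenode leaves safe mode on its own
bin/hadoop dfsadmin -safemode leave  # forces the namenode out of safe mode (use with care)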
Symptom: You get a NoRouteToHostException in your logs or in stderr output from a command.
Possible problem: One of your nodes cannot be reached correctly. This may be a firewall issue, so you should report it to me.
Possible solution: The only workaround is to pick a new node to replace the unreachable one. Currently, I think that creusa is unreachable, but all other Linux boxes should be okay. None of the Macs will currently work in a cluster.
You get an error that "remote host identification has changed" when you try to ssh to localhost. You have moved your single node cluster from one machine in the Berry Patch to another. The name localhost thus is pointing to a new machine, and your ssh client thinks that it might be a man-in-the-middle attack. You can ask your login to skip checking the validity of localhost. You do this by setting NoHostAuthenticationForLocalhost to yes in ~/.ssh/config. You can accomplish this with the following command:
echo "NoHostAuthenticationForLocalhost yes" >>~/.ssh/config
Symptom: Your DataNode is started and you can create directories with bin/hadoop dfs -mkdir, but you get an error message when you try to put files into the HDFS (e.g., when you run a command like bin/hadoop dfs -put).
Possible problem: Creating directories is only a function of the NameNode, so your DataNode is not exercised until you actually want to put some bytes into a file. If you are sure that the DataNode is started, then it could be that your DataNodes are out of disk space.
Possible solution:
  • Go to the HDFS info web page (open your web browser and go to http://namenode:dfs_info_port where namenode is the hostname of your NameNode and dfs_info_port is the port you chose for dfs.info.port; if you followed the QuickStart on your personal computer then this URL will be http://localhost:50070). Once at that page, click on the number where it tells you how many DataNodes you have to see a list of the DataNodes in your cluster.
  • If it says you have used 100% of your space, then you need to free up room on the local disk(s) of the DataNode(s). You can also check the usage from the command line, as shown in the sketch below.
  • If you are on Windows then this number will not be accurate (there is some kind of bug either in Cygwin's df.exe or in Windows). Just free up some more space and you should be okay. On one Windows machine we tried, the disk had 1GB free but Hadoop reported that it was 100% full. We then freed up another 1GB, after which it said the disk was 99.15% full and started writing data into the HDFS again. We encountered this bug on Windows XP SP2.
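If you prefer the command line to the web page, dfsadmin can print the same capacity information; a minimal sketch (the local path is the default /tmp layout used elsewhere in this article and may differ on your machines):

bin/hadoop dfsadmin -report               # prints configured/used/remaining DFS capacity plus one section per DataNode
df -h /tmp/hadoop-your-username/dfs/data  # checks the local disk backing a DataNode's storage directory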
Symptom: You try to run the grep example from the QuickStart but you get an error message like this:
java.io.IOException: Not a file:
hdfs://localhost:9000/user/ross/input/conf
Possible problem: You may have created a directory inside the input directory in the HDFS. For example, this might happen if you run bin/hadoop dfs -put conf input twice in a row (this would create a subdirectory in input... why?).
Possible solution: The easiest way to get the example running is to just start over and make the input anew:
bin/hadoop dfs -rmr input
bin/hadoop dfs -put conf input
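To confirm that a nested directory really is the cause before wiping anything, list the input directory first; a minimal sketch:

bin/hadoop dfs -ls input    # an input/conf entry here means a directory was copied inside input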
Symptom: Your DataNodes won't start, and you see something like this in logs/*datanode*:
Incompatible namespaceIDs in /tmp/hadoop-ross/dfs/data
Possible problem: Your Hadoop namespaceID became corrupted.
Possible solution: Unfortunately, the easiest thing to do is to reformat the HDFS. You need to do something like this:
bin/stop-all.sh
rm -Rf /tmp/hadoop-your-username/*
bin/hadoop namenode -format
Be VERY careful with rm -Rf.
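If you want to verify the mismatch before reformatting, the namespaceID is recorded in the VERSION files of the name and data directories; the paths below assume the default /tmp layout shown above and may differ in your configuration:

grep namespaceID /tmp/hadoop-your-username/dfs/name/current/VERSION   # the namenode's namespaceID
grep namespaceID /tmp/hadoop-your-username/dfs/data/current/VERSION   # the datanode's namespaceID; a different value here causes the error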
Symptom: When you try the grep example in the QuickStart, you get an error like the following:
org.apache.hadoop.mapred.InvalidInputException:
Input path doesnt exist : /user/ross/input
Possible problem: You haven't created an input directory containing one or more text files.
Possible solution:
bin/hadoop dfs -put conf input
Symptom: When you try the grep example in the QuickStart, you get an error like the following:
org.apache.hadoop.mapred.FileAlreadyExistsException:
Output directory /user/ross/output already exists
Possible problem: You might have already run the example once, creating an output directory. Hadoop doesn't like to overwrite files.
Possible solution: Remove the output directory before rerunning the example:
bin/hadoop dfs -rmr output
Alternatively you can change the output directory of the grep example, something like this:
bin/hadoop jar hadoop-*-examples.jar \
grep input output2 'dfs[a-z.]+'
Symptom: You can run Hadoop jobs written in Java (like the grep example), but your HadoopStreaming jobs (such as the Python example that fetches web page titles) won't work.
Possible problem: You might have given only a relative path to the mapper and reducer programs. The tutorial originally just specified relative paths, but absolute paths are required if you are running in a real cluster.
Possible solution: Use absolute paths like this from the tutorial:
bin/hadoop jar contrib/hadoop-0.15.2-streaming.jar \
-mapper $HOME/proj/hadoop/multifetch.py \
-reducer $HOME/proj/hadoop/reducer.py \
-input urls/* \
-output titles
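
Streaming jobs also fail if the mapper or reducer script is not executable or crashes on its own, so it is worth testing them locally first. A minimal sketch; urls.txt here is a hypothetical sample input file, not something from the tutorial:

ls -l $HOME/proj/hadoop/multifetch.py $HOME/proj/hadoop/reducer.py                      # both scripts should be executable (chmod +x)
head urls.txt | $HOME/proj/hadoop/multifetch.py | sort | $HOME/proj/hadoop/reducer.py   # simulates map -> sort -> reduce locally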
