Hadoop: Common Problems and Solutions


1. "Too many open files" errors
Sometimes, after MapReduce jobs have been running for a while, you will find that the datanodes have all suddenly died. Looking at the logs, you will see many "Too many open files" errors:

2008-09-11 20:20:22,836 ERROR org.apache.hadoop.dfs.DataNode: 192.168.1.34:50010:DataXceiver: java.io.IOException: Too many open files
         at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
         at sun.nio.ch.EPollArrayWrapper.&lt;init&gt;(EPollArrayWrapper.java:68)
         at sun.nio.ch.EPollSelectorImpl.&lt;init&gt;(EPollSelectorImpl.java:52)
         at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
         at sun.nio.ch.Util.getTemporarySelector(Util.java:123)
         at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:92)
         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1150)
         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:994)
         at java.lang.Thread.run(Thread.java:619)

This happens because many clients ask the datanode for data at the same time, which consumes too many file descriptors. Since the Linux system I'm using limits a single process to 1,024 open files by default, this is the result.

The fix is to add this line to /etc/security/limits.conf:

* - nofile 8192
This lets a single process open up to 8,192 files at the same time. After making the change, restart the datanode and the problem goes away.
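
Before restarting, it can help to confirm the limit that a fresh shell actually sees, since the limits.conf change only takes effect on a new login. This is a minimal sketch; bin/hadoop-daemon.sh is the stock Hadoop helper for restarting a single daemon, and your paths may differ on your installation:

# log in again (or reboot) so the new limit from /etc/security/limits.conf applies, then verify it
ulimit -n                            # should now report 8192 instead of 1024
# restart only the datanode on this machine using Hadoop's own daemon script
bin/hadoop-daemon.sh stop datanode
bin/hadoop-daemon.sh start datanode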


2. Steps to take when errors occur: check the logs, try single-node mode

  • If you are having problems, check the logs in the logs directory to see if there are any Hadoop errors or Java Exceptions (a quick way to scan them is sketched after this list).
  • Logs are named by machine and job they carry out in the cluster, and this can help you figure out which part of your configuration is giving you trouble.
  • Even if you were very careful, the problem is probably with your configuration. Try running the grep example from the QuickStart. If it doesn't run then you need to check your configuration.
  • If you can't get it to work on a real cluster, try it on a single node.
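
A quick way to do that first check is to scan the log directory for exceptions before reading individual files. This is a minimal sketch, assuming the default logs/ directory under your Hadoop installation (adjust the path if you relocated the logs):

# list the log files that mention a Java exception
grep -l "Exception" logs/*.log
# then read the end of a suspicious datanode log in detail
tail -n 50 logs/hadoop-*-datanode-*.log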

3. Common problems and solutions

Each entry below gives a symptom, the possible problem, and a possible solution.
Symptom: You get an error that your cluster is in "safe mode".
Possible problem: Your cluster enters safe mode when it hasn't been able to verify that all the data nodes necessary to replicate your data are up and responding. Check the documentation to learn more about safe mode.
Possible solution:
  1. First, wait a minute or two and then retry your command. If you just started your cluster, it's possible that it isn't fully initialized yet.
  2. If waiting a few minutes didn't help and you still get a "safe mode" error, check your logs to see if any of your data nodes didn't start correctly (either they have Java exceptions in their logs or they have messages stating that they are unable to contact some other node in your cluster). If this is the case you need to resolve the configuration issue (or possibly pick some new nodes) before you can continue. You can also check safe-mode status from the command line, as shown in the sketch below.
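You can check and control safe mode from the command line with the dfsadmin tool; a minimal sketch (only force the namenode out of safe mode if you understand why it entered it):

bin/hadoop dfsadmin -safemode get    # reports whether safe mode is ON or OFF
bin/hadoop dfsadmin -safemode wait   # blocks until the namenode leaves safe mode on its own
bin/hadoop dfsadmin -safemode leave  # forces the namenode out of safe mode (use with care)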
Symptom: You get a NoRouteToHostException in your logs or in stderr output from a command.
Possible problem: One of your nodes cannot be reached correctly. This may be a firewall issue, so you should report it to me.
Possible solution: The only workaround is to pick a new node to replace the unreachable one. Currently, I think that creusa is unreachable, but all other Linux boxes should be okay. None of the Macs will currently work in a cluster.
You get an error that "remote host identification has changed" when you try to ssh to localhost. You have moved your single node cluster from one machine in the Berry Patch to another. The name localhost thus is pointing to a new machine, and your ssh client thinks that it might be a man-in-the-middle attack. You can ask your login to skip checking the validity of localhost. You do this by setting NoHostAuthenticationForLocalhost to yes in ~/.ssh/config. You can accomplish this with the following command:
echo "NoHostAuthenticationForLocalhost yes" >>~/.ssh/config
Symptom: Your DataNode is started and you can create directories with bin/hadoop dfs -mkdir, but you get an error message when you try to put files into the HDFS (e.g., when you run a command like bin/hadoop dfs -put).
Possible problem: Creating directories is only a function of the NameNode, so your DataNode is not exercised until you actually want to put some bytes into a file. If you are sure that the DataNode is started, then it could be that your DataNodes are out of disk space.
Possible solution:
  • Go to the HDFS info web page (open your web browser and go to http://namenode:dfs_info_port where namenode is the hostname of your NameNode and dfs_info_port is the port you chose for dfs.info.port; if you followed the QuickStart on your personal computer then this URL will be http://localhost:50070). Once at that page, click on the number where it tells you how many DataNodes you have to see a list of the DataNodes in your cluster.
  • If it says you have used 100% of your space, then you need to free up room on the local disk(s) of the DataNode(s). You can also check the usage from the command line, as shown in the sketch below.
  • If you are on Windows then this number will not be accurate (there is some kind of bug either in Cygwin's df.exe or in Windows). Just free up some more space and you should be okay. On one Windows machine we tried, the disk had 1GB free but Hadoop reported that it was 100% full. We then freed up another 1GB, after which it said the disk was 99.15% full and started writing data into the HDFS again. We encountered this bug on Windows XP SP2.
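If you prefer the command line to the web page, dfsadmin can print the same capacity information; a minimal sketch (the local path is the default /tmp layout used elsewhere in this article and may differ on your machines):

bin/hadoop dfsadmin -report               # prints configured/used/remaining DFS capacity plus one section per DataNode
df -h /tmp/hadoop-your-username/dfs/data  # checks the local disk backing a DataNode's storage directory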
Symptom: You try to run the grep example from the QuickStart but you get an error message like this:
java.io.IOException: Not a file:
hdfs://localhost:9000/user/ross/input/conf
Possible problem: You may have created a directory inside the input directory in the HDFS. For example, this might happen if you run bin/hadoop dfs -put conf input twice in a row (this would create a subdirectory in input... why?).
Possible solution: The easiest way to get the example running is to just start over and make the input anew:
bin/hadoop dfs -rmr input
bin/hadoop dfs -put conf input
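To confirm that a nested directory really is the cause before wiping anything, list the input directory first; a minimal sketch:

bin/hadoop dfs -ls input    # an input/conf entry here means a directory was copied inside input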
Symptom: Your DataNodes won't start, and you see something like this in logs/*datanode*:
Incompatible namespaceIDs in /tmp/hadoop-ross/dfs/data
Possible problem: Your Hadoop namespaceID became corrupted.
Possible solution: Unfortunately, the easiest thing to do is to reformat the HDFS. You need to do something like this:
bin/stop-all.sh
rm -Rf /tmp/hadoop-your-username/*
bin/hadoop namenode -format
Be VERY careful with rm -Rf.
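If you want to verify the mismatch before reformatting, the namespaceID is recorded in the VERSION files of the name and data directories; the paths below assume the default /tmp layout shown above and may differ in your configuration:

grep namespaceID /tmp/hadoop-your-username/dfs/name/current/VERSION   # the namenode's namespaceID
grep namespaceID /tmp/hadoop-your-username/dfs/data/current/VERSION   # the datanode's namespaceID; a different value here causes the error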
Symptom: When you try the grep example in the QuickStart, you get an error like the following:
org.apache.hadoop.mapred.InvalidInputException:
Input path doesnt exist : /user/ross/input
Possible problem: You haven't created an input directory containing one or more text files.
Possible solution:
bin/hadoop dfs -put conf input
Symptom: When you try the grep example in the QuickStart, you get an error like the following:
org.apache.hadoop.mapred.FileAlreadyExistsException:
Output directory /user/ross/output already exists
Possible problem: You might have already run the example once, creating an output directory. Hadoop doesn't like to overwrite files.
Possible solution: Remove the output directory before rerunning the example:
bin/hadoop dfs -rmr output
Alternatively you can change the output directory of the grep example, something like this:
bin/hadoop jar hadoop-*-examples.jar \
grep input output2 'dfs[a-z.]+'
Symptom: You can run Hadoop jobs written in Java (like the grep example), but your HadoopStreaming jobs (such as the Python example that fetches web page titles) won't work.
Possible problem: You might have given only a relative path to the mapper and reducer programs. The tutorial originally just specified relative paths, but absolute paths are required if you are running in a real cluster.
Possible solution: Use absolute paths like this from the tutorial:
bin/hadoop jar contrib/hadoop-0.15.2-streaming.jar \
-mapper $HOME/proj/hadoop/multifetch.py \
-reducer $HOME/proj/hadoop/reducer.py \
-input urls/* \
-output titles
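
Streaming jobs also fail if the mapper or reducer script is not executable or crashes on its own, so it is worth testing them locally first. A minimal sketch; urls.txt here is a hypothetical sample input file, not something from the tutorial:

ls -l $HOME/proj/hadoop/multifetch.py $HOME/proj/hadoop/reducer.py                      # both scripts should be executable (chmod +x)
head urls.txt | $HOME/proj/hadoop/multifetch.py | sort | $HOME/proj/hadoop/reducer.py   # simulates map -> sort -> reduce locally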
