Author: sdfdsafgafsdf | Source: Internet | 2023-09-09 14:52
Snappy Compression
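1. Overview
Snappy compression is used here to improve MapReduce and HBase performance. In essence it trades CPU for I/O throughput and disk space. Configuring and using snappy has the following requirements:
The Hadoop cluster's native libraries must already have been compiled by hand with snappy support: snappy is installed before the Hadoop source is built, and the build is run with the -Drequire.snappy parameter. (I used hadoop-2.5.0-cdh5.3.3 in pseudo-distributed mode.)
Maven is installed. (I used 3.0.5.)
The JDK is installed and JAVA_HOME is set. (I used 1.7.0_75.)
2. Configuring snappy for MapReduce
The configuration process follows the official site (with some differences):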
https://github.com/electrum/hadoop-snappy
2.1 Testing MR
To have a baseline to compare against once snappy is configured, first run a simple MapReduce program and record the size of the map output in bytes.
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.3.jar wordcount /wordcount/in /wordcount/out
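To make the before/after comparison repeatable, the counter can be pulled out of the job's console output instead of being read off the screen. This is a minimal sketch, not part of the original setup: it assumes the console output was saved to a file and that counters are printed as name=value lines. The helper name counter_value and the file name job.log are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one counter from saved job output.
# $1 = counter display name, $2 = file holding the job's console output.
counter_value() {
    # Counter lines are indented and look like "Map output bytes=12345".
    # Anchoring on "name=" keeps "Map output bytes" from also matching
    # "Map output materialized bytes".
    grep -E "^[[:space:]]*$1=" "$2" | head -n 1 | cut -d= -f2
}

# Assumed usage (needs a working Hadoop install, so it is left commented out):
#   bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.3.jar \
#       wordcount /wordcount/in /wordcount/out 2>&1 | tee job.log
#   counter_value "Map output bytes" job.log
#   counter_value "Map output materialized bytes" job.log
```

As far as I understand, Map output bytes is counted before compression, so the neighbouring Map output materialized bytes counter is the one I would expect to shrink after snappy is enabled. It is worth recording both.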
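4. Using Snappy in uber mode
With uber mode enabled, MapReduce programs fail with the following error once the snappy compression configuration described above is applied: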
2015-06-17 04:27:48,905 FATAL [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Error running local (uberized) 'child' : java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
    at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
    at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:114)
    at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:97)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1602)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1482)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:720)
    at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:2012)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
This happens because the snappy native library cannot be loaded in uber mode. The fix is to add the following configuration to mapred-site.xml:
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>LD_LIBRARY_PATH=$HADOOP_HOME/lib/native</value>
</property>
https://issues.apache.org/jira/browse/MAPREDUCE-5799
Note: if Hadoop is a CDH release installed through CM (Cloudera Manager), the snappy native libraries are in the /opt/cloudera/parcels/CDH/lib/hadoop/lib/native directory.
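The uber-mode fix only helps if LD_LIBRARY_PATH points at a directory that really contains the native libraries, so a quick look at that directory can save a failed job run. A minimal sketch under my own assumptions: the function name check_native_dir is hypothetical, and the library file names libhadoop.so and libsnappy.so are what I would expect on Linux, which may differ on other builds.

```shell
#!/bin/sh
# Hypothetical helper: report whether a native directory contains the
# shared libraries the snappy codec needs. $1 = directory to inspect.
check_native_dir() {
    dir=$1
    # Assumed file names; versioned files such as libsnappy.so.1 also match.
    for lib in libhadoop.so libsnappy.so; do
        if ls "$dir"/"$lib"* >/dev/null 2>&1; then
            echo "$lib: found"
        else
            echo "$lib: missing"
        fi
    done
}

# Assumed usage, with the two locations mentioned in this post:
#   check_native_dir "$HADOOP_HOME/lib/native"
#   check_native_dir /opt/cloudera/parcels/CDH/lib/hadoop/lib/native
# On Hadoop versions that ship it, "hadoop checknative -a" gives a fuller report.
```

This only checks that the files exist. It does not prove that libhadoop was built with snappy support, which is what the buildSupportsSnappy error above is really about.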