Using Elasticsearch with Hive

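First, download elasticsearch-hadoop-2.0.2.jar.
One way is to fetch it with Maven:

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>2.0.2</version>
</dependency>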

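A newer version is also available:

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>2.1.0.Beta3</version>
</dependency>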

You can also download it here: http://www.elasticsearch.org/overview/hadoop/download/
The tutorial is here: http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html#_writing_data_to_elasticsearch_2

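Once you have the jar, copy it into Hive's lib directory, then start the Hive command line like this:
bin/hive -hiveconf hive.aux.jars.path=/root/hive/lib/elasticsearch-hadoop-2.0.2.jar

The same setting can also be placed in Hive's configuration file. The official documentation describes the options as follows: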

============================================================
CLI configuration.

$ bin/hive --auxpath=/path/elasticsearch-hadoop.jar

or use the hive.aux.jars.path property, specified either through the command line or, if available, through the hive-site.xml file, to register additional jars (that accepts an URI as well):

$ bin/hive -hiveconf hive.aux.jars.path=/path/elasticsearch-hadoop.jar

or if the hive-site.xml configuration can be modified, one can register additional jars through the hive.aux.jars.path option (that accepts an URI as well):

hive-site.xml configuration.

<property>
  <name>hive.aux.jars.path</name>
  <value>/path/elasticsearch-hadoop.jar</value>
  <description>A comma separated list (with no spaces) of the jar files</description>
</property>

============================================================
The above is the configuration approach given in the official documentation.

First, you have to tell Hive that the table is backed by Elasticsearch.
Create the external (view) table:
CREATE EXTERNAL TABLE user(id BIGINT, name STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'radio/artists','es.index.auto.create' = 'true');

If data cannot be inserted, define the table with the following statement instead, which specifies the Elasticsearch IP and port:
CREATE EXTERNAL TABLE user(id BIGINT, name STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'radio/artists','es.index.auto.create' = 'true','es.nodes'='192.168.1.88','es.port'='9200');
For other settings, see http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/configuration.html
In es.resource, the two parts of radio/artists are the index name and the index type respectively. Elasticsearch uses them when accessing the data.
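
The two table definitions above differ only in their TBLPROPERTIES. As an illustration, here is a minimal Python sketch that assembles such a statement from an index name, a type name, and optional node settings. The function build_es_table_ddl and its parameters are hypothetical names introduced for this example. Only the property keys (es.resource, es.index.auto.create, es.nodes, es.port) and the storage handler class come from the statements above. It only formats a string and does not talk to Hive or Elasticsearch.

```python
def build_es_table_ddl(table, columns, index, doc_type, nodes=None, port=None):
    """Build a CREATE EXTERNAL TABLE statement for an Elasticsearch-backed Hive table.

    Hypothetical helper for illustration; it only formats a string.
    """
    # Column list such as "id BIGINT, name STRING".
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)

    # es.resource has the form "<index>/<type>", as in 'radio/artists'.
    props = {
        "es.resource": f"{index}/{doc_type}",
        "es.index.auto.create": "true",
    }
    # Node address and port are only added when given explicitly.
    if nodes is not None:
        props["es.nodes"] = nodes
    if port is not None:
        props["es.port"] = str(port)

    prop_list = ", ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return (
        f"CREATE EXTERNAL TABLE {table}({cols}) "
        "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' "
        f"TBLPROPERTIES({prop_list});"
    )


ddl = build_es_table_ddl(
    "user",
    [("id", "BIGINT"), ("name", "STRING")],
    index="radio",
    doc_type="artists",
    nodes="192.168.1.88",
    port=9200,
)
print(ddl)
```

Leaving out nodes and port yields the shorter first form of the statement, which relies on the connector's default connection settings.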
Then create the source data table:
CREATE TABLE user_source (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

On Linux, create a file data.txt containing the data to load into user_source:
vim data.txt

1,medcl
2,lcdem
3,tom
4,jack
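
If you would rather generate the file than type it in an editor, the following is a small sketch that writes the same four rows as comma-separated lines, matching the FIELDS TERMINATED BY ',' clause of user_source. The variable name rows is an assumption made for this example, and the file is written to the current directory.

```python
import csv

# The same four sample rows as listed above.
rows = [(1, "medcl"), (2, "lcdem"), (3, "tom"), (4, "jack")]

# Write one comma-separated line per row, ending each line with "\n".
with open("data.txt", "w", newline="") as f:
    csv.writer(f, lineterminator="\n").writerows(rows)

# Read the file back to check what was written.
with open("data.txt") as f:
    content = f.read()
print(content, end="")
```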

Load the data into the user_source table:
LOAD DATA LOCAL INPATH '/home/steven/data.txt' OVERWRITE INTO TABLE user_source;

hive> select * from user_source;
OK
1   medcl
2   lcdem
3   tom
4   jack

Time taken: 0.149 seconds, Fetched: 4 row(s)

Load the data into the user table:
INSERT OVERWRITE TABLE user SELECT s.id, s.name FROM user_source s;

For some reason, the INSERT then failed with a file-not-found error:
INSERT OVERWRITE TABLE user SELECT s.id,s.name FROM user_source s;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.io.FileNotFoundException: File does not exist: hdfs://dev-53:8020/root/hive/lib/elasticsearch-hadoop-2.0.2.jar
   at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
   at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
   at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
   at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
   at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
   at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
   at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
   at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300)
   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://dev-53:8020/root/hive/lib/elasticsearch-hadoop-2.0.2.jar)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

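I eventually solved it as follows. The error message shows that the jar was being looked up on HDFS (hdfs://dev-53:8020/root/hive/lib/elasticsearch-hadoop-2.0.2.jar), where it did not exist.
First, upload the jar to HDFS with the hadoop command:
hadoop fs -put /root/hive/lib/elasticsearch-hadoop-2.0.2.jar /tmp/elasticsearch-hadoop-2.0.2.jar
Then start Hive like this:
bin/hive -hiveconf hive.aux.jars.path=/tmp/elasticsearch-hadoop-2.0.2.jar
After that, the INSERT works.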
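
To make the failure easier to picture, here is a simplified Python model of how a path without a scheme can end up pointing at the cluster's default filesystem. It is an illustrative assumption, not Hadoop's actual code. The function name qualify is hypothetical, and the default filesystem value is taken from the error message above.

```python
from urllib.parse import urlparse


def qualify(path, default_fs):
    """Return path unchanged if it has a scheme, else prefix it with default_fs.

    Simplified, hypothetical model of path qualification; not Hadoop's code.
    """
    if urlparse(path).scheme:
        return path
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


default_fs = "hdfs://dev-53:8020"  # taken from the error message above

# The path first passed to hive.aux.jars.path: only present on the local disk.
local_jar = qualify("/root/hive/lib/elasticsearch-hadoop-2.0.2.jar", default_fs)

# The path used after "hadoop fs -put": present on HDFS.
uploaded_jar = qualify("/tmp/elasticsearch-hadoop-2.0.2.jar", default_fs)

print(local_jar)
print(uploaded_jar)
```

Under this model, the first path resolves to the exact HDFS location named in the FileNotFoundException, while the second resolves to the location the jar was uploaded to, which is consistent with the fix working.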

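If the insert reports an Elasticsearch connection failure, add the Elasticsearch IP and port (es.nodes and es.port) to the table properties.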
