1. Requirements Analysis
Flow: analyze the data in a Hive table, store the analysis result on HDFS, then use Sqoop to export the result from HDFS into a MySQL table; the front end queries the MySQL table to display the data.
Oozie structure: two actions are needed, a Hive action and a Sqoop action. The workflow's start node transitions into the Hive action; if the Hive action succeeds, control passes to the Sqoop action; if the Sqoop action succeeds, the workflow reaches end. A failure at any point transitions to the error (kill) node.
2. Testing the SQL Statement
hive (default)> select id, account, password, INPUT__FILE__NAME from user_hive;
OK
id account password input__file__name
1 admin admin hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
2 pu 12345 hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
3 system system hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
4 zxh zxh hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
5 test test hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
6 pudong pudong hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
7 qiqi qiqi hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
8 beifeng beifeng hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
9 xuanyu xuanyu hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
Time taken: 30.244 seconds, Fetched: 9 row(s)
3. SQL Script select-user.sql
-- drop table tmp_user
drop table if exists default.tmp_user ;
-- create table tmp_user
create external table default.tmp_user like user_hive location '${OUTPUT}';
-- create table default.tmp_user like user_hive location '/user/beifeng/oozie/datas/wf-user-select/output';
-- load data to tmp_user
insert into table tmp_user
select id, account, password
from user_hive
where id >= 15 and account like 'beif%' ;
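The `${OUTPUT}` placeholder in the script is filled in at run time by the Hive action's `<param>` element (see workflow.xml below). The substitution itself can be sketched locally with `sed`; this is a dry run, with the resolved path taken from job.properties and only one line of the script shown:

```shell
#!/bin/sh
# Simulate Oozie's parameter substitution for select-user.sql (dry run).
OUTPUT="hdfs://hadoop-senior.ibeifeng.com:8020/user/beifeng/oozie/datas/wf-user-select/output"

# One line of the script, with the literal ${OUTPUT} placeholder still in it.
line="create external table default.tmp_user like user_hive location '\${OUTPUT}';"

# Replace the placeholder the way the Hive action's <param> would.
resolved=$(printf '%s\n' "$line" | sed "s|\${OUTPUT}|${OUTPUT}|")
echo "$resolved"
```

The same mechanism lets one script be reused against different output directories by changing only job.properties.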
4. Configuration File hive-site.xml
This file also needs to be placed under the app directory.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop-senior.ibeifeng.com:3306/metadata?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
        <description>password to use against metastore database</description>
    </property>

    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
        <description>Whether to print the names of the columns in query output.</description>
    </property>

    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
        <description>Whether to include the current database in the Hive prompt.</description>
    </property>

    <property>
        <name>hive.fetch.task.conversion</name>
        <value>minimal</value>
        <description>
            Some select queries can be converted to single FETCH task minimizing latency.
            Currently the query should be single sourced not having any subquery and should not have
            any aggregations or distincts (which incurs RS), lateral views and joins.
            1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
            2. more    : SELECT, FILTER, LIMIT only (TABLESAMPLE, virtual columns)
        </description>
    </property>

</configuration>
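Everything the two actions need at run time (workflow.xml, coordinator.xml, hive-site.xml, select-user.sql) has to sit in the app directory on HDFS; job.properties stays local because it is passed to the CLI with -config. A dry-run sketch of the upload, assuming the local directory mirrors the HDFS layout (the commands are printed, not executed):

```shell
#!/bin/sh
# Dry run: print the upload commands for the Oozie app directory.
# Assumption: local files live under oozie-apps/wf-user-select/.
files="workflow.xml coordinator.xml hive-site.xml select-user.sql"
for f in $files; do
  echo "bin/hdfs dfs -put oozie-apps/wf-user-select/${f} /user/beifeng/oozie-apps/wf-user-select/"
done
```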
5. Configuration File job.properties
nameNode=hdfs://hadoop-senior.ibeifeng.com:8020
jobTracker=hadoop-senior.ibeifeng.com:8032
queueName=default
oozieAppsRoot=user/beifeng/oozie-apps
oozieDataRoot=user/beifeng/oozie/datas
oozie.use.system.libpath=true
oozie.coord.application.path=${nameNode}/${oozieAppsRoot}/wf-user-select
start=2015-10-15T00:00+0800
end=2015-10-26T00:00+0800
workflowAppUri=${nameNode}/${oozieAppsRoot}/wf-user-select
outputDir=wf-user-select/output
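Oozie resolves the `${...}` references in job.properties against the other entries in the same file. The expansion can be checked by hand in plain POSIX shell (values copied from the file above):

```shell
#!/bin/sh
# Resolve the variable references from job.properties manually.
nameNode="hdfs://hadoop-senior.ibeifeng.com:8020"
oozieAppsRoot="user/beifeng/oozie-apps"
oozieDataRoot="user/beifeng/oozie/datas"
outputDir="wf-user-select/output"

# oozie.coord.application.path / workflowAppUri both expand to:
workflowAppUri="${nameNode}/${oozieAppsRoot}/wf-user-select"
# the OUTPUT param handed to the Hive action expands to:
OUTPUT="${nameNode}/${oozieDataRoot}/${outputDir}"

echo "$workflowAppUri"
echo "$OUTPUT"
```

Note that the expanded OUTPUT matches the --export-dir the Sqoop action reads from, which is what ties the two actions together.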
6. Configuration File workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="wf-user-select">
    <start to="hive-node"/>

    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>${nameNode}/${oozieAppsRoot}/hive-select/hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>select-user.sql</script>
            <param>OUTPUT=${nameNode}/${oozieDataRoot}/${outputDir}</param>
        </hive>
        <ok to="sqoop-node"/>
        <error to="fail"/>
    </action>

    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>export --connect jdbc:mysql://hadoop-senior.ibeifeng.com:3306/test --username root --password 123456 --table my_user --num-mappers 1 --fields-terminated-by "\t" --export-dir /user/beifeng/oozie/datas/wf-user-select/output</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
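The Sqoop export assumes the target table `my_user` already exists in the `test` database with columns matching the exported files. A sketch of the DDL, printed rather than executed; the column types are assumptions, since the source only shows the column names id, account, and password:

```shell
#!/bin/sh
# Dry run: print the MySQL DDL for the export target table.
# Assumption: column types; only the column names appear in the source.
ddl=$(cat <<'SQL'
CREATE TABLE test.my_user (
  id       INT,
  account  VARCHAR(100),
  password VARCHAR(100)
);
SQL
)
echo "$ddl"
```

If the table is missing or its column count differs from the exported files, the Sqoop action fails and the workflow transitions to the kill node.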
7. Configuration File coordinator.xml
<!-- frequency and timezone are assumed values; they did not survive in the source text -->
<coordinator-app name="wf-user-select-coord"
                 frequency="${coord:days(1)}" start="${start}" end="${end}" timezone="GMT+0800"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
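Because job.properties sets oozie.coord.application.path, submitting the job launches the coordinator, which in turn materializes workflow runs between start and end. A dry-run sketch of the usual CLI sequence; the Oozie server URL and port are assumptions, and the commands are printed rather than executed:

```shell
#!/bin/sh
# Dry run: print the commands to launch and monitor the coordinator job.
# Assumption: Oozie server listens on port 11000 of this host.
OOZIE_URL="http://hadoop-senior.ibeifeng.com:11000/oozie"

echo "bin/oozie job -oozie ${OOZIE_URL} -config oozie-apps/wf-user-select/job.properties -run"
echo "bin/oozie job -oozie ${OOZIE_URL} -info <coord-job-id>"
```

The -run command prints a coordinator job id; substitute it for <coord-job-id> in the -info command to watch the materialized workflow actions.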