3.2 Set up the ElasticSearch index
    #1. Create the dataset, comment, and field mappings
    curl -XPUT '$YOUR_INDEX_URL:9200/wherehows' -d '{
      "mappings": {
        "dataset": {},
        "comment": {"_parent": {"type": "dataset"}},
        "field": {"_parent": {"type": "dataset"}}
      }
    }'

    #2. Create the nested-object mapping for flow_jobs
    curl -XPUT '$YOUR_INDEX_URL:9200/wherehows/flow_jobs/_mapping' -d '{
      "flow_jobs": {
        "properties": {
          "jobs": {
            "type": "nested",
            "properties": {
              "job_name": { "type": "string" },
              "job_path": { "type": "string" },
              "job_type": { "type": "string" },
              "pre_jobs": { "type": "string" },
              "post_jobs": { "type": "string" },
              "is_current": { "type": "string" },
              "is_first": { "type": "string" },
              "is_last": { "type": "string" },
              "job_type_id": { "type": "short" },
              "app_id": { "type": "short" },
              "flow_id": { "type": "long" },
              "job_id": { "type": "long" }
            }
          }
        }
      }
    }'

    #3. Build the ElasticSearch index
    # The build runs as an `ETL Job` and is triggered automatically when data is
    # fetched from MySQL. It can also be run manually by executing
    # `metadata-etl/src/main/resources/jython/ElasticSearchIndex.py`.
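The flow_jobs mapping above is repetitive, so it can be convenient to generate the JSON body rather than type it by hand. The following is a minimal sketch, not part of WhereHows itself: the function name `build_flow_jobs_mapping` and the three field lists are hypothetical helpers, while the field names and types are copied from the curl command above.

```python
import json

# Field names and types copied from the flow_jobs mapping shown above.
STRING_FIELDS = ["job_name", "job_path", "job_type", "pre_jobs",
                 "post_jobs", "is_current", "is_first", "is_last"]
SHORT_FIELDS = ["job_type_id", "app_id"]
LONG_FIELDS = ["flow_id", "job_id"]


def build_flow_jobs_mapping():
    """Return the flow_jobs mapping as a plain Python dict (hypothetical helper)."""
    properties = {}
    for name in STRING_FIELDS:
        properties[name] = {"type": "string"}
    for name in SHORT_FIELDS:
        properties[name] = {"type": "short"}
    for name in LONG_FIELDS:
        properties[name] = {"type": "long"}
    return {
        "flow_jobs": {
            "properties": {
                "jobs": {"type": "nested", "properties": properties}
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be passed to curl with -d.
    print(json.dumps(build_flow_jobs_mapping(), indent=2))
```

The printed JSON can be pasted into the `-d` argument of the second curl command.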
3.3 Adjust the akka-actor version
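WhereHows is a Gradle multi-project build, and shared dependencies are declared in the parent project, so the change goes into the WhereHows/build.gradle file.
Change "akka" : "com.typesafe.akka:akka-actor_2.10:2.3.15", to "akka" : "com.typesafe.akka:akka-actor_2.10:2.2.5",
See here for a detailed description.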
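If the edit has to be repeated (for example after pulling a fresh copy of the repository), the version change can be scripted. This is only a sketch: `pin_akka_version` is a hypothetical helper, and it assumes the dependency line in build.gradle looks exactly like the one quoted above.

```python
import re


def pin_akka_version(gradle_text, new_version="2.2.5"):
    """Replace the akka-actor_2.10 version in build.gradle text (hypothetical helper)."""
    pattern = r'("akka"\s*:\s*"com\.typesafe\.akka:akka-actor_2\.10:)[^"]+(")'
    # A function replacement avoids ambiguity between group references and version digits.
    return re.sub(pattern,
                  lambda m: m.group(1) + new_version + m.group(2),
                  gradle_text)


if __name__ == "__main__":
    line = '"akka" : "com.typesafe.akka:akka-actor_2.10:2.3.15",'
    print(pin_akka_version(line))
```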
3.4 Create a temporary directory for ETL jobs
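The metadata-ETL workflow in WhereHows is as follows:
- The backend-service project periodically reads the records of the wherehows.wh_etl_job table in the database and determines which ETL jobs need to run this time
- Java calls a Jython script to run Extract, which writes CSV files to disk
- Java calls a Jython script to run Transform, which writes CSV and JSON files to disk
- Java calls a Jython script to run Load, which parses the files generated above and writes them into MySQL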
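    cd $HOME/Documents
    #1. Holds the generated CSV and JSON files
    mkdir -p wherehows_tmp/exec
    mkdir -p wherehows_tmp/app_folder
    #2. Holds UI-related files (named resource to match the paths configured in sections 4.2 and 5.1)
    mkdir -p wherehows_tmp/resource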
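The Extract, Transform, Load flow described in this section can be illustrated with a small self-contained sketch. It is not the WhereHows implementation: the file names, the `dataset` table, and the sample rows are invented for illustration, a throwaway temporary directory stands in for wherehows_tmp/exec, and an in-memory SQLite database stands in for MySQL.

```python
import csv
import json
import os
import sqlite3
import tempfile


def extract(exec_dir):
    """Extract: write raw records to a CSV file on disk."""
    path = os.path.join(exec_dir, "raw.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "owner"])
        writer.writerow(["db.table_a", "alice"])
        writer.writerow(["db.table_b", "bob"])
    return path


def transform(csv_path, exec_dir):
    """Transform: read the CSV and write normalized records as JSON."""
    with open(csv_path, newline="") as f:
        rows = [{"name": r["name"].upper(), "owner": r["owner"]}
                for r in csv.DictReader(f)]
    path = os.path.join(exec_dir, "datasets.json")
    with open(path, "w") as f:
        json.dump(rows, f)
    return path


def load(json_path, conn):
    """Load: parse the generated file and insert its rows into the database."""
    with open(json_path) as f:
        rows = json.load(f)
    conn.execute("CREATE TABLE IF NOT EXISTS dataset (name TEXT, owner TEXT)")
    conn.executemany("INSERT INTO dataset VALUES (:name, :owner)", rows)
    conn.commit()
    return len(rows)


def run_job():
    """Run the three stages in order, passing files through a temp directory."""
    with tempfile.TemporaryDirectory() as exec_dir:
        conn = sqlite3.connect(":memory:")
        loaded = load(transform(extract(exec_dir), exec_dir), conn)
        names = [r[0] for r in
                 conn.execute("SELECT name FROM dataset ORDER BY name")]
        conn.close()
        return loaded, names


if __name__ == "__main__":
    print(run_job())
```

The point of the sketch is the hand-off: each stage communicates with the next only through files on disk, which is why the temporary directories must exist before a job runs.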
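4. Build and run
This mainly means starting two projects: backend-service and web.
The UI and the backend service are independent of each other and can be started separately.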
4.1 Start backend-service
It is a Play application. Start it as follows:
    #1. Edit the settings in conf/database.conf
    db.wherehows.driver = "com.mysql.jdbc.Driver"
    db.wherehows.url = "jdbc:mysql://localhost/wherehows?charset=utf8&zeroDateTimeBehavior=convertToNull"
    db.wherehows.user = "wherehows"
    db.wherehows.password = "wherehows"
    db.wherehows.host = "localhost"

    #2. Start in dev mode
    cd backend-service
    $PLAY_HOME/play "run PORT_NUM"

    #3. Start in prod mode
    cd backend-service
    gradle dist
    ## Start
    ./target/universal/stage/bin/backend-service -Dhttp.port=PORT_NUM
Open http://localhost:PORT_NUM in a browser. If the page shows Test, the service started successfully.
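The same check can be done from a script instead of a browser. The sketch below is an assumption-laden example: `looks_healthy` and `backend_is_up` are hypothetical names, and it assumes the start page answers with HTTP 200 and a body containing the text Test, as described above.

```python
import urllib.request


def looks_healthy(status, body):
    """Decide whether a response matches the expected start page.

    Assumption: a healthy backend-service answers 200 with "Test" in the body.
    """
    return status == 200 and "Test" in body


def backend_is_up(base_url, timeout=3.0):
    """Return True if base_url responds like a running backend-service."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return looks_healthy(resp.status, body)
    except OSError:
        # Connection refused, timeouts, and HTTP error statuses all end up here.
        return False


if __name__ == "__main__":
    # Replace 9000 with the PORT_NUM used when starting the service.
    print(backend_is_up("http://localhost:9000"))
```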
4.2 Start the UI
It is a Play application. Start it as follows:
    #1. Edit the following settings in web/conf/application.conf
    search.engine = "default"
    elasticsearch.dataset.url = "$YOUR_DATASET_INDEX_URL"
    elasticsearch.flow.url = "$YOUR_FLOW_INDEX_URL"
    datasets.tree.name = "$YOUR_HOME/Documents/wherehows_tmp/resource/dataset.json"
    flows.tree.name = "$YOUR_HOME/Documents/wherehows_tmp/resource/flow.json"
    database.opensource.username = "wherehows"
    database.opensource.password = "wherehows"
    database.opensource.url = "jdbc:mysql://localhost/wherehows?charset=utf8&zeroDateTimeBehavior=convertToNull"

    #2. Start in dev mode
    cd web
    $PLAY_HOME/play "run PORT_NUM"

    #3. Build a distribution
    cd web
    gradle dist
    # The zip package is then available under `target/universal`
Open http://localhost:PORT_NUM in a browser to see the UI.
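Because the tree-file settings point into the temporary directory created in section 3.4, a mistyped directory name is an easy mistake to make. The following sketch checks that the parent directory of each configured file exists. `missing_tree_dirs` is a hypothetical helper, and the assumption that the directory has to exist before the files are written has not been verified against the WhereHows source.

```python
import os


def missing_tree_dirs(settings):
    """Return the setting names whose file's parent directory does not exist."""
    return sorted(
        key for key, path in settings.items()
        if not os.path.isdir(os.path.dirname(os.path.expanduser(path)))
    )


if __name__ == "__main__":
    settings = {
        "datasets.tree.name": "~/Documents/wherehows_tmp/resource/dataset.json",
        "flows.tree.name": "~/Documents/wherehows_tmp/resource/flow.json",
    }
    # An empty list means both directories are in place.
    print(missing_tree_dirs(settings))
```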
5. Add a Hive ETL job
Make sure the backend-service application is running, for example at http://localhost:9000. The tutorial is here.
5.1 Add common configuration properties to wherehows.wh_property
NOTE: This step only needs to be done once, so simply run the script below.
    -- Adjust the paths before running
    INSERT INTO `wh_property` (`property_name`,`property_value`,`is_encrypted`,`group_name`) VALUES ('wherehows.app_folder','$YOUR_HOME/Documents/wherehows_tmp/app_folder','N',NULL);
    INSERT INTO `wh_property` (`property_name`,`property_value`,`is_encrypted`,`group_name`) VALUES ('wherehows.db.driver','com.mysql.jdbc.Driver','N',NULL);
    INSERT INTO `wh_property` (`property_name`,`property_value`,`is_encrypted`,`group_name`) VALUES ('wherehows.db.jdbc.url','jdbc:mysql://localhost/wherehows?charset=utf8&zeroDateTimeBehavior=convertToNull','N',NULL);
    INSERT INTO `wh_property` (`property_name`,`property_value`,`is_encrypted`,`group_name`) VALUES ('wherehows.db.password','wherehows','N',NULL);
    INSERT INTO `wh_property` (`property_name`,`property_value`,`is_encrypted`,`group_name`) VALUES ('wherehows.db.username','wherehows','N',NULL);
    INSERT INTO `wh_property` (`property_name`,`property_value`,`is_encrypted`,`group_name`) VALUES ('wherehows.ui.tree.dataset.file','$YOUR_HOME/Documents/wherehows_tmp/resource/dataset.json','N',NULL);
    INSERT INTO `wh_property` (`property_name`,`property_value`,`is_encrypted`,`group_name`) VALUES ('wherehows.ui.tree.flow.file','$YOUR_HOME/Documents/wherehows_tmp/resource/flow.json','N',NULL);
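Since only the home directory varies between machines, the rows above can also be generated from a single parameter and inserted with a parameterized statement. This is a sketch under stated assumptions: `common_properties`, `insert_properties`, and `demo` are hypothetical helpers, and the demo uses an in-memory SQLite table with the four columns from the script above as a stand-in for the real MySQL `wh_property` table.

```python
import sqlite3

JDBC_URL = ("jdbc:mysql://localhost/wherehows"
            "?charset=utf8&zeroDateTimeBehavior=convertToNull")


def common_properties(home):
    """Return (property_name, property_value) pairs for the given home directory."""
    tmp = home + "/Documents/wherehows_tmp"
    return [
        ("wherehows.app_folder", tmp + "/app_folder"),
        ("wherehows.db.driver", "com.mysql.jdbc.Driver"),
        ("wherehows.db.jdbc.url", JDBC_URL),
        ("wherehows.db.password", "wherehows"),
        ("wherehows.db.username", "wherehows"),
        ("wherehows.ui.tree.dataset.file", tmp + "/resource/dataset.json"),
        ("wherehows.ui.tree.flow.file", tmp + "/resource/flow.json"),
    ]


def insert_properties(conn, home):
    """Insert the common properties; is_encrypted is 'N' and group_name is NULL."""
    rows = [(name, value, "N", None) for name, value in common_properties(home)]
    conn.executemany(
        "INSERT INTO wh_property "
        "(property_name, property_value, is_encrypted, group_name) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


def demo(home="/home/demo"):
    """Run the insert against a stand-in SQLite table (not the real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wh_property (property_name TEXT, property_value TEXT, "
        "is_encrypted TEXT, group_name TEXT)"
    )
    count = insert_properties(conn, home)
    value = conn.execute(
        "SELECT property_value FROM wh_property WHERE property_name = ?",
        ("wherehows.app_folder",),
    ).fetchone()[0]
    conn.close()
    return count, value


if __name__ == "__main__":
    print(demo())
```

Against MySQL the same idea applies with a MySQL driver, whose placeholder syntax may differ from the `?` used by SQLite.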
5.2 Add a Hive database
The documentation is here. The fields marked Required=N must be supplied as well; the documentation is wrong on this point.
    URL: http://localhost:9000/cfg/db
    Method: POST
    Body (JSON):
    {
      "db_id": 10001,
      "db_code": "HIVE_DEMO",
      "db_type_id": 0,
      "description": "HIVE_DEMO_desc",
      "cluster_size": 0,
      "associated_data_centers": 1,
      "replication_role": "MASTER",
      "uri": "Teradata://sample-td",
      "short_connection_string": "SAMPLE-HIVE"
    }
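The request above can be sent from a script as well as from a REST client. The sketch below only builds the request and sends it when run directly. `build_db_request` is a hypothetical helper, and the `Content-Type: application/json` header is an assumption based on the body being JSON, not something the original text states.

```python
import json
import urllib.request

# Payload copied from the request body shown above.
HIVE_DEMO_DB = {
    "db_id": 10001,
    "db_code": "HIVE_DEMO",
    "db_type_id": 0,
    "description": "HIVE_DEMO_desc",
    "cluster_size": 0,
    "associated_data_centers": 1,
    "replication_role": "MASTER",
    "uri": "Teradata://sample-td",
    "short_connection_string": "SAMPLE-HIVE",
}


def build_db_request(base_url, payload):
    """Build a POST request for the /cfg/db endpoint without sending it."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/cfg/db",
        data=data,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


if __name__ == "__main__":
    req = build_db_request("http://localhost:9000", HIVE_DEMO_DB)
    # Requires a running backend-service; raises URLError otherwise.
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.status, resp.read().decode("utf-8", "replace"))
```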
5.3 Add an ETL job
The documentation is here. The valid values for the wh_etl_job_name field are found in metadata-etl/src/main/java/metadata/etl/models/EtlJobName.java; the documentation does not list all of them.
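Note that the request shown here is identical to the one in section 5.2: it targets /cfg/db and carries no wh_etl_job_name field. For the actual ETL-job request, follow the linked documentation.
    URL: http://localhost:9000/cfg/db
    Method: POST
    Body (JSON):
    {
      "db_id": 10001,
      "db_code": "HIVE_DEMO",
      "db_type_id": 0,
      "description": "HIVE_DEMO_desc",
      "cluster_size": 0,
      "associated_data_centers": 1,
      "replication_role": "MASTER",
      "uri": "Teradata://sample-td",
      "short_connection_string": "SAMPLE-HIVE"
    }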
Once the job has been added successfully, backend-service runs it at its next scheduling cycle.
6. Next steps
- Understand the E-T-L process for Hive and fix the shortcomings of the current UI
Reposted from: https://blog.csdn.net/houzhizhen/article/details/66972166