本文主要描述了HDFS和RDBMS数据库之间的数据迁移。
一、创建Link
这里主要描述下Oracle、PostgreSQL、MySQL以及HDFS的link创建。
1.1 JDBC驱动安装
从官网下载Oracle和PostgreSQL的jdbc驱动,拷贝到$SQOOP_HOME/server/lib,然后重启sqoop2服务器。我这里用到的Oracle jdbc驱动为ojdbc6.jar,PostgreSQL为postgresql-42.1.4.jar,MySQL为mysql-connector-java-5.1.44-bin.jar。
1.2 创建HDFS Link
sqoop:000> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: HDFS_Link
HDFS cluster
URI: hdfs://192.168.120.96:9000
Conf directory: /u01/hadoop/etc/hadoop
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name HDFS_Link
1.3 创建MySQL Link
sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: MySQL_Link
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://192.168.120.92:3306/smsqw?useSSL=false
Username: root
Password: *********
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#
SQL Dialect
Identifier enclose:
New link was successfully created with validation status OK and name MySQL_Link
1.4 创建PostgreSQL link
sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: PostgreSQL_Link
Database connection
Driver class: org.postgresql.Driver
Connection String: jdbc:postgresql://192.168.120.93:5432/rhndb
Username: rhnuser
Password: ******
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#
SQL Dialect
Identifier enclose:
New link was successfully created with validation status OK and name PostgreSQL_Link
1.5 创建Oracle Link
sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: Oracle_link
Database connection
Driver class: oracle.jdbc.driver.OracleDriver
Connection String: jdbc:oracle:thin:@192.168.120.121:1521:rhndb
Username: spwuser
Password: ******
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#
SQL Dialect
Identifier enclose:
New link was successfully created with validation status OK and name Oracle_link
> 注意:关系型数据库在Identifier enclose这里,必须先空格再回车。
二、创建关系型数据库到HDFS作业
2.1 MySQL To HDFS作业
sqoop:000> create job -f MySQL_Link -t HDFS_Link
Creating job for links with from name MySQL_Link and to name HDFS_Link
Please fill following values to create new job object
Name: MySQL_To_HDFS
Database source
Schema name: smsqw
Table name: tbMessage
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
Target configuration
Override null value:
Null value:
File format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 0
Compression codec:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 1
Custom codec:
Output directory: /user/DataSource
Append mode:
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name MySQL_To_HDFS
2.2 PostgreSQL To HDFS作业
sqoop:000> create job -f PostgreSQL_Link -t HDFS_Link
Creating job for links with from name PostgreSQL_Link and to name HDFS_Link
Please fill following values to create new job object
Name: PostgreSQL_to_HDFS
Database source
Schema name: rhndb
Table name: rhnpackagechangelogdata
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column: id --如果表没有主键,就指定一个字段
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
Target configuration
Override null value:
Null value:
File format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 0
Compression codec:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 1
Custom codec:
Output directory: /user/DataSource/PostgreSQL/RHNPACKAGECHANGELOGDATA
Append mode:
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name PostgreSQL_to_HDFS
2.3 创建Oracle To HDFS作业
sqoop:000> create job -f Oracle_link -t HDFS_Link
Creating job for links with from name Oracle_link and to name HDFS_Link
Please fill following values to create new job object
Name: Oracle_To_HDFS
Database source
Schema name: spwuser
Table name: RHNCHANNELPACKAGE
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
Target configuration
Override null value:
Null value:
File format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 0
Compression codec:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 1
Custom codec:
Output directory: /user/DataSource/Oracle/RHNCHANNELPACKAGE
Append mode:
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name Oracle_To_HDFS
> 注意:如果表没有主键,则需在"Partition column: "处指定一个字段。
三、创建HDFS To 关系型数据库作业
3.1 HDFS To MySQL作业
sqoop:000> create job -f HDFS_Link -t MySQL_Link
Creating job for links with from name HDFS_Link and to name MySQL_Link
Please fill following values to create new job object
Name: HDFS_To_MySQL
Input configuration
Input directory: /user/DataSource/MySQL/tbDeliverMsg
Override null value:
Null value:
Incremental import
Incremental type:
0 : NONE
1 : NEW_FILES
Choose: 0
Last imported date:
Database target
Schema name: smsqw
Table name: tbDeliverMsg2
Column names:
There are currently 0 values in the list:
element#
Staging table:
Clear stage table:
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name HDFS_To_MySQL
3.2 HDFS To PostgreSQL作业
sqoop:000> create job -f HDFS_Link -t PostgreSQL_Link
Creating job for links with from name HDFS_Link and to name PostgreSQL_Link
Please fill following values to create new job object
Name: HDFS_To_PostgreSQL
Input configuration
Input directory: /user/DataSource/PostgreSQL/RHNPACKAGECHANGELOGDATA
Override null value:
Null value:
Incremental import
Incremental type:
0 : NONE
1 : NEW_FILES
Choose: 0
Last imported date:
Database target
Schema name: public
Table name: rhnpackagechangelogdata
Column names:
There are currently 0 values in the list:
element#
Staging table:
Clear stage table:
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name HDFS_To_PostgreSQL
3.3 HDFS To Oracle作业
sqoop:000> create job -f HDFS_Link -t Oracle_link
Creating job for links with from name HDFS_Link and to name Oracle_link
Please fill following values to create new job object
Name: HDFS_To_Oracle
Input configuration
Input directory: /user/DataSource/Oracle/QRTZ_CRON_TRIGGERS
Override null value:
Null value:
Incremental import
Incremental type:
0 : NONE
1 : NEW_FILES
Choose: 0
Last imported date:
Database target
Schema name: spwuser
Table name: QRTZ_CRON_TRIGGERS
Column names:
There are currently 0 values in the list:
element#
Staging table:
Clear stage table:
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name create job -f HDFS_Link -t Oracle_link
另外,可以查看指定作业的详细信息,如下:
四、运行作业
4.1 MySQL_To_HDFS
4.2 PostgreSQL_To_HDFS
4.3 Oracle_To_HDFS
4.4 HDFS_To_MySQL
4.5 HDFS_To_PostgreSQL
4.6 HDFS_To_Oracle
参考文献:
1、Sqoop2 User Guide