
Data Transfer Between RDBMS and HDFS with Sqoop2


This article describes data migration between HDFS and relational (RDBMS) databases using Sqoop2.

1. Creating Links

This section covers creating the Oracle, PostgreSQL, MySQL, and HDFS links.

1.1 Installing the JDBC Drivers

Download the Oracle and PostgreSQL JDBC drivers from their official sites, copy them to $SQOOP_HOME/server/lib, and restart the sqoop2 server. The drivers used here are ojdbc6.jar for Oracle, postgresql-42.1.4.jar for PostgreSQL, and mysql-connector-java-5.1.44-bin.jar for MySQL.
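A minimal shell sketch of this step (assuming $SQOOP_HOME points at your Sqoop2 installation; the jar names are the versions listed above):

# copy the JDBC drivers into the Sqoop2 server's classpath
cp ojdbc6.jar $SQOOP_HOME/server/lib/
cp postgresql-42.1.4.jar $SQOOP_HOME/server/lib/
cp mysql-connector-java-5.1.44-bin.jar $SQOOP_HOME/server/lib/

# restart the server so it picks up the new drivers
$SQOOP_HOME/bin/sqoop2-server stop
$SQOOP_HOME/bin/sqoop2-server start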

1.2 Creating the HDFS Link

sqoop:000> create link -c hdfs-connector

Creating link for connector with name hdfs-connector

Please fill following values to create new link object

Name: HDFS_Link

HDFS cluster

URI: hdfs://192.168.120.96:9000

Conf directory: /u01/hadoop/etc/hadoop

Additional configs::

There are currently 0 values in the map:

entry#

New link was successfully created with validation status OK and name HDFS_Link

1.3 Creating the MySQL Link

sqoop:000> create link -c generic-jdbc-connector

Creating link for connector with name generic-jdbc-connector

Please fill following values to create new link object

Name: MySQL_Link

Database connection

Driver class: com.mysql.jdbc.Driver

Connection String: jdbc:mysql://192.168.120.92:3306/smsqw?useSSL=false

Username: root

Password: *********

Fetch Size:

Connection Properties:

There are currently 0 values in the map:

entry#

SQL Dialect

Identifier enclose:

New link was successfully created with validation status OK and name MySQL_Link

1.4 Creating the PostgreSQL Link

sqoop:000> create link -c generic-jdbc-connector

Creating link for connector with name generic-jdbc-connector

Please fill following values to create new link object

Name: PostgreSQL_Link

Database connection

Driver class: org.postgresql.Driver

Connection String: jdbc:postgresql://192.168.120.93:5432/rhndb

Username: rhnuser

Password: ******

Fetch Size:

Connection Properties:

There are currently 0 values in the map:

entry#

SQL Dialect

Identifier enclose:

New link was successfully created with validation status OK and name PostgreSQL_Link

1.5 Creating the Oracle Link

sqoop:000> create link -c generic-jdbc-connector

Creating link for connector with name generic-jdbc-connector

Please fill following values to create new link object

Name: Oracle_link

Database connection

Driver class: oracle.jdbc.driver.OracleDriver

Connection String: jdbc:oracle:thin:@192.168.120.121:1521:rhndb

Username: spwuser

Password: ******

Fetch Size:

Connection Properties:

There are currently 0 values in the map:

entry#

SQL Dialect

Identifier enclose:

New link was successfully created with validation status OK and name Oracle_link

> Note: for the relational-database links, you must type a space and then press Enter at the "Identifier enclose:" prompt; otherwise the default double-quote enclosing is applied to schema and table names, which can break the generated SQL on some databases.
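To confirm the links were registered, you can list them from the sqoop2 shell; `show link` prints each link's name, connector, and enabled flag, and `show link --all` includes the full configuration:

sqoop:000> show link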

(screenshot: the created links)

2. Creating RDBMS-to-HDFS Jobs

2.1 MySQL to HDFS Job

sqoop:000> create job -f MySQL_Link -t HDFS_Link

Creating job for links with from name MySQL_Link and to name HDFS_Link

Please fill following values to create new job object

Name: MySQL_To_HDFS

Database source

Schema name: smsqw

Table name: tbMessage

SQL statement:

Column names:

There are currently 0 values in the list:

element#

Partition column:

Partition column nullable:

Boundary query:

Incremental read

Check column:

Last value:

Target configuration

Override null value:

Null value:

File format:

0 : TEXT_FILE

1 : SEQUENCE_FILE

2 : PARQUET_FILE

Choose: 0

Compression codec:

0 : NONE

1 : DEFAULT

2 : DEFLATE

3 : GZIP

4 : BZIP2

5 : LZO

6 : LZ4

7 : SNAPPY

8 : CUSTOM

Choose: 1

Custom codec:

Output directory: /user/DataSource

Append mode:

Throttling resources

Extractors:

Loaders:

Classpath configuration

Extra mapper jars:

There are currently 0 values in the list:

element#

New job was successfully created with validation status OK and name MySQL_To_HDFS

2.2 PostgreSQL to HDFS Job

sqoop:000> create job -f PostgreSQL_Link -t HDFS_Link

Creating job for links with from name PostgreSQL_Link and to name HDFS_Link

Please fill following values to create new job object

Name: PostgreSQL_to_HDFS

Database source

Schema name: rhndb

Table name: rhnpackagechangelogdata

SQL statement:

Column names:

There are currently 0 values in the list:

element#

Partition column: id    -- if the table has no primary key, specify a column here

Partition column nullable:

Boundary query:

Incremental read

Check column:

Last value:

Target configuration

Override null value:

Null value:

File format:

0 : TEXT_FILE

1 : SEQUENCE_FILE

2 : PARQUET_FILE

Choose: 0

Compression codec:

0 : NONE

1 : DEFAULT

2 : DEFLATE

3 : GZIP

4 : BZIP2

5 : LZO

6 : LZ4

7 : SNAPPY

8 : CUSTOM

Choose: 1

Custom codec:

Output directory: /user/DataSource/PostgreSQL/RHNPACKAGECHANGELOGDATA

Append mode:

Throttling resources

Extractors:

Loaders:

Classpath configuration

Extra mapper jars:

There are currently 0 values in the list:

element#

New job was successfully created with validation status OK and name PostgreSQL_to_HDFS

2.3 Oracle to HDFS Job

sqoop:000> create job -f Oracle_link -t HDFS_Link

Creating job for links with from name Oracle_link and to name HDFS_Link

Please fill following values to create new job object

Name: Oracle_To_HDFS

Database source

Schema name: spwuser

Table name: RHNCHANNELPACKAGE

SQL statement:

Column names:

There are currently 0 values in the list:

element#

Partition column:

Partition column nullable:

Boundary query:

Incremental read

Check column:

Last value:

Target configuration

Override null value:

Null value:

File format:

0 : TEXT_FILE

1 : SEQUENCE_FILE

2 : PARQUET_FILE

Choose: 0

Compression codec:

0 : NONE

1 : DEFAULT

2 : DEFLATE

3 : GZIP

4 : BZIP2

5 : LZO

6 : LZ4

7 : SNAPPY

8 : CUSTOM

Choose: 1

Custom codec:

Output directory: /user/DataSource/Oracle/RHNCHANNELPACKAGE

Append mode:

Throttling resources

Extractors:

Loaders:

Classpath configuration

Extra mapper jars:

There are currently 0 values in the list:

element#

New job was successfully created with validation status OK and name Oracle_To_HDFS

> Note: if the table has no primary key, you must specify a column at the "Partition column:" prompt.

3. Creating HDFS-to-RDBMS Jobs

3.1 HDFS to MySQL Job

sqoop:000> create job -f HDFS_Link -t MySQL_Link

Creating job for links with from name HDFS_Link and to name MySQL_Link

Please fill following values to create new job object

Name: HDFS_To_MySQL

Input configuration

Input directory: /user/DataSource/MySQL/tbDeliverMsg

Override null value:

Null value:

Incremental import

Incremental type:

0 : NONE

1 : NEW_FILES

Choose: 0

Last imported date:

Database target

Schema name: smsqw

Table name: tbDeliverMsg2

Column names:

There are currently 0 values in the list:

element#

Staging table:

Clear stage table:

Throttling resources

Extractors:

Loaders:

Classpath configuration

Extra mapper jars:

There are currently 0 values in the list:

element#

New job was successfully created with validation status OK and name HDFS_To_MySQL

3.2 HDFS to PostgreSQL Job

sqoop:000> create job -f HDFS_Link -t PostgreSQL_Link

Creating job for links with from name HDFS_Link and to name PostgreSQL_Link

Please fill following values to create new job object

Name: HDFS_To_PostgreSQL

Input configuration

Input directory: /user/DataSource/PostgreSQL/RHNPACKAGECHANGELOGDATA

Override null value:

Null value:

Incremental import

Incremental type:

0 : NONE

1 : NEW_FILES

Choose: 0

Last imported date:

Database target

Schema name: public

Table name: rhnpackagechangelogdata

Column names:

There are currently 0 values in the list:

element#

Staging table:

Clear stage table:

Throttling resources

Extractors:

Loaders:

Classpath configuration

Extra mapper jars:

There are currently 0 values in the list:

element#

New job was successfully created with validation status OK and name HDFS_To_PostgreSQL

3.3 HDFS to Oracle Job

sqoop:000> create job -f HDFS_Link -t Oracle_link

Creating job for links with from name HDFS_Link and to name Oracle_link

Please fill following values to create new job object

Name: HDFS_To_Oracle

Input configuration

Input directory: /user/DataSource/Oracle/QRTZ_CRON_TRIGGERS

Override null value:

Null value:

Incremental import

Incremental type:

0 : NONE

1 : NEW_FILES

Choose: 0

Last imported date:

Database target

Schema name: spwuser

Table name: QRTZ_CRON_TRIGGERS

Column names:

There are currently 0 values in the list:

element#

Staging table:

Clear stage table:

Throttling resources

Extractors:

Loaders:

Classpath configuration

Extra mapper jars:

There are currently 0 values in the list:

element#

New job was successfully created with validation status OK and name HDFS_To_Oracle

(screenshot: list of created jobs)

In addition, you can view the details of a given job, as shown below:
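For example, `show job` lists all defined jobs, and a single job can be inspected by name (a sketch; in the 1.99.x shell the flag is --name, abbreviated -n):

sqoop:000> show job

sqoop:000> show job -n MySQL_To_HDFS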

(screenshot: detailed job information)

4. Running the Jobs
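Jobs are started, and can be polled for progress, from the sqoop2 shell. A minimal sketch using the job names created above (flag spellings as in the 1.99.x shell; some releases accept the long form -name):

sqoop:000> start job -n MySQL_To_HDFS

sqoop:000> status job -n MySQL_To_HDFS

The screenshots below show each of the six jobs being run.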

4.1 MySQL_To_HDFS

(screenshot: MySQL_To_HDFS job run)

4.2 PostgreSQL_to_HDFS

(screenshot: PostgreSQL_to_HDFS job run)

4.3 Oracle_To_HDFS

(screenshot: Oracle_To_HDFS job run)

4.4 HDFS_To_MySQL

(screenshot: HDFS_To_MySQL job run)

4.5 HDFS_To_PostgreSQL

(screenshot: HDFS_To_PostgreSQL job run)

4.6 HDFS_To_Oracle

(screenshot: HDFS_To_Oracle job run)

References:

1. Sqoop2 User Guide


