
Sqoop Usage and Introduction


Sqoop is the bridge between relational databases and the Hadoop storage stack: it supports moving data in both directions between a variety of relational data sources and Hive, HDFS, and HBase. In a typical setup, the relational tables live in a backup of the production environment and have to be imported every day. Depending on the daily data volume, a full-table import may be perfectly adequate; for cases where it is not, Sqoop also provides an incremental import mechanism.

The following sections introduce the most commonly used Sqoop commands and their parameters.

 

| No. | Command | Tool class | Description |
|-----|---------|------------|-------------|
| 1 | import | ImportTool | Import data from a relational database (a table or the result of a query) into HDFS |
| 2 | export | ExportTool | Export data from HDFS into a relational database |
| 3 | codegen | CodeGenTool | Generate the Java class for a database table and package it into a jar |
| 4 | create-hive-table | CreateHiveTableTool | Create a Hive table |
| 5 | eval | EvalSqlTool | Run a SQL statement and display the results |
| 6 | import-all-tables | ImportAllTablesTool | Import all tables of a database into HDFS |
| 7 | job | JobTool | Define and manage saved Sqoop jobs |
| 8 | list-databases | ListDatabasesTool | List all database names |
| 9 | list-tables | ListTablesTool | List all tables of a database |
| 10 | merge | MergeTool | Merge the results of an incremental import with an existing dataset |
| 11 | metastore | MetastoreTool | Run the shared Sqoop metastore |
| 12 | help | HelpTool | Show help |
| 13 | version | VersionTool | Show the version |
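
To make the two most common commands concrete, here is a minimal sketch of an import and an export. The credentials, table names, and HDFS paths are placeholders, not values from this article; only the connection URL reuses the example given below.

    # Import the MySQL table "orders" into HDFS (-P prompts for the password)
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user -P \
      --table orders \
      --target-dir /user/hadoop/orders

    # Export the HDFS directory back into a pre-created MySQL table
    sqoop export \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user -P \
      --table orders_report \
      --export-dir /user/hadoop/orders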

 

Next come Sqoop's generic parameters, followed by the parameters specific to each of the 13 commands above. The generic parameters fall into the following groups:

- Common arguments
- Incremental import arguments
- Output line formatting arguments
- Input parsing arguments
- Hive arguments
- HBase arguments
- Generic Hadoop command-line arguments

 

1. Common arguments: general parameters, mainly concerning the relational database connection

| No. | Argument | Description | Example |
|-----|----------|-------------|---------|
| 1 | --connect | JDBC URL of the relational database | jdbc:mysql://localhost/sqoop_datas |
| 2 | --connection-manager | Connection manager class; rarely needed | |
| 3 | --driver | JDBC driver class | |
| 4 | --hadoop-home | Hadoop installation directory | /home/hadoop |
| 5 | --help | Show help | |
| 6 | --password | Password for the database connection | |
| 7 | --username | Username for the database connection | |
| 8 | --verbose | Print more detail (effectively lowers the log level); takes no value | |
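
The common arguments are shared by every command. As a hedged sketch, combining them with list-databases and eval might look like this (host, credentials, and the SQL statement are placeholders):

    # List the databases visible to this user
    sqoop list-databases \
      --connect jdbc:mysql://localhost/ \
      --username sqoop_user -P

    # Run an ad-hoc SQL statement and print its result
    sqoop eval \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user -P \
      --query "SELECT COUNT(*) FROM orders"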

 

Import control arguments:

| Argument | Description |
|----------|-------------|
| --append | Append data to an existing dataset in HDFS |
| --as-avrodatafile | Import data to Avro data files |
| --as-sequencefile | Import data to SequenceFiles |
| --as-textfile | Import data as plain text (default) |
| --boundary-query | Boundary query to use for creating splits |
| --columns | Columns to import from the table |
| --direct | Use the direct import fast path |
| --direct-split-size | Split the input stream every n bytes when importing in direct mode |
| --inline-lob-limit | Set the maximum size for an inline LOB |
| -m,--num-mappers | Use n map tasks to import in parallel |
| -e,--query | Import the results of a SQL statement |
| --split-by | Column of the table used to split work units |
| --table | Table to read |
| --target-dir | HDFS destination directory |
| --warehouse-dir | HDFS parent directory for the table destination |
| --where | WHERE clause to use during import |
| -z,--compress | Enable compression |
| --compression-codec | Hadoop compression codec to use (default gzip) |
| --null-string | String to be written for a null value in string columns |
| --null-non-string | String to be written for a null value in non-string columns |
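
A sketch of a filtered, parallel import that uses several of the control arguments above (the column names, WHERE clause, and paths are illustrative only):

    # Import two columns of "orders", restricted by a WHERE clause,
    # split over 4 map tasks on the "id" column, with compression enabled
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user -P \
      --table orders \
      --columns "id,amount" \
      --where "created_at >= '2013-01-01'" \
      --split-by id \
      -m 4 \
      -z \
      --target-dir /user/hadoop/orders_subset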

 

 

Incremental import arguments:

| Argument | Description |
|----------|-------------|
| --check-column (col) | Column to be examined when determining which rows to import |
| --incremental (mode) | How Sqoop determines which rows are new; legal values for mode are append and lastmodified |
| --last-value (value) | Maximum value of the check column from the previous import |
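
For the daily scenario mentioned in the introduction, an incremental append import could be sketched as follows; the check column and last value are placeholders (in practice they are usually tracked by a saved Sqoop job):

    # Import only rows whose id is greater than the last imported value
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user -P \
      --table orders \
      --target-dir /user/hadoop/orders \
      --incremental append \
      --check-column id \
      --last-value 10000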

 

 

Output line formatting arguments:

| Argument | Description |
|----------|-------------|
| --enclosed-by | Sets a required field-enclosing character |
| --escaped-by | Sets the escape character |
| --fields-terminated-by | Sets the field separator character |
| --lines-terminated-by | Sets the end-of-line character |
| --mysql-delimiters | Uses MySQL's default delimiter set: fields `,`, lines `\n`, escaped-by `\`, optionally-enclosed-by `'` |
| --optionally-enclosed-by | Sets a field-enclosing character |
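
A sketch of overriding the output delimiters on import (the tab/newline choice is only an example):

    # Write tab-separated, newline-terminated records instead of the defaults
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user -P \
      --table orders \
      --target-dir /user/hadoop/orders_tsv \
      --fields-terminated-by '\t' \
      --lines-terminated-by '\n'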

 

 

Hive arguments:

| Argument | Description |
|----------|-------------|
| --hive-home | Override $HIVE_HOME |
| --hive-import | Import tables into Hive (uses Hive's default delimiters if none are set) |
| --hive-overwrite | Overwrite existing data in the Hive table |
| --create-hive-table | If set, the job fails if the target Hive table already exists; false by default |
| --hive-table | Sets the table name to use when importing into Hive |
| --hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing into Hive |
| --hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing into Hive |
| --hive-partition-key | Name of the Hive field on which partitions are sharded |
| --hive-partition-value | String value that serves as the partition key for the data imported into Hive in this job |
| --map-column-hive | Override the default mapping from SQL type to Hive type for configured columns |
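
A sketch of importing a table straight into Hive using the arguments above (the Hive table name is a placeholder):

    # Import into Hive, replacing existing data and stripping delimiter
    # characters that would break Hive's text format
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user -P \
      --table orders \
      --hive-import \
      --hive-table orders_hive \
      --hive-overwrite \
      --hive-drop-import-delims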

 

 

HBase arguments:

| Argument | Description |
|----------|-------------|
| --column-family | Sets the target column family for the import |
| --hbase-create-table | If specified, create missing HBase tables |
| --hbase-row-key | Specifies which input column to use as the row key |
| --hbase-table | Specifies an HBase table to use as the target instead of HDFS |
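
A sketch of importing into HBase instead of HDFS (the HBase table name, column family, and row key column are placeholders):

    # Load "orders" into the HBase table "orders", creating it if missing
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user -P \
      --table orders \
      --hbase-table orders \
      --column-family cf \
      --hbase-row-key id \
      --hbase-create-table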

 

 

Code generation arguments:

| Argument | Description |
|----------|-------------|
| --bindir | Output directory for compiled objects |
| --class-name | Sets the generated class name; overrides --package-name. When combined with --jar-file, sets the input class |
| --jar-file | Disable code generation; use the specified jar |
| --outdir | Output directory for generated code |
| --package-name | Put auto-generated classes in this package |
| --map-column-java | Override the default mapping from SQL type to Java type for configured columns |
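
A sketch of generating the ORM class for a table with the codegen command (output directories and package name are placeholders):

    # Generate, compile, and jar the Java class that maps the "orders" table
    sqoop codegen \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user -P \
      --table orders \
      --outdir /tmp/sqoop-src \
      --bindir /tmp/sqoop-classes \
      --package-name com.example.sqoop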



