Sqoop is the bridge between relational databases and Hadoop storage in a Hadoop environment: it supports importing and exporting data between a variety of relational sources and Hive, HDFS, and HBase. In a typical setup, the relational tables live in a backup copy of the production database and need to be imported every day. Whether a full-table import is appropriate depends on the daily data volume: when the amount of data produced each day is small, a full import works fine, but Sqoop also provides an incremental import mechanism.
The following sections describe the commonly used Sqoop commands and some of their parameters.
| # | Command / command | Class | Description |
|---|---|---|---|
| 1 | import | ImportTool | Import data from a relational database (a table or the result of a query) into HDFS |
| 2 | export | ExportTool | Export data from HDFS into a relational database |
| 3 | codegen | CodeGenTool | Generate the Java class for a database table and package it into a jar |
| 4 | create-hive-table | CreateHiveTableTool | Create a Hive table |
| 5 | eval | EvalSqlTool | Run a SQL statement and display the results |
| 6 | import-all-tables | ImportAllTablesTool | Import every table of a database into HDFS |
| 7 | job | JobTool | Work with saved jobs |
| 8 | list-databases | ListDatabasesTool | List all databases on a server |
| 9 | list-tables | ListTablesTool | List all tables in a database |
| 10 | merge | MergeTool | Merge the results of an incremental import with an existing dataset |
| 11 | metastore | MetastoreTool | Run a standalone Sqoop metastore |
| 12 | help | HelpTool | Display help information |
| 13 | version | VersionTool | Display version information |
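As a quick illustration, a few of these commands can be combined to inspect the source database before running any import. The sketch below is a minimal example; the connection URL, the root/secret credentials, and the sqoop_datas database and users table are placeholder assumptions, not values from this article.

    # Placeholder connection details: adjust the URL, credentials and names.
    sqoop list-databases \
      --connect jdbc:mysql://localhost/ \
      --username root --password secret

    sqoop list-tables \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username root --password secret

    # Run an ad-hoc SQL statement to sanity-check the data volume.
    sqoop eval \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username root --password secret \
      --query "SELECT COUNT(*) FROM users"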
Next come Sqoop's generic parameters, and after that the parameters specific to each of the 13 commands above. The generic parameters fall into the following groups: Common arguments, Incremental import arguments, Output line formatting arguments, Input parsing arguments, Hive arguments, HBase arguments, and Generic Hadoop command-line arguments.
1. Common arguments: generic parameters, mainly concerned with the relational database connection.
| # | Argument | Description | Example |
|---|---|---|---|
| 1 | --connect | JDBC URL for connecting to the relational database | jdbc:mysql://localhost/sqoop_datas |
| 2 | --connection-manager | Connection manager class to use; rarely needed | |
| 3 | --driver | JDBC driver class to use | |
| 4 | --hadoop-home | Hadoop home directory | /home/hadoop |
| 5 | --help | Print usage instructions | |
| 6 | --password | Password for the database connection | |
| 7 | --username | Username for the database connection | |
| 8 | --verbose | Print more information while working (effectively lowers the log level) | Takes no value |
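Putting the common arguments together, a minimal full-table import could look like the sketch below; the credentials and the users table are assumptions for illustration, and --table itself belongs to the import control arguments listed next.

    # Minimal full-table import into HDFS; host, credentials and table name
    # are placeholders.
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username root \
      --password secret \
      --table users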
Import control arguments:

| Argument | Description |
|---|---|
| --append | Append data to an existing dataset in HDFS |
| --as-avrodatafile | Imports data to Avro Data Files |
| --as-sequencefile | Imports data to SequenceFiles |
| --as-textfile | Imports data as plain text (default) |
| --boundary-query | Boundary query to use for creating splits |
| --columns | Columns to import from table |
| --direct | Use direct import fast path |
| --direct-split-size | Split the input stream every n bytes when importing in direct mode |
| --inline-lob-limit | Set the maximum size for an inline LOB |
| -m,--num-mappers | Use n map tasks to import in parallel |
| -e,--query | Import the results of statement |
| --split-by | Column of the table used to split work units |
| --table | Table to read |
| --target-dir | HDFS destination dir |
| --warehouse-dir | HDFS parent for table destination |
| --where | WHERE clause to use during import |
| -z,--compress | Enable compression |
| --compression-codec | Use Hadoop codec (default gzip) |
| --null-string | The string to be written for a null value for string columns |
| --null-non-string | The string to be written for a null value for non-string columns |
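For finer control, a query-based import is a common pattern: --query replaces --table, must contain the literal $CONDITIONS token that Sqoop substitutes per split, and requires --split-by (or a single mapper) plus an explicit --target-dir. A hedged sketch, with the orders table, its columns, and the paths assumed for illustration:

    # Query-based parallel import; table, columns and target directory
    # are placeholders.
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username root --password secret \
      --query 'SELECT id, name, created_at FROM orders WHERE $CONDITIONS' \
      --split-by id \
      --num-mappers 4 \
      --target-dir /user/hadoop/orders \
      --null-string '\\N' \
      --null-non-string '\\N' \
      -z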
Incremental import arguments:

| Argument | Description |
|---|---|
| --check-column (col) | Specifies the column to be examined when determining which rows to import. |
| --incremental (mode) | Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified. |
| --last-value (value) | Specifies the maximum value of the check column from the previous import. |
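These arguments implement the incremental mechanism mentioned at the start of this article. A sketch of a daily append-mode run, with the orders table and the id column assumed for illustration:

    # Import only rows whose id is greater than the value recorded last time.
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username root --password secret \
      --table orders \
      --target-dir /user/hadoop/orders \
      --incremental append \
      --check-column id \
      --last-value 10000

At the end of the run Sqoop prints the new high-water mark to use as the next --last-value; wrapping the command in a saved sqoop job lets the metastore track that value automatically.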
Output line formatting arguments:

| Argument | Description |
|---|---|
| --enclosed-by | Sets a required field enclosing character |
| --escaped-by | Sets the escape character |
| --fields-terminated-by | Sets the field separator character |
| --lines-terminated-by | Sets the end-of-line character |
| --mysql-delimiters | Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ' |
| --optionally-enclosed-by | Sets a field enclosing character |
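For example, to land tab-separated text instead of Sqoop's default comma-separated output (table name and paths again assumed):

    # Tab-separated output with optional quoting of string fields.
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username root --password secret \
      --table users \
      --target-dir /user/hadoop/users_tsv \
      --fields-terminated-by '\t' \
      --lines-terminated-by '\n' \
      --optionally-enclosed-by '"'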
Hive arguments:

| Argument | Description |
|---|---|
| --hive-home | Override $HIVE_HOME |
| --hive-import | Import tables into Hive (uses Hive's default delimiters if none are set) |
| --hive-overwrite | Overwrite existing data in the Hive table |
| --create-hive-table | If set, the job will fail if the target Hive table already exists. By default this property is false. |
| --hive-table | Sets the table name to use when importing to Hive |
| --hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive |
| --hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive |
| --hive-partition-key | Name of the Hive field on which partitions are sharded |
| --hive-partition-value | String value that serves as the partition key for the data imported into Hive in this job |
| --map-column-hive | Override default mapping from SQL type to Hive type for configured columns |
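A sketch that imports straight into a partitioned Hive table; the ods_orders table name and the dt partition are assumptions for illustration:

    # Import into Hive, writing into the dt=2023-05-18 partition and
    # stripping delimiter characters from string fields.
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username root --password secret \
      --table orders \
      --hive-import \
      --hive-overwrite \
      --hive-table ods_orders \
      --hive-drop-import-delims \
      --hive-partition-key dt \
      --hive-partition-value '2023-05-18'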
HBase arguments:

| Argument | Description |
|---|---|
| --column-family | Sets the target column family for the import |
| --hbase-create-table | If specified, create missing HBase tables |
| --hbase-row-key | Specifies which input column to use as the row key |
| --hbase-table | Specifies an HBase table to use as the target instead of HDFS |
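A sketch of an HBase-targeted import; the HBase table name, column family, and row key column are assumptions:

    # Write rows into the HBase table 'users', column family 'info',
    # using the id column as the row key; create the table if missing.
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username root --password secret \
      --table users \
      --hbase-table users \
      --column-family info \
      --hbase-row-key id \
      --hbase-create-table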
Code generation arguments:

| Argument | Description |
|---|---|
| --bindir | Output directory for compiled objects |
| --class-name | Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class. |
| --jar-file | Disable code generation; use specified jar |
| --outdir | Output directory for generated code |
| --package-name | Put auto-generated classes in this package |
| --map-column-java | Override default mapping from SQL type to Java type for configured columns |
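The codegen command listed earlier accepts these same arguments; a sketch, with the output directories and package name as placeholders:

    # Generate and compile the ORM class for the users table without
    # running an import.
    sqoop codegen \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username root --password secret \
      --table users \
      --outdir /tmp/sqoop/src \
      --bindir /tmp/sqoop/classes \
      --package-name com.example.sqoop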