Sqoop: Introduction and Usage

Author: 小丁啊小丁 | Source: Internet | 2023-05-19 03:33


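Sqoop is the bridge in a Hadoop environment between relational databases and Hadoop's storage systems: it moves data in both directions between many relational data sources and Hive, HDFS, and HBase. Typically the relational tables live in a backup of the production environment and have to be imported every day. When the amount of data produced each day is small, Sqoop can simply import the whole table each time; for larger volumes it also provides an incremental import mechanism.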

Below are the commonly used Sqoop commands and some of their arguments:

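No.	Command	Class	Description
1	import	ImportTool	Import data from a relational database (a table or the result of a query) into HDFS
2	export	ExportTool	Export data from HDFS into a relational database
3	codegen	CodeGenTool	Generate Java code for a database table and package it into a jar
4	create-hive-table	CreateHiveTableTool	Create a Hive table (import a table definition into Hive)
5	eval	EvalSqlTool	Run a SQL statement and show its result
6	import-all-tables	ImportAllTablesTool	Import all tables of a database into HDFS
7	job	JobTool	Work with saved jobs
8	list-databases	ListDatabasesTool	List all databases
9	list-tables	ListTablesTool	List all tables in a database
10	merge	MergeTool	Merge the results of incremental imports
11	metastore	MetastoreTool	Run a standalone Sqoop metastore
12	help	HelpTool	Show help
13	version	VersionTool	Show version information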
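
A minimal Python sketch of how a few of these commands are invoked. It only assembles the `sqoop <command> ...` argument lists and prints them as shell commands with `shlex.join` (Python 3.8+); nothing is executed. The user name `sqoop_user`, the table `orders`, and the HDFS path are hypothetical placeholders, and `-P` (prompt for the password on the console) and `--export-dir` are Sqoop options not covered in this article's tables.

```python
import shlex
from typing import List

# Hypothetical connection settings, for illustration only.
JDBC_URL = "jdbc:mysql://localhost/sqoop_datas"
USER = "sqoop_user"
CONN = ["--connect", JDBC_URL, "--username", USER, "-P"]  # -P: prompt for the password


def sqoop(command: str, *args: str) -> List[str]:
    """Return the argv for `sqoop <command> <args...>` without executing it."""
    return ["sqoop", command, *args]


commands = {
    # Inspect what the database server exposes.
    "list-databases": sqoop("list-databases", *CONN),
    "list-tables": sqoop("list-tables", *CONN),
    # Run one SQL statement and print its result on the console.
    "eval": sqoop("eval", *CONN, "--query", "SELECT COUNT(*) FROM orders"),
    # Push the files in an HDFS directory into an existing database table.
    "export": sqoop("export", *CONN, "--table", "orders",
                    "--export-dir", "/user/hadoop/orders"),
}

if __name__ == "__main__":
    for argv in commands.values():
        print(shlex.join(argv))
```

On a machine with Sqoop installed, any of these lists can be passed to `subprocess.run(argv, check=True)`; building an argument list instead of a shell string avoids quoting problems with the SQL text.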

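Next come Sqoop's general argument groups, followed by the arguments of some of the individual commands above. The general arguments are grouped as follows:

Common arguments

Incremental import arguments

Output line formatting arguments

Input parsing arguments

Hive arguments

HBase arguments

Generic Hadoop command-line arguments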

1. Common arguments: mainly the settings for connecting to the relational database

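No.	Argument	Description	Example
1	--connect	JDBC URL of the relational database	jdbc:mysql://localhost/sqoop_datas
2	--connection-manager	Connection manager class to use; rarely needed
3	--driver	JDBC driver class to use
4	--hadoop-home	Hadoop installation directory	/home/hadoop
5	--help	Print help information
6	--password	Password for the relational database
7	--username	User name for the relational database
8	--verbose	Print more information while working (in effect, it lowers the logging threshold)	Takes no value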
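
Since every command needs these connection settings, it can help to build them in one place. The sketch below assumes a hypothetical user `sqoop_user` and uses `-P` (read the password from the console, an option not listed in the table above) in place of `--password`, so the password does not end up in shell history or the process list.

```python
from typing import List, Optional


def common_args(connect: str, username: str,
                driver: Optional[str] = None, verbose: bool = False) -> List[str]:
    """Assemble Sqoop's common (connection) arguments."""
    # -P asks for the password interactively instead of --password on the command line.
    args = ["--connect", connect, "--username", username, "-P"]
    if driver:
        # Typically only needed for databases Sqoop does not recognize from the URL.
        args += ["--driver", driver]
    if verbose:
        # A flag without a value: it just turns on more detailed logging.
        args.append("--verbose")
    return args


args = common_args("jdbc:mysql://localhost/sqoop_datas", "sqoop_user", verbose=True)
```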

Import control arguments:

Argument	Description
--append	Append data to an existing dataset in HDFS
--as-avrodatafile	Imports data to Avro Data Files
--as-sequencefile	Imports data to SequenceFiles
--as-textfile	Imports data as plain text (default)
--boundary-query <statement>	Boundary query to use for creating splits
--columns <col,col,col…>	Columns to import from table
--direct	Use direct import fast path
--direct-split-size <n>	Split the input stream every n bytes when importing in direct mode
--inline-lob-limit <n>	Set the maximum size for an inline LOB
-m,--num-mappers <n>	Use n map tasks to import in parallel
-e,--query <statement>	Import the results of statement.
--split-by <column-name>	Column of the table used to split work units
--table <table-name>	Table to read
--target-dir <dir>	HDFS destination dir
--warehouse-dir <dir>	HDFS parent for table destination
--where <where clause>	WHERE clause to use during import
-z,--compress	Enable compression
--compression-codec <c>	Use Hadoop codec (default gzip)
--null-string <null-string>	The string to be written for a null value for string columns
--null-non-string <null-string>	The string to be written for a null value for non-string columns
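
The sketch below shows two ways to use the import control arguments: importing a table with a `--where` filter, and importing the result of a free-form `--query`. For the query form, the Sqoop user guide requires the literal token `$CONDITIONS` in the WHERE clause (each mapper replaces it with its own split range), together with `--target-dir`, and `--split-by` when more than one mapper is used. Table and column names, paths, and the user name are hypothetical.

```python
from typing import List

# Hypothetical connection settings; -P makes Sqoop prompt for the password.
CONN = ["--connect", "jdbc:mysql://localhost/sqoop_datas",
        "--username", "sqoop_user", "-P"]


def import_table(table: str, target_dir: str, split_by: str,
                 mappers: int = 4, where: str = "") -> List[str]:
    """Import one table, optionally filtered, into an HDFS directory."""
    argv = ["sqoop", "import", *CONN,
            "--table", table,
            "--target-dir", target_dir,
            "--split-by", split_by,   # column used to divide the work between mappers
            "-m", str(mappers)]
    if where:
        argv += ["--where", where]    # evaluated by the database server
    return argv


def import_query(query: str, target_dir: str, split_by: str,
                 mappers: int = 4) -> List[str]:
    """Import the result of a free-form query in parallel."""
    if "$CONDITIONS" not in query:
        raise ValueError("a parallel --query import needs $CONDITIONS in its WHERE clause")
    return ["sqoop", "import", *CONN,
            "--query", query,
            "--target-dir", target_dir,  # mandatory for --query imports
            "--split-by", split_by,
            "-m", str(mappers)]


by_table = import_table("orders", "/user/hadoop/orders", "id",
                        where="created_at >= '2023-05-18'")
by_query = import_query(
    "SELECT o.id, o.amount FROM orders o WHERE o.amount > 0 AND $CONDITIONS",
    "/user/hadoop/orders_positive", "o.id")
```

Because these are argument lists rather than shell strings, `$CONDITIONS` needs no quoting here; on a shell command line it has to be single-quoted so the shell does not expand it.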

Incremental import arguments:

Argument	Description
--check-column (col)	Specifies the column to be examined when determining which rows to import.
--incremental (mode)	Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.
--last-value (value)	Specifies the maximum value of the check column from the previous import.
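
A sketch of a daily append-mode incremental import, assuming a hypothetical `orders` table with an auto-increment `id` column. When run from the command line, Sqoop prints the value to use as the next `--last-value`; alternatively the import can be stored as a saved job with the `job` command, and Sqoop then keeps `--last-value` up to date in the saved job itself.

```python
from typing import List

# Hypothetical connection settings; -P makes Sqoop prompt for the password.
CONN = ["--connect", "jdbc:mysql://localhost/sqoop_datas",
        "--username", "sqoop_user", "-P"]


def incremental_append(table: str, target_dir: str, check_column: str,
                       last_value: int) -> List[str]:
    """Import only rows whose check column is greater than last_value."""
    return ["sqoop", "import", *CONN,
            "--table", table,
            "--target-dir", target_dir,
            "--incremental", "append",        # new rows only, e.g. a growing id
            "--check-column", check_column,   # column compared with --last-value
            "--last-value", str(last_value)]  # maximum value imported previously


def saved_job(name: str, import_argv: List[str]) -> List[str]:
    """Wrap an import in `sqoop job --create <name> -- import ...`.

    Running it later with `sqoop job --exec <name>` lets Sqoop retain the new
    last value in the saved job, so it need not be tracked by hand.
    """
    if import_argv[:2] != ["sqoop", "import"]:
        raise ValueError("expected a `sqoop import` argv")
    return ["sqoop", "job", "--create", name, "--", *import_argv[1:]]


daily = incremental_append("orders", "/user/hadoop/orders", "id", 1000)
job = saved_job("orders_daily", daily)
```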

Output line formatting arguments:

Argument	Description
--enclosed-by <char>	Sets a required field enclosing character
--escaped-by <char>	Sets the escape character
--fields-terminated-by <char>	Sets the field separator character
--lines-terminated-by <char>	Sets the end-of-line character
--mysql-delimiters	Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
--optionally-enclosed-by <char>	Sets a field enclosing character
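
A sketch of the delimiter options for a tab-separated text import. The point it illustrates is that escape sequences such as `\t` are handed to Sqoop as the two characters backslash and `t` (what `'\t'` in single quotes produces on a shell), and Sqoop decodes them itself. The function name and defaults are illustrative only.

```python
from typing import List


def text_format_args(fields: str = "\\t", lines: str = "\\n",
                     enclosed: str = "", escaped: str = "") -> List[str]:
    """Build output line formatting arguments for a text-file import."""
    # Defaults are the two-character strings backslash+t and backslash+n.
    args = ["--fields-terminated-by", fields, "--lines-terminated-by", lines]
    if enclosed:
        args += ["--optionally-enclosed-by", enclosed]  # optional field quote character
    if escaped:
        args += ["--escaped-by", escaped]
    return args


tsv = text_format_args(enclosed='"', escaped="\\")  # escaped: one backslash character
```

These arguments are appended to a `sqoop import` argument list such as the ones built for the import control arguments.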

Hive arguments:

Argument	Description
--hive-home <dir>	Override $HIVE_HOME
--hive-import	Import tables into Hive (uses Hive’s default delimiters if none are set)
--hive-overwrite	Overwrite existing data in the Hive table
--create-hive-table	If set, the job will fail if the target Hive table exists. By default this property is false.
--hive-table <table-name>	Sets the table name to use when importing to Hive
--hive-drop-import-delims	Drops \n, \r, and \01 from string fields when importing to Hive
--hive-delims-replacement	Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
--hive-partition-key	Name of the Hive field that partitions are sharded on
--hive-partition-value <v>	String value that serves as the partition key for the data imported into Hive in this job
--map-column-hive <map>	Override the default mapping from SQL type to Hive type for the configured columns
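
A sketch of a Hive import that loads one day's data into a partition of a hypothetical `ods.orders` table partitioned by `dt`. It also passes `--null-string '\N'` and `--null-non-string '\N'` (import control arguments above), which the Sqoop user guide recommends for Hive because Hive's text format represents NULL as `\N` while Sqoop writes the string `null` by default.

```python
from typing import List

# Hypothetical connection settings; -P makes Sqoop prompt for the password.
CONN = ["--connect", "jdbc:mysql://localhost/sqoop_datas",
        "--username", "sqoop_user", "-P"]


def hive_import(table: str, hive_table: str, part_key: str, part_value: str,
                overwrite: bool = True) -> List[str]:
    """Import a relational table into one partition of a Hive table."""
    argv = ["sqoop", "import", *CONN,
            "--table", table,
            "--hive-import",
            "--hive-table", hive_table,
            "--hive-partition-key", part_key,
            "--hive-partition-value", part_value,
            "--hive-drop-import-delims",  # strip \n, \r and \01 that would split Hive rows
            "--null-string", "\\N",       # backslash + N, Hive's NULL marker
            "--null-non-string", "\\N"]
    if overwrite:
        argv.append("--hive-overwrite")   # replace existing data instead of appending
    return argv


argv = hive_import("orders", "ods.orders", "dt", "2023-05-18")
```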

HBase arguments:

Argument	Description
--column-family <family>	Sets the target column family for the import
--hbase-create-table	If specified, create missing HBase tables
--hbase-row-key <col>	Specifies which input column to use as the row key
--hbase-table <table-name>	Specifies an HBase table to use as the target instead of HDFS
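
A sketch of an HBase import built from the arguments above; the HBase table `orders`, the column family `cf`, the row key column `id`, and the user name are hypothetical.

```python
from typing import List

# Hypothetical connection settings; -P makes Sqoop prompt for the password.
CONN = ["--connect", "jdbc:mysql://localhost/sqoop_datas",
        "--username", "sqoop_user", "-P"]


def hbase_import(table: str, hbase_table: str, family: str, row_key: str,
                 create_table: bool = False) -> List[str]:
    """Import a relational table into HBase instead of HDFS files."""
    argv = ["sqoop", "import", *CONN,
            "--table", table,
            "--hbase-table", hbase_table,  # target HBase table
            "--column-family", family,     # column family for the imported columns
            "--hbase-row-key", row_key]    # input column used as the HBase row key
    if create_table:
        argv.append("--hbase-create-table")  # create the table if it is missing
    return argv


argv = hbase_import("orders", "orders", "cf", "id", create_table=True)
```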

Code generation arguments:

Argument	Description
--bindir <dir>	Output directory for compiled objects
--class-name <name>	Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
--jar-file <file>	Disable code generation; use specified jar
--outdir <dir>	Output directory for generated code
--package-name <name>	Put auto-generated classes in this package
--map-column-java <m>	Override default mapping from SQL type to Java type for configured columns.
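
A sketch of generating the record class once with `codegen` and then reusing the compiled jar in a later import through `--jar-file` and `--class-name`. The directories, package, jar path, and class name are hypothetical; the actual jar name should be checked in the output of the codegen run.

```python
from typing import List

# Hypothetical connection settings; -P makes Sqoop prompt for the password.
CONN = ["--connect", "jdbc:mysql://localhost/sqoop_datas",
        "--username", "sqoop_user", "-P"]


def codegen(table: str, outdir: str, bindir: str, package: str) -> List[str]:
    """Generate and compile the record class for a table without importing data."""
    return ["sqoop", "codegen", *CONN,
            "--table", table,
            "--outdir", outdir,         # generated .java sources
            "--bindir", bindir,         # compiled objects
            "--package-name", package]


def import_with_jar(table: str, jar: str, class_name: str,
                    target_dir: str) -> List[str]:
    """Import using an existing jar; --class-name then names the input class."""
    return ["sqoop", "import", *CONN,
            "--table", table,
            "--jar-file", jar,          # disables code generation for this run
            "--class-name", class_name,
            "--target-dir", target_dir]


gen = codegen("orders", "/tmp/sqoop/src", "/tmp/sqoop/bin", "com.example.sqoop")
# Hypothetical jar path and class name, for illustration only.
imp = import_with_jar("orders", "/tmp/sqoop/bin/orders.jar",
                      "com.example.sqoop.orders", "/user/hadoop/orders")
```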

