Installing and Using Sqoop

Author: 手机用户2502908275 | Source: Internet | 2023-05-19 10:57

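Introduction:

Sqoop is a tool for importing and exporting data between Hadoop and relational databases. With Sqoop you can import data from a database (such as MySQL or Oracle) into HDFS, and you can export data from HDFS into a relational database. Sqoop turns each command into a Hadoop MapReduce job (usually involving only Map tasks): the generated job mainly runs MapTasks concurrently so that data is transferred in parallel, which improves transfer speed and efficiency. Implementing the same multi-threaded transfer with shell scripts would be very difficult. Sqoop2 (Sqoop 1.99.7) requires proxy settings in the configuration files under the Hadoop installation directory and is a heavyweight, embedded installation, so this article uses Sqoop1 (Sqoop 1.4.6).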

Prerequisites: (if you are not sure how to install these, see my earlier articles in the hadoop category)

CloudDeskTop has: hadoop-2.7.3  jdk1.7.0_79  mysql-5.5.32  sqoop-1.4.6  hive-1.2.2
master01 and master02 have: hadoop-2.7.3  jdk1.7.0_79
slave01, slave02 and slave03 have: hadoop-2.7.3  jdk1.7.0_79  zookeeper-3.4.10

I. Installation:

1. Upload the installation package sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz to the /install/ directory

2. Extract it:

[hadoop@CloudDeskTop install]$ tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /software/
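The environment variables in step 3 point SQOOP_HOME at /software/sqoop-1.4.6, while the archive normally unpacks into a directory named after the tarball. The article does not show a rename step, so the following is only a sketch under that assumption; the helper name rename_sqoop_dir is hypothetical.

```shell
# Rename the unpacked Sqoop directory to the short name used by SQOOP_HOME.
# Assumption: tar created <base>/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.
rename_sqoop_dir() {
  base="$1"
  src="$base/sqoop-1.4.6.bin__hadoop-2.0.4-alpha"
  dst="$base/sqoop-1.4.6"
  if [ -d "$dst" ]; then
    echo "already present: $dst"
  elif [ -d "$src" ]; then
    mv "$src" "$dst" && echo "renamed to: $dst"
  else
    echo "nothing to rename under $base" >&2
    return 1
  fi
}

# On the article's machine the call would be: rename_sqoop_dir /software
```

If the directory already has the short name, the function leaves it alone, so it is safe to run more than once.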

3. Configure the environment:

[hadoop@CloudDeskTop software]$ su -lc "vi /etc/profile"

JAVA_HOME=/software/jdk1.7.0_79
HADOOP_HOME=/software/hadoop-2.7.3
SQOOP_HOME=/software/sqoop-1.4.6
PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/lib:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SQOOP_HOME/bin
export PATH JAVA_HOME HADOOP_HOME SQOOP_HOME

4. After editing the environment, run the following command so that the configuration file takes effect immediately:

[hadoop@CloudDeskTop software]$ source /etc/profile
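After sourcing the profile it can help to confirm that the three variables are visible and point at real directories. A minimal sketch; the function name check_sqoop_env is hypothetical, and the check only covers the variables set in step 3.

```shell
# Report any of the three variables from /etc/profile that is unset or
# does not point at an existing directory. Prints nothing when all are fine.
check_sqoop_env() {
  status=0
  for name in JAVA_HOME HADOOP_HOME SQOOP_HOME; do
    eval "value=\${$name:-}"          # read the variable whose name is in $name
    if [ -z "$value" ]; then
      echo "$name is not set"
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name points at a missing directory: $value"
      status=1
    fi
  done
  return "$status"
}

check_sqoop_env || echo "fix /etc/profile and run 'source /etc/profile' again"
```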

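5. Go to the /software/sqoop-1.4.6/lib/ directory and upload the mysql-connector-java-5.1.43-bin.jar package

The database driver package here has to be this version (5.1.43): Sqoop needs to connect to the MySQL database, and a different driver version easily leads to errors.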
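A quick way to confirm that the driver landed in the right place is to list the Connector/J jars in Sqoop's lib directory. This is a sketch only; the function name find_mysql_driver is hypothetical, and the fallback path is the SQOOP_HOME value used in this article.

```shell
# List MySQL Connector/J jars found in a directory, one file name per line.
# Returns non-zero when none is found.
find_mysql_driver() {
  lib_dir="$1"
  found=1
  for jar in "$lib_dir"/mysql-connector-java-*.jar; do
    [ -e "$jar" ] || continue         # the glob did not match anything
    basename "$jar"
    found=0
  done
  return "$found"
}

find_mysql_driver "${SQOOP_HOME:-/software/sqoop-1.4.6}/lib" \
  || echo "no MySQL JDBC driver in Sqoop's lib directory yet"
```

If more than one jar is listed, it is probably worth keeping only the version named above, so that it is clear which driver Sqoop loads.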

6. Configure Sqoop

[hadoop@CloudDeskTop software]$ cd /software/sqoop-1.4.6/bin/

[hadoop@CloudDeskTop bin]$ vi configure-sqoop

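Comment out the following code by placing ":<<COMMENT" on the line before it as the opening marker and "COMMENT" on the line after it as the closing marker: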

:<<COMMENT
## Moved to be a runtime check in sqoop.
if [ ! -d "${HBASE_HOME}" ]; then
  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
fi

## Moved to be a runtime check in sqoop.
if [ ! -d "${HCAT_HOME}" ]; then
  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
fi

if [ ! -d "${ACCUMULO_HOME}" ]; then
  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
fi
if [ ! -d "${ZOOKEEPER_HOME}" ]; then
  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
fi
COMMENT
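The block above relies on a shell idiom: ":" is the no-op builtin, and a here-document fed to it is read and thrown away, so nothing between the two markers is executed. The sketch below is not part of configure-sqoop; it only demonstrates the idiom. It quotes the delimiter ('COMMENT'), which additionally stops the shell from expanding $VAR or $(...) inside the skipped text; with the unquoted form used in the article such expansions still happen, which is harmless for plain variable references.

```shell
heredoc_comment_demo() {
  echo "before"
  : <<'COMMENT'
echo "this line is never executed"
echo "neither is this one: $(date)"
COMMENT
  echo "after"
}

heredoc_comment_demo
```

Only "before" and "after" are printed; the two echo lines between the markers are skipped.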

II. Startup (unless stated otherwise, everything is run as the hadoop user)

[0. Start MySQL as the root user on CloudDeskTop]

[root@CloudDeskTop ~]# cd /software/mysql-5.5.32/sbin/ && ./mysqld start && lsof -i:3306 && cd -

[1. Start the ZooKeeper cluster on the slave nodes (the nodes elect one leader among themselves and the rest become followers)]

　　cd /software/zookeeper-3.4.10/bin/ && ./zkServer.sh start && cd - && jps
　　cd /software/zookeeper-3.4.10/bin/ && ./zkServer.sh status && cd -

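[2. Start the HDFS cluster on master01] cd /software/ && start-dfs.sh && jps

[3. Start the YARN cluster on master01] cd /software/ && start-yarn.sh && jps

[Starting the YARN cluster does not bring up the ResourceManager on the standby master node, so run the following on master02:]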

cd /software/ && yarn-daemon.sh start resourcemanager && jps

[4. Check the processes]
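One way to check the processes is to compare the output of jps with the daemons expected on each node. The sketch below reads jps-style lines from standard input; the function name missing_daemons is hypothetical, and the list of daemons expected on a slave node (QuorumPeerMain, DataNode, NodeManager) is an assumption about this cluster, not something the article states.

```shell
# Print each expected daemon name that does not appear in jps-style input
# (lines of the form "<pid> <MainClass>") read from stdin.
missing_daemons() {
  listing="$(cat)"
  for name in "$@"; do
    printf '%s\n' "$listing" \
      | awk -v n="$name" '$2 == n { ok = 1 } END { exit !ok }' \
      || echo "$name"
  done
}

# Example with canned input; on a real node use: jps | missing_daemons ...
printf '2101 QuorumPeerMain\n2250 DataNode\n2417 Jps\n' \
  | missing_daemons QuorumPeerMain DataNode NodeManager
```

With the canned input the function reports NodeManager as missing; an empty result means every listed daemon is running.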

[6. Query the Sqoop version to check whether Sqoop was installed successfully]

　[hadoop@CloudDeskTop software]$ sqoop version

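III. Testing

Note: the direction of import and export is defined with the HDFS cluster as the reference point. Data flowing out of the HDFS cluster is an export; data flowing into the HDFS cluster is an import. The data of a Hive table is actually stored in the HDFS cluster, so importing into or exporting from a Hive table really means operating on files in the HDFS cluster.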

First, create the data locally:
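The original listing for this step is not preserved. The sketch below recreates a file with the ten rows that appear in the Hive query output later in this article, using a tab between fields to match the table's "fields terminated by '\t'" clause. Writing into a temporary directory is an assumption; the article's prompt shows a working directory named test.

```shell
workdir="$(mktemp -d)"
datafile="$workdir/testsqoop.out"

# One record per line: id, name and age separated by a tab character.
while read -r id name age; do
  printf '%s\t%s\t%s\n' "$id" "$name" "$age"
done > "$datafile" <<'ROWS'
1 ligang 2
2 chenghua 3
3 liqin 1
4 zhanghua 4
5 wanghua 1
6 liulinjin 5
7 wangxiaochuan 6
8 guchuan 2
9 xiaoyong 4
10 huping 6
ROWS

echo "wrote $(wc -l < "$datafile") rows to $datafile"
```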

After creating the table in the Hive database, upload the file to the path on the cluster where the table stores its data:

[hadoop@CloudDeskTop test]$ hdfs dfs -put testsqoop.out /user/hive/warehouse/mmzs.db/testsqoop

Goal 1: Export data from the HDFS cluster to the MySQL database

1. Create the table in the Hive database mmzs and load the data

[hadoop@CloudDeskTop software]$ cd /software/hive-1.2.2/bin/
[hadoop@CloudDeskTop bin]$ ./hive
hive> show databases;
OK
default
mmzs
mmzsmysql
Time taken: 0.373 seconds, Fetched: 3 row(s)
hive> create table if not exists mmzs.testsqoop(id int,name string,age int) row format delimited fields terminated by '\t';
OK
Time taken: 0.126 seconds
hive> select * from mmzs.testsqoop;
OK
1     ligang          2
2     chenghua        3
3     liqin           1
4     zhanghua        4
5     wanghua         1
6     liulinjin       5
7     wangxiaochuan   6
8     guchuan         2
9     xiaoyong        4
10    huping          6
Time taken: 0.824 seconds, Fetched: 10 row(s)

2. Create a table with matching columns in the MySQL database

[root@CloudDeskTop bin]# cd ~
[root@CloudDeskTop ~]# cd /software/mysql-5.5.32/bin/
[root@CloudDeskTop bin]# ./mysql -uroot -p123456 -P3306 -h192.168.154.134 -e "create database mmzs character set utf8"
[root@CloudDeskTop bin]# ./mysql -uroot -p123456 -h192.168.154.134 -P3306 -Dmmzs
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.5.32 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show tables;
Empty set (0.00 sec)

mysql> create table if not exists testsqoop(uid int(11),uname varchar(30),age int)engine=innodb charset=utf8
    -> ;
Query OK, 0 rows affected (0.06 sec)

mysql> desc testsqoop;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| uid   | int(11)     | YES  |     | NULL    |       |
| uname | varchar(30) | YES  |     | NULL    |       |
| age   | int(11)     | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> select * from testsqoop;
Empty set (0.01 sec)

3. Use Sqoop to export the data of the Hive table to the MySQL database (the whole HDFS file is exported)

[hadoop@CloudDeskTop software]$ sqoop-export --help

17/12/30 21:54:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
  --connect                              Specify JDBC connect string
  --connection-manager <class-name>      Specify connection manager class name
  --connection-param-file                Specify connection parameters file
  --driver <class-name>                  Manually specify JDBC driver class to use
  --hadoop-home                          Override $HADOOP_MAPRED_HOME_ARG
  --hadoop-mapred-home                   Override $HADOOP_MAPRED_HOME_ARG
  --help                                 Print usage instructions
  -P                                     Read password from console
  --password                             Set authentication password
  --password-alias                       Credential provider password alias
  --password-file                        Set authentication password file path
  --relaxed-isolation                    Use read-uncommitted isolation for imports
  --skip-dist-cache                      Skip copying jars to distributed cache
  --username                             Set authentication username
  --verbose                              Print more information while working

Export control arguments:
  --batch                                Indicates underlying statements to be executed in batch mode
  --call                                 Populate the table using this stored procedure (one call per row)
  --clear-staging-table                  Indicates that any data in staging table can be deleted
  --columns                              Columns to export to table
  --direct                               Use direct export fast path
  --export-dir                           HDFS source path for the export
  -m,--num-mappers                       Use 'n' map tasks to export in parallel
  --mapreduce-job-name                   Set name for generated mapreduce job
  --staging-table                        Intermediate staging table
  --table                                Table to populate
  --update-key                           Update records by specified key column
  --update-mode                          Specifies how updates are performed when new rows are found
                                         with non-matching keys in database
  --validate                             Validate the copy using the configured validator
  --validation-failurehandler            Fully qualified class name for ValidationFailureHandler
  --validation-threshold                 Fully qualified class name for ValidationThreshold
  --validator                            Fully qualified class name for the Validator

Input parsing arguments:
  --input-enclosed-by <char>             Sets a required field encloser
  --input-escaped-by <char>              Sets the input escape character
  --input-fields-terminated-by <char>    Sets the input field separator
  --input-lines-terminated-by <char>     Sets the input end-of-line char
  --input-optionally-enclosed-by <char>  Sets a field enclosing character

Output line formatting arguments:
  --enclosed-by <char>                   Sets a required field enclosing character
  --escaped-by <char>                    Sets the escape character
  --fields-terminated-by <char>          Sets the field separator character
  --lines-terminated-by <char>           Sets the end-of-line character
  --mysql-delimiters                     Uses MySQL's default delimiter set:
                                         fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
  --optionally-enclosed-by <char>        Sets a field enclosing character

Code generation arguments:
  --bindir                               Output directory for compiled objects
  --class-name                           Sets the generated class name. This overrides --package-name.
                                         When combined with --jar-file, sets the input class.
  --input-null-non-string <null-str>     Input null non-string representation
  --input-null-string <null-str>         Input null string representation
  --jar-file                             Disable code generation; use specified jar
  --map-column-java                      Override mapping for specific columns to java types
  --null-non-string <null-str>           Null non-string representation
  --null-string <null-str>               Null string representation
  --outdir                               Output directory for generated code
  --package-name                         Put auto-generated classes in this package

HCatalog arguments:
  --hcatalog-database                    HCatalog database name
  --hcatalog-home                        Override $HCAT_HOME
  --hcatalog-partition-keys              Sets the partition keys to use when importing to hive
  --hcatalog-partition-values            Sets the partition values to use when importing to hive
  --hcatalog-table                       HCatalog table name
  --hive-home                            Override $HIVE_HOME
  --hive-partition-key                   Sets the partition key to use when importing to hive
  --hive-partition-value                 Sets the partition value to use when importing to hive
  --map-column-hive                      Override mapping for specific column to hive types.

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
  -conf      specify an application configuration file
  -D         use value for given property
  -fs        specify a namenode
  -jt        specify a ResourceManager
  -files     specify comma separated files to be copied to the map reduce cluster
  -libjars   specify comma separated jar files to include in the classpath.
  -archives  specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

At minimum, you must specify --connect, --export-dir, and --table

# -m specifies the number of map tasks

[hadoop@CloudDeskTop software]$ sqoop-export --export-dir '/user/hive/warehouse/mmzs.db/testsqoop' --fields-terminated-by '\t' --lines-terminated-by '\n' --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --table 'testsqoop' -m 2

[hadoop@CloudDeskTop software]$ sqoop-export --export-dir '/user/hive/warehouse/mmzs.db/testsqoop' --fields-terminated-by '\t' --lines-terminated-by '\n' --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --table 'testsqoop' -m 2
17/12/30 22:02:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/30 22:02:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/30 22:02:04 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/30 22:02:04 INFO tool.CodeGenTool: Beginning code generation
17/12/30 22:02:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:02:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:02:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.7.3
Note: /tmp/sqoop-hadoop/compile/e2b7e669ef4d8d43016e44ce1cddb620/testsqoop.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/30 22:02:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e2b7e669ef4d8d43016e44ce1cddb620/testsqoop.jar
17/12/30 22:02:11 INFO mapreduce.ExportJobBase: Beginning export of testsqoop
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/12/30 22:02:11 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/30 22:02:13 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
17/12/30 22:02:13 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/12/30 22:02:13 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/30 22:02:22 INFO input.FileInputFormat: Total input paths to process : 1
17/12/30 22:02:22 INFO input.FileInputFormat: Total input paths to process : 1
17/12/30 22:02:23 INFO mapreduce.JobSubmitter: number of splits:2
17/12/30 22:02:23 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/12/30 22:02:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514638990227_0001
17/12/30 22:02:25 INFO impl.YarnClientImpl: Submitted application application_1514638990227_0001
17/12/30 22:02:25 INFO mapreduce.Job: The url to track the job: http://master01:8088/proxy/application_1514638990227_0001/
17/12/30 22:02:25 INFO mapreduce.Job: Running job: job_1514638990227_0001
17/12/30 22:03:13 INFO mapreduce.Job: Job job_1514638990227_0001 running in uber mode : false
17/12/30 22:03:13 INFO mapreduce.Job: map 0% reduce 0%
17/12/30 22:03:58 INFO mapreduce.Job: map 100% reduce 0%
17/12/30 22:03:59 INFO mapreduce.Job: Job job_1514638990227_0001 completed successfully
17/12/30 22:03:59 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=277282
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=484
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Launched map tasks=2
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=79918
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=79918
        Total vcore-milliseconds taken by all map tasks=79918
        Total megabyte-milliseconds taken by all map tasks=81836032
    Map-Reduce Framework
        Map input records=10
        Map output records=10
        Input split bytes=286
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=386
        CPU time spent (ms)=4950
        Physical memory (bytes) snapshot=216600576
        Virtual memory (bytes) snapshot=1697566720
        Total committed heap usage (bytes)=32874496
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
17/12/30 22:03:59 INFO mapreduce.ExportJobBase: Transferred 484 bytes in 105.965 seconds (4.5675 bytes/sec)
17/12/30 22:03:59 INFO mapreduce.ExportJobBase: Exported 10 records.

Summary: the run shows that the job has only Map tasks and no Reduce tasks.

4. Query the MySQL table again

mysql> select * from testsqoop;
+------+---------------+------+
| uid  | uname         | age  |
+------+---------------+------+
|    1 | ligang        |    2 |
|    2 | chenghua      |    3 |
|    3 | liqin         |    1 |
|    4 | zhanghua      |    4 |
|    5 | wanghua       |    1 |
|    6 | liulinjin     |    5 |
|    7 | wangxiaochuan |    6 |
|    8 | guchuan       |    2 |
|    9 | xiaoyong      |    4 |
|   10 | huping        |    6 |
+------+---------------+------+
10 rows in set (0.00 sec)

The result confirms that the data was exported to the MySQL database successfully.
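The export log above contains the warning "Setting your password on the command-line is insecure. Consider using -P instead." The help output also lists --password-file. The sketch below shows one possible variant of the same export that reads the password from a file. It only prints the command instead of running it, the function name print_export_cmd is hypothetical, the temporary location of the password file is an assumption, and the file:// prefix is an assumption intended to make Hadoop read the local filesystem rather than HDFS.

```shell
pw_dir="$(mktemp -d)"                 # assumption: any private directory works
pw_file="$pw_dir/mysql.pw"

# printf writes no trailing newline; Sqoop reads the whole file content,
# so a trailing newline would become part of the password.
( umask 077; printf '%s' '123456' > "$pw_file" )
chmod 400 "$pw_file"

# Same export as above, with --password replaced by --password-file.
print_export_cmd() {
  pw="$1"
  set -- sqoop-export \
    --export-dir /user/hive/warehouse/mmzs.db/testsqoop \
    --fields-terminated-by '\t' --lines-terminated-by '\n' \
    --connect jdbc:mysql://192.168.154.134:3306/mmzs \
    --username root --password-file "file://$pw" \
    --table testsqoop -m 2
  printf '%s\n' "$*"                  # dry run: print instead of execute
}

print_export_cmd "$pw_file"
```

To run the export for real on the cluster, the printf line would be replaced by "$@". For interactive use, -P (prompt for the password on the console) is the simpler option.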

Goal 2: Import data from MySQL into the HDFS cluster

1. Delete the data of the testsqoop table in the Hive database mmzs

Confirm that it has really been deleted:

2. Import the data from MySQL into the HDFS cluster

A. Import only the rows selected by a query into the cluster

[hadoop@CloudDeskTop software]$ sqoop-import --append --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --query 'select * from mmzs.testsqoop where uid>3 and $CONDITIONS' -m 1 --target-dir '/user/hive/warehouse/mmzs.db/testsqoop' --fields-terminated-by '\t' --lines-terminated-by '\n'

[hadoop@CloudDeskTop software]$ sqoop-import --append --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --query 'select * from mmzs.testsqoop where uid>3 and $CONDITIONS' -m 1 --target-dir '/user/hive/warehouse/mmzs.db/testsqoop' --fields-terminated-by '\t' --lines-terminated-by '\n'
17/12/30 22:40:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/30 22:40:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/30 22:40:55 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/30 22:40:55 INFO tool.CodeGenTool: Beginning code generation
17/12/30 22:40:55 INFO manager.SqlManager: Executing SQL statement: select * from mmzs.testsqoop where uid>3 and (1 = 0)
17/12/30 22:40:55 INFO manager.SqlManager: Executing SQL statement: select * from mmzs.testsqoop where uid>3 and (1 = 0)
17/12/30 22:40:55 INFO manager.SqlManager: Executing SQL statement: select * from mmzs.testsqoop where uid>3 and (1 = 0)
17/12/30 22:40:55 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.7.3
Note: /tmp/sqoop-hadoop/compile/cd00e059648175875074eed7f4189e0b/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/30 22:40:58 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/cd00e059648175875074eed7f4189e0b/QueryResult.jar
17/12/30 22:40:58 INFO mapreduce.ImportJobBase: Beginning query import.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/12/30 22:40:59 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/30 22:41:01 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/30 22:41:08 INFO db.DBInputFormat: Using read commited transaction isolation
17/12/30 22:41:09 INFO mapreduce.JobSubmitter: number of splits:1
17/12/30 22:41:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514638990227_0003
17/12/30 22:41:10 INFO impl.YarnClientImpl: Submitted application application_1514638990227_0003
17/12/30 22:41:10 INFO mapreduce.Job: The url to track the job: http://master01:8088/proxy/application_1514638990227_0003/
17/12/30 22:41:10 INFO mapreduce.Job: Running job: job_1514638990227_0003
17/12/30 22:41:54 INFO mapreduce.Job: Job job_1514638990227_0003 running in uber mode : false
17/12/30 22:41:54 INFO mapreduce.Job: map 0% reduce 0%
17/12/30 22:42:29 INFO mapreduce.Job: map 100% reduce 0%
17/12/30 22:42:31 INFO mapreduce.Job: Job job_1514638990227_0003 completed successfully
17/12/30 22:42:32 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=138692
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=94
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=32275
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=32275
        Total vcore-milliseconds taken by all map tasks=32275
        Total megabyte-milliseconds taken by all map tasks=33049600
    Map-Reduce Framework
        Map input records=7
        Map output records=7
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=170
        CPU time spent (ms)=2020
        Physical memory (bytes) snapshot=109428736
        Virtual memory (bytes) snapshot=851021824
        Total committed heap usage (bytes)=19091456
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=94
17/12/30 22:42:32 INFO mapreduce.ImportJobBase: Transferred 94 bytes in 91.0632 seconds (1.0322 bytes/sec)
17/12/30 22:42:32 INFO mapreduce.ImportJobBase: Retrieved 7 records.
17/12/30 22:42:32 INFO util.AppendUtils: Appending to directory testsqoop
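In the log above Sqoop has replaced the $CONDITIONS token with "(1 = 0)" while generating code, which shows that the token must reach Sqoop literally. That is why the query is wrapped in single quotes. The sketch below only demonstrates the shell quoting involved; it does not call Sqoop, and the variable names are made up for the illustration.

```shell
CONDITIONS=''   # the shell normally has no such variable; empty behaves the same

# Single quotes: the shell passes $CONDITIONS through untouched.
single='select * from mmzs.testsqoop where uid>3 and $CONDITIONS'

# Double quotes need a backslash to keep the token intact.
escaped="select * from mmzs.testsqoop where uid>3 and \$CONDITIONS"

# Double quotes without the backslash: the shell expands the token away.
expanded="select * from mmzs.testsqoop where uid>3 and $CONDITIONS"

printf '%s\n' "$single" "$escaped" "$expanded"
```

The first two strings are identical and still end in $CONDITIONS; the third one ends after "and", so Sqoop would never see the token it expects.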

Check on the cluster whether the data was really imported:
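The original output for this check is not preserved. One possible check is to stream the imported files from HDFS and count well-formed rows. In the sketch below the function name count_rows is hypothetical, and the part-m-* pattern is an assumption based on the part-m-00000 file name that appears in the cleanup command of this article. When the hdfs command is not available, the sketch falls back to two canned rows so that it can still be tried locally.

```shell
# Print the number of lines read from stdin; fail if any line does not have
# exactly three tab-separated fields (uid, uname, age).
count_rows() {
  awk -F '\t' 'NF != 3 { bad = 1 } END { print NR; exit bad }'
}

if command -v hdfs >/dev/null 2>&1; then
  # The quotes keep the local shell from expanding the glob; HDFS expands it.
  hdfs dfs -cat '/user/hive/warehouse/mmzs.db/testsqoop/part-m-*' | count_rows
else
  printf '4\tzhanghua\t4\n5\twanghua\t1\n' | count_rows   # canned sample rows
fi
```

For the query import above, the log reports "Retrieved 7 records", so 7 is the count to expect on the cluster.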

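Check in the Hive database whether the data was really imported:

The result confirms that the data was imported into the HDFS cluster successfully.

Delete the data on the cluster so that the next import starts clean: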

[hadoop@master01 software]$ hdfs dfs -rm -r /user/hive/warehouse/mmzs.db/testsqoop/part-m-00000

B. Specify a table and import all of its data into the cluster

sqoop-import --append --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --table testsqoop -m 1 --target-dir '/user/hive/warehouse/mmzs.db/testsqoop/' --fields-terminated-by '\t' --lines-terminated-by '\n'

[hadoop@CloudDeskTop software]$ sqoop-import --append --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --table testsqoop -m 1 --target-dir '/user/hive/warehouse/mmzs.db/testsqoop/' --fields-terminated-by '\t' --lines-terminated-by '\n'
17/12/30 22:28:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/30 22:28:31 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/30 22:28:32 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/30 22:28:32 INFO tool.CodeGenTool: Beginning code generation
17/12/30 22:28:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:28:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:28:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.7.3
Note: /tmp/sqoop-hadoop/compile/d427f3a0d1a3328c5dc9ae1bd6cbd988/testsqoop.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/30 22:28:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/d427f3a0d1a3328c5dc9ae1bd6cbd988/testsqoop.jar
17/12/30 22:28:36 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/12/30 22:28:36 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/12/30 22:28:36 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/12/30 22:28:36 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/12/30 22:28:36 INFO mapreduce.ImportJobBase: Beginning import of testsqoop
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/12/30 22:28:36 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/30 22:28:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/30 22:28:45 INFO db.DBInputFormat: Using read commited transaction isolation
17/12/30 22:28:45 INFO mapreduce.JobSubmitter: number of splits:1
17/12/30 22:28:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514638990227_0002
17/12/30 22:28:46 INFO impl.YarnClientImpl: Submitted application application_1514638990227_0002
17/12/30 22:28:47 INFO mapreduce.Job: The url to track the job: http://master01:8088/proxy/application_1514638990227_0002/
17/12/30 22:28:47 INFO mapreduce.Job: Running job: job_1514638990227_0002
17/12/30 22:29:29 INFO mapreduce.Job: Job job_1514638990227_0002 running in uber mode : false
17/12/30 22:29:29 INFO mapreduce.Job: map 0% reduce 0%
17/12/30 22:30:06 INFO mapreduce.Job: map 100% reduce 0%
17/12/30 22:30:07 INFO mapreduce.Job: Job job_1514638990227_0002 completed successfully
17/12/30 22:30:08 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=138842
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=128
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=33630
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=33630
        Total vcore-milliseconds taken by all map tasks=33630
        Total megabyte-milliseconds taken by all map tasks=34437120
    Map-Reduce Framework
        Map input records=10
        Map output records=10
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=177
        CPU time spent (ms)=2490
        Physical memory (bytes) snapshot=109060096
        Virtual memory (bytes) snapshot=850882560
        Total committed heap usage (bytes)=18972672
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=128
17/12/30 22:30:08 INFO mapreduce.ImportJobBase: Transferred 128 bytes in 89.4828 seconds (1.4304 bytes/sec)
17/12/30 22:30:08 INFO mapreduce.ImportJobBase: Retrieved 10 records.
17/12/30 22:30:08 INFO util.AppendUtils: Appending to directory testsqoop

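Check on the cluster whether the data was really imported:

Check in the Hive database whether the data was really imported:

The result confirms that the data was imported into the HDFS cluster successfully.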
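Both imports above use -m 1. The MySQL table in this article was created without a primary key, and in that situation Sqoop needs to be told which column to split on before it can run more than one map task. The sketch below shows how the table import might look with two mappers; it only prints the command, the function name print_parallel_import_cmd is hypothetical, and choosing uid as the split column is an assumption that fits this small table because its values are spread evenly from 1 to 10.

```shell
# Build the table import with several mappers and an explicit split column.
print_parallel_import_cmd() {
  mappers="$1"
  set -- sqoop-import --append \
    --connect jdbc:mysql://192.168.154.134:3306/mmzs \
    --username root -P \
    --table testsqoop --split-by uid -m "$mappers" \
    --target-dir /user/hive/warehouse/mmzs.db/testsqoop/ \
    --fields-terminated-by '\t' --lines-terminated-by '\n'
  printf '%s\n' "$*"                  # dry run: print instead of execute
}

print_parallel_import_cmd 2
```

Here -P makes Sqoop prompt for the password on the console, as its own warning suggests. To run the import for real, the printf line would be replaced by "$@".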
