
Sqoop Installation and Usage


Overview:

  Sqoop is a tool for moving data between Hadoop and relational databases. With Sqoop you can import data from a database (such as MySQL or Oracle) into HDFS, and you can also export data from HDFS into a relational database. Sqoop turns its commands into Hadoop MapReduce jobs (usually map-only): the generated job runs several map tasks in parallel to transfer the data, which makes transfers fast and efficient and would be very hard to reproduce with hand-written multi-threaded shell scripts. Sqoop2 (sqoop-1.99.7) requires configuring a proxy in the Hadoop configuration files and is a heavyweight, embedded install, so this article uses Sqoop1 (Sqoop 1.4.6).
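For orientation, here is a minimal sketch of the two directions; the host, database, table and HDFS path below are placeholders, not the values used later in this article:

sqoop import --connect jdbc:mysql://<mysql-host>:3306/<db> --username <user> -P \
             --table <table> --target-dir /some/hdfs/dir -m 4    # MySQL -> HDFS, 4 parallel map tasks
sqoop export --connect jdbc:mysql://<mysql-host>:3306/<db> --username <user> -P \
             --table <table> --export-dir /some/hdfs/dir -m 4    # HDFS -> MySQL

The -m flag sets how many map tasks (and therefore how many parallel transfers) Sqoop runs; -P prompts for the password instead of putting it on the command line.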

 

Prerequisites (see my earlier posts in the Hadoop category if you don't know how to install any of these):

CloudDeskTop has: hadoop-2.7.3, jdk1.7.0_79, mysql-5.5.32, sqoop-1.4.6, hive-1.2.2
master01 and master02 have: hadoop-2.7.3, jdk1.7.0_79
slave01, slave02 and slave03 have: hadoop-2.7.3, jdk1.7.0_79, zookeeper-3.4.10

I. Installation:

1. Upload the installation package sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz to the /install/ directory

2. Extract:

[hadoop@CloudDeskTop install]$ tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /software/

3. Configure the environment variables:

[hadoop@CloudDeskTop software]$ su -lc "vi /etc/profile"

JAVA_HOME=/software/jdk1.7.0_79
HADOOP_HOME=/software/hadoop-2.7.3
SQOOP_HOME=/software/sqoop-1.4.6
PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/lib:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SQOOP_HOME/bin
export PATH JAVA_HOME HADOOP_HOME SQOOP_HOME

4. After editing the profile, run the following so the configuration takes effect immediately:

[hadoop@CloudDeskTop software]$ source /etc/profile

5. Go to the /software/sqoop-1.4.6/lib/ directory and upload the mysql-connector-java-5.1.43-bin.jar package there

The JDBC driver placed here must be this version (5.1.43): Sqoop talks to the MySQL database through this jar, and using a different driver version is an easy way to run into errors.
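For example, assuming the jar was uploaded to /install/ alongside the Sqoop tarball (that source path is an assumption), copying it into place and verifying looks like this:

[hadoop@CloudDeskTop install]$ cp /install/mysql-connector-java-5.1.43-bin.jar /software/sqoop-1.4.6/lib/
[hadoop@CloudDeskTop install]$ ls /software/sqoop-1.4.6/lib/ | grep mysql-connector
mysql-connector-java-5.1.43-bin.jar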

6. Configure Sqoop

[hadoop@CloudDeskTop software]$ cd /software/sqoop-1.4.6/bin/

[hadoop@CloudDeskTop bin]$ vi configure-sqoop

Comment out the following block (around lines 127-148 of configure-sqoop) by wrapping it in a here-document: put ":<<COMMENT" before it as the start marker and "COMMENT" after it as the end marker:

:<<COMMENT
## Moved to be a runtime check in sqoop.
if [ ! -d "${HBASE_HOME}" ]; then
  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
fi

## Moved to be a runtime check in sqoop.
if [ ! -d "${HCAT_HOME}" ]; then
  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
fi

if [ ! -d "${ACCUMULO_HOME}" ]; then
  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
fi
if [ ! -d "${ZOOKEEPER_HOME}" ]; then
  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
fi
COMMENT
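A quick way to confirm the edit took effect: run any Sqoop command, for example the one below, and check that the HBase/HCatalog/Accumulo warnings no longer print.

[hadoop@CloudDeskTop bin]$ sqoop help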

 

II. Startup (unless noted otherwise, everything is done as the hadoop user)

[0. Start MySQL as the root user on CloudDeskTop]

[root@CloudDeskTop ~]# cd /software/mysql-5.5.32/sbin/ && ./mysqld start && lsof -i:3306 && cd -

[1. Start the ZooKeeper cluster on the slave nodes (a leader and followers are elected among them)]

  cd /software/zookeeper-3.4.10/bin/ && ./zkServer.sh start && cd - && jps
  cd /software/zookeeper-3.4.10/bin/ && ./zkServer.sh status && cd -

[2. Start the HDFS cluster from master01] cd /software/ && start-dfs.sh && jps

[3. Start the YARN cluster from master01] cd /software/ && start-yarn.sh && jps

[Starting YARN does not bring up the ResourceManager on the standby master node, so run the following on master02:]

cd /software/ && yarn-daemon.sh start resourcemanager && jps

[4. Check the processes]
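The original post shows a screenshot here; roughly, running jps on each machine should show the daemons below (the exact list depends on how your HA setup is configured):

jps
# master01 / master02 : NameNode, ResourceManager (master02's ResourceManager only appears after the yarn-daemon.sh step above)
# slave01 - slave03   : DataNode, NodeManager, QuorumPeerMain (ZooKeeper)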

 

[6. Check the Sqoop version to confirm that Sqoop is installed correctly]

 [hadoop@CloudDeskTop software]$ sqoop version

 

 

III. Testing

  Note: the direction of an import or export is defined relative to the HDFS cluster: data flowing out of HDFS is an export, and data flowing into HDFS is an import. Hive table data is actually stored in HDFS, so importing into or exporting from a Hive table is really just operating on files in the HDFS cluster.

First, create the data file locally:
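The post does not show testsqoop.out itself, but from the Hive query in step 1 below it is a tab-separated file of id, name and age, along these lines (ten rows in total, matching the query output further down):

[hadoop@CloudDeskTop test]$ cat testsqoop.out
1	ligang	2
2	chenghua	3
...
10	huping	6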

After creating the table in the Hive database, upload the file to the path in the cluster where the table's data is stored:

[hadoop@CloudDeskTop test]$ hdfs dfs -put testsqoop.out /user/hive/warehouse/mmzs.db/testsqoop

Goal 1: Export the data in the HDFS cluster to the MySQL database

1. Create the table in the Hive database mmzs and load the data

[hadoop@CloudDeskTop software]$ cd /software/hive-1.2.2/bin/
[hadoop@CloudDeskTop bin]$ ./hive
hive> show databases;
OK
default
mmzs
mmzsmysql
Time taken: 0.373 seconds, Fetched: 3 row(s)
hive> create table if not exists mmzs.testsqoop(id int,name string,age int) row format delimited fields terminated by '\t';
OK
Time taken: 0.126 seconds
hive> select * from mmzs.testsqoop;
OK
1    ligang    2
2    chenghua    3
3    liqin    1
4    zhanghua    4
5    wanghua    1
6    liulinjin    5
7    wangxiaochuan    6
8    guchuan    2
9    xiaoyong    4
10    huping    6
Time taken: 0.824 seconds, Fetched: 10 row(s)

2. Create a table with the same columns in the MySQL database

[root@CloudDeskTop bin]# cd ~
[root@CloudDeskTop ~]# cd /software/mysql-5.5.32/bin/
[root@CloudDeskTop bin]# ./mysql -uroot -p123456 -P3306 -h192.168.154.134 -e "create database mmzs character set utf8"
[root@CloudDeskTop bin]# ./mysql -uroot -p123456 -h192.168.154.134 -P3306 -Dmmzs
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.5.32 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show tables;
Empty set (0.00 sec)

mysql> create table if not exists testsqoop(uid int(11),uname varchar(30),age int)engine=innodb charset=utf8
    -> ;
Query OK, 0 rows affected (0.06 sec)

mysql> desc testsqoop;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| uid   | int(11)     | YES  |     | NULL    |       |
| uname | varchar(30) | YES  |     | NULL    |       |
| age   | int(11)     | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> select * from testsqoop;
Empty set (0.01 sec)

3. Use Sqoop to export the Hive table's data into MySQL (the whole HDFS file is exported)

[hadoop@CloudDeskTop software]$ sqoop-export --help

17/12/30 21:54:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                         Specify JDBC connect string
   --connection-manager <class-name>            Specify connection manager class name
   --connection-param-file <properties-file>    Specify connection parameters file
   --driver <class-name>                        Manually specify JDBC driver class to use
   --hadoop-home <hdir>                         Override $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                   Override $HADOOP_MAPRED_HOME_ARG
   --help                                       Print usage instructions
-P                                              Read password from console
   --password <password>                        Set authentication password
   --password-alias <password-alias>            Credential provider password alias
   --password-file <password-file>              Set authentication password file path
   --relaxed-isolation                          Use read-uncommitted isolation for imports
   --skip-dist-cache                            Skip copying jars to distributed cache
   --username <username>                        Set authentication username
   --verbose                                    Print more information while working

Export control arguments:
   --batch                                      Indicates underlying statements to be executed in batch mode
   --call <arg>                                 Populate the table using this stored procedure (one call per row)
   --clear-staging-table                        Indicates that any data in staging table can be deleted
   --columns <col,col,col...>                   Columns to export to table
   --direct                                     Use direct export fast path
   --export-dir <dir>                           HDFS source path for the export
-m,--num-mappers <n>                            Use 'n' map tasks to export in parallel
   --mapreduce-job-name <name>                  Set name for generated mapreduce job
   --staging-table <table-name>                 Intermediate staging table
   --table <table-name>                         Table to populate
   --update-key <key>                           Update records by specified key column
   --update-mode <mode>                         Specifies how updates are performed when new rows are found with non-matching keys in database
   --validate                                   Validate the copy using the configured validator
   --validation-failurehandler <class-name>     Fully qualified class name for ValidationFailureHandler
   --validation-threshold <class-name>          Fully qualified class name for ValidationThreshold
   --validator <class-name>                     Fully qualified class name for the Validator

Input parsing arguments:
   --input-enclosed-by <char>                   Sets a required field encloser
   --input-escaped-by <char>                    Sets the input escape character
   --input-fields-terminated-by <char>          Sets the input field separator
   --input-lines-terminated-by <char>           Sets the input end-of-line char
   --input-optionally-enclosed-by <char>        Sets a field enclosing character

Output line formatting arguments:
   --enclosed-by <char>                         Sets a required field enclosing character
   --escaped-by <char>                          Sets the escape character
   --fields-terminated-by <char>                Sets the field separator character
   --lines-terminated-by <char>                 Sets the end-of-line character
   --mysql-delimiters                           Uses MySQL's default delimiter set: fields: ,  lines: \n  escaped-by: \  optionally-enclosed-by: '
   --optionally-enclosed-by <char>              Sets a field enclosing character

Code generation arguments:
   --bindir <dir>                               Output directory for compiled objects
   --class-name <name>                          Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
   --input-null-non-string <null-str>           Input null non-string representation
   --input-null-string <null-str>               Input null string representation
   --jar-file <file>                            Disable code generation; use specified jar
   --map-column-java <arg>                      Override mapping for specific columns to java types
   --null-non-string <null-str>                 Null non-string representation
   --null-string <null-str>                     Null string representation
   --outdir <dir>                               Output directory for generated code
   --package-name <name>                        Put auto-generated classes in this package

HCatalog arguments:
   --hcatalog-database <arg>                    HCatalog database name
   --hcatalog-home <hdir>                       Override $HCAT_HOME
   --hcatalog-partition-keys <partition-key>    Sets the partition keys to use when importing to hive
   --hcatalog-partition-values <partition-value>  Sets the partition values to use when importing to hive
   --hcatalog-table <arg>                       HCatalog table name
   --hive-home <dir>                            Override $HIVE_HOME
   --hive-partition-key <partition-key>         Sets the partition key to use when importing to hive
   --hive-partition-value <partition-value>     Sets the partition value to use when importing to hive
   --map-column-hive <arg>                      Override mapping for specific column to hive types.

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file>                      specify an application configuration file
-D <property=value>                             use value for given property
-fs <local|namenode:port>                       specify a namenode
-jt <local|resourcemanager:port>                specify a ResourceManager
-files <comma separated list of files>          specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>         specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]


At minimum, you must specify --connect, --export-dir, and --table

# -m specifies the number of map tasks

[hadoop@CloudDeskTop software]$ sqoop-export --export-dir '/user/hive/warehouse/mmzs.db/testsqoop' --fields-terminated-by '\t' --lines-terminated-by '\n' --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --table 'testsqoop' -m 2
17/12/30 22:02:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/30 22:02:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/30 22:02:04 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/30 22:02:04 INFO tool.CodeGenTool: Beginning code generation
17/12/30 22:02:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:02:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:02:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.7.3
Note: /tmp/sqoop-hadoop/compile/e2b7e669ef4d8d43016e44ce1cddb620/testsqoop.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/30 22:02:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e2b7e669ef4d8d43016e44ce1cddb620/testsqoop.jar
17/12/30 22:02:11 INFO mapreduce.ExportJobBase: Beginning export of testsqoop
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/12/30 22:02:11 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/30 22:02:13 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
17/12/30 22:02:13 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/12/30 22:02:13 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/30 22:02:22 INFO input.FileInputFormat: Total input paths to process : 1
17/12/30 22:02:22 INFO input.FileInputFormat: Total input paths to process : 1
17/12/30 22:02:23 INFO mapreduce.JobSubmitter: number of splits:2
17/12/30 22:02:23 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/12/30 22:02:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514638990227_0001
17/12/30 22:02:25 INFO impl.YarnClientImpl: Submitted application application_1514638990227_0001
17/12/30 22:02:25 INFO mapreduce.Job: The url to track the job: http://master01:8088/proxy/application_1514638990227_0001/
17/12/30 22:02:25 INFO mapreduce.Job: Running job: job_1514638990227_0001
17/12/30 22:03:13 INFO mapreduce.Job: Job job_1514638990227_0001 running in uber mode : false
17/12/30 22:03:13 INFO mapreduce.Job:  map 0% reduce 0%
17/12/30 22:03:58 INFO mapreduce.Job:  map 100% reduce 0%
17/12/30 22:03:59 INFO mapreduce.Job: Job job_1514638990227_0001 completed successfully
17/12/30 22:03:59 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=277282
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=484
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters 
        Launched map tasks=2
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=79918
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=79918
        Total vcore-milliseconds taken by all map tasks=79918
        Total megabyte-milliseconds taken by all map tasks=81836032
    Map-Reduce Framework
        Map input records=10
        Map output records=10
        Input split bytes=286
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=386
        CPU time spent (ms)=4950
        Physical memory (bytes) snapshot=216600576
        Virtual memory (bytes) snapshot=1697566720
        Total committed heap usage (bytes)=32874496
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=0
17/12/30 22:03:59 INFO mapreduce.ExportJobBase: Transferred 484 bytes in 105.965 seconds (4.5675 bytes/sec)
17/12/30 22:03:59 INFO mapreduce.ExportJobBase: Exported 10 records.

Takeaway: the run above shows only map tasks; there are no reduce tasks.

4. Query the results again in the MySQL database

mysql> select * from testsqoop;
+------+---------------+------+
| uid  | uname         | age  |
+------+---------------+------+
|    1 | ligang        |    2 |
|    2 | chenghua      |    3 |
|    3 | liqin         |    1 |
|    4 | zhanghua      |    4 |
|    5 | wanghua       |    1 |
|    6 | liulinjin     |    5 |
|    7 | wangxiaochuan |    6 |
|    8 | guchuan       |    2 |
|    9 | xiaoyong      |    4 |
|   10 | huping        |    6 |
+------+---------------+------+
10 rows in set (0.00 sec)

The result confirms that the data was exported to MySQL successfully.

 

Goal 2: Import data from MySQL into the HDFS cluster

1. Delete the data in the testsqoop table of the Hive database mmzs
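The post only shows screenshots for this step; since a Hive table's data is just files in HDFS (see the note at the top of section III), one way to empty the table, assuming the warehouse path used above, is:

[hadoop@CloudDeskTop software]$ hdfs dfs -rm -r /user/hive/warehouse/mmzs.db/testsqoop/*

Running "truncate table mmzs.testsqoop;" from the Hive CLI would have the same effect.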

 

Confirm that the data is really gone:
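For example, listing the table's directory should show no data files, and a select from Hive should return no rows:

[hadoop@CloudDeskTop software]$ hdfs dfs -ls /user/hive/warehouse/mmzs.db/testsqoop/
[hadoop@CloudDeskTop software]$ /software/hive-1.2.2/bin/hive -e "select * from mmzs.testsqoop;"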

2. Import the MySQL data into the HDFS cluster

A. Import a query-selected subset of the data into the cluster. Note that a free-form --query must contain the literal $CONDITIONS token (Sqoop substitutes its split predicate there), and with more than one map task you would also need --split-by; here -m 1 is used.

[hadoop@CloudDeskTop software]$ sqoop-import --append --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --query 'select * from mmzs.testsqoop where uid>3 and $CONDITIONS' -m 1 --target-dir '/user/hive/warehouse/mmzs.db/testsqoop' --fields-terminated-by '\t' --lines-terminated-by '\n'
17/12/30 22:40:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/30 22:40:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/30 22:40:55 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/30 22:40:55 INFO tool.CodeGenTool: Beginning code generation
17/12/30 22:40:55 INFO manager.SqlManager: Executing SQL statement: select * from mmzs.testsqoop where uid>3 and  (1 = 0) 
17/12/30 22:40:55 INFO manager.SqlManager: Executing SQL statement: select * from mmzs.testsqoop where uid>3 and  (1 = 0) 
17/12/30 22:40:55 INFO manager.SqlManager: Executing SQL statement: select * from mmzs.testsqoop where uid>3 and  (1 = 0) 
17/12/30 22:40:55 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.7.3
Note: /tmp/sqoop-hadoop/compile/cd00e059648175875074eed7f4189e0b/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/30 22:40:58 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/cd00e059648175875074eed7f4189e0b/QueryResult.jar
17/12/30 22:40:58 INFO mapreduce.ImportJobBase: Beginning query import.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/12/30 22:40:59 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/30 22:41:01 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/30 22:41:08 INFO db.DBInputFormat: Using read commited transaction isolation
17/12/30 22:41:09 INFO mapreduce.JobSubmitter: number of splits:1
17/12/30 22:41:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514638990227_0003
17/12/30 22:41:10 INFO impl.YarnClientImpl: Submitted application application_1514638990227_0003
17/12/30 22:41:10 INFO mapreduce.Job: The url to track the job: http://master01:8088/proxy/application_1514638990227_0003/
17/12/30 22:41:10 INFO mapreduce.Job: Running job: job_1514638990227_0003
17/12/30 22:41:54 INFO mapreduce.Job: Job job_1514638990227_0003 running in uber mode : false
17/12/30 22:41:54 INFO mapreduce.Job:  map 0% reduce 0%
17/12/30 22:42:29 INFO mapreduce.Job:  map 100% reduce 0%
17/12/30 22:42:31 INFO mapreduce.Job: Job job_1514638990227_0003 completed successfully
17/12/30 22:42:32 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=138692
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=94
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=32275
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=32275
        Total vcore-milliseconds taken by all map tasks=32275
        Total megabyte-milliseconds taken by all map tasks=33049600
    Map-Reduce Framework
        Map input records=7
        Map output records=7
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=170
        CPU time spent (ms)=2020
        Physical memory (bytes) snapshot=109428736
        Virtual memory (bytes) snapshot=851021824
        Total committed heap usage (bytes)=19091456
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=94
17/12/30 22:42:32 INFO mapreduce.ImportJobBase: Transferred 94 bytes in 91.0632 seconds (1.0322 bytes/sec)
17/12/30 22:42:32 INFO mapreduce.ImportJobBase: Retrieved 7 records.
17/12/30 22:42:32 INFO util.AppendUtils: Appending to directory testsqoop

Check in the cluster whether the data really arrived:
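The screenshot is omitted in the post; a command-line check looks like this (part-m-00000 is the file written by the single map task, the same file removed in the cleanup step below):

[hadoop@master01 software]$ hdfs dfs -ls /user/hive/warehouse/mmzs.db/testsqoop/
[hadoop@master01 software]$ hdfs dfs -cat /user/hive/warehouse/mmzs.db/testsqoop/part-m-00000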

 

Check from the Hive database whether the data really arrived:
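Because the --target-dir above is exactly the table's storage directory, the imported rows should be visible from Hive with no further steps, e.g.:

hive> select * from mmzs.testsqoop;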

The result confirms that the data was imported into the HDFS cluster successfully.

Delete the cluster data so the next import starts from a clean slate:

[hadoop@master01 software]$ hdfs dfs -rm -r /user/hive/warehouse/mmzs.db/testsqoop/part-m-00000

B. Import an entire table into the cluster in one go

[hadoop@CloudDeskTop software]$ sqoop-import --append --connect 'jdbc:mysql://192.168.154.134:3306/mmzs' --username 'root' --password '123456' --table testsqoop -m 1 --target-dir '/user/hive/warehouse/mmzs.db/testsqoop/' --fields-terminated-by '\t' --lines-terminated-by '\n'
17/12/30 22:28:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/30 22:28:31 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/30 22:28:32 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/30 22:28:32 INFO tool.CodeGenTool: Beginning code generation
17/12/30 22:28:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:28:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `testsqoop` AS t LIMIT 1
17/12/30 22:28:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.7.3
Note: /tmp/sqoop-hadoop/compile/d427f3a0d1a3328c5dc9ae1bd6cbd988/testsqoop.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/30 22:28:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/d427f3a0d1a3328c5dc9ae1bd6cbd988/testsqoop.jar
17/12/30 22:28:36 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/12/30 22:28:36 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/12/30 22:28:36 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/12/30 22:28:36 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/12/30 22:28:36 INFO mapreduce.ImportJobBase: Beginning import of testsqoop
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/12/30 22:28:36 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/30 22:28:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/30 22:28:45 INFO db.DBInputFormat: Using read commited transaction isolation
17/12/30 22:28:45 INFO mapreduce.JobSubmitter: number of splits:1
17/12/30 22:28:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514638990227_0002
17/12/30 22:28:46 INFO impl.YarnClientImpl: Submitted application application_1514638990227_0002
17/12/30 22:28:47 INFO mapreduce.Job: The url to track the job: http://master01:8088/proxy/application_1514638990227_0002/
17/12/30 22:28:47 INFO mapreduce.Job: Running job: job_1514638990227_0002
17/12/30 22:29:29 INFO mapreduce.Job: Job job_1514638990227_0002 running in uber mode : false
17/12/30 22:29:29 INFO mapreduce.Job:  map 0% reduce 0%
17/12/30 22:30:06 INFO mapreduce.Job:  map 100% reduce 0%
17/12/30 22:30:07 INFO mapreduce.Job: Job job_1514638990227_0002 completed successfully
17/12/30 22:30:08 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=138842
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=128
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=33630
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=33630
        Total vcore-milliseconds taken by all map tasks=33630
        Total megabyte-milliseconds taken by all map tasks=34437120
    Map-Reduce Framework
        Map input records=10
        Map output records=10
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=177
        CPU time spent (ms)=2490
        Physical memory (bytes) snapshot=109060096
        Virtual memory (bytes) snapshot=850882560
        Total committed heap usage (bytes)=18972672
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=128
17/12/30 22:30:08 INFO mapreduce.ImportJobBase: Transferred 128 bytes in 89.4828 seconds (1.4304 bytes/sec)
17/12/30 22:30:08 INFO mapreduce.ImportJobBase: Retrieved 10 records.
17/12/30 22:30:08 INFO util.AppendUtils: Appending to directory testsqoop

Check in the cluster whether the data really arrived:

Check from the Hive database whether the data really arrived:

 

The result confirms that the data was imported into the HDFS cluster successfully.

 

