Big Data Tutorial (13.6): Using Sqoop


In the previous chapter, I covered installing the Sqoop data-migration tool and walked through a simple import example. In this post, I will continue sharing how to use Sqoop.

I. Importing Data with Sqoop

(1) Importing a relational table into Hive

./sqoop import --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp --hive-import --m 1

Running it fails with an error:

[hadoop@centos-aaron-h1 bin]$ ./sqoop import --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp --hive-import --m 1
Warning: /home/hadoop/sqoop/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 18:46:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 18:46:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 18:46:49 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/03/18 18:46:49 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/03/18 18:46:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 18:46:49 INFO tool.CodeGenTool: Beginning code generation
19/03/18 18:46:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 18:46:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 18:46:49 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/b0cd7f379424039f4df44ee2b703c3d0/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 18:46:51 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/b0cd7f379424039f4df44ee2b703c3d0/emp.jar
19/03/18 18:46:51 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 18:46:51 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 18:46:51 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 18:46:51 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 18:46:51 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 18:46:51 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 18:46:52 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 18:46:52 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 18:46:54 INFO mapreduce.JobSubmitter: number of splits:1
19/03/18 18:46:54 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/18 18:46:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552898029697_0003
19/03/18 18:46:54 INFO impl.YarnClientImpl: Submitted application application_1552898029697_0003
19/03/18 18:46:54 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1552898029697_0003/
19/03/18 18:46:54 INFO mapreduce.Job: Running job: job_1552898029697_0003
19/03/18 18:47:06 INFO mapreduce.Job: Job job_1552898029697_0003 running in uber mode : false
19/03/18 18:47:06 INFO mapreduce.Job: map 0% reduce 0%
19/03/18 18:47:13 INFO mapreduce.Job: map 100% reduce 0%
19/03/18 18:47:13 INFO mapreduce.Job: Job job_1552898029697_0003 completed successfully
19/03/18 18:47:13 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=206933
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=151
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3950
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=3950
        Total vcore-milliseconds taken by all map tasks=3950
        Total megabyte-milliseconds taken by all map tasks=4044800
    Map-Reduce Framework
        Map input records=5
        Map output records=5
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=65
        CPU time spent (ms)=680
        Physical memory (bytes) snapshot=135651328
        Virtual memory (bytes) snapshot=1715556352
        Total committed heap usage (bytes)=42860544
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=151
19/03/18 18:47:13 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 21.0263 seconds (7.1815 bytes/sec)
19/03/18 18:47:13 INFO mapreduce.ImportJobBase: Retrieved 5 records.
19/03/18 18:47:13 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table emp
19/03/18 18:47:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 18:47:13 WARN hive.TableDefWriter: Column salary had to be cast to a less precise type in Hive
19/03/18 18:47:13 INFO hive.HiveImport: Loading uploaded data into Hive
19/03/18 18:47:13 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
19/03/18 18:47:13 ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
    at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
    at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
    at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
    at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
    at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
    ... 12 more

Solution:

# Check whether the HiveConf.class class exists
[hadoop@centos-aaron-h1 lib]$ cd /home/hadoop/apps/apache-hive-1.2.2-bin/lib
[hadoop@centos-aaron-h1 lib]$ jar tf hive-common-1.2.2.jar | grep HiveConf.class
org/apache/hadoop/hive/conf/HiveConf.class
[hadoop@centos-aaron-h1 lib]$

HiveConf.class clearly exists; the environment simply could not find it.
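When it is not obvious which jar holds a class, the `jar tf ... | grep` check can be scripted. The following is a minimal Python sketch, not part of Sqoop or Hive: the helper name `find_class_in_jars` is hypothetical, and it relies only on the fact that jar files are zip archives whose entry paths mirror the package name.

```python
import zipfile
from pathlib import Path


def find_class_in_jars(lib_dir, class_name):
    """Hypothetical helper: list the .jar files in lib_dir that contain class_name.

    class_name is fully qualified, e.g. 'org.apache.hadoop.hive.conf.HiveConf'.
    """
    entry = class_name.replace(".", "/") + ".class"  # path of the class inside a jar
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid jar/zip archives
    return hits
```

Pointed at `/home/hadoop/apps/apache-hive-1.2.2-bin/lib` with `org.apache.hadoop.hive.conf.HiveConf`, it should report `hive-common-1.2.2.jar`, which agrees with the grep result above.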

Modify the environment configuration to add Hive's lib directory to HADOOP_CLASSPATH:

# Edit the environment variables and add the following lines
vi /etc/profile
export HADOOP_CLASSPATH=/home/hadoop/apps/hadoop-2.9.1/lib/*
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/hadoop/apps/apache-hive-1.2.2-bin/lib/*
# Apply the environment variables
source /etc/profile

Running it again fails because the temporary directory from the previous emp import already exists and must be deleted.

Solution:

hdfs dfs -rm -r /user/hadoop/emp

Running it again succeeds:

[hadoop@centos-aaron-h1 bin]$ ./sqoop import --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp --hive-import --m 1
Warning: /home/hadoop/sqoop/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 19:15:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 19:15:15 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 19:15:15 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/03/18 19:15:15 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/03/18 19:15:15 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 19:15:15 INFO tool.CodeGenTool: Beginning code generation
19/03/18 19:15:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 19:15:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 19:15:15 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/e3a407469bc365c026d8fabf4e264f38/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 19:15:17 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e3a407469bc365c026d8fabf4e264f38/emp.jar
19/03/18 19:15:17 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 19:15:17 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 19:15:17 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 19:15:17 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 19:15:17 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 19:15:18 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 19:15:18 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 19:15:19 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 19:15:20 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/18 19:15:20 INFO mapreduce.JobSubmitter: number of splits:1
19/03/18 19:15:20 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/18 19:15:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552898029697_0004
19/03/18 19:15:21 INFO impl.YarnClientImpl: Submitted application application_1552898029697_0004
19/03/18 19:15:21 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1552898029697_0004/
19/03/18 19:15:21 INFO mapreduce.Job: Running job: job_1552898029697_0004
19/03/18 19:15:28 INFO mapreduce.Job: Job job_1552898029697_0004 running in uber mode : false
19/03/18 19:15:28 INFO mapreduce.Job: map 0% reduce 0%
19/03/18 19:15:34 INFO mapreduce.Job: map 100% reduce 0%
19/03/18 19:15:34 INFO mapreduce.Job: Job job_1552898029697_0004 completed successfully
19/03/18 19:15:34 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=206933
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=151
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3734
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=3734
        Total vcore-milliseconds taken by all map tasks=3734
        Total megabyte-milliseconds taken by all map tasks=3823616
    Map-Reduce Framework
        Map input records=5
        Map output records=5
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=59
        CPU time spent (ms)=540
        Physical memory (bytes) snapshot=129863680
        Virtual memory (bytes) snapshot=1715556352
        Total committed heap usage (bytes)=42860544
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=151
19/03/18 19:15:34 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 15.9212 seconds (9.4842 bytes/sec)
19/03/18 19:15:34 INFO mapreduce.ImportJobBase: Retrieved 5 records.
19/03/18 19:15:34 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table emp
19/03/18 19:15:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 19:15:34 WARN hive.TableDefWriter: Column salary had to be cast to a less precise type in Hive
19/03/18 19:15:34 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/home/hadoop/apps/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
OK
Time taken: 2.138 seconds
Loading data to table default.emp
Table default.emp stats: [numFiles=1, totalSize=151]
OK
Time taken: 0.547 seconds

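Check the result (Hive's default field delimiter, \001 or Ctrl-A, is a non-printing character, so the fields look run together):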

[hadoop@centos-aaron-h1 bin]$ hadoop fs -cat /user/hive/warehouse/emp/part-m-00000
1gopalmanager50000.00TP
2manishaProof reader50000.00TP
3khalilphp dev30000.00AC
4prasanthphp dev30000.00AC
5kranthiadmin20000.00TP
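A quick way to confirm that these rows are separated by the \001 (Ctrl-A) field delimiter rather than truly concatenated is to split a sample in Python. This sketch assumes Hive's default \001 field delimiter and \n line delimiter, which is what Sqoop's "Using Hive-specific delimiters for output" log line points to; `parse_hive_text` is a hypothetical helper, and the sample string is the first two rows above with the invisible delimiters written out.

```python
# Hive's default field delimiter (Ctrl-A, octal \001); assumed here because
# --hive-import was run without --fields-terminated-by
HIVE_FIELD_DELIM = "\x01"


def parse_hive_text(text, delim=HIVE_FIELD_DELIM):
    """Hypothetical helper: split Hive text-file content into rows of fields."""
    return [line.split(delim) for line in text.splitlines() if line]


# The first two rows of part-m-00000, with the invisible delimiters written out
sample = (
    "1\x01gopal\x01manager\x0150000.00\x01TP\n"
    "2\x01manisha\x01Proof reader\x0150000.00\x01TP\n"
)

for row in parse_hive_text(sample):
    print(row)
```

On the cluster itself, piping the file through `cat -v` (for example `hadoop fs -cat /user/hive/warehouse/emp/part-m-00000 | cat -v`) should display the delimiter as `^A`.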

(2) Specifying row and column delimiters, enabling hive-import, overwriting on import, creating the Hive table automatically, naming the target table, and deleting the intermediate-result directory

./sqoop import \
  --connect jdbc:mysql://centos-aaron-03:3306/test \
  --username root \
  --password 123456 \
  --table emp \
  --fields-terminated-by "\t" \
  --lines-terminated-by "\n" \
  --hive-import \
  --hive-overwrite \
  --create-hive-table \
  --delete-target-dir \
  --hive-database mydb_test \
  --hive-table emp

The run fails at the very end because the Hive database cannot be found.

Manually create the mydb_test database:

hive> create database mydb_test;
OK
Time taken: 0.678 seconds
hive>

Running it again still reports that the Hive database cannot be found, even though checking with a command shows that the database exists.

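Solution: copy hive-site.xml from hive/conf into Sqoop's conf directory. The database really does exist in Hive; the error is caused by an outdated configuration file under Sqoop, which typically happens when Sqoop is run on a different machine. On CDH, the default paths are: copy /etc/hive/conf/hive-site.xml to /etc/sqoop/conf/hive-site.xml.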
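If Sqoop runs on several machines, the copy can be scripted so that Sqoop's conf directory always holds the newest hive-site.xml. This is a minimal Python sketch: `sync_hive_site` is a hypothetical helper, and using file modification times to decide what counts as "too old" is an assumption of this example, not something Sqoop itself checks.

```python
import shutil
from pathlib import Path


def sync_hive_site(hive_conf_dir, sqoop_conf_dir):
    """Hypothetical helper: copy hive-site.xml into Sqoop's conf dir when
    Sqoop's copy is missing or older. Returns True if a copy was made."""
    src = Path(hive_conf_dir) / "hive-site.xml"
    dst = Path(sqoop_conf_dir) / "hive-site.xml"
    if not src.is_file():
        raise FileNotFoundError(src)
    if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
        return False  # Sqoop's copy is already at least as new
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copy2 also carries over the modification time
    return True
```

For the CDH layout mentioned above this would be called as `sync_hive_site("/etc/hive/conf", "/etc/sqoop/conf")`; for the tarball install used in this tutorial, the Sqoop conf directory is presumably `/home/hadoop/sqoop/conf`.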

Running it again succeeds:

[hadoop@centos-aaron-h1 bin]$ cd ~/sqoop/bin
[hadoop@centos-aaron-h1 bin]$ ./sqoop import \
> --connect jdbc:mysql://centos-aaron-03:3306/test \
> --username root \
> --password 123456 \
> --table emp \
> --fields-terminated-by "\t" \
> --lines-terminated-by "\n" \
> --hive-import \
> --hive-overwrite \
> --create-hive-table \
> --delete-target-dir \
> --hive-database mydb_test \
> --hive-table emp
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 20:49:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 20:49:59 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 20:49:59 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 20:49:59 INFO tool.CodeGenTool: Beginning code generation
19/03/18 20:50:00 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 20:50:00 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 20:50:00 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/7a157b339316952d30024e165d5db00d/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 20:50:01 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/7a157b339316952d30024e165d5db00d/emp.jar
19/03/18 20:50:03 INFO tool.ImportTool: Destination directory emp deleted.
19/03/18 20:50:03 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 20:50:03 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 20:50:03 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 20:50:03 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 20:50:03 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 20:50:03 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 20:50:03 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 20:50:03 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 20:50:04 INFO mapreduce.JobSubmitter: number of splits:5
19/03/18 20:50:04 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/18 20:50:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552898029697_0016
19/03/18 20:50:05 INFO impl.YarnClientImpl: Submitted application application_1552898029697_0016
19/03/18 20:50:05 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1552898029697_0016/
19/03/18 20:50:05 INFO mapreduce.Job: Running job: job_1552898029697_0016
19/03/18 20:50:12 INFO mapreduce.Job: Job job_1552898029697_0016 running in uber mode : false
19/03/18 20:50:12 INFO mapreduce.Job: map 0% reduce 0%
19/03/18 20:50:18 INFO mapreduce.Job: map 20% reduce 0%
19/03/18 20:50:21 INFO mapreduce.Job: map 40% reduce 0%
19/03/18 20:50:22 INFO mapreduce.Job: map 100% reduce 0%
19/03/18 20:50:23 INFO mapreduce.Job: Job job_1552898029697_0016 completed successfully
19/03/18 20:50:23 INFO mapreduce.Job: Counters: 31
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=1034665
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=491
        HDFS: Number of bytes written=151
        HDFS: Number of read operations=20
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=10
    Job Counters
        Killed map tasks=1
        Launched map tasks=5
        Other local map tasks=5
        Total time spent by all maps in occupied slots (ms)=32416
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=32416
        Total vcore-milliseconds taken by all map tasks=32416
        Total megabyte-milliseconds taken by all map tasks=33193984
    Map-Reduce Framework
        Map input records=5
        Map output records=5
        Input split bytes=491
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=1240
        CPU time spent (ms)=3190
        Physical memory (bytes) snapshot=660529152
        Virtual memory (bytes) snapshot=8577761280
        Total committed heap usage (bytes)=214302720
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=151
19/03/18 20:50:23 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 20.6001 seconds (7.3301 bytes/sec)
19/03/18 20:50:23 INFO mapreduce.ImportJobBase: Retrieved 5 records.
19/03/18 20:50:23 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table emp
19/03/18 20:50:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 20:50:23 WARN hive.TableDefWriter: Column salary had to be cast to a less precise type in Hive
19/03/18 20:50:23 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/home/hadoop/apps/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
OK
Time taken: 1.131 seconds
Loading data to table mydb_test.emp
Table mydb_test.emp stats: [numFiles=5, numRows=0, totalSize=151, rawDataSize=0]
OK
Time taken: 0.575 seconds
[hadoop@centos-aaron-h1 bin]$

Check the resulting data:

[hadoop@centos-aaron-h1 bin]$ hive
Logging initialized using configuration in jar:file:/home/hadoop/apps/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive> show databases;
OK
default
mydb_test
wcc_log
Time taken: 0.664 seconds, Fetched: 3 row(s)
hive> use mydb_test;
OK
Time taken: 0.027 seconds
hive> show tables;
OK
emp
Time taken: 0.038 seconds, Fetched: 1 row(s)
hive> select * from emp;
OK
1	gopal	manager	50000.0	TP
2	manisha	Proof reader	50000.0	TP
3	khalil	php dev	30000.0	AC
4	prasanth	php dev	30000.0	AC
5	kranthi	admin	20000.0	TP
Time taken: 0.634 seconds, Fetched: 5 row(s)
hive>

The statement above is equivalent to:

sqoop import \
  --connect jdbc:mysql://centos-aaron-03:3306/test \
  --username root \
  --password 123456 \
  --table emp \
  --fields-terminated-by "\t" \
  --lines-terminated-by "\n" \
  --hive-import \
  --hive-overwrite \
  --create-hive-table \
  --hive-table mydb_test.emp \
  --delete-target-dir

(3) Importing into a specified HDFS directory

When importing table data into HDFS with the Sqoop import tool, we can specify the target directory. The Sqoop import option for specifying the target directory is:

--target-dir

The following command imports the emp table data into the '/queryresult' directory.

./sqoop import \
  --connect jdbc:mysql://centos-aaron-03:3306/test \
  --username root \
  --password 123456 \
  --target-dir /queryresult \
  --table emp --m 1

Execution result:

[hadoop@centos-aaron-h1 bin]$ ./sqoop import \
> --connect jdbc:mysql://centos-aaron-03:3306/test \
> --username root \
> --password 123456 \
> --target-dir /queryresult \
> --table emp --m 1
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 21:00:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 21:00:59 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 21:00:59 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 21:00:59 INFO tool.CodeGenTool: Beginning code generation
19/03/18 21:00:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 21:00:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 21:00:59 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/433dbe7d1d24f817e00a85bf0d78eb42/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 21:01:01 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/433dbe7d1d24f817e00a85bf0d78eb42/emp.jar
19/03/18 21:01:01 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 21:01:01 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 21:01:01 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 21:01:01 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 21:01:01 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 21:01:01 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 21:01:02 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 21:01:02 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 21:01:04 INFO mapreduce.JobSubmitter: number of splits:1
19/03/18 21:01:04 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/18 21:01:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552898029697_0017
19/03/18 21:01:04 INFO impl.YarnClientImpl: Submitted application application_1552898029697_0017
19/03/18 21:01:04 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1552898029697_0017/
19/03/18 21:01:04 INFO mapreduce.Job: Running job: job_1552898029697_0017
19/03/18 21:01:11 INFO mapreduce.Job: Job job_1552898029697_0017 running in uber mode : false
19/03/18 21:01:11 INFO mapreduce.Job: map 0% reduce 0%
19/03/18 21:01:17 INFO mapreduce.Job: map 100% reduce 0%
19/03/18 21:01:17 INFO mapreduce.Job: Job job_1552898029697_0017 completed successfully
19/03/18 21:01:17 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=206929
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=151
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3157
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=3157
        Total vcore-milliseconds taken by all map tasks=3157
        Total megabyte-milliseconds taken by all map tasks=3232768
    Map-Reduce Framework
        Map input records=5
        Map output records=5
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=60
        CPU time spent (ms)=530
        Physical memory (bytes) snapshot=133115904
        Virtual memory (bytes) snapshot=1715552256
        Total committed heap usage (bytes)=42860544
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=151
19/03/18 21:01:17 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 14.555 seconds (10.3744 bytes/sec)
19/03/18 21:01:17 INFO mapreduce.ImportJobBase: Retrieved 5 records.

Check the resulting data:

[hadoop@centos-aaron-h1 bin]$ hdfs dfs -ls /queryresult
Found 2 items
-rw-r--r-- 2 hadoop supergroup 0 2019-03-18 21:01 /queryresult/_SUCCESS
-rw-r--r-- 2 hadoop supergroup 151 2019-03-18 21:01 /queryresult/part-m-00000
[hadoop@centos-aaron-h1 bin]$ hdfs dfs -cat /queryresult/part-m-00000
1,gopal,manager,50000.00,TP
2,manisha,Proof reader,50000.00,TP
3,khalil,php dev,30000.00,AC
4,prasanth,php dev,30000.00,AC
5,kranthi,admin,20000.00,TP
[hadoop@centos-aaron-h1 bin]$
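Because a plain HDFS import writes commas between fields and newlines between records by default, the file can be read back with Python's csv module. This is a sketch under two assumptions: the column names come from the `--columns` list used in the export section, and no field contains a comma or quote (Sqoop does not enclose fields unless asked to, so such values would break this naive parsing). `parse_sqoop_text` is a hypothetical helper.

```python
import csv
import io
from decimal import Decimal

# Column order assumed from the export section's --columns="id,name,deg,salary,dept"
COLUMNS = ["id", "name", "deg", "salary", "dept"]


def parse_sqoop_text(text):
    """Hypothetical helper: parse comma-delimited Sqoop output into dicts."""
    records = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:  # skip blank lines
            continue
        rec = dict(zip(COLUMNS, fields))
        rec["id"] = int(rec["id"])
        rec["salary"] = Decimal(rec["salary"])  # exact; keeps "50000.00" as written
        records.append(rec)
    return records


part_m_00000 = """1,gopal,manager,50000.00,TP
2,manisha,Proof reader,50000.00,TP
3,khalil,php dev,30000.00,AC
4,prasanth,php dev,30000.00,AC
5,kranthi,admin,20000.00,TP
"""

for rec in parse_sqoop_text(part_m_00000):
    print(rec["id"], rec["name"], rec["salary"])
```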

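(4) Importing a subset of table data
Using the "where" clause of the Sqoop import tool, we can import a subset of a table. Sqoop runs the corresponding SQL query on the database server and stores the result in the target directory in HDFS.
The where clause syntax is: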

--where

The following command imports a subset of the emp table data: the subset query retrieves the employee whose ID is 3.

./sqoop import \
  --connect jdbc:mysql://centos-aaron-03:3306/test \
  --username root \
  --password 123456 \
  --where "id = 3" \
  --target-dir /wherequery \
  --table emp --m 1
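Conceptually, `--where` adds a filter that the source database evaluates, so only matching rows ever reach HDFS. The Python sketch below illustrates this with the standard-library sqlite3 module standing in for MySQL (an assumption made purely so the example runs anywhere); `make_emp_db` and `import_subset` are hypothetical helpers, and the rows are the five sample employees shown earlier.

```python
import sqlite3

# The five sample rows of the emp table used throughout this tutorial
EMP_ROWS = [
    (1, "gopal", "manager", 50000.00, "TP"),
    (2, "manisha", "Proof reader", 50000.00, "TP"),
    (3, "khalil", "php dev", 30000.00, "AC"),
    (4, "prasanth", "php dev", 30000.00, "AC"),
    (5, "kranthi", "admin", 20000.00, "TP"),
]


def make_emp_db():
    """Hypothetical helper: an in-memory stand-in for the MySQL emp table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, deg TEXT,"
        " salary REAL, dept TEXT)"
    )
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?, ?)", EMP_ROWS)
    return conn


def import_subset(conn, where):
    """Hypothetical helper: return only the rows matching a --where-style filter.

    The expression is spliced into the SQL text, so it must be trusted input,
    like a filter typed on the command line.
    """
    return conn.execute(f"SELECT * FROM emp WHERE {where}").fetchall()


print(import_subset(make_emp_db(), "id = 3"))
```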

Execution result:

(5) Importing on demand with a free-form query

./sqoop import \
  --connect jdbc:mysql://centos-aaron-03:3306/test \
  --username root \
  --password 123456 \
  --target-dir /wherequery2 \
  --query 'select id,name,deg from emp WHERE id>2 and $CONDITIONS' \
  --split-by id \
  --fields-terminated-by '\t' \
  --m 1
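With `--query`, the literal `$CONDITIONS` token is where Sqoop injects each mapper's own range predicate on the `--split-by` column; with `--m 1`, as here, a single mapper reads everything. The Python sketch below illustrates the idea with sqlite3 standing in for MySQL. It is a deliberately simplified model, not Sqoop's actual splitter: the even integer ranges, the bounding query, and the helper names `split_conditions` and `run_free_form` are all assumptions of this example.

```python
import sqlite3

EMP_ROWS = [
    (1, "gopal", "manager", 50000.00, "TP"),
    (2, "manisha", "Proof reader", 50000.00, "TP"),
    (3, "khalil", "php dev", 30000.00, "AC"),
    (4, "prasanth", "php dev", 30000.00, "AC"),
    (5, "kranthi", "admin", 20000.00, "TP"),
]


def split_conditions(col, lo, hi, num_mappers):
    """Hypothetical, simplified splitter: contiguous integer ranges over [lo, hi]."""
    if num_mappers <= 1:
        return ["1 = 1"]  # a single mapper needs no range restriction
    n = hi - lo + 1
    parts = min(num_mappers, n)
    # bounds[0] == lo and bounds[-1] == hi + 1, so every value lands in exactly one range
    bounds = [lo + (n * i) // parts for i in range(parts + 1)]
    return [f"{col} >= {bounds[i]} AND {col} < {bounds[i + 1]}" for i in range(parts)]


def run_free_form(conn, query, split_col, num_mappers):
    """Hypothetical helper: run a $CONDITIONS query once per split, like parallel mappers."""
    everything = query.replace("$CONDITIONS", "(1 = 1)")
    lo, hi = conn.execute(
        f"SELECT MIN({split_col}), MAX({split_col}) FROM ({everything}) AS t1"
    ).fetchone()
    if lo is None:  # the query matched no rows at all
        return []
    rows = []
    for cond in split_conditions(split_col, lo, hi, num_mappers):
        rows.extend(conn.execute(query.replace("$CONDITIONS", f"({cond})")).fetchall())
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, deg TEXT, salary REAL, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?, ?)", EMP_ROWS)

QUERY = "select id,name,deg from emp WHERE id>2 and $CONDITIONS"
print(split_conditions("id", 3, 5, 2))
print(run_free_form(conn, QUERY, "id", 2))
```

Whatever the number of mappers, the union of the per-split results should equal the single-mapper result, which is the property the real splitter also has to guarantee.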

Execution result:

(6) Incremental import

Incremental import is a technique that imports only the rows newly added to a table. It requires the 'incremental', 'check-column', and 'last-value' options.
The incremental options of the Sqoop import command use the following syntax:

--incremental --check-column --last-value

Suppose the following new row has been added to the emp table:

6, satish p, grp des, 20000, GR

The following command performs an incremental import on the emp table:

./sqoop import \
  --connect jdbc:mysql://centos-aaron-03:3306/test \
  --username root \
  --password 123456 \
  --table emp --m 1 \
  --target-dir /wherequery \
  --incremental append \
  --check-column id \
  --last-value 5
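The bookkeeping behind `--incremental append` can be modeled in a few lines: find the current maximum of the check column, import only rows above `--last-value` (up to that maximum), and remember the maximum as the next `--last-value`. The sketch below does this against sqlite3 as a stand-in for MySQL; `incremental_append` is a hypothetical helper, and capping the range at a maximum read up front is an assumption of this example rather than a description of Sqoop's internals.

```python
import sqlite3


def incremental_append(conn, table, check_col, last_value):
    """Hypothetical helper: return (new_rows, next_last_value) for append mode."""
    (upper,) = conn.execute(f"SELECT MAX({check_col}) FROM {table}").fetchone()
    if upper is None or upper <= last_value:
        return [], last_value  # nothing new since the previous run
    rows = conn.execute(
        f"SELECT * FROM {table} WHERE {check_col} > ? AND {check_col} <= ?"
        f" ORDER BY {check_col}",
        (last_value, upper),
    ).fetchall()
    return rows, upper


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, deg TEXT, salary REAL, dept TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?, ?)",
    [
        (1, "gopal", "manager", 50000.00, "TP"),
        (2, "manisha", "Proof reader", 50000.00, "TP"),
        (3, "khalil", "php dev", 30000.00, "AC"),
        (4, "prasanth", "php dev", 30000.00, "AC"),
        (5, "kranthi", "admin", 20000.00, "TP"),
        (6, "satish p", "grp des", 20000.00, "GR"),  # the newly added row
    ],
)

new_rows, last_value = incremental_append(conn, "emp", "id", 5)
print(new_rows, last_value)
```

In practice, Sqoop prints the arguments to use for the next incremental run when it finishes, and saving the import as a Sqoop job lets it keep track of the last value automatically.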

Execution result:

II. Exporting Data with Sqoop

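Export moves data from HDFS into an RDBMS. Before exporting, the target table must already exist in the target database. By default, the data in the files is inserted into the table with INSERT statements; in update mode, Sqoop generates UPDATE statements to update the table's data instead.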
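The difference between the default insert behavior and update mode can be pictured as the SQL issued per record. The Python below is an illustration only, with sqlite3 standing in for MySQL; `export_records` is a hypothetical helper, and its column list is assumed from the `--columns` option in this section's export example. In Sqoop itself, update mode is selected with the `--update-key` option.

```python
import sqlite3

# Column order assumed from --columns="id,name,deg,salary,dept"
COLUMNS = ["id", "name", "deg", "salary", "dept"]


def export_records(conn, table, lines, update_key=None, delim=","):
    """Hypothetical helper: write delimited records into a table.

    Without update_key each record becomes an INSERT (the default export
    behavior); with update_key each record becomes an UPDATE matched on it.
    """
    col_list = ", ".join(COLUMNS)
    placeholders = ", ".join("?" for _ in COLUMNS)
    for line in lines:
        if not line:
            continue
        values = line.split(delim)
        if update_key is None:
            conn.execute(f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", values)
        else:
            set_cols = [c for c in COLUMNS if c != update_key]
            assignments = ", ".join(f"{c} = ?" for c in set_cols)
            params = [values[COLUMNS.index(c)] for c in set_cols]
            params.append(values[COLUMNS.index(update_key)])
            conn.execute(f"UPDATE {table} SET {assignments} WHERE {update_key} = ?", params)
    conn.commit()


conn = sqlite3.connect(":memory:")
# All-TEXT columns keep the sketch simple; the real MySQL table is typed
conn.execute("CREATE TABLE employee (id TEXT PRIMARY KEY, name TEXT, deg TEXT, salary TEXT, dept TEXT)")
export_records(conn, "employee", ["1,gopal,manager,50000.00,TP", "2,manisha,Proof reader,50000.00,TP"])
print(conn.execute("SELECT * FROM employee").fetchall())
```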

Syntax:

The export command syntax is:

sqoop export (generic-args) (export-args)

Example:

The data is in the file /queryresult/part-m-00000 under the "/queryresult" directory in HDFS. The output of hdfs dfs -cat /queryresult/part-m-00000 is:

1,gopal,manager,50000.00,TP
2,manisha,Proof reader,50000.00,TP
3,khalil,php dev,30000.00,AC
4,prasanth,php dev,30000.00,AC
5,kranthi,admin,20000.00,TP

(1) First, manually create the target table in MySQL

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| azkaban            |
| hive               |
| hivedb             |
| mysql              |
| performance_schema |
| test               |
| urldb              |
| web_log_wash       |
+--------------------+
9 rows in set (0.00 sec)
mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> CREATE TABLE employee (
    -> id INT NOT NULL PRIMARY KEY,
    -> name VARCHAR(20),
    -> deg VARCHAR(20),
    -> salary INT,
    -> dept VARCHAR(10));
Query OK, 0 rows affected (0.02 sec)
Aborted

(2) Then run the export command

./sqoop export \
  --connect "jdbc:mysql://centos-aaron-03:3306/test?useUnicode=true&characterEncoding=utf-8" \
  --username root \
  --password 123456 \
  --table employee \
  --fields-terminated-by "," \
  --export-dir /queryresult/part-m-00000 \
  --columns="id,name,deg,salary,dept"

It fails with an error.

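The specific problem is that the data contains Chinese characters, which the table's character encoding does not support.
The solution is as follows:
export the table's data, drop the table, and recreate it with DEFAULT CHARSET=utf8.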
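Why the character set matters can be shown without a database: a value can only be stored in a column whose character set can encode it. This Python sketch assumes the original table used MySQL's latin1, a common default on MySQL 5.x; MySQL's latin1 corresponds to Windows cp1252, which is the codec used below. `fits_charset` is a hypothetical helper and the Chinese sample value is made up for illustration.

```python
def fits_charset(text, codec):
    """Hypothetical helper: True if every character of text encodes with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


sample = "张三"  # hypothetical Chinese value, not from the tutorial's data
print(fits_charset(sample, "cp1252"))          # False: outside the latin1/cp1252 repertoire
print(fits_charset(sample, "utf-8"))           # True
print(fits_charset("Proof reader", "cp1252"))  # True: plain ASCII fits either way
```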

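It still fails. Analysis confirms that the data in HDFS does not match the INT columns defined when the table was created (salary values such as 50000.00 are not integers), so the INT column must be changed to the DECIMAL type.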
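The INT/DECIMAL mismatch is easy to reproduce outside Sqoop: a value such as 50000.00 from part-m-00000 is not a valid integer literal, while a decimal type accepts it exactly. The Python sketch below mimics that per-column parsing as an analogy only; Sqoop's generated record class does its own parsing in Java, and `convert_field` with its type names is hypothetical.

```python
from decimal import Decimal


def convert_field(text, sql_type):
    """Hypothetical helper: parse one text field for a typed target column."""
    if sql_type == "INT":
        return int(text)      # raises ValueError for "50000.00"
    if sql_type == "DECIMAL":
        return Decimal(text)  # exact value, keeps the two decimal places
    return text               # VARCHAR-like columns take the text unchanged


salary = "50000.00"  # a salary field from /queryresult/part-m-00000
try:
    convert_field(salary, "INT")
except ValueError as exc:
    print("INT column rejects it:", exc)
print("DECIMAL column accepts it:", convert_field(salary, "DECIMAL"))
```

On the MySQL side, a definition such as `salary DECIMAL(10,2)` would be one way to hold these values; the precision and scale are assumptions to adjust for real data.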

Running it again succeeds.

Verify the result:

III. Sqoop Jobs

Note: a Sqoop job runs a predefined data import or export task according to a specified flow.

Syntax:

The syntax for creating a Sqoop job is:

$ sqoop job (generic-args) (job-args)[-- [subtool-name] (subtool-args)]
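The key detail in this syntax is the bare `--`: everything after it is handed to the subtool (import, export, and so on) rather than to `sqoop job` itself. The Python sketch below only assembles such an argument list; `sqoop_job_create_argv` is a hypothetical helper, and it uses `-P` (prompt for the password, as the warnings in the logs above recommend) instead of `--password`. On a machine with Sqoop installed, passing the list to `subprocess.run` would create the job.

```python
import shlex


def sqoop_job_create_argv(job_name, tool, tool_args):
    """Hypothetical helper: argv for `sqoop job --create`; args after `--` go to the subtool."""
    return ["sqoop", "job", "--create", job_name, "--", tool, *tool_args]


argv = sqoop_job_create_argv(
    "myimportjob",
    "import",
    [
        "--connect", "jdbc:mysql://centos-aaron-03:3306/test",
        "--username", "root",
        "-P",  # prompt for the password instead of putting it on the command line
        "--table", "emp",
        "-m", "1",
    ],
)
print(shlex.join(argv))  # the equivalent shell command line
```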

Creating a job (--create)

Here we create a job named myimportjob that imports data from an RDBMS table into HDFS.

# This command creates a job that imports the emp table of the test database into HDFS files
./sqoop job --create myimportjob -- import --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp --m 1

Verifying jobs (--list)

The '--list' argument is used to verify saved jobs. The following command lists the saved Sqoop jobs.

# Displays the list of saved jobs
sqoop job --list

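Inspecting a job (--show)
The '--show' argument is used to inspect or verify a specific job and its details. The following command and sample output inspect the job named myimportjob.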

# Shows the tool and the options used by the myimportjob job
sqoop job --show myimportjob

[hadoop@centos-aaron-h1 bin]$ sqoop job --show myimportjob
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 22:46:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
Job: myimportjob
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
db.connect.string = jdbc:mysql://centos-aaron-03:3306/test
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
split.limit = null
hbase.create.table = false
mainframe.input.dataset.type = p
db.require.password = true
skip.dist.cache = false
hdfs.append.dir = false
db.table = emp
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = root
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
customtool.options.jsonmap = {}
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = None
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-hadoop/compile/e0ba9288d4916ac38fdbbe98737f9829
direct.import = false
temporary.dirRoot = _sqoop
hive.fail.table.exists = false
db.batch = false
[hadoop@centos-aaron-h1 bin]$
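The `--show` listing is a flat set of `key = value` options, so when saved jobs are checked from a script it can be handy to load them into a dictionary. The Python sketch below assumes that keys and values contain no spaces (true for the listing above); `parse_show_output` and `decode_delimiter` are hypothetical helper names, not part of Sqoop. It also decodes the delimiter settings, which Sqoop prints as decimal character codes: 44 is `,` and 10 is a newline.

```python
import re
from typing import Dict

# Matches "key = value" pairs; assumes neither side contains whitespace.
OPTION_RE = re.compile(r"(\S+) = (\S+)")


def parse_show_output(text: str) -> Dict[str, str]:
    """Collect the options printed after 'Options:' into a dict.

    Returns an empty dict if the text has no 'Options:' section.
    """
    _, _, options_part = text.partition("Options:")
    return dict(OPTION_RE.findall(options_part))


def decode_delimiter(code: str) -> str:
    """Turn a delimiter stored as a decimal character code into the character."""
    return chr(int(code))
```

Fed the listing above, `parse_show_output` yields `opts["db.table"] == "emp"`, and `decode_delimiter(opts["codegen.output.delimiters.field"])` returns `","`. Because the regex scans for pairs rather than lines, it also works on the flattened single-line form of the output.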

Executing a job (--exec)

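The '--exec' option executes a saved job. The following command executes the saved job named myjob (the session below uses a job named myimportjob):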

sqoop job --exec myjob
#Normally it produces output like the following:
10/08/19 13:08:45 INFO tool.CodeGenTool: Beginning code generation
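When a saved job is launched from a script rather than typed by hand, the command can be built programmatically. The Sqoop user guide documents that arguments placed after a bare `--` override the options stored in the job (for example `sqoop job --exec myjob -- --username someuser -P`). The Python helpers below are a minimal sketch under that assumption; `build_exec_command` and `run_saved_job` are hypothetical names, and `run_saved_job` needs `sqoop` on the PATH.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_exec_command(job_name: str,
                       overrides: Optional[Sequence[str]] = None) -> List[str]:
    """Build the argv for `sqoop job --exec <job_name>`.

    Anything after a bare "--" overrides options stored in the saved job,
    e.g. ["--username", "someuser", "-P"].
    """
    cmd = ["sqoop", "job", "--exec", job_name]
    if overrides:
        cmd += ["--", *overrides]
    return cmd


def run_saved_job(job_name: str,
                  overrides: Optional[Sequence[str]] = None) -> int:
    """Run the saved job and return Sqoop's exit code (needs sqoop on PATH)."""
    cmd = build_exec_command(job_name, overrides)
    print("Running:", shlex.join(cmd))
    # stdin is inherited from this process, so the interactive
    # "Enter password:" prompt can still read from the terminal.
    return subprocess.run(cmd).returncode
```

Keeping command construction separate from execution makes the argument list easy to check without a cluster; only `run_saved_job` actually touches Sqoop.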

Running it reported an error.

Analysis showed that the error was caused by MySQL access permissions, so the database privileges need to be changed:

#123456 is the database connection password
grant all privileges on *.* to root@'%' identified by '123456';
FLUSH PRIVILEGES;

Run the sqoop job again; this time it succeeds:

[hadoop@centos-aaron-h1 bin]$ sqoop job --exec myimportjob
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 23:02:08 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
19/03/18 23:02:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 23:02:11 INFO tool.CodeGenTool: Beginning code generation
19/03/18 23:02:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 23:02:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 23:02:12 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: /tmp/sqoop-hadoop/compile/ea795ab1037c940352cf3f7d5af2728f/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 23:02:13 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/ea795ab1037c940352cf3f7d5af2728f/emp.jar
19/03/18 23:02:13 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/03/18 23:02:13 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/03/18 23:02:13 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/03/18 23:02:13 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/03/18 23:02:13 INFO mapreduce.ImportJobBase: Beginning import of emp
19/03/18 23:02:14 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/18 23:02:14 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/18 23:02:14 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
19/03/18 23:02:16 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/18 23:02:16 INFO mapreduce.JobSubmitter: number of splits:1
19/03/18 23:02:16 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/18 23:02:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552898029697_0030
19/03/18 23:02:17 INFO impl.YarnClientImpl: Submitted application application_1552898029697_0030
19/03/18 23:02:17 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1552898029697_0030/
19/03/18 23:02:17 INFO mapreduce.Job: Running job: job_1552898029697_0030
19/03/18 23:02:24 INFO mapreduce.Job: Job job_1552898029697_0030 running in uber mode : false
19/03/18 23:02:24 INFO mapreduce.Job:  map 0% reduce 0%
19/03/18 23:02:30 INFO mapreduce.Job:  map 100% reduce 0%
19/03/18 23:02:30 INFO mapreduce.Job: Job job_1552898029697_0030 completed successfully
19/03/18 23:02:30 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=207365
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=180
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3466
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=3466
        Total vcore-milliseconds taken by all map tasks=3466
        Total megabyte-milliseconds taken by all map tasks=3549184
    Map-Reduce Framework
        Map input records=6
        Map output records=6
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=63
        CPU time spent (ms)=590
        Physical memory (bytes) snapshot=132681728
        Virtual memory (bytes) snapshot=1715552256
        Total committed heap usage (bytes)=42860544
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=180
19/03/18 23:02:30 INFO mapreduce.ImportJobBase: Transferred 180 bytes in 15.5112 seconds (11.6045 bytes/sec)
19/03/18 23:02:30 INFO mapreduce.ImportJobBase: Retrieved 6 records.
[hadoop@centos-aaron-h1 bin]$
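A wrapper script that runs a saved job may want to confirm that the import actually finished. The last two `ImportJobBase` lines of the log carry the byte count, elapsed time, and record count (6 records here, matching `Map input records=6`). The Python sketch below pulls those numbers out of captured Sqoop output; it assumes the message wording seen in this Sqoop 1.4.7 log, and `parse_import_summary` and `ImportSummary` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ImportSummary(NamedTuple):
    bytes_transferred: int
    seconds: float
    records: int


# Patterns assume the ImportJobBase wording shown in the log above.
TRANSFERRED_RE = re.compile(r"Transferred (\d+) bytes in ([\d.]+) seconds")
RETRIEVED_RE = re.compile(r"Retrieved (\d+) records\.")


def parse_import_summary(log: str) -> Optional[ImportSummary]:
    """Extract the final import summary; None if either line is missing."""
    transferred = TRANSFERRED_RE.search(log)
    retrieved = RETRIEVED_RE.search(log)
    if not (transferred and retrieved):
        return None
    return ImportSummary(
        bytes_transferred=int(transferred.group(1)),
        seconds=float(transferred.group(2)),
        records=int(retrieved.group(1)),
    )
```

Returning `None` when either line is absent lets the caller treat a missing summary as a failed or incomplete import, and the record count can then be compared with a `SELECT COUNT(*)` on the source table.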

IV. How Sqoop works

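Overview: under the hood, Sqoop turns every import or export command into a MapReduce program and runs it; whenever Sqoop receives a command, it generates a MapReduce job for it. The import run above, for instance, executed as a map-only job (map 100% reduce 0%). Sqoop's code-generation tool makes it easy to inspect the Java code Sqoop generates for a table, and that code can serve as the basis for deeper custom development.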
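Splitting is the part of the generated job that decides its parallelism. With `-m 1` (shown above as `mapreduce.num.mappers = 1` and `number of splits:1`) a single mapper reads the whole table. With more mappers, Sqoop queries the minimum and maximum of the split column (the primary key by default, or the column given with `--split-by`) and hands each map task one range of it. The Python sketch below approximates that for an integer column; the function name and the exact boundary arithmetic are assumptions, not a copy of Sqoop's own splitter.

```python
from typing import List


def split_where_clauses(column: str, min_val: int, max_val: int,
                        num_mappers: int) -> List[str]:
    """Divide [min_val, max_val] into at most num_mappers contiguous ranges.

    Each returned string is the WHERE condition one map task would use
    to read its slice of the table.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    if min_val > max_val:
        raise ValueError("min_val must not exceed max_val")

    # Evenly spaced lower bounds, plus max_val as the closing boundary.
    bounds = [min_val + (max_val - min_val) * i // num_mappers
              for i in range(num_mappers)] + [max_val]
    # Narrow ranges produce repeated boundaries; drop the duplicates.
    unique = sorted(set(bounds))
    if len(unique) == 1:  # min_val == max_val: one single-value split
        unique = unique * 2

    clauses = []
    last = len(unique) - 2
    for i in range(len(unique) - 1):
        lo, hi = unique[i], unique[i + 1]
        # Every range is half-open except the last, which must include max_val.
        upper_op = "<=" if i == last else "<"
        clauses.append(f"{column} >= {lo} AND {column} {upper_op} {hi}")
    return clauses
```

For example, `split_where_clauses("id", 1, 6, 3)` gives `["id >= 1 AND id < 2", "id >= 2 AND id < 4", "id >= 4 AND id <= 6"]`, three ranges that together cover every id from 1 to 6 exactly once. Skewed key distributions make such ranges uneven, which is why the choice of split column matters.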

Code customization:

The syntax of the Sqoop code-generation command is:

$ sqoop-codegen (generic-args) (codegen-args)
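What codegen produces is a record class: according to the Sqoop documentation, the generated Java class encapsulates and interprets imported records, with one field per column and logic to write each row out as delimited text. The Python dataclass below is only a sketch of that idea, not the generated code. The columns (`id`, `name`, `salary`) are hypothetical; the default delimiters mirror the `--show` values above (44 is `,`, 10 is a newline), and `null` is the string Sqoop writes for NULL values when `--null-string`/`--null-non-string` are not given.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EmpRecord:
    # Hypothetical columns; the real generated emp.java has one field per
    # column of the actual emp table.
    id: int
    name: Optional[str]
    salary: Optional[int]

    def to_text(self, field_delim: str = ",", record_delim: str = "\n",
                null_string: str = "null") -> str:
        """Render this row as one line of a delimited text import."""
        values = []
        for f in fields(self):
            value = getattr(self, f.name)
            values.append(null_string if value is None else str(value))
        return field_delim.join(values) + record_delim
```

For instance, `EmpRecord(1, "alice", 50000).to_text()` returns `"1,alice,50000\n"`, and a row with NULL name and salary becomes `"2,null,null\n"`.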

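Example: generate Java code for the emp table (here it lives in the test database rather than USERDB, as the JDBC URL shows).
The following command generates the code: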

sqoop codegen --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp -bindir .

If the command succeeds, it produces output like the following:

[hadoop@centos-aaron-h1 bin]$ sqoop codegen --connect jdbc:mysql://centos-aaron-03:3306/test --username root --password 123456 --table emp -bindir .
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 23:21:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 23:21:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/18 23:21:24 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/18 23:21:24 INFO tool.CodeGenTool: Beginning code generation
19/03/18 23:21:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 23:21:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/03/18 23:21:24 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop-2.9.1
Note: ./emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/18 23:21:26 INFO orm.CompilationManager: Writing jar file: ./emp.jar
[hadoop@centos-aaron-h1 bin]$ ll