1. Setting up Sqoop
Download the Sqoop package and extract it, as the hadoop user, to the target directory.
Configure the environment variables:
export SQOOP_HOME=/home/hadoop/app/sqoop
export PATH=${SQOOP_HOME}/bin:$PATH
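To make the variables take effect in the current shell, reload the profile (a minimal sketch, assuming the two exports above were appended to ~/.bash_profile):
source ~/.bash_profile        # apply the new environment variables (profile location is an assumption)
which sqoop                   # should resolve to $SQOOP_HOME/bin/sqoop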
Modify the configuration file. Copy the template file first; only the Hadoop and Hive paths need to be changed.
[hadoop@hadoop001 conf]$ cp sqoop-env-template.sh sqoop-env.sh
[hadoop@hadoop001 conf]$ cat sqoop-env.sh
……
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/app/hadoop

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/app/hadoop

#set the path to where bin/hbase is available
#export HBASE_HOME=

#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/app/hive

#Set the path for where zookeper config dir is
#export ZOOCFGDIR=

Upload the following JARs to the $SQOOP_HOME/lib directory to avoid errors later:
[hadoop@hadoop001 lib]$ ll mysql-connector-java-5.1.47.jar
-rw-rw-r-- 1 hadoop hadoop 1007502 Jul 23 09:45 mysql-connector-java-5.1.47.jar
[hadoop@hadoop001 lib]$ ll java-json.jar
-rw-r--r-- 1 hadoop hadoop 84697 Jul 23 14:51 java-json.jar
Sqoop can now be used to import and export data (both run as MapReduce jobs on Hadoop).
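Before running imports, the JDBC connection can be sanity-checked with list-databases (a minimal sketch, assuming the same MySQL instance and credentials used in the examples below):
sqoop list-databases \
--connect jdbc:mysql://hadoop001:3306 \
--username root --password 123456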
2. Sqoop parameters
View Sqoop's help information with the help command:
[hadoop@hadoop001 lib]$ sqoop help
Warning: /home/hadoop/app/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/app/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/app/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/app/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/07/23 20:10:03 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.15.1
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table    (commonly used)
  help               List available commands
  import             Import a table from a database to HDFS          (commonly used)
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
The rest of this section focuses on Sqoop's import and export functionality.
Run sqoop help import to view the import help. Commonly used options:

--connect <jdbc-uri>             Specify JDBC connect string (database connection URL)
--username <username>            Set authentication username (database user)
--password <password>            Set authentication password (database password)
--columns <col,col,col...>       Columns to import from table (import only the listed columns)
--delete-target-dir              Imports data in delete mode (delete the target directory if it already exists)
-e,--query <statement>           Import results of SQL 'statement'; cannot be combined with --table (see the sketch after this list)
-m,--num-mappers <n>             Use 'n' map tasks to import in parallel
--mapreduce-job-name <name>      Set name for generated mapreduce job
--table <table-name>             Table to read (which table to import)
--target-dir <dir>               HDFS plain table destination (where the imported data lands)
--where <where clause>           WHERE clause to use during import
--fields-terminated-by <char>    Sets the field separator character

Run sqoop help export to view the export help (an export sketch appears at the end of this section).
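A minimal sketch of a free-form query import with -e/--query (the query, split column, and target directory are illustrative, borrowing the tbls table from the example below). When --query is used, Sqoop requires the literal token $CONDITIONS in the WHERE clause, and running more than one mapper also needs a --split-by column:
sqoop import \
--connect jdbc:mysql://hadoop001:3306/ruozedata_d7 \
--username root --password 123456 \
--query 'select tbl_id, tbl_name from tbls where tbl_id > 5 and $CONDITIONS' \
--split-by tbl_id \
--target-dir /user/hadoop/test_query \
--delete-target-dir \
-m 2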
Example: MySQL ==> HDFS:
sqoop import \
--connect jdbc:mysql://hadoop001:3306/ruozedata_d7 \
--username root --password 123456 \
--table tbls \
--target-dir /user/hadoop/test01 \
--mapreduce-job-name test01 \
--delete-target-dir \
--fields-terminated-by '$' \
-m 2
[hadoop@hadoop001 lib]$ hdfs dfs -ls /user/hadoop/test01
Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2019-07-23 20:30 /user/hadoop/test01/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 373 2019-07-23 20:30 /user/hadoop/test01/part-m-00000
-rw-r--r-- 1 hadoop supergroup 692 2019-07-23 20:30 /user/hadoop/test01/part-m-00001
[hadoop@hadoop001 lib]$ hdfs dfs -text /user/hadoop/test01/part-m-00000
1$1563267428$1$0$hadoop$0$1$ruozedata$MANAGED_TABLE$null$null
8$1563333223$1$0$hadoop$0$8$emp$MANAGED_TABLE$null$null
11$1563417967$1$0$hadoop$0$11$emp_01$MANAGED_TABLE$null$null
14$1563420051$1$0$hadoop$0$14$emp_external$EXTERNAL_TABLE$null$null
16$1563439978$1$0$hadoop$0$16$dual$MANAGED_TABLE$null$null
21$1563682724$11$0$hadoop$0$21$rating_json$MANAGED_TABLE$null$null
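To confirm that the imported rows match the source table, sqoop eval (listed in the help output above) can run a quick query against MySQL without launching a MapReduce job (a minimal sketch, reusing the connection from the example above):
sqoop eval \
--connect jdbc:mysql://hadoop001:3306/ruozedata_d7 \
--username root --password 123456 \
--query 'select count(*) from tbls'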
Example: MySQL ==> Hive:
sqoop import \
--connect jdbc:mysql://localhost:3306/ruozedata \
--password root \
--username root \
--table product_info \
--hive-database d7_hive \
--hive-table product_info \
--hive-import \
--hive-overwrite \
--delete-target-dir \
--fields-terminated-by '\t' \
--split-by product_id \
-m 2
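For the reverse direction (HDFS ==> MySQL), a minimal sketch with sqoop export is shown below; the target table result_table is hypothetical and must already exist in MySQL with a schema matching the files under --export-dir:
sqoop export \
--connect jdbc:mysql://hadoop001:3306/ruozedata_d7 \
--username root --password 123456 \
--table result_table \
--export-dir /user/hadoop/test01 \
--input-fields-terminated-by '$' \
-m 2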