This article covers two ways to connect Spark to Hive: through spark-shell and through a remote connection from IDEA.
1. spark-shell
1.1. Copy the configuration files
- Copy hive-site.xml (and, if present, core-site.xml and hdfs-site.xml) from hive/conf/ to spark/conf/ (see the example commands after this list)
- Copy the MySQL connector jar from hive/lib/ to spark/jars/
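For example, assuming HIVE_HOME and SPARK_HOME point at the two installations (the connector file name below is illustrative; match it to your version):
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/
cp $HIVE_HOME/lib/mysql-connector-java-5.1.13-bin.jar $SPARK_HOME/jars/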
Instead of copying the jar, its path can also be passed to spark-shell with the following option:
--driver-class-path path/mysql-connector-java-5.1.13-bin.jar
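A full invocation would then look like this (the jar path is illustrative):
./bin/spark-shell --driver-class-path /path/to/mysql-connector-java-5.1.13-bin.jar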
1.2. Start spark-shell
Once the shell is up, run a few test queries:
spark.sql("show databases").show()
spark.sql("use test")
spark.sql("select * from student").show()
Execution result:
[hadoop@hadoop1 spark-2.3.0-bin-hadoop2.7]$ ./bin/spark-shell
2018-09-04 11:43:10 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop1:4040
Spark context available as 'sc' (master = local[*], app id = local-1536032600945).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.3.0
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.sql("show databases").show()
2018-09-04 11:43:54 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
+------------+
|databaseName|
+------------+
| default|
| test|
+------------+
scala> spark.sql("use test")
res1: org.apache.spark.sql.DataFrame = []
scala> spark.sql("select * from student").show()
+----+-----+---+----+-----+
| sno|sname|sex|sage|sdept|
+----+-----+---+----+-----+
|1001| 张三| 男| 22| 高一|
|1002| 李四| 女| 25| 高二|
+----+-----+---+----+-----+
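The connection works for writes as well; a minimal sketch, assuming write access to the Hive warehouse (the target table test.student_senior is hypothetical):
spark.sql("select * from test.student where sage > 22").write.mode("overwrite").saveAsTable("test.student_senior")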
2. Connecting to Hive from IDEA
This section connects to a remote Hive. If Hive is not deployed yet, refer to the Hive environment installation guide first; HDFS must already be running.
2.1. Add dependencies
Add the following to the project's pom.xml:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.40</version>
</dependency>
2.2. Copy the configuration file
Copy hive-site.xml into the project's resources directory:
hive-site.xml
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop1:3306/hive?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
        <description>password to use against metastore database</description>
    </property>
</configuration>
2.3. Write the code
import org.apache.spark.sql.SparkSession

object HiveSupport {
  def main(args: Array[String]): Unit = {
    // val warehouseLocation = "D:\\workspaces\\idea\\hadoop"
    val spark = SparkSession.builder()
      .appName("HiveSupport")
      .master("local[2]")
      // Not needed once the config files are in place; with a local Hive, this sets where metastore_db lives
      // .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport() // enable Hive support
      .getOrCreate()
    // spark.sparkContext.setLogLevel("WARN") // set the log output level

    import spark.implicits._
    import spark.sql

    sql("show databases")
    sql("use test")
    sql("select * from student").show()

    Thread.sleep(150 * 1000) // keep the app alive so the web UI can be inspected
    spark.stop()
  }
}
Execution result:
+----+-----+---+----+-----+
| sno|sname|sex|sage|sdept|
+----+-----+---+----+-----+
|1001| 张三| 男| 22| 高一|
|1002| 李四| 女| 25| 高二|
+----+-----+---+----+-----+
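SQL strings are not the only way to query; the same session also exposes the DataFrame API. A minimal sketch against the table above (the filter threshold is arbitrary):
// Equivalent query via the DataFrame API; $"sage" relies on import spark.implicits._
spark.table("test.student").filter($"sage" > 22).select("sno", "sname").show()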