Author: 手机用户2602930515 | Source: the Internet | 2023-07-02 13:06
When submitting a job with spark-submit, a jar built with sbt package ran fine in client mode, but in cluster mode it kept failing with:
Exception in thread "main" java.lang.ClassNotFoundException. So I decided to use the sbt-assembly plugin to bundle all dependencies into a single jar.
My project structure:
myProject/build.sbt
myProject/project/assembly.sbt
myProject/src/main/scala/com/lasclocker/java/SparkGopProcess.java
The path com/lasclocker/java above mirrors the package name of the Java source file.
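For reference, the layout above can be created from an empty directory like this (custom_spark_lib is the folder for third-party jars referenced by build.sbt below):

```shell
# Create the project skeleton used in this article.
mkdir -p myProject/project \
         myProject/src/main/scala/com/lasclocker/java \
         myProject/custom_spark_lib
touch myProject/build.sbt myProject/project/assembly.sbt
```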
Contents of build.sbt:
lazy val root = (project in file(".")).
  settings(
    name := "my-project",
    version := "1.0",
    scalaVersion := "2.11.7",
    mainClass in Compile := Some("com.lasclocker.java.SparkGopProcess") // the main class
  )

autoScalaLibrary := false // exclude the Scala library from the jar

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1" % "provided" // exclude the Spark library

unmanagedBase := baseDirectory.value / "custom_spark_lib" // third-party jars, placed directly under myProject/custom_spark_lib

// META-INF discarding
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
The jars under custom_spark_lib are: guava-10.0.1.jar and hadoopCustomInputFormat.jar.
Contents of assembly.sbt:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")
In the myProject directory, run:

sbt assembly

This produces a fat jar: target/scala-2.11/my-project-assembly-1.0.jar.
Finally, here is my spark-submit shell script for cluster mode (the IPs in the script are masked with xx):
inPath=/LPR
outPath=/output
minPartitionNum=4
sparkURL=spark://xx.xx.xx.xx:7077
hdfsFile=hdfs://xx.xx.xx.xx:9000/user/root
ldLib=/opt/hadoop/lib # native libraries live here, e.g. .so files loaded through JNI
spark-submit \
--class ${yourAppClass} \
--master ${sparkURL} \
--driver-library-path $ldLib \
--deploy-mode cluster \
$hdfsFile/my-project-assembly-1.0.jar $inPath $outPath $minPartitionNum
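One step the script leaves implicit: in cluster mode the driver itself runs on a worker node, so the application jar must be readable from every node, which is why the script references it through HDFS. The jar has to be uploaded there first; a sketch, using the same masked URL as the script (not runnable as-is outside the cluster):

```shell
# Upload the assembly to the HDFS location the submit script points at.
hdfs dfs -mkdir -p /user/root
hdfs dfs -put -f target/scala-2.11/my-project-assembly-1.0.jar \
    hdfs://xx.xx.xx.xx:9000/user/root/
```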
References: sbt-assembly, How to build an Uber JAR (Fat JAR) using SBT within IntelliJ IDEA?