I. Purpose
Set up a Spark code development environment on a remote Windows machine, so that code can be developed and debugged against the cluster directly, improving development efficiency.
II. Environment
1. CDH 5.15.2
2. Scala 2.11.8
III. Implementation Steps
1. Create a Scala project
(1) Create a Maven project from the Scala archetype template
(2) Enter the project's key identifiers (groupId/artifactId)
(3) Select the settings.xml configuration file of the local Maven repository
(4) Fill in the project name and confirm to create the project
(5) In the pom file, change the Scala version to 2.11.8
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>sparktest</groupId>
  <artifactId>sparktest</artifactId>
  <version>1.0-SNAPSHOT</version>
  <inceptionYear>2008</inceptionYear>

  <properties>
    <scala.version>2.11.8</scala.version>
    <!-- scala.binary and spark.version are referenced below;
         define them here to match the Spark shipped with your CDH -->
  </properties>

  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.binary}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${scala.binary}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
          <args>
            <arg>-target:jvm-1.5</arg>
          </args>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-eclipse-plugin</artifactId>
        <configuration>
          <downloadSources>true</downloadSources>
          <buildcommands>
            <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
          </buildcommands>
          <additionalProjectnatures>
            <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
          </additionalProjectnatures>
          <classpathContainers>
            <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
            <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
          </classpathContainers>
        </configuration>
      </plugin>
    </plugins>
  </build>

  <reporting>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
        </configuration>
      </plugin>
    </plugins>
  </reporting>
</project>
Wait for the project's dependencies to finish importing.
(6) Configure the HADOOP_HOME environment variable
Otherwise the program fails with: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Cause: Spark on Windows depends on Hadoop's winutils.exe; when HADOOP_HOME does not point at a directory whose bin\ folder contains it, startup fails with the error above.
(7) Add the directory containing winutils.exe to the Path environment variable
As shown in the figure below.
(8) If remote debugging cannot read HDFS files, refer to:
Hadoop remote debugging: deleting a file fails with org.apache.hadoop.security.AccessControlException: Permission denied: user=
(9) Restart IDEA for the changes to take effect; the Spark program can then be run.
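The two environment fixes above (HADOOP_HOME for winutils.exe and the HDFS permission error) can also be applied programmatically at the top of main, before any Spark context is created. This is a sketch only; the C:\hadoop path and the hdfs user name below are placeholders for your own environment, since Hadoop reads the hadoop.home.dir and HADOOP_USER_NAME system properties as alternatives to the corresponding environment variables:

```scala
object DevEnvSetup {
  def main(args: Array[String]): Unit = {
    // Point Hadoop at the directory whose bin\ folder contains winutils.exe
    // (an in-code alternative to setting the HADOOP_HOME environment variable).
    // "C:\\hadoop" is a placeholder path.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")

    // Run HDFS operations as a user that has write permission, avoiding
    // AccessControlException: Permission denied: user=<your windows user>.
    // "hdfs" is a placeholder user name.
    System.setProperty("HADOOP_USER_NAME", "hdfs")

    println(System.getProperty("hadoop.home.dir"))
    println(System.getProperty("HADOOP_USER_NAME"))
  }
}
```

Setting these properties in code is convenient when you cannot change machine-wide environment variables, but they must be set before the first Hadoop/Spark class initializes.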
2. Create the Scala code: a simple Spark Streaming word-count example
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * Spark Streaming: process socket data.
 * Test with netcat: nc -lk 6789
 */
object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    // Creating a StreamingContext requires a SparkConf and a batch interval
    val ssc = new StreamingContext(sparkConf, Seconds(5))
    val lines = ssc.socketTextStream("bigdata.ibeifeng.com", 6789)
    val result = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    result.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
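The flatMap/map/reduceByKey chain above is the classic word-count pattern. Stripped of Spark, the same counting logic can be sketched on plain Scala collections (reduceByKey becomes groupBy plus a per-key sum), which is a quick way to sanity-check the transformation without a cluster; the object and method names here are illustrative:

```scala
object LocalWordCount {
  // Same logic as lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _),
  // expressed on an ordinary Scala collection instead of a DStream.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .map((_, 1))
      .groupBy(_._1)                                          // reduceByKey step 1: group by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // step 2: sum counts per word

  def main(args: Array[String]): Unit = {
    println(wordCount(Seq("hello spark", "hello streaming")))
  }
}
```

Because the function is pure, it can be unit-tested in src/test/scala while the streaming job itself still needs a live socket to exercise.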
3. From Cloudera Manager, download the hive-site.xml and log4j.properties configuration files into the project's resources folder.
IV. Testing
1. Start Spark.
2. Open the nc port:
nc -lk 6789
3. Run the code.