1. Goal
Run Spark applications on the Standalone resource manager. Standalone is Spark's built-in, distributed resource management framework, similar to YARN.
2. Standalone architecture
Worker: the per-node service; manages its node's resources and launches executors
Master: manages cluster resources and handles resource requests
3. Configuration
(1) Prerequisite: Spark already runs successfully in local mode, with spark-env.sh configured as follows:
JAVA_HOME=/opt/jdk1.8.0_151
SCALA_HOME=/opt/modules/scala-2.11.8
HADOOP_CONF_DIR=/opt/modules/apache/hadoop-2.7.3/etc/hadoop
SPARK_LOCAL_IP=bigdata.ibeifeng.com
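One quick way to verify this prerequisite (a suggested check, not part of the original steps) is to open a local shell and then leave it with :quit:
bin/spark-shell --master local[2]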
(2) Add the Master and Worker settings to spark-env.sh
(a) Virtual machine
SPARK_MASTER_IP=bigdata.ibeifeng.com   # host the Master binds to
SPARK_MASTER_PORT=7070                 # Master RPC port (Spark's default is 7077)
SPARK_MASTER_WEBUI_PORT=8080           # Master web UI port
SPARK_WORKER_CORES=2                   # cores offered by each Worker
SPARK_WORKER_MEMORY=2g                 # memory offered by each Worker
SPARK_WORKER_PORT=7071                 # Worker RPC port
SPARK_WORKER_WEBUI_PORT=8081           # Worker web UI port
SPARK_WORKER_INSTANCES=2               # Worker processes to run on this machine
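With SPARK_WORKER_INSTANCES=2, this one machine runs two Worker processes, so it contributes 2 × 2 = 4 cores and 2 × 2g = 4g of memory to the cluster in total.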
(b) Server
SPARK_MASTER_HOST=hadoop               # host the Master binds to
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=1               # a single Worker process
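Two things worth noting: SPARK_MASTER_IP is the older, deprecated name of this setting, and Spark 2.x prefers SPARK_MASTER_HOST. Also, the server config leaves SPARK_MASTER_PORT unset, so its Master listens on Spark's default port 7077; this is why the spark-shell commands below use 7070 for the virtual machine but 7077 for the server.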
(3) Configure the slaves file
mv slaves.template slaves
Then add:
(a) Virtual machine
# A Spark Worker will be started on each of the machines listed below.
bigdata.ibeifeng.com
(b) Server
# A Spark Worker will be started on each of the machines listed below.
hadoop
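sbin/start-all.sh logs in to every host listed in slaves over SSH and starts Workers there, so a larger cluster would simply list more hostnames, one per line, e.g. (hypothetical hostnames):
hadoop01
hadoop02
hadoop03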
(4) Start the services
sbin/start-all.sh
Output:
(a) Server
starting org.apache.spark.deploy.master.Master, logging to /opt/modules/spark-2.1.0-bin-2.6.0-cdh5.7.0/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop.out
hadoop: starting org.apache.spark.deploy.worker.Worker, logging to /opt/modules/spark-2.1.0-bin-2.6.0-cdh5.7.0/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop.out
The Master and Worker logs are thus written under /opt/modules/spark-2.1.0-bin-2.6.0-cdh5.7.0/logs/, to spark-root-org.apache.spark.deploy.master.Master-1-hadoop.out and spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop.out respectively.
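To confirm that both daemons are up, jps should now show a Master and a Worker process (illustrative output; the PIDs will differ, and the virtual machine would show two Worker lines because SPARK_WORKER_INSTANCES=2):
jps
2369 Master
2455 Worker
2542 Jps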
4. Test
(1) Start spark-shell
(a) Virtual machine
bin/spark-shell --master spark://bigdata.ibeifeng.com:7070
(b) Server
bin/spark-shell --master spark://hadoop:7077
Output:
Spark context available as 'sc' (master = spark://hadoop:7077, app id = app-20190116000819-0001).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
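A quick sanity check at the prompt (illustrative, not from the original notes): sc.master returns the URL of the cluster the shell is attached to.
scala> sc.master
res0: String = spark://hadoop:7077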
(2) Test: top-N word count
val lines = sc.textFile("/README.md")              // this path is on HDFS
val words = lines.flatMap(line => line.split(" ")) // split each line into words
val words2 = words.map(word => (word, 1))          // pair each word with a count of 1
val wordCountRDD = words2.reduceByKey(_ + _)       // sum the counts per word
wordCountRDD.sortBy(t => -t._2).take(10)           // top 10 words by frequency
(Test passed!)
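As a side note, the same result can be obtained without sorting the whole RDD: RDD.top keeps only the requested number of elements in a bounded priority queue. A minimal alternative sketch, assuming the same shell session:
wordCountRDD.top(10)(Ordering.by(_._2))   // same top 10 by count, no full sort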