
Spark 1.2 Standalone Mode: Single-Machine Installation


1: Download and extract Spark

[jifeng@jifeng01 hadoop]$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.2.0-bin-hadoop1.tgz
--2015-02-03 21:50:25--  http://d3kbcqa49mib13.cloudfront.net/spark-1.2.0-bin-hadoop1.tgz
Resolving d3kbcqa49mib13.cloudfront.net... 54.192.150.88, 54.230.149.156, 54.230.150.31, ...
Connecting to d3kbcqa49mib13.cloudfront.net|54.192.150.88|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 203977086 (195M) [application/x-compressed]
Saving to: `spark-1.2.0-bin-hadoop1.tgz'

100%[===========================================================================>] 203,977,086  410K/s   in 7m 55s  

2015-02-03 21:58:21 (419 KB/s) - `spark-1.2.0-bin-hadoop1.tgz' saved [203977086/203977086]

[jifeng@jifeng01 hadoop]$ tar zxf spark-1.2.0-bin-hadoop1.tgz 


2: Download Scala

Download on jifeng04, then copy it to jifeng01:

[jifeng@jifeng04 hadoop]$ wget http://downloads.typesafe.com/scala/2.11.4/scala-2.11.4.tgz?_ga=1.248348352.61371242.1418807768
--2015-02-03 21:55:35--  http://downloads.typesafe.com/scala/2.11.4/scala-2.11.4.tgz?_ga=1.248348352.61371242.1418807768
Resolving downloads.typesafe.com... 54.230.151.79, 54.230.151.80, 54.230.151.162, ...
Connecting to downloads.typesafe.com|54.230.151.79|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 26509669 (25M) [application/octet-stream]
Saving to: `scala-2.11.4.tgz?_ga=1.248348352.61371242.1418807768'

100%[===========================================================================>] 26,509,669   320K/s   in 71s     

2015-02-03 21:56:48 (366 KB/s) - `scala-2.11.4.tgz?_ga=1.248348352.61371242.1418807768' saved [26509669/26509669]

[jifeng@jifeng04 hadoop]$ ls
hadoop-1.2.1  scala-2.11.4.tgz?_ga=1.248348352.61371242.1418807768  tmp
[jifeng@jifeng04 hadoop]$ mv scala-2.11.4.tgz\?_ga\=1.248348352.61371242.1418807768 scala-2.11.4.tgz
[jifeng@jifeng04 hadoop]$ tar zxf scala-2.11.4.tgz 
[jifeng@jifeng04 hadoop]$ ls
hadoop-1.2.1  scala-2.11.4  scala-2.11.4.tgz  tmp
[jifeng@jifeng04 hadoop]$ scp ./scala-2.11.4.tgz jifeng@jifeng01.sohudo.com:/home/jifeng/hadoop
scala-2.11.4.tgz                                                                   100%   25MB  25.3MB/s   00:01    
[jifeng@jifeng04 hadoop]$ 
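The `mv` above strips the `?_ga=...` tracking suffix that wget kept in the saved filename. As a generic sketch (the filename below is just an example copied from this session), shell parameter expansion can remove everything from the first `?` without retyping the suffix:

```shell
# Example filename as saved by wget, including the tracking query string
f='scala-2.11.4.tgz?_ga=1.248348352.61371242.1418807768'
# ${f%%\?*} deletes the longest match of '?*' from the end,
# i.e. everything from the first '?' onward
clean=${f%%\?*}
echo "$clean"    # prints scala-2.11.4.tgz
```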

Extract on jifeng01:

[jifeng@jifeng01 hadoop]$ tar zxf scala-2.11.4.tgz

3: Configure Scala

Add the Scala environment variables:

export SCALA_HOME=$HOME/hadoop/scala-2.11.4
export PATH=$PATH:$ANT_HOME/bin:$HBASE_HOME/bin:$SQOOP_HOME/bin:$HADOOP_HOME/bin:$MAHOUT_HOME/bin:$SCALA_HOME/bin

[jifeng@jifeng01 ~]$ vi ~/.bash_profile

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH
export JAVA_HOME=$HOME/jdk1.7.0_45
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=$HOME/hadoop/hadoop-1.2.1
export HADOOP_CONF_DIR=$HOME/hadoop/hadoop-1.2.1/conf
export ANT_HOME=$HOME/apache-ant-1.9.4
export HBASE_HOME=$HOME/hbase-0.94.21
export SQOOP_HOME=$HOME/sqoop-1.99.3-bin-hadoop100
export CATALINA_HOME=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs

export MAHOUT_HOME=$HOME/hadoop/mahout-distribution-0.9
export MAHOUT_CONF_DIR=$HOME/hadoop/mahout-distribution-0.9/conf

export SCALA_HOME=$HOME/hadoop/scala-2.11.4
export PATH=$PATH:$ANT_HOME/bin:$HBASE_HOME/bin:$SQOOP_HOME/bin:$HADOOP_HOME/bin:$MAHOUT_HOME/bin:$SCALA_HOME/bin
~                                                                                                                    
~                                                                                                                    
".bash_profile" 28L, 889C 已写入
[jifeng@jifeng01 ~]$ source ~/.bash_profile
Running `source ~/.bash_profile` makes the new settings take effect immediately.

Check the version:

[jifeng@jifeng01 hadoop]$ scala -version
Scala code runner version 2.11.4 -- Copyright 2002-2013, LAMP/EPFL

[jifeng@jifeng01 hadoop]$ scala
Welcome to Scala version 2.11.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45).
Type in expressions to have them evaluated.
Type :help for more information.

scala> 

4: Configure Spark environment variables

Add the following to the environment variables:

export SPARK_HOME=$HOME/hadoop/spark-1.2.0-bin-hadoop1
export PATH=$PATH:$SPARK_HOME/bin
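Each of these lines appends to `PATH`, so re-sourcing the profile grows `PATH` with duplicate entries. A small guard function (a hypothetical helper, not part of the original `.bash_profile`) keeps the append idempotent:

```shell
# Sketch: append a directory to PATH only if it is not already present
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already on PATH: do nothing
    *) PATH="$PATH:$1" ;;
  esac
}

PATH="/usr/bin"                  # demo starting value
path_append "/opt/spark/bin"
path_append "/opt/spark/bin"     # second call is a no-op
echo "$PATH"                     # prints /usr/bin:/opt/spark/bin
```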

[jifeng@jifeng01 ~]$ vi ~/.bash_profile    

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH
export JAVA_HOME=$HOME/jdk1.7.0_45
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=$HOME/hadoop/hadoop-1.2.1
export HADOOP_CONF_DIR=$HOME/hadoop/hadoop-1.2.1/conf
export ANT_HOME=$HOME/apache-ant-1.9.4
export HBASE_HOME=$HOME/hbase-0.94.21
export SQOOP_HOME=$HOME/sqoop-1.99.3-bin-hadoop100
export CATALINA_HOME=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs

export MAHOUT_HOME=$HOME/hadoop/mahout-distribution-0.9
export MAHOUT_CONF_DIR=$HOME/hadoop/mahout-distribution-0.9/conf

export SCALA_HOME=$HOME/hadoop/scala-2.11.4
export SPARK_HOME=$HOME/hadoop/spark-1.2.0-bin-hadoop1
export PATH=$PATH:$ANT_HOME/bin:$HBASE_HOME/bin:$SQOOP_HOME/bin:$HADOOP_HOME/bin:$MAHOUT_HOME/bin:$SCALA_HOME/bin
export PATH=$PATH:$SPARK_HOME/bin
".bash_profile" 30L, 978C 已写入                                                                   
[jifeng@jifeng01 ~]$ source ~/.bash_profile

5: Configure Spark

Copy the configuration template: cp spark-env.sh.template spark-env.sh

[jifeng@jifeng01 ~]$ cd hadoop
[jifeng@jifeng01 hadoop]$ cd spark-1.2.0-bin-hadoop1
[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ ls
bin  conf  data  ec2  examples  lib  LICENSE  NOTICE  python  README.md  RELEASE  sbin
[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ cd conf
[jifeng@jifeng01 conf]$ ls
fairscheduler.xml.template  metrics.properties.template  spark-defaults.conf.template
log4j.properties.template   slaves.template              spark-env.sh.template
[jifeng@jifeng01 conf]$ cp spark-env.sh.template spark-env.sh

Append the following lines to the end of spark-env.sh:

export SCALA_HOME=/home/jifeng/hadoop/scala-2.11.4
export SPARK_MASTER_IP=jifeng01.sohudo.com
export SPARK_WORKER_MEMORY=2G
export JAVA_HOME=/home/jifeng/jdk1.7.0_45

# Generic options for the daemons used in the standalone deploy mode
# - SPARK_CONF_DIR      Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_LOG_DIR       Where log files are stored.  (Default: ${SPARK_HOME}/logs)
# - SPARK_PID_DIR       Where the pid file is stored. (Default: /tmp)
# - SPARK_IDENT_STRING  A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS      The scheduling priority for daemons. (Default: 0)
export SCALA_HOME=/home/jifeng/hadoop/scala-2.11.4
export SPARK_MASTER_IP=jifeng01.sohudo.com
export SPARK_WORKER_MEMORY=2G
export JAVA_HOME=/home/jifeng/jdk1.7.0_45
"spark-env.sh" 54L, 3378C 已写入                                                                   
[jifeng@jifeng01 conf]$ cd ..

6: Start the master

[jifeng@jifeng01 sbin]$ cd ..
[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /home/jifeng/hadoop/spark-1.2.0-bin-hadoop1/sbin/../logs/spark-jifeng-org.apache.spark.deploy.master.Master-1-jifeng01.sohudo.com.out


Open http://jifeng01:8080/ to see the master web UI.

7: Start the worker

sbin/start-slaves.sh spark://jifeng01:7077

[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ sbin/start-slaves.sh spark://jifeng01:7077
localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/jifeng/hadoop/spark-1.2.0-bin-hadoop1/sbin/../logs/spark-jifeng-org.apache.spark.deploy.worker.Worker-1-jifeng01.sohudo.com.out

[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ sbin/start-slaves.sh spark://jifeng01.sohudo.com:7077
localhost: org.apache.spark.deploy.worker.Worker running as process 10273. Stop it first.
[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ 

Refresh the web UI to see the new worker appear.


8: Test

Enter the interactive shell:

[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ MASTER=spark://jifeng01.sohudo.com:7077 ./bin/spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/02/03 23:05:32 INFO spark.SecurityManager: Changing view acls to: jifeng
15/02/03 23:05:32 INFO spark.SecurityManager: Changing modify acls to: jifeng
15/02/03 23:05:32 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/02/03 23:05:32 INFO spark.HttpServer: Starting HTTP Server
15/02/03 23:05:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/03 23:05:32 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:46617
15/02/03 23:05:32 INFO util.Utils: Successfully started service 'HTTP class server' on port 46617.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
15/02/03 23:05:36 INFO spark.SecurityManager: Changing view acls to: jifeng
15/02/03 23:05:36 INFO spark.SecurityManager: Changing modify acls to: jifeng
15/02/03 23:05:36 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/02/03 23:05:37 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/02/03 23:05:37 INFO Remoting: Starting remoting
15/02/03 23:05:37 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@jifeng01.sohudo.com:41533]
15/02/03 23:05:37 INFO util.Utils: Successfully started service 'sparkDriver' on port 41533.
15/02/03 23:05:37 INFO spark.SparkEnv: Registering MapOutputTracker
15/02/03 23:05:37 INFO spark.SparkEnv: Registering BlockManagerMaster
15/02/03 23:05:37 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150203230537-f1cb
15/02/03 23:05:37 INFO storage.MemoryStore: MemoryStore started with capacity 265.4 MB
15/02/03 23:05:37 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-0e198e02-fea8-49df-a1ab-8dd08e988cab
15/02/03 23:05:37 INFO spark.HttpServer: Starting HTTP Server
15/02/03 23:05:37 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/03 23:05:37 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:52156
15/02/03 23:05:37 INFO util.Utils: Successfully started service 'HTTP file server' on port 52156.
15/02/03 23:05:38 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/03 23:05:38 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/02/03 23:05:38 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/02/03 23:05:38 INFO ui.SparkUI: Started SparkUI at http://jifeng01.sohudo.com:4040
15/02/03 23:05:38 INFO executor.Executor: Using REPL class URI: http://10.5.4.54:46617
15/02/03 23:05:38 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@jifeng01.sohudo.com:41533/user/HeartbeatReceiver
15/02/03 23:05:38 INFO netty.NettyBlockTransferService: Server created on 39115
15/02/03 23:05:38 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/02/03 23:05:38 INFO storage.BlockManagerMasterActor: Registering block manager localhost:39115 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 39115)
15/02/03 23:05:38 INFO storage.BlockManagerMaster: Registered BlockManager
15/02/03 23:05:38 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.

scala> 

Run a word count test by entering the following commands:

val file=sc.textFile("hdfs://jifeng01.sohudo.com:9000/user/jifeng/in/test1.txt")
val count=file.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey(_+_)
count.collect()
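For reference, the same word count can be sanity-checked locally with a plain shell pipeline. The sample file below is an assumption that matches the result shown in the transcript:

```shell
# Assumed sample input (13 bytes, consistent with the HDFS splits in the log)
printf 'hello\nhadoop\n' > /tmp/test1.txt
# flatMap(split(" ")) then (word,1) reduceByKey(_+_), shell style:
tr -s ' ' '\n' < /tmp/test1.txt | sort | uniq -c | awk '{print "(" $2 "," $1 ")"}'
# prints (hadoop,1) then (hello,1)
```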

scala> val file=sc.textFile("hdfs://jifeng01.sohudo.com:9000/user/jifeng/in/test1.txt")
15/02/03 23:07:25 INFO storage.MemoryStore: ensureFreeSpace(32768) called with curMem=0, maxMem=278302556
15/02/03 23:07:25 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 32.0 KB, free 265.4 MB)
15/02/03 23:07:25 INFO storage.MemoryStore: ensureFreeSpace(5070) called with curMem=32768, maxMem=278302556
15/02/03 23:07:25 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.0 KB, free 265.4 MB)
15/02/03 23:07:25 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:39115 (size: 5.0 KB, free: 265.4 MB)
15/02/03 23:07:25 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/02/03 23:07:25 INFO spark.SparkContext: Created broadcast 0 from textFile at <console>:12
file: org.apache.spark.rdd.RDD[String] = hdfs://jifeng01.sohudo.com:9000/user/jifeng/in/test1.txt MappedRDD[1] at textFile at <console>:12

scala> val count=file.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey(_+_)
15/02/03 23:07:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/03 23:07:38 WARN snappy.LoadSnappy: Snappy native library not loaded
15/02/03 23:07:38 INFO mapred.FileInputFormat: Total input paths to process : 1
count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:14

scala> count.collect()
15/02/03 23:07:44 INFO spark.SparkContext: Starting job: collect at :17
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Registering RDD 3 (map at :14)
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Got job 0 (collect at :17) with 3 output partitions (allowLocal=false)
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Final stage: Stage 1(collect at :17)
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 0)
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Missing parents: List(Stage 0)
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[3] at map at :14), which has no missing parents
15/02/03 23:07:44 INFO storage.MemoryStore: ensureFreeSpace(3584) called with curMem=37838, maxMem=278302556
15/02/03 23:07:44 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.5 KB, free 265.4 MB)
15/02/03 23:07:44 INFO storage.MemoryStore: ensureFreeSpace(2543) called with curMem=41422, maxMem=278302556
15/02/03 23:07:44 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.5 KB, free 265.4 MB)
15/02/03 23:07:44 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:39115 (size: 2.5 KB, free: 265.4 MB)
15/02/03 23:07:44 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
15/02/03 23:07:44 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Submitting 3 missing tasks from Stage 0 (MappedRDD[3] at map at :14)
15/02/03 23:07:44 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 3 tasks
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, ANY, 1310 bytes)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, ANY, 1310 bytes)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, ANY, 1310 bytes)
15/02/03 23:07:44 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
15/02/03 23:07:44 INFO executor.Executor: Running task 2.0 in stage 0.0 (TID 2)
15/02/03 23:07:44 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
15/02/03 23:07:44 INFO rdd.HadoopRDD: Input split: hdfs://jifeng01.sohudo.com:9000/user/jifeng/in/test1.txt:6+6
15/02/03 23:07:44 INFO rdd.HadoopRDD: Input split: hdfs://jifeng01.sohudo.com:9000/user/jifeng/in/test1.txt:0+6
15/02/03 23:07:44 INFO rdd.HadoopRDD: Input split: hdfs://jifeng01.sohudo.com:9000/user/jifeng/in/test1.txt:12+1
15/02/03 23:07:44 INFO executor.Executor: Finished task 2.0 in stage 0.0 (TID 2). 1897 bytes result sent to driver
15/02/03 23:07:44 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 1897 bytes result sent to driver
15/02/03 23:07:44 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1897 bytes result sent to driver
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 164 ms on localhost (1/3)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 160 ms on localhost (2/3)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 160 ms on localhost (3/3)
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Stage 0 (map at :14) finished in 0.185 s
15/02/03 23:07:44 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/02/03 23:07:44 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/02/03 23:07:44 INFO scheduler.DAGScheduler: running: Set()
15/02/03 23:07:44 INFO scheduler.DAGScheduler: waiting: Set(Stage 1)
15/02/03 23:07:44 INFO scheduler.DAGScheduler: failed: Set()
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Missing parents for Stage 1: List()
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Submitting Stage 1 (ShuffledRDD[4] at reduceByKey at :14), which is now runnable
15/02/03 23:07:44 INFO storage.MemoryStore: ensureFreeSpace(2112) called with curMem=43965, maxMem=278302556
15/02/03 23:07:44 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.1 KB, free 265.4 MB)
15/02/03 23:07:44 INFO storage.MemoryStore: ensureFreeSpace(1546) called with curMem=46077, maxMem=278302556
15/02/03 23:07:44 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1546.0 B, free 265.4 MB)
15/02/03 23:07:44 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:39115 (size: 1546.0 B, free: 265.4 MB)
15/02/03 23:07:44 INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0
15/02/03 23:07:44 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Submitting 3 missing tasks from Stage 1 (ShuffledRDD[4] at reduceByKey at :14)
15/02/03 23:07:44 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 3 tasks
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 3, localhost, PROCESS_LOCAL, 1056 bytes)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 4, localhost, PROCESS_LOCAL, 1056 bytes)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 1.0 (TID 5, localhost, PROCESS_LOCAL, 1056 bytes)
15/02/03 23:07:44 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 3)
15/02/03 23:07:44 INFO executor.Executor: Running task 2.0 in stage 1.0 (TID 5)
15/02/03 23:07:44 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 4)
15/02/03 23:07:44 INFO storage.ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 3 blocks
15/02/03 23:07:44 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 3 blocks
15/02/03 23:07:44 INFO storage.ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 3 blocks
15/02/03 23:07:44 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
15/02/03 23:07:44 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 7 ms
15/02/03 23:07:44 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
15/02/03 23:07:44 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 3). 820 bytes result sent to driver
15/02/03 23:07:44 INFO executor.Executor: Finished task 2.0 in stage 1.0 (TID 5). 820 bytes result sent to driver
15/02/03 23:07:44 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 4). 990 bytes result sent to driver
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 3) in 44 ms on localhost (1/3)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 1.0 (TID 5) in 44 ms on localhost (2/3)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 4) in 45 ms on localhost (3/3)
15/02/03 23:07:44 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Stage 1 (collect at :17) finished in 0.048 s
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Job 0 finished: collect at :17, took 0.366077 s
res0: Array[(String, Int)] = Array((hadoop,1), (hello,1))

scala> 

The console shows the result: res0: Array[(String, Int)] = Array((hadoop,1), (hello,1))

The result can also be saved to HDFS:


scala> count.saveAsTextFile("hdfs://jifeng01.sohudo.com:9000/user/jifeng/out/test1.txt")


9: Stop

./sbin/stop-master.sh (sbin/stop-slaves.sh stops the workers; sbin/stop-all.sh stops both)

[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ ./sbin/stop-master.sh 
stopping org.apache.spark.deploy.master.Master


