
Spark 1.2 Standalone Mode Single-Machine Installation


1: Download and extract Spark

[jifeng@jifeng01 hadoop]$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.2.0-bin-hadoop1.tgz
--2015-02-03 21:50:25--  http://d3kbcqa49mib13.cloudfront.net/spark-1.2.0-bin-hadoop1.tgz
Resolving d3kbcqa49mib13.cloudfront.net... 54.192.150.88, 54.230.149.156, 54.230.150.31, ...
Connecting to d3kbcqa49mib13.cloudfront.net|54.192.150.88|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 203977086 (195M) [application/x-compressed]
Saving to: `spark-1.2.0-bin-hadoop1.tgz'

100%[===========================================================================>] 203,977,086  410K/s   in 7m 55s  

2015-02-03 21:58:21 (419 KB/s) - `spark-1.2.0-bin-hadoop1.tgz' saved [203977086/203977086]

[jifeng@jifeng01 hadoop]$ tar zxf spark-1.2.0-bin-hadoop1.tgz 


2: Download Scala

Download Scala on jifeng04, then copy it to jifeng01.

[jifeng@jifeng04 hadoop]$ wget http://downloads.typesafe.com/scala/2.11.4/scala-2.11.4.tgz?_ga=1.248348352.61371242.1418807768
--2015-02-03 21:55:35--  http://downloads.typesafe.com/scala/2.11.4/scala-2.11.4.tgz?_ga=1.248348352.61371242.1418807768
Resolving downloads.typesafe.com... 54.230.151.79, 54.230.151.80, 54.230.151.162, ...
Connecting to downloads.typesafe.com|54.230.151.79|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 26509669 (25M) [application/octet-stream]
Saving to: `scala-2.11.4.tgz?_ga=1.248348352.61371242.1418807768'

100%[===========================================================================>] 26,509,669   320K/s   in 71s     

2015-02-03 21:56:48 (366 KB/s) - `scala-2.11.4.tgz?_ga=1.248348352.61371242.1418807768' saved [26509669/26509669]

[jifeng@jifeng04 hadoop]$ ls
hadoop-1.2.1  scala-2.11.4.tgz?_ga=1.248348352.61371242.1418807768  tmp
[jifeng@jifeng04 hadoop]$ mv scala-2.11.4.tgz\?_ga\=1.248348352.61371242.1418807768 scala-2.11.4.tgz
[jifeng@jifeng04 hadoop]$ tar zxf scala-2.11.4.tgz 
[jifeng@jifeng04 hadoop]$ ls
hadoop-1.2.1  scala-2.11.4  scala-2.11.4.tgz  tmp
[jifeng@jifeng04 hadoop]$ scp ./scala-2.11.4.tgz jifeng@jifeng01.sohudo.com:/home/jifeng/hadoop
scala-2.11.4.tgz                                                                   100%   25MB  25.3MB/s   00:01    
[jifeng@jifeng04 hadoop]$ 

Extract on jifeng01:

[jifeng@jifeng01 hadoop]$ tar zxf scala-2.11.4.tgz

3: Configure Scala

Add the Scala environment variables:

export SCALA_HOME=$HOME/hadoop/scala-2.11.4
export PATH=$PATH:$ANT_HOME/bin:$HBASE_HOME/bin:$SQOOP_HOME/bin:$HADOOP_HOME/bin:$MAHOUT_HOME/bin:$SCALA_HOME/bin

[jifeng@jifeng01 ~]$ vi ~/.bash_profile

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH
export JAVA_HOME=$HOME/jdk1.7.0_45
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=$HOME/hadoop/hadoop-1.2.1
export HADOOP_CONF_DIR=$HOME/hadoop/hadoop-1.2.1/conf
export ANT_HOME=$HOME/apache-ant-1.9.4
export HBASE_HOME=$HOME/hbase-0.94.21
export SQOOP_HOME=$HOME/sqoop-1.99.3-bin-hadoop100
export CATALINA_HOME=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs

export MAHOUT_HOME=$HOME/hadoop/mahout-distribution-0.9
export MAHOUT_CONF_DIR=$HOME/hadoop/mahout-distribution-0.9/conf

export SCALA_HOME=$HOME/hadoop/scala-2.11.4
export PATH=$PATH:$ANT_HOME/bin:$HBASE_HOME/bin:$SQOOP_HOME/bin:$HADOOP_HOME/bin:$MAHOUT_HOME/bin:$SCALA_HOME/bin
~                                                                                                                    
~                                                                                                                    
".bash_profile" 28L, 889C 已写入
[jifeng@jifeng01 ~]$ source ~/.bash_profile
Running source ~/.bash_profile makes the configuration take effect immediately.

Check the version:

[jifeng@jifeng01 hadoop]$ scala -version
Scala code runner version 2.11.4 -- Copyright 2002-2013, LAMP/EPFL

[jifeng@jifeng01 hadoop]$ scala
Welcome to Scala version 2.11.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45).
Type in expressions to have them evaluated.
Type :help for more information.

scala> 

4: Configure the Spark environment variables

Add to the environment:

export SPARK_HOME=$HOME/hadoop/spark-1.2.0-bin-hadoop1
export PATH=$PATH:$SPARK_HOME/bin

[jifeng@jifeng01 ~]$ vi ~/.bash_profile    

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH
export JAVA_HOME=$HOME/jdk1.7.0_45
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=$HOME/hadoop/hadoop-1.2.1
export HADOOP_CONF_DIR=$HOME/hadoop/hadoop-1.2.1/conf
export ANT_HOME=$HOME/apache-ant-1.9.4
export HBASE_HOME=$HOME/hbase-0.94.21
export SQOOP_HOME=$HOME/sqoop-1.99.3-bin-hadoop100
export CATALINA_HOME=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs

export MAHOUT_HOME=$HOME/hadoop/mahout-distribution-0.9
export MAHOUT_CONF_DIR=$HOME/hadoop/mahout-distribution-0.9/conf

export SCALA_HOME=$HOME/hadoop/scala-2.11.4
export SPARK_HOME=$HOME/hadoop/spark-1.2.0-bin-hadoop1
export PATH=$PATH:$ANT_HOME/bin:$HBASE_HOME/bin:$SQOOP_HOME/bin:$HADOOP_HOME/bin:$MAHOUT_HOME/bin:$SCALA_HOME/bin
export PATH=$PATH:$SPARK_HOME/bin
".bash_profile" 30L, 978C 已写入                                                                   
[jifeng@jifeng01 ~]$ source ~/.bash_profile
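
To confirm the new variables are in effect, a quick sanity check (a minimal sketch, assuming the paths configured above):

[jifeng@jifeng01 ~]$ echo $SPARK_HOME        # should print /home/jifeng/hadoop/spark-1.2.0-bin-hadoop1
[jifeng@jifeng01 ~]$ which spark-shell       # should resolve to a path under $SPARK_HOME/bin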

5: Configure Spark

Copy the configuration template: cp spark-env.sh.template spark-env.sh

[jifeng@jifeng01 ~]$ cd hadoop
[jifeng@jifeng01 hadoop]$ cd spark-1.2.0-bin-hadoop1
[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ ls
bin  conf  data  ec2  examples  lib  LICENSE  NOTICE  python  README.md  RELEASE  sbin
[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ cd conf
[jifeng@jifeng01 conf]$ ls
fairscheduler.xml.template  metrics.properties.template  spark-defaults.conf.template
log4j.properties.template   slaves.template              spark-env.sh.template
[jifeng@jifeng01 conf]$ cp spark-env.sh.template spark-env.sh

Append the following to the end of spark-env.sh:

export SCALA_HOME=/home/jifeng/hadoop/scala-2.11.4
export SPARK_MASTER_IP=jifeng01.sohudo.com
export SPARK_WORKER_MEMORY=2G
export JAVA_HOME=/home/jifeng/jdk1.7.0_45

# Generic options for the daemons used in the standalone deploy mode
# - SPARK_CONF_DIR      Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_LOG_DIR       Where log files are stored.  (Default: ${SPARK_HOME}/logs)
# - SPARK_PID_DIR       Where the pid file is stored. (Default: /tmp)
# - SPARK_IDENT_STRING  A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS      The scheduling priority for daemons. (Default: 0)
export SCALA_HOME=/home/jifeng/hadoop/scala-2.11.4
export SPARK_MASTER_IP=jifeng01.sohudo.com
export SPARK_WORKER_MEMORY=2G
export JAVA_HOME=/home/jifeng/jdk1.7.0_45
"spark-env.sh" 54L, 3378C 已写入                                                                   
[jifeng@jifeng01 conf]$ cd ..

6: Start the master

[jifeng@jifeng01 sbin]$ cd ..
[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /home/jifeng/hadoop/spark-1.2.0-bin-hadoop1/sbin/../logs/spark-jifeng-org.apache.spark.deploy.master.Master-1-jifeng01.sohudo.com.out


Open http://jifeng01:8080/ to see the master web UI.
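
You can also confirm the master JVM with jps (ships with the JDK; the process list will vary by host):

[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ jps    # should list a "Master" process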

7: Start the worker

sbin/start-slaves.sh spark://jifeng01:7077

[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ sbin/start-slaves.sh spark://jifeng01:7077
localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/jifeng/hadoop/spark-1.2.0-bin-hadoop1/sbin/../logs/spark-jifeng-org.apache.spark.deploy.worker.Worker-1-jifeng01.sohudo.com.out

[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ sbin/start-slaves.sh spark://jifeng01.sohudo.com:7077
localhost: org.apache.spark.deploy.worker.Worker running as process 10273. Stop it first.
[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ 
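
Note: start-slaves.sh launches one worker per host listed in conf/slaves; with no slaves file it appears to fall back to localhost, which matches the output above. To make the worker list explicit, a minimal sketch (assuming the shipped template, which lists localhost):

[jifeng@jifeng01 conf]$ cp slaves.template slaves    # then edit conf/slaves to list the worker hosts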

Refresh the web UI to see the new worker appear.


8: Test

Start the interactive shell:

[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ MASTER=spark://jifeng01.sohudo.com:7077 ./bin/spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/02/03 23:05:32 INFO spark.SecurityManager: Changing view acls to: jifeng
15/02/03 23:05:32 INFO spark.SecurityManager: Changing modify acls to: jifeng
15/02/03 23:05:32 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/02/03 23:05:32 INFO spark.HttpServer: Starting HTTP Server
15/02/03 23:05:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/03 23:05:32 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:46617
15/02/03 23:05:32 INFO util.Utils: Successfully started service 'HTTP class server' on port 46617.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
15/02/03 23:05:36 INFO spark.SecurityManager: Changing view acls to: jifeng
15/02/03 23:05:36 INFO spark.SecurityManager: Changing modify acls to: jifeng
15/02/03 23:05:36 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/02/03 23:05:37 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/02/03 23:05:37 INFO Remoting: Starting remoting
15/02/03 23:05:37 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@jifeng01.sohudo.com:41533]
15/02/03 23:05:37 INFO util.Utils: Successfully started service 'sparkDriver' on port 41533.
15/02/03 23:05:37 INFO spark.SparkEnv: Registering MapOutputTracker
15/02/03 23:05:37 INFO spark.SparkEnv: Registering BlockManagerMaster
15/02/03 23:05:37 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150203230537-f1cb
15/02/03 23:05:37 INFO storage.MemoryStore: MemoryStore started with capacity 265.4 MB
15/02/03 23:05:37 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-0e198e02-fea8-49df-a1ab-8dd08e988cab
15/02/03 23:05:37 INFO spark.HttpServer: Starting HTTP Server
15/02/03 23:05:37 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/03 23:05:37 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:52156
15/02/03 23:05:37 INFO util.Utils: Successfully started service 'HTTP file server' on port 52156.
15/02/03 23:05:38 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/03 23:05:38 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/02/03 23:05:38 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/02/03 23:05:38 INFO ui.SparkUI: Started SparkUI at http://jifeng01.sohudo.com:4040
15/02/03 23:05:38 INFO executor.Executor: Using REPL class URI: http://10.5.4.54:46617
15/02/03 23:05:38 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@jifeng01.sohudo.com:41533/user/HeartbeatReceiver
15/02/03 23:05:38 INFO netty.NettyBlockTransferService: Server created on 39115
15/02/03 23:05:38 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/02/03 23:05:38 INFO storage.BlockManagerMasterActor: Registering block manager localhost:39115 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 39115)
15/02/03 23:05:38 INFO storage.BlockManagerMaster: Registered BlockManager
15/02/03 23:05:38 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.

scala> 
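
Note that the banner reports Scala 2.10.4: the prebuilt Spark 1.2 binaries bundle their own Scala 2.10 runtime, independent of the Scala 2.11.4 installed in step 2. To verify the shell is attached to the standalone master rather than running in local mode, query the SparkContext (sc.master is a standard field):

scala> sc.master    // should print spark://jifeng01.sohudo.com:7077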

Run a word count

Enter the following commands:

val file=sc.textFile("hdfs://jifeng01.sohudo.com:9000/user/jifeng/in/test1.txt")
val count=file.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey(_+_)
count.collect()
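
The same pipeline broken into named steps, with comments (the intermediate names words and pairs are only for illustration):

val file = sc.textFile("hdfs://jifeng01.sohudo.com:9000/user/jifeng/in/test1.txt")  // RDD of lines
val words = file.flatMap(line => line.split(" "))   // split each line into words
val pairs = words.map(word => (word, 1))            // pair each word with a count of 1
val count = pairs.reduceByKey(_ + _)                // sum the counts per word
count.collect()                                     // fetch the result to the driver as an Array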

scala> val file=sc.textFile("hdfs://jifeng01.sohudo.com:9000/user/jifeng/in/test1.txt")
15/02/03 23:07:25 INFO storage.MemoryStore: ensureFreeSpace(32768) called with curMem=0, maxMem=278302556
15/02/03 23:07:25 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 32.0 KB, free 265.4 MB)
15/02/03 23:07:25 INFO storage.MemoryStore: ensureFreeSpace(5070) called with curMem=32768, maxMem=278302556
15/02/03 23:07:25 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.0 KB, free 265.4 MB)
15/02/03 23:07:25 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:39115 (size: 5.0 KB, free: 265.4 MB)
15/02/03 23:07:25 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/02/03 23:07:25 INFO spark.SparkContext: Created broadcast 0 from textFile at <console>:12
file: org.apache.spark.rdd.RDD[String] = hdfs://jifeng01.sohudo.com:9000/user/jifeng/in/test1.txt MappedRDD[1] at textFile at <console>:12

scala> val count=file.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey(_+_)
15/02/03 23:07:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/03 23:07:38 WARN snappy.LoadSnappy: Snappy native library not loaded
15/02/03 23:07:38 INFO mapred.FileInputFormat: Total input paths to process : 1
count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:14

scala> count.collect()
15/02/03 23:07:44 INFO spark.SparkContext: Starting job: collect at <console>:17
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Registering RDD 3 (map at <console>:14)
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Got job 0 (collect at <console>:17) with 3 output partitions (allowLocal=false)
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Final stage: Stage 1(collect at <console>:17)
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 0)
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Missing parents: List(Stage 0)
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[3] at map at <console>:14), which has no missing parents
15/02/03 23:07:44 INFO storage.MemoryStore: ensureFreeSpace(3584) called with curMem=37838, maxMem=278302556
15/02/03 23:07:44 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.5 KB, free 265.4 MB)
15/02/03 23:07:44 INFO storage.MemoryStore: ensureFreeSpace(2543) called with curMem=41422, maxMem=278302556
15/02/03 23:07:44 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.5 KB, free 265.4 MB)
15/02/03 23:07:44 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:39115 (size: 2.5 KB, free: 265.4 MB)
15/02/03 23:07:44 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
15/02/03 23:07:44 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Submitting 3 missing tasks from Stage 0 (MappedRDD[3] at map at <console>:14)
15/02/03 23:07:44 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 3 tasks
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, ANY, 1310 bytes)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, ANY, 1310 bytes)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, ANY, 1310 bytes)
15/02/03 23:07:44 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
15/02/03 23:07:44 INFO executor.Executor: Running task 2.0 in stage 0.0 (TID 2)
15/02/03 23:07:44 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
15/02/03 23:07:44 INFO rdd.HadoopRDD: Input split: hdfs://jifeng01.sohudo.com:9000/user/jifeng/in/test1.txt:6+6
15/02/03 23:07:44 INFO rdd.HadoopRDD: Input split: hdfs://jifeng01.sohudo.com:9000/user/jifeng/in/test1.txt:0+6
15/02/03 23:07:44 INFO rdd.HadoopRDD: Input split: hdfs://jifeng01.sohudo.com:9000/user/jifeng/in/test1.txt:12+1
15/02/03 23:07:44 INFO executor.Executor: Finished task 2.0 in stage 0.0 (TID 2). 1897 bytes result sent to driver
15/02/03 23:07:44 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 1897 bytes result sent to driver
15/02/03 23:07:44 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1897 bytes result sent to driver
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 164 ms on localhost (1/3)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 160 ms on localhost (2/3)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 160 ms on localhost (3/3)
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Stage 0 (map at <console>:14) finished in 0.185 s
15/02/03 23:07:44 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/02/03 23:07:44 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/02/03 23:07:44 INFO scheduler.DAGScheduler: running: Set()
15/02/03 23:07:44 INFO scheduler.DAGScheduler: waiting: Set(Stage 1)
15/02/03 23:07:44 INFO scheduler.DAGScheduler: failed: Set()
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Missing parents for Stage 1: List()
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Submitting Stage 1 (ShuffledRDD[4] at reduceByKey at <console>:14), which is now runnable
15/02/03 23:07:44 INFO storage.MemoryStore: ensureFreeSpace(2112) called with curMem=43965, maxMem=278302556
15/02/03 23:07:44 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.1 KB, free 265.4 MB)
15/02/03 23:07:44 INFO storage.MemoryStore: ensureFreeSpace(1546) called with curMem=46077, maxMem=278302556
15/02/03 23:07:44 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1546.0 B, free 265.4 MB)
15/02/03 23:07:44 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:39115 (size: 1546.0 B, free: 265.4 MB)
15/02/03 23:07:44 INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0
15/02/03 23:07:44 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Submitting 3 missing tasks from Stage 1 (ShuffledRDD[4] at reduceByKey at <console>:14)
15/02/03 23:07:44 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 3 tasks
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 3, localhost, PROCESS_LOCAL, 1056 bytes)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 4, localhost, PROCESS_LOCAL, 1056 bytes)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 1.0 (TID 5, localhost, PROCESS_LOCAL, 1056 bytes)
15/02/03 23:07:44 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 3)
15/02/03 23:07:44 INFO executor.Executor: Running task 2.0 in stage 1.0 (TID 5)
15/02/03 23:07:44 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 4)
15/02/03 23:07:44 INFO storage.ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 3 blocks
15/02/03 23:07:44 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 3 blocks
15/02/03 23:07:44 INFO storage.ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 3 blocks
15/02/03 23:07:44 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
15/02/03 23:07:44 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 7 ms
15/02/03 23:07:44 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
15/02/03 23:07:44 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 3). 820 bytes result sent to driver
15/02/03 23:07:44 INFO executor.Executor: Finished task 2.0 in stage 1.0 (TID 5). 820 bytes result sent to driver
15/02/03 23:07:44 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 4). 990 bytes result sent to driver
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 3) in 44 ms on localhost (1/3)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 1.0 (TID 5) in 44 ms on localhost (2/3)
15/02/03 23:07:44 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 4) in 45 ms on localhost (3/3)
15/02/03 23:07:44 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Stage 1 (collect at <console>:17) finished in 0.048 s
15/02/03 23:07:44 INFO scheduler.DAGScheduler: Job 0 finished: collect at <console>:17, took 0.366077 s
res0: Array[(String, Int)] = Array((hadoop,1), (hello,1))

scala> 

The console prints the result: res0: Array[(String, Int)] = Array((hadoop,1), (hello,1))

You can also save the result to HDFS:


scala> count.saveAsTextFile("hdfs://jifeng01.sohudo.com:9000/user/jifeng/out/test1.txt")
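
saveAsTextFile writes a directory of part files rather than a single file; to inspect it from the Hadoop CLI (part file names may vary):

[jifeng@jifeng01 ~]$ hadoop fs -ls /user/jifeng/out/test1.txt
[jifeng@jifeng01 ~]$ hadoop fs -cat /user/jifeng/out/test1.txt/part-00000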


9: Stop

./sbin/stop-master.sh

[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ ./sbin/stop-master.sh 
stopping org.apache.spark.deploy.master.Master
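
stop-master.sh only stops the master; the worker started in step 7 keeps running. Both can be stopped with the companion scripts in sbin/:

[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ ./sbin/stop-slaves.sh    # stops the worker(s)
[jifeng@jifeng01 spark-1.2.0-bin-hadoop1]$ ./sbin/stop-all.sh       # or stop master and workers in one go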


