Installing and Configuring the Latest Spark 1.1.0 Cluster

2023-04-25

Compared with a distributed file system or a NoSQL database, installing and configuring a Spark cluster is fairly straightforward:

Many tutorials tell you to install Java and Scala first, but the latest Spark release already bundles Scala, and the JRE that ships with the Linux distribution works fine as well!
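A quick sanity check before installing, as a minimal sketch (the assembly jar path below is assumed from the hadoop2.3 build used in this post):

    # a working JRE on each node is enough; Spark's assembly jar bundles the Scala 2.10 runtime
    java -version

    # optional, after unpacking: spot-check that Scala classes really live inside the assembly
    unzip -l /usr/local/spark/lib/spark-assembly-1.1.0-hadoop2.3.0.jar | grep -m1 'scala/'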

    Install Spark 1.1.0 on the master node (bluejoe0); the commands below are run in /usr/local, so that the spark symlink matches the paths used later:

    wget http://mirror.bit.edu.cn/apache/spark/spark-1.1.0/spark-1.1.0-bin-hadoop2.3.tgz

    tar -zxvf spark-1.1.0-bin-hadoop2.3.tgz

    ln -s spark-1.1.0-bin-hadoop2.3 spark
    Start spark-shell:

    cd /usr/local/spark/bin

    ./spark-shell

    You can see that Spark already ships with Scala 2.10 (the version is printed in the shell's startup banner):

    Type a quick test program into the shell:

    scala> val data = Array(1, 2, 3, 4, 5)

    data: Array[Int] = Array(1, 2, 3, 4, 5)

    scala> val distData = sc.parallelize(data)

    distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:14

    scala> distData.reduce(_+_)

    res0: Int = 15
    While the shell is running, you can watch the job in the driver's web UI on port 4040:
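    On a headless machine you can confirm the UI is up with curl instead (hostname assumed to be bluejoe0, as elsewhere in this post):

    # the driver web UI listens on port 4040 only while the shell or job is running
    curl -s -o /dev/null -w "%{http_code}\n" http://bluejoe0:4040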

    You can also test the Pi calculation:

    ./bin/run-example SparkPi

    14/11/23 16:08:25 INFO SparkContext: Job finished: reduce at SparkPi.scala:35, took 1.008332384 s

    Pi is roughly 3.1403
    You can also submit the job via spark-submit:

    ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[6] /usr/local/spark/lib/spark-examples-1.1.0-hadoop2.3.0.jar 1000

    14/11/23 16:07:30 INFO SparkContext: Job finished: reduce at SparkPi.scala:35, took 46.220537186 s

    Pi is roughly 3.14172056
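    Here local[6] means 6 local worker threads, and the trailing 1000 is the number of slices SparkPi splits the work into; both are easy to vary for a quicker smoke test (same jar path as above):

    # fewer slices and fewer threads -- finishes much faster
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
      --master local[2] \
      /usr/local/spark/lib/spark-examples-1.1.0-hadoop2.3.0.jar 100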
    Now set up a few worker nodes: scp the Spark tarball to the other nodes, e.g. bluejoe4, bluejoe5 and bluejoe9.
    Make sure passwordless SSH login from the master to every worker is in place, as sketched below;
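    A minimal sketch of both steps, run on bluejoe0 (node names as above; adjust the paths and SSH user to your environment):

    # generate a key once on the master, then push it to every worker
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    for node in bluejoe4 bluejoe5 bluejoe9; do
      ssh-copy-id $node
      # copy the tarball and unpack it into the same /usr/local layout as on the master
      scp /usr/local/spark-1.1.0-bin-hadoop2.3.tgz $node:/usr/local/
      ssh $node "cd /usr/local && tar -zxf spark-1.1.0-bin-hadoop2.3.tgz && ln -s spark-1.1.0-bin-hadoop2.3 spark"
    done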
    Then edit conf/slaves:

    # A Spark Worker will be started on each of the machines listed below.

    bluejoe4

    bluejoe5

    bluejoe9
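    Depending on your network setup you may also want a minimal conf/spark-env.sh, copied from the bundled spark-env.sh.template; the two variables below are standard standalone-mode settings and are optional here:

    # conf/spark-env.sh
    export SPARK_MASTER_IP=bluejoe0     # hostname the master binds to and workers connect to
    export SPARK_WORKER_MEMORY=2g       # total memory a worker may hand out to executors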
    Start the Spark cluster on bluejoe0:

    ./sbin/start-all.sh
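    To confirm the daemons actually came up without opening a browser (a quick check; pgrep matches the daemon class names on each node's command line):

    # the master runs a Master JVM, each slave node a Worker JVM
    pgrep -f org.apache.spark.deploy.master.Master
    for node in bluejoe4 bluejoe5 bluejoe9; do ssh $node "pgrep -f org.apache.spark.deploy.worker.Worker"; done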

    You should now see the three workers in the browser (the master's web UI, on port 8080):

    Now run the Pi program again, this time against the cluster:

    ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://bluejoe0:7077 /usr/local/spark/lib/spark-examples-1.1.0-hadoop2.3.0.jar 1000

    14/11/23 16:05:00 INFO SparkContext: Job finished: reduce at SparkPi.scala:35, took 26.322514766 s

    Pi is roughly 3.14159516

    Check the master UI in the browser again; note that the same 1000-slice job took about 26 s on the cluster, versus 46 s with local[6]:
