Standalone Spark cluster setup in AWS cloud

Here I discuss how to set up a standalone Spark cluster in AWS using EC2.

Let’s assume we are setting up a 3-node standalone cluster. The instance type and price of each node: m4.xlarge ($0.239 per hour), m4.large ($0.12 per hour), m4.large ($0.12 per hour)

Each node has a 100 GB EBS volume
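To make the running cost concrete, here is a minimal sketch that adds up the hourly prices quoted above. The 730-hour month is an assumption (a common billing approximation), and EBS storage charges are not included.

```python
from decimal import Decimal

# Hourly prices quoted above (USD), one entry per node.
NODE_PRICES = {
    "node 1 (m4.xlarge)": Decimal("0.239"),
    "node 2 (m4.large)": Decimal("0.12"),
    "node 3 (m4.large)": Decimal("0.12"),
}

# Assumption: a 730-hour month (365 * 24 / 12).
HOURS_PER_MONTH = 730


def cluster_cost_per_hour(prices):
    """Sum the hourly price of every node in the cluster."""
    return sum(prices.values(), Decimal("0"))


def cluster_cost_per_month(prices, hours=HOURS_PER_MONTH):
    """Estimate the monthly compute cost for an always-on cluster."""
    return cluster_cost_per_hour(prices) * hours


if __name__ == "__main__":
    print("Per hour (USD):", cluster_cost_per_hour(NODE_PRICES))
    print("Per month (USD):", cluster_cost_per_month(NODE_PRICES))
```

Decimal is used instead of float so the sums match the quoted prices exactly.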

Servers Info

Launch 3 servers with CentOS 6.5 or 7 in AWS EC2 (US West – Oregon region)

Configure the hostname on each node and make sure every node can reach the others by hostname
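One common way to make the nodes reachable by name is to add the same entries to /etc/hosts on every node. The sketch below only generates those lines. The hostnames and private IPs are hypothetical placeholders, not values from this setup.

```python
import socket

# Hypothetical hostnames and private IPs; substitute the real values for your nodes.
NODES = {
    "spark-master": "10.0.0.10",
    "spark-worker1": "10.0.0.11",
    "spark-worker2": "10.0.0.12",
}


def hosts_entries(nodes):
    """Return /etc/hosts lines mapping each IP to its hostname."""
    return [f"{ip}\t{name}" for name, ip in nodes.items()]


def unresolved(hostnames):
    """Return the hostnames that this machine cannot resolve."""
    missing = []
    for name in hostnames:
        try:
            socket.gethostbyname(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Append these lines to /etc/hosts on every node.
    print("\n".join(hosts_entries(NODES)))
```

After updating /etc/hosts, calling unresolved(NODES) on each node should return an empty list if every name resolves.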

We will make one node the master node and the other two worker nodes

The master node has one security group and the worker nodes have another (configure firewall rules in both security groups so that the nodes can reach each other on the ports Spark uses)

The master node requires the following ports to be open to the public:

8080 – Spark Web UI

7077 – Spark master port (worker nodes and driver programs connect to the master on this port)
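The two master ports above can be written down as security group ingress rules. This is a minimal sketch that only builds the rule data in the shape the boto3 EC2 client accepts. Treating "public" as 0.0.0.0/0 is an assumption, and the security group ID in the comment is a placeholder.

```python
def ingress_rule(port, cidr, description):
    """Build one TCP ingress rule in the shape boto3's EC2 client accepts."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr, "Description": description}],
    }


# Assumption: "open to the public" means 0.0.0.0/0; a narrower CIDR is safer.
PUBLIC = "0.0.0.0/0"

MASTER_RULES = [
    ingress_rule(8080, PUBLIC, "Spark Web UI"),
    ingress_rule(7077, PUBLIC, "Spark master port"),
]

if __name__ == "__main__":
    # With boto3 installed and credentials configured, the rules could be applied as:
    #   import boto3
    #   ec2 = boto3.client("ec2", region_name="us-west-2")  # Oregon
    #   ec2.authorize_security_group_ingress(GroupId="sg-...", IpPermissions=MASTER_RULES)
    # "sg-..." stands for the master node's security group ID.
    for rule in MASTER_RULES:
        print(rule["FromPort"], rule["IpRanges"][0]["Description"])
```

Keeping the rules as plain data makes them easy to review before anything is changed in AWS.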



Cluster Deployment



Configurations on each node

Install Java 7 on each server

Place a compiled version of Spark on each node in the cluster

Install the Jetty server on the master node

Install Maven on the master node to compile Spark Java code

Use the “Cluster Launch Scripts” to bring up the Spark cluster
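The cluster launch scripts read the list of workers from a file in Spark's conf directory, one hostname per line (conf/slaves in older releases, conf/workers from Spark 3.1 onward), and need passwordless SSH from the master to each worker. The sketch below renders that file and the master URL. The hostnames are hypothetical.

```python
# Hypothetical hostnames; they must resolve from the master node.
MASTER = "spark-master"
WORKERS = ["spark-worker1", "spark-worker2"]


def render_workers_file(workers):
    """Return the contents of conf/slaves: one worker hostname per line."""
    return "".join(f"{host}\n" for host in workers)


def master_url(host, port=7077):
    """Return the URL that workers and applications use to reach the master."""
    return f"spark://{host}:{port}"


if __name__ == "__main__":
    # Save this output as conf/slaves under the Spark directory on the master,
    # then run sbin/start-all.sh there to start the master and all workers.
    print(render_workers_file(WORKERS), end="")
    print("Master URL:", master_url(MASTER))
```

Once the cluster is up, the workers should appear in the Web UI on port 8080 of the master.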

Cassandra setup

As part of the pilot project, we can use the same servers for a multi-node Cassandra cluster and scale as needed based on performance

Install Cassandra on all nodes

Configure the Cassandra instance on the master node to act as the seed node; the other two are normal nodes
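The seed arrangement above comes down to a few settings in cassandra.yaml on each node: every node lists the master's address under seeds and uses its own address for listen_address and rpc_address. A minimal sketch that renders those settings follows. The IPs and the cluster name are hypothetical.

```python
# Hypothetical private IPs; the first is the master node acting as the seed.
SEED_IP = "10.0.0.10"
NODE_IPS = ["10.0.0.10", "10.0.0.11", "10.0.0.12"]


def cassandra_settings(node_ip, seed_ip, cluster_name="PilotCluster"):
    """Render the cassandra.yaml settings that differ per node or cluster."""
    return (
        f"cluster_name: '{cluster_name}'\n"
        "seed_provider:\n"
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"
        "      parameters:\n"
        f"          - seeds: \"{seed_ip}\"\n"
        f"listen_address: {node_ip}\n"
        f"rpc_address: {node_ip}\n"
    )


if __name__ == "__main__":
    for ip in NODE_IPS:
        print(f"# cassandra.yaml settings for {ip}")
        print(cassandra_settings(ip, SEED_IP))
```

The cluster name must be identical on all three nodes, and starting the seed node first lets the other two find the cluster when they come up.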

Cassandra cluster view


