Spark Cluster Modes
The different modes are distinguished mainly by their resource manager and by how resources are allocated.
 | Local | Standalone | YARN | Mesos |
---|---|---|---|---|
Definition | A single JVM contains Driver + Executors | At least 4 different JVMs: Master, Worker, Driver and Executor | Driver + ResourceManager + NodeManager + Containers | Native to Spark (*) |
Typical deploy mode | Client | Client | Cluster | Cluster |
When to use? | Development / debugging | When not plugged into the Hadoop stack | When plugged into the Hadoop stack | |
Resource allocation (RA) method | Static | Static | Dynamic | Dynamic |
RA policy | by threads | by cores | by vcores in containers | ??? |
Cluster Manager | local (same JVM) | Spark Master | YARN ResourceManager | Mesos Master |
RM slave | Local | Worker | NodeManager | Mesos Agent |
Executor | Local | Executor | YARN Container | Mesos Container |
Job scheduler (*) | FIFO | FIFO | FIFO/FAIR | FIFO/FAIR |
Where does the driver run? | Locally | Where spark-submit ran | In the ApplicationMaster container | (*) |
Driver HA | None | None | Yes | Yes |
Master URL format | local, local[*], local[N] | spark://<ip>:<port> | yarn | mesos://<host>:<port> |
spark-submit | --master local[*] | --master spark://<ip>:<port> | --master yarn | --master mesos://<host>:<port> |
(*TODO: Confirm)
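As a quick reference for the table above, a sketch of how the --master flag changes per mode (the class name, jar, host names and ports below are placeholders, not real endpoints):
./bin/spark-submit --master local[*] --class com.example.App app.jar // local: driver and executors in one JVM
./bin/spark-submit --master spark://<master-host>:7077 --class com.example.App app.jar // standalone; 7077 is the default master port
./bin/spark-submit --master yarn --deploy-mode cluster --class com.example.App app.jar // YARN: driver runs inside the ApplicationMaster container
./bin/spark-submit --master mesos://<mesos-master>:5050 --class com.example.App app.jar // Mesos, client deploy mode; cluster mode goes through the MesosClusterDispatcher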
Spark Cores
The term is a misnomer: a Spark core really means a Java thread. So if a machine has 10 available physical CPU cores, using only 10 Spark cores can severely under-utilize compute, since we are not exploiting OS multithreading effectively.
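A quick illustration (the core counts are only illustrative): the number inside local[] is a thread count, not a physical-core reservation, so it may exceed the machine's CPU count.
./bin/spark-shell --master local[16] // 16 task threads in one JVM
./bin/spark-shell --master local[32] // 32 task threads on the same 16-CPU machine; the OS time-slices them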
Oversubscription of Spark Cores
From that understanding it follows that we should generally oversubscribe Spark cores, i.e. the worker should register 2x or 3x the actual available CPU cores when it registers with the Resource Manager (RM). So a Spark node that has 10 free CPU cores should register 20 or 30 vcores with the RM.
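A minimal sketch of how a standalone worker would advertise 2x its physical cores (the values are the example from above, not a recommendation; on YARN the analogous knob is yarn.nodemanager.resource.cpu-vcores in yarn-site.xml):
# conf/spark-env.sh on a worker node with 10 physical cores
SPARK_WORKER_CORES=20     # register 2x the physical cores with the Spark Master
SPARK_WORKER_MEMORY=48g   # memory offered to executors (illustrative value)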
Local Mode
- Spark apps can be run in local mode using the --master local[N] option on the command line. Some examples:
./bin/spark-shell --master local // start executor with 1 thread/core
./bin/spark-shell --master local[5] // start executor with 5 threads/cores
./bin/spark-shell --master local[*] // start executor with as many threads/cores as there are logical CPUs
- Shuffles in Local mode do not involve any network I/O
??? How to get the local master and worker UI ???
Standalone
- Spark apps are submitted to a standalone cluster using the --master spark://<masterhost>:<port> option.
Examples
./bin/spark-shell --master spark://localhost:7077 // 7077 is the default standalone master port
- Spark Master & Worker (see the start-up sketch after this list)
- The Master is the cluster-level scheduler: it decides on which Worker each executor should run.
- A Worker starts and stops executor processes on its node.
- If a Worker crashes, the Master removes it and the executors it was hosting are re-launched on the remaining Workers.
- If an executor crashes, its Worker reports the failure and a replacement executor is launched for the application.
- Without HA, a crashed Spark Master cannot be brought back automatically; already-running applications keep running, but no new applications or executors can be scheduled until it is restarted.
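A minimal sketch of bringing up that Master/Worker pair by hand with the scripts that ship with Spark (the host name is a placeholder; older Spark versions use start-slave.sh instead of start-worker.sh):
./sbin/start-master.sh // starts the Master; web UI on :8080, RPC on :7077 by default
./sbin/start-worker.sh spark://<master-host>:7077 // run on each worker node; registers that Worker with the Master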
Highly available Spark Master
- We can add multiple standby (passive) Masters that coordinate through ZooKeeper, or use single-Master recovery that persists the Master's state to a durable store.
// Example: DSE's flavour of Spark master HA stores the recovery state in a Cassandra table:
dse_system.spark_xxx_master_recovery
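For the ZooKeeper flavour, a minimal sketch of the stock standalone-HA settings, set on every Master node (the ZooKeeper quorum and znode path are placeholders):
# conf/spark-env.sh on each Master node
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
Applications and workers then list every Master in the URL, e.g. --master spark://master1:7077,master2:7077, so they can fail over to whichever Master is currently the elected leader.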