Spark Configuration
Location of Configs
- Spark source code (hard-coded default values of configs)
- conf/spark-defaults.conf
- Flags passed to spark-submit
- SparkConf set directly in application code
- Precedence of configs (highest to lowest): SparkConf set in code, then spark-submit flags, then spark-defaults.conf, then the hard-coded defaults
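The precedence chain can be sketched with a hypothetical app (the file name and memory values below are just illustrations):

```shell
# 1. Lowest precedence: hard-coded defaults in the Spark source code.
# 2. conf/spark-defaults.conf on the driver machine, e.g. a line:
#      spark.executor.memory  2g
# 3. A spark-submit flag overrides the defaults file:
spark-submit --conf spark.executor.memory=4g my_app.py
# 4. Highest precedence: a value set on SparkConf inside the app itself,
#    e.g. SparkConf().set("spark.executor.memory", "8g") -- the job then
#    runs with 8g regardless of the settings above.
```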
spark-env.sh
important configs
- SPARK_WORKER_CORES
- Total number of CPU cores a worker makes available to the executors it launches on that node
BEST PRACTICE
Set SPARK_WORKER_CORES to roughly 2x or 3x the number of logical cores on that node. Oversubscribing is generally fine, since executor tasks rarely keep every core busy at once.
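A minimal spark-env.sh sketch of the guideline above (the 2x multiplier is the rule of thumb, not a hard requirement; `nproc` assumes Linux):

```shell
# conf/spark-env.sh
# On a node with 16 logical cores this advertises 32 cores to executors.
SPARK_WORKER_CORES=$((2 * $(nproc)))
```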
- SPARK_LOCAL_DIRS
- Used for spill-over when an RDD is persisted with a disk-backed storage level, e.g. RDD.persist(StorageLevel.MEMORY_AND_DISK)
- Also used for intermediate shuffle data.
- No way to point shuffle data and persist spill-over data at different locations; both are governed by this single setting.
- Can point to multiple locations on multiple disks, provided as a comma-separated list of directory paths.
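A sketch of spreading SPARK_LOCAL_DIRS across disks (the mount points below are hypothetical):

```shell
# conf/spark-env.sh
# Spread spill-over and shuffle I/O across two local SSDs.
SPARK_LOCAL_DIRS=/mnt/ssd1/spark,/mnt/ssd2/spark
```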
BEST PRACTICE
Try using SSDs for the SPARK_LOCAL_DIRS directories; spill-over and shuffle are disk-I/O heavy.