Spark Configuration

Location of Configs
  • Spark code (default values of configs)
  • Precedence of Configs

spark-env.sh

Important configs

  • SPARK_WORKER_CORES
    • Number of Spark Cores a worker can give out to its executors
BEST PRACTICE
Set SPARK_WORKER_CORES to roughly 2 to 3 times the number of logical cores on that node. In general, oversubscribe.
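As a rough illustration of the best practice above, the value could be derived inside spark-env.sh instead of being hard-coded. This is only a sketch: the variable names logical_cores and oversubscribe_factor are made up for the example, and the factor of 2 is an assumed pick from the 2 to 3 range.

```shell
# Sketch for conf/spark-env.sh (assumed location); not a definitive setup.
# Count logical cores: nproc on Linux, getconf as a fallback elsewhere.
logical_cores="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"

# Hypothetical helper variable: oversubscription factor (2 to 3 per the note above).
oversubscribe_factor=2

# Cores this worker advertises to its executors.
export SPARK_WORKER_CORES=$(( logical_cores * oversubscribe_factor ))
```

Deriving the value per node keeps the same file usable on machines with different core counts.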
  • SPARK_LOCAL_DIRS
    • Used for spill-over when an RDD is persisted with RDD.persist(StorageLevel.MEMORY_AND_DISK).
    • Also used for intermediate shuffle data.
    • Shuffle data and persist spill-over data cannot be pointed to different locations, because both are controlled by this single setting.
    • Can point to multiple locations on multiple disks, given as a comma-separated list of directory paths.
BEST PRACTICE
Try using SSDs for the SPARK_LOCAL_DIRS setting.
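A minimal sketch of the comma-separated form described above, as it might appear in spark-env.sh. The two mount points are hypothetical placeholders, assumed to be separate SSD-backed disks; substitute the real directories on each node.

```shell
# Sketch for conf/spark-env.sh (assumed location).
# Hypothetical SSD mount points, one directory per physical disk.
# Both shuffle data and persist spill-over land under these paths.
export SPARK_LOCAL_DIRS="/mnt/ssd1/spark-local,/mnt/ssd2/spark-local"
```

Listing one directory per disk lets the scratch I/O spread across the disks rather than queueing on a single one.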
