Apache Spark notes
An assorted collection of notes on Apache Spark: architecture, programming concepts, and best practices.
Architecture
- Spark Cluster Modes
- Different JVMs
- Other components
- Spark UI
- History Server
- HDFS
Anatomy of a Spark Application
- Types of Spark Application
- Programming Languages
- Life cycle
- Application/Jobs/Stages/Tasks
- RDDs/DataFrames/Datasets
- Shuffles/Caching/Persistence
- Memory needs of the Application
- Deep dive into Spark Application Configurations
Managing Spark Applications
- Spark UI
- Event log
- Driver/executor logs
- Advanced Metrics
- Best Practices
Spark and its Ecosystem
- Hadoop
- Kafka
- Cassandra