Spark debugging

Changing Logging Level for Spark driver logging:

By passing spark.driver.extraJavaOptions.

Programmatically:

  val sparkConf = new SparkConf().setAppName(jobName)
    .set("spark.driver.extraJavaOptions",
         "-Dlog4jspark.root.logger=DEBUG,TimeRollingHourly")

From the spark-submit command line:

spark-submit [other args] --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties"
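
Both approaches assume a log4j.properties file that defines the TimeRollingHourly appender referenced above (on HDP/HDInsight clusters one ships at /usr/hdp/current/spark-client/conf/log4j.properties, as used in the Cassandra section below). A hypothetical sketch of such a file; the log path is an assumption:

# The root logger reads the log4jspark.root.logger system property,
# which is why -Dlog4jspark.root.logger=DEBUG,TimeRollingHourly overrides it.
log4jspark.root.logger=INFO,TimeRollingHourly
log4j.rootLogger=${log4jspark.root.logger}

# Hourly rolling file appender (the file path below is an assumption).
log4j.appender.TimeRollingHourly=org.apache.log4j.DailyRollingFileAppender
log4j.appender.TimeRollingHourly.File=/var/log/spark/spark.log
log4j.appender.TimeRollingHourly.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.TimeRollingHourly.layout=org.apache.log4j.PatternLayout
log4j.appender.TimeRollingHourly.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n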
For standalone mode, the following will work:

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}

// Use a prefix of the package namespace you want to change the logging level for.
Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("com.microsoft").setLevel(Level.OFF)
SparkContext.getOrCreate(sparkConf)
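
Putting the pieces together, a minimal end-to-end sketch (the object name, app name, and log message are illustrative, not part of the original snippets):

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}

object LoggingLevelExample {
  def main(args: Array[String]): Unit = {
    // Silence the noisy framework namespaces before the context starts logging.
    Logger.getLogger("org").setLevel(Level.OFF)
    Logger.getLogger("com.microsoft").setLevel(Level.OFF)

    val sparkConf = new SparkConf()
      .setAppName("logging-level-example")
      .setMaster("local[*]") // for local testing; drop when submitting to a cluster

    val sc = SparkContext.getOrCreate(sparkConf)

    // Application loggers are unaffected by the two calls above, so this still appears.
    val log = Logger.getLogger(getClass)
    log.info(s"Running Spark ${sc.version}")

    sc.stop()
  }
}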

Changing Logging Level for Spark executor logging:

Programmatically:

val sparkConf = new SparkConf().setAppName(jobName)
  .set("spark.executor.extraJavaOptions",
       "-Dlog4jspark.root.logger=DEBUG,TimeRollingHourly")

From the spark-submit command line:

spark-submit [other args] --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties"

NOTE: other driver/executor-specific options can be passed in a similar fashion.
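
For example, a sketch of bundling several such options onto one SparkConf (the memory sizes and GC flag are illustrative values, not recommendations):

val sparkConf = new SparkConf().setAppName(jobName)
  // Driver-side options. Note: in client mode spark.driver.memory must be
  // passed on the command line instead, since the driver JVM has already started here.
  .set("spark.driver.memory", "4g")
  .set("spark.driver.extraJavaOptions", "-Dlog4jspark.root.logger=DEBUG,TimeRollingHourly")
  // Executor-side options take effect when the executor JVMs launch.
  .set("spark.executor.memory", "8g")
  .set("spark.executor.extraJavaOptions",
       "-Dlog4jspark.root.logger=DEBUG,TimeRollingHourly -XX:+HeapDumpOnOutOfMemoryError")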

Spark Cassandra Connector

Enabling Verbose Tracing for Spark Cassandra Connector

Pass the following extra command-line arguments to spark-submit to get verbose logging for the Spark Cassandra Connector.

For driver:

--driver-java-options "-Dlog4j.configuration=file:/usr/hdp/current/spark-client/conf/log4j.properties -Dlog4jspark.root.logger=INFO,TimeRollingHourly"

For executor:

--conf "spark.executor.extraJavaOptions=-Dlog4jspark.root.logger=INFO,TimeRollingHourly -Dlog4j.configuration=file:/usr/hdp/current/spark-client/conf/log4j.properties -XX:-UseParallelGC -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError"
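
The connector's namespace can also be singled out programmatically on the driver, following the same pattern as the standalone-mode snippet above (com.datastax.spark.connector is the connector's root package; the WARN/DEBUG split is just one reasonable choice):

import org.apache.log4j.{Level, Logger}

// Quiet everything else, then turn on verbose tracing for the connector alone.
Logger.getLogger("org").setLevel(Level.WARN)
Logger.getLogger("com.datastax.spark.connector").setLevel(Level.DEBUG)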
