Up to now I’ve mostly executed my Spark Scala code directly in sbt with run. Now I want to inspect the jobs, stages, and tasks of a particular run in the Spark UI. For this I need to execute the code using the spark-submit command, outside sbt.
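For reference, the entry point that both sbt run and spark-submit launch is just an object with a main method that creates a SparkSession. My actual sparklearning.IOT class does more than this, but a minimal sketch of its shape looks like:

package sparklearning

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object IOT {
  def main(args: Array[String]): Unit = {
    // Hard-coding local[*] is fine while I only submit locally; on a real
    // cluster this line should be dropped so spark-submit's --master wins.
    val spark = SparkSession.builder()
      .appName("spark-learning")
      .master("local[*]")
      .getOrCreate()

    // A placeholder job: trivial, but enough to produce jobs, stages, and
    // tasks that show up in the Spark UI.
    val evenCount = spark.range(0, 1000000).toDF("id")
      .filter(col("id") % 2 === 0)
      .count()
    println(s"even numbers: $evenCount")

    spark.stop()
  }
}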

To run it with spark-submit, I first package the sbt project by running package at the sbt prompt. This creates a JAR file containing the project's compiled classes (dependencies are not bundled) along with other build artifacts in the target/ directory:

└── target
    ├── global-logging
    ├── scala-2.12
    │   ├── classes
    │   ├── spark-learning_2.12-0.1.jar
    │   ├── sync
    │   ├── update
    │   └── zinc
    ├── streams
    │   ├── _global
    │   └── compile
    └── task-temp-directory
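The JAR name is derived from the name, version, and scalaVersion settings in build.sbt. Roughly, the build definition behind it looks like this (the Spark dependency and the exact version numbers here are assumptions):

// build.sbt -- sketch implied by the JAR name spark-learning_2.12-0.1.jar
name := "spark-learning"
version := "0.1"
scalaVersion := "2.12.15"

// `package` only bundles the project's own classes; the Spark classes are
// supplied by sbt when using `run`, and by spark-submit when submitting.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.1"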

The scala-2.12/spark-learning_2.12-0.1.jar file will be passed to the spark-submit command:

spark-submit \
  --class sparklearning.IOT \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=/tmp/spark-events \
  target/scala-2.12/spark-learning_2.12-0.1.jar
  • Because my project contains multiple main classes, I have to pass the --class flag to tell spark-submit which one to run.
  • The two --conf options enable Spark event logging and point it at /tmp/spark-events, so the run can later be browsed through the Spark History Server (the same settings can also be set in code, as sketched after this list).
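These two settings don't have to come from the command line; the same configuration could also be set programmatically on the SparkSession builder (or globally in spark-defaults.conf). A sketch of the in-code variant:

import org.apache.spark.sql.SparkSession

// Equivalent to the two --conf flags above. Note that the event log
// directory must already exist; Spark does not create it.
val spark = SparkSession.builder()
  .appName("spark-learning")
  .config("spark.eventLog.enabled", "true")
  .config("spark.eventLog.dir", "/tmp/spark-events")
  .getOrCreate()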

To delete the artifacts created in the target/ directory, run clean at the sbt prompt.