Up to now I’ve mostly executed my Spark Scala code directly in sbt with run. Now I want to inspect the jobs, stages, and tasks of a particular run in the Spark UI. For this I need to execute the code using the spark-submit command, outside sbt.

To do this, I package the sbt project by running package on the sbt prompt. This creates a JAR file, along with other build artifacts, in the target/ directory:
└── target
├── global-logging
├── scala-2.12
│ ├── classes
│ ├── spark-learning_2.12-0.1.jar
│ ├── sync
│ ├── update
│ └── zinc
├── streams
│ ├── _global
│ └── compile
└── task-temp-directory
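For reference, a minimal build.sbt roughly like the following would produce a JAR with this name. The project name and version match the artifact above, but the exact Scala patch version and the Spark version/module are assumptions on my part; Spark is marked "provided" so that spark-submit supplies it at runtime instead of bundling it into the JAR:

// build.sbt (minimal sketch, versions assumed)
name := "spark-learning"
version := "0.1"
scalaVersion := "2.12.18"

// "provided": Spark classes are not packaged into the JAR; spark-submit puts them on the classpath
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.0" % "provided"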
The scala-2.12/spark-learning_2.12-0.1.jar file will be passed to the spark-submit command:
spark-submit \
  --class sparklearning.IOT \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=/tmp/spark-events \
  target/scala-2.12/spark-learning_2.12-0.1.jar
- Because my package has multiple main classes, I have to pass the --class flag to indicate which class I want to run.
- The two --conf flags enable Spark event logging for the History Server and set /tmp/spark-events as the directory for the event log files.
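For context, sparklearning.IOT is just an ordinary object with a main method; spark-submit invokes it after putting the JAR on the classpath. The actual job isn’t shown here, so the following is only a minimal sketch of what such an entry point looks like, with a placeholder computation standing in for the real logic:

package sparklearning

import org.apache.spark.sql.SparkSession

// Sketch of an entry point like sparklearning.IOT; the real job does something else.
object IOT {
  def main(args: Array[String]): Unit = {
    // Master, event-log settings, etc. come from spark-submit, so the builder stays minimal.
    val spark = SparkSession.builder().appName("IOT").getOrCreate()

    // Placeholder work, just enough to produce a job and some stages in the Spark UI.
    spark.range(0, 1000000).selectExpr("sum(id)").show()

    // Stopping the session flushes the event log that the History Server reads.
    spark.stop()
  }
}

After the run finishes, the event log shows up as a file under /tmp/spark-events named after the application ID, which is what the History Server picks up.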
To delete the created artifacts in the target/ directory, run clean on the sbt prompt.