I’ve been working on an exercise in processing complex data types (arrays) with higher-order functions. Spark SQL has a reduce function that folds an array down to a single value.

I wrote the following Scala code:

spark.sql("""
  SELECT
    celsius,
    reduce(
      celsius,
      0,
      (acc, t) -> acc + t
    ) as celsius_sum
  FROM temp_view;
""").show()

This query is meant to sum each array of temperatures (in Celsius) into a single value.
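
For completeness, here’s a minimal sketch of how temp_view can be set up. The sample readings below are placeholders I made up; the only thing the query depends on is that celsius is a column of type array<int>:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("higher-order-functions")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// Placeholder data: one row per array of Celsius readings.
val tempsDF = Seq(
  Seq(35, 36, 32, 30, 40, 42, 38),
  Seq(31, 32, 34, 55, 56)
).toDF("celsius")

tempsDF.createOrReplaceTempView("temp_view")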

Problem

The function reduce can’t be found:

[error] (Compile / run) org.apache.spark.sql.AnalysisException: Undefined function: reduce. This function is neither a built-in/temporary function, nor a persistent function that is qualified as spark_catalog.default.reduce.; line 4 pos 8

That’s odd… After all:

  1. I use Spark 3.3.5 (and according to the docs, reduce has been available since 2.4).
  2. I can run the other built-in higher-order functions filter and transform just fine (see the sketch below).

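To double-check point 2, both of these run without complaint against the very same view (the Fahrenheit conversion lambda is only there for illustration):

spark.sql("""
  SELECT
    celsius,
    transform(celsius, t -> ((t * 9) div 5) + 32) as fahrenheit
  FROM temp_view
""").show()

spark.sql("""
  SELECT
    celsius,
    filter(celsius, t -> t > 38) as high
  FROM temp_view
""").show()
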
What’s going on? I don’t know yet…