I’ve been working on an exercise in processing complex data types (arrays) using higher-order functions. Spark has a function reduce that reduces an array to a single value.
I wrote the following Scala code:
spark.sql("""
  SELECT
    celsius,
    reduce(
      celsius,
      0,
      (acc, t) -> acc + t
    ) AS celsius_sum
  FROM temp_view;
""").show()
This code sums up arrays of temperatures (in Celsius) into a single value.
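For intuition, the SQL lambda (acc, t) -> acc + t has the same shape as a foldLeft over a plain Scala collection. A minimal sketch, using made-up sample readings rather than the actual contents of temp_view:

```scala
// What reduce(celsius, 0, (acc, t) -> acc + t) does, expressed on a
// plain Scala List: fold the array into one running sum, starting at 0.
val celsius = List(35, 36, 32, 30, 40)
val celsiusSum = celsius.foldLeft(0)((acc, t) => acc + t)
println(celsiusSum) // 173
```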
Problem
The function reduce can’t be found:
[error] (Compile / run) org.apache.spark.sql.AnalysisException: Undefined function: reduce. This function is neither a built-in/temporary function, nor a persistent function that is qualified as spark_catalog.default.reduce.; line 4 pos 8
That’s odd… After all:
- I use Spark 3.3.5 (reduce was introduced in 2.4).
- I can run the other built-in functions filter and transform.
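For comparison, those two working functions behave like filter and map on Scala collections. A plain-Scala sketch with the same made-up readings (the Fahrenheit conversion is just an illustrative lambda, not from the exercise):

```scala
// Plain-Scala equivalents of the SQL higher-order functions that do resolve:
//   filter(celsius, t -> t > 38)            keeps elements matching a predicate
//   transform(celsius, t -> t * 9 / 5 + 32) maps a lambda over every element
val celsius = List(35, 36, 32, 30, 40)
val hot = celsius.filter(t => t > 38)
val fahrenheit = celsius.map(t => t * 9 / 5 + 32) // Int division, like SQL div
println(hot)        // List(40)
println(fahrenheit) // List(95, 96, 89, 86, 104)
```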
What’s going on? I don’t know yet…