In yesterday’s email about the joy of Databricks Asset Bundles (DABs) I mentioned configuring your resources based on an environment (dev, prod, test). But how do you parametrize your actual code?
Say your company uses separate data catalogs for development and production: my_data_dev and my_data_prod. Suppose you have a notebook (or a module imported in that notebook) that reads from the respective catalog. How can you keep the same code, without hardcoding both catalog names?
Pass the value as a notebook parameter:
Passing Values as Notebook Parameters
Databricks notebooks have widgets that allow you to define parameters that can be used in the code:
my_parameter = dbutils.widgets.get("my_parameter")
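When the notebook runs interactively, the widget may not exist yet, so it's common to declare it with a default first. A minimal sketch (the default value here is just an illustration):

# declare the widget with a fallback for interactive runs;
# a job run overwrites this with the value passed in via base_parameters
dbutils.widgets.text("my_parameter", "some_default")
my_parameter = dbutils.widgets.get("my_parameter")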
It’s possible to set the value of this widget via DABs, in the job’s task definition:
tasks:
  - task_key: Run_Daily_Tests
    notebook_task:
      notebook_path: src/run_daily_tests_notebook
      base_parameters:
        my_parameter: ${var.my_parameter_value}
Notice the ${var.my_parameter_value}. This is a DAB variable that has to be declared, and it can be assigned a value in the databricks.yml file:
Declaration
Added to the root of the databricks.yml file:
variables:
  my_parameter_value:
    description: "Some really important parameter"
Assignment
Set the value of this parameter for a particular target (environment):
targets:
  dev:
    mode: development
    workspace:
      host: https://adb-xxxxxxxxxxxxxxx.azuredatabricks.net
    variables:
      my_parameter_value: "twenty one"
  prod:
    mode: production
    workspace:
      host: https://adb-yyyyyyyyyyyyyyy.azuredatabricks.net
    variables:
      my_parameter_value: "thirty two"
Now both deployed notebooks can have exactly the same code, but a different value for my_parameter.
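Tying this back to the catalog example: pass the catalog name as the parameter, and the very same notebook reads from the right catalog in every environment. A minimal sketch (the widget name, schema, and table are made up for illustration):

# my_data_dev in dev, my_data_prod in prod, set via base_parameters
catalog = dbutils.widgets.get("catalog")

# hypothetical schema and table, just to show the pattern
df = spark.table(f"{catalog}.sales.orders")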