In yesterday’s email about the joy of Databricks Asset Bundles (DABs), I mentioned configuring your resources based on an environment (dev, prod, test). But how do you parametrize your actual code?

Say your company uses separate data catalogs for development and production: my_data_dev and my_data_prod. Suppose you have a notebook (or a module imported in that notebook) that reads from the respective catalog. How can you keep the code identical without hardcoding either catalog name?

Pass the value as a notebook parameter:

Passing Values as Notebook Parameters

Databricks notebooks have widgets that allow you to define parameters that can be used in the code:

my_parameter = dbutils.widgets.get("my_parameter")
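
Tying this back to the catalog example, here is a minimal sketch of how the parameter could pick the catalog to read from (the sales.orders table is made up for illustration):

# "my_parameter" arrives via the widget, e.g. "my_data_dev" or "my_data_prod"
catalog = dbutils.widgets.get("my_parameter")

# Read from whichever catalog was passed in; sales.orders is a hypothetical table
df = spark.read.table(f"{catalog}.sales.orders")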

It’s possible to set the value of this widget via DABs:

tasks:
  - task_key: Run_Daily_Tests
    notebook_task:
      notebook_path: src/run_daily_tests_notebook
      base_parameters:
        my_parameter: ${var.my_parameter_value}
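
For context, in a full bundle this fragment sits under a job resource, roughly like this (the job name daily_tests_job is made up for illustration):

resources:
  jobs:
    daily_tests_job:  # hypothetical job name
      name: daily_tests_job
      tasks:
        - task_key: Run_Daily_Tests
          notebook_task:
            notebook_path: src/run_daily_tests_notebook
            base_parameters:
              my_parameter: ${var.my_parameter_value}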

Notice the ${var.my_parameter_value}. This is a DAB variable: it has to be declared, and can then be assigned a value per target, both in the databricks.yml file:

Declaration

Add this at the root of the databricks.yml file:

variables:
  my_parameter_value:
    description: "Some really important parameter"
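
Optionally, the declaration can also carry a default, which applies whenever a target doesn’t set the variable (the value below is just an example):

variables:
  my_parameter_value:
    description: "Some really important parameter"
    default: "forty two"  # example fallback; the targets below override it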

Assignment

Set the value of this parameter for a particular target (environment):

targets:
  dev:
    mode: development
    workspace:
      host: https://adb-xxxxxxxxxxxxxxx.azuredatabricks.net
    variables:
      my_parameter_value: "twenty one"
  prod:
    mode: production
    workspace:
      host: https://adb-yyyyyyyyyyyyyyy.azuredatabricks.net
    variables:
      my_parameter_value: "thirty two"
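
Which value wins is decided by the target you deploy to. Assuming a recent Databricks CLI, that looks roughly like this:

# Deploys with my_parameter_value = "twenty one"
databricks bundle deploy -t dev

# Deploys with my_parameter_value = "thirty two"
databricks bundle deploy -t prod

# A variable can also be overridden at the command line
databricks bundle deploy -t dev --var="my_parameter_value=eleven"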

Now the dev and prod deployments run exactly the same notebook code, just with a different value for my_parameter.