The Databricks UI is completely browser-based. It’s nice to be platform-independent, but I can’t imagine anyone enjoying writing code in a web browser. There’s another reason why the browser is not a viable alternative: it’s currently not possible to set breakpoints in Python modules for debugging.
Is there a way to write our code locally, while still accessing resources on Databricks, and then deploy our code to Databricks? The answer is yes, with Databricks Connect.
import os
from databricks.connect.session import DatabricksSession as SparkSession
# Workaround for an exception thrown by Databricks Connect 13.0.0, see:
# https://community.databricks.com/t5/data-governance/databricks-connect-version-13-0-0-throws-exception-with-details/td-p/5142
os.environ["USER"] = "anything"
# Connect to the workspace configured under the "dev" profile
spark = SparkSession.builder.profile("dev").getOrCreate()
Note how we import the DatabricksSession as a SparkSession. This allows us to read any table, volume, etc. in the Databricks workspace configured under the profile "dev". Be sure to define this profile in your .databrickscfg file (located in your home directory by default).
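The .databrickscfg file is INI-style, with one section per profile. The sketch below shows what a "dev" profile might look like and checks it with Python's standard configparser before you try to connect. It assumes personal access token authentication plus a cluster_id entry for Databricks Connect; the host, token and cluster ID values are placeholders, and the helper missing_profile_keys is hypothetical, not part of Databricks Connect.

```python
from __future__ import annotations

import configparser
from pathlib import Path

# Placeholder contents of ~/.databrickscfg (assumed token auth).
# Replace host, token and cluster_id with your own values; these are NOT real credentials.
SAMPLE_CFG = """
[dev]
host = https://<your-workspace>.cloud.databricks.com
token = <personal-access-token>
cluster_id = <cluster-id>
"""

# Keys this sketch assumes the profile needs; other auth methods may need different keys.
REQUIRED_KEYS = ("host", "token", "cluster_id")


def missing_profile_keys(cfg_text: str, profile: str) -> list[str]:
    """Return required keys that are absent or empty in `profile` (all of them if the profile is missing)."""
    parser = configparser.ConfigParser(interpolation=None)  # tokens may contain '%'
    parser.read_string(cfg_text)
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not parser.get(profile, key, fallback="").strip()]


if __name__ == "__main__":
    # Check the real config file if it exists, otherwise fall back to the sample above.
    cfg_path = Path.home() / ".databrickscfg"
    text = cfg_path.read_text() if cfg_path.exists() else SAMPLE_CFG
    print(missing_profile_keys(text, "dev") or "profile 'dev' looks complete")
```

Running such a check first gives a clearer error than a failed getOrCreate() call when the profile is missing or incomplete.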