Databricks is gaining a lot of popularity. Perhaps your company has provided you with this tool, and naturally you may ask yourself: how do you become good with Databricks?

The most important thing to realize is that Databricks is really a platform for Apache Spark, an open-source framework for distributed computing (i.e., big data processing). Databricks abstracts away many of the difficulties involved in distributed computing. In addition, Databricks provides many features that streamline working with data at enterprise scale: data governance, workflows, data sharing, and more.

I therefore recommend dividing your learning into two parts: 1) Spark and 2) Databricks features. The former requires you to interface with Spark using one of the three common languages: Python, R, or Scala. The latter depends on your role within the organization: are you a data engineer or a manager? Your role will determine which of Databricks' many features will be relevant to you.

At the end of the day, in my opinion, the best way to start is to learn how to interface with Spark's DataFrame API in PySpark (Python). This will teach you how to read, manipulate, and write data on Databricks.