🎥 Read an Airtable into a PySpark dataframe on Databricks

In this video I explain how to read data from an Airtable into a PySpark dataframe on Databricks. Airtable is a popular spreadsheet tool used at many enterprises. It offers additional features compared to other tools such as Microsoft Excel and Google Sheets.

April 24, 2025
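
For readers who want a feel for the Airtable post above before watching, here is a minimal sketch of one way to pull Airtable records into plain Python rows. It is not necessarily the approach shown in the video. It assumes the public Airtable REST API (`https://api.airtable.com/v0/<base id>/<table name>` with a bearer token and `offset`-based paging); the function names `fetch_page` and `read_airtable_rows` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

AIRTABLE_API = "https://api.airtable.com/v0"


def fetch_page(base_id, table_name, token, offset=None):
    """Fetch one page of records from the Airtable REST API."""
    url = f"{AIRTABLE_API}/{base_id}/{urllib.parse.quote(table_name, safe='')}"
    if offset:
        url += "?" + urllib.parse.urlencode({"offset": offset})
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def read_airtable_rows(base_id, table_name, token, fetch=fetch_page):
    """Follow the `offset` cursor and return one flat dict per record."""
    rows, offset = [], None
    while True:
        page = fetch(base_id, table_name, token, offset)
        for record in page.get("records", []):
            rows.append({"id": record["id"], **record.get("fields", {})})
        offset = page.get("offset")
        if not offset:
            break
    # Airtable omits empty fields, so give every row the same columns.
    columns = sorted({key for row in rows for key in row})
    return [{col: row.get(col) for col in columns} for row in rows]
```

In a Databricks notebook, where a `spark` session is already defined, the resulting list can then be handed to `spark.createDataFrame(rows)`. For tables with mixed or nested field types, passing an explicit schema is probably safer than relying on inference.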

🎧 Industrial Data Quality Podcast E2: My Six Years of Working and Freelancing in Data

Today I published a 30-minute talk about my career in data thus far. I’ve made many mistakes along the way, but learned a tonne from them. Although I made many jumps over the years, I’m happy I always stuck around the central theme of data. Perhaps my talk can give you some inspiration if you feel stuck? Topics include: how I ended up in data after graduating in Materials Science and working in aluminium; how I started freelancing, but ended up employed again; why I decided to freelance again; what I did better the second time; and why I’m focusing on Databricks in the future. Listen to the podcast here: ...

April 18, 2025

🎧 Industrial Data Quality Podcast E1: Introduction

Welcome to my podcast! In this very first episode I introduce the topics of this podcast and explain my background in data. Follow the show About Denis Gontcharov

April 1, 2025

🎥 Testing Data Quality with Soda Core in Databricks

In this video I demonstrate how to perform data quality checks on a Delta table in Databricks using Soda Core. Soda Core is the open-source Python package developed by Soda. It can be compared to Great Expectations, but is much simpler in my opinion. I enjoy using Soda in my professional projects and will continue exploring this framework. ...

March 29, 2025
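
As a rough illustration of the Soda Core post above, the sketch below builds a small SodaCL check file and runs it against a Spark session. It is not taken from the video. It assumes the `soda-core-spark-df` package is installed and that the Delta table has been registered as a temporary view; the helper names `build_checks` and `run_scan`, the scan name, and the table and column names in the usage note are hypothetical.

```python
def build_checks(table_name, required_columns):
    """Build a SodaCL document: table is non-empty, required columns have no missing values."""
    lines = [f"checks for {table_name}:", "  - row_count > 0"]
    lines += [f"  - missing_count({col}) = 0" for col in required_columns]
    return "\n".join(lines) + "\n"


def run_scan(spark, checks_yaml, data_source_name="spark_df"):
    """Run the checks against views visible in the given Spark session."""
    # Imported lazily so the YAML helper works without Soda installed.
    from soda.scan import Scan

    scan = Scan()
    scan.set_scan_definition_name("delta_table_checks")  # hypothetical name
    scan.set_data_source_name(data_source_name)
    scan.add_spark_session(spark, data_source_name=data_source_name)
    scan.add_sodacl_yaml_str(checks_yaml)
    scan.execute()
    return scan
```

A possible use in a notebook: register the table with `spark.table("my_table").createOrReplaceTempView("orders")`, call `scan = run_scan(spark, build_checks("orders", ["order_id"]))`, then inspect `scan.get_logs_text()` or `scan.has_check_fails()`.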

Hosting Great Expectations Data Docs on Azure Blob Storage

Resources: Check out the complete code on GitHub. Browse the GX Data Doc on Azure Blob Storage. Use case: Last week I explored Soda as a data quality testing framework for my large enterprise client. This week I’m exploring a more mature alternative called Great Expectations, or GX for short. GX generates neat HTML reports called Data Docs that give an overview of your data quality test results. The client wants to share these reports with the team, but not with the world! As the client is already using Azure, hosting the report files on Azure Blob Storage seems like a good solution. ...

February 20, 2025