In this video I explain how to read data from an Airtable base into a PySpark DataFrame on Databricks.
Airtable is a popular spreadsheet tool used at many enterprises. It offers additional features compared to other tools such as Microsoft Excel and Google Sheets.
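The video has the full walkthrough. As a rough illustration only, and not necessarily the approach used in the video, the sketch below reads a table through Airtable's public REST API, follows the `offset` pagination token, and flattens the records into rows. The names `fetch_page`, `fetch_all_records`, and `to_rows` are hypothetical helpers, and the base ID, table name, and token are placeholders you would supply yourself.

```python
import json
import urllib.parse
import urllib.request

AIRTABLE_API = "https://api.airtable.com/v0"


def fetch_page(base_id, table, token, offset=None):
    """Fetch one page of records from the Airtable REST API."""
    url = f"{AIRTABLE_API}/{base_id}/{urllib.parse.quote(table, safe='')}"
    if offset:
        url += "?" + urllib.parse.urlencode({"offset": offset})
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all_records(base_id, table, token, get_page=fetch_page):
    """Follow the `offset` token until Airtable stops returning one."""
    records, offset = [], None
    while True:
        page = get_page(base_id, table, token, offset)
        records.extend(page.get("records", []))
        offset = page.get("offset")
        if not offset:
            return records


def to_rows(records):
    """Flatten records into dicts that all share the same keys.

    Airtable omits empty fields from a record, so missing values
    are filled with None to keep the columns aligned.
    """
    columns = sorted({name for r in records for name in r.get("fields", {})})
    return [
        {"record_id": r["id"],
         **{c: r.get("fields", {}).get(c) for c in columns}}
        for r in records
    ]
```

In a Databricks notebook, where `spark` is already defined, the rows could then be passed to `spark.createDataFrame(to_rows(records))`. Schema inference can fail if a column mixes types, so an explicit schema may be the safer choice for real tables.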
Today I published a 30 minute talk about my career in data thus far. I’ve made many mistakes along the way, but learned a tonne from them. Although I made many jumps over the years, I’m happy I always stuck around the central theme of data. Perhaps my talk can give you some inspiration if you feel stuck?
Topics include:
How I ended up in data after graduating in Materials Science and working in aluminium.
How I started freelancing, but ended up employed again.
Why I decided to freelance again.
What I did better the second time.
Why I’m focusing on Databricks in the future.
Listen to the podcast here:
...
Welcome to my podcast! In this very first episode I introduce the topics of this podcast and explain my background in data.
Follow the show
About Denis Gontcharov
In this video I demonstrate how to perform data quality checks on a Delta table in Databricks using Soda Core.
Soda Core is the open-source Python package developed by Soda. It can be compared to Great Expectations, but is much simpler in my opinion. I enjoy using Soda in my professional projects and will continue exploring this framework.
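To give a feel for what such a check looks like, here is a minimal sketch, assuming the `soda-core-spark-df` package is installed on the cluster. It is not the code from the video. The helpers `build_checks` and `run_scan` are hypothetical, as are the table, view, and column names you would pass in. The three checks are ordinary SodaCL checks chosen for illustration.

```python
def build_checks(table, key_column):
    """Build a small SodaCL check file as a YAML string."""
    return (
        f"checks for {table}:\n"
        f"  - row_count > 0\n"
        f"  - missing_count({key_column}) = 0\n"
        f"  - duplicate_count({key_column}) = 0\n"
    )


def run_scan(spark, delta_table, view_name, key_column):
    """Run the checks against a Delta table via a temporary view."""
    # Imported lazily so the module loads even without Soda installed.
    from soda.scan import Scan

    # Soda's Spark DataFrame data source scans views registered
    # in the Spark session.
    spark.table(delta_table).createOrReplaceTempView(view_name)

    scan = Scan()
    scan.set_scan_definition_name(f"{view_name}_quality")
    scan.set_data_source_name("spark_df")
    scan.add_spark_session(spark, data_source_name="spark_df")
    scan.add_sodacl_yaml_str(build_checks(view_name, key_column))
    scan.execute()
    print(scan.get_logs_text())
    # Raises an AssertionError if any check failed.
    scan.assert_no_checks_fail()
```

In a notebook this would be called as something like `run_scan(spark, "my_catalog.my_schema.orders", "orders", "order_id")`, with the names replaced by your own.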
...
Resources
Check out the complete code on GitHub.
Browse the GX Data Doc on Azure Blob Storage.
Use Case
Last week I explored Soda as a data quality testing framework for my large enterprise client. This week I’m exploring a more mature alternative called Great Expectations, or GX for short.
GX generates neat HTML reports called Data Docs that give an overview of your data quality test results. The client wants to share these reports with the team - but not with the world! As the client is already using Azure, hosting the report files on Azure Blob Storage seems like a good solution.
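One way to get the report files into a container is to upload the generated Data Docs folder with the `azure-storage-blob` SDK. The sketch below is an assumption about how that could look, not the setup from the post. `plan_uploads` and `upload_site` are hypothetical helpers, the `data_docs` prefix is arbitrary, and the connection string and container name are placeholders.

```python
import mimetypes
from pathlib import Path


def plan_uploads(site_dir, prefix="data_docs"):
    """Map every local file to a (path, blob name, content type) tuple."""
    root = Path(site_dir)
    plan = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        blob_name = f"{prefix}/{path.relative_to(root).as_posix()}"
        content_type = (
            mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        )
        plan.append((path, blob_name, content_type))
    return plan


def upload_site(site_dir, connection_string, container):
    """Upload the Data Docs folder to an Azure Blob Storage container."""
    # Imported lazily so the module loads even without the Azure SDK.
    from azure.storage.blob import BlobServiceClient, ContentSettings

    service = BlobServiceClient.from_connection_string(connection_string)
    client = service.get_container_client(container)
    for path, blob_name, content_type in plan_uploads(site_dir):
        with open(path, "rb") as handle:
            client.upload_blob(
                name=blob_name,
                data=handle,
                overwrite=True,
                # Set the content type explicitly so HTML files are
                # stored as text/html rather than a generic binary type.
                content_settings=ContentSettings(content_type=content_type),
            )
```

Keeping anonymous access disabled on the container and handing out access through Azure's own mechanisms, such as SAS tokens or role assignments, is what would keep the reports visible to the team only.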
...