Data good enough for operations is not necessarily analytics-ready.
Exactly one month ago I published the first episode of my Industrial Data Quality Podcast. In my opinion, industrial data quality is a topic of critical importance that is all too often overlooked, especially with all the buzz around AI.
In the most recent episode, I had the pleasure of inviting guest speaker Thomas Dhollander, co-founder of Timeseer.AI. Together we explored critical challenges in industrial time series data reliability and observability.
...
Welcome to my podcast! In this very first episode I introduce the topics of this podcast and explain my background in data.
Follow the show
About Denis Gontcharov
In this video I demonstrate how to perform data quality checks on a Delta table in Databricks using Soda Core.
Soda Core is the open-source Python package developed by Soda. It can be compared to Great Expectations, but is much simpler in my opinion. I enjoy using Soda in my professional projects and will continue exploring this framework.
...
Resources
Check out the complete code on GitHub. Browse the GX Data Doc on Azure Blob Storage.
Use Case
Last week I explored Soda as a data quality testing framework for my large enterprise client. This week I'm exploring a more mature alternative called Great Expectations, or GX for short.
GX generates neat HTML reports called Data Docs that give an overview of your data quality test results. The client wants to share these reports with the team - but not with the world! As the client is already using Azure, hosting the report files on Azure Blob Storage seems like a good solution.
...
Use Case
For my current engagement I'm tasked with developing an automated data quality framework for a large industrial enterprise in the renewable energy sector. The client has over a hundred independent SCADA systems from various vendors gathering energy production data. All this data has to flow into one central repository to be analyzed with Databricks. The client is obligated to ensure high data quality for contractual reporting to external parties. Failure to deliver incurs high financial penalties.
...