By now, most large enterprises have spent a large amount of money on their data platform, often Databricks. Yet it's sad to see that the knowledge to leverage these solutions is often not available in-house.
As an external consultant, I'm often asked to collect and clean a company's internal data for a business user. That person then forwards the data to yet another third party, which uses it for some advanced analytics use case and sends the result back to the client. The actual enterprise becomes just a man-in-the-middle checking up on outsourced tasks.
Over the years, I've noticed that the share of internal employees who actively engage with their own data hasn't really increased. There are usually just a few "data champions" per company. The rest request a polished Excel extract. I get it: Spark is not easy, and everyone is busy. So does that make self-service a pipe dream?
The whole world seems caught up in a craze to build more compute, but is compute really the bottleneck? In my view, what the (data) world needs is more motivated people: people with the domain knowledge to ask the right questions, and the time to dig into their own data for answers.
AI may lighten this work (although I think it won't), but it won't do the work for us anytime soon.