Saying that “smelters have an abundance of data” is like saying “the earth’s crust has an abundance of aluminium”.

While technically correct, it implies that the desired product just lies there, waiting to be dug up and used. Both statements start at the end (the final product) and ignore the strenuous processing required to get there from the starting material.

No one in their right mind would pick up a shovel and start digging for aluminium metal in an open-pit mine. Yet this is exactly what happens in every new digital transformation initiative or new digital startup: a solution is built on the assumption that the “data is already there”.

In the case of aluminium the distinction is clear: we know that the source material comes from across the globe, i.e. the process is painfully visible. It’s much easier to fool yourself by looking at a server and thinking “yep, all our data is in there”. Sure, the data is on a disk, but it needs 1) cleaning and 2) annotation. The complexity of the cleaning step is obscured because it takes place entirely in “cyberspace” and is visible only to your IT personnel.

On most projects I’ve worked on, the cleaning step was so time-consuming that we might as well have started from scratch by collecting new data. Large portions of the existing data turned out to be completely unusable and were later discarded.

Even if, hypothetically, you stumble upon pristine data that you can use right away, you rarely uncover anything new. It’s usually some correlation between physical phenomena that was already discovered by some French scientist over a century ago via chemical thermodynamics.

I’m still optimistic about data and truly believe there is treasure to be found on the hard drives in your IT department. But it’s not in the form of “data” itself.

But if not data, what are we really looking for?

We are looking for information, and I don’t think we will simply find it. Information, like aluminium, has to be created by refining the starting material: data. This is where the overlooked second step, annotation, comes in.