Fixing data pollution

I’ve used a few water analogies when talking about data management before, but I recently came across this one from Thomas C. Redman, which I like a lot:

There are three ways of dealing with a polluted reservoir:

1. Treat the people who drink the water and get sick.
2. Clean the water before people drink it.
3. Identify and prevent the pollution at source.

For data quality, Thomas suggests the choices are just as stark:

1. Let consumers of data sort out any problems in data.
2. Find data errors and fix them before the data is used.
3. Find the root causes of the errors and eliminate them.

Given our approach to data quality at Datasynthesis, I am a big fan of option 3 🙂

(photo above of Rutland Water, the second-largest reservoir in the UK)

By brian | April 22, 2021 | Blog