Having spent the last few days engrossed in some data-related work, I’ve been reflecting on the fundamental challenges that the data world faces. Above all, I’m struck by how little they’ve changed since I first became aware of the term ‘big data’ seven or eight years ago.
Is it all about the data?
At the time, I was working with an experienced data architect (you know who you are, CY), who persistently claimed that regardless of what we were doing with technology, it was ultimately ‘all about the data’. But is this true? And has this ever been completely true?
Back then, the popular way to distinguish big data from any old dataset was the three V’s: Volume, Variety and Velocity. Since then we seem to have acquired a few more ‘V’s, and two particularly interesting ones are Value and Veracity.
We’ll return to Value later, but Veracity refers to the trustworthiness of the data. With many forms of big data, quality and accuracy aren’t completely controllable (just think of Twitter posts with hashtags, abbreviations, typos and colloquial speech, not to mention the reliability and accuracy of their content). But big data and analytics technology now allows us to work with these types of data, and the sheer volume often makes up for the lack of quality or accuracy.
My data architect must have been ahead of his time. Having wrestled with databases all his working life, he had come to realise that the holy grail of the ‘single source of truth’ was often misinterpreted as the ‘single version of the truth’. That second goal is usually unrealistic, and often damaging to any data management endeavour. In reality, a single data point should have one source and be universally agreed upon, but the value of that data is subjective and can be measured differently by different people.
This is best illustrated by a simple example. A popular science book sells for £12.99 on Amazon. So there’s your single truth - the Amazon price of £12.99. But the value of that same popular science book is more subjective. What if you never liked science when you were at school? Or the opposite - you’re a senior physics lecturer and popular science books don’t tell you anything new? On the other hand, let’s say it’s written by Brian Cox and you happen to enjoy his television programmes. Or you’re aware of being a bit ignorant about science and want to find out more? All of these scenarios will affect your perception of the single truth - the £12.99 price tag.
So two groups can view the same ‘facts’ and arrive at different conclusions. Two groups may even give the same name to values that are derived from different inputs or different formulas. So my friend was always at pains to point out that our systems and designs must leave room for the possibility of many names for the same truth.
Big Data in the Big Wide World
This is a reality we should all accept, because data needs to reflect the real world. We can't shape the world to fit into a tidy dataset.
So where does all this leave us? Well, I'm happy to accept that technically it might be ‘all about the data’, but from the perspective of just about everyone else, it’s actually about achieving meaningful outcomes.
To put it another way - how is that vast flow of morphing data being used to help our users and customers? After all, most consumers aren't in the business of sifting through and interpreting datasets - they’re trying to answer a specific question.
How Hullabalook harnesses data to meet consumer needs
The best way I can get my point across is by referencing an ingenious little tool from Hullabalook. It looks at data from the perspective of a consumer trying to solve a specific problem.
The example I have in mind is its Sofa-sizer tool, and here’s the story. The consumer needs a sofa. The consumer needs to position it within a specific space. Sofa-sizer organises the data around this problem, presenting the consumer with sofas that fit their space.
Of course, the tool is just a fancy way of filtering data. But the consumer isn't aware of that, being entirely immersed in the process of choosing a sofa. And Sofa-sizer helps them solve a problem as effortlessly as possible.
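To make the "fancy way of filtering data" point concrete, here is a minimal sketch of that kind of filter. It is not how Hullabalook actually builds Sofa-sizer (that isn't described here); it assumes, hypothetically, that each sofa record carries a width and depth in centimetres, and the `Sofa` class, `sofas_that_fit` function and sample catalogue are all invented for illustration.

```python
from dataclasses import dataclass


# Hypothetical record shape: the real Sofa-sizer data model is not described in this post.
@dataclass(frozen=True)
class Sofa:
    name: str
    width_cm: float
    depth_cm: float


def sofas_that_fit(sofas, space_width_cm, space_depth_cm, clearance_cm=0.0):
    """Keep only sofas whose footprint, plus optional clearance, fits the space."""
    return [
        s for s in sofas
        if s.width_cm + clearance_cm <= space_width_cm
        and s.depth_cm + clearance_cm <= space_depth_cm
    ]


# Invented sample catalogue, purely for illustration.
catalogue = [
    Sofa("Compact two-seater", 150, 85),
    Sofa("Three-seater", 210, 95),
    Sofa("Corner sofa", 260, 180),
]

fits = sofas_that_fit(catalogue, space_width_cm=220, space_depth_cm=100)
print([s.name for s in fits])  # ['Compact two-seater', 'Three-seater']
```

The consumer only ever sees the question "what fits my room?"; the comparison of dimensions stays hidden behind it, which is exactly the point of organising the data around the problem.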
Isn’t that what we all want?
So is it really all about the data?
To return to the V’s of big data and ‘Value’, I know that people like me in the technical community can get lost in a continual cycle of analysis and categorisation. It’s so easy to forget why anyone else would actually care. That’s why I like the Amazon principle of being ‘customer-obsessed’, and it’s also why I was always uncomfortable about my work being ‘all about the data’. Perhaps we should all take a leaf out of Amazon’s book and keep the customer’s intentions front of mind when handling business data, making sure that our work serves their goals and gives them the best possible experience.
There may be a single source of truth, but there are multiple versions of the truth.
We can only release the full value of business data by focusing on outcomes.
Amazon’s overarching goal is to be ‘customer-obsessed’ and we should all emulate that.