There’s a lot of talk about big data these days, as if it were something new and mysterious that only a small elite group “get”. This is simply not true.
The fact is that nobody, apart from the hardware and software vendors and the people who peddle themselves as “experts”, cares about data. NOBODY!
What we do care about is information, knowledge, insight, intelligence.
Big data has existed for years. The data has always been there; we just didn’t know how to use it or have the tools to make sense of it. Now we do. In just the same way that we escaped having to navigate to our documents by typing C:\my documents\work\customer\invoices and could simply click an icon, we now have tools that will give us the information we need at the click of a mouse. The complexity is in the code behind that mouse click, but it’s not rocket science!
Let’s take a supermarket as an example:
They have rows and rows of shelves stacked full of goods. Every one of those items has a barcode, a price and a sell-by date, and they know how many they have, what they weigh and where they are, both on the shop floor and in the warehouse. There are thousands, even tens of thousands, of individual items to keep track of! Then there are the buy-one-get-one-free type deals, where the price you pay must be adjusted in real time to give you discounts on specific combinations of items. That’s not “small data”!
Then you do your shopping and you go to the till and they scan or weigh each item and add it to your total bill, but they also deduct it from the inventory, so they know what’s been sold and what they need to re-order.
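To show how little magic there is behind the till, here is a minimal sketch of that scan-and-deduct step, including a buy-one-get-one-free deal. Everything in it is made up for illustration: the barcodes, the prices, the reorder thresholds and the `checkout` function are assumptions, not how any real supermarket system works.

```python
from collections import Counter

# Hypothetical catalogue: barcode -> (name, unit price in pence)
PRICES = {"5000001": ("baked beans", 80), "5000002": ("salad bag", 150)}
# Hypothetical set of barcodes currently on buy-one-get-one-free
BOGOF = {"5000002"}
# Hypothetical stock levels at or below which an item should be re-ordered
REORDER_AT = {"5000001": 20, "5000002": 10}


def checkout(scanned, inventory):
    """Price a basket, deduct it from inventory in place, and list items to re-order."""
    counts = Counter(scanned)
    total = 0
    for barcode, qty in counts.items():
        _, price = PRICES[barcode]
        # On a buy-one-get-one-free deal, every second item is free
        paid_qty = qty - qty // 2 if barcode in BOGOF else qty
        total += paid_qty * price
        # The sale changes the inventory straight away
        inventory[barcode] -= qty
    reorder = sorted(b for b in counts if inventory[b] <= REORDER_AT[b])
    return total, reorder


stock = {"5000001": 100, "5000002": 12}
total, reorder = checkout(["5000002", "5000001", "5000002", "5000002"], stock)
print(total, reorder, stock)
```

Three salad bags and one can of beans go through the till, one salad bag is free, and the salad drops low enough to be flagged for re-ordering.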
You also have the mark-downs, where they reduce the prices of items that are near their sell-by date so they don’t have to waste them. If those items still fail to sell, they zero-rate them and dispose of them – that’s all being captured by the systems.
They’re also tracking you, if you’re a regular customer with a loyalty card – they know a lot about you and your family and they can target special offers at you for things they think you’ll buy.
They’re also tracking the weather. If it’s going to be a lovely weekend, they change their ranges to promote items that you will want to buy – salads, beers, soft drinks, barbecue food, etc. They have to get it right, because they don’t want to be left with vast amounts of unsold food with a short shelf life that ends up getting wasted!
What about the staff? They have to ensure they have the right number of staff with the right skills on duty at the right time, and they have to calculate their hours and pay, overtime, pensions, deductions, etc.
There’s a lot going on and that’s just one store! Now think about it on a regional and national level, with hundreds or thousands of stores and potentially different weather across the nation or state.
BUT who needs to know how many cans of baked beans got sold last week? Probably just a handful of people in the whole company. Likewise for most of the other products, especially long-life and non-perishables. The fighter pilots in this scenario are the ones predicting sales of short-life perishables like fish, meat, salads and fresh fruit – they have to get it as close to perfect as possible every day, without fail. So they need a lot of information and dashboards and projections.
But that is actually the same information, dashboards and projections that the person responsible for baked beans gets; that person is just glad that if they get it wrong, the cans can sit in the warehouse for a week.
The bottom line is that the data is the same. It’s sliced and diced into graphs and charts and exception reports in almost exactly the same way for all products, and the people for whom it matters get those reports at store level and at regional and national level. Then there are summary reports for managers, because they don’t need to know how many kilos of fish got sold UNLESS a lot wasn’t sold and it has made a loss.
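An exception report is a good example of how simple the idea is: run the same check over every product and only show the lines that need attention. The sketch below assumes made-up weekly figures and an arbitrary rule (flag anything with more than 20% of its stock unsold); the `exception_report` function and its threshold are illustrative assumptions only.

```python
def exception_report(lines, max_unsold_share=0.2):
    """Return (product, unsold units) for lines whose unsold share exceeds the threshold."""
    flagged = []
    for line in lines:
        unsold = line["stocked"] - line["sold"]
        # Skip lines with no stock to avoid dividing by zero
        if line["stocked"] and unsold / line["stocked"] > max_unsold_share:
            flagged.append((line["product"], unsold))
    return flagged


# Hypothetical weekly figures: the same shape of data for every product
week = [
    {"product": "fish (kg)", "stocked": 200, "sold": 120},
    {"product": "baked beans (cans)", "stocked": 500, "sold": 450},
]
flagged = exception_report(week)
print(flagged)
```

The fish shows up because 40% of it went unsold; the baked beans stay off the manager's desk.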
Let’s not forget that this data is also analysed by time of day, day of the week, season, weather, festivities and celebrations, etc. and the data is stored for years, so they can look back at previous trends to inform their decision-making. “What happened last time we had good weather for the May Bank Holiday?” It’s also being analysed to see how we respond to advertising and news stories, both across traditional channels like TV, radio and magazines and across new media – Facebook, Twitter, etc.
So, YES, there’s a lot of data, and a lot of it is real-time, because every barcode that is scanned changes the shop’s inventory before that customer has even left the shop. And if you’re a card-carrying loyal customer, they know whether you always buy that item, sometimes buy it or have never bought it before.
But is it really BIG Data and what does that even mean?
Well, it’s certainly big in terms of sheer volume, and that’s why we’re hearing so much more about it now than we used to: we now have the applications and the hardware to process it into meaningful information within reasonable timescales. Apparently 90% of all the data ever created was created in the last 2 years – incredible when you think how old mankind is. But data is not information; it’s just bits and bytes on a disk somewhere until someone makes sense of it!
The really big news is around data augmentation, which is basically taking seemingly unrelated data and merging it with other data. The weather affecting sales of salad is augmented data – sales + weather. Mad cow disease or horsemeat scares impacting sales of processed beef products is augmented data – news + sales. But again, not rocket science!
But some of it gets very clever, and those abilities to predict trends could change our lives! The speed at which they were able to crunch all the video footage of Boston, find the bombers and publish their photos was big data, kind of. If the Russian information had been managed properly, the CIA and FBI had shared their data and the incident had been stopped before it even happened, that would have been big news for big data, but still not exactly rocket science.
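To make the sales + weather idea concrete, here is a minimal sketch that joins two unrelated data sets on their date and then compares average salad sales by weather. The figures, dates and function names are all invented for illustration; a real system would do the same join over far more data.

```python
# Hypothetical daily salad sales (units) and weather readings, keyed by date
sales = {"2013-05-04": 120, "2013-05-05": 310, "2013-05-06": 290}
weather = {"2013-05-04": "rain", "2013-05-05": "sun", "2013-05-06": "sun"}


def augment(sales, weather):
    """Merge sales with weather on the shared date key."""
    return [
        {"date": day, "units": units, "weather": weather[day]}
        for day, units in sorted(sales.items())
        if day in weather  # keep only days present in both data sets
    ]


def average_units_by_weather(rows):
    """Average units sold per day for each kind of weather."""
    totals = {}
    for row in rows:
        units, days = totals.get(row["weather"], (0, 0))
        totals[row["weather"]] = (units + row["units"], days + 1)
    return {kind: units / days for kind, (units, days) in totals.items()}


rows = augment(sales, weather)
averages = average_units_by_weather(rows)
print(averages)
```

Neither data set says much about salad and sunshine on its own; the insight only appears once the two are merged.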