“Big data is the future”, or so we are told. With enough data we can create models that produce good outputs for inputs they have not seen before, using an array of techniques such as probabilistic methods to estimate likely relationships, regression to find trends and interpolate answers, or general-purpose learning algorithms trained on the data.
In particular, Machine Learning (ML) is in vogue. Although the underpinning concepts aren’t new (I had assignments combining computer vision and artificial neural networks at university back in 2001), the capabilities of modern machines and easy access to massive amounts of computing power now make these concepts far more practical to apply.
Regardless of the technology or the hype, there are universal concepts that are paramount to applying any technology successfully. For instance, it is important to understand what the technology can and can’t do, and which properties are intrinsic and which are variable. Continuing with ML as an example, it is more effective to pre-process an image, extract its key attributes and feed those into a neural network than to give the network a million raw pixels per data entry.
One universal concept is that the technology needs to solve a real problem or, to use business terms, needs to ‘add value’. There is a cost to using any technology. For big data, collecting data can be expensive, notably in mitigating the risk of mismanaging it, i.e. ensuring it is secure and compliant. To offset this cost we need to establish value, which means asking:
- How does having this give us a competitive advantage?
- How can I monetize this?
For some large, well-known organizations the answers are fairly clear: Amazon wants shopping data so it can offer better choices than its competitors, drawing more customers and therefore more sales; Google and Facebook want information that helps them target adverts at more of the right people, resulting in more purchases per advert and giving advertisers an incentive to buy more adverts.
One strategy for answering these questions is to create data that is so much better than competitors’ data that customers will pay to access it. This is not a new concept, as software products have been up-selling reporting since time immemorial, but recently there seems to be a greater inclination to answer modelling questions rather than just provide charts. This is where the business questions above need to be applied. For instance, if it is possible to mine data to answer questions like “what impact does doing X have on Y?”, ask yourself whether these answers are something that customers will pay for and competitors don’t have. If so, you’re onto an excellent strategy. If not, is having that data really valuable?