Harnessing the data tsunami

When it comes to both business and research, having more data ought to mean having more information, which should in turn provide a clear advantage in the decision-making process. However, as businesses are discovering, big data brings with it all manner of issues around structure, management and analysis. How does all of this data translate into useful information? How can it be used to make faster, better-informed decisions at all levels of a business?

Traditional databases contain structured data, which conforms to a defined structure or taxonomy that rarely changes over time. Standard querying interfaces are then used to access that content.

But data is becoming harder to manage. New technologies, including business tools such as Slack and social media, have dramatically changed the way we generate and consume data, leading to a mass of unstructured data of many different types. Once we accept that documents, images and videos should also be treated as data, it should be no surprise that 90% of the data generated in business today is unstructured. What’s concerning is that we spend, on average, 2.5 hours a day just searching through this unstructured data for the information we need to do our jobs.

Extracting meaning

Every business knows that a database is worthless without some means of knowing what’s in it. As data grows in both scope and volume, it becomes more and more difficult to get a meaningful answer to the question of what lies within.

Unlike structured data, these large, disparate sources of data must be processed and analysed before they can be qualified, understood and then put to effective use.

Humans are extremely good at understanding the messy, unstructured nature of the world, so making sense of unstructured data has traditionally been the preserve of people examining records outside of any database. But applying human intelligence directly to data at this scale is simply not feasible: the processing capacity of the human brain has its limits and cannot scale at the rate this unstructured data is growing.

Allowing users to truly extract value and insight from the growing avalanche of data requires new ways of thinking. This is even more true when managing digital content such as images, video or documents, where the portion of the data that can be considered structured is even smaller. How are we ever to extract value and sense from this ever-growing tsunami of data?

Combining AI and human sentiment

Enter artificial intelligence (AI) and, specifically, machine learning (ML). ML techniques can consume large amounts of data streamed from varied sources and make sense of it all in a systematic way. By processing hundreds of thousands or even millions of pieces of data and content in a relatively short period of time, AI can extract quantifiable information in a manageable format.

By delegating tasks such as face detection, OCR, sentiment analysis, object detection and metadata extraction to AI, humans can step away from the repetitive work of manually annotating data and put their cognitive abilities to better use.
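To make that concrete, here is a minimal sketch (not Zegami’s own pipeline) of what delegating one such task to ML can look like in Python, using the open-source Hugging Face transformers library for sentiment analysis. The "documents" folder and the record fields are invented for illustration.

```python
from pathlib import Path

from transformers import pipeline  # open-source Hugging Face library

# Off-the-shelf sentiment model; the weights are downloaded on first use.
sentiment = pipeline("sentiment-analysis")

records = []
for doc in Path("documents").glob("*.txt"):   # hypothetical folder of unstructured text
    text = doc.read_text(encoding="utf-8")
    result = sentiment(text[:512])[0]         # truncate long documents to fit the model
    records.append({
        "file": doc.name,
        "sentiment": result["label"],         # e.g. POSITIVE / NEGATIVE
        "confidence": round(result["score"], 3),
    })

# Each unstructured document has become a structured, queryable row of metadata.
for row in records:
    print(row)
```

The same pattern, looping over files, applying a model and emitting a flat record, applies whether the task is OCR, face detection or object detection.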

This is important because, in many cases, the evaluation of content is highly subjective and not suited to processing even by advanced AI techniques. Such crucial, subjective judgements might relate to financial, quality, research or progress decisions over large visual datasets. Key decisions can be made with more speed and more clarity when salient information has been extracted automatically from thousands or even millions of files.

Machine learning is increasingly adept at classification tasks: identifying objects, recognising less tangible features such as sentiment or emotion, and much more. This makes it especially useful for enhancing existing databases or archives of content for which little or no structured information exists. The best part is that machine learning processes can run 24×7, and can be upgraded as new and better models appear.
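As an illustration of enriching an archive that has no structured information, the sketch below tags a folder of images with labels from a pretrained classifier, using the open-source torchvision library. The "archive" folder and the catalogue structure are assumptions made for the example, not part of any particular product.

```python
from pathlib import Path

import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

# A pretrained, general-purpose classifier; no training data of our own is needed.
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()
labels = weights.meta["categories"]

catalogue = {}
for img_path in Path("archive").glob("*.jpg"):    # hypothetical image archive
    image = Image.open(img_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)         # the model expects a batch dimension
    with torch.no_grad():
        probs = model(batch).softmax(dim=1)[0]
    best = int(probs.argmax())
    catalogue[img_path.name] = {
        "label": labels[best],                     # e.g. "golden retriever"
        "confidence": round(float(probs[best]), 3),
    }

# Previously unsearchable images now carry machine-generated, structured tags.
print(catalogue)
```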

So how can the tsunami of unstructured data be harnessed? ML is clearly a potent tool to this end, but using it effectively requires both a well-trained model and a means of exploring the extracted data. Read our other blog posts to find out how Zegami’s visual exploration interface provides both: a fantastic tool for efficient, robust preparation of data for training ML models, and a game-changing new interface for exploring the results.

Zegami is an Oxford University spinout company founded on 1 February 2016 by Samuel Conway (CEO), Roger Noble (CTO) and Stephen Taylor (Chief Computational Biologist, Weatherall Institute of Molecular Medicine, Oxford). As a company we are focused on building the next-generation search platform, utilising the concepts of Visual Data Exploration and Augmented Intelligence.