Researchers and academics today must consume and process vast quantities of data. Doing so efficiently and with accuracy is an increasing challenge. Furthermore, much of this data needs to be presented to others, both for external scrutiny and as part of public engagement.
Zegami provides a new way to visualise and work with data, allowing for efficient, systematic assessment and processing. The visual interface also provides an ideal medium for telling the story of the research through graphical representations of the data.
Researchers applying artificial intelligence to their visual and numerical data also use Zegami to prepare and tag their training data sets to increase the accuracy of their machine learning models.
Weatherall Institute of Molecular Medicine at the University of Oxford
Curing genetic disease with machine learning
Many diseases afflicting humans are genetic in origin, meaning that they are the result of faulty activation or deactivation of parts of the genome. Many such conditions are difficult to treat effectively by conventional means, but the emerging field of gene therapy promises to provide a potential solution.
Developing gene therapies requires a detailed understanding of how the human genome functions, in particular which proteins bind to it and where in the genome they do so. By gathering and processing extensive experimental data with the help of machine learning technology, such a picture is gradually being assembled.
Using machine learning to help process the data allows the work to be carried out at a much greater volume. Doing so reliably and accurately, however, relies on a rigorously trained machine learning model. In turn, obtaining a quality model depends on preparing a sufficient quantity of high-quality training data and carefully guiding the training process.
For the Weatherall Institute, Zegami serves as a high-productivity tool both for preparing training data and for providing feedback on results from models throughout the training process.
With its filtering and layout tools, Zegami’s interface offers an order-of-magnitude increase in the efficiency of these tasks compared to previous approaches. The gains are achieved by enabling humans to handle items in batches of similar specimens rather than one by one.
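To make the batching idea concrete, here is a minimal illustrative sketch, not Zegami's actual pipeline: similar specimens are grouped by a feature value so that a reviewer can tag an entire batch with one decision rather than tagging items one by one. The "greenness" score, plant IDs, and labels below are all hypothetical.

```python
# Illustrative sketch (not Zegami's implementation): grouping similar
# specimens so one human decision tags a whole batch at once.
from collections import defaultdict

# "greenness" is a hypothetical per-image feature score in [0, 1].
specimens = [
    {"id": f"plant-{i}", "greenness": g}
    for i, g in enumerate([0.91, 0.88, 0.12, 0.15, 0.90, 0.10])
]

# Bucket by a coarse feature threshold -- a stand-in for filtering and
# layout tools that place visually similar items next to each other.
batches = defaultdict(list)
for s in specimens:
    batches[s["greenness"] >= 0.5].append(s)

# One reviewer decision per batch, propagated to every member.
tags = {}
for looks_healthy, members in batches.items():
    label = "healthy" if looks_healthy else "stressed"
    for s in members:
        tags[s["id"]] = label

print(tags["plant-0"], tags["plant-2"])  # healthy stressed
```

With six specimens the saving is trivial, but the same propagation step applied to thousands of images is where the order-of-magnitude efficiency gain comes from: the number of human decisions scales with the number of batches, not the number of items.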
Zegami also serves as a platform on which to publish all results, ensuring the Institute's findings are fully open and available for others to reproduce or to conduct their own analyses.
In 2018, 821.6 million people, 1 in 9 of the world's population, were defined as hungry by the World Health Organisation. Population growth is driving a predicted 60% increase in food requirements by 2050. Labs around the world are tackling this problem by developing new strains of crops that can withstand more hostile conditions, with a view to averting the forecast global food shortages of the next 20-30 years.
Researchers, both academic and commercial, are investing in high-throughput systems that allow them to grow large numbers of crops (3,000+) in a single controlled experiment. In each experiment, various strains of a crop (for example wheat, barley or rice) are subjected to harsh conditions (salinity, heat) in order to see how well they survive. To rule out bad seeds, many plants of the same variety are grown at once so that an overall average of survivability can be recorded.
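The replicate-averaging step can be sketched in a few lines. This is a hedged illustration only: the strain names, survival scores, and the 0-to-1 scoring scale are invented for the example, not taken from any real experiment.

```python
# Sketch of averaging survival over replicates of each strain, so that a
# single bad seed does not sink an otherwise promising variety.
# Strain names and scores are hypothetical.
from statistics import mean

# Survival score per replicate plant (1.0 = fully healthy, 0.0 = dead).
replicates = {
    "strain-A": [0.9, 0.85, 0.0, 0.88],   # one bad seed among the replicates
    "strain-B": [0.4, 0.45, 0.5, 0.42],
}

# Overall survivability per strain is the mean across its replicates.
survivability = {strain: mean(scores) for strain, scores in replicates.items()}

# Rank strains by mean survival under the stress condition.
ranked = sorted(survivability, key=survivability.get, reverse=True)
print(ranked[0])  # strain-A still averages highest despite the bad seed
```

The design point is simply that ranking on the replicate mean, rather than on individual plants, separates seed-level failures from genuine strain-level weakness.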
The data/imaging opportunity
At such a large scale, automation of growing and assessing the results is essential to conduct the experiments efficiently. A key component of these results is a daily set of images of each individual plant from a variety of angles. These images must then be assessed by experts to determine which strains show the most promise. Furthermore, an experiment with so many moving parts has considerable scope for things to go wrong. Regular monitoring of the system is therefore essential to keep each experiment on track.
Zegami has been collaborating with the MRC Weatherall Institute of Molecular Medicine to help clean its data and assist with the training of its machine learning models for mapping which proteins bind to the genome, and where.