A key ingredient to any highly accurate and robust machine learning model is having a well-curated and annotated set of training data. For image classification models, annotation is the process of selecting regions within the image and assigning them a name. For example, drawing a circle around a group of cells and labeling them “cancer”.
In order to train a model, it must process thousands of examples of each class, requiring significant amounts of up-front, largely manual, labelling work. Creating detailed annotations like this can be extremely time-consuming, especially when dealing with thousands or even millions of images. In non-trivial cases like medical imaging, it can be difficult or even impossible to outsource the annotation task, leaving the bulk of the annotation burden with expensive and time-poor experts and clinicians.
Due to the complexity (sometimes over complexity) of images, many organisations either tend to develop their own in-house data annotation tools and image processing systems or do the bulk of the work by hand. This results in misdirected time and energy spent on non-core product development.