What’s AI-in-the-Loop in Data Annotation?

Mar 21, 2022

Data scientists agree: the quality of the training data is a key determinant of the quality of the final model. However, high-quality annotated data is usually slow to come by, as humans need to be involved in creating it. With the help of AI, this process can be sped up. This is what we call “AI-in-the-Loop” in data annotation.

There are countless domains where AI systems have made a significant and beneficial impact, but all of these systems share one common denominator: data. From autonomous vehicles to disease detection systems, models need high-quality labeled data to train on.

While a massive amount of unstructured, unlabeled data is available and procurable, annotated (labeled) data is much harder to obtain. The key reason is the cumbersome process of data annotation, which takes an investment of time, effort and resources.

Whether one is dealing with images, videos, text, audio or any other form of data, the task of filtering out useful information, whether by drawing bounding boxes or selecting parts of text, is very human in nature. While this human element is absolutely essential, it is unfortunately also slow relative to machines.

How slow, one may ask? That varies. Analyzing some of our recent projects at Ango AI, we see per-label times ranging from 3 seconds (simple classification) to 80 seconds (complex polygons for images). But one question that can certainly be pursued is: how can we make this faster? Or, more comprehensively:

How can we make the process of data annotation faster while retaining human-level accuracy?

This is where AI-in-the-loop comes in.

AI-Assisted annotation

At first glance it may seem counterintuitive, and perhaps counterproductive, to have AI in the loop when annotating data that will then be used to train AI systems. However, the core idea is that such systems can perform phenomenally well at certain tasks, such as object detection and named entity recognition, and that most often the same systems that will be trained on this data can be plugged in earlier in the annotation loop to make use of their capabilities.

During the annotation stage, the primary job of the AI-in-the-Loop system is to assist the human labeler, aiming to make the process more efficient and accurate. Thus, unlike in the production / deployment stage, the model does not need to perform at the best possible level here. Instead of metrics such as accuracy or precision, efficacy (how useful the model is to the labeler) is a much more important measure of the model’s performance during this stage. A lot of modern research and experiments have shown the benefits of using such systems during the data annotation stage.

The benefits are reflected in the quality and the efficiency of annotation:

Quality is impacted positively because AI-in-the-loop predictions are often close to, if not exactly, the ground truth (the actual label). This is analogous to two labelers, a human and an AI system, working on the same data. Although the AI may not match human performance or accuracy, it certainly does provide a layer of assistance. The cooperation between machine and human allows a certain level of delegation to the machine, which lets the human focus on reviewing the annotations and correcting them if necessary.

Efficiency improves because the AI prediction generally reduces the number of interactions the labeler has to make with the datum (image, text, etc.). For instance, in an image segmentation task, the number of clicks the user needs to draw a polygon is greatly reduced once the AI prediction is taken into account. Due to this reduction in human interaction, the overall time needed per sample, and consequently for the dataset as a whole, is reduced.

There is, however, one important point to mention: as with any solution, the design and implementation of AI-in-the-loop and the resulting user experience are very important for the usefulness of AI assistance. There certainly is something along the lines of an “ideal level of AI assistance”, as pointed out by this paper.

We have observed such results while designing, testing and refining our AI-in-the-loop tools at Ango: AI assistance tools may not be beneficial if the work of using them, interacting with them and correcting their predictions acts as an overhead rather than an aid in the process of data annotation. For instance, there certainly have been cases in which more time was spent correcting an image mask generated by the AI than it would have taken to label the image from scratch. This phenomenon can be observed in this experiment.

AI Assistance techniques

The domain of AI-in-the-loop in data annotation is certainly a novel one, yet it is evolving rapidly. Thus, apart from a few sources, there is no set list of methods that can be used to achieve AI assistance. However, a few that we often engage with and actively research at Ango are discussed here.

Pre-trained models are machine learning / deep learning models that have already learned their parameters through training on a specific dataset. These pre-trained models can then be used to provide predictions on the data that is being labeled.

To understand this concretely, take the case of object detection and segmentation for images using the COCO dataset and the MaskRCNN model. When trained on the COCO dataset, the model performs very well at detecting and segmenting 80 classes (person, car, traffic light, etc.). Thus, given an image that contains any of the classes the model has been trained on, it will ideally provide a bounding box and mask capturing each object with fairly high accuracy.

For many use cases, teams are looking to label these common categories (car, pedestrian) with only certain differences, such as different class names or additional class attributes. For such use cases, predictions provided by a pre-trained AI model (a MaskRCNN model trained on the COCO dataset) can prove especially useful, as this takes the burden of capturing various objects in bounding boxes and polygons away from the human labeler and leaves only the task of reviewing and adding additional attributes.
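The hand-off from a pre-trained model to the labeler could be sketched roughly as follows. This is a minimal illustration and not Ango's implementation: the `CLASS_MAP` table, the `PreLabel` structure, the `to_prelabels` helper, the 0.5 score threshold and the sample detections are all assumptions made for the example. The detections are hard-coded stand-ins for what a COCO-trained detector might return.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from the pre-trained model's class names to the
# project's own label schema; classes not listed are not labeled here.
CLASS_MAP = {"person": "pedestrian", "car": "vehicle", "truck": "vehicle"}


@dataclass
class PreLabel:
    label: str    # class name in the project's schema
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels
    score: float  # model confidence, can be shown to the labeler
    attributes: dict = field(default_factory=dict)  # filled in by the human


def to_prelabels(predictions, min_score=0.5):
    """Turn raw detections into editable pre-labels for the labeler."""
    prelabels = []
    for pred in predictions:
        if pred["score"] < min_score:
            continue  # assumption: weak boxes cost more to fix than to draw
        if pred["class"] not in CLASS_MAP:
            continue  # the project does not label this class
        prelabels.append(
            PreLabel(CLASS_MAP[pred["class"]], tuple(pred["box"]), pred["score"])
        )
    return prelabels


# Made-up detections standing in for the output of a COCO-trained model.
detections = [
    {"class": "person", "box": [10, 20, 50, 120], "score": 0.93},
    {"class": "car", "box": [60, 40, 200, 110], "score": 0.88},
    {"class": "dog", "box": [5, 5, 30, 30], "score": 0.97},
    {"class": "car", "box": [0, 0, 15, 12], "score": 0.21},
]
for p in to_prelabels(detections):
    print(p.label, p.box, p.score)
```

With these sample values, only the confident person and car detections survive, renamed to the project's class names. The labeler then reviews the boxes and fills in `attributes`.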

Transfer learning takes the idea of using pre-trained models a step further by using additional data (custom and domain-specific, yet similar) to train a pre-trained model further. The model adjusts to accommodate the new information and fit itself to it, thus providing useful predictions on domain-specific data.

For instance, a team may want to train a model to detect different types of tropical fruits and thus needs 2K images labeled. Carrying on the previous example of COCO and MaskRCNN, while the data containing these fruits may not be nearly as plentiful as the 80K COCO images, MaskRCNN can still be tuned to provide valuable predictions on the fruit images and help in the process of labeling them.

This is done through transfer learning. Fundamentally, the model has learned to recognize and extract many features from images during its initial training on the COCO dataset. Using that knowledge, plus some more specific training on a few labeled fruit images, the model adapts to recognize the new classes with fair accuracy. Transfer learning thus adds a layer of generalizability to pre-trained models, expanding the domain in which they can provide valuable predictions.

AI-in-the-Loop Case Study

One of the transfer learning experiments we successfully conducted at Ango was part of a vehicle detection project that entailed labeling various parts of a vehicle. We trained a model using a small subset of labeled domain-specific (vehicle parts) data. After its performance was deemed satisfactory, the model was given 500 images containing 4528 bounding boxes to predict. These predictions were then passed to a labeler, and the following results were observed:

In conclusion, based on labeler reviews, the AI-predicted labels did add a layer of assistance, and the task shifted from pure creation to a hybrid one in which the labeler reviewed and simply corrected or deleted AI labels. More than 90% of AI suggestions were used by the labelers in some way to assist the labeling, and 30% of the labels were left unchanged, suggesting that at least 30% of the workload was directly removed by these predictions.

This improvement is reflected in the comparison of label duration (the time it took to label) between assets where AI labels were present and assets where they were not. The distribution of labeling times shifted considerably after AI assistance was applied. The mean labeling duration per asset was about 16 minutes before AI assistance and about 10 minutes after it.
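To put the quoted figures in perspective, the arithmetic can be worked through in a few lines. This is only a back-of-the-envelope sketch. The variable names are ours, and the total-hours figure rests on the assumption (not stated in the case study) that the 10-minute mean applies to all 500 AI-assisted images.

```python
# Figures quoted in the case study above.
images = 500
boxes = 4528
unchanged_share = 0.30   # share of AI labels left unchanged (a lower bound)
mean_before_min = 16     # approx. mean labeling time per asset without AI labels
mean_after_min = 10      # approx. mean labeling time per asset with AI labels

unchanged_boxes = round(boxes * unchanged_share)
relative_saving = (mean_before_min - mean_after_min) / mean_before_min
# Assumption: the 10-minute mean holds for all 500 AI-assisted images.
hours_saved = images * (mean_before_min - mean_after_min) / 60

print(f"boxes accepted as-is: about {unchanged_boxes}")
print(f"time saved per asset: {relative_saving:.1%}")
print(f"total time saved: about {hours_saved:.0f} hours")
```

Under these assumptions, roughly 1358 boxes needed no edits, the per-asset time dropped by 37.5%, and the 500 images took about 50 fewer labeling hours.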

Iterative and Active Learning

Adding a further layer of generalization to the previous techniques, we have iterative learning. The idea is very simple: repeatedly train the AI-in-the-loop model as more and more labeled data becomes available. Initially the model is trained on a small subset of the labeled data, and it then starts providing inference on the remaining data as it is labeled. As more data is labeled, the model is retrained at regular intervals to fit the data better, so the quality of predictions improves over time.

Moving on with our prior example of the fruit dataset and MaskRCNN, iterating after every 200 images would look something like this:

Train the initial model (transfer learning via COCO pretraining) on 200 images.
Use the model’s predictions on the rest of the data as it is labeled.
Retrain the model once an additional 200 images are labeled.
Repeat the previous two steps until the dataset is fully labeled.

Using this approach, the model adapts better and better to the underlying data distribution over time, making its predictions more accurate and thus more helpful to the human labeler.
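The loop described above might be organized along these lines. This is a sketch under stated assumptions, not a real training pipeline: `StubModel`, `human_review` and `annotate_iteratively` are hypothetical names, and the stub only records how many labels it was last trained on instead of fitting an actual detector.

```python
class StubModel:
    """Stand-in for a real model (e.g. a fine-tuned detector); hypothetical."""

    def __init__(self):
        self.trained_on = 0  # number of labeled images seen at last training

    def fit(self, labeled):
        self.trained_on = len(labeled)

    def predict(self, image):
        # A real model would return boxes or masks; here we only record
        # which model version produced the suggestion.
        return {"image": image, "model_version": self.trained_on}


def human_review(image, suggestion):
    """Placeholder for the labeler creating or correcting a label."""
    return {"image": image, "assisted": suggestion is not None}


def annotate_iteratively(images, model, batch_size=200):
    labeled = []
    retrain_points = []
    for image in images:
        # No suggestions until the model has been trained at least once.
        suggestion = model.predict(image) if model.trained_on else None
        labeled.append(human_review(image, suggestion))
        # Retrain every time another full batch of labels is available.
        if len(labeled) % batch_size == 0:
            model.fit(labeled)
            retrain_points.append(len(labeled))
    return labeled, retrain_points


labeled, retrain_points = annotate_iteratively(range(1000), StubModel())
print(retrain_points)
print(sum(item["assisted"] for item in labeled))
```

For a 1000-image dataset, the stub is retrained after 200, 400, 600, 800 and 1000 labels, and 800 images are labeled with a suggestion available. Retraining on all labels so far is one possible choice; fine-tuning only on the newest batch would be another.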

Active learning in the context of data annotation simply answers the following question:

Which data samples should be labeled first to increase the model performance the most?

This problem is addressed by choosing the most uncertain samples, i.e. the samples the model is most unsure of in its predictions. The key point is that having the human labeler label these uncertain samples and training the model on them first causes the fastest increase in its performance, thus making the model more helpful in the process of annotation. If you’re interested in learning more about active learning, please check out this article.
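One common way to score uncertainty is the entropy of the model's predicted class probabilities; the article does not say which measure is used in practice, so entropy is an assumption here. The sketch below ranks a handful of made-up predictions and picks the most uncertain ones for labeling first. The names `entropy` and `most_uncertain` and the sample probabilities are illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy of a class-probability vector (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(predictions, k):
    """Return the ids of the k samples with the highest predictive entropy."""
    ranked = sorted(
        predictions, key=lambda sid: entropy(predictions[sid]), reverse=True
    )
    return ranked[:k]


# Made-up class probabilities for four unlabeled samples.
predictions = {
    "img_a": [0.98, 0.01, 0.01],  # confident
    "img_b": [0.34, 0.33, 0.33],  # almost uniform: very uncertain
    "img_c": [0.70, 0.20, 0.10],
    "img_d": [0.50, 0.45, 0.05],
}
print(most_uncertain(predictions, k=2))
```

Here `img_b` and `img_d` are selected, since the model is closest to guessing on them. For detection or segmentation models, a per-image score would first have to be aggregated from per-object confidences, which this sketch leaves out.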

Reinforcement learning

Reinforcement learning is one of the most captivating domains for the adoption of AI-in-the-loop in labeling. The domain is still evolving, and academic interest in it is in its fledgling stage at the moment. For data annotation, this paradigm is the closest to the student-teacher relationship that we want to adopt for the AI assistant (student) as it labels data alongside a human (teacher) annotator (this is not to be confused with the teacher-student training methodology for CNNs).

Simply put, reinforcement learning allows an agent (our AI assistant) to perform actions within an environment (data samples). Based on how the outcome of these actions (annotation by the AI assistant) compares to the expected action (annotation by the human), the agent is rewarded or penalized. Over time the agent aims to maximize the reward it earns for its actions, and thus improves its performance.

This behavior fits the problem of data annotation very well because, unlike other techniques, the agent does not directly need data or pre-training but rather an environment, which is presented in the form of unlabeled data.

Applied to the example of the fruit dataset, we have an interaction that would ideally look like this:

A human annotator draws polygons for the first set of images (200 for instance). The agent takes a set of actions; however, its outputs are not reflected on the platform for this set.
Based on the initial set and the rewards given to the agent, the agent is ideally expected to perform better on the remaining images, and thus the outputs of the agent are reflected onto the platform.
The actions of the agent are refined as more and more data is labeled, and the labeler moves to reviewing the agent’s actions rather than ignoring them.
The process continues until the dataset is exhausted.
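The reward signal and the "hidden first, shown later" behavior in the steps above could be sketched as follows. This covers only the reward bookkeeping and does not implement a reinforcement learning algorithm or any policy update. Using intersection over union (IoU) as the reward, bounding boxes instead of polygons, and the `ShadowGate` class with its thresholds are all assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


class ShadowGate:
    """Tracks the agent's rewards and decides when to show its output."""

    def __init__(self, warmup=200, min_mean_reward=0.7):
        self.warmup = warmup                    # images labeled by the human alone
        self.min_mean_reward = min_mean_reward  # hypothetical quality bar
        self.rewards = []

    def record(self, agent_box, human_box):
        reward = iou(agent_box, human_box)  # 1.0 = identical, 0.0 = no overlap
        self.rewards.append(reward)
        return reward

    def show_predictions(self):
        if len(self.rewards) < self.warmup:
            return False  # still in the hidden warm-up phase
        recent = self.rewards[-self.warmup:]
        return sum(recent) / len(recent) >= self.min_mean_reward


# Tiny demo with a warm-up of 3 images instead of 200.
gate = ShadowGate(warmup=3, min_mean_reward=0.7)
human = (0, 0, 10, 10)
for agent in [(0, 0, 10, 5), (0, 0, 10, 8), (0, 0, 10, 10), (0, 0, 10, 9)]:
    gate.record(agent, human)
    print(gate.show_predictions())
```

In this demo the agent's boxes stay hidden for the first two images and are surfaced from the third one on, once the mean reward over the warm-up window clears the bar. In a real system the recorded rewards would also feed the agent's learning step, which is omitted here.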

Although the approach described above needs abundant testing, if it can be realized, this avenue of deep learning can certainly be of immense benefit to the process of data annotation.

Transfer Learning:

https://www.tensorflow.org/tutorials/images/transfer_learning

Reinforcement Learning:

https://arxiv.org/pdf/cs/9605103.pdf

Written by Balaj Saleem, reviewed by Onur Aydın

Originally published at https://ango.ai on March 21, 2022.