What We Found Classifying 25k Images of Faces

A short while ago, our team of data annotators labeled nearly 25 thousand images of faces, classifying them by age, gender, hair color, eye color, beard and mustache color (if present), and glasses. We then released the annotated dataset to the public for free. You can download the annotated face classification dataset here.

The dataset consists of 23032 face images, each labeled by two independent annotators. The images in our Face Classification Dataset were taken from the open Flickr-Faces-HQ (FFHQ) dataset. We analyzed the annotations our team created and present our findings in this article.

The first graph shows the gender distribution across the dataset. Our annotators found 10472 “Male” and 12591 “Female” images, and 254 people were labeled “Not sure”. Compared with the size of the dataset, the conflict between annotators on gender was minimal, so it can be ignored.

In the figures, each conflicting result adds 0.5 to each of the two labels involved. For example, if one annotator labels an image “Male” and the other labels it “Female”, the “conflict” sections of both the “Male” and “Female” columns increase by 0.5.

Figure 1: Bar chart of the gender distribution
Figure 2: Heatmap of the conflict in gender distribution
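The 0.5 convention for conflicts can be made concrete with a short counting routine. This is a minimal sketch, not the analysis code used for the figures: it assumes each image's labels are available as a hypothetical `(annotator_1, annotator_2)` tuple, and the function name `tally_with_conflicts` is ours.

```python
from collections import defaultdict


def tally_with_conflicts(pairs):
    """Tally two-annotator labels with conflicts split in half.

    `pairs` is an iterable of (annotator_1_label, annotator_2_label) tuples,
    one per image (a hypothetical input format). An agreement adds 1 to that
    label's "agreed" count; a disagreement adds 0.5 to the "conflict" count
    of each of the two labels involved.
    """
    counts = defaultdict(lambda: {"agreed": 0, "conflict": 0.0})
    for first, second in pairs:
        if first == second:
            counts[first]["agreed"] += 1
        else:
            counts[first]["conflict"] += 0.5
            counts[second]["conflict"] += 0.5
    return dict(counts)


if __name__ == "__main__":
    labels = [("Male", "Male"), ("Female", "Female"),
              ("Male", "Female"), ("Female", "Female")]
    for label, parts in tally_with_conflicts(labels).items():
        print(label, parts)
    # Male {'agreed': 1, 'conflict': 0.5}
    # Female {'agreed': 2, 'conflict': 0.5}
```

In this sketch each conflicting image contributes 0.5 twice, so the column totals still add up to the number of images.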

Both annotators labeled 876 images as “Baby (0–2)”, while 320 images were labeled “Baby (0–2)” by one annotator and “Child (3–9)” by the other. Conflict between annotators peaked between the “Young” and “Adult” groups. This is expected, because the facial differences between young people and adults are subtler than those between other age groups. Figures 3 and 4 show the conflicts in each age group.

Figure 3: Bar chart of the age distribution
Figure 4: Heatmap of the conflict in age distribution
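A conflict heatmap like this can be built from a matrix that counts, for every pair of labels, how many images received that pair. The sketch below is an illustration under stated assumptions: it uses the same hypothetical `(annotator_1, annotator_2)` input, ignores which annotator gave which label, and uses abbreviated names for only the four age groups mentioned in this article, since the full list of groups is not given here.

```python
def pair_matrix(pairs, labels):
    """Build a symmetric label-pair matrix for a conflict heatmap.

    Diagonal cell [i][i] counts images where both annotators chose labels[i].
    Off-diagonal cells [i][j] and [j][i] both count images where one
    annotator chose labels[i] and the other chose labels[j].
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for first, second in pairs:
        i, j = index[first], index[second]
        if i == j:
            matrix[i][i] += 1
        else:
            # Annotator order is ignored, so mirror the count.
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


if __name__ == "__main__":
    ages = ["Baby", "Child", "Young", "Adult"]  # illustrative subset of age groups
    pairs = [("Baby", "Baby"), ("Baby", "Child"), ("Child", "Baby"),
             ("Young", "Adult"), ("Adult", "Adult")]
    for label, row in zip(ages, pair_matrix(pairs, ages)):
        print(f"{label:>6}", row)
```

Under this layout, images labeled “Baby (0–2)” by one annotator and “Child (3–9)” by the other would appear in both the Baby/Child and Child/Baby cells. Whether the original figures mirror counts this way is not stated, so the symmetry is a design choice of the sketch.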

The next graphs show the hair color distribution. As Figure 5 shows, the most common hair color is “Brown”, at 31% of the total set, followed by “Black” and “Blonde”. Because “Black” and “Brown” look similar, conflict between annotators is highest for that pair.

Figure 5: Bar chart of the hair color distribution
Figure 6: Heatmap of the conflict in hair color distribution

The next graphs show the beard color distribution. In the dataset, 82.9% of the images have no beard and are labeled “No hair”. As Figure 8 shows, conflict is highest between “Black” and “Brown”.

Figure 7: Bar chart of the beard color distribution
Figure 8: Heatmap of the conflict in beard color distribution

The next graphs show the mustache color distribution. In the dataset, 82.2% of the images have no mustache and are labeled “No hair”. As with hair color, conflict is highest between “Black” and “Brown”. The beard and mustache graphs are very similar to each other.

Figure 9: Bar chart of the mustache color distribution
Figure 10: Heatmap of the conflict in mustache color distribution

The next graphs show the eye color distribution. After labeling, the most common eye color is “Brown”. Because eye color is often hard to make out in a photo, annotators frequently disagreed on whether the eyes were visible at all, so conflict is highest for the “Not visible” label, as Figure 11 shows.

Figure 11: Bar chart of the eye color distribution
Figure 12: Heatmap of the conflict in eye color distribution

The following graphs show the distribution of glasses in the dataset. Figure 13 makes it clear that most people wear no glasses.

Figure 13: Bar chart of the glasses distribution
Figure 14: Heatmap of the conflict in glasses types distribution

Let’s take a deeper look at the dataset. The following graphs use only the labels from the first annotator.
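Restricting these breakdowns to the first annotator means every image contributes exactly one label per feature, so a plain cross-tabulation is enough. A minimal sketch, assuming each image is stored as a hypothetical dict mapping a feature name to its `(annotator_1, annotator_2)` labels; the function name is ours:

```python
from collections import Counter


def crosstab_first_annotator(records, row_feature, column_feature):
    """Count (row label, column label) combinations using annotator 1 only.

    `records` is a list of dicts such as
    {"age": ("Adult", "Young"), "gender": ("Male", "Male")}, where each value
    is an (annotator_1, annotator_2) tuple; this format is hypothetical.
    """
    return Counter(
        (record[row_feature][0], record[column_feature][0])
        for record in records
    )


if __name__ == "__main__":
    images = [
        {"age": ("Adult", "Young"), "gender": ("Male", "Male")},
        {"age": ("Young", "Young"), "gender": ("Female", "Female")},
        {"age": ("Adult", "Adult"), "gender": ("Male", "Female")},
    ]
    table = crosstab_first_annotator(images, "age", "gender")
    print(table[("Adult", "Male")])   # 2
    print(table[("Young", "Male")])   # 0: Counter returns 0 for missing keys
```

Swapping the feature names (for example, a hypothetical `"hair_color"` key) would produce the other breakdowns in this section.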

Most men and women in the set are classified as adults. Among women, the number labeled “Young” is very close to the number labeled “Adult”, whereas adults clearly dominate among men. Either the women in the dataset were younger on average, or they appeared younger to our annotators.

Figure 15: Bar chart of the gender distribution of age groups

According to the hair color distribution graph, brown is the most common hair color in every age category except babies, most of whom have blonde hair.

Figure 16: Bar chart of the hair color distribution of age groups

The next two figures are very similar. Figure 17 shows the beard color distribution by age group, and Figure 18 shows the mustache color distribution by age group. As expected, the baby columns are empty, since babies have no beard or mustache. The vast majority of people have neither a beard nor a mustache.

Figure 17: Bar chart of the beard color distribution of age groups
Figure 18: Bar chart of the mustache color distribution of age groups

The following figure shows eye color by age group. Brown is by far the most common eye color in every age category.

Figure 19: Bar chart of the eye color distribution of age groups

As the following figure shows, prescription glasses appear mostly in the adult category, and people without glasses are the vast majority in every category.

Figure 20: Bar chart of the glasses types distribution of age groups

According to the hair color distribution by gender, black is the most common hair color among males, while most females appear to have brown hair.

Figure 21: Bar chart of the hair color distribution of gender

The next two figures are again very similar. Figure 22 shows the beard color distribution by gender, and Figure 23 shows the mustache color distribution by gender. Most people have no beard or mustache, and black is the most common color for both.

Figure 22: Bar chart of the beard color distribution of gender
Figure 23: Bar chart of the mustache color distribution of gender

The following figure shows the eye color distribution by gender. For all genders, brown is the most common eye color, with blue in second place.

Figure 24: Bar chart of the eye color distribution of gender

The following graph breaks down the types of glasses in the dataset by gender. As Figure 25 shows, most prescription glasses wearers are male, although most faces in the set have no glasses at all.

Figure 25: Bar chart of the glasses distribution of gender

The following graph shows the conflict rate between annotators for each feature. Most conflicts occurred for eye color and age group, which are the hardest features to judge from an image.

Figure 26: Conflict rates between annotators
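The article does not define “conflict rate” precisely. One natural reading is the share of images on which the two annotators chose different labels for a feature. The sketch below assumes that definition and the same hypothetical per-image dict format; the `eye_color` key and sample labels are illustrative only.

```python
from collections import Counter


def conflict_rates(records):
    """Return, per feature, the fraction of images where annotators disagree.

    Each record maps a feature name to an (annotator_1, annotator_2) tuple;
    this input format and the definition of "conflict rate" are assumptions.
    """
    totals = Counter()
    disagreements = Counter()
    for record in records:
        for feature, (first, second) in record.items():
            totals[feature] += 1
            if first != second:
                disagreements[feature] += 1
    return {feature: disagreements[feature] / totals[feature] for feature in totals}


if __name__ == "__main__":
    images = [
        {"age": ("Adult", "Young"), "eye_color": ("Brown", "Brown")},
        {"age": ("Young", "Young"), "eye_color": ("Brown", "Not visible")},
        {"age": ("Adult", "Adult"), "eye_color": ("Blue", "Brown")},
        {"age": ("Child", "Baby"), "eye_color": ("Brown", "Blue")},
    ]
    print(conflict_rates(images))  # {'age': 0.5, 'eye_color': 0.75}
```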

Ango AI provides data labeling solutions for AI teams of all sizes and industries. Our data labeling platform, Ango Hub, is used by dozens of industry-leading companies to label millions of data points monthly. Hub is the most versatile platform on the market, supporting 15+ file types and 20+ annotation tools. It’s also free to try here.

Ango AI also offers an end-to-end, fully managed data labeling service, Ango Service, used by customers all over the world to label data in fields ranging from banking and insurance to government, medicine, and more. We know all of our annotators personally and do not outsource. Book a call with us to learn more.

Authors: Onur Aydın, Kıvanç Değirmenci

Originally published at https://ango.ai on August 3, 2022.
