How to Detect Emotions in Images using Python

One of the easiest, and yet also the most effective, ways of analyzing how people feel is looking at their facial expressions. Most of the time, our face best describes how we feel in a particular moment. This means that emotion recognition is a simple multiclass classification problem. We need to analyze a person's face and put it in a particular class, where each class represents a particular emotion. In Python, we can use the DeepFace and FER libraries to detect emotions in images.
By Boris Delovski • Jan 11, 2022

In the previous article of this series, Emotional Artificial Intelligence in Education, we covered the following topics:

  • What is affective computing?
  • What is emotional artificial intelligence?
  • What can we analyze to get a better understanding of how someone feels?
  • How can we apply the aforementioned in the education industry?

One of the easiest, and yet also the most effective, ways of analyzing how people feel is looking at their facial expressions. Most of the time, our face best describes how we feel in a particular moment. This means that emotion recognition is a simple multiclass classification problem. We need to analyze a person's face and put it in a particular class, where each class represents a particular emotion.

Analyzing faces is not always enough to gauge how somebody feels. Humans often try to hide how they feel. This can sometimes lead to misleading results if only emotion recognition in images is performed. However, in combination with other techniques (such as body language in images, or voice analysis in videos), we can get a pretty solid idea of how somebody feels.

Let's demonstrate how easy it is to perform emotion detection in images. We can use pre-built libraries that allow us to analyze faces and get the results we want quickly, without writing much code.

 

The DeepFace Library

The first library we are going to talk about is the DeepFace library. It is probably the most popular library for performing emotion analysis and similar tasks. DeepFace is an open-source project licensed under the MIT license. This means users can freely use, modify, and distribute the library for both non-commercial and commercial purposes, which makes it perfect for anybody who might want to use it in their own projects. It serves as a framework for using already-trained deep learning models to perform image and video analysis. It offers much more than just emotion detection, even if that is what interests us most.

The library uses pre-trained state-of-the-art (SOTA) models in the background. SOTA models are those that currently achieve the best results for a particular task on a set of benchmark datasets. The models DeepFace uses in the background are:

  • VGG-Face
  • Google FaceNet
  • OpenFace
  • Facebook DeepFace
  • DeepID
  • ArcFace
  • Dlib

These models have demonstrated that they can analyze images of faces (and even videos) with an accuracy that surpasses human performance. The face recognition pipeline of DeepFace consists of four stages:

  • detection 
  • alignment
  • representation 
  • verification
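
Although emotion detection is what interests us most here, the same pipeline also powers DeepFace's face verification, and we can explicitly pick one of the backend models listed above. Below is a minimal sketch of that idea; the image file names are placeholders, and the exact keys of the returned dictionary can vary between DeepFace versions.

from deepface import DeepFace

# Compare two face images using one of the pre-trained backend models.
# The file names below are placeholders - replace them with your own images.
verification = DeepFace.verify(
    img1_path="person_a.png",
    img2_path="person_b.png",
    model_name="Facenet"  # pick one of the pre-trained backend models by its DeepFace name
)

# "verified" is True when DeepFace considers the two faces to belong to the same person
print(verification["verified"])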

Let's demonstrate how DeepFace performs all of the aforementioned tasks using only one line of code.

 

Using the DeepFace Library

First, we need to install the library. Since it is published on the Python Package Index (PyPI), the easiest way to install it is:

pip install deepface

This will automatically install everything we need to use this library. Using the library is extremely simple. After we import the package, we just need to input an image. The library will give us a detailed analysis of that image. Let's demonstrate how DeepFace works on the following image:

 

Image Source: Paul Ekman Group, The Science of Smiling. https://www.paulekman.com/blog/science-of-smiling/ 

To start, we will import what we need.

from deepface import DeepFace

Then we can analyze the face present in the image.

face_analysis = DeepFace.analyze(img_path="happy_face_woman.png")

And this is all there is to it if you don't want to customize the process of analysis too much. Running the code above will give you the following result:

{'emotion': {'angry': 4.476726101312781e-06,
  'disgust': 1.6381327493892675e-06,
  'fear': 0.0001274320160076828,
  'happy': 99.06393880033129,
  'sad': 0.02293923016927273,
  'surprise': 3.946005002585829e-06,
  'neutral': 0.9129819073070232},
 'dominant_emotion': 'happy',
 'region': {'x': 77, 'y': 104, 'w': 163, 'h': 163},
 'age': 31,
 'gender': 'Woman',
 'race': {'asian': 2.069193683564663,
  'indian': 7.127643376588821,
  'black': 0.4860048647969961,
  'white': 24.476712942123413,
  'middle eastern': 17.554299533367157,
  'latino hispanic': 48.28614890575409},
 'dominant_race': 'latino hispanic'}

As you can see, we are given a very detailed analysis. It gives us the following information:

  • percentages for each of the 7 basic human emotions, and which is the dominant one
  • the bounding box coordinates for the face in the image with the region parameter
  • the predicted age of the person
  • the predicted gender of the person
  • the predicted race of the person (with percentages for different races)

Since the result we get is a dictionary, we can easily access different parts of it by referencing the keys of the dictionary.

print(face_analysis["emotion"])
print(face_analysis["dominant_emotion"])

The code above gives us the following result:

{'angry': 4.476726101312781e-06, 'disgust': 1.6381327493892675e-06, 'fear': 0.0001274320160076828, 'happy': 99.06393880033129, 'sad': 0.02293923016927273, 'surprise': 3.946005002585829e-06, 'neutral': 0.9129819073070232}
happy
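
If we are only interested in emotions, we don't have to run the age, gender, and race predictors at all. The analyze function also accepts an actions argument that limits the analysis to the predictors we list. Here is a minimal sketch of that idea; the exact structure of the result can vary slightly between DeepFace versions.

# Restrict the analysis to emotions only - age, gender, and race are skipped
emotion_only_analysis = DeepFace.analyze(
    img_path="happy_face_woman.png",
    actions=["emotion"]
)

print(emotion_only_analysis["dominant_emotion"])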

DeepFace will also work with grayscale images. Let's take a look at an example of analyzing the emotions present in the following grayscale image:

 

Image Source: Paul Ekman Group, The Science of Smiling. https://www.paulekman.com/blog/science-of-smiling/ 

To analyze the image above using DeepFace, we will use the same code we used for the image that was in color.

face_analysis_2 = DeepFace.analyze(img_path="happy_face_grayscale.png")

print(face_analysis_2["emotion"])
print(face_analysis_2["dominant_emotion"])

This will lead to the following result:

{'angry': 2.8718812601394677e-18, 'disgust': 2.5457508031498726e-35, 'fear': 1.3584258743615688e-23, 'happy': 100.0, 'sad': 1.4448950023722881e-16, 'surprise': 1.16495389723692e-09, 'neutral': 4.1699252051330404e-06}

happy

While DeepFace might seem like the best solution in all cases, there is a caveat. Since the image needs to go through all of the stages of the pipeline, it can sometimes get "stuck" at one of them. Let's take a look at this image:

Image Source: FER-2013 dataset. 

This is one of the images from FER-2013 (Facial Expression Recognition 2013), a dataset of 48x48 pixel images of faces showing different emotions. DeepFace will run into a problem at the face detection stage of the pipeline and throw the following error:

# ValueError: Face could not be detected. Please confirm that the picture is a face photo or consider to set enforce_detection param to False.

In this case, there are two ways to solve this problem:

  • follow what DeepFace suggests as a solution and set the enforce_detection parameter to False (a fallback pattern based on this idea is sketched right after this list), OR
  • use some other library
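
In practice, we may not know in advance which images will trip up the detector, so one option is to attempt the strict pipeline first and relax detection only when it fails. Below is a minimal sketch of that pattern, assuming our DeepFace version raises a ValueError in this situation, as shown above.

try:
    # Try the full pipeline first, with face detection enforced (the default)
    safe_analysis = DeepFace.analyze(img_path="FER_dataset_image.png")
except ValueError:
    # Detection failed, so rerun the analysis without enforcing detection
    safe_analysis = DeepFace.analyze(
        img_path="FER_dataset_image.png",
        enforce_detection=False
    )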

Following the suggestion given to us by the DeepFace library, we can run the following:

face_analysis_3 = DeepFace.analyze(img_path="FER_dataset_image.png", enforce_detection=False)

print(face_analysis_3["emotion"])
print(face_analysis_3["dominant_emotion"])

This gives us the following result:

{'angry': 0.0008810167471331748, 'disgust': 8.797318595862103e-12, 'fear': 8.577033639407524e-06, 'happy': 99.99908804888058, 'sad': 4.79596746481186e-07, 'surprise': 6.102041458345537e-08, 'neutral': 2.3184728053760715e-05}

happy

The result seems to be good, so this is a valid option. However, another option for cases like these is to use another library. One such library is the FER library.

 

The FER Library

The Facial Expression Recognition (FER) library is an open-source library created and maintained by Justin Shenk, co-founder of VisioLab, that allows us to perform emotion recognition on both images and videos with just a few lines of code. It is not as versatile as the DeepFace library: we can only use it for emotion recognition. Nonetheless, it is still very powerful, and in our case practical, since it works out of the box, even with low-quality images.

The library combines deep learning with OpenCV functionalities to perform emotion recognition in images. The way it works is pretty simple. We create an FER detector, which gets initialized using either the OpenCV Haar Cascade classifier or a multi-task cascaded convolutional network (MTCNN), and then pass an image to it. As a result, we get a value between 0 and 1 for each of the aforementioned basic emotions. If we want, we can also access just the value of the dominant emotion. Let's demonstrate how analyzing emotions using FER works.

 

Using the FER library

FER is also available on PyPI, which means that we can install it very easily by running the following code:

pip install fer

After installing, the first thing we will do is import what we need.

from fer import FER
import cv2

Now we can define our emotion detector. For this example, let's use MTCNN. If we set the argument mtcnn to True, the detector will use MTCNN; if we set it to False, it will use the Haar Cascade classifier.

emotion_detector = FER(mtcnn=True)

We can now define the image that we want to analyze. Let's use an image that has multiple faces in it to demonstrate that FER can analyze multiple faces at once.

Image Source: Paul Ekman Group, The Science of Smiling. https://www.paulekman.com/blog/science-of-smiling/  

test_img = cv2.imread("multiple_faces.png")
analysis = emotion_detector.detect_emotions(test_img)

The result we get is a list of dictionaries, where each dictionary represents one face. Each dictionary provides us with the bounding box coordinates of that face and an analysis of its emotions.

[{'box': (778, 133, 163, 163),
  'emotions': {'angry': 0.0,
   'disgust': 0.0,
   'fear': 0.0,
   'happy': 0.66,
   'sad': 0.01,
   'surprise': 0.0,
   'neutral': 0.32}},
 {'box': (467, 158, 165, 165),
  'emotions': {'angry': 0.0,
   'disgust': 0.0,
   'fear': 0.0,
   'happy': 1.0,
   'sad': 0.0,
   'surprise': 0.0,
   'neutral': 0.0}},
 {'box': (149, 437, 128, 128),
  'emotions': {'angry': 0.0,
   'disgust': 0.0,
   'fear': 0.0,
   'happy': 1.0,
   'sad': 0.0,
   'surprise': 0.0,
   'neutral': 0.0}},
 {'box': (468, 443, 152, 152),
  'emotions': {'angry': 0.03,
   'disgust': 0.01,
   'fear': 0.01,
   'happy': 0.85,
   'sad': 0.01,
   'surprise': 0.02,
   'neutral': 0.07}},
 {'box': (783, 421, 164, 164),
  'emotions': {'angry': 0.0,
   'disgust': 0.0,
   'fear': 0.0,
   'happy': 0.98,
   'sad': 0.0,
   'surprise': 0.0,
   'neutral': 0.02}},
 {'box': (163, 123, 146, 146),
  'emotions': {'angry': 0.0,
   'disgust': 0.0,
   'fear': 0.0,
   'happy': 1.0,
   'sad': 0.0,
   'surprise': 0.0,
   'neutral': 0.0}}]

 
Of course, FER would work even if the input image were grayscale.
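
Since the result is just a list of dictionaries, we can post-process it with ordinary Python and OpenCV. The short sketch below (not part of FER itself, just an illustration) prints the strongest emotion for each detected face and draws its bounding box on a copy of the image.

# Work on a copy so the original image stays untouched
annotated_img = test_img.copy()

for face in analysis:
    x, y, w, h = face["box"]
    # The emotion with the highest score is the dominant one for this face
    dominant = max(face["emotions"], key=face["emotions"].get)
    print(dominant, face["emotions"][dominant])
    cv2.rectangle(annotated_img, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Save the annotated image next to the original
cv2.imwrite("multiple_faces_annotated.png", annotated_img)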

Using FER for analyzing low-quality images

While DeepFace has problems working with low-quality images (at least out-of-the-box), FER doesn't. Let's demonstrate that on the low-quality image from before.

test_img_low_quality = cv2.imread("FER_dataset_image.png")
analysis = emotion_detector.detect_emotions(test_img_low_quality)
analysis

Running the code above will give us the following result:

[{'box': (0, 0, 45, 45),
  'emotions': {'angry': 0.0,
   'disgust': 0.0,
   'fear': 0.0,
   'happy': 0.9,
   'sad': 0.0,
   'surprise': 0.09,
   'neutral': 0.0}}]

This demonstrates how well FER performs on low-quality images. We can also access only the most dominant emotion by changing the code just a little bit:

dominant_emotion, emotion_score = emotion_detector.top_emotion(test_img_low_quality)
print(dominant_emotion, emotion_score)

This will give us the following result:

happy 0.9

 

Conclusion

Emotion recognition is a field that keeps advancing at tremendous speed. One of the most important aspects of analyzing emotions is analyzing human faces. While the technology is still not perfect, advanced models already outperform humans when it comes to recognizing emotions in images. Of course, there are certain limitations, such as the one we demonstrated when working with the DeepFace library. However, most of the time, the results we get are pretty reliable.

While it is possible to build a custom model, for over 90% of users that is not necessary. The libraries that do exist are open-source, can be used for both commercial and non-commercial purposes, and allow users to perform emotion recognition using just a few lines of code.

Probably the most popular libraries for performing emotion recognition are DeepFace and FER. In this article, we demonstrated how to use both of them, and we also pointed out the advantages and disadvantages of each of the two libraries. In tandem, they form the perfect duo for performing emotion recognition.

In the next article of this series, we will demonstrate how to perform emotion recognition on videos. We will try to predict whether a student is interested in a particular lecture or not. This could in the future become a very powerful tool that helps teachers, professors, and those in the education industry better cater to the needs of their students and make education more effective.

Boris Delovski

Data Science Trainer

Boris is a data science trainer and consultant who is passionate about sharing his knowledge with others.

Before Edlitera, Boris applied his skills in several industries, including neuroimaging and metallurgy, using data science and deep learning to analyze images.