So, we had this idea: What if you could use your child’s face as an emoticon in a text message? Instead of punctuating a point with an everyday emoji, wouldn’t it be more expressive and more enjoyable to send a small photo of your child’s beaming face? We decided to see what we could do.
The Challenge
Our main challenge in creating images that represent established emoji emotions was that facial recognition software by itself is not enough: we also needed a mechanism that would recognize emotions on the face. Once we could recognize emotions on a face, connecting that image with an emoji for use in iMessage is a relatively simple matter: implement an iMessage App Extension. Information on how to do this can be found on Apple’s Developer site: https://developer.apple.com/imessage/
The SDKs
Google Cloud Vision
We’d never worked with facial recognition SDKs before, so to start we checked whether Google had any solutions we could try, and it did. The Google Cloud Vision API appeared to do what we were looking for, and Google has a demo to show how it works. Accordingly, we prepared a series of photos (courtesy of one of our emotionally deep developers) presenting a number of facial expressions, and then we had Cloud Vision analyze the pictures for emotional content.
Here are the emotions and the photos we are using to show them:
Photos, left to right: No emotions, Joy, Sorrow, Anger, Surprise.
For some reason, though, Google’s Cloud Vision recognized only the “joy” emotion properly. In the other images, our developer appeared to have no emotions (though his hair appears to have been mistaken for headwear of some kind). Here’s the analysis from Cloud Vision:
Cloud Vision analysis screenshots: No emotions, Joy, Sorrow, Anger, Surprise.
Our response?
Despite what appeared to be shortcomings in emotion detection, Google’s Cloud Vision API may yet play a role for us. It also offers Entity Detection, Inappropriate Content Detection, Label Detection, and more, and these features may be useful to us in another project on another day.
Figure 1: The JSON response presented by the Cloud Vision API.
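Cloud Vision does not report emotions as numeric scores. Each detected face carries likelihood fields such as `joyLikelihood`, `sorrowLikelihood`, `angerLikelihood`, and `surpriseLikelihood`, and each field holds one of the buckets from `VERY_UNLIKELY` to `VERY_LIKELY`. The sketch below shows one way to turn those four strings into a single emotion label. It is plain C, so it compiles unchanged inside an Objective-C project. It assumes the strings have already been extracted from the JSON. The `LIKELY` cutoff and the helper names are our own illustrative choices, not part of the API.

```c
#include <stddef.h>
#include <string.h>

/* Cloud Vision likelihood buckets, in increasing order of confidence. */
typedef enum {
    LK_UNKNOWN = 0,
    LK_VERY_UNLIKELY,
    LK_UNLIKELY,
    LK_POSSIBLE,
    LK_LIKELY,
    LK_VERY_LIKELY
} Likelihood;

/* Map a likelihood string from the JSON response to an ordered value. */
Likelihood parse_likelihood(const char *s) {
    static const char *names[] = {
        "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (strcmp(s, names[i]) == 0) {
            return (Likelihood)i;
        }
    }
    return LK_UNKNOWN; /* anything unrecognized is treated as unknown */
}

/* Hypothetical helper: return the emotion with the highest likelihood,
   or "none" if no emotion reaches at least LIKELY.
   On a tie, the emotion listed first wins. */
const char *dominant_emotion(const char *joy, const char *sorrow,
                             const char *anger, const char *surprise) {
    const char *labels[] = {"joy", "sorrow", "anger", "surprise"};
    Likelihood values[] = {
        parse_likelihood(joy), parse_likelihood(sorrow),
        parse_likelihood(anger), parse_likelihood(surprise)
    };
    size_t best = 0;
    for (size_t i = 1; i < 4; i++) {
        if (values[i] > values[best]) {
            best = i;
        }
    }
    return values[best] >= LK_LIKELY ? labels[best] : "none";
}
```

Under this rule, a face whose four fields are all `VERY_UNLIKELY` maps to `"none"`. That matches the "no emotions" readings our photos received.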
EmoVu Mobile SDK
After the Cloud Vision API, we looked at the EmoVu Mobile SDK. The description of this SDK looked very promising, so we requested access – and received this response:
“Unfortunately, we are now working a new major update for Android SDK and iOS SDK and I would be happy to let you know as soon as we have them released.”
Predictably, our response was:
Affectiva Emotion Recognition SDK
Finally, we put the Affectiva Emotion Recognition SDK through its paces. We used the same photos we had used with the Google Cloud Vision API and, to our dismay, found that Affectiva kept detecting “disgust” in each of our photos.
Affectiva Analysis |
No emotions | Joy | Sorrow | Anger | Surprise |
(no values listed) | Disgust = 100, Joy = 99.8, [emoji] = 99.9, Smile = 95.5 | Sadness = 67.7 | Disgust = 100 | Joy = 99.9 |
Naturally, our photo subject responded:
While concerned about the accuracy of the analysis of the photos, we liked the way Affectiva provides a variety of values for each detected face. For example, these are the values that Affectiva provided for the “Joy” face:
Emotion / Expression / Emoji | Value |
anger, contempt, sadness | 0.0 |
disgust | 100.0 |
engagement | 95.5 |
joy | 99.8 |
surprise | 0.2 |
valence | 2.1 |
attention | 98.7 |
browFurrow | 0.1 |
browRaise, innerBrowRaise, lipCornerDepressor, lipSuck, mouthOpen, upperLipRaise | 0.0 |
chinRaise | 2.2 |
eyeClosure | 0.5 |
lipPress | 13.9 |
lipPucker | 7.0 |
noseWrinkle | 91.0 |
smile | 95.5 |
seven other emoji | 0.0 |
☺ | 99.8 |
[emoji] | 99.9 |
[emoji] | 6.0 |
[emoji] | 2.3 |
[emoji] | 4.3 |
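The table also shows why a naive reading goes wrong. If an app simply takes the emotion with the highest score, the Joy face is labeled disgust, because 100.0 edges out 99.8. The sketch below applies that selection rule, in plain C, to the six emotion values from the table. Engagement, valence, and the expression and emoji scores are left out because they are not discrete emotions. `EmotionScore` and `strongest_emotion` are hypothetical names for illustration, not part of the Affectiva SDK.

```c
#include <stddef.h>

/* One named score from the analysis, e.g. {"joy", 99.8}. */
typedef struct {
    const char *name;
    double value;
} EmotionScore;

/* Return the entry with the highest value (the first one on ties),
   or NULL if the list is empty. */
const EmotionScore *strongest_emotion(const EmotionScore *scores, size_t count) {
    const EmotionScore *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (best == NULL || scores[i].value > best->value) {
            best = &scores[i];
        }
    }
    return best;
}

/* Emotion scores reported for the "Joy" face in the table above. */
static const EmotionScore joy_face[] = {
    {"anger", 0.0}, {"contempt", 0.0}, {"disgust", 100.0},
    {"joy", 99.8},  {"sadness", 0.0},  {"surprise", 0.2},
};
```

Calling `strongest_emotion(joy_face, sizeof joy_face / sizeof joy_face[0])` returns the `disgust` entry, even though joy trails it by only 0.2 points.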
While each of these SDKs appears to offer some powerful tools that we could use in creating the customized emoji we were looking for, we still had a problem with detection accuracy. “It seems like there must be something wrong with my facial expressions,” said our photo subject, “because all these tools cannot be wrong.” However, we had another idea: what if the problem wasn’t our subject’s facial expressions but the fact that we’d asked the applications to analyze expressions captured in still photos? Maybe we needed a more dynamic input source.
The Solution
Since each of the applications we examined had trouble detecting the correct emotion in still photos, we decided to connect the Affectiva app to the iPhone camera itself. That way, Affectiva could analyze the faces of live subjects, which might yield results closer to the emotions actually being expressed. If Affectiva detected a high joy value in a face, for example, we could save that expression for use as our joy emoji.
Implementation
Implementing this idea required us to configure Affectiva to interact with the iPhone’s camera, detect emotions, and provide a callback for each detection.
The starting point is the AFDXDetector initializer, which takes a delegate, the camera to use, and the maximum number of faces to track:
```objc
- (id)initWithDelegate:(id <AFDXDetectorDelegate>)delegate usingCamera:(AFDXCameraType)camera maximumFaces:(NSUInteger)maximumFaces;
```
We decided to store all detected emotion objects in an array of simple models:
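```objc
#import <UIKit/UIKit.h>

@interface DSTEmotionModel : NSObject
@property (nonatomic, strong) NSString *title;   // emotion name, e.g. "joy"
@property (nonatomic) CGFloat value;             // detected value for that emotion
@property (nonatomic, strong) UIImage *image;    // camera snapshot
@end
@implementation DSTEmotionModel
@end

@interface ViewController () // the view controller that owns the detector; class name illustrative
@property (nonatomic, strong) NSMutableArray<DSTEmotionModel *> *objects;
@end
```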
Figure 2: The emotion model array
Finally, we decided to store only those images whose emotion value was greater than 50. The threshold is arbitrary, but we did not want to associate an image with an emoji unless it strongly reflected the detected emotion. We also decided to replace a stored image only when Affectiva detects a higher value for the same emotion in a new image. So if joy is detected in a new analysis of a face with a value of 90, that image replaces a previously stored joy image with a value of 80. The stored image with a value of 80 would be retained, however, if the newly analyzed face had a joy value of only 70.
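Together, the two rules above amount to keeping a running best snapshot for each emotion:

- a value must be greater than 50 to be stored at all;
- a new image replaces the stored one only if its value is higher.

The sketch below shows that update step in plain C. `BestSnapshot`, `offer_snapshot`, and the integer `frame_id` standing in for the captured `UIImage` are all hypothetical, chosen so the logic can be shown without UIKit or the Affectiva SDK.

```c
#include <stdbool.h>

/* Minimum score (exclusive) an image needs before it is stored. */
#define MIN_EMOTION_VALUE 50.0

/* Best snapshot kept so far for one emotion. In the app this would hold
   the UIImage; an integer frame id stands in for it here. */
typedef struct {
    bool has_image;
    double value;
    int frame_id;
} BestSnapshot;

/* Offer a newly analyzed frame for this emotion. The frame is stored only
   if its value exceeds the threshold and beats the stored value.
   Returns true if the stored snapshot was replaced. */
bool offer_snapshot(BestSnapshot *slot, double value, int frame_id) {
    if (value <= MIN_EMOTION_VALUE) {
        return false; /* too weak to represent the emotion */
    }
    if (slot->has_image && value <= slot->value) {
        return false; /* not stronger than what we already have */
    }
    slot->has_image = true;
    slot->value = value;
    slot->frame_id = frame_id;
    return true;
}
```

Here is how the example values from the paragraph above play out:

- 80 is stored;
- 90 then replaces it;
- a later 70 leaves the 90 in place.

A value of exactly 50 is rejected, because the rule requires a value strictly greater than 50.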
To perform a real-time facial analysis and associate different emoji with the detected emotions, we created a simple view:
When a user taps the “Start” button, the app tells the AFDXDetector instance to start capturing from the camera. Affectiva then reports its results through this delegate callback:
```objc
- (void)detector:(AFDXDetector *)detector hasResults:(NSMutableDictionary *)faces forImage:(UIImage *)image atTime:(NSTimeInterval)time;
```
As Affectiva detects facial expressions associated with specific emoji, our app saves a snapshot of each expression in the array and links it with the associated emoji. The resulting image/emoji pairs appear on the screen beneath the camera preview.
Watch a video below of the app in action:
Once we had a collection of images associated with emoji, we just needed to implement the iMessage App Extension to send them to friends. Implementation details can be found here: https://developer.apple.com/imessage/
Result
By connecting our array of images through the iMessage App Extension, we can now use those images instead of inserting a standard emoji. The representative emoji are still visible atop the pictures, and tapping an emoji inserts its image into the message in place of the standard emoji:
Summary
First of all, a big Thank You! to all the people who work on Facial Recognition and Emotion Detection APIs, and who let the world use them. Your work makes it possible to create great applications in a short amount of time. We were surprised by how many libraries are already available.
By connecting the Affectiva app to the camera and using it to analyze facial expressions in real time, we succeeded in capturing the kinds of images we were interested in using as emoji. Then, by connecting our array of images through the iMessage App Extension, we succeeded in substituting those images for the generic emoji themselves.
Needless to say, you can guess the result: