Around the World with Amazon Rekognition Image

This is the first article of a two-part series playing with Amazon Rekognition and flags from around the world. Today we will focus on testing the default behavior of Rekognition Image. In the second part, we will use Rekognition Custom Labels to build a custom machine learning model and detect national flags.

Rekognition Image

Object and scene detection is a key feature of Rekognition Image and a powerful technology for searching and filtering large image collections. It is possible to build applications that search and organize images, assigning labels based on the visual content. According to the documentation:

The DetectLabels API lets you automatically identify thousands of objects, scenes, and concepts and returns a confidence score for each label. DetectLabels uses a default confidence threshold of 50.

Rekognition supports thousands of labels belonging to common categories. Will it recognize a flag? Can we use the service to map and search national flags? The AWS Management Console offers a very good “try demo” option that works for any .jpeg or .png image no larger than 5MB. Let’s give it a try, uploading a photo I took at the Haus der Kulturen der Welt in Berlin last week.

Good news! The label “Flag” is definitely supported by Rekognition Image and the confidence is very high, 99.6%.

"Labels": [ { "Name": "Flag",
            "Confidence": 99.57104492187
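The console demo is backed by the same DetectLabels operation that the SDKs expose, so the check can also be scripted. What follows is a minimal sketch in Python, assuming boto3 is installed and AWS credentials and a region are configured. The helper names `detect_labels_for_file` and `confidence_of`, the file name `berlin.jpg` and the default parameter values are illustrative and not part of the original test.

```python
from pathlib import Path


def detect_labels_for_file(client, path, min_confidence=50.0, max_labels=20):
    """Send a local .jpeg/.png file to DetectLabels and return its labels.

    `client` is expected to be a boto3 Rekognition client, for example
    the object returned by boto3.client("rekognition").
    """
    image_bytes = Path(path).read_bytes()
    response = client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    return response["Labels"]


def confidence_of(labels, name):
    """Return the confidence for a label name, or None if it is absent."""
    for label in labels:
        if label["Name"] == name:
            return label["Confidence"]
    return None


def main():
    # Assumption: boto3 is installed and credentials are available.
    import boto3

    rekognition = boto3.client("rekognition")
    labels = detect_labels_for_file(rekognition, "berlin.jpg")  # hypothetical file
    print("Flag:", confidence_of(labels, "Flag"))
```

Calling `main()` performs a real, billable request. Passing the client as a parameter keeps the two helpers easy to exercise with a stub.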

We can move to the next step, testing different flags.

National flags

While learning the implications of software in disputed territories and partially recognized countries, I encountered a simple case for image detection: national flags. How can we test them with Rekognition Image? First of all we need some images, a reliable dataset we can use to test the behavior of the managed service.

As a dataset of regional and national flags, we will use the ones from the open source repository region-flags, a collection of flags for geographic region and subregion codes maintained by Google.

Graphic images of flags are not the only option, or even the main testing scenario: there are also photos with flags. For real pictures where a flag is present, we will rely on Unsplash, a stock photography website with a liberal license.

We might test the entire region-flags dataset automatically later on using the SDK, but for now we can start manually with a few selected countries and the management console.
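For reference, an automated pass over the dataset could look roughly like the sketch below. It assumes a local checkout of region-flags containing PNG files named after region codes (the exact directory layout is an assumption) and a `detect` callable that returns labels in the DetectLabels shape. Both function names are hypothetical.

```python
from pathlib import Path


def label_flag_files(directory, detect):
    """Run `detect` on every PNG in `directory`, keyed by file stem.

    `detect` is any callable that takes a path and returns a list of
    label dicts with "Name" and "Confidence" keys.
    """
    results = {}
    for path in sorted(Path(directory).glob("*.png")):
        labels = detect(path)
        results[path.stem] = {
            label["Name"]: label["Confidence"] for label in labels
        }
    return results


def missing_label(results, name="Flag"):
    """List the files for which the label `name` was not returned at all."""
    return [stem for stem, labels in results.items() if name not in labels]
```

Keeping `detect` as a parameter means the loop can be dry-run with canned responses before any request is sent to the service.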

The Stars and Stripes

The obvious benchmark is the flag of the United States of America. It should be an easy one to detect, given that the United States is likely the main market for the product and the source of the initial training dataset.

Amazon Rekognition labels it as a “Flag” with 99.7% confidence and as an “American Flag” with 92.5% confidence.

A good and reliable result. But is it enough to trust the service for labeling flags worldwide? How does Amazon Rekognition perform when we analyze real pictures including the Stars and Stripes?

Let’s add a second photo from Unsplash:

The two photos score 76.2% and 79.5% for the “American Flag” label. The value drops to 70.1% if we instead choose a flag in a less common position.

Not ideal, but the confidence is still pretty good: all three images score between 70% and 80%. Can we then set our confidence threshold at 70% and stop validating the responses any further? Time to move to different countries to understand how Rekognition works and what the confidence level represents, and to avoid side effects.

Amazon, this is not the Stars and Stripes!

Let us cross the Florida Straits and use the flag of Cuba. Some similarities in colors and patterns, but it is a very different flag and country.

The confidence level for the label flag is still very high (99.4%) but surprisingly Rekognition scores 82.5% for the American flag. Is it a glitch or a side effect of similarities in shape and design? **Is the model not trained?**

Il Tricolore

As an Italian cloud architect, the natural choice for a completely different flag is il Tricolore: different colors, different patterns. But at 87.6%, the confidence for the label American flag is even higher than for Cuba.

What is going on here? Let’s try a real picture, not just a graphic of the national flag.

Rekognition is doing an amazing job detecting both flags in the photo, but it is not able to recognize them as Italian ones. It is labeling them as American ones.

The American flag or the Malaysian one?

We could compare many other national flags but the results would be very similar. Most are correctly labeled as flags and, especially the ones with distinct stripes, they are also labeled as American flags with varying levels of confidence.

No other national flag label is detected, neither an “Italian Flag” nor a “Cuban Flag”. **Amazon Rekognition uses a hierarchical taxonomy of ancestor labels to categorize labels, but apparently the only child of “Flag” is “American Flag”.**

{ "Name": "American Flag", "Confidence": 87.66943359375, "Instances": [], "Parents": [ { "Name": "Flag" }, { "Name": "Symbol" } ] }
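The `Parents` field can be read programmatically to see which of the returned labels sit under “Flag”. A small sketch using the label quoted above follows; the helper names are hypothetical. It only inspects labels present in a response, so it cannot prove what the full taxonomy contains.

```python
import json

# The label exactly as quoted from the response above.
AMERICAN_FLAG = json.loads(
    '{ "Name": "American Flag", "Confidence": 87.66943359375, '
    '"Instances": [], "Parents": [ { "Name": "Flag" }, { "Name": "Symbol" } ] }'
)


def parent_names(label):
    """Return the ancestor label names listed for one label."""
    return [parent["Name"] for parent in label.get("Parents", [])]


def children_of(labels, parent):
    """Names of returned labels that list `parent` among their ancestors."""
    return [label["Name"] for label in labels if parent in parent_names(label)]


print(parent_names(AMERICAN_FLAG))  # ['Flag', 'Symbol']
```

Running `children_of(response["Labels"], "Flag")` over the responses for many different flags is one way to check whether any child other than “American Flag” ever appears.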

The more closely a flag resembles the Stars and Stripes, the higher the confidence level: for the flag of Malaysia (92.4%), the value is almost identical to the one for the United States (92.5%). Setting an arbitrarily high confidence level might help reduce failures but it is not the strongest safety net for flags. We get a surprisingly similar result if we take a Malaysian flag from Unsplash: an 83.8% confidence for the American flag, which is even higher than the values we saw for the photos of real United States flags, which scored between 70% and 80%.
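To make the threshold problem concrete, the sketch below replays the “American Flag” confidence values quoted so far against a few candidate thresholds. The dictionary keys and the `evaluate` helper are illustrative names, and only figures stated in this article are used.

```python
# "American Flag" confidence values quoted in this article (percent).
US_FLAGS = {
    "US graphic": 92.5,
    "US photo 1": 76.2,
    "US photo 2": 79.5,
    "US photo 3": 70.1,
}
OTHER_FLAGS = {
    "Cuba graphic": 82.5,
    "Italy graphic": 87.6,
    "Malaysia graphic": 92.4,
    "Malaysia photo": 83.8,
}


def evaluate(threshold):
    """Return (US flags kept, non-US flags wrongly kept) at a threshold."""
    kept_us = [name for name, conf in US_FLAGS.items() if conf >= threshold]
    false_positives = [
        name for name, conf in OTHER_FLAGS.items() if conf >= threshold
    ]
    return kept_us, false_positives


for threshold in (70, 80, 90, 92.5):
    kept, wrong = evaluate(threshold)
    print(
        f"threshold {threshold}: {len(kept)} of {len(US_FLAGS)} US flags kept, "
        f"{len(wrong)} false positives"
    )
```

With these eight values, only a threshold above 92.4 removes every false positive, and such a threshold also rejects all three photos of the real US flag.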

Not a flag

Mapping any flag to the one of the United States is only part of the issue. Other images might instead produce questionable labels. For example, the _Bandeira de Portugal_ might be a positive result when your users search for dynamite, weapon, bomb or weaponry, all labels with a high 88.3% confidence level. Flag is not among the top labels, as the PNG file is not recognized as a flag.

The Portuguese flag performs better in a real scenario, where it is once again labeled as flag, with a confidence level of 99.9%.

Anything new?

The above results have been consistent across my tests of Rekognition Image over the last two years, but I wanted to check if something changed recently. Given the ongoing war in Ukraine and the popularity of the Ukrainian flag in the last four months, I checked how Amazon Rekognition detects it.

The results are very similar to the Portuguese ones. The flag graphic itself is not recognized as such, with labels that are way off, topped by “Home Decor”. Once a real picture is used, it is labeled as a flag with a high confidence (above 99%) but again there is no match for the country.

Any feedback from Amazon?

The first time I noticed the issue, I raised a ticket to AWS Support and the feedback was straightforward:

I was in touch with the Rekognition engineering team as well as the Rekognition product team and I have relayed your message over to them. They acknowledged that Rekognition is currently not trained to identify the flags.

That is fair, but I would then recommend removing the value “American Flag” as long as it is the only child of “Flag”: it offers little benefit and produces mostly false positives in non-US scenarios.
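Until that happens, a similar effect can be approximated on the client side by discarding the label before it reaches users. A minimal sketch, assuming labels in the DetectLabels response shape; the `UNTRUSTED_LABELS` set and the `drop_untrusted` function are hypothetical names, not part of any AWS SDK.

```python
# Assumption: labels we have decided not to expose to end users.
UNTRUSTED_LABELS = {"American Flag"}


def drop_untrusted(labels, untrusted=UNTRUSTED_LABELS):
    """Return the labels whose name is not in the untrusted set."""
    return [label for label in labels if label["Name"] not in untrusted]


labels = [
    {"Name": "Flag", "Confidence": 99.4},
    {"Name": "American Flag", "Confidence": 82.5},
]
print([label["Name"] for label in drop_untrusted(labels)])  # ['Flag']
```

The parent label “Flag” is kept, so searching for flags in general still works while the misleading country match is hidden.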

Conclusions

What have we learned so far? Is it worth using Rekognition Image?

Rekognition Image is doing a good job of detecting flags in photos taken in very different conditions. However, it is not able to tell different flags apart, and it incorrectly labels most of them as American flags.

**Accuracy is always relative.** Setting an artificial confidence level for the results of Amazon Rekognition is not enough. National flags are not the most important label and training scenario for machine learning but they are an example of the challenges when image detection is not handled properly.

**This is not an AWS problem, this is our problem as developers** integrating a managed service like Rekognition Image in our product or service. You will be the public face for your end users. If you need to integrate image recognition capabilities in your application, you have to manage the risks and challenges yourself.

If you need more reliable data, you need to take the next step in the image detection journey.

Next step?

It is now time to take the problem into our own hands and train a model to better recognize flags. How can we achieve that with Rekognition Custom Labels? How is Amazon Rekognition going to perform? We will discuss that in a separate article.

Thanks for making it this far! I am always looking for feedback to make it better, so please feel free to reach out to me via LinkedIn or email.

Credits

All photos and images are from the author, Unsplash or the region-flags repository.