Flagging Flags: Nine Numbers with Amazon Rekognition

I recently published an article where I played with Amazon Rekognition and flags from around the world. A few friends and developers asked for more numbers, either out of curiosity or because they had suggestions or doubts.

Is the “Stars and Stripes” the flag with the highest confidence for the label “American Flag”? Are all the flags labelled as “Flag”? Does the quality of the PNG file affect the label detection? 

Before training a model to better recognize flags with Rekognition Custom Labels, I decided to publish more results and the full dataset. Here are nine numbers and trends for the 255 flags available in the repository. Once more I rely on the images from the open source region-flags, a collection of flags for geographic region and subregion codes maintained by Google. 

128 Flags

The outcome is a coin toss: almost a perfect 50% (128 out of 255) of the flags are labelled “Flag”. Only 98 of them have a confidence above 90%, 73 above 98% and 56 above 99%. OK, a flag is not always a flag. When in doubt, toss a coin, a cheaper algorithm than an API request.
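Counts like the ones above can be derived from the saved responses with a few lines of Python. This is a minimal sketch, assuming each response has the DetectLabels shape (a "Labels" list whose items carry "Name" and "Confidence"); the function name `count_label` and the sample data are hypothetical and not taken from the real dataset.

```python
def count_label(responses, label, threshold=0.0):
    """Count images whose response contains `label` with confidence above `threshold`."""
    count = 0
    for response in responses.values():
        for item in response.get("Labels", []):
            if item["Name"] == label and item["Confidence"] > threshold:
                count += 1
                break  # count each image at most once
    return count


# Hypothetical sample, only to show the shape of the data
sample = {
    "aa.png": {"Labels": [{"Name": "Flag", "Confidence": 99.5}]},
    "bb.png": {"Labels": [{"Name": "Flag", "Confidence": 91.0}]},
    "cc.png": {"Labels": [{"Name": "Symbol", "Confidence": 99.9}]},
}

for threshold in (0, 90, 98, 99):
    print(threshold, count_label(sample, "Flag", threshold))
```

Running the same loop with "American Flag" or "Symbol" as the label would give the other counts discussed in this article.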

Flag is a Flag

98 American Flag

As we already noticed, many flags, including the Cuban and Malaysian ones, are labelled as “American Flag”. How many PNG files are decoded as “American Flag”? There are 98 of them, with 27 above 90% and one above 98%. A high confidence level alone is not always a safety net.

Flag US

Only one Stars and Stripes

The flag with the highest confidence for “American Flag” is the one of Peru. No kidding, a very high 98.3%. The real “Stars and Stripes” is actually not in the top ten for “American Flag”.

Flag Peru

No Syrup, two Maples

Two labels, “Maple” and “Maple Leaf”, matched the Canadian flag. And only the Canadian one. Perfect match. Well done Rekognition!

Flag Canada

Eleven Outdoor Flags

What do Slovenia, Laos and Kosovo have in common? Their flags are all labelled “Outdoor”, Laos with a staggering 99.47% confidence level. Whether you are looking for rock climbing in Nong Khiaw or kayaking through Si Phan Don, the flag of Laos is apparently the country’s best marketing tool.

Flag Laos

One Lollipop

There is only one lollipop detected. And we cannot even share it. The flag of Dominica, which features a sisserou parrot, the national bird emblem, got the only (incorrect) candy.

Flag Dominica

61 Stars

Almost a quarter of the flags have the “Star Symbol” label, 25 with a confidence above 90% and the European Union leading at 97%. Brexit or not, the twelve golden stars on a blue background are an easy catch for Rekognition.

Flag EU

239 Symbols

Almost every flag has “Symbol” as a label (94%), with 75 of them at 99% confidence level. India, Georgia and Peru are all above 99.99%. Whatever Symbol means.

Flag India

19 Animals

From Kiribati to Mexico, from American Samoa to Uganda, Rekognition does a good job finding animals inside flags: of the 19 decoded, only 3 are false positives. While the parent label (Rekognition uses a hierarchical taxonomy of ancestor labels) is good, the species itself is often wrong: a “Penguin” for Uganda, a “Chicken” for Mexico. Whoops.

Flag Uganda

Size of PNG is not significant

There is no significant discrepancy between testing the flags at 1000px and at 250px: the confidence levels are slightly higher or lower, but without a clear pattern. This is somewhat expected, as the models are likely trained with images scaled down to a fixed size to reduce the computational load.
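One simple way to look for such a pattern is to compare the confidence of the same label for the same flag at both sizes. The sketch below assumes two dictionaries that map a flag code to the confidence of one label; the helper name `confidence_deltas` and all the values are hypothetical, not the measured ones.

```python
from statistics import mean


def confidence_deltas(small, large):
    """Per-flag difference (large minus small) for flags present in both runs."""
    return {flag: large[flag] - small[flag] for flag in small if flag in large}


# Hypothetical "Flag" confidences for the same three flags at 250px and 1000px
at_250 = {"aa": 99.0, "bb": 88.0, "cc": 75.5}
at_1000 = {"aa": 98.5, "bb": 89.0, "cc": 75.0}

deltas = confidence_deltas(at_250, at_1000)
print(deltas)
print("mean delta:", mean(list(deltas.values())))
```

A mean close to zero with deltas of both signs is what "no significant pattern" would look like in this kind of comparison.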

Testing All Flags

How can you quickly test all the flags? Amazon Rekognition, the AWS CLI and a while loop are the answer. You upload the dataset to an S3 bucket and run a couple of simple commands in the AWS CLI:

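# List every object key in the bucket, one per line (-r strips the JSON quotes)
aws s3api list-objects --bucket <my-bucket> \
    --query 'Contents[].{Key: Key}' | jq -r '.[].Key' > list-countries.csv
# Call DetectLabels for each flag and save the response as <key>.json
while read -r flag; do
   aws rekognition detect-labels \
       --image "{\"S3Object\":{\"Bucket\":\"<my-bucket>\",\"Name\":\"$flag\"}}" > "$flag.json"
done < list-countries.csv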

Not elegant, but it just works. Every output file is a JSON, one file for every flag. Here is an example output (Italy) and here is a zip with the output for all the countries.
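Once the JSON files are on disk, they can be loaded into a single table to rank the flags by label. This is a minimal sketch, assuming one DetectLabels response per `*.json` file in a directory; the function names `load_responses` and `top_for_label` are hypothetical.

```python
import json
from pathlib import Path


def load_responses(directory):
    """Map each output file name to a {label name: confidence} dictionary."""
    table = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            response = json.load(handle)
        table[path.name] = {
            item["Name"]: item["Confidence"] for item in response.get("Labels", [])
        }
    return table


def top_for_label(table, label, limit=10):
    """File names with the highest confidence for `label`, best first."""
    ranked = [(labels[label], name) for name, labels in table.items() if label in labels]
    return [name for _, name in sorted(ranked, reverse=True)[:limit]]
```

For example, `top_for_label(load_responses("."), "American Flag")` would list the ten files with the highest confidence for that label in the current directory.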


Maybe there is not one single Stars and Stripes but we have only one Lollipop. 

The decoding of flags on Amazon Rekognition is quite unreliable but, with a few exceptions, the decoding of objects and animals inside the flags is accurate. 

Please don’t take the numbers too seriously. As already acknowledged by AWS Support, Rekognition is currently not trained to identify flags. These numbers are just a warning and a reminder that results from image recognition have to be validated and used carefully.

How can we improve our results and have some confidence in the flag detection process?  We will soon play with Rekognition Custom Labels and discuss the results in a separate article. 

Thanks for making it this far! I am always looking for feedback to make it better, so please feel free to reach out to me via LinkedIn or email.


All the screenshots are from the author and PNG files are from the region-flags repository.

Around the World with Amazon Rekognition Image 

This is the first article of a two-part series playing with Amazon Rekognition and flags from around the world. Today we will focus on testing the default behavior of Rekognition Image. In the second part we will use Rekognition Custom Labels to build a custom machine learning model and detect national flags.

Rekognition Image

Object and scene detection is a key feature of Rekognition Image and a powerful technology to search and filter large image collections. It is possible to build applications that search and organize images, assigning labels based on the visual content. According to the documentation:

The DetectLabels API lets you automatically identify thousands of objects, scenes, and concepts and returns a confidence score for each label. DetectLabels uses a default confidence threshold of 50.
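The threshold can be set explicitly with the `MinConfidence` parameter of the request instead of relying on the default. The sketch below shows how a call through the AWS SDK for Python (boto3) could look; the helper names are hypothetical, and the second function assumes boto3 is installed and AWS credentials are configured.

```python
def detect_labels_request(bucket, key, min_confidence):
    """Build the keyword arguments for a DetectLabels call on an S3 object."""
    return {
        "Image": {"S3Object": {"Bucket": bucket, "Name": key}},
        "MinConfidence": min_confidence,
    }


def detect_labels(bucket, key, min_confidence):
    """Call DetectLabels and return the list of labels from the response."""
    import boto3  # imported lazily so the helper above works without the SDK

    client = boto3.client("rekognition")
    response = client.detect_labels(**detect_labels_request(bucket, key, min_confidence))
    return response["Labels"]
```

A call such as `detect_labels("my-bucket", "IT.png", 90)`, with a hypothetical bucket and key, would only return labels with a confidence of at least 90.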

Rekognition supports thousands of labels belonging to common categories. Will it recognize a flag? Can we use the service to map and search national flags? The AWS Management Console offers a very good “try demo” option that works for any .jpeg or .png image no larger than 5MB. Let’s give it a try, uploading a photo I took at the Haus der Kulturen der Welt in Berlin last week.

A random flag in Berlin

Good news! The label “Flag” is definitely supported by Rekognition Image and the confidence is very high, 99.6%.

"Labels": [ { "Name": "Flag",
            "Confidence": 99.57104492187

We can move to the next step, testing different flags.

National flags

While learning the implications of software in disputed territories and partially recognized countries, I encountered a simple case for image detection: national flags. How can we test them with Rekognition Image? First of all we need some images, a reliable dataset we can use to test the behavior of the managed service.

As a dataset of regional and national flags, we will use the ones from the open source repository region-flags, a collection of flags for geographic region and subregion codes maintained by Google.

Flags from around the world

Graphic images of flags are not the only option, or even the main testing scenario: there are also photos with flags. For real pictures where a flag is present, we will rely on Unsplash, a stock photography sharing website with a liberal license.

We might automatically test the entire region-flags dataset later on using the SDK, but for now we can start manually with a few selected countries and the management console.

The Stars and Stripes

The obvious benchmark is the flag of the United States of America. It should be an easy one to detect, given that the United States is likely the main market for the product and the source of the initial training dataset.

Flag of the United States

Amazon Rekognition labels it as a “Flag” with a 99.7% confidence and “American Flag” with a 92.5% confidence (click on the images to see the data from the AWS console).

Rekognition and flag of the USA

A good and reliable result. But is it enough to trust the service for labeling flags worldwide? How does Amazon Rekognition perform when we analyze real pictures including the Stars and Stripes?

Rekognition and flag of the USA

Let’s add a second photo from Unsplash:

Rekognition and flag of the USA

They have a confidence of 76.2% and 79.5% for the American Flag. The value drops to 70.1% if we instead choose a flag in a less common position.

Rekognition and flag of the USA

Not ideal, but the confidence is still pretty good: all three images score between 70% and 80%. Can we then set our confidence threshold at 70% and not validate the responses further? Time to move to different countries to understand how Rekognition works and what the confidence level represents, and to avoid side effects.

Amazon, this is not the Stars and Stripes!

Let us cross the Florida Straits and use the flag of Cuba. Some similarities in colors and patterns, but it is a very different flag and country.

Rekognition and flag of Cuba

The confidence level for the label flag is still very high (99.4%) but surprisingly Rekognition scores 82.5% for the American flag. Is it a glitch or a side effect of similarities in shape and design? **Is the model not trained?**

Il Tricolore

As an Italian cloud architect, the natural choice for a completely different flag is il Tricolore: different colors, different patterns. But at 87.6%, the confidence for the label American flag is even higher than for Cuba.

Rekognition and flag of Italy

What is going on here? Let’s try a real picture, not just a graphic of the national flag.

Rekognition and flag of Italy

Rekognition is doing an amazing job detecting both flags in the photo, but it is not able to recognize them as Italian ones. It is labeling them as American ones.

The American flag or the Malaysian one?

We could compare many other national flags but the results would be very similar. Most are correctly labeled as flags, almost all the ones with distinct stripes, but they are also labeled as American flags with different levels of confidence.

No other national flag label is detected, neither an “Italian Flag”, nor a “Cuban Flag”. **Amazon Rekognition uses a hierarchical taxonomy of ancestor labels to categorize labels, but apparently the only child of “Flag” is “American Flag”.**

{ "Name": "American Flag", "Confidence": 87.66943359375, "Instances": [], "Parents": [ { "Name": "Flag" }, { "Name": "Symbol" } ] }
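The "Parents" field makes it possible to check which child labels of a given label actually show up in a set of responses. This is a minimal sketch over the DetectLabels response shape; the function name `children_of` is hypothetical, and the top-level "Flag" entry in the sample is an assumption added for illustration.

```python
def children_of(responses, parent):
    """Collect every label that lists `parent` among its Parents."""
    children = set()
    for response in responses:
        for item in response.get("Labels", []):
            if any(p["Name"] == parent for p in item.get("Parents", [])):
                children.add(item["Name"])
    return children


response = {
    "Labels": [
        # Hypothetical entry for the parent label itself
        {"Name": "Flag", "Confidence": 99.4, "Instances": [],
         "Parents": [{"Name": "Symbol"}]},
        # The label shown above
        {"Name": "American Flag", "Confidence": 87.66943359375, "Instances": [],
         "Parents": [{"Name": "Flag"}, {"Name": "Symbol"}]},
    ]
}

print(children_of([response], "Flag"))  # {'American Flag'}
```

Running the same function over the responses for every flag in the dataset is one way to verify whether any other child of “Flag” is ever returned.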

The more closely a flag resembles the Stars and Stripes, the higher the confidence level: for the flag of Malaysia (92.4%), the value is very similar to the one of the United States (92.5%). Setting an arbitrarily high confidence level might help reduce failures, but it is not the strongest safety net for flags. We have a surprisingly similar result if we take a Malaysian flag from Unsplash, with an 83.8% confidence for the American flag that is even higher than the one we saw for the real flags of the United States, which scored between 70% and 80%.
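To see why a threshold alone does not help, we can replay the “American Flag” confidence values reported so far in this article against a few candidate thresholds. The sketch below is only an illustration on these eight samples; the function name `evaluate` and the descriptive names for each image are hypothetical.

```python
# (image, "American Flag" confidence, is it really a US flag?) as reported in this article
observations = [
    ("United States (graphic)", 92.5, True),
    ("United States (photo 1)", 76.2, True),
    ("United States (photo 2)", 79.5, True),
    ("United States (photo 3)", 70.1, True),
    ("Cuba (graphic)", 82.5, False),
    ("Italy (graphic)", 87.6, False),
    ("Malaysia (graphic)", 92.4, False),
    ("Malaysia (photo)", 83.8, False),
]


def evaluate(observations, threshold):
    """Return (true positives, false positives) for confidence >= threshold."""
    accepted = [is_us for _, confidence, is_us in observations if confidence >= threshold]
    return sum(accepted), len(accepted) - sum(accepted)


for threshold in (70, 80, 90):
    print(threshold, evaluate(observations, threshold))
```

At 70 all four US flags are accepted together with four false positives; at 80 three of the four US flags are lost while every false positive remains; at 90 one real flag and the Malaysian graphic are still indistinguishable.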

Rekognition and flag of Malaysia

Not a flag

Mapping any flag to the one of the United States is only part of the issue. Other images might instead raise questionable labels. For example, the _Bandeira de Portugal_ might be a positive result when your users search for dynamite, weapon, bomb or weaponry, all labels with a high 88.3% confidence level. Flag is not in the top labels, as the PNG file is not recognised as a flag.
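The search scenario can be made concrete with a tiny label index. This is a minimal sketch, assuming a search feature that maps each label to the images carrying it above a threshold; the function name `build_index`, the file names and the second image are hypothetical, while the labels and the 88.3% value for the Portuguese flag are the ones reported above.

```python
from collections import defaultdict


def build_index(images, min_confidence):
    """Map each lower-cased label to the images at or above the threshold."""
    index = defaultdict(list)
    for name, labels in images.items():
        for label, confidence in labels.items():
            if confidence >= min_confidence:
                index[label.lower()].append(name)
    return index


images = {
    # Labels reported above for the PNG of the Portuguese flag
    "portugal.png": {"Dynamite": 88.3, "Weapon": 88.3, "Bomb": 88.3, "Weaponry": 88.3},
    # Hypothetical second image, for contrast
    "picnic.jpg": {"Food": 95.0},
}

index = build_index(images, min_confidence=80)
print(index["dynamite"])  # ['portugal.png']
print(index["flag"])      # []
```

With an 80% threshold, a search for "dynamite" returns the flag of Portugal while a search for "flag" returns nothing.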

Rekognition and flag of Portugal

The Portuguese flag performs better in a real scenario, where it is once again labeled as flag, with a confidence level of 99.9%.

Rekognition and flag of Portugal

Anything new?

The above results have been consistent across my tests of Rekognition Image over the last two years, but I wanted to check if something changed recently. Given the ongoing war in Ukraine and the popularity of the Ukrainian flag in the last four months, I checked how Amazon Rekognition detects it.

Rekognition and flag of Ukraine

The results are very similar to the Portuguese ones. The flag itself is not recognized as such, with labels that are way off, topping with “Home Decor”. Once a real picture is used, it is labeled as a flag with a high confidence (above 99%) but again no match for the country.

Rekognition and flag of Ukraine

Any feedback from Amazon?

The first time I noticed the issue, I raised a ticket to AWS Support and the feedback was straightforward:

I was in touch with the Rekognition engineering team as well as the Rekognition product team and I have relayed your message over to them. They acknowledged that Rekognition is currently not trained to identify the flags.

That is fair, but I would then recommend removing the value “American Flag” as long as it is the only child of “Flag”: it gives little benefit, with mostly false positives in non-US scenarios.


What have we learned so far? Is it worth using Rekognition Image?

Rekognition Image does a good job of detecting flags in photos under very different conditions. However, it is not able to tell different flags apart and it incorrectly labels most of them as American flags.

**Accuracy is always relative.** Setting an artificial confidence level for the results of Amazon Rekognition is not enough. National flags are not the most important label and training scenario for machine learning but they are an example of the challenges when image detection is not handled properly.

**This is not an AWS problem, this is our problem as developers** integrating a managed service like Rekognition Image in our product or service. You will be the public face for your end users. If you need to integrate image recognition capabilities in your application, you have to manage the risks and challenges yourself.

If you need more reliable data, you need to take the next step in the image detection journey.

Next step?

It is now time to take the problem into our own hands and train a model to better recognize flags. How can we achieve that with Rekognition Custom Labels? How is Amazon Rekognition going to perform? We will discuss that in a separate article.

Thanks for making it this far! I am always looking for feedback to make it better, so please feel free to reach out to me via LinkedIn or email.


All the photos and images from the author, Unsplash or the region-flags repository.