Amazon Rekognition, this is not the Stars and Stripes

With the sensitive topic of biased results for face recognition, Amazon Rekognition has been in the news a lot recently. Amazon announced a one-year moratorium on allowing law enforcement to use the facial recognition platform. IBM decided not to offer facial recognition technology anymore. Microsoft took a conservative approach too.

But Amazon Rekognition is not limited to face recognition, and the benefits and risks for image analysis are much broader. Object and scene detection is a powerful and useful technology to search and filter large image collections. But even with objects, there might be side effects.

Image recognition, flags and bias

While studying the implications of software for disputed territories and partially recognized countries, I came across a more trivial case of image detection: national flags.

As a dataset, I used the PNG images from two open source repositories, Google Internationalization and country-flags.

The Stars and Stripes

The obvious benchmark is the flag of the United States of America. Amazon Rekognition labels it as a “Flag” with a 99.7% confidence and “American Flag” with a 92.5% confidence. A good and reliable result. But is it enough to trust the service for labelling flags worldwide?
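A minimal sketch of how such a check could be scripted, assuming the boto3 Rekognition client and its detect_labels call. The helper name detect_label_confidences and the default threshold of 50 are my own choices for illustration, not something taken from the service documentation.

```python
def detect_label_confidences(client, image_bytes, min_confidence=50.0):
    """Return {label name: confidence} for one image.

    `client` is expected to behave like boto3.client("rekognition");
    passing it in keeps the helper easy to exercise without AWS credentials.
    """
    response = client.detect_labels(
        Image={"Bytes": image_bytes},
        MinConfidence=min_confidence,
    )
    return {label["Name"]: label["Confidence"] for label in response["Labels"]}
```

With real credentials, the client would be created with boto3.client("rekognition") and image_bytes read from one of the PNG files in binary mode.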

What about Cuba?

Let us cross the Florida Straits and use the flag of Cuba as a comparison. Some similarities in colors and patterns, but still a very different flag. Not only politically.

The confidence level for the label “Flag” is still very high (99.4%), but surprisingly Rekognition also labels the image as “American Flag” with 82.5% confidence. A glitch, or a side effect of similarities in shape and design?

Il Tricolore

As an Italian, my natural choice for a different flag is il Tricolore: different colors, different patterns. But the confidence for the label “American Flag” is even higher than for Cuba: 87.6%.

Malaysia

You can compare many other national flags, but the results will be very similar. All are correctly labelled as “Flag”, but they are also labelled as “American Flag” with various levels of confidence.

No other national flag label is detected, neither an “Italian Flag”, nor a “Cuban Flag”. Amazon Rekognition uses a hierarchical taxonomy of ancestor labels to categorize labels, but apparently the only child of “Flag” is “American Flag”. This likely reflects the main market for the product and the initial dataset for training.

{
  "Name": "American Flag",
  "Confidence": 87.66943359375,
  "Instances": [],
  "Parents": [
    { "Name": "Flag" },
    { "Name": "Symbol" }
  ]
}
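To check which labels sit below “Flag” in the taxonomy, the Parents list of each returned label can be scanned. This is only a sketch: the helper descendants_of is hypothetical, and the “Flag” entry in the sample data (including its parent) is an assumption added for illustration. Only the “American Flag” entry comes from the response above.

```python
def descendants_of(labels, ancestor_name):
    """Names of labels that list `ancestor_name` among their Parents."""
    return sorted(
        label["Name"]
        for label in labels
        if any(parent["Name"] == ancestor_name for parent in label.get("Parents", []))
    )


# Sample labels: the second entry is taken from the response for il Tricolore,
# the first entry is an assumed shape for the generic "Flag" label.
sample_labels = [
    {"Name": "Flag", "Confidence": 99.0, "Parents": [{"Name": "Symbol"}]},
    {
        "Name": "American Flag",
        "Confidence": 87.66943359375,
        "Instances": [],
        "Parents": [{"Name": "Flag"}, {"Name": "Symbol"}],
    },
]

flag_descendants = descendants_of(sample_labels, "Flag")
```

Running the same helper over the responses for a whole flag collection is a quick way to see whether any label other than “American Flag” ever shows up under “Flag”.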

The more closely a flag resembles the Stars and Stripes, the higher the confidence level: the score for the flag of Malaysia (92.4%) is almost identical to the one for the flag of the United States (92.5%). This demonstrates that setting an arbitrarily high confidence threshold might help but will not be safe in every scenario.
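The effect of a threshold can be checked with the “American Flag” confidences reported in this post. The helper false_positives is a hypothetical name and the thresholds are arbitrary examples.

```python
# "American Flag" confidence per flag, as reported in this post.
AMERICAN_FLAG_CONFIDENCE = {
    "United States": 92.5,
    "Malaysia": 92.4,
    "Italy": 87.6,
    "Cuba": 82.5,
}


def false_positives(scores, threshold, true_country="United States"):
    """Flags other than the real one that still pass the threshold."""
    return sorted(
        country
        for country, score in scores.items()
        if score >= threshold and country != true_country
    )


at_80 = false_positives(AMERICAN_FLAG_CONFIDENCE, 80.0)
at_90 = false_positives(AMERICAN_FLAG_CONFIDENCE, 90.0)
```

Raising the threshold from 80 to 90 removes Cuba and Italy but keeps Malaysia. Only a threshold inside the 0.1 point gap between 92.4 and 92.5 would separate Malaysia from the United States, and that is not a margin to rely on.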

Any feedback from Amazon?

Last year I raised a ticket to AWS Support and the feedback was straightforward and honest:

I was in touch with the Rekognition engineering team as well as the Rekognition product team and I have relayed your message over to them. They acknowledged that Rekognition is currently not trained to identify the flags.

This is not an Amazon problem, this is your problem as a developer relying on an external service. If you integrate image recognition capabilities in your application, you have to manage the risks and challenges yourself. You cannot bury your head in the sand and hope for the best.
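One way to manage that risk in your own code is to ignore the labels you have found to be unreliable and keep only the generic ones. A minimal sketch, assuming the labels have the same shape as in the Rekognition response. The names UNTRUSTED_LABELS and trusted_label_names are hypothetical, and which labels to distrust is a decision for each application.

```python
# Labels that testing has shown to be unreliable for this use case.
UNTRUSTED_LABELS = frozenset({"American Flag"})


def trusted_label_names(labels, untrusted=UNTRUSTED_LABELS):
    """Drop untrusted labels and keep everything else, such as the generic 'Flag'."""
    return sorted(label["Name"] for label in labels if label["Name"] not in untrusted)


# Confidences reported in this post for the flag of Cuba.
cuba_labels = [
    {"Name": "Flag", "Confidence": 99.4},
    {"Name": "American Flag", "Confidence": 82.5},
]

cuba_result = trusted_label_names(cuba_labels)
```

The image is still searchable as a flag, but the application no longer claims to know which flag it is.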

Accuracy is always relative

Setting an arbitrary confidence threshold for the results of Amazon Rekognition is not enough. National flags are not the most important challenge for AI, but they are an example of the risks when image detection is not handled properly. And mislabelled flags could even escalate tensions in conflict zones, disputed territories, or partially recognized states.

For further posts and talks on the challenges of geolocations, check saorico.com