Dice, Skylines and CloudWatch Anomaly Detection

I am a lazy cloud architect with a background in site reliability engineering. That’s why I immediately fell in love with the idea behind CloudWatch Anomaly Detection when it was announced almost three years ago.

What is anomaly detection?

Regardless of the algorithm used to determine the outliers, anomaly detection is the process of discovering values that differ considerably from the majority of the data and should raise suspicions and alarms. The availability of a managed service, based on machine learning, that alerts an SRE if something goes wrong is too good to be ignored. CloudWatch Anomaly Detection is that option, without integrating third-party tools or relying on more complex services like Amazon Lookout for Metrics.

Configuring CloudWatch Anomaly Detection

In a few seconds you can add an alarm that helps monitor even the simplest website, with pricing that is neither high nor complicated. What can go wrong with Anomaly Detection? Not much, as long as you do not consider it a catch-all alarm replacing every other one you have configured in CloudWatch.

While the expected values represent normal metric behavior, the threshold of Anomaly Detection is based on standard deviation, as the label in the console suggests: “Based on a standard deviation. Higher number means thicker band, lower number means thinner band”.

The only non-trivial step in the setup is deciding the threshold: what is a good number? A small one, with possibly many false alarms? A high one, with the chance of missing some outliers? A bigger challenge is to remember that the algorithm cannot know the constraints of your system or the logic behind your product. Let’s give it a try.

Monitoring coconut orders

Let’s assume you have a successful website where you sell coconuts and you want to monitor the number of completed purchases per minute. You have thousands of orders at peak time, a few hundred during the night, with some daily and weekly patterns. Lucky you, that is a lot of coconuts! How can you monitor the online shop? How do you adapt the alarms for seasonality and trend changes?

Without Anomaly Detection, you should have at least two static alarms in CloudWatch to catch the following cases:

  • the “Zero Orders” scenario: it likely indicates that something is broken in the shop. A simple static alarm, catching zero values for the shortest sensible period, will not raise many false positives.
  • the “Black Friday” scenario: it is much harder to define a safe upper boundary, but you can, for example, create an alarm at 130% of the maximum value you achieved in the previous month (a minimal CLI sketch of both alarms follows this list).
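
Both static alarms can be created with the AWS CLI. A minimal sketch follows, where the namespace, metric name, thresholds, and SNS topic are hypothetical placeholders for the coconut shop:

# "Zero Orders": alarm when the sum of completed orders drops below 1 in a 1-minute period.
# Treating missing data as breaching also catches the case where no data points arrive at all.
aws cloudwatch put-metric-alarm \
  --alarm-name "coconut-zero-orders" \
  --namespace "coconut-shop" --metric-name "completed-orders" \
  --statistic Sum --period 60 --evaluation-periods 1 \
  --comparison-operator LessThanThreshold --threshold 1 \
  --treat-missing-data breaching \
  --alarm-actions "arn:aws:sns:eu-west-1:123456789012:on-call"

# "Black Friday": alarm at 130% of last month's observed maximum (assumed here to be 2000 orders/minute).
aws cloudwatch put-metric-alarm \
  --alarm-name "coconut-order-spike" \
  --namespace "coconut-shop" --metric-name "completed-orders" \
  --statistic Sum --period 60 --evaluation-periods 3 \
  --comparison-operator GreaterThanThreshold --threshold 2600 \
  --alarm-actions "arn:aws:sns:eu-west-1:123456789012:on-call"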

Falling coconuts

Neither of these two static alarms helps if the orders fall by half during the day or if the pattern suddenly changes and you lose 30% of your daily orders. You still do not account for seasonality, but these static alarms are better than no monitoring.

Here comes CloudWatch Anomaly Detection: with a few clicks, you can configure an alarm and be notified when the pattern of the orders changes.
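
The same alarm can also be created from the CLI. A minimal sketch with the same hypothetical names, where the second argument of ANOMALY_DETECTION_BAND is the standard deviation threshold discussed earlier:

# Hypothetical anomaly detection alarm on the completed orders metric.
# ANOMALY_DETECTION_BAND(m1, 2) builds a band two standard deviations wide around the expected values.
aws cloudwatch put-metric-alarm \
  --alarm-name "coconut-orders-anomaly" \
  --comparison-operator LessThanLowerOrGreaterThanUpperThreshold \
  --evaluation-periods 3 \
  --threshold-metric-id ad1 \
  --metrics '[
    {"Id": "m1", "ReturnData": true, "MetricStat": {"Metric": {"Namespace": "coconut-shop", "MetricName": "completed-orders"}, "Period": 60, "Stat": "Sum"}},
    {"Id": "ad1", "Expression": "ANOMALY_DETECTION_BAND(m1, 2)", "Label": "expected orders", "ReturnData": true}
  ]'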

Can you simply configure the smart alarm, discard the static ones and trust the magic of machine learning? Let’s take a step back and look at one of the very first presentations of Anomaly Detection.

The example used to highlight the seasonality and the benefits of the new option shows a range band – regardless of how many standard deviations – with negative values. But the ConsumedWriteCapacityUnits metric cannot be negative. A subpar example?

Going below zero

The ConsumedWriteCapacityUnits one is not a corner case. Most AWS and custom metrics have only positive values. Randomly selecting some metrics in the dashboard:

  • you cannot have negative orders in the coconut (custom) metric
  • you cannot have negative IOPS on RDS
  • you cannot have a negative CPU or ACU for Aurora Serverless

Considering hundreds of metrics, there are only a few that can occasionally go below zero. But the gray band in Anomaly Detection often does.

If you set up a static zero alarm as previously discussed, just keep it: one based on Anomaly Detection might not react as quickly as a static one. The ML option can help find outliers, but it is not the fastest way to catch a broken system with no orders.

For example, during the quieter hours, a “zero orders” scenario would not immediately be an outlier.
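
If you want a single notification covering both cases, one option is a composite alarm. A minimal sketch, assuming the hypothetical alarms from the earlier examples:

# Hypothetical composite alarm: notify on-call if either the static zero-orders alarm
# or the anomaly detection alarm goes into ALARM state.
aws cloudwatch put-composite-alarm \
  --alarm-name "coconut-orders-monitoring" \
  --alarm-rule 'ALARM("coconut-zero-orders") OR ALARM("coconut-orders-anomaly")' \
  --alarm-actions "arn:aws:sns:eu-west-1:123456789012:on-call"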

Ideally there should be a flag in CloudWatch to enforce positive values. But only you know the pattern of your service, and a strength of CloudWatch Anomaly Detection is the simple setup. It just works.

Let’s do a simple test to show the difference between constrained values and an algorithm based on machine learning. Let’s roll a dice.

Rolling a dice

One dice, six faces, and numbers between 1 and 6. No pattern and no outliers. There are no 0s and no 7s, there are no values outside the fixed range when you roll a dice. But Anomaly Detection cannot know that.

How can we test it? Let’s roll a dice in CloudWatch with the AWS CLI and a one line bash script roll-a-dice:

aws cloudwatch put-metric-data --namespace "cloudiamo.com" --metric-name "dice-1m" --unit Count --value $(( $RANDOM % 6 + 1 ))

Adding the script to the crontab, we can have a new random value in CloudWatch every minute.

* * * * * /home/ubuntu/roll-a-dice

We now set up Anomaly Detection on the custom dice metric, wait a few days and see what the AWS algorithm thinks of the random pattern. How is it going to apply machine learning algorithms to the dice’s past data and create a model of the expected values?
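
Enabling the model on the custom metric can also be done from the CLI. A minimal sketch, assuming the dice-1m metric above and the Average statistic:

# Train an anomaly detection model on the custom dice metric.
aws cloudwatch put-anomaly-detector \
  --namespace "cloudiamo.com" \
  --metric-name "dice-1m" \
  --stat Average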

Anomaly Detection is doing a good job given the circumstances but a zero or a seven might not (immediately) trigger an alarm.

Rolling a dice is way too simple and it has no predictable patterns, but if you have hard boundaries in your values, you should have a separate static alarm for that. Relying only on Anomaly Detection is suboptimal. Let’s now challenge CloudWatch and the AWS algorithm with something more complicated, a skyline.

Drawing the NYC skyline

Last year I presented a session at re:Invent, drawing the NYC skyline with Aurora Serverless v2. A SQL script triggered the spikes in the CPU and the Aurora Capacity Unit (ACU) of the serverless database, drawing a basic skyline of New York City in CloudWatch.

Let’s run that SQL script multiple times, for days, for weeks. Is CloudWatch Anomaly Detection going to forecast the NYC skyline?

Reusing the same logic from re:Invent, we can run it on an Aurora Serverless v2 endpoint, adding a 30-minute sleep between executions and looping. This translates to a single bash command:

while true; do mysql -h nyc.cluster-cbnlqpz*****.eu-west-1.rds.amazonaws.com -u nyc < nyc.sql; sleep 1800; done;

Unfortunately, even after a couple of weeks, the range of Anomaly Detection is still not acceptable.

What is the problem here? A key sentence explains how the service works: “Anomaly detection algorithms account for the seasonality and trend changes of metrics. The seasonality changes could be hourly, daily, or weekly”.

Our loop has a fixed period, but it is not hourly, daily, or weekly: it is 30 minutes plus the execution time of the SQL script. The data points at 7:47 UTC and 8:47 UTC are unrelated. The data points at 7:47 UTC on different days have nothing in common; we do not have a standard, supported seasonality.

But is this really the problem? Let’s change the approach slightly and run the SQL script hourly. It is a single line in the crontab:

0 * * * * mysql -h nyc2.cluster-cbnlqpz*****.eu-west-1.rds.amazonaws.com -u nyc < nyc.sql

Does the new period work better with Anomaly Detection? Let’s wait a few days and see the new forecasted range.

After a couple of days the overlap is still not perfect and the baseline for the CPU is generous but there is now a clear pattern. The outliers are not too different from the ones we saw with the coconuts.

If we suddenly change the crontab entry from hourly to every two hours, we notice that Anomaly Detection was indeed forecasting an hourly pattern.

The seasonality of the data is a key element. A periodic pattern is not enough; an hourly, daily, or weekly one is required.

Conclusions

What did we learn? Is it worth using CloudWatch Anomaly Detection?

  • CloudWatch Anomaly Detection is easy to configure, almost free, and is a great addition to a monitoring setup. There are very few reasons not to use it.
  • You should add Anomaly Detection to your existing static alarms in CloudWatch, not simply replace them.
  • Make sure that your pattern is hourly, daily, or weekly.
  • There is much more you can do going forward: Amazon CloudWatch now supports anomaly detection on metric math expressions (a short CLI sketch follows this list).
  • Take a look at Amazon Lookout for Metrics if you need a more powerful tool and are planning to automatically detect anomalies in business and operational data. Consider CloudWatch Application Insights if you need automated setup of observability for enterprise applications.
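
As a sketch of the metric math point above, the anomaly detection band can wrap an expression instead of a single metric. The namespaces and metric names below are hypothetical:

# Hypothetical sketch: anomaly detection on a metric math expression (the sum of two custom metrics).
aws cloudwatch put-metric-alarm \
  --alarm-name "total-orders-anomaly" \
  --comparison-operator LessThanLowerOrGreaterThanUpperThreshold \
  --evaluation-periods 3 \
  --threshold-metric-id ad1 \
  --metrics '[
    {"Id": "m1", "ReturnData": false, "MetricStat": {"Metric": {"Namespace": "coconut-shop", "MetricName": "orders-eu"}, "Period": 60, "Stat": "Sum"}},
    {"Id": "m2", "ReturnData": false, "MetricStat": {"Metric": {"Namespace": "coconut-shop", "MetricName": "orders-us"}, "Period": 60, "Stat": "Sum"}},
    {"Id": "e1", "Expression": "m1 + m2", "Label": "total orders", "ReturnData": true},
    {"Id": "ad1", "Expression": "ANOMALY_DETECTION_BAND(e1, 2)", "ReturnData": true}
  ]'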

Thanks for making it this far! I am always looking for feedback to make it better, so please feel free to reach out to me via LinkedIn or email.

Credits

Coconut photo by Tijana Drndarski and dice photo by Riho Kroll. Re:Invent photo by Goran Opacic. All other photos and screenshots by the author. The AWS bill for running these tests was approximately 120 USD, mainly ACU for Aurora Serverless. Thanks AWS for the credits. Thanks to Stefano Nichele for some useful discussions about the benefits and challenges of CloudWatch Anomaly Detection.

Serverless Architecture Con – Berlin 2022

This autumn I will be back at the Serverless Architecture Con, this time in Berlin, to talk about serverless databases. The title of my session? The Future of Relational Databases on the Cloud.

Abstract

The major cloud providers offer different options to run a relational database on the cloud. A recent approach is to rely on so-called serverless databases that offer both traditional TCP connections and HTTP API access. In a short journey through databases on the cloud, we will compare different approaches and services and explore the main benefits and limitations of a serverless RDBMS versus a more traditional managed database.

InfoQ – June 2022

From the PowerShell Custom Runtime for Amazon Lambda to MongoDB Atlas Serverless, from SynLapse, a critical Synapse Analytics vulnerability in Azure, to AWS IoT ExpressLink: a recap of my articles for InfoQ in June.

AWS Releases IoT ExpressLink: Cloud-Connectivity Software for Hardware Modules

Amazon recently announced the general availability of AWS IoT ExpressLink. The cloud-connectivity software supports wireless hardware modules to build IoT products that connect with cloud services.

SynLapse: Orca Security Publishes Details for Critical Azure Synapse Vulnerability

In a recent article, Orca Security describes the technical details of SynLapse, a critical Synapse Analytics vulnerability in Azure that allowed attackers to bypass tenant separation. The issue is now addressed but the timing and the disclosure process raised concerns in the community.

Cockroach Labs 2022 Cloud Report: AMD Outperforms Intel

Cockroach Labs recently released their annual cloud report which evaluates the performance of AWS, Microsoft Azure and Google Cloud for common OLTP workloads. Unlike past editions, this year’s report does not indicate a best overall provider, but concludes that AMD instances outperform Intel ones. ARM instances were not covered in the tests.

MongoDB Atlas Serverless Instances and Data API Now Generally Available

At the recent MongoDB World 2022 conference, MongoDB announced that serverless instances for Atlas and Data API are now generally available. The new managed serverless option introduces tiered pricing, with automatic discounts on daily usage.

AWS Introduces IP-Based Routing on Route 53

AWS recently announced support for IP-based routing on Amazon Route 53. The new option of the DNS service allows customers to route resources of a domain based on the client subnet to optimize network transit costs and performance.

AWS DataSync Supports Moving Data between AWS, Google Cloud and Azure

Amazon recently announced that AWS DataSync now supports Google Cloud Storage and Azure Files storage as storage locations. The two new options of the data service help move data both into and out of AWS, but data transfer fees might still be a limitation.

AWS Introduces PowerShell Custom Runtime for Lambda

AWS recently announced a new PowerShell custom runtime for AWS Lambda to run Lambda functions written in PowerShell. With the new runtime developers can write native PowerShell code in Lambda without having to compile it, simplifying deployment and testing.

More news? A recap of my articles for InfoQ in May.

Cockroach Labs 2022 Cloud Report: AMD Better than Intel?

Cockroach Labs recently released its annual cloud report, which evaluates the performance of AWS, Microsoft Azure, and Google Cloud for OLTP workloads. Unlike in 2021, this year’s report does not name a best overall cloud provider, but notes that the most recent AMD instances outperform the Intel ones. ARM instances were not tested.

According to the report, all three major cloud providers offer similar options at competitive prices. Running over 3000 tests on 56 different instance types and 107 different configurations, the best price-performance came from AMD instances with Milan processors. The exclusion of ARM and other recent instances from the tests leaves open questions for the use cases where, for example, AWS Graviton instances might perform better. Keith McClellan, director of partner solutions engineering at Cockroach Labs and author of the report, acknowledges the issue:

ARM will be back in the 2023 Cloud Report – unfortunately we did not yet have ARM binaries of CockroachDB for this year’s report.

Not all analyses agree on the benefits of AMD instances. Percona’s benchmark “Economical Comparison of AWS CPUs for MySQL (ARM vs Intel vs AMD)” concludes that Graviton is cheaper in most cases and that Intel usually performs better than AMD, at least on AWS.

According to Cockroach Labs, smaller instances proportionally deliver better results than larger ones: for the OLTP and CPU tests, the report shows a performance-per-vCPU advantage on smaller instances, regardless of the CPU platform, the cloud, or the instance type.

Processors and CPU benchmarks were not the authors’ only focus. The report highlights the importance of storage and data transfer costs, which have a significant impact on the total cost of a deployment:

Even for relatively small amounts of storage, the total cost of a workload is influenced far more by the cost of storage than by the cost of the instance.

With rare exceptions, the benchmark shows that it is not worth choosing and paying for high-performance storage such as Provisioned IOPS. For the same processor, cloud providers offer different instance classes with different vCPU/RAM ratios. In the report, the authors write:

Our tests suggest that, while it is possible to save money by choosing instances with a lower vCPU/RAM ratio, you are likely to see better performance with instances that have more memory available. (…) In our tests, we found the ideal vCPU:RAM ratio to be 1:4.

Compared to the previous edition, the 2022 report adds tests for instance types of different sizes, cross-region latency tests, and data persistence tests with fsync.

Access to the full report is free but requires registration.

Want to read more AWS news?

AWS DataSync Supports Moving Data between AWS, Google Cloud and Azure

Amazon recently announced that AWS DataSync supports Google Cloud Storage and Azure Files storage. The two new options of the service help move data both into and out of AWS, but data transfer costs remain significant.

DataSync can copy and synchronize data from different sources, supporting multi-cloud projects or specific data protection requirements. Relying on a proprietary protocol, DataSync runs and verifies one-off and periodic data transfers and scales elastically with the load. The service supports include/exclude filters, bandwidth throttling controls, and automatic recovery from temporary network issues. Danilo Poccia, chief evangelist EMEA at AWS, explains:

In this way, you can simplify your data processing or storage consolidation tasks. This also helps if you need to import, share, and exchange data with customers, suppliers, or partners who use Google Cloud Storage or Microsoft Azure Files. DataSync provides end-to-end security, including encryption and integrity validation of the data.

Google Cloud Storage and Azure Files storage are the first sources from other cloud providers, but they are not the only options supported by AWS DataSync: the service can synchronize data with NFS, SMB, Hadoop Distributed File System (HDFS), and several AWS services such as Amazon S3 or Amazon FSx.

While Danilo Poccia explains how to transfer data from Google Cloud Storage to Amazon S3, Rodney Underkoffler and Aidan Keane, senior specialist solutions architects at AWS, demonstrate how to move data from SMB shares on Azure Files.

AWS DataSync can help migrate data between the major cloud providers, but data transfer costs can be a significant obstacle. To mitigate the impact and reduce costs, AWS recommends installing the DataSync agent in the source data environment to take advantage of in-transit data compression.

There are no specific charges for the new DataSync options, but when moving data into AWS you are subject to egress costs from Google Cloud and Microsoft Azure. Conversely, when moving data out of AWS, you are charged for data transfer from EC2 to the internet. The copy speed of AWS DataSync depends on the amount of data and on network conditions.

Want to read more AWS news?

AWS Introduces IP-Based Routing on Route 53

AWS recently introduced support for IP-based routing on Amazon Route 53. The new option of the DNS service allows customers to route requests based on the client subnet to optimize network transit costs and latency.

While geolocation-based routing is designed to route traffic according to location, it still relies on centralized data that Amazon Route 53 collects and keeps up to date. IP-based routing instead makes it possible to handle requests based on specific knowledge of your customers and networks. For example, you can handle the users coming from a specific ISP by sending all their requests to a dedicated endpoint.

Routing or juggling?
Photo by Rock Staar on Unsplash

Scott Morrison, Senior Specialist Solutions Architect at AWS, and Suresh Samuel, Senior Technical Account Manager at AWS, explain:

With IP-based routing, you can improve your DNS routing by leveraging your knowledge of your network, applications, and clients to make the best routing decisions for your end users. IP-based routing offers granular control to optimize performance or reduce network costs.

To implement IP-based routing for records on Route 53, you need to create one or more CIDR collections, mapping locations to CIDR blocks, which can then be used in the definition of the DNS records. Morrison and Samuel clarify how IP-based routing determines the IP address of the request:

When available, Route 53 will use the EDNS Client Subnet (ECS) value. Otherwise, it will use the IP of the resolver (…) If the ECS value in the DNS query matches one of the subnets associated with the location, Route 53 answers the DNS query with the corresponding value.

IP-based routing and geolocation are not the only routing options available on Route 53: traffic can be managed through several failover and routing models, for example latency-based routing, geoproximity routing, or weighted routing.

If a query comes from an IPv4 or IPv6 address that is not defined, Route 53 will answer with the default value. With prices starting at 0.80 USD per million queries, IP-based routing is more expensive than geolocation and geoproximity routing, which start at 0.70 USD per million queries. The new Route 53 feature is not supported for private hosted zones.

Want to read more AWS news?

AWS Introduces PowerShell Custom Runtime for Lambda

Around the World with Amazon Rekognition Image 

This is the first article of a two-part series playing with Amazon Rekognition and flags from around the world. Today we will focus on testing the default behavior of Rekognition Image, in the second part we will use Rekognition Custom Labels to build a custom machine learning model and detect national flags.

Rekognition Image

Object and scene detection is a key feature of Rekognition Image and a powerful and useful technology to search and filter large image collections. It is possible to build applications that search and organize images, assigning labels based on their visual content. According to the documentation:

The DetectLabels API lets you automatically identify thousands of objects, scenes, and concepts and returns a confidence score for each label. DetectLabels uses a default confidence threshold of 50.

Rekognition supports thousands of labels belonging to common categories. Will it recognize a flag? Can we use the service to map and search national flags? The AWS Management Console offers a very good “try demo” option that works for any .jpeg or .png image no larger than 5MB. Let’s give it a try, uploading a photo I took at the Haus der Kulturen der Welt in Berlin last week.
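
The same test can also run outside the console with the AWS CLI. A minimal sketch, assuming the photo has been uploaded to a hypothetical S3 bucket:

# Hypothetical bucket and object name; --min-confidence 50 matches the default threshold from the documentation.
aws rekognition detect-labels \
  --image '{"S3Object": {"Bucket": "my-test-bucket", "Name": "berlin-flag.jpg"}}' \
  --max-labels 10 \
  --min-confidence 50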

A random flag In Berlin

Good news! The label “Flag” is definitely supported by Rekognition Image and the confidence is very high, 99.6%.

"Labels": [ { "Name": "Flag",
            "Confidence": 99.57104492187

We can move to the next step, testing different flags.

National flags

While learning the implications of software in disputed territories and partially recognized countries, I encountered a simple case for image detection: national flags. How can we test them with Rekognition Image? First of all we need some images, a reliable dataset we can use to test the behavior of the managed service.

As a dataset of regional and national flags, we will use the ones from the open source repository region-flags, a collection of flags for geographic region and subregion codes maintained by Google.

Flags from around the world

Graphic images of flags are not the only option, or even the main testing scenario: there are also photos with flags. For real pictures where a flag is present, we will rely on Unsplash, a stock photography sharing website with a liberal license.

We might automatically test the entire region-flags dataset later using the SDK, but for now we can start manually with a few selected countries and the management console.
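
For reference, here is a sketch of what that automated pass could look like, using the CLI instead of the SDK and assuming the PNG folder of the repository plus a hypothetical S3 bucket:

# Upload the flag images and ask Rekognition for the top labels of each one.
aws s3 sync ./region-flags/png/ s3://my-test-bucket/region-flags/

for flag in ./region-flags/png/*.png; do
  name=$(basename "$flag")
  echo "=== $name ==="
  aws rekognition detect-labels \
    --image "{\"S3Object\": {\"Bucket\": \"my-test-bucket\", \"Name\": \"region-flags/$name\"}}" \
    --max-labels 5 --min-confidence 50 \
    --query 'Labels[].[Name,Confidence]' --output text
done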

The Stars and Stripes

The obvious benchmark is the flag of the United States of America. It should be an easy one to detect, given that the US is likely the main market for the product and the source of the initial training dataset.

Flag of the United States

Amazon Rekognition labels it as a “Flag” with a 99.7% confidence and “American Flag” with a 92.5% confidence (click on the images to see the data from the AWS console).

Rekognition and flag of the USA

A good and reliable result. But is it enough to trust the service for labeling flags worldwide? How does Amazon Rekognition perform when we analyze real pictures including the Stars and Stripes?

Rekognition and flag of the USA

Let’s add a second photo from Unsplash:

Rekognition and flag of the USA

They have a confidence of 76.2% and 79.5% for the American Flag. The value drops to 70.1% if we instead choose a flag in a less common position.

Rekognition and flag of the USA

Not ideal, but the confidence is still pretty good: all three images score between 70% and 80%. Can we then set our confidence threshold at 70% and not validate the responses further? Time to move to different countries to understand how Rekognition works, what the confidence level represents, and how to avoid side effects.

Amazon, this is not the Stars and Stripes!

Let us cross the Florida Straits and use the flag of Cuba. Some similarities in colors and patterns, but it is a very different flag and country.

Rekognition and flag of Cuba

The confidence level for the label “Flag” is still very high (99.4%), but surprisingly Rekognition scores 82.5% for the American flag. Is it a glitch or a side effect of similarities in shape and design? Is the model not trained?

Il Tricolore

As an Italian cloud architect, the natural choice for a completely different flag is il Tricolore: different colors, different patterns. But at 87.6%, the confidence for the label American flag is even higher than for Cuba.

Rekognition and flag of Italy

What is going on here? Let’s try a real picture, not just a graphic of the national flag.

Rekognition and flag of Italy

Rekognition is doing an amazing job detecting both flags in the photo, but it is not able to recognize them as Italian ones. It is labeling them as American ones.

The American flag or the Malaysian one?

We could compare many other national flags, but the results would be very similar. Most are correctly labeled as flags, but almost all the ones with distinct stripes are also labeled as American flags, with different levels of confidence.

No other national flag label is detected, neither an “Italian Flag” nor a “Cuban Flag”. Amazon Rekognition uses a hierarchical taxonomy of ancestor labels to categorize labels, but apparently the only child of “Flag” is “American Flag”.

{ "Name": "American Flag", "Confidence": 87.66943359375, "Instances": [], "Parents": [ { "Name": "Flag" }, { "Name": "Symbol" } ] }

The closer a flag resembles the Stars and Stripes, the higher the confidence level: for the flag of Malaysia (92.4%), the value is very similar to the one of the United States (92.5%). Setting an arbitrarily high confidence level might help reduce failures, but it is not the strongest safety net for flags. We get a surprisingly similar result if we take a Malaysian flag from Unsplash, with an 83.8% confidence for the American flag, even higher than the real flags of the United States that scored between 70% and 80%.

Rekognition and flag of Malaysia

Not a flag

Mapping any flag to the American one is only part of the issue. Other images might instead raise questionable labels. For example, the Bandeira de Portugal might be a positive result when your users search for dynamite, weapon, bomb, or weaponry, all labels with a high 88.3% confidence level. Flag is not among the top labels, as the PNG file is not recognized as a flag.

Rekognition and flag of Portugal

The Portuguese flag performs better in a real scenario, where it is once again labeled as a flag, with a confidence level of 99.9%.

Rekognition and flag of Portugal

Anything new?

The above results have been consistent while testing Rekognition Image over the last two years, but I wanted to check if something had changed recently. Given the ongoing war in Ukraine and the popularity of the Ukrainian flag in the last four months, I checked how Amazon Rekognition detects it.

Rekognition and flag of Ukraine

The results are very similar to the Portuguese ones. The flag itself is not recognized as such, with labels that are way off, topped by “Home Decor”. Once a real picture is used, it is labeled as a flag with high confidence (above 99%), but again with no match for the country.

Rekognition and flag of Ukraine

Any feedback from Amazon?

The first time I noticed the issue, I raised a ticket with AWS Support and the feedback was straightforward:

I was in touch with the Rekognition engineering team as well as the Rekognition product team and I have relayed your message over to them. They acknowledged that Rekognition is currently not trained to identify the flags.

That is fair, but I would then recommend removing the label “American Flag” as long as it is the only child of “Flag”: it gives little benefit and mostly false positives in non-US scenarios.

Conclusions

What have we learned so far? Is it worth using Rekognition Image?

Rekognition Image does a good job detecting flags in photos under very different conditions. It is not able, however, to tell national flags apart, and it incorrectly labels most of them as American flags.

Accuracy is always relative. Setting an artificial confidence threshold on the results of Amazon Rekognition is not enough. National flags are not the most important label or training scenario for machine learning, but they are an example of the challenges that arise when image detection is not handled properly.

This is not an AWS problem, this is our problem as developers integrating a managed service like Rekognition Image into our product or service. You are the public face for your end users. If you need to integrate image recognition capabilities into your application, you have to manage the risks and challenges yourself.

If you need more reliable data, you need to take the next step in the image detection journey.

Next step?

It is now time to take the problem into our own hands and train a model to better recognize flags. How can we achieve that with Rekognition Custom Labels? How is Amazon Rekognition going to perform? We will discuss that in a separate article.

Thanks for making it this far! I am always looking for feedback to make it better, so please feel free to reach out to me via LinkedIn or email.

Credits

All the photos and images are from the author, Unsplash, or the region-flags repository.