The conference will run for 28 hours on 20-21 October 2020 and will cover topics on Open Source Databases and Applications using MySQL, PostgreSQL, MongoDB, and MariaDB. You can find the full agenda online and register for free.
Is serverless the future of relational databases? Looking forward to being part of DeveloperWeek Global: Cloud Conference, one of the world’s largest virtual software developer event series, and to discussing relational databases on the cloud. See you online at the end of September! More about my talk here.
AWS has recently made AWS Wavelength zones in San Francisco and Boston available to provide a subset of their computing services on Verizon datacenters. The new zones will allow developers to build applications that can benefit from the ultra-low latency of the mobile carriers.
HashiCorp, the company behind the software tool Terraform, introduces a platform to run their products on AWS, Azure, and GCP as managed services. This will extend their enterprise offer with a focus on multi-cloud environments.
In a series of three technical articles, AWS has recently introduced the new “Serverless LAMP stack”. But not everyone in the open-source community believes that the successor of the LAMP stack should be built on proprietary technologies from a single vendor, and alternative approaches have been suggested.
Amazon RDS Proxy is a new fully managed, highly available database proxy for MySQL and PostgreSQL databases running on Amazon RDS and Aurora. The service is tailored to serverless architectures and other applications that open and close database connections at a high rate.
With the sensitive topic of biased results for face recognition, Amazon Rekognition has been in the news a lot recently. Amazon announced a one-year moratorium on allowing law enforcement to use the facial recognition platform. IBM decided not to offer facial recognition technology anymore. Microsoft took a conservative approach too.
But Amazon Rekognition is not limited to face recognition, and the benefits and risks for image analysis are much broader. Object and scene detection is a powerful and useful technology to search and filter large image collections. But even with objects, there might be side effects.
The obvious benchmark is the flag of the United States of America. Amazon Rekognition labels it as a “Flag” with a 99.7% confidence and “American Flag” with a 92.5% confidence. A good and reliable result. But is it enough to trust the service for labelling flags worldwide?
What about Cuba?
Let us cross the Florida Straits and use the flag of Cuba as a comparison. Some similarities in colors and patterns, but still a very different flag. Not only politically.
The confidence level for the label “Flag” is still very high (99.4%) but surprisingly Rekognition reports an 82.5% confidence that this is the American flag. A glitch or a side effect of similarities in shape and design?
As an Italian, the natural choice for a different flag is il Tricolore: different colors, different patterns. But the confidence for the label “American Flag” is even higher than for Cuba: 87.6%.
You can compare many other national flags but the results will be very similar. All are correctly labelled as “Flag” but they are also labelled as “American Flag” with various levels of confidence.
No other national flag label is detected, neither an “Italian Flag”, nor a “Cuban Flag”. Amazon Rekognition uses a hierarchical taxonomy of ancestor labels to categorize labels, but apparently the only child of “Flag” is “American Flag”. This likely reflects the main market for the product and the initial dataset for training.
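To reproduce these checks, a minimal sketch with the AWS CLI could look like the following. It assumes the image has already been uploaded to S3 and that credentials are configured; the function name flag_labels and the bucket and object names are hypothetical.

```shell
# flag_labels BUCKET KEY [MIN_CONFIDENCE]
# Prints one line per detected label: name and confidence.
# Assumption: the image lives in S3; bucket and key are supplied by the caller.
flag_labels() {
  aws rekognition detect-labels \
    --image "{\"S3Object\":{\"Bucket\":\"$1\",\"Name\":\"$2\"}}" \
    --min-confidence "${3:-50}" \
    --query 'Labels[].[Name, Confidence]' \
    --output text
}

# Example (hypothetical bucket and object):
#   flag_labels my-flag-bucket cuba.png 80
```

Running it against a few different flags should be enough to see whether any label other than “Flag” and “American Flag” comes back.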
The more closely a flag resembles the Stars and Stripes, the higher the confidence level: the score for the flag of Malaysia (92.4%) is almost identical to the one for the United States (92.5%). This shows that setting an arbitrarily high confidence threshold might help but will not be safe in every scenario.
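The effect of a threshold can be illustrated with the numbers reported above. The sketch below is plain shell and awk, with no call to AWS; the helper name accepted_at is hypothetical. It lists which flags would be accepted as “American Flag” at a given minimum confidence.

```shell
# "American Flag" confidence per flag, as reported in the text above.
scores='United States 92.5
Malaysia 92.4
Italy 87.6
Cuba 82.5'

# accepted_at THRESHOLD
# Prints the flags whose "American Flag" confidence is at or above THRESHOLD.
accepted_at() {
  printf '%s\n' "$scores" |
    awk -v t="$1" '{ c = $NF; $NF = ""; sub(/ $/, ""); if (c + 0 >= t + 0) print }'
}

# Example:
#   accepted_at 90
```

At a threshold of 90 both the United States and Malaysia pass, and only a threshold between 92.4 and 92.5 separates them, which is hardly a value anyone would pick in advance.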
Any feedback from Amazon?
Last year I raised a ticket to AWS Support and the feedback was straightforward and honest:
I was in touch with the Rekognition engineering team as well as the Rekognition product team and I have relayed your message over to them. They acknowledged that Rekognition is currently not trained to identify the flags.
This is not an Amazon problem, this is your problem as a developer relying on an external service. If you integrate image recognition capabilities in your application, you have to manage the risks and challenges yourself. You cannot bury your head in the sand and hope for the best.
Accuracy is always relative
Setting an artificial confidence level number for the results of Amazon Rekognition is not enough. National flags are not the most important challenge for AI but they are an example of the risks when image detection is not handled properly. And mislabelled flags could even escalate tensions in conflict zones, disputed territories, or partially recognized states.
For further posts and talks on the challenges of geolocations, check saorico.com
At re:Invent in Las Vegas in December 2019, AWS announced the public preview of RDS Proxy, a fully managed database proxy that sits between your application and RDS. The new service offers to “share established database connections, improving database efficiency and application scalability”.
One of the key features was the ability to increase application availability, significantly reducing failover times on a Multi-AZ RDS instance. Results were indeed impressive.
But a key limitation was that there was no way to change the instance size or class once the proxy had been created. That meant it could not be used to reduce downtime during a vertical scaling of the cluster, and it made the deployment less elastic.
Time for a second look?
Last week AWS finally announced the GA of RDS Proxy and I thought it was a good time to take a second look at the service. Any further improvements in the failover? Can you now change the instance size once the proxy has been created?
One of the first, and few, values you have to choose when you set up an Amazon RDS Proxy is the idle client connection timeout. It is already hard to figure out the optimal value in an ideal scenario. But a user interface that suggests a default of 30 minutes next to a label that states “Max: 5 minutes” makes it more difficult, all the more so as the drop-down list lets you set any value up to 1 hour.
Let us play!
Once again I created a test-rds and a test-proxy, and I decided to perform the very same basic tests I ran last December. I started two while loops in Bash, relying on the MySQL client, each one asking the database for the current date and time every 2 seconds:
$ while true; do mysql -s -N -h test-proxy.proxy-***.eu-central-1.rds.amazonaws.com -u testuser -e "select now()"; sleep 2; done
$ while true; do mysql -s -N -h test-rds.***.eu-central-1.rds.amazonaws.com -u testuser -e "select now()"; sleep 2; done
The difference between the test-proxy and the test-rds is significant: it takes 132 seconds for the RDS endpoint to recover versus only 20 seconds for the proxy. Amazing difference and even better than what AWS promises in a more reliable and significant test.
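Recovery times like these can be computed from the loop output rather than read off by eye. The sketch below is one possible way to do it, not the method used for the numbers above; the helper names probe and downtime and the HOST variable are hypothetical.

```shell
# probe COMMAND [ARGS...]
# Runs the command once and prints "<epoch seconds> ok" or "<epoch seconds> fail".
probe() {
  if "$@" >/dev/null 2>&1; then
    printf '%s ok\n' "$(date +%s)"
  else
    printf '%s fail\n' "$(date +%s)"
  fi
}

# downtime
# Reads probe lines on stdin and prints the seconds between the first failure
# and the first successful probe after it.
downtime() {
  awk '$2 == "fail" && !start { start = $1 }
       $2 == "ok" && start   { print $1 - start; exit }'
}

# Example (HOST is a placeholder for the proxy or RDS endpoint):
#   while true; do
#     probe mysql -s -N -h "$HOST" -u testuser -e "select now()"; sleep 2
#   done | tee probe.log
#   downtime < probe.log
```

With a 2 second sleep between probes, the result is only accurate to within a couple of seconds plus the client connection timeout.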
But what happens when I trigger a change of the instance type?
While the numbers for the test-rds do not change significantly, the proxy is simply gone. Once the database cluster behind it changes, the proxy endpoint is still available but it does not connect to the database anymore. Changing the timeout does not help, and there is no simple way to recover.
ERROR 9501 (HY000) at line 1: Timed-out waiting to acquire database connection
ERROR 9501 (HY000) at line 1: Timed-out waiting to acquire database connection
ERROR 9501 (HY000) at line 1: Timed-out waiting to acquire database connection
ERROR 9501 (HY000) at line 1: Timed-out waiting to acquire database connection
ERROR 9501 (HY000) at line 1: Timed-out waiting to acquire database connection
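When the proxy answers but cannot reach the database, it may help to look at how the proxy itself sees its targets. The following is a hedged sketch, not something that was part of the original test: it assumes a recent AWS CLI whose describe-db-proxy-targets output includes the TargetHealth fields, and the function name proxy_target_health is hypothetical.

```shell
# proxy_target_health PROXY_NAME
# Prints one line per registered target: resource id, type, health state and reason.
# Assumption: the installed AWS CLI version returns TargetHealth for proxy targets.
proxy_target_health() {
  aws rds describe-db-proxy-targets \
    --db-proxy-name "$1" \
    --query 'Targets[].[RdsResourceId, Type, TargetHealth.State, TargetHealth.Reason]' \
    --output text
}

# Example:
#   proxy_target_health test-proxy
```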
Amazon RDS Proxy is a very interesting service, and it could be an essential component in many deployments where increasing application availability is critical. But I would have expected a few more improvements since the first preview. The lack of support for instance changes still makes it hard to integrate in many scenarios where RDS is currently used.