InfoQ – July 2020

A recap of the news articles I wrote for InfoQ in July 2020.

The AWS Serverless LAMP Stack: the Future of PHP or Vendor Lock-in?

In a series of three technical articles, AWS has recently introduced the new “Serverless LAMP stack”. But not everyone in the open-source community believes that the successor of the LAMP stack should be a set of proprietary technologies from a single vendor, and alternative approaches have been suggested.

AWS Announces General Availability of Amazon RDS Proxy

Amazon RDS Proxy is a new fully managed, highly available database proxy for MySQL and PostgreSQL databases running on Amazon RDS and Aurora. The service is tailored to serverless architectures and other applications that open and close database connections at a high rate.

Amazon Rekognition, this is not the Stars and Stripes

Amazon Rekognition has been in the news a lot recently due to the sensitive topic of biased results in face recognition. Amazon announced a one-year moratorium on allowing law enforcement to use the facial recognition platform. IBM decided not to offer facial recognition technology anymore. Microsoft took a conservative approach too.

But Amazon Rekognition is not limited to face recognition, and the benefits and risks for image analysis are much broader. Object and scene detection is a powerful and useful technology to search and filter large image collections. But even with objects, there might be side effects.

Image recognition, flags and bias

While researching the implications of software in disputed territories and partially recognized countries, I encountered a more trivial case for image detection: national flags.

As a dataset, I used the PNG images from two open-source repositories, Google Internationalization and country-flags.
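
Retrieving the labels is a single API call; a minimal sketch with the AWS CLI, assuming the flag images have been uploaded to an S3 bucket of yours (bucket name and object key are placeholders):

$ aws rekognition detect-labels --image '{"S3Object":{"Bucket":"your-bucket","Name":"flags/us.png"}}' --min-confidence 50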

The Stars and Stripes

The obvious benchmark is the flag of the United States of America. Amazon Rekognition labels it as a “Flag” with a 99.7% confidence and “American Flag” with a 92.5% confidence. A good and reliable result. But is it enough to trust the service for labelling flags worldwide?

What about Cuba?

Let us cross the Florida Straits and use the flag of Cuba as a comparison. Some similarities in colors and patterns, but still a very different flag. Not only politically.

The confidence level for the label “Flag” is still very high (99.4%), but surprisingly Rekognition has an 82.5% confidence that it is the American flag. A glitch, or a side effect of similarities in shape and design?

Il Tricolore

As an Italian, the natural choice for a different flag is il Tricolore: different colors, different patterns. But the confidence for the label “American flag” is even higher than for Cuba: 87.6%.

Malaysia

You can compare many other national flags, but the results will be very similar. All are correctly labelled as “Flag”, but they are also labelled as “American Flag” with various levels of confidence.

No other national flag label is detected, neither an “Italian Flag” nor a “Cuban Flag”. Amazon Rekognition uses a hierarchical taxonomy of ancestor labels to categorize them, but apparently the only child of “Flag” is “American Flag”. This likely reflects the main market for the product and the initial dataset used for training.

{ "Name": "American Flag", "Confidence": 87.66943359375, "Instances": [], "Parents": [ { "Name": "Flag" }, { "Name": "Symbol" } ] }

The more closely a flag resembles the Stars and Stripes, the higher the confidence level: the confidence for the flag of Malaysia (92.4%) is almost identical to the one for the United States (92.5%). A demonstration that setting an arbitrarily high confidence threshold might help but will not be safe in every scenario.

Any feedback from Amazon?

Last year I raised a ticket with AWS Support and the feedback was straightforward and honest:

I was in touch with the Rekognition engineering team as well as the Rekognition product team and I have relayed your message over to them. They acknowledged that Rekognition is currently not trained to identify the flags.

This is not an Amazon problem; it is your problem as a developer relying on an external service. If you integrate image recognition capabilities into your application, you have to manage the risks and challenges yourself. You cannot bury your head in the sand and hope for the best.

Accuracy is always relative

Setting an arbitrary confidence threshold for the results of Amazon Rekognition is not enough. National flags are not the most important challenge for AI, but they are an example of the risks when image detection is not handled properly. And mislabelled flags could even escalate tensions in conflict zones, disputed territories, or partially recognized states.

For further posts and talks on the challenges of geolocation, check saorico.com.

A Second Look at Amazon RDS Proxy

At re:Invent in Las Vegas in December 2019, AWS announced the public preview of RDS Proxy, a fully managed database proxy that sits between your application and RDS. The new service offers to “share established database connections, improving database efficiency and application scalability”.

Does RDS Proxy make MySQL more elastic?

A first look

In January I shared some thoughts and first results at the AWS User Group Meetup in Berlin and I wrote a post for the Percona Community Blog: A First Look at Amazon RDS Proxy.

One of the key features was the ability to increase application availability by significantly reducing failover times on a Multi-AZ RDS instance. The results were indeed impressive.

But a key limitation was that there was no way to change the instance size or class once the proxy had been created. That meant it could not be used to reduce downtime during a vertical scaling of the cluster, and it made the deployment less elastic.

Time for a second look?

Last week AWS finally announced the GA of RDS Proxy, and I thought it was a good time to take a second look at the service. Any further improvements in the failover? Can you now change the instance size once the proxy has been created?

Weird defaults?

One of the first and few values you have to choose when you set up an Amazon RDS Proxy is the idle client connection timeout. It is already hard to figure out the optimal value in an ideal scenario, but a user interface that suggests a default of 30 minutes next to a label stating “Max: 5 minutes” makes it even harder. On top of that, the drop-down list lets you set any value up to 1 hour.

5 or 30 minutes?
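
If you prefer not to rely on the confusing user interface, the timeout can also be set explicitly from the command line; a minimal sketch with the AWS CLI, assuming a proxy named test-proxy already exists (the value is expressed in seconds):

$ aws rds modify-db-proxy --db-proxy-name test-proxy --idle-client-timeout 1800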

Let us play!

I created again a test-rds instance and a test-proxy, and I decided to perform the very same basic tests I did last December. I started two while loops in Bash, relying on the MySQL client, each one asking the database for the current date and time every 2 seconds:

$ while true; do mysql -s -N -h test-proxy.proxy-***.eu-central-1.rds.amazonaws.com -u testuser -e "select now()"; sleep 2; done
$ while true; do mysql -s -N -h test-rds.***.eu-central-1.rds.amazonaws.com -u testuser -e "select now()"; sleep 2; done

Both return the same results:

2020-07-04 20:24:12
2020-07-04 20:24:14
2020-07-04 20:24:16
2020-07-04 20:24:18

So far so good. I then trigger a reboot with failover of the test-rds instance. What is the delay on the two endpoints?
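
For reference, the failover can be triggered from the AWS CLI as well; a sketch, reusing the instance identifier from the tests above:

$ aws rds reboot-db-instance --db-instance-identifier test-rds --force-failover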

test-proxy

2020-07-04 20:24:56
2020-07-04 20:24:58
2020-07-04 20:25:20
2020-07-04 20:25:22

test-rds

2020-07-04 20:24:56
2020-07-04 20:24:58
2020-07-04 20:27:12
2020-07-04 20:27:14

The difference between the test-proxy and the test-rds endpoints is significant: it takes 132 seconds for the RDS endpoint to recover versus only 20 seconds for the proxy. An amazing difference, even better than what AWS promises based on more reliable and extensive tests.

But what happens when I trigger a change of the instance type? 
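
A sketch of how such a change can be triggered with the AWS CLI, where the target class db.t3.medium is just a placeholder:

$ aws rds modify-db-instance --db-instance-identifier test-rds --db-instance-class db.t3.medium --apply-immediately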

While the numbers for the test-rds endpoint do not change significantly, the proxy is simply gone. Once the database instance behind it changes, the proxy endpoint is still available but it no longer connects to the database. Changing the timeout does not help, and there is no simple way to recover.

test-proxy

ERROR 9501 (HY000) at line 1: Timed-out waiting to acquire database connection
ERROR 9501 (HY000) at line 1: Timed-out waiting to acquire database connection
ERROR 9501 (HY000) at line 1: Timed-out waiting to acquire database connection
ERROR 9501 (HY000) at line 1: Timed-out waiting to acquire database connection
ERROR 9501 (HY000) at line 1: Timed-out waiting to acquire database connection

As of today, at least for MySQL 5.7 on RDS, introducing the proxy in the architecture makes the environment less elastic, as you no longer have the option of any manual or automatic (vertical) scaling of the database to match traffic. Any change to the database becomes more problematic.

Anything else? There are a few other well-documented limitations still present, including the lack of support for MySQL 8.0.

A final recap

Amazon RDS Proxy is a very interesting service, and it could be an essential component in many deployments where increased application availability is critical. But I would have expected a few more improvements since the first preview. The lack of support for instance changes still makes it hard to integrate in many scenarios where RDS is currently used.

Generating reports and KPIs with throw-away databases

We all love metrics. We all need numbers. And different stakeholders need different numbers. Numbers that will drive key decisions inside your organization and for your customers. Becoming a data driven organization requires having reliable data in the first place (…)

You can read my post about generating reports and KPIs with throw-away databases on the Funambol Tech Blog: how we decoupled reporting and user activity, leveraging RDS snapshots to generate throw-away copies of our MySQL databases on AWS.

Funambol Tech Blog – Walking the tight rope of cloud development.

Figuring out the cost of versioning on Amazon S3

We all love versioning on Amazon S3. It gives us peace of mind and the ability to recover our data if something goes wrong. As per the Amazon documentation:

Versioning is a means of keeping multiple variants of an object in the same bucket. You can use versioning to preserve, retrieve, and restore every version of every object stored in your Amazon S3 bucket. With versioning, you can easily recover from both unintended user actions and application failures.

But what about costs?

Keeping many variants of an object will not be free. And as per the S3 FAQ, the billing is, for once, obvious:

How am I charged for using Versioning?

Normal Amazon S3 rates apply for every version of an object stored or requested


Versioning has a simple price structure, but how do you monitor its impact on your storage costs? If you spend $10K every month on S3, it is useful to know whether versioning accounts for 5%, 10% or 20% of the bill.

Knowing that normal rates apply for every version of an object does not answer the questions:

  • What percentage of your storage is used by versioning?
  • How much is enabling versioning costing you at the end of the month?

If you have hundreds of terabytes or petabytes of data, and millions or billions of objects, you cannot list the bucket and check the items one by one. It would anyway be too costly to do that. There is no direct way to find the added cost of the versioned objects using Cost Explorer, and the Cost and Usage Report provides only the total cost of your bucket.

What about CloudWatch?

A metric in CloudWatch would be the perfect solution, as it would be easy to track and monitor. Unfortunately, there is no metric that covers versioning. The current metrics available for an S3 bucket refer only to the number of objects or to the different storage classes, for example:

Available metrics for the bucket

CloudWatch could only help if we knew upfront the ratio between current and previous versions.
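
The total bucket size from CloudWatch is still a useful baseline for the comparisons below; a sketch of how to retrieve it with the AWS CLI, assuming all your objects are in the Standard storage class (the metric is published once a day):

$ aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name BucketSizeBytes \
    --dimensions Name=BucketName,Value=YourBucketName Name=StorageType,Value=StandardStorage \
    --start-time 2020-07-01T00:00:00 --end-time 2020-07-02T00:00:00 --period 86400 --statistics Average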

What about using the AWS CLI?

You can do almost everything using the AWS CLI, including listing all the objects in your bucket and figuring out their size and versioning status. A very naive command could be:

$ aws s3 ls s3://YourBucketName --summarize --human-readable --recursive

The output can be processed and then compared with the total size from the CloudWatch metric to get an estimate of the total size of the previous versions.
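
Alternatively, the CLI can sum the size of the non-current versions directly; a sketch relying on a JMESPath query (the call still lists every object behind the scenes):

$ aws s3api list-object-versions --bucket YourBucketName --query 'sum(Versions[?IsLatest==`false`].Size)'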

Unfortunately, the approach does not scale when you have a large bucket. Besides, periodically listing all the objects has its own costs and you might end up paying more than what you spend on versioning.

S3 Inventory then!

A less immediate but very accurate way to determine the cost of versioning is using S3 Inventory. What is it?

Amazon S3 inventory provides comma-separated values (CSV), Apache optimized row columnar (ORC) or Apache Parquet (Parquet) output files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).

We can generate two inventory reports, the first including all versions of the objects and the second including only the current versions. After calculating the total size from both reports, we can subtract the current-versions size from the all-versions size.

But even easier, we can generate a single inventory report including only the current versions, compute the storage and compare it to the total bucket size from CloudWatch.
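
Once a report is delivered, computing the total size is trivial; a sketch for a CSV inventory, assuming the object size is the third column and that no key contains quotes or commas (the exact position depends on the optional fields you configure):

$ gunzip -c inventory/*.csv.gz | tr -d '"' | awk -F',' '{ total += $3 } END { print total }'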

Sampling?

If you have a well-distributed bucket, you can analyze only a meaningful subset of your data. It would be faster and cheaper too. For example, if the prefix of all your objects is randomly distributed between aa and zz, you can use a subset of the objects, with either S3 Inventory or the AWS CLI, to estimate your costs.

For example, out of the 676 possible prefixes (26 × 26), you can sample just a handful of them.
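
A sketch of the sampling with the AWS CLI, where the four prefixes are arbitrary picks; assuming an even distribution of the keys, you can multiply the sampled size by 676/4 to get the estimate:

$ for prefix in ab gm pr zx; do aws s3 ls s3://YourBucketName/$prefix --recursive --summarize | grep 'Total Size'; done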

What about Cost Explorer?

We said that Cost Explorer does not expose the cost of versioning. But it can be used as a proxy if you have a lifecycle rule in place to permanently delete previous versions.

Let us assume you have a retention of 30 days for previous versions. You can temporarily increase it and use the daily view of S3 costs in Cost Explorer to see if and by how much it affects your bill. For example, you can increase the retention to 40 days, wait 10 days, and roll back the change to 30 days.
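
The retention of previous versions is controlled by the NoncurrentVersionExpiration action of a lifecycle rule; a sketch of the temporary change with the AWS CLI, assuming this is the only rule on the bucket (the configuration replaces any existing one):

$ aws s3api put-bucket-lifecycle-configuration --bucket YourBucketName \
    --lifecycle-configuration '{"Rules":[{"ID":"expire-previous-versions","Status":"Enabled","Filter":{},"NoncurrentVersionExpiration":{"NoncurrentDays":40}}]}'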

If the cost of your versioning is significant, you will see an increase for those 10 days, with a drop when you roll back the change. If not, you know that the percentage of versioning costs in your S3 bill is not significant, as in the following example.

Example of a daily view of S3 costs in Cost Explorer

To recap…

There is no simple way to determine the percentage cost of using versioning on S3, but there are a few options to get a reliable estimate. Use S3 Inventory for an accurate value, or play with a sample of your data or change a lifecycle rule for a reasonable guess.

Changing innodb_flush_log_at_trx_commit on the cloud?

I am always curious to find the differences between managed databases on public clouds. Let’s consider for example innodb_flush_log_at_trx_commit, a key system variable for MySQL.

locked property

You cannot change it on Cloud SQL for MySQL (the managed service offered by Google), so you cannot make a durability/speed trade-off.

I can see only a few possible reasons for this restriction:

  • Google makes it safer for you, assuming you do not know enough; it is a managed service after all
  • they modified the engine and cannot support it
  • they forgot to expose it as a flag, but it might appear in one of the next iterations
  • you would pay for more resources to serve your traffic with the default value, even if you do not need full ACID compliance


By comparison, you can change and set the value of innodb_flush_log_at_trx_commit on Amazon RDS or Azure Database for MySQL.
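
On RDS, for example, the variable is exposed through a custom parameter group; a sketch with the AWS CLI, assuming the instance already uses a parameter group named custom-mysql57 (the name is a placeholder):

$ aws rds modify-db-parameter-group --db-parameter-group-name custom-mysql57 \
    --parameters "ParameterName=innodb_flush_log_at_trx_commit,ParameterValue=2,ApplyMethod=immediate"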

A similar issue surfaces if you want to use utf8mb4 as a character set (4-byte UTF-8 Unicode encoding). The feedback on Cloud SQL is:

Filter utf8mb4 strings out of your data.

What is the lesson here?

Before choosing one of the main providers to host your database as a managed service, double-check that they support all the non-default configurations you have in place or would like to use. This is extremely important in case of a migration across cloud providers: you do not want to discover a blocker when you are halfway through the process.