As the BBC recently reported, Matthew Nimetz has spent the last 23 years trying to find a name for the Republic of Macedonia that can be accepted in both Skopje and Athens. But no solution to the Macedonia naming dispute has been agreed yet.
What name should a developer use today when working on location-based services? The user-friendly “Macedonia” or the formal but longer “The Former Yugoslav Republic of Macedonia”? How can you keep your users in both Skopje and Athens happy?
This is one of the examples I might use at the next Codemotion in Berlin. I do not expect to cover in half an hour all the geopolitical challenges of targeting an international audience, or their workarounds, but I am very excited to present the talk “The (accidental) political developer”.
As the focus of this blog is AWS technologies, what about Amazon and their location-based services? Any way to keep those geolocation issues under control?
First of all, let’s see which Amazon services include geolocation capabilities.
AWS offers almost nothing here. Compared with the Google Maps API or GeoNames, AWS does not yet have any product with geolocation capabilities. Of course you can run a third-party AMI from the Marketplace, like MaxMind – GeoIP or IP2Location Geolocation, but that is just taking advantage of an EC2 instance; it is not a service provided directly by AWS.
Even Amazon Rekognition, a deep learning-based image analysis service, does not offer the full set of features that Google Vision has and that could potentially trigger location-based issues. If you upload a picture to Google Vision, thanks to Landmark Detection you might end up with a location, possibly a questionable place on Earth. But if you upload a picture of the Eiffel Tower to Amazon Rekognition, the result is a disappointing but very safe “tower”, with 99.2% confidence.
And you get something similar with a picture of the Western Wall in the Old City of Jerusalem: definitely not as accurate, but not a result that can create much controversy.
What about Amazon user interfaces and products for end users? Many photo sharing and storage services, like Google Photos, Apple Photos or OneMediaHub (a white-label solution provided by Funambol), create tags according to the GPS coordinates (EXIF data) of the pictures a user uploads, with all the challenges of defining the best tag for Hong Kong or deciding whether Crimea is Russian or Ukrainian territory. Prime Photos from Amazon does not. No feature, no issues.
Really nothing on Amazon or AWS?
Amazon of course still has to provide a user interface where the user can add and validate an address. They have made their own interesting choices (no Kosovo, for example), and some users might dispute whether some entries in the list are countries at all; Google’s approach of labeling the drop-down “location” rather than “country” somehow feels safer. But that is hardly interesting for a developer.
To summarize, I have never had to deal with geolocation issues or controversies while working directly with AWS services or the Amazon platform.
But there is something new that might pose a challenge: the Alexa Skills Kit and its built-in slot types, which define how data in a slot is recognized and handled. And Amazon Lex, the “conversational interfaces for your applications, powered by the same deep learning technologies as Alexa”.
Alexa relies on slots, whose slot types are lists of values, many of them predefined by Amazon. For example, the slot type AMAZON.Country is a ready-made list of (English) names of countries around the world, while AMAZON.DE_CITY provides recognition for German and world cities commonly used by speakers in Germany and Austria.
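Whatever the slot type, the recognized value reaches the skill’s backend inside the JSON request Alexa sends. A minimal sketch for a Node.js Lambda function, assuming an intent with a slot named `Country` of type AMAZON.Country; the intent name, slot name and `getSlotValue` helper are hypothetical:

```javascript
// Read the value Alexa recognized for a slot from an IntentRequest.
// Returns null when the slot is missing or the user did not fill it.
function getSlotValue(event, slotName) {
  const slots = event?.request?.intent?.slots;
  const value = slots?.[slotName]?.value;
  return typeof value === 'string' && value.trim() !== '' ? value.trim() : null;
}

// Trimmed-down IntentRequest body, as a Lambda function might receive it.
const sampleEvent = {
  request: {
    type: 'IntentRequest',
    intent: {
      name: 'GetPopulationIntent', // hypothetical intent name
      slots: {
        Country: { name: 'Country', value: 'Palestine' },
      },
    },
  },
};

console.log(getSlotValue(sampleEvent, 'Country')); // prints "Palestine"
```

Note that the skill only receives the words the user said, mapped to a value of the slot type; it does not receive a location.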
Anything to be worried about? Let’s test it first.
Big World: Alexa and geolocation
After attending a presentation at Factory Berlin and a talk at the AWS Summit, both by Memo Döring and both very inspiring, I decided to build a simple dynamic skill for Alexa. The skill, called Big World, relies on data from Population.io, a project of the World Data Lab that aims to make demography accessible to a wider audience. Given a country name, this very simple skill returns today’s and tomorrow’s population, using the values from the World Population API. Below are a screenshot and a short audio demo; the code is available on GitHub.
Going back to the topic of this post, the only challenge was to match the names from the built-in Alexa slot to the countries of the Population.io API. The backend supports only values such as Arab Rep of Egypt, Islamic Republic of Iran, West Bank and Gaza or Hong Kong SAR-China: names that a user is very unlikely to say while talking to a voice assistant like Amazon Echo, and that therefore require a mapping, for example:
// Name the Population.io backend expects for Palestine (ISO 3166-1 code PS)
const PS = 'West Bank and Gaza';
But unless you get the name entirely wrong, there is no real challenge, and the answer is still not controversial, as you are dealing only with the name and not with the location itself. For example:
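The core of the skill is simple arithmetic on two values from the API. A minimal sketch of how the answer could be phrased, assuming today’s and tomorrow’s totals have already been fetched (the API call is left out); `buildPopulationSpeech` is a hypothetical helper, not the code published on GitHub:

```javascript
// Phrase the answer from two totals already fetched from the backend:
// today's population and tomorrow's population.
function buildPopulationSpeech(country, today, tomorrow) {
  const change = tomorrow - today;
  // Handle shrinking populations too, instead of saying "-5 people more".
  const trend = change >= 0 ? `${change} people more` : `${-change} people fewer`;
  return `The population of ${country} today is ${today}, tomorrow there will be ${trend}.`;
}

console.log(buildPopulationSpeech('West Bank and Gaza', 4916233, 4916601));
// prints "The population of West Bank and Gaza today is 4916233, tomorrow there will be 368 people more."
```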
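Rather than one constant per country, the mapping can live in a single lookup table. A minimal sketch, assuming the slot value arrives as a plain string; the table and function names are hypothetical, the backend names are the ones listed above, and England follows the United Kingdom example below:

```javascript
// Hypothetical lookup table: names a user is likely to say, mapped to the
// names the Population.io backend expects. Keys are lower-case so the
// lookup ignores capitalization. A Map avoids clashes with Object keys
// such as "constructor".
const POPULATION_IO_NAMES = new Map([
  ['egypt', 'Arab Rep of Egypt'],
  ['iran', 'Islamic Republic of Iran'],
  ['palestine', 'West Bank and Gaza'],
  ['hong kong', 'Hong Kong SAR-China'],
  ['england', 'United Kingdom'],
]);

// Return the backend name for a spoken country name; names without an
// entry in the table are passed through unchanged.
function toPopulationIoName(spokenName) {
  const name = spokenName.trim();
  return POPULATION_IO_NAMES.get(name.toLowerCase()) ?? name;
}

console.log(toPopulationIoName('Palestine')); // prints "West Bank and Gaza"
console.log(toPopulationIoName('Germany')); // prints "Germany"
```

Whether a pass-through name is accepted by the backend still has to be checked against the list of countries the API supports.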
Q. Alexa, ask Big World the population of Palestine.
A. You are not alone in this world. The population of West Bank and Gaza today is 4916233, tomorrow there will be 368 people more.
or in a simpler scenario:
Q. Alexa, ask Big World how many people live in England.
A. The world population is growing as we speak. The population of United Kingdom today is 65473338, tomorrow there will be 1101 people more.
The answer might not be 100% accurate, but it is the best approximation available using the data from Population.io. As with any spoken conversation, an audio interaction with a smart speaker is more forgiving than an incorrect point or country name on a website.
Of course some users might still not get results for specific and perfectly valid country names, but that is down to poor coding and logic in the Lambda function, not to the specific Alexa slot.
In the end…
Due to the lack of real location-based features, the services currently available on AWS do not present most of the challenges covered in the previous post. But for the very same reason, they do not provide a solution or any help to developers who need to address or mitigate them.
As a software developer, I expected to discuss politics at the coffee machine or after a couple of beers in the evening, but not while writing code. The last few weeks proved me wrong: I found myself discussing controversial borders and disputed countries on more than one occasion, all thanks to the new ubiquitous geolocation and image recognition services.
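One way the Lambda function can be more forgiving is to keep the conversation open when a name is not matched, instead of failing. A minimal sketch, assuming the function builds the raw Alexa Skills Kit JSON response itself rather than using an SDK; `buildResponse` and `answerFor` are hypothetical helpers, and the lookup that produces `population` is left out:

```javascript
// Build an Alexa Skills Kit response. With a reprompt the session stays
// open so the user can try another name.
function buildResponse(speechText, repromptText) {
  const response = {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text: speechText },
      shouldEndSession: !repromptText,
    },
  };
  if (repromptText) {
    response.response.reprompt = {
      outputSpeech: { type: 'PlainText', text: repromptText },
    };
  }
  return response;
}

// population is null when the backend did not recognize the name.
function answerFor(countryName, population) {
  if (population == null) {
    return buildResponse(
      `Sorry, I could not find ${countryName}.`,
      'Which country would you like to know about?'
    );
  }
  return buildResponse(`The population of ${countryName} today is ${population}.`);
}
```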
The Hong Kong user
The founders of RaceBase World, a service to discover and rate running events around the world, are based in Hong Kong, and they were not too impressed when their profile page stated China as their home country. The geolocation labeling was provided by Mapbox, one of the largest providers of custom online maps. While the labeling might be justified (most of their users consider Hong Kong to be part of China), the choice was controversial for many runners based in the territory. And using the official name, the Hong Kong Special Administrative Region of the People’s Republic of China, is not really a feasible option.
And it’s not the only service affected.
When I uploaded a trail picture taken in Hong Kong, not far from mainland China, to OneMediaHub (a cloud solution provided by Funambol), the result was even more bizarre: the picture was labeled with the location “Shenzhen, China”, even though downtown Shenzhen is not exactly a paradise for trail running and is quite far away.
In this scenario the problem lay in the algorithm used to match the EXIF data and in the accuracy of the open source geolocation database used, GeoNames.
In the same way, a picture taken in the West Bank, not far from Jerusalem, gets the location “Jerusalem, Israel” on OneMediaHub.com. Again, the author was not too happy.
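To see how a point in Hong Kong can end up labeled as Shenzhen, consider the simplest possible reverse geocoding: pick the known place nearest to the photo’s GPS coordinates. This is an illustrative sketch, not OneMediaHub’s or GeoNames’ actual algorithm; the two-entry place list and its approximate city-center coordinates are assumptions:

```javascript
// Approximate reference points; a real database has millions of entries.
const PLACES = [
  { label: 'Shenzhen, China', lat: 22.5431, lon: 114.0579 },
  { label: 'Hong Kong', lat: 22.3193, lon: 114.1694 },
];

// Great-circle distance in kilometers (haversine formula).
function haversineKm(lat1, lon1, lat2, lon2) {
  const R = 6371; // mean Earth radius in km
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Label a point with the nearest place, ignoring borders entirely.
function nearestPlace(lat, lon, places = PLACES) {
  let best = null;
  let bestKm = Infinity;
  for (const place of places) {
    const km = haversineKm(lat, lon, place.lat, place.lon);
    if (km < bestKm) {
      best = place;
      bestKm = km;
    }
  }
  return best ? best.label : null;
}

// A point near Sheung Shui, inside Hong Kong (approximate coordinates).
console.log(nearestPlace(22.5015, 114.1275)); // prints "Shenzhen, China"
```

The point is a few kilometers inside Hong Kong but closer to Shenzhen’s center than to the reference point chosen for Hong Kong, so the nearest-place match crosses the border. Checking which boundary polygon contains the point avoids this particular error, but then the boundaries themselves become the political choice.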
Is that really so bad?
Most geolocation services are pretty accurate and their error margin is very low. The vast majority of users are hardly affected by the issues above, what we call corner cases. And even if one of your summer pictures gets tagged with the next town on the Costa Brava, you are hardly going to complain, or be offended. You might not even notice the bug.
But the problem is that a significant share of the scenarios where the algorithm fails, or where the decoding is controversial, involve disputed territories or partially recognized states. And that introduces some challenges for developers who do not want to deal with politics while writing code.
It’s only geolocation!
Actually, even a simple signup form where the user has to choose a country might be controversial. Sadly, not everyone in the world agrees on the status of Kosovo. Or Palestine. Or even on their names.
Google uses “Palestine” (but labels the field “location”), while Amazon goes for a neutral “Palestinian territories”.
Relying on the official UN status might be a safer option, but it does not make local users (or your web designer) very happy either. Let’s go back to the Mapbox example with RaceBase World.
Mapbox works for the Palestine Marathon and makes most (if not all) of the runners attending the event happy, but let’s assume a marathon is taking place in Simferopol, the capital of Crimea. Would most locals be OK with Ukraine as the country? Runners in Germany and runners in Russia usually have a different opinion about the status of Crimea. And there are many similar examples, even without considering war zones.
How to fix those issues?
As a developer, if you have only a local audience, it’s relatively easy and you can minimize the controversies. If not, you can use some workarounds or hacks for challenging names, or simply hide them (pretend that automatic decoding did not work, or just show the city name). RaceBase World, for example, now shows Hong Kong for new registrations in the autonomous territory.
A better option, but with significantly higher development costs, is to show localized names according to where the audience is, or to rely on localization to mitigate the issue (different names in different languages).
But at the end of the day the big players drive the geolocation databases, and they care more about where most of their users are. When “2.8 million people took part in marathons in China in 2016, almost twice the number from the previous year”, as the Telegraph recently reported, it’s hard to argue with Mapbox’s approach to what China is and what it is not. Runners in Hong Kong might not be its main audience or growing market.
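The workaround approach can be as simple as a hand-maintained list of overrides applied before a location is displayed. A minimal sketch, assuming the geocoder returns a city name, an ISO 3166-1 alpha-2 country code and a country name as separate fields; the lists below are illustrative only (XK is the user-assigned code commonly used for Kosovo), not a recommendation:

```javascript
// Illustrative override lists, maintained by hand.
const COUNTRY_OVERRIDES = new Map([
  ['HK', 'Hong Kong'], // show the territory rather than the geocoder's country name
]);
const HIDE_COUNTRY = new Set(['XK']); // show the city name only

// Build the label shown to the user from geocoder fields.
function displayLocation(city, countryCode, countryName) {
  const code = countryCode.toUpperCase();
  if (HIDE_COUNTRY.has(code)) {
    return city;
  }
  const country = COUNTRY_OVERRIDES.get(code) ?? countryName;
  return city ? `${city}, ${country}` : country;
}

console.log(displayLocation('Sha Tin', 'hk', 'China')); // prints "Sha Tin, Hong Kong"
console.log(displayLocation('Pristina', 'XK', 'Kosovo')); // prints "Pristina"
```

The override only fires if the geocoder returns a distinct code for the territory; if it reports the country as CN, the lookup has to be keyed on something else, such as a region or a bounding box.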
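For the localization route, modern JavaScript runtimes already ship localized region names through the standard `Intl.DisplayNames` API (Node.js 14 and later, current browsers). A minimal sketch, assuming the user’s locale is known, for example from the browser or the account settings; `localizedRegionName` is a hypothetical helper:

```javascript
// Look up the display name of a region code in the user's locale,
// falling back to English if the locale is not supported.
// The names come from the CLDR data bundled with the runtime, so the
// exact strings can change between runtime versions.
function localizedRegionName(regionCode, userLocale) {
  const names = new Intl.DisplayNames([userLocale, 'en'], { type: 'region' });
  return names.of(regionCode);
}

for (const locale of ['en', 'de', 'ru', 'el']) {
  console.log(locale, localizedRegionName('MK', locale));
}
```

This moves the naming decision to the CLDR project rather than removing it, so it is worth checking the output for the regions your audience cares about.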
How can I test my application?
If you want to test how your website performs in critical areas, you do not even need real pictures: just edit the EXIF data of a random picture using Photo Exif Editor or a similar application and enjoy the challenge. And you are ready to go.
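If you prefer to generate test coordinates in code, note that EXIF does not store decimal degrees: the GPS latitude and longitude are three rational values (degrees, minutes, seconds) plus an N/S or E/W reference tag. A minimal sketch of the conversion, assuming your EXIF writing library accepts rationals as `[numerator, denominator]` pairs (check what yours expects); `toExifGps` is a hypothetical helper:

```javascript
// Convert decimal degrees into EXIF-style GPS values.
function toExifGps(lat, lon) {
  const toDms = (value) => {
    // Work in hundredths of a second so that rounding can never produce
    // an invalid value such as 60 seconds.
    const total = Math.round(Math.abs(value) * 360000);
    const degrees = Math.floor(total / 360000);
    const minutes = Math.floor((total % 360000) / 6000);
    const centiSeconds = total % 6000;
    return [[degrees, 1], [minutes, 1], [centiSeconds, 100]];
  };
  return {
    GPSLatitudeRef: lat >= 0 ? 'N' : 'S',
    GPSLatitude: toDms(lat),
    GPSLongitudeRef: lon >= 0 ? 'E' : 'W',
    GPSLongitude: toDms(lon),
  };
}

// Approximate coordinates of Simferopol, Crimea.
console.log(JSON.stringify(toExifGps(44.9521, 34.1024)));
```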
How does it work with AWS services?
What about Amazon and AWS services? Is there any way to limit the above issues or keep them under control? This will be covered soon in the second part of this post.