As a software developer, I am far more likely to discuss politics at the coffee machine or over a couple of beers in the evening than while writing code. The last few weeks proved me wrong: I ended up discussing controversial borders and disputed countries on more than one occasion, all thanks to the now ubiquitous geolocation and image recognition services.
The Hong Kong user
The founders of RaceBase World, a service to discover and rate running events around the world, are based in Hong Kong, and they were not too impressed when their profile page stated China as their home country. The geolocation labeling was provided by Mapbox, one of the largest providers of custom online maps. While the labeling might be justified (most of their users consider Hong Kong to be part of China), the choice is controversial for many runners based in the territory. And using the official name, the Hong Kong Special Administrative Region of the People’s Republic of China, is not really a feasible option.
And it’s not the only service affected.
I then uploaded a trail picture taken in Hong Kong, not far from Mainland China, to OneMediaHub (a cloud solution provided by Funambol), and the result was even more bizarre: the picture was labeled with the location “Shenzhen, China”, even though downtown Shenzhen is quite far away and not exactly a paradise for trail running.
In this scenario the problem was twofold: the algorithm used to match the EXIF coordinates to a place name, and the accuracy of GeoNames, the open source geolocation database it relied on.
To make matters worse, a picture taken in the West Bank, not far from Jerusalem, was labeled “Jerusalem, Israel” on OneMediaHub.com for the very same reason. Again, the author was not too happy.
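To see how this kind of mislabeling can happen, here is a minimal sketch of a naive "nearest populated place" lookup. It is not the actual OneMediaHub algorithm, only a common simple approach: the two-entry gazetteer, the function names and the approximate coordinates are all assumptions made for illustration.

```python
import math

# Hypothetical two-entry gazetteer: (name, country label, lat, lon).
# Coordinates are approximate city-center values, for illustration only.
PLACES = [
    ("Shenzhen", "China", 22.5455, 114.0683),
    ("Hong Kong", "Hong Kong", 22.2783, 114.1747),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_place(lat, lon, places=PLACES):
    """Return the gazetteer entry whose center is closest to the point."""
    return min(places, key=lambda p: haversine_km(lat, lon, p[2], p[3]))


# An approximate point in the northern New Territories, inside Hong Kong
# but closer to the center of Shenzhen than to the center of Hong Kong.
name, country, _, _ = nearest_place(22.50, 114.13)
print(f"{name}, {country}")
```

The lookup only compares distances to city centers and knows nothing about borders, so a point on the Hong Kong side of the boundary can still end up with the Shenzhen label.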
Is that really so bad?
Most geolocation decoding services are pretty accurate, and the error margin is very low. The issues above are corner cases that hardly affect the vast majority of users. Even if one of your summer pictures gets tagged with the next town along the Costa Brava, you are hardly going to complain or be offended. You might not even notice the bug.
The issue is that a small percentage of the cases where the algorithm fails, or where the decoding is controversial, fall in disputed territories and partially recognized states. And that introduces some challenges for the developer who does not want to deal with politics while writing code.
It’s only geolocation!
Actually, even a simple signup form where the user has to choose a country might be controversial. Sadly, not everyone in the world agrees on the status of Kosovo. Or Palestine. Or even on their names.
Google uses “Palestine” (but labels the field “location”), while Amazon goes for a neutral “Palestinian territories”.
Relying on the official UN status might be a safer option, but it does not make local users very happy either. Let’s go back to the Mapbox example with RaceBase World.
Mapbox works for the Palestine Marathon and makes most (if not all) of the runners attending the event happy. But let’s assume a (fictitious) marathon is taking place in Simferopol, the largest city on the Crimean peninsula. Would most local runners be OK with Ukraine as the country? Runners in Germany and runners in Russia usually have different opinions about the status of Crimea. And there are many more similar examples, without even considering war zones.
How to fix those issues?
As a developer, if you only have a local audience it’s relatively easy, and you can minimize the controversies. If not, you can add workarounds or hacks for challenging names, or simply hide them (pretend that automatic decoding did not work). RaceBase World, for example, now shows Hong Kong for new registrations in the autonomous territory.
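A workaround of this kind can be as small as a lookup table applied to whatever the geocoder returns. The sketch below is only an assumption about how such a hack might look, not how RaceBase World implemented it. The table contents, the placeholder region and the function name are hypothetical.

```python
from typing import Optional

# Hypothetical overrides: (country, region) as returned by the geocoder,
# mapped to the label we prefer to display.
DISPLAY_OVERRIDES = {
    ("China", "Hong Kong"): "Hong Kong",
}

# Hypothetical (country, region) pairs for which we show nothing at all,
# as if automatic decoding had failed. The entry is a placeholder.
HIDDEN_REGIONS = {
    ("Exampleland", "Disputed Province"),
}


def display_country(country: str, region: Optional[str]) -> Optional[str]:
    """Return the label to show, or None to hide the location."""
    key = (country, region)
    if key in HIDDEN_REGIONS:
        return None
    # Fall back to the geocoder's own country when there is no override.
    return DISPLAY_OVERRIDES.get(key, country)


print(display_country("China", "Hong Kong"))   # overridden label
print(display_country("Italy", "Lazio"))       # geocoder's label, unchanged
```

Keeping the exceptions in data rather than in code means that adding or removing a case does not need a new release, which helps when a complaint arrives.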
A better option, but with significantly higher development costs, is to show localized names according to where the audience is.
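As a rough sketch of what audience-dependent labels could look like, the snippet below picks a label from the viewer's locale, with a default fallback. The place identifier, the locale codes and the labels are deliberately neutral placeholders, and the lookup order is an assumption. Most of the real cost lies in curating and maintaining the table, which the code does not show.

```python
# Hypothetical table: one entry per place, with a default label and
# optional per-locale variants. All values are placeholders.
LOCALIZED_LABELS = {
    "place-001": {"default": "Name A", "xx": "Name B", "xx-YY": "Name C"},
}


def label_for(place_id: str, viewer_locale: str) -> str:
    """Pick the most specific label available for the viewer's locale."""
    labels = LOCALIZED_LABELS[place_id]
    language = viewer_locale.split("-")[0]
    # Try the full locale first, then the bare language, then the default.
    for key in (viewer_locale, language, "default"):
        if key in labels:
            return labels[key]
    raise KeyError(f"no label defined for {place_id}")


print(label_for("place-001", "xx-YY"))  # most specific match
print(label_for("place-001", "xx-ZZ"))  # falls back to the language
print(label_for("place-001", "qq"))     # falls back to the default
```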
But at the end of the day the big players drive the geolocation databases, and they care more about where most of their users are. When “2.8 million people took part in marathons in China in 2016, almost twice the number from the previous year”, as the Telegraph recently reported, it’s hard to argue with Mapbox’s approach to what China is and what China is not. Runners in Hong Kong might not be their main audience or growing market.
If you want to test how your website performs in critical areas, you do not even need real pictures: just edit the EXIF data of a random picture using Photo Exif Editor or a similar application, and you are ready to go. Enjoy the challenge.
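If you prefer to script your test data, keep in mind that EXIF stores GPS coordinates as degrees, minutes and seconds rationals plus a hemisphere reference, not as signed decimal degrees. The sketch below only prepares those values. The function names are hypothetical, and actually writing the tags into a file would need an EXIF library of your choice, which is not shown.

```python
def to_dms_rationals(value: float):
    """Convert decimal degrees to EXIF-style (numerator, denominator) pairs.

    Seconds are kept to two decimals by using a denominator of 100.
    """
    # Work in integer hundredths of a second to avoid carry problems
    # such as 59.999 seconds rounding up to 60.
    total = round(abs(value) * 3600 * 100)
    degrees, remainder = divmod(total, 3600 * 100)
    minutes, seconds_hundredths = divmod(remainder, 60 * 100)
    return ((degrees, 1), (minutes, 1), (seconds_hundredths, 100))


def gps_tags(lat: float, lon: float) -> dict:
    """Build the four basic EXIF GPS tag values for a coordinate."""
    return {
        "GPSLatitudeRef": "N" if lat >= 0 else "S",
        "GPSLatitude": to_dms_rationals(lat),
        "GPSLongitudeRef": "E" if lon >= 0 else "W",
        "GPSLongitude": to_dms_rationals(lon),
    }


# Any coordinate you want to test, here an arbitrary example point.
print(gps_tags(22.5, 114.25))
```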
How does it work with AWS services?
What about Amazon and AWS services? Any way to limit or keep the above issues under control? This will be covered soon in the second part of this post.