Google engineers have built a photo recognition system that can outperform the most well-travelled humans



The PlaNet system can identify the location of generic photos more accurately than humans.

A pair of Google employees have built a system called PlaNet that attempts to pinpoint where a photograph was taken by analysing the pixels it contains.

Humans typically struggle to determine where generic photos were taken just by looking at them. If shown a picture of a white sandy beach, for example, they might assume it was taken in the Caribbean when in fact it was taken in the Maldives.

While many humans need a landmark to refer to, such as the Statue of Liberty or Machu Picchu, before they can pinpoint a location, Google's PlaNet system, which is still in its early stages, does not need one.

Tobias Weyand and James Philbin, a pair of software engineers at Google, teamed up with developer Ilya Kostrikov to build the PlaNet system. "We think PlaNet has an advantage over humans because it has seen many more places than any human can ever visit and has learnt subtle cues of different scenes that are even hard for a well-travelled human to distinguish," Weyand told MIT Technology Review.

Weyand's team divided the world into a grid of about 26,000 squares whose size varied with the number of images taken in each area: heavily photographed places were covered by smaller squares, sparsely photographed ones by larger squares. Each square represented a specific geographical area.
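To make the varying-size grid concrete, here is a minimal sketch of one way to build such an adaptive grid, assuming a simple latitude/longitude quadtree: a square is split into four until it holds no more than a chosen number of photos, and squares with too few photos are discarded. The original work is reported to use Google's S2 cell hierarchy rather than this quadtree, and the `Cell`, `partition`, `max_photos` and `min_photos` names are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical grid square: an axis-aligned latitude/longitude box (degrees).
@dataclass(frozen=True)
class Cell:
    lat_min: float
    lat_max: float
    lng_min: float
    lng_max: float

    def contains(self, lat: float, lng: float) -> bool:
        # Half-open intervals, so a point on a shared edge belongs to exactly one square.
        return self.lat_min <= lat < self.lat_max and self.lng_min <= lng < self.lng_max


def partition(points, cell, max_photos, min_photos, min_size=1e-4):
    """Recursively split `cell` into four quadrants until each holds at most
    `max_photos` photo locations; drop squares with fewer than `min_photos`."""
    inside = [(lat, lng) for lat, lng in points if cell.contains(lat, lng)]
    if len(inside) < min_photos:
        return []  # too sparse to be a useful square
    too_small = (cell.lat_max - cell.lat_min) < min_size  # guard against endless splitting
    if len(inside) <= max_photos or too_small:
        return [cell]
    lat_mid = (cell.lat_min + cell.lat_max) / 2
    lng_mid = (cell.lng_min + cell.lng_max) / 2
    children = [
        Cell(cell.lat_min, lat_mid, cell.lng_min, lng_mid),
        Cell(cell.lat_min, lat_mid, lng_mid, cell.lng_max),
        Cell(lat_mid, cell.lat_max, cell.lng_min, lng_mid),
        Cell(lat_mid, cell.lat_max, lng_mid, cell.lng_max),
    ]
    result = []
    for child in children:
        # Only the points already inside this square are passed down.
        result.extend(partition(inside, child, max_photos, min_photos, min_size))
    return result


WORLD = Cell(-90.0, 90.0, -180.0, 180.0)
```

Because busy places hold many photos, they end up covered by many small squares, while a lone photo in a remote area keeps a large square to itself. In this sketch, points lying exactly on latitude 90 or longitude 180 fall outside the half-open world box.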

The team then assembled a database of geotagged images from the internet and used each image's location data to determine the grid square in which it was taken. Overall, 126 million images were used.
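Once the grid exists, each training photo's label is simply the square its geotag falls in. A minimal sketch, assuming squares stored as hypothetical `(lat_min, lat_max, lng_min, lng_max)` tuples and images as `(image_id, lat, lng)` records; `label_for` and `build_training_labels` are illustrative names:

```python
from typing import Optional, Sequence, Tuple

# Hypothetical square format: (lat_min, lat_max, lng_min, lng_max) in degrees.
Square = Tuple[float, float, float, float]


def label_for(lat: float, lng: float, squares: Sequence[Square]) -> Optional[int]:
    """Return the index of the square containing (lat, lng), or None if the
    point lies in an area the grid does not cover."""
    for idx, (lat_min, lat_max, lng_min, lng_max) in enumerate(squares):
        if lat_min <= lat < lat_max and lng_min <= lng < lng_max:
            return idx
    return None


def build_training_labels(images, squares):
    """Pair each (image_id, lat, lng) record with its square index,
    skipping images whose location falls outside every square."""
    labelled = []
    for image_id, lat, lng in images:
        label = label_for(lat, lng, squares)
        if label is not None:
            labelled.append((image_id, label))
    return labelled
```

The linear scan is for readability only; with 126 million images and tens of thousands of squares, a spatial index would be the practical choice.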

Weyand and his team used 91 million of these images to train a powerful neural network (a computer system loosely modelled on the human brain) to work out the grid location from the image alone. The aim is to feed an image into the network and get back a particular grid square, or at least a set of likely candidates. The network was validated on the remaining 34 million photos in the data set.
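Getting back "a particular grid square, or at least a set of likely candidates" amounts to treating geolocation as classification, with one class per grid square. A minimal sketch of the output side, assuming the network (not shown) produces one raw score per square: a softmax turns the scores into probabilities, and the k most probable squares become the candidate list. The function names and the default value of k are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the largest score before exponentiating to avoid overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_squares(logits: Sequence[float], k: int = 5) -> List[Tuple[int, float]]:
    """Convert one raw score per grid square into the k most probable
    squares, returned as (square_index, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]
```

For example, scores of `[0.1, 2.0, -1.0, 1.5]` with `k=2` yield squares 1 and 3 as the two candidates.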

To test PlaNet, the Google team took 2.3 million geotagged images from the online photo library Flickr and asked PlaNet to identify where each one was taken.

"PlaNet is able to localise 3.6% of the images at street-level accuracy and 10.1% at city-level accuracy," Weyand's team wrote in their academic paper.
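Figures such as "street-level" and "city-level" accuracy are usually computed as the share of test photos whose predicted location lies within a fixed distance of the true one. A minimal sketch, assuming great-circle (haversine) distance and thresholds of 1 km for street level and 25 km for city level. These thresholds are common in the geolocation literature but are an assumption here, not taken from this article, and `accuracy_at` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Assumed distance thresholds for each accuracy level.
THRESHOLDS_KM = {"street": 1.0, "city": 25.0}


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def accuracy_at(predicted: Sequence[Tuple[float, float]],
                actual: Sequence[Tuple[float, float]],
                threshold_km: float) -> float:
    """Fraction of images whose predicted location is within threshold_km of the truth."""
    hits = sum(haversine_km(p, t) <= threshold_km for p, t in zip(predicted, actual))
    return hits / len(actual)
```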

The results weren't perfect, but PlaNet still beat a group of 10 well-travelled humans in a test based on Google Street View scenes.

PlaNet's median localisation error was 1,131.7 km, meaning half of its guesses landed within that distance of the true location, while the median error for the 10 well-travelled humans was 2,320.75 km.

"In total, PlaNet won 28 of the 50 rounds with a median localisation error of 1131.7 km, while the median human localisation error was 2320.75 km," Weyand's team wrote. "[This] small-scale experiment shows that PlaNet reaches superhuman performance at the task of geolocating Street View scenes."
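The Street View comparison rests on two simple statistics: the number of rounds in which PlaNet's guess was closer than the human's, and the median error on each side. A minimal sketch, assuming one model error and one human error (in km) per round; the function name and toy numbers are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def compare_rounds(model_errors_km: Sequence[float],
                   human_errors_km: Sequence[float]) -> Tuple[int, float, float]:
    """Return (rounds the model won, model median error, human median error)."""
    if len(model_errors_km) != len(human_errors_km):
        raise ValueError("need one model error and one human error per round")
    # A round counts as a win when the model's guess is strictly closer.
    wins = sum(m < h for m, h in zip(model_errors_km, human_errors_km))
    return wins, statistics.median(model_errors_km), statistics.median(human_errors_km)


# Toy data: the model wins two of three rounds but has the higher median error.
print(compare_rounds([10.0, 500.0, 3000.0], [50.0, 400.0, 6000.0]))  # (2, 500.0, 400.0)
```

The toy example shows that rounds won and median error can tell different stories. Medians are also less sensitive than averages to a handful of wildly wrong guesses.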
