Google engineers have built a photo recognition system that can outperform the most well-travelled humans
Humans typically struggle to determine where generic photos were taken just by looking at them. If shown a picture of a white sandy beach, for example, they might assume it was taken in the Caribbean when in fact it was taken in the Maldives.
While many humans need a landmark to refer to - such as the Statue of Liberty or Machu Picchu - before they can pinpoint a location, Google's PlaNet system, which is still in its early stages, does not have this problem.
Tobias Weyand and James Philbin, a pair of software engineers at Google, teamed up with developer Ilya Kostrikov to build the PlaNet system. "We think PlaNet has an advantage over humans because it has seen many more places than any human can ever visit and has learnt subtle cues of different scenes that are even hard for a well-travelled human to distinguish," Weyand told MIT Technology Review.
Weyand's team divided the world into a grid made up of 26,000 squares of varying size, depending on the number of images taken in that location. Each square represented a specific geographical area.
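To make the idea of squares of varying size concrete, here is a minimal sketch of one way to build such a grid: start with one cell covering the whole world and keep splitting any cell that holds too many photos. The `Cell` class, the `subdivide` function, the `max_photos` and `max_depth` parameters and the sample coordinates are all assumptions made for illustration; the article does not describe how Weyand's team actually partitioned the map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A latitude/longitude rectangle, half-open on its north and east edges."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat < self.north and self.west <= lon < self.east


def subdivide(cell, photos, max_photos, max_depth, depth=0):
    """Split a cell into four quadrants until it holds at most max_photos photos."""
    inside = [(lat, lon) for lat, lon in photos if cell.contains(lat, lon)]
    # Stop when the cell is sparse enough or the (hypothetical) depth limit is hit.
    if len(inside) <= max_photos or depth >= max_depth:
        return [cell]
    mid_lat = (cell.south + cell.north) / 2
    mid_lon = (cell.west + cell.east) / 2
    quadrants = [
        Cell(cell.south, cell.west, mid_lat, mid_lon),
        Cell(cell.south, mid_lon, mid_lat, cell.east),
        Cell(mid_lat, cell.west, cell.north, mid_lon),
        Cell(mid_lat, mid_lon, cell.north, cell.east),
    ]
    cells = []
    for quadrant in quadrants:
        cells.extend(subdivide(quadrant, inside, max_photos, max_depth, depth + 1))
    return cells


# Illustrative data only: four photos clustered in Paris, one in Sydney.
world = Cell(-90.0, -180.0, 90.0, 180.0)
photos = [(48.85, 2.35), (48.86, 2.29), (48.87, 2.33), (48.84, 2.36), (-33.87, 151.21)]
cells = subdivide(world, photos, max_photos=2, max_depth=6)
```

With this toy data, the heavily photographed area around Paris ends up in a much smaller cell than the single Sydney photo, which is the behaviour the article describes: more images in a location means finer squares there.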
The team then created a database of geolocated images from the internet and used each image's location to determine the grid square in which it was taken. Overall, 126 million images were used.
Weyand and his team took 91 million of these images to teach a powerful neural network - a computer system modelled on the human brain - to work out the grid location using only the image itself. Ultimately they want to be able to put an image into the neural net and get out a particular grid location or at least a set of likely candidates. The neural network was validated with the remaining 34 million photos in the data set.
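Getting "a particular grid location or at least a set of likely candidates" out of a classifier usually means turning the network's raw per-cell scores into probabilities and keeping the highest ones. The sketch below shows that step in isolation, assuming the network emits one score per grid square; the function names and the example scores are hypothetical and not taken from PlaNet.

```python
import math


def softmax(scores):
    """Convert raw per-cell scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_candidates(scores, k=3):
    """Return the k most likely (cell_index, probability) pairs, best first."""
    probs = softmax(scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Made-up scores for a four-cell grid; a real grid would have thousands of cells.
example_scores = [0.1, 2.0, 1.0, -1.0]
candidates = top_candidates(example_scores, k=2)
```

Here `candidates` lists cell 1 first and cell 2 second, each paired with its probability, so a caller can either take the single best guess or inspect several plausible squares.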
In order to test PlaNet, the Google team took 2.3 million geotagged images from online photo library Flickr and asked PlaNet to identify their location.
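Scoring a test like this typically comes down to measuring the distance between each predicted location and the photo's true geotag, then counting how many predictions land within a given radius. The following is a rough sketch of that calculation using the haversine great-circle formula. The 1 km and 25 km radii in the example, the function names and the sample error values are assumptions for illustration, not figures confirmed by the article.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fraction_within(errors_km, threshold_km):
    """Share of predictions whose error is no larger than the threshold."""
    return sum(e <= threshold_km for e in errors_km) / len(errors_km)


# Made-up errors for four photos, in kilometres.
sample_errors_km = [0.5, 10.0, 300.0, 3000.0]
street_share = fraction_within(sample_errors_km, 1.0)   # assumed "street-level" radius
city_share = fraction_within(sample_errors_km, 25.0)    # assumed "city-level" radius
```

In practice each entry in the error list would come from `haversine_km(predicted_lat, predicted_lon, true_lat, true_lon)` for one test photo.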
"PlaNet is able to localise 3.6% of the images at street-level accuracy and 10.1% at city-level accuracy," Weyand's team wrote in their academic paper.
The results weren't perfect but PlaNet still outperformed some of the most well-travelled humans on a Google Street View test.
PlaNet's guesses had a median error of 1,131.7km, while 10 well-travelled humans managed a median error of 2,320.75km.
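A head-to-head comparison of this kind reduces to two simple statistics over per-round errors: who was closer in each round, and the median error on each side. The sketch below shows that arithmetic with invented numbers; `compare_rounds` and both error lists are hypothetical and do not reproduce the experiment's data.

```python
from statistics import median


def compare_rounds(model_errors_km, human_errors_km):
    """Return (rounds the model won, model median error, human median error)."""
    if len(model_errors_km) != len(human_errors_km):
        raise ValueError("both sides need one error per round")
    # The model wins a round when its guess is strictly closer than the human's.
    wins = sum(m < h for m, h in zip(model_errors_km, human_errors_km))
    return wins, median(model_errors_km), median(human_errors_km)


# Invented errors for five rounds, in kilometres.
model_errors_km = [120.0, 900.0, 1500.0, 40.0, 3000.0]
human_errors_km = [300.0, 700.0, 2500.0, 60.0, 5000.0]
wins, model_median, human_median = compare_rounds(model_errors_km, human_errors_km)
```

The median is a sensible summary here because a few wildly wrong guesses, thousands of kilometres off, would otherwise dominate a plain average.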
"In total, PlaNet won 28 of the 50 rounds with a median localisation error of 1131.7 km, while the median human localisation error was 2320.75 km," Weyand's team wrote. "[This] small-scale experiment shows that PlaNet reaches superhuman performance at the task of geolocating Street View scenes."