Highlights:
- Graduate students at Stanford University developed an AI model that can determine specific locations with impressive accuracy by looking at Google Street View images.
- The application, called PIGEON, predicts the country with 92% accuracy and places 40% of its guesses within 25 kilometers of the target location.
- The model builds on CLIP, a neural network developed by OpenAI, and was trained on a dataset from the GeoGuessr game.
Webmaster’s Home (ChinaZ.com), December 20: Graduate students at Stanford University have developed an application called PIGEON that can determine where a picture was taken simply by looking at Google Street View images or other images, and its accuracy is impressive.
According to the preprint paper, PIGEON predicts the country in which a photo was taken with 92% accuracy, and 40% of its guesses fall within 25 kilometers of the target location. The paper notes that PIGEON ranks among the top 0.01% of players in GeoGuessr, a game that asks users to guess locations from Google Street View images and that inspired the project.
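To make the "within 25 kilometers" figure concrete, here is a minimal sketch of how such a metric could be computed from guessed and true coordinates. It assumes great-circle (haversine) distance on a spherical Earth; the function names and the sample coordinates are illustrative and are not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fraction_within(guesses, targets, radius_km=25.0):
    """Share of guesses that land within radius_km of their target (hypothetical helper)."""
    hits = sum(
        haversine_km(g[0], g[1], t[0], t[1]) <= radius_km
        for g, t in zip(guesses, targets)
    )
    return hits / len(targets)


# Illustrative data only: one near miss (about 11 km) and one far miss (about 111 km).
guesses = [(0.0, 0.1), (0.0, 1.0)]
targets = [(0.0, 0.0), (0.0, 0.0)]
print(fraction_within(guesses, targets))
```

Under this definition, a reported figure of 40% would mean that four in ten guesses fall inside a 25-kilometer circle around the true location.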
So, how does PIGEON work?
The students built on CLIP, a neural network developed by OpenAI that learns to connect text and images by training on the names of visual categories. They then trained the model on a GeoGuessr dataset containing 100,000 randomly sampled locations, with four images per location to cover the entire "panorama", for a total of 400,000 images. This is a relatively small number of training images compared with other AI models. For example, OpenAI’s popular image generation model DALL-E 2 was trained on hundreds of millions of images.
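The general idea behind connecting text and images can be sketched as follows: an image and several captions are each mapped to a vector, and the caption whose vector points in the most similar direction is chosen. This is only a toy illustration of that matching step, not PIGEON's pipeline. The three-dimensional vectors and the captions below are made up for the example, whereas a real model such as CLIP produces much larger embeddings from its trained encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_caption(image_vec, caption_vecs):
    """Return the caption whose embedding is most similar to the image embedding."""
    return max(caption_vecs, key=lambda c: cosine(image_vec, caption_vecs[c]))


# Hypothetical embeddings, hand-written for illustration only.
image_vec = [0.9, 0.1, 0.2]
caption_vecs = {
    "a street in Japan": [0.8, 0.2, 0.1],
    "a street in Brazil": [0.1, 0.9, 0.3],
}

print(best_caption(image_vec, caption_vecs))
```

With these made-up numbers the first caption wins because its vector lies closest in direction to the image vector.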
In addition, the students developed a separate model called PIGEOTTO, which was trained on 4 million photos from Flickr and Wikipedia to identify locations from a single image. According to the paper, PIGEOTTO achieves impressive results on image geolocation benchmarks, surpassing previous state-of-the-art results by 7.7% in city accuracy and 29.8% in country accuracy.
The paper also explores the ethical considerations associated with this model, including its benefits and risks. On the one hand, image geolocation has many positive uses, such as autonomous driving, visual surveys, and satisfying curiosity about where a photo was taken. On the other hand, it has negative impacts, most directly the invasion of privacy. The students therefore decided not to release the model weights publicly and to release the code only for academic verification.
This research shows the huge potential of AI for image geolocation, but it also raises privacy and ethical concerns. Future development must pay greater attention to these issues and ensure that appropriate protective measures are in place.
Paper URL: https://arxiv.org/abs/2307.05845