Geo Sensitive Word Discovery
Among geolocation-related information, the geo-sensitive word is one of the most critical components. A geo-sensitive word can be a word or phrase for a landmark in a city, a city or county name, an abbreviation, a sports team name in the city, or a common word or phrase with a special meaning in a local region. In this thesis, we propose and evaluate an effective and efficient framework for discovering geo-sensitive words hidden in tweets. The framework addresses two obstacles: the lack of a suitable dataset and the embedding alignment problem. The proposed framework makes three key contributions: (i) a publicly available dataset containing geo-tagged English tweets from 27 cities in the United States; (ii) a concrete approach to aligning separately trained word embeddings with Orthogonal Procrustes; and (iii) a well-rounded evaluation framework for geo-sensitive words. The system discovers over 3000 geo-sensitive words in three cities and classifies these words into their corresponding cities with 95.32% accuracy. We also find two key factors that affect classification performance: (i) the feature vector dimension; and (ii) the choice of learning algorithm.
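The alignment step named in contribution (ii) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `procrustes_align` is hypothetical, the embeddings are synthetic random data, and the sketch assumes both embedding matrices cover the same vocabulary in the same row order. Orthogonal Procrustes finds the orthogonal matrix R that minimizes the Frobenius norm of (source · R − target), which has a closed-form solution via the singular value decomposition.

```python
import numpy as np


def procrustes_align(source, target):
    """Return the orthogonal matrix R minimizing ||source @ R - target||_F.

    Hypothetical helper for illustration. Rows of `source` and `target`
    are assumed to be vectors for the same words, in the same order.
    """
    # Closed-form solution: SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


# Synthetic stand-in for two separately trained embeddings of one vocabulary:
# the second space is the first one under an unknown rotation/reflection.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 8))                 # 50 words, 8 dimensions
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))    # random orthogonal matrix
rotated = base @ q

R = procrustes_align(rotated, base)
aligned = rotated @ R                           # mapped back into the base space
```

Because R is constrained to be orthogonal, the mapping preserves distances and angles within the source embedding, so only its orientation relative to the target changes.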
Xue, Haiping (2019). Geo Sensitive Word Discovery. Master's thesis, Texas A&M University. Available electronically from