Geolocation Inferencing on Social Media Using Gaussian Mixture Model
Abstract
Modeling human behavior on social media can provide valuable insights into crowd behavior. Social media posts can serve as sensory data for understanding and predicting how crowds react to a local or international event, which enables applications that predict elections, track flu outbreaks, and detect earthquakes. However, this analysis requires geo-tagged data, and most social media data have no associated location. Many models and algorithms have been proposed to infer a user's location from his or her social media profile. Unfortunately, most of these methods are not scalable or robust enough for real-world applications. In this research, I tested and improved a Gaussian Mixture Model (GMM) on datasets ranging from 325,875 to 2,332,305 tweets to predict a Twitter user’s location based purely on tweet content. The experiments vary the tokenization approach, dataset size, temporal features, and languages in the dataset, and lead to the conclusion that GMM can indeed address the location-sparsity issue in social media and pave the way for location-based personalized information services.
Citation
Ali, Nazif (2017). Geolocation Inferencing on Social Media Using Gaussian Mixture Model. Undergraduate Research Scholars Program. Available electronically from https://hdl.handle.net/1969.1/177580.