Systematic Approaches to Improving Geocoding and Reverse Geocoding Systems
Abstract
Geocoding and reverse geocoding are processes that enable transitions between human-readable location information and machine-readable coordinates. Geocoding converts human-readable information into machine-readable coordinates, and reverse geocoding does the opposite. These two processes perform an essential data-processing function that enables further spatial analysis in a variety of fields, such as public health. As a result, any subsequent studies and analyses that employ geocoded data are directly affected by the quality of the output from geocoding and reverse geocoding systems. Specifically, there are three indicators of output quality: (1) match rate, which indicates the proportion of input that is successfully geocoded and determines the usability of geocoded data for further analysis; (2) spatial accuracy, which underpins the validity of studies that use the geocoded data as input; and (3) clear and concise metadata descriptions, which provide confidence for selecting the most relevant geocoded and reverse geocoded output.
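The first two quality indicators can be computed directly once ground truth is available. The Python sketch below is illustrative only. It assumes a failed geocode is recorded as `None`, measures spatial accuracy as the haversine (great-circle, spherical-Earth) distance between each output and a known true location, and uses made-up sample coordinates. The function names are hypothetical, and the dissertation's own evaluation may define these measures differently.

```python
import math
import statistics

def match_rate(results):
    """Share of inputs that produced a coordinate; None marks a failed geocode (assumption)."""
    if not results:
        return 0.0
    return sum(r is not None for r in results) / len(results)

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres on a spherical Earth of the given radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# Hypothetical geocoder output (lat, lon) per input record, with matching ground truth.
outputs = [(30.6150, -96.3410), None, (30.6280, -96.3340), (30.6010, -96.3150)]
truth = [(30.6152, -96.3412), (30.6100, -96.3300), (30.6275, -96.3345), (30.6015, -96.3150)]

# Spatial accuracy can only be measured for records that were actually matched.
errors_m = [haversine_m(*o, *t) for o, t in zip(outputs, truth) if o is not None]
print(f"match rate: {match_rate(outputs):.2f}")
print(f"median positional error: {statistics.median(errors_m):.1f} m")
```

Reporting a median rather than a mean keeps a few badly misplaced records from dominating the accuracy summary; either statistic is a design choice, not something the abstract prescribes.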
In this work, the processes of geocoding and reverse geocoding are decomposed to reveal the limitations of existing solutions. Geocoding is divided into two sub-processes: (1) text retrieval, which matches an input description to the candidate with the highest textual similarity, and (2) geocoding interpolation, which derives the final output coordinates from the geometrical and spatial attributes of the retrieved reference candidates. Examining these sub-processes identifies two limitations of existing geocoding systems: their inability to handle erroneous geocoding input, and the drawbacks of typical geocoding interpolation methods. Reverse geocoding likewise comprises two sub-processes: (1) retrieving the candidates most similar to the input coordinates and (2) re-ranking those candidates according to specific criteria. The limitations of existing reverse geocoding systems are the exclusion of topological relationships among reference data, the neglect of input uncertainty, and unclear metadata descriptions.
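The two geocoding sub-processes can be sketched end to end. This is a minimal illustration under stated assumptions, not the dissertation's method. The reference records are hypothetical street segments with address ranges. Text retrieval uses `difflib.SequenceMatcher` similarity on normalized street names, a simple stand-in for the NLP and IR methods the work benchmarks. Interpolation is plain linear interpolation along the segment, ignoring odd/even sides and setback offsets.

```python
import re
from difflib import SequenceMatcher

# Hypothetical street-segment reference data: name, address range, endpoints as (lon, lat).
REFERENCE = [
    {"street": "MAIN ST", "from_num": 100, "to_num": 198,
     "start": (-96.3400, 30.6200), "end": (-96.3380, 30.6200)},
    {"street": "MAIN AVE", "from_num": 100, "to_num": 198,
     "start": (-96.3300, 30.6100), "end": (-96.3300, 30.6120)},
    {"street": "MAPLE ST", "from_num": 1, "to_num": 99,
     "start": (-96.3500, 30.6000), "end": (-96.3480, 30.6000)},
]

def normalize(text):
    """Upper-case, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.upper()).split())

def retrieve(street_text, reference):
    """Text retrieval: the record whose street name is most similar to the input."""
    query = normalize(street_text)
    return max(reference, key=lambda rec: SequenceMatcher(None, query, rec["street"]).ratio())

def linear_interpolate(house_number, rec):
    """Place the address along the segment in proportion to its position in the range."""
    span = rec["to_num"] - rec["from_num"]
    t = 0.0 if span == 0 else (house_number - rec["from_num"]) / span
    t = min(max(t, 0.0), 1.0)  # clamp house numbers outside the stated range
    (x0, y0), (x1, y1) = rec["start"], rec["end"]
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))

def geocode(house_number, street_text):
    """Hypothetical two-stage geocoder: retrieve a segment, then interpolate on it."""
    rec = retrieve(street_text, REFERENCE)
    return rec["street"], linear_interpolate(house_number, rec)

print(geocode(149, "Main st."))
print(geocode(149, "Mian Street"))  # transposed letters still retrieve MAIN ST here
```

Because house number 149 sits at the midpoint of the 100 to 198 range, it lands at the midpoint of the segment whatever the real parcel sizes are. This even-spacing behaviour is the kind of interpolation drawback the abstract refers to.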
To overcome these limitations, three branches of research are conducted. (1) To improve the robustness of geocoding systems to low-quality input, a set of parsing, matching, and ranking methods is selected. Specifically, a unified evaluation protocol tailored to geocoding text retrieval tasks (i.e., parsing, matching, and ranking) is defined. Next, a geocoding input dataset containing different degrees of errors and variants is synthesized by mining human input patterns from existing geocoding transactions. This dataset is then used to benchmark a set of geocoding parsing, matching, and ranking methods built upon Natural Language Processing (NLP) and Information Retrieval (IR) techniques. (2) A novel geocoding interpolation approach that incorporates Computer Vision (CV) techniques is developed to overcome the parcel homogeneity assumption made by the linear interpolation method, the parcel centroid assumption made by the polygon interpolation method, and the limited coverage of reference data used by the point interpolation method. (3) A new reverse geocoding ranking approach is introduced, which ranks output candidates by the geometrical and topological attributes provided by the retrieved reference data, propagates input uncertainty to the output, and fully quantifies the relevance of each candidate.
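Propagating input uncertainty into reverse geocoding ranking can be illustrated with a small example. This is a generic sketch, not the approach developed in the dissertation. It assumes the input coordinate has isotropic Gaussian error with a known standard deviation, and that candidate parcels are hypothetical axis-aligned rectangles in a projected metric coordinate system. Under those assumptions, the probability that the true location lies inside a parcel factors into independent x and y terms. Real parcels are arbitrary polygons and would need numerical integration or sampling instead.

```python
import math
from dataclasses import dataclass

@dataclass
class Parcel:
    """Hypothetical candidate parcel as an axis-aligned box in a local metric CRS (metres)."""
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def containment_probability(x, y, sigma, p):
    """P(true location in parcel p) for an observed point (x, y) with isotropic Gaussian
    error sigma; x and y errors are independent, so the probability factors."""
    px = normal_cdf((p.xmax - x) / sigma) - normal_cdf((p.xmin - x) / sigma)
    py = normal_cdf((p.ymax - y) / sigma) - normal_cdf((p.ymin - y) / sigma)
    return px * py

def rank_candidates(x, y, sigma, parcels):
    """Return (label, probability) pairs, most probable first; probabilities act as metadata."""
    scored = [(p.label, containment_probability(x, y, sigma, p)) for p in parcels]
    return sorted(scored, key=lambda s: s[1], reverse=True)

parcels = [
    Parcel("101 Main St", 0, 0, 20, 30),
    Parcel("103 Main St", 20, 0, 30, 30),  # narrow neighbouring parcel
    Parcel("105 Main St", 30, 0, 60, 30),
]

# Observed fix just inside the narrow parcel, with an assumed 8 m positional error.
for label, prob in rank_candidates(22.0, 15.0, 8.0, parcels):
    print(f"{label}: {prob:.2f}")
```

With these numbers the parcel containing the observed fix ranks first, but only at roughly 0.41 against roughly 0.37 for its neighbour. A point-in-polygon lookup that ignores input uncertainty would return the first parcel with implicit certainty. Exposing the probabilities is one way to give users the relevance information needed to choose among candidates.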
Together, these three branches aim to improve the match rate, spatial accuracy, and metadata descriptions of geocoding and reverse geocoding systems when they face low-quality input. Better geocoding and reverse geocoding systems could, in turn, benefit the many spatial analyses and applications that use these systems as part of their data-processing pipelines.
Citation
Yin, Zhengcong (2021). Systematic Approaches to Improving Geocoding and Reverse Geocoding Systems. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/196374.