
dc.contributor.advisor: Goldberg, Daniel W.
dc.creator: Yin, Zhengcong
dc.date.accessioned: 2022-07-27T16:43:13Z
dc.date.available: 2023-12-01T09:22:34Z
dc.date.created: 2021-12
dc.date.issued: 2021-12-03
dc.date.submitted: December 2021
dc.identifier.uri: https://hdl.handle.net/1969.1/196374
dc.description.abstract: Geocoding and reverse geocoding are processes that translate between human-readable location information and machine-readable coordinates. The former converts human-readable location information into machine-readable coordinates, and the latter does the opposite. Both perform an essential data-processing function that enables further spatial analysis in a variety of fields, such as public health. As a result, any subsequent study or analysis that employs geocoded data is directly affected by the quality of the output from geocoding and reverse geocoding systems. Specifically, there are three indicators of output quality: (1) match rate, which indicates the proportion of input data that is successfully geocoded and determines the usability of the geocoded data for further analysis; (2) spatial accuracy, which supports the validity of studies that employ the geocoded data as input; and (3) clear and concise metadata descriptions, which provide confidence when selecting the most relevant geocoded and reverse geocoded output. In this work, the processes of geocoding and reverse geocoding are decomposed to reveal the limitations of existing solutions. Geocoding is divided into two sub-processes: (1) text retrieval, which matches an input description with the candidate of highest textual similarity, and (2) geocoding interpolation, which derives the final output coordinates from the geometrical and spatial attributes of the retrieved reference candidates. Examining these sub-processes identifies two limitations of existing geocoding systems: the inability to handle erroneous geocoding inputs and the drawbacks of typical geocoding interpolation methods. Reverse geocoding likewise has two sub-processes: (1) matching the candidates most similar to the respective human input and (2) re-ranking the candidates according to specific criteria.
The limitations of existing reverse geocoding systems are the exclusion of topological relationships among reference data, the disregard of input uncertainty, and unclear metadata descriptions. To overcome these limitations, three branches of research are conducted. (1) To improve the robustness of geocoding systems to low-quality input, a set of parsing, matching, and ranking methods is selected. Specifically, a unified evaluation protocol for geocoding text retrieval tasks (i.e., parsing, matching, and ranking) is defined. Next, a geocoding input dataset containing different degrees of errors and variants is synthesized by mining human input patterns from existing geocoding transactions. This dataset is then used to benchmark a set of geocoding parsing, matching, and ranking methods built upon Natural Language Processing (NLP) and Information Retrieval (IR) techniques. (2) A novel geocoding interpolation approach, which incorporates Computer Vision (CV) techniques, is developed to overcome the parcel homogeneity assumption made by the linear interpolation method, the parcel centroid assumption made by the polygon interpolation method, and the limited coverage of reference data used by the point interpolation method. (3) A new reverse geocoding ranking approach is introduced, which ranks output candidates by the geometrical and topological attributes of the retrieved reference data, propagates input uncertainty to the output, and fully quantifies each candidate based on relevance. Together, these three branches aim to improve the match rate, spatial accuracy, and metadata descriptions of geocoding and reverse geocoding systems when facing low-quality input.
These improvements to geocoding and reverse geocoding systems could in turn benefit the various spatial analyses and applications that use these systems as part of their data-processing pipelines.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Geocoding
dc.subject: Reverse Geocoding
dc.title: Systematic Approaches to Improving Geocoding and Reverse Geocoding Systems
dc.type: Thesis
thesis.degree.department: Geography
thesis.degree.discipline: Geography
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Doctor of Philosophy
thesis.degree.level: Doctoral
dc.contributor.committeeMember: Katzfuß, Matthias
dc.contributor.committeeMember: Huang, Ruihong
dc.contributor.committeeMember: Lyle, Stacey
dc.type.material: text
dc.date.updated: 2022-07-27T16:43:13Z
local.embargo.terms: 2023-12-01
local.etdauthor.orcid: 0000-0001-7199-5517

