dc.contributor.advisor: Kum, Hye-Chung
dc.contributor.advisor: da Silva, Dilma
dc.creator: Ilangovan, Gurudev
dc.date.accessioned: 2019-11-25T20:47:42Z
dc.date.available: 2021-08-01T07:32:19Z
dc.date.created: 2019-08
dc.date.issued: 2019-06-19
dc.date.submitted: August 2019
dc.identifier.uri: https://hdl.handle.net/1969.1/186390
dc.description.abstract [en]: Record linkage, which refers to identifying the same entities across several databases in the absence of a unique identifier, is a crucial step for data integration. In this research, we study the effectiveness and efficiency of different machine learning algorithms (SVMs, random forests, and neural networks) for linking databases in a controlled experiment. We control for the percentage of heterogeneity in the data and the size of the training dataset. We evaluate the algorithms on (1) the quality of linkages, such as the F1 score under a one-threshold model, and (2) the size of the uncertain region that needs manual review under a two-threshold model. We find that random forests performed very well both on traditional metrics such as the F1 score (99.2% to 95.9%) and on manual review set size (7.1% to 21%) for error rates from 0% to 60%. Although the algorithms (random forests, SVMs, and neural networks) achieved similar F1 scores, random forests outperformed the next-best model by 28% on average in the percentage of pairs that need manual review.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject [en]: Record Linkage
dc.subject [en]: Machine Learning
dc.title [en]: Benchmarking the Effectiveness and Efficiency of Machine Learning Algorithms for Record Linkage
dc.type [en]: Thesis
thesis.degree.department [en]: Computer Science and Engineering
thesis.degree.discipline [en]: Computer Science
thesis.degree.grantor [en]: Texas A&M University
thesis.degree.name [en]: Master of Science
thesis.degree.level [en]: Masters
dc.contributor.committeeMember: Fossett, Mark
dc.type.material [en]: text
dc.date.updated: 2019-11-25T20:47:43Z
local.embargo.terms: 2021-08-01
local.etdauthor.orcid: 0000-0003-3973-1620
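
The abstract contrasts two evaluation views: the F1 score when a single threshold splits record pairs into links and non-links, and the fraction of pairs left in an uncertain band between two thresholds that must be reviewed by hand. The following is a minimal Python sketch of those two metrics, not the thesis's code. It assumes each candidate record pair already has a match score in [0, 1] (for instance, a classifier's predicted probability) and a ground-truth label; the function names, toy data, and threshold values are hypothetical, and this record does not say how the thesis chose its thresholds.

```python
from typing import List, Tuple


def f1_one_threshold(scores: List[float], labels: List[bool], t: float) -> float:
    """One-threshold model: a pair is declared a link iff its score >= t."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
    if tp == 0:
        # No correctly predicted links: treat F1 as 0 (also avoids division by zero).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def two_threshold_split(
    scores: List[float], lower: float, upper: float
) -> Tuple[List[int], List[int], List[int]]:
    """Two-threshold model: score >= upper -> link, score < lower -> non-link,
    and anything in [lower, upper) is uncertain and goes to manual review.
    Returns the pair indices in each of the three groups."""
    links = [i for i, s in enumerate(scores) if s >= upper]
    non_links = [i for i, s in enumerate(scores) if s < lower]
    review = [i for i, s in enumerate(scores) if lower <= s < upper]
    return links, non_links, review


def manual_review_fraction(scores: List[float], lower: float, upper: float) -> float:
    """Share of all candidate pairs that fall in the manual review band."""
    _, _, review = two_threshold_split(scores, lower, upper)
    return len(review) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical toy data, not results from the thesis.
    scores = [0.95, 0.9, 0.7, 0.55, 0.4, 0.2, 0.1, 0.05]
    labels = [True, True, True, False, True, False, False, False]
    print(f1_one_threshold(scores, labels, 0.5))        # 0.75
    print(manual_review_fraction(scores, 0.3, 0.8))     # 0.375
```

Widening the gap between the two thresholds can only remove pairs from the automatically decided groups, so it cannot increase the number of automatic misclassifications, but it enlarges the manual review set; comparing review set sizes, as the abstract does, measures how well a model's scores separate matches from non-matches along this trade-off.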
