dc.contributor.advisor: Furuta, Richard
dc.creator: Meneses Macchiavello, Luis Davi
dc.date.accessioned: 2016-09-22T19:44:17Z
dc.date.available: 2018-08-01T05:57:36Z
dc.date.created: 2016-08
dc.date.issued: 2016-07-05
dc.date.submitted: August 2016
dc.identifier.uri: https://hdl.handle.net/1969.1/158004
dc.description.abstract: It is not unusual for digital collections to degrade and suffer from problems associated with unexpected change. In previous analyses, I have found that categorizing the degree of change affecting a digital collection over time is a difficult task. More specifically, I found that categorizing this degree of change is not a binary problem where documents are either unchanged or they have changed so dramatically that they do not fit within the scope of the collection. It is, in part, a characterization of the intent of the change. In this dissertation, I present a study that compares change detection methods based on machine learning algorithms against the assessment made by human subjects in a user study. Consequently, this dissertation focuses on two research questions. First, how can we categorize the various degrees of change that documents can endure? This point becomes increasingly interesting if we take into account that the resources found in a digital library are often curated and maintained by experts with affiliations to professionally managed institutions. And second, how do the automatic detection methods fare against the human assessment of change in the ACM conference list? The results of this dissertation are threefold. First, I provide a categorization framework that highlights the different instances of change that I found in an analysis of the Association for Computing Machinery conference list. Second, I focus on a set of procedures to classify the documents according to the characteristics of change that they exhibit. Finally, I evaluate the classification procedures against the assessment of human subjects.
Taking into account the results of the user evaluation and the inability of the test subjects to recognize some instances of change, the main conclusion that I derive from my dissertation is that managing the effects of unexpected change is a more serious problem than had previously been anticipated, thus requiring the immediate attention of collection managers and curators. [en]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: web resource management [en]
dc.subject: distributed collections [en]
dc.subject: web change classification [en]
dc.title: Identifying the Effects of Unexpected Change in a Distributed Collection of Web Documents [en]
dc.type: Thesis [en]
thesis.degree.department: Computer Science and Engineering [en]
thesis.degree.discipline: Computer Science [en]
thesis.degree.grantor: Texas A & M University [en]
thesis.degree.name: Doctor of Philosophy [en]
thesis.degree.level: Doctoral [en]
dc.contributor.committeeMember: Shipman, Frank
dc.contributor.committeeMember: Caverlee, James
dc.contributor.committeeMember: Mandell, Laura
dc.type.material: text [en]
dc.date.updated: 2016-09-22T19:44:17Z
local.embargo.terms: 2018-08-01
local.etdauthor.orcid: 0000-0001-8165-3545

