Show simple item record

dc.contributor.advisor    Elliott, Roger W.
dc.contributor.advisor    Wortham, A. W.
dc.creator    Wright, William Randolph
dc.date.accessioned    2020-01-08T17:40:33Z
dc.date.available    2020-01-08T17:40:33Z
dc.date.created    1973
dc.identifier.uri    https://hdl.handle.net/1969.1/DISSERTATIONS-158493
dc.description.abstract    A framework for classifying clustering algorithms and a method for performing comparative analyses have been developed. Sixteen models are defined and twenty-four algorithms from the literature are reviewed and classified within the framework. Four of these algorithms are selected for a comparative analysis experiment involving clustering of document abstracts. Six objective functions are defined to measure the "goodness" of a cluster set produced by an algorithm. A detailed description is provided for the collection of 548 abstracts which were selected from the Arms Control and Disarmament abstract journal. The abstracts were prepared in machine-readable form and a series of computer programs written to perform automatic indexing of the abstracts. The indexed abstract data base, referred to as an abstract-concept matrix, is clustered by the four algorithms. Two algorithms are the Rocchio and Dattola routines from the SMART information retrieval system. Two other algorithms, the Single-Link and the Maximal Complete Subgraph, are programmed for use in the experiment. The manual classification provided by the Library of Congress comprises a fifth cluster set for use in the analysis. The six objective function means are computed for each of the five cluster sets, and scaled to a common mean and variance. The resulting 5X6 data matrix is subjected to an analysis of variance and rank tests. The results of the analyses suggest that the computer algorithms produced better clusters than the manual classification, and that the Single-Link and Maximal Complete Subgraph algorithms produced better clusters than the Rocchio and Dattola algorithms. Generally, the objective functions appear to be consistent judges of the "goodness" of the cluster sets. It should be noted that these conclusions are limited by the fact that only one set of clusters was tested in the experiment. The eight major computer programs written for this research are provided. Some recommendations for further research are presented.    en
dc.format.extent    323 leaves    en
dc.format.medium    electronic    en
dc.format.mimetype    application/pdf
dc.language.iso    eng
dc.rights    This thesis was part of a retrospective digitization project authorized by the Texas A&M University Libraries. Copyright remains vested with the author(s). It is the user's responsibility to secure permission from the copyright holder(s) for re-use of the work beyond the provision of Fair Use.    en
dc.rights.uri    http://rightsstatements.org/vocab/InC/1.0/
dc.subject    computing science    en
dc.subject.classification    1973 Dissertation W954
dc.title    An experimental comparison of clustering techniques    en
dc.type    Thesis    en
thesis.degree.discipline    Computing Science    en
thesis.degree.grantor    Texas A&M University    en
thesis.degree.name    Doctor of Philosophy    en
thesis.degree.level    Doctoral    en
dc.contributor.committeeMember    Drew, Dan D.
dc.contributor.committeeMember    Pooch, Udo W.
dc.contributor.committeeMember    Rhyne, V. Thomas
dc.type.genre    dissertations    en
dc.type.material    text    en
dc.format.digitalOrigin    reformatted digital    en
dc.publisher.digital    Texas A&M University. Libraries



This item and its contents are restricted.