Feature identification framework and applications (FIFA)

Date

2006-04-12

Publisher

Texas A&M University

Abstract

Large digital libraries typically contain large collections of heterogeneous resources intended to be delivered to a variety of user communities. One key challenge for these libraries is providing tight integration between resources, both within a single collection and across the several collections of the library, without requiring hand coding. One key tool for doing this is elucidating the internal structure of the digital resources and using that structure to form connections between the resources. The heterogeneous nature of the collections and the diversity of needs in the user communities complicate this task. Accordingly, in this thesis, I describe an approach to implementing a feature identification system to support digital collections that provides a general framework for applications while allowing decisions about the details of document representation and feature identification to be deferred to domain-specific implementations of that framework. These deferred decisions include details of the semantics and syntax of markup, the types of metadata to be attached to documents, the types of features to be identified, the feature identification algorithms to be applied, and which features should be indexed. This approach results in strong support for the general aspects of developing a feature identification system, allowing future work to focus on the details of applying that system to the specific needs of individual collections and user communities.
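
The split between a general framework and deferred, domain-specific decisions can be illustrated with a minimal Python sketch. This is not the thesis's implementation. Every name here (Feature, Document, FeatureIdentifier, YearIdentifier, FeaturePipeline) is hypothetical, and the regular-expression year matcher is only a stand-in for whatever identification algorithm a given collection would supply.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple
import re


@dataclass
class Feature:
    kind: str   # feature type; the set of types is a domain decision
    text: str   # surface form found in the document
    start: int  # character offsets into the document text
    end: int


@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)  # metadata schema is deferred


class FeatureIdentifier(ABC):
    """Extension point: the algorithm is supplied per domain."""

    @abstractmethod
    def identify(self, doc: Document) -> List[Feature]:
        ...


class YearIdentifier(FeatureIdentifier):
    """Toy domain-specific identifier (assumption): four-digit years."""

    _pattern = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")

    def identify(self, doc: Document) -> List[Feature]:
        return [Feature("year", m.group(), m.start(), m.end())
                for m in self._pattern.finditer(doc.text)]


class FeaturePipeline:
    """General part: runs the identifiers and indexes selected features."""

    def __init__(self, identifiers: Iterable[FeatureIdentifier],
                 should_index: Callable[[Feature], bool]):
        self.identifiers = list(identifiers)
        self.should_index = should_index  # which features to index is deferred
        self.index: Dict[Tuple[str, str], Set[str]] = {}

    def process(self, doc_id: str, doc: Document) -> List[Feature]:
        features = [f for ident in self.identifiers
                    for f in ident.identify(doc)]
        for f in features:
            if self.should_index(f):
                self.index.setdefault((f.kind, f.text), set()).add(doc_id)
        return features

    def related(self, doc_id: str) -> Set[str]:
        """Documents sharing at least one indexed feature with doc_id."""
        found: Set[str] = set()
        for ids in self.index.values():
            if doc_id in ids:
                found |= ids
        found.discard(doc_id)
        return found
```

In this sketch the pipeline knows nothing about markup, metadata, or feature types. A collection would plug in its own identifiers and indexing policy, and the shared index is what forms connections between resources.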

Keywords

humanities informatics, humanities computing, collection enhancement, feature identification, named entity recognition

Citation