Feature identification framework and applications (FIFA)
Date
2006-04-12
Publisher
Texas A&M University
Abstract
Large digital libraries typically contain large collections of heterogeneous resources
intended to be delivered to a variety of user communities. One key challenge for these
libraries is providing tight integration between resources both within a single collection
and across the several collections of the library without requiring hand coding. One key
tool in doing this is elucidating the internal structure of the digital resources and using
that structure to form connections between the resources. The heterogeneous nature of
the collections and the diversity of needs in the user communities complicate this
task. Accordingly, in this thesis, I describe an approach to implementing a feature
identification system to support digital collections that provides a general framework for
applications while allowing decisions about the details of document representation and
feature identification to be deferred to domain-specific implementations of that
framework. These deferred decisions include details of the semantics and syntax of
markup, the types of metadata to be attached to documents, the types of features to be
identified, the feature identification algorithms to be applied, and which features should
be indexed. This approach results in strong support for the general aspects of developing
a feature identification system, allowing future work to focus on the details of applying
that system to the specific needs of individual collections and user communities.
Keywords
humanities informatics, humanities computing, collection enhancement, feature identification, named entity recognition