The language data repository: machine readable storage for spoken language data
Abstract
The Language Data Repository (LDR) project is developing a software architecture that can store the transcripts and recordings of spoken language data and host software tools that aid in analyzing those data. Multiple people can use the proposed architecture to store linguistic data from multiple languages. The data can be kept either on local machines or on remote machines that many users can access simultaneously over a network. The primary user community for the LDR software is a targeted subset of linguists who study language groups with no officially established or standardized writing system. These linguistic field workers are typically involved in activities such as learning these "unwritten" languages, developing orthographic systems, beginning literacy programs, and producing written texts in the new orthography (e.g., Bible translations and traditional stories). The secondary user community consists of linguists who need a reliable method of storing spoken language data and their transcripts, whether or not the language has an established or standardized written code. Such a software system offers two main improvements over current paper-based methods of recording transcripts of linguistic data. First, machine-readable storage will let linguists use computational tools in their analysis, making it faster and more accurate to test and evaluate hypotheses about the rules that govern a linguistic system. Second, a standardized method of recording data in a machine-readable format will improve linguists' ability to document their research and share their results with more colleagues than was previously possible. A further benefit of distributing primary data more widely is that more people can test various hypotheses simultaneously.
Description
Includes bibliographical references (leaf 48).
Citation
Audenaert, Michael Neal (2000). The language data repository: machine readable storage for spoken language data. Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/ETD-TAMU-2000-Fellows-Thesis-A93.