The language data repository: machine readable storage for spoken language data

Audenaert, Michael Neal

Date

2013-02-22

Author

Audenaert, Michael Neal

Abstract

The Language Data Repository project is working to develop a software architecture capable of storing the transcripts and recordings of spoken language data and capable of hosting software tools to aid in the analysis of those data. The proposed software architecture can be used by multiple people to store linguistic data from multiple languages on either local machines or non-local machines that can be accessed via a network by multiple users simultaneously. The primary user community for the LDR software comes from a targeted subset of linguists conducting research on language groups with no officially established or standardized writing system. These linguistic field workers are typically involved in activities such as learning these "unwritten" languages, developing orthographic systems, beginning literacy programs, and producing written texts in the new orthographic system (e.g., Bible translations and traditional stories). The secondary user community consists of linguists who need a reliable method of storing spoken language data and the transcripts of those data, regardless of the existence of an established or standardized written code for that language. Such a software system offers two main improvements over current, paper-based methods of recording transcripts of linguistic data. First, by utilizing machine-readable storage, it will enable linguists to use computational tools to aid in linguistic analysis by increasing the ability to quickly and accurately test and evaluate linguistic hypotheses about the rules governing the linguistic systems. Second, a standardized method of recording data in a machine-readable format will enhance linguists' ability to document their research and share their results with a greater number of colleagues than previously possible. A benefit of this increase in the distribution of primary data to other colleagues is the ability for more people to test various hypotheses simultaneously.

URI

https://hdl.handle.net/1969.1/ETD-TAMU-2000-Fellows-Thesis-A93

Description

Due to the character of the original source materials and the nature of batch digitization, quality control issues may be present in this document. Please report any quality issues you encounter to digital@library.tamu.edu, referencing the URI of the item.
Includes bibliographical references (leaf 48).

Subject

computer science.
Major computer science.

Collections

University Undergraduate Research Fellows (1968–2012)

Citation

Audenaert, Michael Neal (2000). The language data repository: machine readable storage for spoken language data. Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/ETD-TAMU-2000-Fellows-Thesis-A93.