Error Correction Using Probabilistic Language Models
Abstract
Error correction has applications in many domains, since errors of various kinds are prevalent and must be corrected programmatically as accurately as possible. For example, mobile devices use error correction to fix typographical errors in keypad input. It is also useful in lower-level applications, such as repairing errors in storage media or in network transmission. The precision and impact of such techniques vary with the requirements and the capabilities of the correction technique, but they are essential to the effective functioning of the applications that rely on them.
This research focuses on techniques for correcting errors when the location of the erroneous token is known. The errors are erasures: missing bits, at known locations, in a stream of binary data. The basic idea is to build contextual information from an error-free training corpus and to use the resulting models to suggest alternatives that could replace the erroneous tokens. We examine two models: the topic-based LDA (Latent Dirichlet Allocation) model and the N-gram model. We also propose an efficient mechanism for processing such errors that offers exponential speed-ups. Using these models, we achieve up to a 5% improvement in accuracy over a standard word distribution model while using minimal domain knowledge.
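To make the erasure setting concrete, the following is a minimal sketch of N-gram-based correction. It is not the thesis's implementation. It assumes a bigram model with add-one smoothing, 8-bit ASCII encoding of words, and a toy corpus. All function names (`train_bigrams`, `to_bits`, `erase`, `correct`) and the corpus are hypothetical and chosen for illustration only. Erased bits are marked with `?`, candidates are the vocabulary words consistent with the surviving bits, and the previous word ranks them.

```python
from collections import Counter, defaultdict

# Toy error-free training corpus (hypothetical, for illustration only).
CORPUS = [
    "the cat sat on the mat",
    "the cat ate the rat",
    "the cat saw a bat",
]

def train_bigrams(sentences):
    """Count unigrams and bigrams over whitespace-separated words."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        unigrams.update(words)
        for prev, cur in zip(words, words[1:]):
            bigrams[prev][cur] += 1
    return unigrams, bigrams

def to_bits(word):
    """Encode a word as a string of bits, 8 bits per ASCII character."""
    return "".join(format(byte, "08b") for byte in word.encode("ascii"))

def erase(word, positions):
    """Return the word's bit string with the given bit positions erased ('?')."""
    erased = set(positions)
    return "".join("?" if i in erased else b for i, b in enumerate(to_bits(word)))

def consistent(pattern, word):
    """True if the word's bits agree with every non-erased bit of the pattern."""
    bits = to_bits(word)
    return len(bits) == len(pattern) and all(
        p == "?" or p == b for p, b in zip(pattern, bits)
    )

def correct(pattern, prev_word, unigrams, bigrams):
    """Pick the consistent vocabulary word most likely to follow prev_word."""
    candidates = [w for w in unigrams if consistent(pattern, w)]
    if not candidates:
        return None
    following = bigrams.get(prev_word, Counter())
    total = sum(following.values())
    vocab_size = len(unigrams)

    def score(word):
        # Add-one smoothed estimate of P(word | prev_word).
        return (following[word] + 1) / (total + vocab_size)

    return max(candidates, key=score)

unigrams, bigrams = train_bigrams(CORPUS)
# All 8 bits of the first character of "cat" are erased; the locations are known.
pattern = erase("cat", range(8))
print(correct(pattern, "the", unigrams, bigrams))  # prints "cat"
```

Because the erasure locations are known, the surviving bits prune the candidate set before the language model is consulted. In this sketch the pattern above matches "cat", "sat", "mat", "rat", and "bat", and the context word "the" selects "cat".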
Citation
Sunder, Gowrishankar (2015). Error Correction Using Probabilistic Language Models. Master's thesis, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/155125.