Construction of an Optimized Multi-Stage High-Throughput Virtual Screening Pipeline for Long Non-Coding RNA's
Abstract
Long non-coding RNAs (lncRNAs) are RNA transcripts longer than 200 nucleotides that are not translated into proteins. Studying lncRNAs is important because they have been found to affect a wide range of biological processes, such as epigenetic regulation, metabolic processes, chromosome dynamics, and cell differentiation. This work investigates how to distinguish lncRNAs from protein-coding transcripts (PCTs) using a multi-stage high-throughput virtual screening (HTVS) pipeline. Each stage of the pipeline is a support vector machine (SVM) model.
Various features associated with RNAs have been calculated. They fall into three groups: sequence-based, secondary-structure-based, and physicochemical-property-based. These features were first calculated and analyzed in a method called LncFinder. SVMs are supervised learning models, with associated learning algorithms, that analyze data for classification and regression analysis. The SVMs in this work have been trained on these features, with the features grouped by their complexity and by the time taken to calculate them.
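As a minimal sketch of what a sequence-based feature group might look like, the Python below computes transcript length, GC content, and k-mer frequencies. These particular features, and the function names `gc_content`, `kmer_frequencies`, and `sequence_features`, are illustrative assumptions; they are not the feature definitions used by LncFinder or by this work.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq, k=3):
    """Relative frequency of every possible k-mer over the alphabet ACGT.

    RNA input is accepted by mapping U to T. Windows containing any other
    symbol (for example N) are ignored.
    """
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def sequence_features(seq, k=3):
    """Illustrative feature vector: [length, GC content, k-mer frequencies...]."""
    freqs = kmer_frequencies(seq, k)
    return [float(len(seq)), gc_content(seq)] + [freqs[m] for m in sorted(freqs)]


# Hypothetical toy transcript, only to show the shape of the output.
features = sequence_features("AUGGCUAGCUAGGCUA", k=2)
print(len(features))  # 2 scalar features plus 4**2 k-mer frequencies
```

A vector of this kind is cheap to compute compared with secondary-structure features, which is the sort of cost difference a multi-stage design can exploit.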
These SVMs have then been arranged as the successive stages of an HTVS pipeline.
The pipeline has been optimized using an optimization framework that determines the screening threshold of each stage of the HTVS pipeline. The lncRNAs obtained at the end of the pipeline can then be used for drug discovery purposes. This multi-stage classification process significantly reduces the effective selection cost per potential candidate and makes HTVS pipelines less sensitive to their structural variations.
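The cost argument can be illustrated with a small Python sketch of threshold-based multi-stage screening. Everything here is an assumption made for illustration: the `Stage` class, the `run_pipeline` function, the toy scoring functions (simple base fractions standing in for trained SVM decision scores), the thresholds, and the per-candidate costs. The thresholds are fixed by hand rather than chosen by an optimization framework.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Stage:
    name: str
    score: Callable[[str], float]  # stand-in for a trained classifier's score
    threshold: float               # candidates scoring below this are dropped
    cost: float                    # assumed cost of scoring one candidate


def run_pipeline(candidates: Sequence[str],
                 stages: Sequence[Stage]) -> Tuple[List[str], float]:
    """Pass candidates through each stage in order.

    Returns the survivors of the last stage and the total scoring cost.
    Each stage only pays for the candidates that reached it.
    """
    survivors = list(candidates)
    total_cost = 0.0
    for stage in stages:
        total_cost += stage.cost * len(survivors)
        survivors = [c for c in survivors if stage.score(c) >= stage.threshold]
    return survivors, total_cost


# Toy scores with no biological meaning, used only to make the sketch run.
def at_fraction(seq: str) -> float:
    return (seq.count("A") + seq.count("T")) / len(seq)


def a_fraction(seq: str) -> float:
    return seq.count("A") / len(seq)


cheap = Stage("cheap", at_fraction, threshold=0.5, cost=1.0)
expensive = Stage("expensive", a_fraction, threshold=0.5, cost=10.0)
candidates = ["ATATAT", "GCGCGC", "ATGCAT", "AAAAGC"]

two_stage_hits, two_stage_cost = run_pipeline(candidates, [cheap, expensive])
one_stage_hits, one_stage_cost = run_pipeline(candidates, [expensive])
print(two_stage_hits, two_stage_cost)
print(one_stage_hits, one_stage_cost)
```

In this toy setting both runs keep the same two candidates, but the two-stage run costs 34 units against 40 for the expensive stage alone, because the cheap stage removes one candidate before the expensive stage scores it. With real classifiers a stricter early threshold saves more cost but risks discarding true lncRNAs, which is the trade-off that threshold optimization addresses.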
Citation
Gadiyaram, Manasa (2022). Construction of an Optimized Multi-Stage High-Throughput Virtual Screening Pipeline for Long Non-Coding RNA's. Master's thesis, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/197363.