Abstract
We propose a novel strategy for discovering motifs from gene expression data. The data come from DNA microarray analysis of the bacterium E. coli during recovery from nutrient starvation. We have annotated the data and identified the upregulated genes. Our interest is in finding a set of common regulatory motifs responsible for the upregulation of these specific genes. We expect that a common motif to which a protein can bind will be present in the upstream regions of the upregulated genes and absent from the upstream regions of genes whose expression level stayed constant over the time series. Our objective is therefore to find common motifs that are over-represented in the upstream sequences of the upregulated genes and not present in the control set, which is the set of genes whose expression remained the same. We believe that the co-expressed genes may contain several subsets of co-regulated genes, so we do not require a motif to be present in all sequences. We propose a new algorithm that finds such motifs through stages of pre-processing, de-noising, agglomerative clustering, and consensus and checking. Through this process, we have found some motifs that are good candidates for further investigation.
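The selection criterion described above (present in a subset of the upregulated upstream sequences, absent from the control set) can be illustrated with a minimal sketch. This is not the MOPAC algorithm from the thesis: it uses exact k-mer matching as a stand-in for motif detection, and the function names, the word length `k`, and the two thresholds are hypothetical choices made only for illustration.

```python
from collections import Counter


def kmers_in(seq, k):
    """Return the set of distinct k-mers occurring in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sequence_support(seqs, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    support = Counter()
    for seq in seqs:
        support.update(kmers_in(seq.upper(), k))
    return support


def enriched_kmers(upregulated, control, k=6,
                   min_fraction=0.5, max_control_fraction=0.0):
    """List k-mers found in at least `min_fraction` of the upregulated
    sequences and in at most `max_control_fraction` of the control ones.

    Assumption: exact k-mer matches approximate motif occurrences; a real
    motif finder would tolerate mismatches and variable lengths.
    """
    up = sequence_support(upregulated, k)
    ctl = sequence_support(control, k)
    hits = []
    for kmer, n_up in up.items():
        frac_up = n_up / len(upregulated)
        frac_ctl = ctl.get(kmer, 0) / len(control) if control else 0.0
        if frac_up >= min_fraction and frac_ctl <= max_control_fraction:
            hits.append((kmer, frac_up, frac_ctl))
    # Highest support among upregulated sequences first.
    return sorted(hits, key=lambda h: (-h[1], h[2], h[0]))


if __name__ == "__main__":
    # Toy sequences, not data from the thesis.
    upregulated = ["AATTGACAGG", "CCTTGACATT", "GGGGCCCCAA"]
    control = ["AAAAAAAAAA", "CCGGCCGGCC"]
    for kmer, frac_up, frac_ctl in enriched_kmers(upregulated, control):
        print(kmer, frac_up, frac_ctl)
```

Setting `min_fraction` below 1.0 reflects the point that a motif need not occur in every upregulated sequence, since the co-expressed genes may fall into several co-regulated subsets.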
Rajagopalan, Ganesh (2002). MOPAC: motif finding by preprocessing and agglomerative clustering from microarrays. Master's thesis, Texas A&M University. Available electronically from
https://hdl.handle.net/1969.1/ETD-TAMU-2002-THESIS-R3442.