Large-scale analysis of phylogenetic search behavior
Loading...
Date
2009-05-15
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Phylogenetic analysis is used in all branches of biology by inferring evolutionary
trees. Applications include designing more effective drugs, tracing the transmission of
deadly viruses, and guiding conservation and biodiversity efforts. Most analyses rely
on effective heuristics for obtaining accurate trees. However, relatively little work has
been done to analyze quantitatively the behavior of phylogenetic heuristics in tree
space. This is important, because a better understanding of local search behavior
can facilitate the design of better heuristics, which ultimately leads to more accurate
depictions of the true evolutionary relationships.
In order to access and analyze the tree search space, we implement an effec-
tive local search heuristic. Having an effective heuristic that can open the space is
important, since no search heuristic in this field can effectively provide data collec-
tion control. So we have implemented and estimated a search heuristic, Simple Local
Search or SLS, that works reasonably well in the space.
Our investigations led to several interesting observations about the behavior of a
search heuristic and the tree search space. We studied the correlation of tree features
of search path trees, where tree features refer to the parsimony score, the Robinson-
Foulds distance and the homoplasy measure. Most importantly from the results,
parsimony score was highly correlated with Robinson-Foulds distance only in trees
that lie on the search path to a local optimum. We also note that the scores of
neighborhoods along search paths improve together, as a local search progresses. Correlations of tree features of search path trees are particularly useful in char-
acterizing and controlling a search path. This paper proposes one possible stopping
criterion to maximize the tree search results while minimizing computational time
tested on three biological datasets using the correlation between the parsimony score
and the RF distance value of search path trees. Also, the observation that scores of
a neighborhood on a search path improve together gives us a significant amount of
flexibility in selecting the next pivot of a search without losing performance.
Eventually, our long-term goal is developing an effective search heuristic that
can deal with large scale tree space in reasonable time. Improved knowledge about
the tree search space and the search heuristic can provide a reasonable starting point
toward the goal.
Description
Keywords
phylogenetic trees, maximum parsimony