Thanks to recent progress in biotechnology, high-throughput data in
molecular biology are becoming increasingly attractive targets for
advanced data analysis methods aimed at finding non-trivial topological
characteristics in the data, such as branching points and holes. The
existence of such structures can have a direct biological interpretation,
such as cell fate decisions during cell differentiation or the existence
of cyclic processes in a cell (e.g., the cell cycle).
I will present examples of the application of such methods to various
biological systems and explain their general principles. I will focus on
the universal method of elastic principal graphs for topological data
analysis, which we developed.
The method is based on the notion of harmonic graph embedding into a
multi-dimensional space, minimization of the graph's elastic energy, and
graph grammars that define a family of possible graph structures (such
as trees).
Even the simplest implementations of the approach yield very useful data
approximators, such as principal curves, principal closed curves,
principal manifolds and principal trees. Several ideas for making these
data approximators robust to noise and outliers in biological data will
be presented.
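
To illustrate the energy minimization mentioned above, here is a minimal
Python sketch of one commonly used form of the elastic energy of a graph
embedded in data space. It is an illustration under stated assumptions, not
the implementation behind the talk: the function name `elastic_energy`, the
weights `lam` and `mu`, and the encoding of the graph as a list of edges and
a list of stars (a center node with its neighbours) are all hypothetical
choices made here. The energy is assumed to be the sum of an approximation
term (mean squared distance from each data point to its nearest node), a
stretching term over edges, and a harmonicity term that penalizes each star
center for deviating from the mean of its neighbours.

```python
import numpy as np


def elastic_energy(X, nodes, edges, stars, lam=0.01, mu=0.1):
    """Elastic energy of an embedded graph (illustrative form, see lead-in).

    X      : (n_points, dim) data matrix
    nodes  : (n_nodes, dim) node positions in data space
    edges  : list of (a, b) node index pairs
    stars  : list of (center, [neighbour indices]) pairs
    lam, mu: assumed stretching and harmonicity weights
    """
    # Approximation term: mean squared distance to the nearest node.
    d2 = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    approx = d2.min(axis=1).mean()

    # Stretching term: squared edge lengths.
    stretch = sum(np.sum((nodes[a] - nodes[b]) ** 2) for a, b in edges)

    # Harmonicity term: squared deviation of each star center from the
    # mean position of its neighbours.
    bend = sum(
        np.sum((nodes[c] - nodes[list(nb)].mean(axis=0)) ** 2)
        for c, nb in stars
    )
    return approx + lam * stretch + mu * bend


# Toy example: a three-node chain, straight versus bent.
edges = [(0, 1), (1, 2)]
stars = [(1, [0, 2])]
straight = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
bent = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(elastic_energy(straight, straight, edges, stars))
print(elastic_energy(bent, bent, edges, stars))
```

In this toy example the data points coincide with the nodes, so the
approximation term vanishes and the bent chain has the higher energy because
its edges are longer and its middle node is not the mean of its neighbours.
An actual fitting procedure would alternate between assigning points to
nodes and moving nodes to lower this energy; that step is not shown here.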