Éva Czabarka (University of South Carolina)

Title: Structural and enumeration results on some tree families relevant for bioinformatics


2:30 pm / CH 445

Abstract. Biologists use trees as a first approximation to represent the evolution of genetic material. These trees are ideally but not necessarily rooted, internal vertices have degree at least 3 (root may have degree 2) and the leaves are labeled with the names of the corresponding species. In case of phylogenetic trees the leaves are unique, in case of gene trees leaf-labels may repeat. I will present some results on phylogenetic and gene trees. These are joint work with various subsets of the collaborators P.L. Erdős, V. Johnson, A. Kupczok, V.Moulton and L.A. Székely.


We obtained generating functions that enumerate the number of gene trees (binary, non-binary, rooted and unrooted) on a given label multi-set.

P.L. Erdős and L.A. Székely provided a bijection between rooted semi-labeled trees and set partitions. This, with the asymptotic normality of the Stirling numbers of the second kind (Harper) translates into the asymptotic normality of rooted semi-labeled trees with a fixed number of vertices and a variable number of internal vertices. We applied Harper's method and the Erdős-Székely bijection to obtain the asymptotic normality of phylogenetic trees.

In 1996 Guigo et al. posed the following problem: for a given phylogenetic tree and a set of gene trees on these species, what is the minimum number of duplication episodes explaining these species trees? We prove min-max theorems that generalize Gallai's archetypal min-max theorem on intervals that give a structural answer to this question.