MBFT 2013
Determining the gene content of ancient genomes by joint reconstruction of the species tree and gene trees
ssolo@elte.hu
Gene histories in the history of the genome
Gergely Szöllősi, ELTE-MTA “Lendület” Biophysics Research Group; UMR 5558 CNRS LBBE, Lyon, France
[Figure: page excerpts from Zuckerkandl & Pauling (1965), "Evolutionary Divergence and Convergence in Proteins", showing the proposed relationship between hemoglobin chains; Darwin's tree sketch, 1837. Source label on slide: Daubin & Boussau 2011.]
The history of genomes
The first species tree, from 1837
“.. the great Tree of Life, which fills with its dead and broken branches the crust of the earth, and covers the surface with its ever-branching and beautiful ramifications.” On the Origin of Species, Darwin, 1859
[Figure: Darwin's tree sketch, 1837]
Molecular sequences instead of macroscopic characters
The history of a gene family
The first gene tree, from 1965
“Of all natural systems, living matter is the one which, in the face of great transformations, preserves inscribed in its organization the largest amount of its own past history.” Molecules as Documents of Evolutionary History, Zuckerkandl & Pauling, 1965
[Figure: page excerpts from Zuckerkandl & Pauling (1965) with the hemoglobin gene tree, next to a gene tree labeled with the example sequences TTCCCC, TTACGC, TTCCGC, AGACGC and AGAGGG]
Gene tree reconstructed from homologous gene sequences
The history of a gene family
Gene family with a common ancestor: TTCCGC TTCCCC TTACGC AGAGGC AGAGGG AGACGC
Stochastic model of sequence evolution: a series of substitutions along the gene tree G
ML reconstruction: maximize p(sequences | G) (Felsenstein, 1981)
[Figure: gene tree G with the six sequences at its leaves]
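The quantity p(sequences | G) on this slide can be computed with Felsenstein's pruning recursion. The following is a minimal Python sketch, not the implementation used in the talk: it assumes the JC69 substitution model, and the topology and branch lengths of `G` are hypothetical values chosen only so that the six example sequences have a tree to sit on.

```python
import math

NUCS = "ACGT"

def jc69(t):
    """JC69 transition matrix for a branch of length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partial(node, site):
    """Conditional likelihoods p(data below node | state x at node) for one site.

    A leaf is a sequence string; an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if NUCS[x] == node[site] else 0.0 for x in range(4)]
    result = [1.0] * 4
    for child, t in node:
        P = jc69(t)
        below = partial(child, site)
        for x in range(4):
            result[x] *= sum(P[x][y] * below[y] for y in range(4))
    return result

def log_likelihood(tree, n_sites):
    """log p(sequences | tree), sites independent, uniform root frequencies."""
    total = 0.0
    for site in range(n_sites):
        total += math.log(sum(0.25 * v for v in partial(tree, site)))
    return total

# Hypothetical topology and branch lengths (assumptions, not from the slide).
left = [([("TTCCCC", 0.1), ("TTCCGC", 0.1)], 0.1), ("TTACGC", 0.2)]
right = [([("AGAGGC", 0.1), ("AGAGGG", 0.1)], 0.1), ("AGACGC", 0.2)]
G = [(left, 0.3), (right, 0.3)]

print(log_likelihood(G, 6))
```

Maximum likelihood reconstruction then amounts to searching over topologies and branch lengths for the tree that maximizes this value.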
[Figure builds: the same slide repeated, with sequences such as TTCCGC, TTACGC, AGACGC and AGAGGC added step by step at the internal nodes of G]
The history of a gene family
Genes found in the genome of every species: e.g. 16S rRNA (Woese, 1977)
[Figure: tree of 16S rRNA genes]
(the history of 1% of the genes) Ciccarelli, 2006
The history of a gene family
The history of each gene family is unique: it does not coincide with the history of the species, but it carries its imprint.
The tree of species, that is, the history of genomes, can be reconstructed from complete genomes.
[Figure: hemoglobin gene tree, Zuckerkandl & Pauling 1965. Source label on slide: Daubin & Boussau 2011.]
The history of a gene family
The history of each gene family is unique: it does not coincide with the history of the species, but it carries its imprint.
Gene tree G: a series of substitutions relating the sequences TTCCGC TTCCCC TTACGC AGAGGC AGAGGG AGACGC
Species tree: a series of Duplication, Transfer & Loss events places the genes in the genomes of the species
[Figure: gene tree with leaves a1, b1, c1, c2, d1, d2 drawn inside the species tree along a time axis; one branch is a gene transfer. Genes in the genomes of the species: A: a1; B: b1; C: c1, c2; D: d1, d2]
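To illustrate how a gene tree is mapped into a species tree, here is a small Python sketch of LCA reconciliation. This is a deliberately simpler technique than the Duplication, Transfer & Loss model named on the slide: it recognizes only duplications (with implied losses) and no transfers. The species tree, the gene tree and the function names `lca_clade` and `reconcile` are assumptions made for illustration.

```python
def leaves(tree):
    """Set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def lca_clade(species_tree, species):
    """Smallest clade of species_tree (as a frozenset) containing all of `species`."""
    node = species_tree
    while not isinstance(node, str):
        for child in node:
            if species <= leaves(child):
                node = child
                break
        else:
            break  # no single child contains them all: node is the LCA
    return frozenset(leaves(node))

def reconcile(gene_tree, species_tree, gene_to_species, events=None):
    """Map each gene tree node to a species clade and label it.

    A node is a duplication if it maps to the same clade as one of its
    children, otherwise a speciation. Returns (clade, events).
    """
    if events is None:
        events = []
    if isinstance(gene_tree, str):
        return frozenset({gene_to_species[gene_tree]}), events
    child_maps = [reconcile(c, species_tree, gene_to_species, events)[0]
                  for c in gene_tree]
    here = lca_clade(species_tree, frozenset().union(*child_maps))
    events.append(("duplication" if here in child_maps else "speciation", here))
    return here, events

# Hypothetical trees; only the gene names and genome assignments follow the slide.
S = (("A", ("B", "C")), "D")
G = (("a1", ("b1", "c1")), ("c2", ("d1", "d2")))
gene_to_species = {"a1": "A", "b1": "B", "c1": "C", "c2": "C", "d1": "D", "d2": "D"}

root_clade, events = reconcile(G, S, gene_to_species)
for kind, clade in events:
    print(kind, sorted(clade))
```

Under this duplication-only view, the grouping of c2 with the genes of D forces a duplication at the root of the species tree followed by losses. A model that also allows transfers could explain the same grouping with a single transfer, which is the kind of signal the slide refers to.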
The history of a gene family within genome history
Probability of the sequences given the gene tree: p(sequences | G_i)
Probability of the gene tree given the species tree: p(G_i | S, rates)
Stochastic model of genome evolution: D, T & L rates
[Figure: species tree S with duplication (D), transfer (T) and loss (L) events along the time axis]
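Taken together, the two probabilities on this slide suggest a joint likelihood over all gene families. The following is a sketch under the assumption that the families evolve independently given the species tree; the symbol $A_i$ for the sequences of family $i$ and the rate symbols $\delta, \tau, \lambda$ for D, T and L are notation introduced here, not taken from the slide:

```latex
\[
\mathcal{L}\bigl(S, \delta, \tau, \lambda, \{G_i\}\bigr)
  \;=\; \prod_{i} \, p(A_i \mid G_i)\; p(G_i \mid S, \delta, \tau, \lambda)
\]
```

The first factor scores each gene tree against its sequences, the second scores it against the species tree.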
Reconstructing genome history
Gene trees are “molecular fossils”: the transfers encoded in their topology record the time order of ancient species-tree branchings (ancient species).
The information contained in the gene trees can be uncovered using p(G_i | S, rates).
[Figure: three reconciliations of the gene tree (leaves a1, b1, c1, c2, d1, d2) with a species tree over A, B, C, D along a time axis, labeled 1 transfer, 2 transfers and 3 transfers]
Szöllősi, Boussau, Abby, Tannier & Daubin, PNAS (2012): Phylogenetic modeling of lateral gene transfer reconstructs the pattern and relative timing of speciations
Reconstructing genome history
Parallel algorithm (from a few dozen to several hundred CPUs):
server: optimizes a single shared species tree S
clients: compute the probability p(G_i | S, rates) of several thousand gene trees
Data: HOGENOM (Homologous complete genomes gene families)
[Figure: one species tree on the server; many gene trees with their sequences on the clients]
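The server/client division can be sketched in Python as follows. Everything here is illustrative: `toy_log_score` is a placeholder that merely counts the clades of a gene tree that are missing from the species tree, and it is not the DTL probability p(G_i | S, rates); gene trees are simplified to one gene per species; and threads stand in for the CPUs of the slide.

```python
from concurrent.futures import ThreadPoolExecutor

def clades(tree):
    """All clades (frozensets of leaf labels) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            clade = frozenset({node})
        else:
            clade = frozenset().union(*(walk(child) for child in node))
        found.add(clade)
        return clade

    walk(tree)
    return found

def toy_log_score(species_tree, gene_tree):
    """Placeholder for log p(G_i | S, rates): minus the number of gene tree
    clades that do not occur in the species tree (an assumption, not DTL)."""
    return -float(len(clades(gene_tree) - clades(species_tree)))

def total_log_score(species_tree, gene_trees, workers=4):
    """'Clients': score every gene family in parallel; the 'server' sums up."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda g: toy_log_score(species_tree, g), gene_trees))

def best_species_tree(candidates, gene_trees):
    """'Server': keep the candidate species tree with the highest total score."""
    return max(candidates, key=lambda s: total_log_score(s, gene_trees))

# Hypothetical data: three candidate species trees and four gene trees.
s1 = (("A", "B"), ("C", "D"))
s2 = (("A", "C"), ("B", "D"))
s3 = (("A", "D"), ("B", "C"))
gene_trees = [s1, s1, s2, s1]

print(best_species_tree([s1, s2, s3], gene_trees))
```

The design point is that the per-family scores are independent given the species tree, so they can be computed on separate workers and only the sum has to travel back to the server. A real run would replace the exhaustive loop over candidates with a tree search and distribute the clients across processes or machines.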
Reconstructing genome history
Time order: 36 cyanobacterial genomes, 3 billion years
80,000 genes from 36 genomes, 8332 gene families
[Figure: dated cyanobacterial species tree along a time axis; Nostocales marked]

TABLE S2: Statistical significance of the ML DTL rooting. To control for reconstruction errors in gene trees we performed two [...] i) using all families and ii) using the 50% most congruent families. Corresponding p-values are shown separated by a semi-colon f[...] model and dataset, for the four roots discussed in the text. For roots which could not be rejected with confidence p < 0.001 for one test [...] p-values are shown in bold. The time-orders for each root were optimized independently using a homogeneous model. The hetero[...] model calculations were performed using these time orders.

p-values are given as "all; 50%".
dataset        model          Gloeo.          Blue            Green           Violet
no out-genes   uniform        <1e-4; <1e-4    <1e-4; <1e-4    1.0; 1.0        <1e-4; <1e-4
no out-genes   heterogenous   <1e-4; <1e-4    <1e-4; <1e-4    1.0; 1.0        <1e-4; <1e-4
out-genes      uniform        <1e-4; <1e-4    0.007; 0.009    0.972; 0.024    0.042; 0.989
out-genes      heterogenous   <1e-4; <1e-4    <1e-4; <1e-4    1.0; 0.735      <1e-4; 0.266

TABLE S3: Comparison of date estimates. To obtain an estimate of the date of each node we scaled the height of every time slice prop[...] to the number of transfers per branch occurring in the time slice and calibrated time based on i) the age of the root using constraints f[...] [19] ii) the age of node 5 based on estimates of the age of akinete-forming Nostocales [20]. Calibration dates are shown in bold. We c[...] the results with the relaxed molecular clock (RMC) estimates of ref. [21] table 3, where the authors used different RMC methods to [...] range of median estimates. All dates are given in units of 10^9 years ago (Ga). Estimated dates are based on ML time order shown in [...] The range of estimated dates corresponds to the uncertainty in constraints and does not consider statistical support of node positions.

node   ML DTL, root calibration [19]   ML DTL, Nostocales calibration [20]   RMC estimates, ref. [21] table 3
root   3.50 - 2.70                     3.58 - 3.06                           3.07 - 2.67
3      2.94 - 2.27                     3.01 - 2.58                           2.99 - 2.61
4      2.62 - 2.02                     2.68 - 2.30                           2.53 - 2.34
5      2.39 - 1.84                     2.45 - 2.10                           2.49 - 2.29
6      2.10 - 1.62                     2.15 - 1.84                           2.31 - 1.80
8      1.73 - 1.33                     1.77 - 1.52                           2.42 - 1.78
9      1.59 - 1.22                     1.62 - 1.39                           2.49 - 1.97
14     0.92 - 0.71                     0.94 - 0.80                           1.02 - 0.64

Szöllősi, Boussau, Abby, Tannier & Daubin, PNAS (2012): Phylogenetic modeling of lateral gene transfer reconstructs the pattern and relative timing of speciations
The history of a gene family
The history of each gene family is unique: it does not coincide with the history of the species, but it carries its imprint.
Gene trees reconstructed individually, one at a time, contain many errors.
[Figure: hemoglobin gene tree, Zuckerkandl & Pauling 1965. Source label on slide: Daubin & Boussau 2011.]
Reconstructing ancient genomes
Better gene trees, more accurate gene histories
Joint optimization of the species tree and the gene trees (from a few hundred to a few thousand CPUs):
server: optimizes a single shared species tree S
clients: optimize the gene trees using both the sequence likelihood p(sequences | G_i) and the gene tree likelihood p(G_i | S, rates)
Data: HOGENOM (Homologous complete genomes gene families)
[Figure: one species tree on the server; many gene trees with their sequences on the clients]
Reconstructing ancient genomes
Better gene trees, more accurate gene histories: simulation results for a fixed species tree
[Figure: distance to the correct tree (axis from 0 to 40) for gene trees inferred from gene sequences only and for gene trees inferred using the species tree (ALE, exODT), each in a COMPLEX and a SIMPLE setting]
More than 2/3 of the transfers detected from individually reconstructed gene trees are the result of reconstruction errors.
Taking the species tree into account yields dramatically more accurate gene trees.
Vincent Daubin
Szöllősi, Tannier, Lartillot & Daubin, Systematic Biology (2013): Lateral Gene Transfer from the Dead
Szöllősi, Rosikiewicz, Boussau, Tannier & Daubin, Systematic Biology (2013): Efficient exploration of the space of reconciled gene trees
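The "distance to the correct tree" in such comparisons is commonly the Robinson-Foulds (RF) distance, which a later slide of this talk names explicitly. Here is a minimal Python sketch of the unrooted RF distance; the nested-tuple tree format, the example trees and the function names are assumptions made for illustration.

```python
def leaf_set(tree):
    """Set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaf_set(child) for child in tree))

def bipartitions(tree):
    """Non-trivial splits of the leaf set induced by the internal branches.

    Each split is stored as a frozenset holding its two sides, so the
    result does not depend on where the tree happens to be rooted.
    """
    all_leaves = frozenset(leaf_set(tree))
    splits = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset({node})
        clade = frozenset().union(*(walk(child) for child in node))
        other = all_leaves - clade
        if len(clade) >= 2 and len(other) >= 2:
            splits.add(frozenset({clade, other}))
        return clade

    walk(tree)
    return splits

def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))

# Hypothetical example trees over the gene names used in the slides.
true_tree = ((("a1", "b1"), "c1"), (("d1", "d2"), "c2"))
inferred = ((("a1", "c1"), "b1"), (("d1", "d2"), "c2"))
rerooted = ("a1", ("b1", ("c1", (("d1", "d2"), "c2"))))

print(rf_distance(true_tree, inferred))
print(rf_distance(true_tree, rerooted))
```

A distance of 0 means the two trees have the same unrooted topology; each misplaced internal branch in a binary tree adds 2.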
Reconstructing ancient genomes
Species tree: 36 mammalian genomes, 160 million years
Joint optimization of the species tree and the gene trees with PHYLDOG
With parallel computation, joint optimization is feasible for dozens of complete genomes.
[Background page excerpt, Boussau et al. (2013): a correction for genome coverage prevents PHYLDOG from interpreting artifactual “shared losses” as a signal for clustering low-coverage genomes together in the species tree. The analysis ran on the French national supercomputing resource JADE using 3000 processes in parallel, starting from a random species tree topology.]
Figure 3. Mammalian tree reconstructed by PHYLDOG, with arbitrary branch lengths. Ancestral gene contents (number of genes in ancestral genomes) obtained using PhyML (red), TreeBeST (green), and PHYLDOG (blue) are shown for several nodes (circled).
Bastien Boussau
Boussau, Szöllősi, Duret, Gouy, Tannier & Daubin, Genome Res. (2013): Genome-scale coestimation of species and gene trees
Reconstructing ancient genomes
Better gene trees, more accurate gene histories
Ancestral genome sizes (number of genes) on the mammalian species tree, compared with extant genomes
Joint optimization yields more realistic ancestral genome sizes.
Figure 4 (A), Boussau et al. (2013): Genome content corresponds to the total number of genes from 5039 families (selected for comparison purposes, see Supplemental Material section S10), for all ancestral nodes in the species phylogeny. “Extant” corresponds to the observed numbers of genes in the data set for extant species. Gene contents reconstructed from PHYLDOG trees are significantly smaller than those reconstructed from TreeBeST trees: paired Wilcoxon test P-value = 4 × 10^-4.
PHYLDOG
Boussau, Szöllősi, Duret, Gouy, Tannier & Daubin, Genome Res. (2013): Genome-scale coestimation of species and gene trees
Reconstructing ancient genomes
Better gene trees, more accurate gene histories: validation using reconstructed gene orders
[Figure: ancestral gene order reconstructed from mammalian gene trees and from cyanobacterial gene trees (ALE vs. PhyML; DeCo, DeCoLT). x-axis: number of neighboring genes; y-axis: fraction of all genes]
Gene tree errors produce spurious extra neighbors.
Taking the species tree into account yields more accurate ancestral gene orders.
Figure 4 (B), Boussau et al. (2013): Number of adjacencies per ancestral gene. The proportion of genes with two adjacencies is higher for PHYLDOG (blue) than for PhyML (red) and TreeBeST (green) (paired Wilcoxon test P-value = 3 × 10^-11 for the comparison with TreeBeST).
Eric Tannier
Bérard, Gallien, Boussau, Szöllősi, Daubin & Tannier, Bioinformatics (2012): Evolution of gene neighborhoods within reconciled phylogenies
Patterson, Szöllősi, Daubin & Tannier, BMC Bioinformatics (2013): Lateral Gene Transfer, Rearrangement, Reconciliation
Reconstructing ancient genomes
Better gene trees, more accurate ancestral gene sequences: simulation results
Figure 3: Impact of the phylogenetic tree on ASR (ancestral sequence reconstruction). a) Robinson-Foulds distance to the 'real' tree. b) Distance to the true sequence. Methods compared: Sequence Tree LG, Sequence Tree C60, Reconciled Tree F81 + exODT, 'Real' Tree.
[Figure: gene tree G with reconstructed ancestral gene sequences at its internal nodes]
Taking the species tree into account yields more accurate ancestral amino acid sequences.
Mathieu Groussin
Groussin, Hobbs, Szöllősi, Gribaldo, Arcus & Gouy, in preparation
Reconstructing ancient genomes
Better gene trees, more accurate ancestral gene sequences: validation with ancestral enzymes resurrected in vitro
All (71) Firmicutes LeuB sequences, >1 billion years
LeuB: 3-isopropylmalate dehydrogenase, E.C. 1.1.1.85

Enzyme                                            KM (IPM) (mM)   Topt (°C)   ΔG‡(N-U) (kJ mol^-1)
Extant LeuBs:
BPSYC                                             0.2             47          94.9
BSUB                                              0.7             53          95.9
BCVX                                              1.1             69          100.7
Ancestral “resurrected” LeuB (ASR), by method:
Reconciled Tree + LG                              1.5             85          114.4
Reconciled Tree + EX EHO                          1.6             85          110.9
Sequence Tree + EX EHO                            6.8             78          91.4

Table 1: Biophysical parameters for the ancestral LeuB enzyme of the Firmicutes ancestor. Values obtained in this study for the ancestor of Firmicutes (bold characters) were inferred using either the LeuB sequence tree or the LeuB reconciled tree, and either with the site-homogeneous LG model or with the site-heterogeneous EX EHO model. Data for contemporary enzymes (first three lines) are shown for comparison.
Taking the species tree into account makes it possible to reconstruct functional ancestral enzymes.
Mathieu Groussin
Groussin, Hobbs, Szöllősi, Gribaldo, Arcus & Gouy, in preparation
Reconstructing ancient genomes: outlook
“Ancestor-ome research”
Gene repertoire of ancient genomes
Reconstructed: metabolic networks; protein sequence and composition; gene order; chromosomes; ..
Ancient traits and environment: temperature; atmospheric oxygen; ecological interactions; ..
(Ernst Mayer)
Vincent Daubin
Detecting and modeling coevolution: genes adjacent on the chromosome; genes “adjacent” in the metabolic network
Imre Derényi