PDF hosted at the Radboud Repository of the Radboud University Nijmegen
This full text is a publisher's version.
For additional information about this publication click this link. http://hdl.handle.net/2066/93604
Please be advised that this information was generated on 2013-03-21 and may be subject to change.
Genome-scale integrative genotype-phenotype analysis of Lactococcus lactis and Lactobacillus plantarum
Doctoral Thesis to obtain the degree of doctor from Radboud University Nijmegen on the authority of the Rector Magnificus prof. mr. S.C.J.J. Kortmann, according to the decision of the Council of Deans to be defended in public on Wednesday, June 6, 2012 at 10.30 hours by
Jumamurat Rahymnazarovich Bayjanov Born on July 12, 1979 in Halach, Turkmenistan
Supervisor: Prof. dr. Roland J. Siezen Co-supervisor: Dr. Sacha A.F.T. van Hijum Doctoral Thesis Committee: Prof. dr. Joost G. Hoenderop Prof. dr. Roland Brock Prof. dr. ir. Vítor A.P. Martins dos Santos
ISBN/EAN: 978-90-819078-0-4 © Jumamurat R. Bayjanov All rights reserved.
This work is part of the BioRange programme of the Netherlands Bioinformatics Centre (NBIC), which is supported by a BSIK grant through the Netherlands Genomics Initiative (NGI). Printed by WÖHRMANN PRINT SERVICE
TABLE OF CONTENTS
Chapter 1: Introduction
Chapter 2: PanCGH: a genotype-calling algorithm for pangenome CGH data
Chapter 3: PanCGHweb: a web-tool for genotype-calling in pangenome CGH data
Chapter 4: Genome-scale diversity and niche adaptation analysis of Lactococcus lactis by comparative genome hybridization using multi-strain arrays
Chapter 5: PhenoLink – a web-tool for linking phenotype to ~omics data for bacteria: application to gene-trait matching for Lactobacillus plantarum strains
Chapter 6: Genotype-phenotype matching analysis of 38 Lactococcus lactis strains using Random Forest methods
Chapter 7: Summarizing discussion
Summary (Samenvatting)
Publications
Acknowledgements
Curriculum Vitae
CHAPTER 1 Introduction
Summary (Samenvatting)
very valuable for prioritizing relations for further validation in the laboratory. For example, we visualized phenotype-related L. lactis genes on metabolic pathways using information from the KEGG database (results not shown). This functionality, however, is not integrated into PhenoLink. In this thesis we presented three different but related studies of many L. lactis strains. We started by genotyping these strains with the pan-genome arrays (see Chapter 2), and this genotyping resulted in an extensive comparative genomics analysis (see Chapter 4). PhenoLink was then used to identify relations between the genotypic and phenotypic properties of these strains (see Chapter 6). With these exploratory studies we reveal new insights into the genomic and phenotypic properties of L. lactis strains. The gene-phenotype relations, however, still need to be validated and understood mechanistically through experimental studies. For example, we identified a small genomic region of approximately 15 kb in L. lactis strains that grow on the plant oligosaccharide melibiose (see Chapter 6). This region is absent from the genomes of L. lactis strains that do not grow on melibiose. A similar region was not found in melibiose-negative strains of Streptococcus mutans (see Chapter 6). Experimental follow-up analysis can show whether deleting this chromosomal region in melibiose-positive L. lactis strains leads to loss of the ability to grow on melibiose. The studies of L. lactis and L. plantarum described in this thesis have shown that a better understanding of the genetic basis of a large number of phenotypes of these strains can be obtained by sequencing a number of representative genomes.
Given recent developments in genotyping technologies (e.g. DNA sequencing), sequencing tens of representative strains of a species would lead to insights into the niche-, phenotype- and genotype-level differences between these strains within the species of interest.
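The melibiose example above rests on a simple presence/absence pattern: a region found in every strain that grows on the sugar and in none of the strains that do not. The sketch below illustrates only that pattern with an exact-match filter. It is not the Random Forest approach used by PhenoLink, and the strain names, gene names and data are hypothetical toy values chosen for illustration.

```python
from typing import Dict, List


def genes_matching_trait(
    presence: Dict[str, Dict[str, bool]],
    phenotype: Dict[str, bool],
) -> List[str]:
    """Return genes present in every trait-positive strain and absent
    from every trait-negative strain (an exact-match filter)."""
    matches = []
    for gene, by_strain in presence.items():
        # The gene matches when its presence equals the phenotype in all strains.
        if all(by_strain[strain] == phenotype[strain] for strain in phenotype):
            matches.append(gene)
    return sorted(matches)


# Hypothetical toy data: gene presence/absence per strain.
presence = {
    "geneA": {"s1": True, "s2": True, "s3": False, "s4": False},
    "geneB": {"s1": True, "s2": True, "s3": True, "s4": True},
    "geneC": {"s1": True, "s2": False, "s3": False, "s4": False},
}
# Hypothetical phenotype: does the strain grow on melibiose?
grows_on_melibiose = {"s1": True, "s2": True, "s3": False, "s4": False}

candidates = genes_matching_trait(presence, grows_on_melibiose)
print(candidates)
```

An exact-match filter of this kind is deliberately strict: a single genotyping or phenotyping error removes a gene from the candidate list, which is one reason to prefer a more tolerant classifier-based method on real data.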
Acknowledgements and funding:
We thank Sacha van Hijum and Roland Siezen for textual revision of this chapter, and Jos Boekhorst for making available his orthology prediction tool, based on OrthoMCL [15], for determining homologous genes of the 7 Lactococcus lactis strains. Funding: Besluit Subsidies Investeringen Kennisinfrastructuur (BSIK) grant [through the Netherlands Genomics Initiative (NGI)]; BioRange programme [as part of the Netherlands Bioinformatics Centre (NBIC)]; and the NGI (as part of the Kluyver Centre for Genomics of Industrial Fermentation).
References
1. Siezen, R.J. et al. Phenotypic and genomic diversity of Lactobacillus plantarum strains isolated from various environmental niches. Environ Microbiol 12, 758-73 (2010).
2. Kleerebezem, M. et al. Complete genome sequence of Lactobacillus plantarum WCFS1. Proc Natl Acad Sci U S A 100, 1990-5 (2003).
3. Siezen, R.J. & van Hylckama Vlieg, J.E. Genomic diversity and versatility of Lactobacillus plantarum, a natural metabolic engineer. Microb Cell Fact 10 Suppl 1, S3 (2011).
4. Passerini, D. et al. Genes but not genomes reveal bacterial domestication of Lactococcus lactis. PLoS One 5, e15306 (2010).
5. Rademaker, J.L. et al. Diversity analysis of dairy and nondairy Lactococcus lactis isolates, using a novel multilocus sequence analysis scheme and (GTG)5-PCR fingerprinting. Appl Environ Microbiol 73, 7128-37 (2007).
6. Deng, X., Phillippy, A.M., Li, Z., Salzberg, S.L. & Zhang, W. Probing the pan-genome of Listeria monocytogenes: new insights into intraspecific niche expansion and genomic diversification. BMC Genomics 11, 500 (2010).
7. Lefebure, T. & Stanhope, M.J. Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol 8, R71 (2007).
8. Rasko, D.A. et al. The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. J Bacteriol 190, 6881-93 (2008).
9. Schoen, C. et al. Whole-genome comparison of disease and carriage strains provides insights into virulence evolution in Neisseria meningitidis. Proc Natl Acad Sci U S A 105, 3473-8 (2008).
10. Tettelin, H. et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome”. Proc Natl Acad Sci U S A 102, 13950-5 (2005).
11. Willenbrock, H., Hallin, P.F., Wassenaar, T.M. & Ussery, D.W. Characterization of probiotic Escherichia coli isolates with a novel pan-genome microarray. Genome Biol 8, R267 (2007).
12. Robison, K. Semiconductors charge into sequencing. Nat Biotechnol 29, 805-7 (2011).
13. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81, 559-75 (2007).
14. Molzen, T.E. et al. Genome-wide identification of Streptococcus pneumoniae genes essential for bacterial replication during experimental meningitis. Infect Immun 79, 288-97 (2011).
15. Chen, F., Mackey, A.J., Stoeckert, C.J., Jr. & Roos, D.S. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res 34, D363-8 (2006).
Publications
1. Bayjanov, J.R., Siezen, R.J. and van Hijum, S.A. (2010) PanCGHweb: a web tool for genotype calling in pangenome CGH data, Bioinformatics, 26, 1256-1257.
2. Bayjanov, J.R., Wels, M., Starrenburg, M., van Hylckama Vlieg, J.E., Siezen, R.J. and Molenaar, D. (2009) PanCGH: a genotype-calling algorithm for pangenome CGH data, Bioinformatics, 25, 309-314.
3. de Bok, F.A., Janssen, P.W., Bayjanov, J.R., Sieuwerts, S., Lommen, A., van Hylckama Vlieg, J.E. and Molenaar, D. (2011) Volatile compound fingerprinting of mixed-culture fermentations, Appl Environ Microbiol, 77, 6233-6239.
4. Liu, M., Bayjanov, J.R., Renckens, B., Nauta, A. and Siezen, R.J. (2010) The proteolytic system of lactic acid bacteria revisited: a genomic comparison, BMC Genomics, 11, 36.
5. Siezen, R.J., Bayjanov, J., Renckens, B., Wels, M., van Hijum, S.A., Molenaar, D. and van Hylckama Vlieg, J.E. (2010) Complete genome sequence of Lactococcus lactis subsp. lactis KF147, a plant-associated lactic acid bacterium, J Bacteriol, 192, 2649-2650.
6. Siezen, R.J., Bayjanov, J.R., Felis, G.E., van der Sijde, M.R., Starrenburg, M., Molenaar, D., Wels, M., van Hijum, S.A. and van Hylckama Vlieg, J.E. (2011) Genome-scale diversity and niche adaptation analysis of Lactococcus lactis by comparative genome hybridization using multistrain arrays, Microb Biotechnol, 4, 383-402.
7. J. R. Bayjanov, D. Molenaar, V. Tzeneva, R. J. Siezen, S. A. F. T. van Hijum: PhenoLink – a web-tool for linking phenotype to ~omics data for bacteria: application to gene-trait matching for Lactobacillus plantarum strains, Accepted for publication in BMC Genomics.
8. Jumamurat R. Bayjanov, Marjo J.C. Starrenburg, Marijke R. van der Sijde, Roland J. Siezen, Sacha A.F.T. van Hijum: Genotype-phenotype matching analysis of 38 Lactococcus lactis strains using Random Forest methods, Submitted for publication.
9. Wouter G. Touw, Jumamurat R. Bayjanov, Lex Overmars, Lennart Backus, Jos Boekhorst, Michiel Wels, Sacha A. F. T. van Hijum: Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle?, Submitted for publication.
Acknowledgements / Dankwoord / Sagbolsunlar / Teşekkürler
This project was supported by a BSIK grant, which is a Dutch subsidy based on revenues from natural gas. I am grateful to have benefitted from this grant. A PhD project is often perceived as moving through a tunnel and hoping to see a light at the other end, while not being (b)eaten by unseen creatures (aka referees). I could not have passed through this tunnel without guidance and sheer tolerance from Roland, constructive teachings from Douwe and kind yet demanding guidance from Sacha. I thank them all for their support and, especially, for enduring many cycles of manuscript revision. I also want to thank Gert and Sacha wholeheartedly for their help in solving a very important personal issue.
During my PhD project I worked with many enthusiastic people at CMBI. I shared fruitful years of my life with my colleagues in the Bacterial Genomics group (bamics). I thank them all for the joyful moments I experienced here. In particular, I thank all of my (ex-)room-mates Richard, Tom, Barzan, Tilman, Michel, Wilco and Joost. I also want to acknowledge the non-bamicsers, especially Philip, Bas, Martin and Radek, for useful discussions. I am indebted to Barbara and Marly for their administrative support.
I am very grateful to everyone who contributed to my doctoral studies. In particular, I thank my teachers Ümit, Bilal, Muhittin, Murat and Fethi Bey very much. I also thank Süleyman, Hakan, Mehmet and Alpaslan Bey, with whom I worked at the university. Alongside them, I want to say many thanks to Begenç and Agaly aga, who supported me like an elder brother and worked together with me. I will also never forget the help of Şamyrat, Welimuhammet and Gözel gelneje. I want to say many thanks to my other friends who helped me at work and outside work, especially Döwran Annanyýaz, Guwanç Kabul, Mekan Gara, Öwez aga and Agamyrat aga. I also want to thank Yasir, Munawwar, Özgür, Usman, Ghulam and Safdar for their support. Many people helped me in the Netherlands, and I especially thank Şaban, Hüseyin, Abdulla, Mehmet and Abbas Abi very much.
I am grateful for the support of my mother, my late father, my wife, my brother and my two sisters. Just as there are unseen heroes behind every piece of work, the unseen heroes of this work are my mother, my late father, my family and my siblings. Because my respect and esteem for them would not fit into a few lines, I will simply say thank you to them. To my mother, who sent her son to distant lands so that he would learn “something useful” while she herself accepted living alone, I do not have enough words, so I only say “thank you very much”. I also want to give special thanks to my wife Tawus, who agreed to share her life with me, who has been with me on both rainy and sunny days, and who is raising two priceless treasures (my son and my daughter).
Curriculum Vitae
Jumamurat Bayjanov was born in Halach, Turkmenistan, on 12 July 1979. He received his Bachelor’s degree in Computer Engineering in 2001 from the International Turkmen-Turkish University in Ashgabat, Turkmenistan. After working for a year as a teaching assistant at his alma mater, he started a Master’s program at Saarland University in Saarbrücken, Germany. In 2004 he obtained his Master of Science degree in Computer Science, and in the same year he returned to the International Turkmen-Turkish University to work as a lecturer. At the end of 2006 he started his PhD project in the Bacterial Genomics group of the CMBI at the Radboud University Medical Centre in Nijmegen, the Netherlands. The main findings of this project are described in this thesis.