Searching bioinformatics databases
Lutfan Lazuardi
The spectrum and hierarchy of health informatics
• Public Health Informatics: populations
• Clinical Informatics: individual patients
• Imaging Informatics: tissues, organs
• Bioinformatics: molecular, cellular
Adapted from Shortliffe
What is bioinformatics?
• A discipline concerned with how to acquire, process, store, distribute, and analyze biological and medical information
• In the broad sense:
– Any research on biological processes that uses computers
• In the narrow sense:
– Computer-based analysis of sequence data from macromolecular structures
Bioinformatics bridges many disciplines
Bioinformatics combines biology, medicine, chemistry, mathematics, statistics, and computer science to understand the biological processes of life.
[Figure labels: biology, physiomics, cellomics, biotech, evolution, infotech, ontology, proteomics, molecular modeling, mathematics, metabolomics, transcriptomics, genomics, statistics]
Kolaskar, 2003
[Figure: the relationship between biology and technology, adapted from Gibas, 2003]
• Headings: Experimental; Computation; Information technology; Mathematical & physical models; Hardware & instrumentation; Methodology & expertise; Sequence; Physiology (and beyond)
• Topics: DNA sequence; gene & genome; molecular evolution; genome sequencing; protein structure, folding, function & interaction; proteomics; metabolic pathways, regulation, signaling, networks; functional genomics (microarrays, 2D-PAGE, etc.); physiology & cell biology; interspecies interaction; high-tech field ecology; ecology & environment
• Topics: genomic data analysis; statistical genetics; protein structure prediction, dynamics, folding & design; data standards, representations & analytical tools for complex biological data; dynamical systems modeling; computational ecology
Biological research in the 21st century
"The new paradigm, now emerging, is that all the 'genes' will be known (in the sense of being resident in databases available electronically), and that the starting point of a biological investigation will be theoretical." - Walter Gilbert
BIOINFORMATICS
The Pyramid of Life
• Metabolomics: 1400 chemicals
• Proteomics: 10,000 proteins
• Genomics: 30,000 genes
Wishart (2004)
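Entrez: Neighbors and Hard Links
[Figure: linked Entrez databases]
• Databases: PubMed abstracts, Phylogeny, Taxonomy, 3-D Structure, Genomes, Nucleotide sequences, Protein sequences
• Neighbor links: word weight (PubMed abstracts), VAST (3-D Structure), BLAST (nucleotide sequences), BLAST (protein sequences)
Source: NCBI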
Sequence data
• Diagnosis
• New drug discovery
• Hypothesis-driven research
• Vaccine development
Traditional research
• One gene per experiment
• Time-consuming
• Laborious
• Limited results
• In vitro, in vivo, ex vivo
NGS, microarray
• Fast tracking
• Thousands of genes per experiment
• Function of genes, alone or in interaction with others
• In silico (in algorithmo)
Example databases
• Nucleotide database (GenBank)
– BLAST (Basic Local Alignment Search Tool)
• Protein sequence database
• Protein structure database (PDB)
• Genome database
• Microarray database
• Metabolic pathway and protein function database
Example data types
• Nucleotide/protein sequence
• Gene expression level
GenBank
• A sequence database
• An annotated collection of DNA sequences
• 171,744,486 sequences (April 2014)
• Sequence data come from direct submissions by scientists/authors
• The GenBank database is designed to provide the scientific community with the most up-to-date sequence information
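GenBank records can also be retrieved programmatically. The sketch below only builds a request URL for the NCBI E-utilities efetch service and does not contact the server. The helper name genbank_fetch_url and the accession "NM_000546" are assumptions chosen for illustration, not part of the original slides.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities "efetch" service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_fetch_url(accession, rettype="gb"):
    """Build an efetch URL for one nucleotide record (hypothetical helper).

    rettype="gb" requests the GenBank flat file, rettype="fasta" requests FASTA.
    """
    params = {
        "db": "nuccore",      # nucleotide collection, which includes GenBank
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


# "NM_000546" is only an illustrative accession for this sketch.
print(genbank_fetch_url("NM_000546", rettype="fasta"))
```

Opening the printed URL in a browser, or passing it to urllib.request.urlopen, should return the record as plain text, subject to NCBI's usage guidelines.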
GenBank data sources
• Direct submission by individual researchers through forms (BankIt, Sequin)
• Batch submission by email (EST, GSS, STS)
• Through an FTP (File Transfer Protocol) account
• Data from the three collaborating databases:
– GenBank
– DNA Database of Japan (DDBJ)
– European Molecular Biology Laboratory Database (EMBL)
Primary vs. secondary databases
• Primary databases
– Original submissions by experimentalists
– Database staff organize but don't add additional information
– Example: GenBank
• Derivative (secondary) databases
– Human curated: compilation and correction of data. Example: SWISS-PROT, NCBI RefSeq mRNA
– Computationally derived. Example: UniGene
Chattopadhyay, 2007
File formats
• GenBank flat file (GBFF)
– Header
– Features
– Sequence
• FASTA format
– The description line starts with the > character
– Followed by the sequence data
– May hold protein or DNA
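The FASTA layout (a description line starting with ">", followed by sequence lines) can be read with a few lines of code. This is a minimal sketch; the function name parse_fasta and the two example records are invented for illustration, and real projects would more likely use an established library.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA text."""
    records = []
    description = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


# Hypothetical records, made up for this example.
example = """>seq1 hypothetical DNA fragment
ATGGCGTACG
TTAGC
>seq2 hypothetical peptide
MKTAYIAK
"""

for description, sequence in parse_fasta(example):
    print(description, len(sequence))
```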
Example analysis
Kaohsiung J Med Sci. 2008 Feb;24(2):55-62. doi: 10.1016/S1607-551X(08)70098-6.
Phylogenetic study of dengue-3 virus in Taiwan with sequence analysis of the core gene. Tung YC, Lin KH, Chang K, Ke LY, Ke GM, Lu PL, Lin CY, Chen YH, Chiang HC.
URL: http://www.sciencedirect.com/science/article/pii/S1607551X08700986
• Similarity analysis (BLAST)
• Primer design
• Sequence comparison
• Multiple alignment
• Phylogenetic analysis
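Sequence comparison after a multiple alignment is often summarized as a pairwise distance matrix, which is one common input to phylogenetic analysis. The sketch below computes the p-distance (proportion of differing sites) and is not the method of the cited study. The names p_distance and distance_matrix and the three short aligned sequences are assumptions made up for illustration.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ("-") in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return different / compared


def distance_matrix(alignment):
    """Map every (name_x, name_y) pair to its p-distance."""
    names = sorted(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x in names
        for y in names
    }


# Hypothetical aligned sequences, invented for this example.
alignment = {
    "strain_A": "ATGAACAACCAACG",
    "strain_B": "ATGAATAACCAACG",
    "strain_C": "ATGAATAA-CAGCG",
}

for pair, d in distance_matrix(alignment).items():
    print(pair, round(d, 3))
```

A tree-building method such as neighbor joining would take a matrix like this as input; dedicated phylogenetics software usually also applies a substitution model rather than the raw p-distance.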
Phylogenetic analysis
• High-density oligonucleotide human genome array: GeneChip U133 Plus 2.0 (Affymetrix)
• This chip comprises more than 54,000 probe sets and analyzes the expression level of over 47,000 transcripts and variants, including 38,500 well-characterized human genes
Source: Affymetrix
Microarray assay life cycle
• Biological question
• Sample preparation
• Microarray hybridization
• Microarray detection
• Data analysis
Microarray data processing
[Figure: Microarray chips; Images scanned by laser; Datasets; Data mining and analysis; New sample; Prediction]

Dataset for one sample:
Gene              Value
D26528_at         193
D26561_cds1_at    -70
D26561_cds2_at    144
D26561_cds3_at    33
D26579_at         318
D26598_at         1764
D26599_at         1537
D26600_at         1204
D28114_at         707

Dataset across samples:
Class   Sno   D26528   D63874   D63880   …
ALL     2     193      4157     556
ALL     3     129      11557    476
ALL     4     44       12125    498
ALL     5     218      8484     1211
AML     51    109      3537     131
AML     52    106      4578     94
AML     53    211      2431     209
…

Source: Yuki Juan (2003)
Example scheme for microarray data analysis
• Microarray
• Raw data
• Preprocessing: normalization, filtering steps
• Gene expression profiles
• High-level analysis
– Statistical test: t-test/ANOVA (analysis of variance)
– Validation
– Cluster analysis
– PTM (Pavlidis template matching)
• Genes of interest
• Biological process/function/pathway
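The statistical-test step can be sketched with a per-gene two-sample t statistic. The example below uses only the seven ALL/AML samples and three genes visible in the dataset table on the microarray data processing slide, takes the raw values as they are, and computes Welch's t statistic with the standard library. The function name welch_t and the ranking step are assumptions for illustration; a real analysis would normalize first and convert the statistic into a p-value with a multiple-testing correction.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Expression values copied from the ALL/AML dataset table (Sno 2-5 and 51-53).
expression = {
    "D26528": {"ALL": [193, 129, 44, 218], "AML": [109, 106, 211]},
    "D63874": {"ALL": [4157, 11557, 12125, 8484], "AML": [3537, 4578, 2431]},
    "D63880": {"ALL": [556, 476, 498, 1211], "AML": [131, 94, 209]},
}

# t statistic per gene, then genes ordered by absolute t (largest first).
t_values = {
    gene: welch_t(groups["ALL"], groups["AML"])
    for gene, groups in expression.items()
}
ranked = sorted(t_values, key=lambda gene: abs(t_values[gene]), reverse=True)

for gene in ranked:
    print(gene, round(t_values[gene], 2))
```

Genes at the top of such a ranking are candidates for the "genes of interest" step, pending validation.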
Example analysis
Biogerontology. 2009 Apr;10(2):191-202. doi: 10.1007/s10522-008-9167-1. Epub 2008 Aug 27.
Microarray analysis reveals similarity between CD8+CD28- T cells from young and elderly persons, but not of CD8+CD28+ T cells. Lazuardi L, Herndler-Brandstetter D, Brunner S, Laschober GT, Lepperdinger G, Grubeck-Loebenstein B.
URL: http://link.springer.com/article/10.1007%2Fs10522-008-9167-1
Example of hierarchical cluster analysis
[Figure: panels A and B; axis labels: Linkage distance, Expression level; gene clusters (1-21), with cluster 13 labeled; samples: O2_28N, O1_28N, Y2_28N, Y1_28N, O2_28P, O1_28P, Y2_28P, Y1_28P]
• Y1_28P & Y2_28P: CD8+CD28+ T cells from young persons
• O1_28P & O2_28P: CD8+CD28+ T cells from elderly persons
• Y1_28N & Y2_28N: CD8+CD28– T cells from young persons
• O1_28N & O2_28N: CD8+CD28– T cells from elderly persons
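The idea behind a dendrogram with a linkage-distance axis can be sketched with a small agglomerative clustering routine. This is a generic average-linkage example with Euclidean distance, not the procedure or the data of the cited study. The function name average_linkage, the sample names, and all expression values are invented for illustration.

```python
from math import dist


def average_linkage(points):
    """Agglomerative clustering with average linkage.

    points maps a sample name to a list of expression values.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(name,) for name in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                pair_dists = [
                    dist(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                d = sum(pair_dists) / len(pair_dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical expression profiles (three genes per sample), made up here.
profiles = {
    "sample_A": [8.1, 2.0, 5.5],
    "sample_B": [8.0, 2.2, 5.4],
    "sample_C": [6.0, 3.5, 5.0],
    "sample_D": [2.1, 7.9, 1.0],
    "sample_E": [2.0, 8.0, 1.1],
}

for left, right, distance in average_linkage(profiles):
    print(left, "+", right, "at linkage distance", round(distance, 2))
```

Each printed merge corresponds to one join in a dendrogram, with the linkage distance giving the height of the join.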
Classification, function and pathway analysis (pantherdb.org)
Links
• GenBank: https://www.ncbi.nlm.nih.gov/genbank/
• Protein database: http://www.wwpdb.org/ and http://www.rcsb.org/pdb/home/home.do
• KEGG Pathway database: http://www.genome.jp/kegg/genes.html
Thank you