Bioinformatics research projects

Jaak Vilo
vilo@egeen.ee
http://www.egeen.ee/u/vilo/

Director of Informatics, EGeen
Associate Professor, Department of Computer Science, University of Tartu
Scientist, Estonian Biocentre
Consultant, ArrayExpress, European Bioinformatics Institute

Gene regulation and genetic networks bioinformatics
Gene regulation by transcriptional control is largely based on sequence-specific DNA control elements such as transcription factor binding sites, signals for enhancers and silencers, and DNA structural elements, as well as on combinations of these. Our group's aim is to capture those signals systematically, store the information in a single database, and make it easy for users to access and analyse the data. Our goal is to reverse-engineer gene regulatory networks.

Alternative splicing data analysis
Alternative splicing is a mechanism by which a cell exerts fine-scale control over which protein forms are produced from each gene (that is, which exons are included in the final product) at each stage of development, under different external conditions, or in different tissue types. Our group's aim is to study these mechanisms by analysing the DNA sequence signals that carry the information for such control.

Gene expression data analysis
Microarray gene expression studies provide a wealth of novel data for large-scale bioinformatics analysis. The aim is to develop fast data analysis methods that serve the needs of large groups of researchers using public databases such as ArrayExpress. The research builds on the initial development of the Expression Profiler analysis tools (part of the ArrayExpress infrastructure).

Functional genomics and data integration studies
Functional genomics data from large-scale experiments, such as the various protein-protein interaction methods, phenotypic data from systematic gene knockouts, and genome-wide binding localisation studies (ChIP on chip), are never perfectly accurate, yet they provide a wealth of new information that has to be put into the context of other data sources. The key is to integrate those data sources so that data analysis can help us form better hypotheses about function and design new experiments.

GPCR bioinformatics
Building on our previous study of G-protein-coupled receptor (GPCR) coupling specificity prediction, our aim is to improve our ability to predict GPCR-mediated signalling pathways and mechanisms of signal transduction.

Data mining methods development for bioinformatics
Many of the research areas mentioned above need underlying computational data analysis and visualization methods to be developed. The group will focus on developing those methods and on helping bioinformaticians use the best computational methods in their research.

Pattern discovery and recognition in sequences; sequence algorithms
Biological studies that aim to discover motifs in DNA, RNA, or protein sequences rely to a large extent on basic research in sequence algorithmics and combinatorial pattern matching. The task is to develop new pattern discovery and pattern matching algorithms and tools that can be used for large-scale bioinformatics studies. One of those tools is SPEXS, an algorithm developed by Jaak Vilo and used for the analysis of DNA and protein sequences.

Medical and clinical data handling and storage, population and statistical genetics, pharmacogenetics
Proper capture of clinical data is of utmost importance for any biomedical research that studies human health and genetics.
The aim is to develop data management solutions for health, clinical, lifestyle, and environmental data, together with methods for analysing those data. Combined with information on drug consumption and treatment effects, these data make it possible to develop predictive methods for drug efficacy studies.
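
To make the pattern discovery task from the section "Pattern discovery and recognition in sequences" more concrete, here is a minimal sketch in Python. It is not SPEXS and makes no claim about how SPEXS works; it only illustrates the simplest form of the problem, finding fixed-length substrings (k-mers) that occur in many of the input sequences. The function name frequent_kmers, its parameters, and the example sequences are hypothetical and chosen for illustration only.

```python
from collections import Counter


def frequent_kmers(sequences, k, min_support):
    """Return the k-mers that occur in at least min_support sequences.

    Hypothetical helper for illustration; not part of SPEXS or any library.
    Each sequence contributes at most 1 to a k-mer's support, no matter how
    many times the k-mer occurs inside that sequence.
    """
    support = Counter()
    for seq in sequences:
        # Collect the distinct k-mers of this sequence, then count each once.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(seen)
    return {kmer: n for kmer, n in support.items() if n >= min_support}


# Made-up toy sequences, not real promoter data.
promoters = ["ACGTGACC", "TTACGTGA", "GGACGTGT", "CCCCCCCC"]
motifs = frequent_kmers(promoters, k=5, min_support=3)
print(motifs)  # {'ACGTG': 3}
```

Exhaustive enumeration like this only scales to short, exact patterns. Practical tools for large-scale studies typically also handle variable-length patterns and wildcards, and score patterns against a background set of sequences.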