DAMDID 2017 Conference

Plenary Sessions

Keynotes

Data-Driven Genomic Computing: Making Sense of the Signals from the Genome

Stefano Ceri
Politecnico di Milano
Italy

Brief Bio

Stefano Ceri is professor of Database Systems at the Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB) of Politecnico di Milano; he was a visiting professor at the Computer Science Department of Stanford University (1983-1990). His research work covers four decades (1976-2016) and has generally been concerned with extending database technologies to incorporate new features: distribution, object-orientation, rules, streaming data. With the advent of the Web, his research turned to the engineering of Web-based applications and to search systems. More recently he has turned to crowd searching, social media analytics, and genomic computing.

He is the recipient of two ERC Advanced Grants: "Search Computing (SeCo)" (2008-2013), focused on the rank-aware integration of search engines in order to support multi-domain queries, and "Data-Driven Genomic Computing (GeCo)" (2016-2021), focused on new abstractions for querying and integrating genomic datasets. He is the recipient of the ACM-SIGMOD "Edward T. Codd Innovation Award" (New York, June 26, 2013), an ACM Fellow and a member of Academia Europaea.

Abstract

Genomic computing is a new science focused on understanding the functioning of the genome, as a premise to fundamental discoveries in biology and medicine. Next Generation Sequencing (NGS) allows the production of the entire human genome sequence at a cost of about US $1000; many algorithms exist for the extraction of genome features, or "signals", including peaks (enriched regions), mutations, or gene expression (intensity of transcription activity). The missing piece is a system supporting data integration and exploration, giving a “biological meaning” to all the available information; such a system can be used, e.g., for better understanding cancer or how the environment influences cancer development.

The GeCo Project (Data-Driven Genomic Computing, ERC Advanced Grant, 2016-2021) has the objective of revisiting genomic computing through the lens of basic data management, through models, languages, and instruments, focusing on genomic data integration. Starting from an abstract model, we developed a system that can be used to query processed data produced by several large genomic consortia, including Encode and TCGA; the system internally employs the Spark engine, and prototypes can already be accessed from Cineca or from PoliMi servers. During the five years of the ERC project, the system will be enriched with data analysis tools and environments and will be made increasingly efficient. Among the objectives of the project is the creation of an “open source” repository of public data, available to biological and clinical research through queries, web services and search interfaces.

The astronomical data deluge: problems and solutions

Giuseppe Longo
University of Napoli Federico II

Brief Bio

Giuseppe Longo is professor of astrophysics at the University of Napoli Federico II, associate to the California Institute of Technology (Center for Data Driven Discovery) and to the Italian National Institutes of Astrophysics and of Nuclear Physics. He is also a member of the oldest scientific academy in the world (Accademia Pontaniana).

He is among the early pioneers of astroinformatics and has co-authored more than 200 scientific papers and many books and proceedings. He has been chairperson of the Interest group on Knowledge Discovery in Databases of the International Virtual Observatory Alliance, co-proposer of the International Astronomical Union Working Group on Astrostatistics and Astroinformatics, and of the IEEE task force “astrominers”.

In collaboration with S.G. Djorgovski at Caltech, he established and co-chairs the yearly international workshops on Astroinformatics. Currently he is national delegate in two European COST Actions aimed at the exploitation of big data and co-PI of the SUNDIAL Horizon 2020 Marie Curie Training Network on Astroinformatics. He also leads the machine learning efforts on photometric redshifts within the Euclid collaboration.

Abstract

For almost a decade, astronomy has been in the "big data era", with digital whole sky surveys producing hundreds of terabytes of complex data every day. In the coming years, new instruments such as the Large Synoptic Survey Telescope (LSST) or the Square Kilometer Array (SKA) will further increase this trend and are expected to produce unprecedented data streams: SKA alone will produce close to 1.5 PBytes of raw data per day. These data will need to be stored, distributed, reduced and visualized. In the recent past, the various national Virtual Observatory initiatives have addressed and largely solved the problems connected with the federation of and access to existing archives, but much remains to be done in order to cope with these new challenges, especially regarding the implementation of proper data mining and data visualization methods.

Astronomical data are in fact highly heterogeneous (textual, images, tabular, etc.), complex (hundreds if not thousands of often correlated parameters measured for billions of objects) and, furthermore, in most cases they need to be understood in almost real time in order to allow for the needed photometric and spectroscopic follow-ups. This is the case, for instance, for the LSST, which will continuously scan a large fraction of the sky, producing thousands of transient events every night. There is no doubt that the capability to extract the complex information present in these data sets will lead to the discovery of new, fundamental science, and will define the path of astrophysics and cosmology in the current century.

After a detailed overview of the problems posed by these new projects, the talk will address several specific topics using as a scientific test bed the evaluation of photometric redshifts: a quantity which is crucial in order to scientifically exploit all current and future cosmological surveys. In this part of the talk, too, specific attention will be devoted to the development of proper machine learning based methods for data understanding and data visualization. This issue is emphasized because it is still at an early stage of implementation.

In particular, the talk will focus on optimal feature selection, on the evaluation of errors with machine learning methods and on how the extraction of features needs to be revised in order to match the needs of the new age of Data Driven Discovery.

The EU's Human Brain Project (HBP) Flagship – Accelerating brain science discovery and collaboration

Katrin Amunts
Prof. Dr. med.; full professor for Brain Research and director of the C. and O. Vogt Institute for Brain Research at the Heinrich-Heine University Duesseldorf; director of the Institute of Neuroscience and Medicine (INM-1), Research Centre Juelich
Germany

Brief Bio

Katrin Amunts did postdoctoral work in the C. & O. Vogt Institute for Brain Research at Duesseldorf University, Germany. In 1999, she moved to the Research Centre Juelich and set up a new research unit for Brain Mapping. In 2004, she became professor for Structural-Functional Brain Mapping at RWTH Aachen University, and in 2008 a full professor at the Department of Psychiatry, Psychotherapy and Psychosomatics at the RWTH Aachen University as well as director of the Institute of Neuroscience and Medicine (INM-1) at the Research Centre Juelich. In 2013, she became a full professor for Brain Research at the Heinrich-Heine University Duesseldorf, director of the C. and O. Vogt Institute for Brain Research, Heinrich-Heine University Duesseldorf and director of the Institute of Neuroscience and Medicine (INM-1), Research Centre Juelich.

Katrin Amunts has been a member of the editorial board of Brain Structure and Function since 2007. She has been a member of the German Ethics Council since 2012, and its vice chair since 2016. Katrin Amunts is the programme speaker for the programme Decoding the Human Brain of the Helmholtz Association, Germany, as well as speaker of the Helmholtz Portfolio Theme Supercomputing and Modeling for the Human Brain (SMHB). Since 2013 Katrin Amunts has been leading Subproject 2, Human Brain Organization, of the European Flagship Project, The Human Brain Project (HBP). She has recently been elected Chair of the Scientific and Infrastructure Board (SIB) of the HBP. In order to better understand the organizational principles of the human brain, she and her team aim to develop a multi-level and multi-scale brain atlas, and use methods of high-performance computing to generate ultra-high resolution human brain models.

Abstract

The human brain has a multi-level organisation and high complexity. New approaches are necessary to decode the brain with its 86 billion nerve cells, which form complex networks. For example, 3D Polarized Light Imaging elucidates the connectional architecture of the brain at the level of axons while preserving the topography of the whole organ; it results in data sets of several petabytes per brain, which should be actively accessible while minimizing their transport. Thus, ultra-high resolution models pose massive challenges in terms of data processing, visualisation and analysis. High-resolution data obtained from post-mortem brains supplement the information about structural and functional connectivity obtained by magnetic resonance imaging, which can be acquired from the living human brain, but with lower spatial resolution. By bringing together different aspects of connectivity in one and the same atlas, it is feasible to combine the advantages of the different approaches, and to address the different spatial (and temporal) scales.

The Human Brain Project is developing a multimodal human brain atlas, which integrates the information of brain organisation from multiple levels including cellular architecture, connectivity, molecular and genetic maps, and results from neuroimaging studies of the brain from healthy subjects and patients. Empirical research is supplemented by modelling and simulation, resulting in new data that is stimulating new experiments.

The Human Brain Project creates a cutting-edge European infrastructure to enable cloud-based collaboration among researchers coming from different disciplines around the world, and to create development platforms with databases, workflow systems, petabyte storage, and supercomputers, opening new perspectives for decoding the human brain.

Invited Talks

Development of genomic based diagnostics in various application domains

Zoltan Szallasi
Harvard Medical School

Brief Bio

The bio of Dr. Zoltan Szallasi is available here: http://damdid2017.frccsc.ru/en/tutorials.html

Abstract

Genomics has not only become a major field in big data science but is also posing unique challenges, because the decline in the cost of next generation sequencing has significantly outpaced Moore’s law over the past decade. Low cost sequencing can be applied in virtually any biological or biochemical setting, and it has opened up a vast array of industrial and research applications. This talk will review the multitude of opportunities offered by next generation sequencing in a wide array of genomic applications that have attracted industrial involvement. These include cancer diagnostics, cancer immunology, liquid biopsies in disease care and perinatal diagnostics, metagenomics, etc.

While next generation sequencing based genomics easily qualifies as one of the main areas of big data science, in many aspects it is also markedly different from the others. While in, e.g., financial data the individual variables are connected by poorly understood causative factors, and in physics the entire data space is regulated by well defined, homogeneous laws of physics, in biology and genomics the situation lies somewhere in between. Variables such as genes, proteins, etc. are connected by the principles of physical chemistry, but the actual parameters of those connections vary significantly across the various pairs of biological entities. This places the analysis of biological systems in the realm of robust, complex systems for which the analytical principles are poorly understood. Therefore, in order to effectively analyze the massive amounts of genomic information, one needs to “front-load” the computational analysis with as much biological knowledge as possible.

We will present several strategies along those lines. In particular, we will discuss how genomics, specifically next generation sequencing based whole genome analysis, helps us to understand DNA repair pathway aberrations and their diagnostic and therapeutic implications in cancer. We will also discuss how genomics is exploited to understand the main principles of therapeutic immune responses against cancer, and how genomics, machine learning and high throughput screening are combined in an interdisciplinary environment to design effective vaccines against cancer.

Important dates

Conference

Paper submission	18.06.2017
Tutorial application submission	04.06.2017
Notification of acceptance	20.07.2017
Camera-ready papers	15.08.2017

PhD Workshop

Paper submission	23.06.2017
Notification of acceptance	20.07.2017
Camera-ready papers	15.08.2017

Satellite Events

Satellite Event submission	16.04.2017
Notification of acceptance	30.04.2017
