Data verification in scientific research is often long and complex, and the rapid growth in data volumes in recent years has made it harder still. Philip Morris International and IBM Research have collaborated to develop the sbv IMPROVER project to help scientists verify and analyze the data they generate more quickly and effectively. The project is set up as a series of challenges presented to the scientific community. Scientists submit methods and solutions, which are then opened to public discourse, allowing the community to agree on the best possible solution. The project is currently in its third stage. Manuel Peitsch, VP of Biological Systems Research at Philip Morris International, Research & Development, tells Rachel Lim more about the project and its impact on research.
Q: What is the sbv IMPROVER project?
A: sbv IMPROVER stands for systems biology verification: Industrial Methodology for PROcess VErification in Research. It is a collaborative effort by Philip Morris International (PMI) R&D and IBM Research, designed to develop a more transparent and robust process for assessing complex scientific data. Using a crowd-sourcing methodology, the project is designed as a series of challenges whereby scientists from around the world actively contribute to the development of an innovative method for the verification of scientific data and concepts in systems biology research.1,2 The project is funded by PMI.
The methodology being explored in sbv IMPROVER has potential applications in many industries: biotechnology, pharmaceuticals, nutrition and environmental safety, to name a few. In theory, it could be applied to any area that requires a more meaningful analysis of scientific data. There are no restrictions on where and how the methodology can be used, and the impact of sbv IMPROVER is thus far-reaching and highly significant.
For example, PMI are drawing on genomics, proteomics, metabolomics, high-resolution imaging, observational clinical studies and mathematical modeling to gain a mechanistic understanding of the causal relationship between cigarette smoking and smoking-related disease. Verification of data derived from systems biology is planned to form part of the assessment of an innovative range of products, currently in development, which may have the potential to reduce the risk of smoking-related diseases.
Q: What are the factors that led to the development of this project?
A: In recent years, technological advances and a broadening of scientific understanding have led to an explosive growth of scientific data and publications. Indeed, it has been estimated that in the past decade the growth rate of scientific publications was 5.6% per year, or equivalently, a doubling time of 13 years.3 As such, the traditional peer-review system, one of the most important mechanisms for quality control of scientific papers, is being put under increased strain. Peer-reviewers get little reward for their efforts and are facing an ever-increasing workload. Furthermore, it is questionable whether the peer-review system can objectively assess the quality of high-throughput data and the validity of the sophisticated analyses and interpretations that nowadays pervade many scientific fields.
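Editor's illustration: the link between a 5.6% annual growth rate and a doubling time of roughly 13 years can be checked with a few lines of Python. This is a minimal sketch that assumes constant compound annual growth; the function name `doubling_time` is ours and does not come from the interview or the cited estimate.

```python
import math


def doubling_time(annual_growth_rate: float) -> float:
    """Years needed for a quantity to double at a constant compound annual rate.

    Assumption (not from the interview): growth compounds once per year,
    so the quantity after t years is (1 + rate) ** t.
    """
    if annual_growth_rate <= 0:
        raise ValueError("annual_growth_rate must be positive")
    # Solve (1 + rate) ** t == 2 for t.
    return math.log(2) / math.log(1 + annual_growth_rate)


if __name__ == "__main__":
    years = doubling_time(0.056)  # 5.6% per year, as quoted above
    print(f"Doubling time: {years:.1f} years")
```

Under that assumption the result is a little under 13 years, consistent with the rounded figure quoted above.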
sbv IMPROVER is designed to complement the peer-review process and addresses many of these challenges, enabling more rigorous evaluation of large, complex 'big data' sets and enabling independent verification of the conclusions reached by peer-review.
Q: What stage is the project currently at?
A: The sbv IMPROVER project has been running for more than two years and the organizers are currently accepting submissions for the Network Verification Challenge, in which participants are asked to fine-tune and verify sophisticated models of biological networks related to human lung disease.4 As well as generating models that represent the current status of scientific knowledge in this area, the challenge also provides a framework by which other biological networks can be visualized, expanded and verified.
Participants are asked to enhance and/or verify the models using a sophisticated online platform, with networks being encoded in Biological Expression Language (BEL), a human-readable and machine-computable language that captures causal and correlative relationships between biological entities. The process is set up as a collaborative competition in which points are awarded for various actions that contribute to the improvement of the models.
The contributions made by the participants will then be carefully reviewed by the sbv IMPROVER team. The most controversial edges (i.e., those that did not obtain a consensus from the community in the online phase) will be selected for further review and discussion at an international networking 'jamboree' session, planned for 18-20 March 2014 in Montreux, Switzerland. This will allow for face-to-face discussion of controversial edges and a simultaneous review of the scientific evidence with recognized experts. The objective of the jamboree is to reach a consensus among the experts on how best to formulate the network models.
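Editor's illustration: the idea of singling out edges that "did not obtain a consensus" can be sketched in code. Everything in this example is hypothetical: the vote counts, the `EdgeVotes` type, the agreement measure and the 0.75 threshold are our assumptions for illustration only, and the interview does not describe how the sbv IMPROVER platform actually scores or selects edges.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EdgeVotes:
    """Hypothetical tally of community votes on one network edge."""
    edge: str      # placeholder statement text, e.g. "A increases B"
    approve: int   # number of participants who accepted the edge
    reject: int    # number of participants who rejected the edge


def agreement(votes: EdgeVotes) -> float:
    """Share of votes held by the majority side (0.0 if nobody voted)."""
    total = votes.approve + votes.reject
    if total == 0:
        return 0.0
    return max(votes.approve, votes.reject) / total


def controversial_edges(
    all_votes: Iterable[EdgeVotes], threshold: float = 0.75
) -> List[EdgeVotes]:
    """Return edges whose agreement is below the threshold, least agreed first.

    The threshold is an arbitrary illustrative value, not a project rule.
    """
    flagged = [v for v in all_votes if agreement(v) < threshold]
    return sorted(flagged, key=agreement)


if __name__ == "__main__":
    tallies = [
        EdgeVotes("A increases B", approve=9, reject=1),
        EdgeVotes("B decreases C", approve=5, reject=5),
        EdgeVotes("C increases D", approve=0, reject=0),
    ]
    for v in controversial_edges(tallies):
        print(f"{v.edge}: agreement {agreement(v):.2f}")
```

In this sketch an edge with no votes is treated as lacking consensus, so it is flagged alongside evenly split edges; a real selection process could reasonably handle unreviewed edges differently.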
Scientists who take part in the Network Verification Challenge benefit from enhanced recognition amongst their peers, early access to curated network models of signaling pathways, and early expertise in BEL, which is increasingly being adopted as a biological syntax conducive to computational manipulation. Participants who perform a certain number of actions can also download the networks, which are likely to help them generate new hypotheses for their own research.
Q: Can you tell us about the previous challenge?
A: The previous challenge was the Species Translation Challenge, set up to define which biological events observed in rodents can be translated to humans. The biomedical field has generally worked under the assumption that biological processes in rodents can correspond to biological processes in humans under analogous conditions, yet few studies have sought to address the precise limitations of this approach.
The Species Translation Challenge was a great success, both in terms of the number and quality of submissions (from a total of 28 teams, comprising 51 scientists from 14 countries across Asia, Australia, Europe and North America) and also in terms of scientific learnings. The results, recently presented at the sbv IMPROVER Symposium 2013 in Athens, Greece, demonstrate that rodent models can be used to generate predictions of biological processes in humans to a greater extent than may be expected based on species similarity alone.
The Species Translation Challenge was divided into four sub-challenges:
- The Intra-Species Protein Phosphorylation Prediction explored the extent to which protein modifications can be inferred from gene expression changes induced by different stimuli within one given species, in this case, rat. A statistically significant degree of translatability was attained.
- The Inter-Species Protein Phosphorylation Prediction asked whether protein modifications induced by stimuli in humans can be inferred from those observed in rat models following exposure to the same stimuli. The level of translatability was specific to the stimulus or phosphoprotein, with a remarkable prediction
- The Inter-Species Pathway Perturbation Prediction asked whether responsive gene sets and biological processes in humans can be predicted based on the corresponding transcriptional data in rats. Whereas some stimuli were predicted with reasonable statistical significance, others were not. In general, the predictability at the signaling level was better than at the
- The Species Specific Network Inference asked whether signaling pathway networks are similarly activated in human and rat cells when exposed to the same stimuli. A considerable number of network edges related to signal transduction processes were consistently predicted among participants here.
Three best-performing teams were recognized in the first sub-challenge. Team AMG involved scientists from University of California Santa Barbara (USA), University of Groningen (Netherlands) and Rutgers University (USA). The other best performers in this sub-challenge were Clemson University (USA) and Wayne State University (USA). Team AMG was also the best performer in sub-challenges two and three.
Five teams were recognized as joint best-performers in sub-challenge four: one from Max Planck Institute for Dynamics of Complex Technical Systems (Germany), a Swiss team from University of Lausanne and Institute of Bioinformatics, one from Pacific Northwest National Laboratory (USA), and two separate teams from University of Pittsburgh (USA).
Detailed results and a series of peer-reviewed papers are due to be published in due course. It is also planned that the dataset used in the challenge will be made available to the scientific community for ongoing work. Discussions are taking place regarding the timing and best approach to achieve this.
Q: And what about the first sbv IMPROVER Challenge?
A: The first challenge, launched in March 2012, was the Diagnostic Signature Challenge, which evaluated the fundamental biological question of whether there is sufficient information in gene expression data to diagnose diseases. Members of the global scientific community were invited to identify diagnostic signatures in four specific disease areas: chronic obstructive pulmonary disease (COPD), lung cancer, psoriasis and multiple sclerosis (MS).
The challenge demonstrated that the optimal methodological approach and predictive power appear to depend largely on the endpoint. Though no one approach was superior across all four diseases, certain methods, such as linear discriminant analysis, were found to perform consistently well in all of them. The results of the Diagnostic Signature Challenge, together with a number of research papers from the best-performing teams, have been published in a special issue of the peer-reviewed journal Systems Biomedicine.5
Q: What is next for sbv IMPROVER?
A: The current focus is very much on the Network Verification Challenge. We are actively engaging the scientific community and providing a platform that can bring together many valuable insights into the network models. By the time the submission period closes in late February 2014, we plan to have captured the current wisdom for each network. This will be followed by lively debate on the more controversial aspects of these models at the jamboree session in Montreux, Switzerland.
We are also opening a perennial aspect to sbv IMPROVER, whereby anyone who is interested can retrospectively take part in challenges which have closed (starting with the Diagnostic Signature Challenge). Not only does this give interested parties the opportunity to see how they would have fared in the challenges, it also ensures that new evidence, insight and collaboration can continuously contribute to the community's understanding of the scientific problems being explored in sbv IMPROVER.
Q: Finally, how can scientists take part in the Network Verification Challenge?
A: Anyone interested in taking part should register online at https://www.sbvimprover.com. The Network Verification Challenge is open to scientists from commercial entities as well as academic and research institutions. We welcome submissions from scientists at any stage of their careers. Submissions will be accepted through to February 2014.
About the Interviewee
Manuel Peitsch is the VP of Biological Systems Research at Philip Morris International, Research & Development.
Before joining PMI, Manuel Peitsch worked in the pharmaceutical industry for over fifteen years, following seven years in academia. His work has mainly been in the areas of computational life sciences (incl. bioinformatics) and experimental biology (incl. genomics and proteomics) in drug discovery.
He holds several patents related to proteomics, genomics and computer science and has published over 120 articles, book titles and technical notes (cited over 10,000 times). Manuel has done pioneering work in the areas of molecular modeling, cell biology and computational text analytics.
Manuel was a founder of several initiatives, including two start-up companies and the Swiss Institute of Bioinformatics. He has served as a member of the Swiss National Research Council, is the Chairman of the Executive Board of the Swiss Institute of Bioinformatics and an active scientific advisor to several academic and commercial entities. Manuel is a Computer World Honors Laureate and a recipient of several awards including the New England Business and Technology Award and the United Devices Grid Visionary Award. Manuel holds a BASc in Life Sciences, a MASc in Physical Chemistry and a PhD in Biochemistry; he is also a Professor for Bioinformatics at the University of Basel.