No industry is exempt from the market impact of big data disruption. The following is a list of 20 projects that we believe have the greatest potential to advance life science research. The choice was difficult, but we think this list succeeds in bringing together a blend of early-stage research and applications already being used in real-world clinical care.
Mapping our Brains
Developing a fuller understanding of the brain’s circuitry represents a huge challenge, with an estimated 100 billion neurons in the brain and 1,000 trillion neuronal connections (10,000 per neuron) triggering thoughts and actions. McGill University (Montreal) has established a research-and-development centre that will serve as a test bed for adapting EMC Isilon scale-out NAS visualization and data-storage technology to the complex needs of leading-edge neuroscience. The new centre will be expanded in coming years to encompass research across the university, reflecting the increased role of Big Data in research.
“A single high-resolution dataset for a full brain now requires over 200 terabytes of disk space for raw data alone,” says Alan Evans, the McGill brain-imaging expert who worked with European scientists to produce the BigBrain map. “EMC’s storage and visualization technologies will help us take brain imaging to the next level – from the equivalent of old fashioned street maps to Google Earth.”
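The connection estimate quoted above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the inputs are the article's round estimates rather than measured values.

```python
# Round estimates taken from the text above (not measured values).
NEURONS = 100 * 10**9            # about 100 billion neurons
CONNECTIONS_PER_NEURON = 10_000  # about 10,000 connections per neuron
TRILLION = 10**12

# Total connections implied by the two estimates.
total_connections = NEURONS * CONNECTIONS_PER_NEURON
total_in_trillions = total_connections // TRILLION

print(f"{total_in_trillions:,} trillion connections")
```

The product comes to 10^15, which matches the "1,000 trillion" figure given in the text.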
How to store exponential growth in Big Data?
By 2025, between 100 million and 2 billion human genomes are likely to be sequenced, representing four to five orders of magnitude of growth in 10 years, according to a study published in the journal PLOS Biology. The data storage demands for this alone could run to as much as 2-40 exabytes, exceeding even the 1 exabyte per year projected for what will be the world’s largest astronomy project, the Square Kilometre Array, to be sited in South Africa and Australia.
The recent convergence of affordable DNA sequencing and the scalability of Twist Bioscience’s silicon-based DNA synthesis technique presents a new opportunity for DNA to become a viable data storage option. Using DNA as an archival technology avoids two key limitations of traditional digital storage media: limited lifespan and low data density. DNA data storage could last up to 2,000 years without deterioration, according to a recent presentation at the American Chemical Society. In addition, a single gram of DNA can store almost one trillion gigabytes (almost a zettabyte) of digital data.
Read more about Twist Bioscience and their recent work with Microsoft HERE
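To put the storage figures above in perspective, the short Python sketch below works out what they imply. It is a rough illustration under stated assumptions: decimal (SI) units, the low genome count paired with the low storage estimate and the high with the high, and the "almost one trillion gigabytes per gram" figure treated as an upper bound. The variable names are ours.

```python
# Decimal (SI) units are assumed throughout.
GB = 10**9
EB = 10**18

# Ranges quoted in the text above.
genomes_low, genomes_high = 100 * 10**6, 2 * 10**9
storage_low, storage_high = 2 * EB, 40 * EB

# Assumption: the low genome count pairs with the low storage estimate,
# and the high count with the high estimate.
gb_per_genome_low = storage_low / genomes_low / GB
gb_per_genome_high = storage_high / genomes_high / GB

# "Almost one trillion gigabytes per gram" is treated as an upper bound.
dna_bytes_per_gram = 10**12 * GB
grams_for_high_estimate = storage_high / dna_bytes_per_gram

print(f"Implied storage per genome: {gb_per_genome_low:.0f}-{gb_per_genome_high:.0f} GB")
print(f"DNA needed for 40 EB: at least {grams_for_high_estimate:.2f} g")
```

Under these assumptions both ends of the range imply roughly 20 GB per genome, and even the 40-exabyte estimate would fit in a few hundredths of a gram of DNA at the quoted density.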
Open source information to guide personalized treatment decisions
There are several high-profile collaborations engaged in open-source initiatives. In one example, as part of the White House Precision Medicine Initiative, the New York Genome Center and IBM are collaborating to create an open repository of genetic data to accelerate cancer research and, ultimately, to use insight from IBM Watson to inform personalized treatment decisions. The initial project will begin with 200 patients being treated at New York City-area hospitals, such as Memorial Sloan Kettering and Columbia-Presbyterian. Unlike smaller precision medicine projects that focus on 30 to 50 genes, this effort will sequence the whole genome (estimated to be 22,000 genes) in order to understand the DNA of patients’ tumours and better personalize their treatment programs.
Enabling robust participant engagement for Precision Medicine Initiatives
A key topic at our November summit will be the innovations that can be developed where big data and technology intersect with precision medicine and science. One program already under way in this space is the Precision Medicine Initiative (PMI) Cohort between Vanderbilt University and Verily (formerly Google Life Sciences). The first phase of the program is defined as a ‘participant-engaged, data-driven enterprise supporting research at the intersection of human biology, behaviour, genetics, environment, data science and computation, and much more to produce new knowledge with the goal of developing more effective ways to prolong health and treat disease.’ The Vanderbilt-Verily partnership will help researchers in the PMI establish and test methods for enabling direct recruitment of participants, with 50,000 targeted by the end of 2016.
Building a knowledgebase to interpret patient profiles in large populations
Human Longevity Inc (HLI) is building a multidimensional genomic-phenotypic repository (the HLI Knowledgebase and Health Nucleus platform) in order to interpret individual patient profiles in relation to petabytes of medical knowledge from large populations. HLI has committed to sequencing 1 million genomes by 2020 and to integrating standardized, longitudinal phenotypic records. The Knowledgebase currently contains approximately 25,000 genomes and 11,000 integrated health records (IHRs).
In addition to collecting data from the Health Nucleus clinical center in San Diego, California, HLI has entered a 10-year partnership with AstraZeneca (to sequence half a million samples collected in clinical trials), Genentech and Discovery Insurance in South Africa and the UK (part of Discovery’s Vitality Program, which offers a behavioural wellness solution to give people the tools, knowledge and incentives to improve their own health) so as to broaden the depth of clinically relevant data in the system.
Dutch high-performance supercomputing and clinical insight
The volume of stored clinical data is growing at around 40% per year due to rapid advancements in diagnostic medical imaging, patient monitoring and chronic disease management, as well as the adoption of new medical-grade IoT devices (e.g. the AliveCor Heart Monitor and AliveECG app for detecting atrial fibrillation) that enable patient empowerment and self-management. This clinical data is then supplemented by genomics data and digital pathology records. Big data research services are now being opened up for large-scale collaborative initiatives to identify actionable information in this data ecosystem.
Philips and SURFsara (the leading Dutch high-performance supercomputing and data infrastructure provider for education & academic research) have announced a new collaboration to connect the Philips HealthSuite cloud platform to the SURFsara National Research Infrastructure to provide new cloud-based research services in precision medicine and population health.
"Today, hospitals can already retrieve massive amounts data from multiple sources and various disciplines and through research obtain new clinical insights," said Jeroen Tas, CEO Connected Care and Health Informatics, Philips. "But even greater value lies in combining, normalizing and analyzing the current islands of data. Our integrated services aim to combine data on all levels and connect health systems, clinical expertise and research programs in a secure and compliant manner. Through networked healthcare research we want to facilitate collaboration on the next generation of breakthroughs in care delivery."
Phenoptics solutions to understand response to checkpoint inhibitors
In order to deploy effective, personalized cancer immunotherapies, clinicians and researchers require a lot of patient data. Scientists at the University of California and PerkinElmer have been collaborating over the last four years to launch the ‘Phenoptics solution’, which is designed to examine the density, location and proximity of a variety of immune cell types before and during treatment of melanomas with anti-PD-1 therapy, comparing responders with non-responders. The platform includes staining kits, imaging systems and analysis software, and the quantitative pathology research system was seen as an important component in skin cancer research published online in the April 19 edition of the New England Journal of Medicine.
Big Data in systems medicine
To understand genomic changes in tumours and predict responses to pharmacological interventions, a lot of progress has been made in genomic databases (e.g. The Cancer Genome Atlas, Oncomine and the University of California Santa Cruz Cancer Genome Browser), allowing researchers to document genome-wide changes in various cancers. This information can allow us to discover resistance mechanisms and devise second-stage or combinatorial strategies in further treatment protocols.
However, our understanding of protein expression patterns and metabolite composition can provide greater insight into end point phenotypes and allow us to employ interventions around disease characterizations. In September 2013, the National Cancer Institute launched the first public proteomic data on colorectal tumor samples previously analyzed by the TCGA, the first complementation of proteomic and genomic data on the same tumors. By coupling this data and understanding precisely which signalling pathways contribute to pathogenesis, we can focus research efforts on narrow questions. Furthermore, we are starting to build our knowledge of how metabolites can serve not only as diagnostic signals but also as biomarkers that are linked with disease progression, response to therapies and new therapeutic targets.
Speaker highlight - Dr Chris Kinsinger, Program Manager, Office of Cancer Clinical Proteomics Research, NCI, will be speaking on his work on proteogenomic integration within the Clinical Proteomic Tumor Analysis Consortium (CPTAC).
Deep learning application for drug development
Although deep learning has achieved state-of-the-art results and even surpassed human accuracy in many challenging tasks, its adoption in biomedicine has been comparatively slow. However, progress is being made by companies such as Insilico Medicine (a big data analytics company located at the Emerging Technology Centers at the Johns Hopkins University Eastern campus in Baltimore).
The researchers in this organization demonstrated that a deep neural network platform could be trained on large transcriptional response data sets to classify various drugs into therapeutic categories based solely on their transcriptional profiles (paper in Mol. Pharm., 2016 Jun 8). The team even used these techniques in human aging research to measure the effectiveness of therapeutic interventions. In the 2016 study, the deep learning system was able to identify 5 markers for predicting human chronological age (albumin, glucose, alkaline phosphatase, urea and erythrocytes).
3D cell explorer
Nanolive, a Swiss startup founded in 2013, has launched a 3D Cell Explorer microscope which allows researchers to view the inner workings of live cells without the need for stains or labels. It is a tomographic microscope which uses light refraction from different angles to measure all parts of a cell down to 200 nm. A laser light illuminates the sample, rotating 360 degrees for a full scan. It is hoped that the 3D Cell Explorer will open up whole new fields of research where we can observe and intervene in biological processes at a cellular level without the addition of foreign chemicals.
AI in Pancreatic Cancer drug development efforts
Whilst dramatic progress has been made in the last 30 years in many cancer types, the treatment of pancreatic cancer is still limited. An estimated 40,560 people will die from pancreatic cancer; it is the twelfth most common cancer, and only 7.2% of patients survive five years after diagnosis (according to the NIH).
Berg believes that artificial intelligence can help these R&D efforts. In collaboration with the lab of James Moser (co-director of the Pancreas and Liver Institute & director of the Pancreatic Cancer Research Institute, Beth Israel Deaconess Medical Center) Berg is using its proprietary software processes with biological data to uncover unexpected connections between healthy and sick patients. The resulting insight will allow for a more informed hypothesis, which in turn should enable more efficient drug development.
Liquid biopsies: changing clinical sample collection
Sequencing firm Illumina launched a spin-off company (in January 2016) called Grail, whose focus is to develop a plasma-based genetic screen for the early detection of multiple cancers.
Liquid biopsies would allow for a quick, convenient and minimally painful procedure and allow clinicians to closely monitor how tumours are responding to therapies and to forecast cancer recurrences. For example, in 2015, a study showed that breast cancer revealed signs of metastasis 3 years before it could be diagnosed with standard clinical tools. The genomic information circulating in the bloodstream could allow researchers to understand disease history better, as well as pointing to the cancer’s origin and potential spread. At present, extensive testing is required before liquid biopsies can supersede surgical biopsies, but hopefully one day this could change.
GenePool from Station X to power cancer research
In cases of pancreatic cancer, prognosis is bleak. Only 18% of patients with advanced pancreatic cancer are alive after one year and 4% after five years. To push forward innovative research, the Center for Biomedical Informatics at Georgetown University has adopted Station X’s GenePool platform for use in genomic investigations related to pancreatic cancer.
“Genomics and bioinformatics are central to realizing the promise of precision medicine and there is a pressing need for technologies that facilitate collaboration and bridge from foundational research to translational applications in the clinic. GenePool will allow us to replace or augment a number of disparate tools with an integrated solution that can support basic research applications like molecular profiling and then help us place those findings within the context of individual patient datasets to support translational research and clinical decision making,” said Dr Subha Madhavan, Director of the ICBI.
EHR data dive boosts Mount Sinai enrolment in Bayer diabetic kidney disease trial
Researchers at the Icahn School of Medicine at Mount Sinai have reported a jump in enrolment in a Phase III trial of a Bayer diabetic kidney disease drug, finerenone, after using natural language processing to sift through electronic health records (EHRs) in search of eligible participants.
Previously, after four months and 50 hours of effort, only one patient had been enrolled, but when Mount Sinai used CLiX Enrich, a tool designed to probe unstructured data that had shown promise when applied retrospectively, enrolment jumped to 97. Dr. Girish Nadkarni, the lead investigator for the CLiX Enrich evaluation, believes the tool enabled the Mount Sinai team to do in one week what would have taken four months using earlier, more manual processes.
Analyzing the CLIMB study and finding markers for MS
Orion has collaborated with Thomson Reuters to build a pathway-driven model on demographic, genomic and clinical-outcomes data from the CLIMB longitudinal study (collaboration with Brigham and Women’s Hospital). The study identified 56 pathways and 26 subnetworks associated with time to relapse in MS. This included changes to Notch signalling, usually associated with neurogenesis and neural plasticity, being associated with faster time to relapse in MS. These results indicate that integration of prior pathway knowledge and patient gene expression data will lead to improved, more robust molecular signatures for patient stratification biomarkers.
To better understand the underlying biology of MS, Orion then went on to develop a collaboration with GNS Healthcare and build a data-driven model from the CLIMB data. Using Bayesian inference to go beyond correlations and make causal inferences between data nodes, the Orion MS 1.0 model uncovered novel genetic drivers as well as new gene expression pathways that may contribute to disease outcomes. The model highlighted the involvement of sex-linked genes in disease progression, which is notable as MS disproportionately affects women.
Using data analytics software to industrialize cell therapy
In April 2015, GE Ventures struck a deal with Mayo Clinic to launch Vitruvian Networks, an independent technology company which will use advanced manufacturing and data analytics software to speed up cell therapy industrialization.
The goal is that Mayo Clinic will supply Vitruvian with data and expertise related to biomarkers and clinical outcomes to guide the development of personalized therapies, whilst GE Healthcare will provide the know-how on monitoring and optimizing, which GE software does for everything from jet engines to power plants. The company has already been announced as an inaugural partner of Sean Parker’s $250m Institute for Cancer Immunotherapy.
AstraZeneca and its 2 million genome project
AstraZeneca has launched an unprecedented effort to sequence the genomes of two million people (and compile their health records) over the next decade to unearth markers that are associated with disease and with responses to treatment. To gather all this data, the company will be collaborating with the Wellcome Trust Sanger Institute (UK) and Human Longevity (San Diego, US), as well as drawing on 500,000 participants in its own clinical trials and medical samples it has collected over the past 15 years. This is truly a ‘big data’ project, with an estimated 5 petabytes of data being generated. As Menelas Pangalos, Executive VP of the company’s innovative medicine programme, says, this data, if stacked as DVDs, would be four times higher than the Shard (London’s 310 metre skyscraper).
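The DVD comparison is easy to sanity-check. The Python sketch below is a rough illustration rather than the company's own calculation: it assumes single-layer DVDs holding 4.7 GB each, a disc thickness of 1.2 mm, no cases or spacing between discs, and decimal units. The function name is ours.

```python
import math

PB = 10**15                       # decimal petabyte
DVD_CAPACITY_BYTES = 4.7 * 10**9  # assumption: single-layer 4.7 GB disc
DVD_THICKNESS_M = 1.2e-3          # assumption: 1.2 mm per disc, no cases
SHARD_HEIGHT_M = 310              # height quoted in the text


def dvd_stack_height_m(total_bytes: float) -> float:
    """Height in metres of a bare stack of DVDs holding total_bytes."""
    discs = math.ceil(total_bytes / DVD_CAPACITY_BYTES)
    return discs * DVD_THICKNESS_M


stack_height = dvd_stack_height_m(5 * PB)
shard_multiples = stack_height / SHARD_HEIGHT_M

print(f"Stack height: {stack_height:,.0f} m")
print(f"Multiples of the Shard: {shard_multiples:.1f}")
```

Under these assumptions the stack is a little under 1.3 km tall, which is consistent with the "four times higher than the Shard" comparison.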
This is not the first time that a large drug company has put huge resources into genomics to fuel drug discovery but, as noted in a recent Nature article, David Goldstein, Director of the Institute for Genomic Medicine at Columbia University, believes the ‘field has turned a corner’. With advancements in genome sequencing making it cheaper and faster than ever, plus better bioinformatics tools and genome-editing methods such as CRISPR-Cas9, this time around researchers should have an easier time in determining what DNA changes affect living cells and why.
Medical imaging and analysis on the cloud
It is estimated that 90 percent of all medical data today is contained in images such as X-rays and CAT scans. This unstructured data is seen as the next frontier in automating medical diagnosis, following efforts to analyse text-based records, and this has led to some high-profile acquisitions, including IBM’s (Watson) takeover of Merge Healthcare for $1 billion in 2015 and Medidata buying Intelemage (a medical image sharing and workflow management company).
Now, Median Technologies has partnered with software giant Microsoft to integrate big data into its oncology imaging systems, with the goal of delivering ‘Precision Medicine’ from the cloud. The company will install its iBiopsy system, which is able to detect the phenotypes of various cancers through the extraction and measurement of imaging biomarkers, on the Azure cloud platform from Microsoft. This will add significant capabilities and allow efficient processing and analysis of medical images, with improved early detection and monitoring of new targeted treatments. It is hoped that the solution will first be used in clinical routine by the end of 2017.
Microbiota meets big data
The development of high-throughput sequencing technologies has transformed our capacity to investigate the composition and dynamics of the microbial communities that populate diverse habitats. Differential coverage-based approaches for analysing metagenomic data have resulted in the en masse recovery of tens to hundreds of population genomes and allowed us to begin to extract biologically meaningful information from large datasets.
This is a huge challenge, as a 2014 paper published by the American Academy of Microbiology showed, with the ratio of microbial to human cells probably in the region of 3:1 (1 being 37 trillion cells in a 70 kg male). We know that the gut microbiota and their metabolites have an immense impact on host physiology by modulating the chemistry of the gut. Reduced diversity in the microbiota has been described in inflammatory bowel diseases such as Crohn’s and ulcerative colitis, and it is understood that these disturbances result in the dysregulation of adaptive immune cells underlying these disorders. Understanding this scientific area has therefore become a priority for many drug development groups: Janssen has newly established its Human Microbiome Institute, and in January 2016 Seres Therapeutics and Nestle Health Sciences signed a $1.7 billion drug development deal to focus on bowel diseases and C. difficile infections. Indeed, even the US government has launched a $121 million National Microbiome Initiative to attempt to map and investigate collections of microorganisms in the human body and other ecosystems.
Leading our understanding of the impact of the microbiome on human health is the Human Microbiome Project, a US NIH program, which in 2015 completed its first phase with the characterization of the microbial compositions from a cohort of 300 healthy adult human subjects within five regions of the body, including nasal passages, the gastrointestinal tract and the urogenital tract. There are also a number of repositories and tools outside of this project, including QIIME and MG-RAST, that are giving researchers more insight into the workings of this system.
Precision Medicine to Infectious Disease
IDbyDNA is a precision medicine company focused on metagenomic approaches for infectious disease identification. This September (2016) the company announced the closure of a $9 million Series A financing round to further develop its Taxonomer-based DNA search technologies and launch metagenomics-based clinical tests for infectious diseases. The goal is to help scientists and doctors detect any organism in any sample. Since the launch of Taxonomer.com in May 2016, the company has already registered users from more than 150 academic or government institutions and corporations.
"Although the human microbiome has long been known to influence human health and disease, with the advent of high-throughput sequencing we have only recently begun to understand and appreciate the depth of its involvement. Insights are coming together through the work of companies like IDbyDNA and analytic tools like their ultra-fast metagenomics search algorithm Taxonomer” said Stuart Peterson, co-founder and senior partner at ARTIS Ventures, who led this funding round for IDbyDNA.
IDbyDNA was founded in November 2014 by four industry veterans with track records that include pioneering genome analysis methods first used by the Human Genome Project. The CEO, Dr Guochun Liao, was a consulting professor at Stanford University, former VP of Bioinformatics at Centrillion and former head of Computational Genomics at Roche Palo Alto. Dr Robert Schlaberg is a medical director at ARUP Laboratories and assistant professor of Pathology at the University of Utah. Mark Yandell, PhD, is a professor of Human Genetics and Edna Benning Presidential Endowed Chair at the University of Utah and co-director of the USTAR Center for Genetic Discovery. Martin Reese, PhD, is the founder, president and CSO of Omicia, and former founder and president of Neomorphic, one of the first bioinformatics companies, which was sold to Affymetrix.