As the world continues to grapple with evolving variants of the COVID-19 virus, our ability to identify mutations quickly and accurately, and to develop vaccines in response, will determine how long the pandemic continues to disrupt our lives. Equally important are our own behaviour and the precautions we take to protect ourselves from the virus.
by Sumir Bhatia
Genome sequencing has fuelled the discovery of genes associated with diseases and the identification of new variants tied to phenotypes. Through comprehensive DNA exploration, researchers have created maps of gene variations related to health, disease, and drug response, and studies have built critical databases and catalogued significant variants for future applications.
Historically, high costs put DNA studies out of reach for many researchers, particularly those studying diseases, populations, and population diversity with large sample sets. The process also took time and the considerable might of high-performance computers. In 2003, a way to analyse genetic information was discovered, but it took 13 years to analyse a single sequence of genetic information.1 This discovery is said to have created a market worth US$3 million, and scientists have been working ever since to shorten that 13-year period. Advances in technology have since made it possible to build cost-effective, efficient systems and tools that accelerate medical research and diagnosis. By 2019, the time to analyse genetic information had fallen to 60-150 hours.
Therefore, high-performance computing (HPC) and next-generation genome sequencing (NGS) have become critical in ensuring researchers and medical professionals can make life-saving discoveries and decisions. Genomics and other omics analyses are now crucial in nearly all biomedical projects.
Utilising Genome Mapping, Surveillance, and Sequencing in the Development and Delivery of Vaccines
When COVID-19 first appeared, genomics became the first step in the highest-priority efforts to understand the virus and how it would affect us. Genomics helped fuel research efforts in vaccine design, diagnostic kit improvement, virulence assessment, identifying determinants of susceptibility, virus tracking, and identifying drug targets.
Scaling out population genomics production in a timely fashion depends largely on the underlying HPC technologies and the acceleration they can offer. Faster processing allows more data to be processed and analysed. So how could we deliver this?
We studied different permutations of the hardware, software, and system factors that affect the performance of genomics workflows. After a period of thorough testing, we came up with a tool for deploying and scaling HPC for genomics that analyses large volumes of omics data at record-breaking speeds: the Genomics Optimization and Scalability Tool (GOAST).
GOAST optimises and scales genomics research by responding to both science and technology needs. It accelerates genomics analytics and increases sample throughput at much lower costs than solutions relying on expensive accelerators and proprietary software. Because the tool is efficient, cost-saving, and accessible, scientists from large and small research organisations can understand data faster and make discoveries sooner.
Understanding the “Why” and “How” of It All
HPC and artificial intelligence (AI) are the driving forces behind the most promising advances in fields from weather to manufacturing to basic research and, of course, human health. In Life Sciences specifically, the scale and breadth of modern research would not be possible without the aid of HPC and supercomputers. No longer are researchers analysing one gene in a few individuals. Instead, they are routinely analysing cohorts of people, for example in clinical trials, and examining the genomes of entire populations.
HPC systems that support high-throughput volumes, accelerate speeds, optimise infrastructure, and protect data are necessary for enabling Life Science research and propelling life-saving discoveries. Virus genomes and their variants need to be sequenced quickly to generate the right insights and create effective vaccines, and this is only possible with HPC powering the backend.
AI is another critical component contributing to genomics and sequencing speed. AI reduces the time it takes to collect information and make decisions, enabling healthcare providers and researchers to accurately classify and diagnose a wide range of viruses and diseases, ultimately transforming patient care and outcomes. AI also helps make sense of an avalanche of genomics data, where understanding the difference between one genome and another is crucial.
With a conventional GATK4 (Genome Analysis Toolkit) pipeline, analysing a genome takes about 150 hours, but GOAST reduces that time to only 48 minutes. Many scientists and engineers know that a nearly 200-fold speed-up does not come easily. GOAST has been instrumental in increasing productivity in the lab, enabling more researchers to analyse genomes, which has been crucial in the context of COVID-19.
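The arithmetic behind the "nearly 200-fold" figure can be checked in a few lines. The sketch below uses only the two runtimes quoted above; the function names are hypothetical, and the per-day throughput assumes a single system processing genomes back to back with no idle time, which is a simplification and not a measured result.

```python
def speedup(baseline_hours: float, accelerated_minutes: float) -> float:
    """Ratio of the baseline runtime to the accelerated runtime."""
    return (baseline_hours * 60) / accelerated_minutes


def genomes_per_day(minutes_per_genome: float) -> float:
    """Genomes one system could finish in 24 hours, run back to back (assumption)."""
    return (24 * 60) / minutes_per_genome


# Runtimes quoted in the article: about 150 hours versus 48 minutes per genome.
print(f"speed-up: {speedup(150, 48):.1f}x")                      # speed-up: 187.5x
print(f"baseline: {genomes_per_day(150 * 60):.2f} genomes/day")  # baseline: 0.16 genomes/day
print(f"accelerated: {genomes_per_day(48):.0f} genomes/day")     # accelerated: 30 genomes/day
```

Put another way, under these assumptions one system moves from roughly one genome a week to about thirty a day.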
The Benefits of Genome Sequencing and Its Relevance Today
GOAST has been deployed worldwide to accelerate research into basic science, infectious disease, and precision medicine, supporting individual labs right through to population-level efforts at scale and with affordability. Any organisation running sequential bioinformatics workflows and omics analytics will benefit greatly from the optimisations in the GOAST appliance and the HPC environments in which they operate.
Beyond COVID-19, the use of GOAST and other high-performance tools can help develop personalised treatments and accelerate the use of improved diagnostics and therapies to provide patients with more accessible and cost-effective treatment options. We leverage GOAST to help data centres worldwide accelerate their workflows and plan their HPC resources more effectively as they embark on ever-increasing workloads from cohort-level and population-level genomics projects.
Large national genomic initiatives can benefit from the affordability of new Next-Generation Sequencing methods and increased computing and storage capacities. Singapore’s GenomeAsia is an example of market-wide efforts to capture the population’s genetic variations, which can help accelerate the research needed to make precision medicine a realistic proposition. Lenovo’s commitment to developing and adopting cutting-edge technological innovation enables the worldwide movement to sequence entire populations and bring such initiatives closer to making Precision Medicine a reality.
Furthermore, with the power of technology behind it, genome sequencing can be used in sectors outside healthcare. In agriculture, for instance, genome sequencing can support better research into optimal farming techniques and into problems ranging from irrigation and water shortage to crop disease. GOAST has proven to be an ally for such initiatives. For example, the Centre for Genetic Manipulation of Crop Plants (CGMCP) at the University of Delhi2 has been trying to improve the productivity of oilseed mustard and develop more climate-resistant crops.
Essentially, any organisation running sequential bioinformatics workflows and omics (genomics, transcriptomics, etc.) analytics will benefit significantly from the optimisations in the GOAST appliance. Due to falling sequencing and processing costs, omics work has become so widespread that practically all life sciences programmes, from fundamental research to agriculture to virology to precision health, rely on HPC to enable such complex analyses.
The Importance of Next-Generation Sequencing
In the late 2000s, the advent of Next-Generation Sequencing (NGS) technology resulted in a significant reduction in the cost of DNA sequencing. The development of NGS coupled with the advancements in High-Performance Computing (HPC) storage and computing technologies at the time produced the perfect storm for a deluge of genomics data.
Precision Medicine, which strives to provide tailored prevention, diagnosis, and treatment by using knowledge from a person’s distinct genomic and environmental backgrounds, was born due to this data explosion. Given the new affordability of NGS methods and the recent decade’s increased computing and storage capacities, we can now perform genomics at the population level.
The most significant challenges population genomics efforts face are scale and time. Population genomics requires scaling up input data from exomes (the portions of a genome that code information for protein synthesis) to whole genomes, scaling up production levels (from a handful to tens of thousands of samples), and doing so within tight time frames.
Three of the four processes in population sequencing take place in the HPC environment of a cluster or supercomputer: genome assembly (assembling the DNA “letters” into words), variant analysis (comparing how a gene is “spelt” in different persons), and downstream bioinformatics (for example, measuring the effect of variations on function or disease). As a result, the ability to scale out population genomics outputs in a timely manner relies heavily on HPC technology and the underlying acceleration it can provide.
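To make the "spelling" analogy for variant analysis concrete, here is a deliberately tiny sketch. It is not how GATK4 or GOAST work: real variant calling operates on millions of aligned reads with statistical models. The function name and the example sequences are hypothetical, and the sketch assumes two sequences that are already aligned, of equal length, and differ only by single-letter substitutions.

```python
def find_variants(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for every mismatch.

    Toy assumption: both sequences are already aligned and the same length,
    so only single-letter substitutions are detected (no insertions or deletions).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Made-up sequences for illustration only.
reference = "ATGCGTAC"
person_a = "ATGCGTAC"   # spelt exactly like the reference
person_b = "ATGAGTAC"   # one letter differs at position 4

print(find_variants(reference, person_a))  # []
print(find_variants(reference, person_b))  # [(4, 'C', 'A')]
```

Even this simple comparison touches every letter of every sample. Repeating it across whole genomes and tens of thousands of people is what pushes the work onto HPC systems.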
The possibilities for GOAST and the HPC environment to power cutting-edge research are endless. From developing vaccines more quickly to taking genome sequencing to the next level, GOAST can help solve some of humanity’s greatest challenges. For example, it can empower agricultural institutions to breed more nutritious, drought- and disease-resistant, high-yield plants to address global food shortages, and help deliver on the promise of precision health to maximise the value of healthcare.
Our commitment to developing and adopting cutting-edge technology enables the worldwide movement to sequence ever larger samples, empowering scientists on the frontlines of COVID-19 research and beyond to accelerate the path to discovery. Whether for basic research, infectious diseases, or precision medicine, the tech revolution of accessible innovation is here. As we move forward post-pandemic, we believe new technologies like GOAST and other high-performance tools will empower governments, researchers, and corporations to uncover and deliver new solutions to humanity’s greatest challenges.
1. Mathew, A. A. (2020, November 17). [TechSparks 2020] Lenovo’s Mileidy Giraldo reveals the future of Life Sciences: High performance computing and AI. HerStory. Retrieved from https://yourstory.com/herstory/2020/11/techsparks-2020-mileidy-giraldo-genomics-future-life-sciences/amp
2. University of Delhi Department of Genetics. (2021). Powering groundbreaking plant genomics research. Lenovo. Retrieved from https://www.lenovo.com/us/en/resources/data-center-solutions/case-studies/university-of-delhi-department-of-genetics/
About the Author
Sumir Bhatia, President, Asia Pacific, Lenovo Infrastructure Solutions Group