Breakthrough technology: Gene sequencing enters the era of personal customization

The omics technologies developed at the beginning of the millennium played an important role when the COVID-19 pandemic emerged in 2020: glycomics was used to analyze the glycosylation of the novel coronavirus spike protein, metabolomics to analyze the differences between mild and severe patients, and comparative omics to map the actions of the virus's proteins. So why are these omics analyses relatively rare in normal times? And what kind of technology is the nucleic acid testing we encounter in daily life?

Metabolomics, for example, is mostly used in complex biological research. Its detection and analysis cycle is long, so it is not suited to quickly determining whether an individual's nucleic acid test is negative or positive. Moreover, metabolomics generally requires drawing plasma or serum, which is not a friendly procedure for the many compatriots who need to be tested multiple times when returning home for the New Year. Nucleic acid testing is therefore generally performed with a throat or nasal swab, which collects genetic material with a gentle scrape. The result, negative or positive, is then determined through DNA amplification and sequencing. When did this DNA sequencing technology, now "flying into the homes of ordinary people," begin to develop and become widely applied?

Microfluidic technology reduces the cost of DNA sequencing.

DNA sequencing technology is an important technical approach in genomics. Genomics characterizes and studies all the genes of an organism, and analyzes and compares the relationships between different genomes. Toward the end of the 20th century, genome projects for multiple species were launched one after another.

In 1977, the sequencing of a bacteriophage genome (5.4 kb) was completed.

In 1981, the sequencing of the mitochondrial genome in human cells (16.6 kb) was completed.

In 1992, the sequencing of chromosome III of Saccharomyces cerevisiae (yeast) (315 kb) was completed.

In 1995, the sequencing of the Haemophilus influenzae genome (1.8 Mb) was completed.

In 1996, the sequencing of the complete Saccharomyces cerevisiae genome (12.1 Mb) was completed.

None of these is as well known to the general public as the Human Genome Project (also known as the "moon landing" of the life sciences). In 2003, the secrets of the 3 billion base pairs (3,000 Mb) of human genomic DNA were fully revealed. Since then, the era of genomics has begun, and DNA sequencing technology has developed rapidly.

Because DNA is the genetic material, its structure and composition are relatively simple and very stable compared with other biological macromolecules such as proteins, polysaccharides, and lipids. DNA sequencing has therefore developed particularly rapidly compared with other omics technologies. Representative examples are Personal Genomics, the $100 Genome, and Cancer Genomics, which were selected as "Top 10 Breakthrough Technologies" by MIT Technology Review in 2004, 2009, and 2011, respectively.

After the Human Genome Project revealed the genetic information of 3 billion base pairs, people began to pay attention to the individual differences that genetic information brings. Why, when our genetic information is so similar, do we differ so much in hair color, skin color, height, and many other traits? David Cox, chief scientist of Perlegen Sciences, soon proposed finding the differences between personal genomes and developing personal genome sequencing into a fast and effective means of detecting disease. With this approach, doctors no longer have to make judgments amid 3 billion pieces of information. Instead, each person's genome sequencing results are used to determine whether that individual is susceptible to a certain disease, so that preventive measures can be taken in advance.

Of course, some diseases correspond to one or two gene mutations, and such diseases (Huntington's disease, for example) are relatively easy to diagnose. Many other diseases, however, involve multiple genes and are not easy to judge. In such cases, the correlation between personal genome sequencing results and disease onset is not close enough, so Cox and others also hope to collect large amounts of personal data and analyze single nucleotide polymorphisms (SNPs) in order to establish a closer link with diseases. Much like the traditional four diagnostic methods of "inspection, auscultation and olfaction, inquiry, and palpation," individuals would be diagnosed through the "symptoms" of their genes. This is also the first step toward precision medicine.
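One simple way to picture "linking a SNP to a disease" is to compare how often a given allele appears in people with the disease and in people without it. The sketch below computes an odds ratio for a single SNP site. The function name, the allele lists, and the counts are all invented for illustration; real association studies involve many sites, much larger cohorts, and statistical corrections that are not shown here.

```python
from collections import Counter


def allele_odds_ratio(cases, controls, risk_allele):
    """Odds ratio of observing risk_allele in cases versus controls.

    cases / controls: lists of single-letter alleles observed at one SNP site.
    """
    case_counts = Counter(cases)
    control_counts = Counter(controls)
    a = case_counts[risk_allele]      # risk allele among cases
    b = len(cases) - a                # other alleles among cases
    c = control_counts[risk_allele]   # risk allele among controls
    d = len(controls) - c             # other alleles among controls
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return (a * d) / (b * c)


# Toy data, invented for illustration only.
cases = ["A"] * 30 + ["G"] * 70
controls = ["A"] * 10 + ["G"] * 90
ratio = allele_odds_ratio(cases, controls, "A")
print(f"odds ratio for allele A: {ratio:.2f}")
```

An odds ratio well above 1 suggests the allele is more common among cases, but on its own it says nothing about cause.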

Completing the Human Genome Project cost the world 3 billion US dollars. So where is the market for the "private customization" of personal genome sequencing? With the continuous advancement of sequencing instruments, and especially of microfluidics (selected as one of the "Top 10 Breakthrough Technologies" by MIT Technology Review in 2001), the cost of genome sequencing has fallen sharply. In 2009, most scientists conservatively believed that the cost of sequencing a genome could be brought down to 1,000 US dollars, but they also set a more ambitious goal of reducing it to 100 US dollars within 5 years: the $100 Genome.

How do microfluidic biochips reduce the cost? Genomic DNA holds 3 billion "letters" of information to be read. If we imagine a person's genomic DNA as a long rope, the usual approach is to cut the rope into segments of, say, 100 to 1,000 letters, sequence the segments, and then piece them back together into the long rope. The entire process is costly and time-consuming. Microfluidic biochips also need to cut the long rope, but each segment can hold 1 million letters, which increases sequencing speed and reduces cost. Of course, in addition to microfluidic biochip technology, scientists have adopted many other methods (first-, second-, third-, and fourth-generation sequencing technologies) to optimize the sequencing process.
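A rough calculation shows why segment length matters. The sketch below counts how many segments are needed to cover 3 billion letters once, using the segment lengths mentioned above. It is deliberately simplified: it assumes non-overlapping segments and ignores the repeated coverage that real sequencing needs, and the function name is made up for this example.

```python
GENOME_LENGTH = 3_000_000_000  # about 3 billion letters, as stated in the text


def fragments_needed(genome_length, fragment_length):
    """Minimum number of non-overlapping fragments covering the genome once."""
    # Integer ceiling division avoids floating-point rounding.
    return (genome_length + fragment_length - 1) // fragment_length


for fragment_length in (100, 1_000, 1_000_000):
    count = fragments_needed(GENOME_LENGTH, fragment_length)
    print(f"{fragment_length:>9,} letters per segment -> {count:>10,} segments")
```

Under these assumptions, going from 1,000-letter segments to 1-million-letter segments cuts the number of pieces to reassemble from 3 million to 3 thousand.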

On March 29, 2012, the American company Life Technologies announced the launch of its gene sequencer IonProton in China. With this product, a personal whole genome can be sequenced in just one day for $1,000. In January 2014, Illumina, a global gene sequencing and chip technology company, launched the HiSeq X Ten instrument, bringing the cost of sequencing a single human genome below $1,000.

Single-cell sequencing realizes the characterization of cellular diversity.

As personal genome sequencing becomes faster and cheaper, researchers have taken on a new challenge: single-cell sequencing. Traditional sequencing generally requires DNA or RNA extracted from tens of thousands of cells, which often biases our understanding of the diversity of human cells. Single-cell sequencing, similar to the "single-cell analysis" technology mentioned earlier, sequences and analyzes the genome at the level of a single cell, making it possible to characterize cellular diversity.

In 2009, the first single-cell mRNA sequencing study was published, marking the rise of single-cell sequencing technology. In 2018, high-throughput single-cell sequencing technology was named the Breakthrough Technology of the Year by Science and Nature Methods magazines. The market for single-cell sequencing technology has also been growing at an average annual rate of over 20%.

Why does traditional sequencing require tens of thousands of cells, while single-cell sequencing needs only one? Simply put, traditional sequencing is like "pool testing" in today's nucleic acid testing: tens of thousands of cells together show certain characteristics, such as gene mutations, but it is impossible to tell which cells, or what kind of cells, contribute them. Single-cell sequencing is like "individual testing," where the answer is clear at a glance. It is equally obvious that single-cell sequencing faces the challenges of reducing sequencing time, lowering sequencing costs, and improving sequencing accuracy.
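The difference between the pooled view and the individual view can be shown with a few lines of code. In this toy sketch, each cell either carries a mutation (1) or does not (0). The cell names and values are invented, and real measurements are far noisier than a clean 0 or 1.

```python
# 1 = the cell carries the mutation, 0 = it does not (invented toy data).
cells = {
    "cell_1": 0,
    "cell_2": 0,
    "cell_3": 1,
    "cell_4": 0,
    "cell_5": 1,
}

# Pooled ("bulk") view: only the overall fraction of mutated cells is visible.
bulk_fraction = sum(cells.values()) / len(cells)

# Individual ("single-cell") view: each cell is reported separately.
mutated_cells = sorted(name for name, has_mutation in cells.items() if has_mutation)

print(f"bulk view: {bulk_fraction:.0%} of the signal is mutated")
print(f"single-cell view: mutated cells are {mutated_cells}")
```

The pooled number says that a mutation is present in part of the sample; only the per-cell list says where.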

In 2016, 10x Genomics successfully launched the world's first single-cell sequencer, the Chromium Controller, and has since dominated the global market, maintaining a leading market share.

How does 10x Genomics achieve better, faster, and more accurate single-cell sequencing? The company relies on droplet microfluidics: gel beads carrying 750,000 barcodes are encapsulated together with cells in droplets surrounded by oil (gel beads in emulsion, or GEMs), and the capture of 80,000 cells can be completed automatically within 10 minutes. Inside this "water-in-oil" structure, gel beads and cells are encapsulated, the cells are lysed, and the gel beads dissolve to release barcode sequences targeting specific genomic sequences. Overall, microfluidics achieves the separation of single cells, while droplet technology builds sequencing libraries for the same gene in different cells.
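The role of the barcode can be illustrated with a small sketch: if every read from the same droplet starts with the same barcode, reads can later be sorted back to their cell of origin. This is a heavy simplification and not the company's actual software. The 4-letter barcode length, the function name, and the reads are all hypothetical, and real pipelines must also handle sequencing errors in the barcode.

```python
from collections import defaultdict

BARCODE_LENGTH = 4  # hypothetical; chosen short so the example is readable


def group_reads_by_barcode(reads, barcode_length=BARCODE_LENGTH):
    """Split each read into (barcode, insert) and group inserts by barcode."""
    groups = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        groups[barcode].append(insert)
    return dict(groups)


# Invented reads: the first four letters stand for the cell barcode.
reads = ["AAGTTTGACC", "CCTAGGATCA", "AAGTCCATGG"]
by_cell = group_reads_by_barcode(reads)

for barcode, inserts in by_cell.items():
    print(barcode, inserts)
```

Reads sharing the barcode `AAGT` end up in one group, which stands in for one cell.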

In science and technology there is no best, only better. In recent years, the fertile soil of China has also nurtured a group of ambitious players, whose participation has brought new vitality to the single-cell sequencing field.

Established in 2018, Singleron has raised nearly 200 million US dollars in financing to date, focusing on applying single-cell sequencing technology in clinical testing, health management, and drug development. In 2020, Singleron launched the first automated single-cell sequencing pre-treatment instrument in China, the Singleron Matrix. For different sample types, Singleron offers a complete solution for sample preparation before sequencing, which makes clinical samples easier to store and transport.

In June 2021, the domestic company 10K Genomics launched Perseus™, the first droplet microfluidic single-cell sequencing instrument in China. The company's founder, Shi Weiyang, stated that the instrument's core equipment has been localized. Beyond the transcriptome analysis common on the market, 10K Genomics is also developing single-cell methylation, single-cell proteomics, and other methods to create a multi-omics single-cell sequencing platform.

The domestic company M20 Genomics has recently considered using single-cell sequencing technology to promote the mapping of the Cell Atlas 2.0, build a blueprint of human cells, establish an ultra-precise model of human physiology, and provide an accelerator for drug research and experiments.

In April 2021, researchers from 39 Chinese research institutes and hospitals cooperated with the 10x Genomics China team to process 1.46 million cells from 284 samples with single-cell RNA sequencing, conducting an in-depth analysis of the immune response after infection with the novel coronavirus and looking for new culprits of cytokine storms. The work was published in the journal "Cell". The potential of large-scale single-cell mapping research is evident. More research and applications are waiting for single-cell sequencing technology to show its strength.

Sequencing technology drives development in multiple fields.

Globally, the development of high-throughput, low-cost, single-cell, and ultra-sensitive sequencing technologies is advancing at an astonishing pace, leaving people dazzled.

At the same time, the rapid progress in sequencing technology has also driven the academic community to conduct more research. For example, people are considering using DNA, the genetic material, as a data storage medium. In the era of big data, the amount of information generated on Earth every day far exceeds the total produced by human civilization over the past 5,000 years. It is predicted that by 2040, the world will need 1×10⁶ t of silicon-based chips to store data. DNA storage offers high density, low energy consumption, and a long storage lifetime; in theory, only 1 kg would be needed to store the world's information.

The process of information storage mainly involves storage, encoding, and reading. DNA sequencing technology corresponds to the reading of data. However, there are still several issues with DNA at the storage level, such as high synthesis costs, slow speed, and poor stability. But people still look forward to the future of DNA as a data storage material. Scientists involved in "human genome sequencing" may not have thought that one day, genome sequencing would also bring opportunities for the rise of DNA storage.
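To see what "encoding" and "reading" could mean in practice, here is a minimal sketch that maps every two bits of data to one of the four bases and back again. The particular mapping (00 to A, 01 to C, 10 to G, 11 to T) is an arbitrary assumption chosen for clarity. Practical schemes are more elaborate, for example adding error correction, which this sketch leaves out.

```python
# Arbitrary two-bits-per-base mapping, assumed for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Turn a DNA string produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"DNA"
strand = encode(message)
print(strand)          # the "written" sequence
print(decode(strand))  # what "reading" (sequencing) would recover
```

In this picture, synthesis corresponds to producing the molecule described by `strand`, and sequencing corresponds to recovering that string so that `decode` can run.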

In addition, cancer genomics, one of MIT Technology Review's "Top 10 Breakthrough Technologies" of 2011, has also benefited from the development of sequencing technology. This technology revolves around sequencing the genetic maps of cancers in order to predict or treat cancer. Elaine Mardis, a leader in this field, believes that the "gene-mutation-treatment" equation is too simple. Cancer sequencing needs to consider not only various gene mutations but also RNA sequencing in addition to DNA sequencing. Incorporating real data from clinical trials and promoting the sharing of patient gene data also need to be advanced continuously to achieve this goal.

In 2013, the journal "Science" featured multiple articles on "Cancer Genomics," proposing comprehensive development in areas such as sequencing analysis, principle summarization, disease detection, and clinical impact. In August 2020, a Spanish research team identified 568 cancer driver genes from the genomes of 28,076 tumor samples from 66 types of cancer, creating the most complete panorama of cancer driver genes to date.

The development of cancer genomics indicates that in the near future, routine cancer treatment will no longer focus on organs but will pay more attention to the genomic characteristics of cancer. It is hoped that in the future, medical experts will be able to determine the best treatment plan for each patient by reading DNA information during the treatment process.

In addition to cancer, research on life extension has also benefited from the development of sequencing technology. Currently, many genes have been proven to have a wide-ranging impact on the aging and lifespan of various model animals. This list of genes is still growing. Can the intervention of aging genes achieve life extension? To quote Albert Einstein: "I never think of the future—it comes too quickly."

DNA sequencing technology has also brought about a broad market prospect. The global gene sequencing industry market size is continuously expanding, with Illumina's market value alone reaching $57.196 billion (data from Baidu Stock Market).

Looking only at the Chinese sequencing market, from 2015 to 2019 the domestic gene sequencing industry grew at a compound annual growth rate of 40%, reaching a market size of about 14.9 billion yuan in 2019. In the future, the Asia-Pacific region, led by China, will become one of the important markets for genetic testing. Among specific diagnostic and treatment directions, tumor diagnosis and treatment is one of the markets with the greatest potential.

The massive scale of the gene sequencing market has already manifested in all aspects of daily life.

Preconception and prenatal examinations, such as amniocentesis, Down's screening, and non-invasive prenatal genetic testing, all use sequencing technology in combination with the inheritance patterns of diseases to help couples have healthy babies and avoid birth defects. Paternity testing, the more scientific successor to the "blood test for kinship," is also carried out through DNA sequencing. Genetic screening and targeted tumor therapy in hospitals are further applications. DNA extracted from evidence left at crime scenes can also be sequenced to help identify a suspect.

DNA sequencing can even guide everyday habits. For example, because of individual genetic differences, people metabolize alcohol and coffee differently. With DNA sequencing, habits such as coffee drinking can be tailored to the individual, and everyone can find a lifestyle that suits them better.

The nucleic acid testing that people encounter is an important manifestation of DNA sequencing. Samples are collected with nasal or throat swabs; if a sample contains the virus's genes, the result is "positive," and otherwise it is "negative." Such convenient and accurate testing is a byproduct of the development of a whole series of sequencing technologies. "Technology changes life": we owe this to the scientists who set out great ideals more than 20 years ago. Their years of hard work have made today's life convenient and fast.

Beyond deciding between positive and negative, DNA sequencing in nucleic acid testing has further applications. For example, why, over more than two years of the COVID-19 pandemic, have some infections been caused by Alpha strains and others by Delta or Omicron strains? DNA sequencing technology has helped people track the novel coronavirus. In 2022, the tracking of novel coronavirus mutations (COVID Variant Tracking) was selected as one of the "Top 10 Breakthrough Technologies" by MIT Technology Review.

So far, the novel coronavirus has become the most sequenced organism on Earth: more than 7 million genome sequences from positive samples have accumulated worldwide, which has enabled scientists to quickly discover a variety of viral mutations.
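At its simplest, spotting a mutation means comparing a sample's sequence with a reference sequence, position by position. The sketch below does this for short strings that are assumed to be already aligned and of equal length. The sequences and the function name are invented, and real variant tracking must also handle alignment, insertions, deletions, and sequencing errors.

```python
def list_substitutions(reference, sample):
    """Return substitutions such as 'G4A': reference base, 1-based position, sample base."""
    if len(reference) != len(sample):
        raise ValueError("sequences must already be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Invented toy sequences; real viral genomes are tens of thousands of letters long.
reference = "ATGGCTAAC"
sample_1 = "ATGGCTAAC"
sample_2 = "ATGACTAAT"

print(list_substitutions(reference, sample_1))
print(list_substitutions(reference, sample_2))
```

Samples that share the same set of substitutions can then be grouped together, which is the basic intuition behind assigning a sample to a named variant.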

On November 25, 2021, scientists at Stellenbosch University, including Tulio Oliveira, announced to the world the Omicron variant, which as of the end of November carried the highest degree of mutation. Oliveira's team had had a vague "feeling" about it as early as six weeks before. They quickly sequenced a large number of new cases and soon identified the Omicron mutations.

Thanks to the rapid identification of the mutations, a series of studies on Omicron was quickly launched after November, covering the strain's infectivity and toxicity and its sensitivity to existing vaccines. At the same time, policies such as travel bans and vaccination guidelines were also quickly issued. In epidemic prevention, people have gradually shifted from "passive epidemic prevention" to "real-time follow-up."

Scientists have not stopped there. They hope to "take the initiative to attack": to predict the virus's next step from large amounts of data and prevent new threats from emerging.

Although DNA sequencing is now very "affordable," sequencing samples on a global scale is still costly. Oliveira therefore proposed giving priority to samples from areas with a sudden increase in infection rates and analyzing them in a targeted manner, an approach called "non-linear sequencing." At present, sequencing technology, equipment, and expertise are unevenly distributed around the world, which poses a challenge for non-linear sequencing. For example, Peru has limited sequencing capacity, and the Lambda variant was detected by sequencing too late to stop the Lambda epidemic there.

Virus prediction also applies to influenza. Every year, scientists select the influenza strains that are likely to dominate in the future and design vaccines based on them. However, the accuracy of current predictions is relatively low.

Sequencing can tell us how viruses have mutated in the past. Currently, it is not possible to accurately determine how viruses will evolve in the future. However, at least real-time tracking has provided the world with an "early warning," preventing a variant from becoming a "spark" that turns into a "raging fire."