Tumors or circulating tumor cells release circulating tumor DNA (ctDNA) into the blood when undergoing apoptosis or necrosis. Approximately 0.1% to 10% of cell-free DNA originates from cancer cells. ctDNA markers are typically single-nucleotide variants (SNVs), insertion-deletion mutations (indels) and copy number alterations (CNAs); however, the measurement of these molecules is hampered by factors such as molecule loss during library construction, PCR artifacts, and sequencing errors.
The limitation of DNA markers
When the measurement is limited to DNA mutations from cancerous cells, and only a tiny fraction of the cell-free DNA carries them (0.1% is about one mutant in a thousand wild-type copies), early detection using DNA markers alone is a formidable task. With approximately 140 genes that 'drive' tumorigenesis, various groups are using targeted gene sequencing methods to look at specific driver genes mutated in cancer.
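The arithmetic behind that 0.1% figure can be made concrete with a minimal sketch. The number of DNA copies in the sample (1,000) is an illustrative assumption, not a value from this article, and the function names are hypothetical. The model simply treats each copy as independently mutant with the stated fraction.

```python
def expected_mutant_copies(total_copies: int, mutant_fraction: float) -> float:
    """Average number of mutant copies expected among total_copies."""
    return total_copies * mutant_fraction


def prob_at_least_one_mutant(total_copies: int, mutant_fraction: float) -> float:
    """Chance that a sample contains one or more mutant copies,
    assuming each copy is independently mutant (a simple binomial model)."""
    return 1.0 - (1.0 - mutant_fraction) ** total_copies


if __name__ == "__main__":
    copies = 1000     # assumed number of copies of a locus in the sample
    fraction = 0.001  # 0.1% mutant fraction, the low end of the range quoted above
    print(expected_mutant_copies(copies, fraction))
    print(round(prob_at_least_one_mutant(copies, fraction), 3))
```

Under these assumptions a sample holds one mutant copy on average, and more than a third of such samples would contain no mutant copy at all, before any molecules are lost during library construction.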
Phenotype of cancer cells driven by gene expression
At the most basic level, regardless of how a normal cell becomes a cancer cell, cancer cells share traits in common, and these traits are characterized by the genes being expressed. While gene expression has been studied exhaustively for its connection to cancer (and both RNA microarrays and RNA-Seq methods are still in active use today in cancer research worldwide), relatively less attention has been paid to methylation, which plays a fundamental role in which genes are expressed to begin with.
The challenge of studying DNA methylation
One main reason DNA methylation has not been studied as extensively as RNA expression or DNA mutations is its method of detection. The primary method is bisulfite conversion, which converts unmethylated cytosine residues to uracil, while 5-methylcytosines are protected and remain cytosines. During sequencing, the converted uracils (which were unmethylated cytosines before conversion) are read as thymines. Thus if a region of DNA carries no methyl groups at all, every cytosine will be read as a thymine, and the canonical four-letter DNA code of G, T, C and A becomes only three letters (G, T and A).
One consequence of this reduced-complexity genome (from a four-base code to a three-base one) is that bioinformatic analysis of the sequencing data becomes more complex. A worse consequence of bisulfite treatment, however, is the loss of material due to bisulfite-induced DNA damage.
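The effect of bisulfite conversion on a read can be illustrated with a small in-silico sketch. The function name, the toy sequence and the chosen methylated position are hypothetical, and the sketch ignores real-world effects such as incomplete conversion and strand handling.

```python
from typing import Iterable


def bisulfite_convert(sequence: str, methylated_positions: Iterable[int]) -> str:
    """Return the sequence as it would be read after bisulfite conversion.

    Unmethylated C is converted (C -> U, read as T); methylated C stays C.
    Positions are 0-based indices into the sequence.
    """
    protected = set(methylated_positions)
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in protected:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


if __name__ == "__main__":
    toy = "ACGTCCGA"                      # hypothetical fragment
    print(bisulfite_convert(toy, {1}))    # only the C at index 1 is methylated
    print(bisulfite_convert(toy, set()))  # no methylation: no C remains in the read
```

With no methylated positions, the converted read contains only G, T and A, which is the three-letter code described above.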
The value of methylation haplotypes
Singlera researchers have used their extensive expertise in single-cell methylation analysis (see our Publications here) to apply these techniques to examine methylation haplotypes in cell-free DNA. Historically, methylation has been studied via microarrays or real-time PCR, where a particular methylated CpG site is interrogated independently of any adjacent site. By retaining the methylation status of adjacent CpG sites along each sequenced strand of DNA (CpG methylation occurs in CpG-rich regions called 'CpG islands' and 'CpG shores'), this approach captures adjacency information that yields a much more powerful methylation signature than the methylation status of single CpGs measured as separate entities.
Another difference between the study of DNA mutation and the study of DNA methylation is its readout: the methylation percentage ranges from completely unmethylated, or 0%, to completely methylated, or 100%.
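A toy example shows why adjacency information can carry more signal than per-site percentages. In this minimal sketch each read is written as a string over four neighboring CpG sites, with "1" for methylated and "0" for unmethylated. The reads and function names are made up for illustration and are not PanSeer data or the PanSeer algorithm.

```python
from collections import Counter
from typing import List


def per_site_methylation(reads: List[str]) -> List[float]:
    """Fraction of reads methylated at each CpG site, each site treated alone."""
    n_sites = len(reads[0])
    return [sum(read[i] == "1" for read in reads) / len(reads)
            for i in range(n_sites)]


def haplotype_counts(reads: List[str]) -> Counter:
    """Count each full methylation pattern observed along a read."""
    return Counter(reads)


if __name__ == "__main__":
    # Two hypothetical samples, four reads each, covering the same four CpG sites.
    concordant = ["1111", "1111", "0000", "0000"]
    mixed = ["1010", "0101", "1100", "0011"]

    print(per_site_methylation(concordant))  # 50% at every site
    print(per_site_methylation(mixed))       # also 50% at every site
    print(haplotype_counts(concordant))      # only two distinct patterns
    print(haplotype_counts(mixed))           # four distinct patterns
```

Both samples look identical when each site is reported as a separate percentage, yet the read-level patterns are clearly different.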
A useful analogy
You can think of it as a large jar of beads, each in one of 100 colors, compared to the same jar of beads where the 100 colors are arranged in certain patterns on short pieces of string. It is much easier to discern a pattern by looking at the short stretches than by looking at individual beads, if any pattern can be discerned from individual beads at all.
Carrying this illustration over to DNA mutation analysis, instead of 100 colors of beads (representing 0% to 100% methylation) you now have only 7 colors (four representing single-nucleotide variants, two representing insertions or deletions, and one representing a copy number alteration), and instead of the large jar you have a cup. Compare looking for patterns of carcinogenesis with far fewer variables (7 bead colors instead of 100) and a much narrower search space (the small cup of ~140 genes instead of the large jar of CpG islands and shores across the genome), and you can start to understand the power of the methylation approach.
Singlera Genomics' PanSeer assay technology
The PanSeer assay (for research use only) measures distinct tumor-specific methylation patterns in adjacent methylation sites. The technology interrogates over 20,000 methylation marker patterns, and noise is reduced by employing a highly efficient targeted sequencing library construction method.
The assay itself is simple to run and flexible, taking 1 day to go from purified DNA to sequencer-ready library. It is compatible with the two leading next-generation sequencing platforms on the market (systems from Illumina such as the MiSeq or NextSeq, as well as from Thermo Fisher Scientific, including the Ion Torrent S5), so any laboratory familiar with NGS library construction can quickly implement this method.
The bioinformatics is cloud-based at present, and with a highly efficient targeted sequencing approach both the data footprint and compute overhead are kept to a minimum.
The Taizhou Longitudinal Study
This longitudinal study collected samples from ~120,000 healthy individuals in Taizhou, China, and monitored their health from 2008 to 2018. Plasma samples were collected at the start of the study, and participant health was monitored regularly. (You can access the 2009 publication in BMC Public Health titled "Rationales, design and recruitment of the Taizhou Longitudinal Study" here.)
828 samples were randomly selected, and a subset was used to train an algorithm for the PanSeer assay. The remaining samples were used to test the results.
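A train/test division of this kind can be sketched as follows. The article does not state how many of the 828 samples went into training, so the 50% fraction, the seed, the sample IDs and the function name are all assumptions made purely for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_samples(sample_ids: Sequence[str],
                  train_fraction: float,
                  seed: int) -> Tuple[List[str], List[str]]:
    """Shuffle the sample IDs reproducibly and split them into train and test sets."""
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split can be repeated
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # Hypothetical IDs standing in for the 828 selected samples.
    sample_ids = [f"sample_{i:03d}" for i in range(828)]
    # train_fraction=0.5 is an assumed value; the actual split is not given here.
    train_ids, test_ids = split_samples(sample_ids, train_fraction=0.5, seed=0)
    print(len(train_ids), len(test_ids))
```

Keeping the test samples entirely separate from the training samples is what allows the reported performance to reflect samples the algorithm has not seen.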
Detecting cancer four years before conventional diagnosis
PanSeer can detect cancer up to four years before conventional diagnosis with 95% sensitivity and 96% specificity using ctmDNA (see figure below).
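For readers less familiar with these two metrics, here is a minimal sketch of how sensitivity and specificity are computed. The counts are hypothetical round numbers chosen only to reproduce the quoted percentages. They are not the study's actual sample counts.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of samples from people with cancer that the test flags."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of samples from people without cancer that the test clears."""
    return true_negatives / (true_negatives + false_positives)


if __name__ == "__main__":
    # Hypothetical counts: 100 samples from people later diagnosed, 100 from healthy people.
    print(sensitivity(true_positives=95, false_negatives=5))
    print(specificity(true_negatives=96, false_positives=4))
```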