A ribbon model of molecules of protein p53 binding to a strand of DNA.

Obtain many circulating biomarkers broadly, or a few biomarkers deeply?

Can you have it all?

At the recent Society for Neuroscience annual meeting, Abcam sponsored a workshop entitled "Biomarkers of Neuroinflammation in Parkinson's Disease," where Dr. Nicole Polinski of The Michael J. Fox Foundation for Parkinson's Research spoke about the Foundation's ongoing efforts to find a biomarker for this debilitating neurodegenerative disorder, as well as its work with Abcam's new FirePlex 70-plex microRNA profiling and antigen detection platform. (Abcam also offers a series of detection antibodies specifically for Parkinson's research.)

What was notable about this work was its hypothesis-driven nature: markers such as LRRK2, along with dozens of other proteins, were examined across samples from the Foundation's extensive biobank collection. Sensitivity down to 5 pg/mL of serum was clearly demonstrated, and Abcam's technical teams worldwide showed minimal cross-reactivity using the FirePlex technology.

Still, after two years of strong collaborative work, the preliminary data did not show strong significance; P-values reached only the p<0.05 range for the strongest markers. The work did, however, suggest further directions.

The long-running search for circulating biomarkers

Whether antigens in serum, circulating tumor DNA in plasma, or microRNA in exosomal vesicles, readily accessible biomarkers in the bloodstream remain an active area of research and translational biology. The burgeoning growth of the diagnostics market, and of companion diagnostics in particular, is currently driven by DNA-based diagnostic tests. (For a comprehensive review, Lin et al. is a good place to start.)

The promise of next-generation sequencing technology is currently being realized in the genomic characterization of cancer samples to guide targeted therapy, most visibly in a trio of FDA approvals: Thermo Fisher Scientific's Oncomine test, followed a few months later by Memorial Sloan Kettering's MSK-IMPACT test, and just a few weeks after that by Foundation Medicine's FoundationOne.

Another active area of emerging development and approval is the analysis of circulating tumor DNA. In the summer of 2016, Roche received approval for the first test of this type, the cobas EGFR Mutation Test v2, which detects gene mutations by real-time PCR in non-small cell lung cancer (NSCLC). Other companies, notably Foundation Medicine and Guardant Health, are working toward FDA approval of NGS-based tests that guide therapy using circulating tumor DNA markers.

Going narrow versus going broad

The Roche cobas EGFR test raises an important point: real-time PCR (as well as digital PCR, which has not yet received regulatory approval for any DNA-based test or companion diagnostic) is a very accurate technology, but it is limited in the number of markers that can be assayed. With a limited amount of input material (the typical yield of circulating free DNA from 2 mL of whole blood or 1 mL of plasma is only about 10 nanograms), only a limited number of markers can be interrogated with PCR-based methods, even with clever multiplexing.
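To make the input constraint concrete, a back-of-the-envelope calculation shows how few molecules that 10 ng of cell-free DNA actually represents. The genome mass and the 0.1% allele fraction below are standard illustrative figures, not numbers taken from the article:

```python
# Rough arithmetic (illustrative, not the article's data beyond the 10 ng figure):
# how many genome copies a typical cfDNA yield represents, and why that caps
# the number of markers that can be independently assayed by PCR.

HAPLOID_GENOME_MASS_PG = 3.3   # approx. mass of one haploid human genome, in picograms

def genome_equivalents(cfdna_ng: float) -> float:
    """Convert a cfDNA yield in nanograms to haploid genome copies."""
    return cfdna_ng * 1000 / HAPLOID_GENOME_MASS_PG

copies = genome_equivalents(10)
print(f"~{copies:.0f} haploid genome copies from 10 ng cfDNA")

# At an assumed 0.1% mutant allele fraction, only a handful of mutant
# molecules exist in the whole sample, so splitting the material across
# many separate PCR reactions quickly exhausts it.
mutant_copies = copies * 0.001
print(f"~{mutant_copies:.1f} mutant molecules at 0.1% allele fraction")
```

With only about 3,000 genome copies in hand, every additional singleplex reaction dilutes the handful of mutant molecules further, which is why multiplexing (or sequencing) becomes attractive.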

This tug-of-war between narrow and broad is reminiscent of the whole-genome sequencing (WGS) versus targeted sequencing (including whole-exome sequencing, WES) debate. While there is strong evidence that WGS yields better exome variant quality than a WES approach, the cost differential (roughly a six-fold difference in overall cost, driven by the much larger amount of sequencing WGS requires even after accounting for the added expense of exome enrichment) makes WGS prohibitive as a route to exome data.
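The sequencing-volume side of that cost differential can be sketched with some simple arithmetic. The genome and exome sizes, coverages, and on-target rate below are common illustrative assumptions, not figures from the text; note the raw-output ratio comes out larger than the ~6x overall cost ratio, since WES carries extra enrichment cost that sequencing volume alone does not capture:

```python
# Illustrative comparison (assumed figures) of raw sequencing output needed
# for WGS vs. WES, which drives the cost differential described above.

GENOME_SIZE_GB = 3.2     # human genome, gigabases
EXOME_SIZE_GB = 0.06     # ~60 Mb of targeted exonic sequence

def raw_output_gb(target_gb: float, mean_coverage: float, on_target_rate: float = 1.0) -> float:
    """Gigabases of raw sequence required for a target region at a given mean coverage."""
    return target_gb * mean_coverage / on_target_rate

wgs = raw_output_gb(GENOME_SIZE_GB, 30)                      # 30x whole genome
wes = raw_output_gb(EXOME_SIZE_GB, 100, on_target_rate=0.6)  # 100x exome, assumed 60% on-target

print(f"WGS 30x: {wgs:.0f} Gb raw; WES 100x: {wes:.0f} Gb raw; ratio ~{wgs/wes:.0f}x")
```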

These tradeoffs illustrate the continuum of choices that must be faced: is the set of biomarkers interrogated comprehensive enough to become a validated, FDA-approved test? Can it be sensitive and specific enough to show not only clinical validity but also clinical utility after many years of experimental and then routine testing of thousands (or even tens of thousands) of patient samples?

In an ideal world, you would interrogate a broad set of markers deeply. However, the practical considerations raised above, namely limited sample and limited resources, mean that difficult tradeoffs must be made.

The challenge of hypothesis-free discovery

In the early 2000s, after the Human Genome Project finished, there were efforts to jump-start studies of the human proteome. Alongside whole-genome gene expression arrays, several companies launched protein arrays, as well as novel proteomic approaches such as SELDI-TOF mass spectrometry and antibody-ligand kinetics measured by surface plasmon resonance (SPR). Tandem liquid-chromatography mass spectrometry and other proteomic technologies are, of course, still in use today. These broad approaches face the same difficulty illustrated above: with a limiting amount of sample, measuring hundreds or thousands of analytes runs into the same trade-off.

Experiments are devised for hypothesis-free investigation: measure as much as you can, analyze it with fast computers and novel algorithms, and a model appears in the data as if by magic. Except this never happens; science simply does not work that way. Science advances with a question: it starts with an idea, a hypothesis, which is then tested and refined.

We took a look earlier at some of the fallacies of using big data in the early detection of cancer. The ease of collecting data in this data-intensive age does not mean that the signal magically becomes stronger.

Lightweight sequencing of circulating tumor methylated DNA (ctmDNA)

By measuring clusters of CpG sites in 'CpG islands' (see this 2011 reference for a handy review), Singlera has developed an elegant approach that obtains deep coverage of useful biomarkers along with a set of informative biomarkers complex enough to stand the rigor of assaying thousands of patient samples. With approximately 70% of all gene promoters associated with a CpG island, the association between CpG methylation and transcription initiation is clear. This approach, targeted methylation haplotype sequencing, is novel, and the reason it has not been used before is equally clear: preserving methylation haplotype information has historically been too difficult.

A common genome-wide approach to examining CpG methylation is Reduced Representation Bisulfite Sequencing (RRBS), first described and published in 2005. While not a complete whole-genome bisulfite sequencing approach, it usefully reduced the amount of sequencing needed while providing a data-rich subset for further study of methylation, and it was used in The Cancer Genome Atlas (TCGA) project. Nonetheless, with TCGA data one of the first steps in analysis is to discard the methylation haplotype information and preserve only the average methylation status at each position across the tissue sample.

Singlera's technology examines single-molecule methylation haplotypes, preserving the individual CpG methylation status across each molecule like a digital barcode. Thus, instead of an analog number from 0% to 100% for the CpG status at a specific site, Singlera's technology determines a binary readout for that site along with many adjacent sites. That single-strand digital barcode is information-dense and information-rich, since Singlera can associate each molecule, along with hundreds or thousands of others, with a cancer gene expression network or pathway.
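The difference between averaged methylation and per-molecule barcodes can be shown with a toy example. This is a minimal sketch with made-up reads, not Singlera's actual pipeline: four molecules covering four adjacent CpG sites, where site-by-site averaging erases a pattern the haplotype view preserves:

```python
from collections import Counter

# Each string is one sequenced molecule across four adjacent CpG sites:
# '1' = methylated, '0' = unmethylated. (Hypothetical data.)
reads = ["1111", "0000", "1111", "0000"]

# Site-by-site averaging (the RRBS/TCGA-style summary): every site comes out
# 50% methylated, and the fact that each molecule is either fully methylated
# or fully unmethylated is lost.
site_means = [
    sum(int(r[i]) for r in reads) / len(reads)
    for i in range(len(reads[0]))
]
print(site_means)       # [0.5, 0.5, 0.5, 0.5]

# Haplotype ("digital barcode") view: the per-molecule patterns reveal two
# distinct molecular populations that the averages cannot distinguish.
haplotypes = Counter(reads)
print(haplotypes)       # Counter({'1111': 2, '0000': 2})
```

A mixture of half-methylated molecules ("1100", "0011", ...) would produce the identical site means, which is exactly the ambiguity the barcode readout resolves.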

Using hundreds or thousands of these individual digital barcodes, interrogating tens of thousands of individual CpG loci while maintaining their adjacency information as a group, Singlera can efficiently target this rich dataset without an immense sequencing overhead. Singlera's approach is lightweight, running on commonly available next-generation sequencing equipment.
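Why a targeted panel is "lightweight" relative to whole-genome bisulfite sequencing (WGBS) can be shown with some panel arithmetic. The region size, region count, and depths below are hypothetical illustrative figures, not Singlera's published specifications:

```python
# Hypothetical panel arithmetic (assumed figures only): comparing the raw
# sequencing output of a targeted methylation haplotype panel against WGBS.

REGION_BP = 150          # assumed size of one targeted CpG block
N_REGIONS = 1_000        # assumed panel spanning tens of thousands of CpG loci
PANEL_DEPTH = 1_000      # deep per-block coverage, for many molecules per region

GENOME_BP = 3.2e9        # human genome size in base pairs
WGBS_DEPTH = 30          # typical whole-genome bisulfite coverage

panel_gb = REGION_BP * N_REGIONS * PANEL_DEPTH / 1e9   # gigabases for the panel
wgbs_gb = GENOME_BP * WGBS_DEPTH / 1e9                 # gigabases for WGBS

print(f"panel: {panel_gb:.2f} Gb vs WGBS: {wgbs_gb:.0f} Gb "
      f"(~{wgbs_gb / panel_gb:.0f}x less sequencing)")
```

Even with 30x deeper per-site coverage than WGBS, the panel needs orders of magnitude less raw sequence, which is what makes routine assaying of thousands of patient samples economically feasible.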

Click here to contact us if you'd like to investigate further, or would like to look into collaboration opportunities.