Five Phases of Biomarker Development for Early Detection of Cancer

Where are the retrospective studies?

Collecting samples

About two decades ago biology was in an exciting period, and optimism peaked with the first draft of the Human Genome Project, announced in 2000 (“A revolution in preventing, diagnosing, treating, and curing disease”, the press release notes). Given the advances in genomic technology at the time (for example, the wide use of gene expression microarrays and the increasing sensitivity of proteomic analysis of human serum), the National Cancer Institute established the Early Detection Research Network (EDRN) that same year.

In 2001 researchers from the University of Washington and the Fred Hutchinson Cancer Research Center published a paper entitled “Phases of Biomarker Development for Early Detection of Cancer”, in which they laid out the formal structure ‘that a biomarker needs to pass through to produce a useful population-screening tool’.

The five phases of biomarker development

The five phases are as follows:

Phase 1: Pre-Clinical Exploratory (Promising directions identified)

Phase 2: Clinical Assay and Validation (Clinical assay detects established disease)

Phase 3: Retrospective Longitudinal (Biomarker detects disease early before it becomes clinical and a “screen positive” rule is identified)

Phase 4: Prospective Screening (Extent and characteristics of disease detected by the test and the false referral rate are identified)

Phase 5: Cancer Control (Impact of screening on reducing the burden of disease on the population is quantified)

Many cancer methylation studies for early detection are pre-clinical and exploratory in nature

There have been a few notable papers analyzing cell-free methylated DNA using a variety of methods. In November 2018 a group from the University of Toronto published a paper entitled “Sensitive tumour detection and classification using plasma cell-free DNA methylomes” (Shen and De Carvalho et al. Nature 2018), using a cell-free methylated DNA immunoprecipitation approach they call cfMeDIP-Seq.

Examining 388 cell-free DNA samples from both early- and late-stage cancer patients of several different tumor types, as well as 24 early-stage pancreatic cancer cell-free DNA samples, they report area under the receiver operating characteristic (ROC) curve (AUC) values of 0.97 for detecting lung cancer, 0.92 for pancreatic cancer, and 0.96 for healthy controls.
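For readers less familiar with the metric, an AUC can be read as the probability that a randomly chosen cancer sample receives a higher classifier score than a randomly chosen control. The sketch below computes it directly from that pairwise definition. The scores are made-up numbers for illustration only, not data from the paper, and the function name auc_from_scores is hypothetical.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney convention).
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control score")
    correct = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                correct += 1.0
            elif case == control:
                correct += 0.5
    return correct / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores, for illustration only.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.5, 0.3, 0.2, 0.1]
toy_auc = auc_from_scores(cases, controls)
print(f"toy AUC = {toy_auc:.4f}")
```

In this toy set, 15 of the 16 case/control pairs are ordered correctly. An AUC of 0.97 means nearly every such pair is ordered correctly, but it says nothing by itself about which score threshold to use as a "screen positive" rule.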

This study uses cell-free DNA samples collected from patients after conventional diagnosis. While the approach is promising and needs only a limited amount of input material (they cite robust performance with only 1-10 ng cfDNA), this paper can be classified as a “Phase 1: Pre-Clinical Exploratory” investigation.

Another paper came out a few weeks later from a group in Israel, titled “Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease” (Moss and Dor et al. Nature Comm. 2018). Using a microarray both to generate a reference atlas from 25 tissue types and to interrogate cell-free DNA samples from healthy and affected individuals, they are able to demonstrate the tissue or cell type of origin using methylation patterns.

Applying their method to metastatic cancer (11 samples of metastatic colon, metastatic lung or metastatic breast cancer), they were able to trace the known tissue of origin in 3/4 of the colon, 2/4 of the lung, and 3/3 of the breast cancer cases respectively, for an overall accuracy of 8/11 or about 73%.
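The arithmetic behind the overall figure can be checked in a few lines. This is only a sketch that pools the per-type counts quoted above; the variable names are illustrative and nothing here comes from the paper's own code.

```python
# (correct calls, total samples) per cancer type, as quoted in the text above.
results = {"colon": (3, 4), "lung": (2, 4), "breast": (3, 3)}

# Per-type accuracy, then the pooled accuracy across all 11 samples.
per_type = {name: correct / total for name, (correct, total) in results.items()}
overall_correct = sum(correct for correct, _ in results.values())
overall_total = sum(total for _, total in results.values())
overall = overall_correct / overall_total

for name, acc in per_type.items():
    print(f"{name}: {acc:.0%}")
print(f"overall: {overall_correct}/{overall_total} = {overall:.1%}")
```

With only three or four samples per cancer type, each per-type percentage moves by 25 to 33 points for a single sample, so the pooled figure is the more meaningful one.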

Clearly, determining tissue of origin from cell-free DNA is most useful in the difficult cases of Cancer of Unknown Primary (CUP), currently estimated at 2% of all cancer cases or approximately 31,480 cases. Here the method would serve not as a screening tool but as a potentially useful diagnostic.

One method developed in Australia used physical properties of methylated cell-free DNA to differentiate between affected and healthy cell-free DNA samples. The paper, “Epigenetically reprogrammed methylation landscape drives the DNA self-assembly and serves as a universal cancer biomarker” (Sina and Trau et al. Nature Comm. 2018), used electron microscopy analysis and gold nanoparticles.

The research group discovered that “DNA polymeric behaviour is strongly affected by differential patterning of methylcytosine, leading to fundamental differences in DNA solvation and DNA-gold affinity between cancerous and normal genomes.” They call their assay MethylScape; with either an electrochemical or a colorimetric readout, its accuracy across five cancer types approached 90%. However, these samples also came from conventionally diagnosed patients, and thus this too can be classified as a “Phase 1: Pre-Clinical Exploratory” study.

Existing biobanks for retrospective studies

In “The case for early detection”, Etzioni and Hartwell et al. (Nature Reviews Cancer 2003) make a point often echoed by much more recent investigations into the etiology of cancer: “a key insight provided by use of new molecular technologies for cancer — such as expression array analysis and proteomic profiling — is that we have greatly underestimated the heterogeneity of the disease.” The emphasis on serum-based biomarkers resulted in the creation of longitudinal biobanks of pre-cancer samples: the Women’s Health Initiative, the Baltimore Longitudinal Study of Aging, and the Physicians’ Health Study in the United States. However, these biobanks were established with the expectation that serum (that is, protein-based) biomarkers were the ones to be measured, not the circulating cell-free DNA isolated from plasma (with the plasma separated within a relatively short time of venipuncture collection, as discussed in this earlier blog post).

More recently, the UK Biobank began enrolling in 2006 and currently follows 500,000 individuals, ages 40-69 at enrollment, for 30 years.

In 2008 in China the Taizhou Longitudinal Study began enrolling 120,000 individuals ages 30-80 and is “an open-ended prospective study with very broad research aims.” Several hundred pre-cancer diagnosis blood samples were obtained by Singlera for a retrospective study using the PanSeer Assay (manuscript currently in preparation). Remarkably, of the 159 samples collected one to four years prior to conventional diagnosis, a full 113 (or 71%) tested positive with the Singlera PanSeer Assay. (Refer to a PDF of the Technical Note here.) Of 498 healthy samples tested in a blinded fashion, a full 461 tested negative, for a specificity of 93%.
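The two percentages above are a sensitivity (positives among pre-diagnosis samples) and a specificity (negatives among healthy samples). The sketch below recomputes both from the quoted counts and adds a Wilson score interval to show how much uncertainty samples of this size carry. The interval is an illustration added here under a simple binomial assumption, not a figure reported by Singlera, and the helper name wilson_interval is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Counts quoted in the text above.
sensitivity = 113 / 159   # positives among pre-diagnosis samples
specificity = 461 / 498   # negatives among healthy samples

sens_ci = wilson_interval(113, 159)
spec_ci = wilson_interval(461, 498)

print(f"sensitivity {sensitivity:.1%}, interval {sens_ci[0]:.1%} to {sens_ci[1]:.1%}")
print(f"specificity {specificity:.1%}, interval {spec_ci[0]:.1%} to {spec_ci[1]:.1%}")
```

Note that 461 negatives out of 498 also means 37 healthy samples tested positive, which is the kind of false referral rate a prospective Phase 4 study would need to pin down in a screening population.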

A Prospective Phase 4 is a clinical trial

The ‘real-world’ false-positive and false-negative rates determined in a prospective clinical study, together with the assay validation performed there, are used in an application to the US FDA as part of the test approval process. Plenty of information about this process is provided on an FDA website page titled Overview of IVD Regulation.

And in case you were curious about a list of all Nucleic Acid Based tests, including the two colorectal cancer screening tests approved for early detection and links to their submission documents, the FDA has a useful page here.


  1. Pepe and Yasui et al. J Natl Cancer Inst 2001. "Phases of biomarker development for early detection of cancer" PMID:11459866
  2. Shen and De Carvalho et al. Nature 2018. "Sensitive tumour detection and classification using plasma cell-free DNA methylomes" PMID:30429608
  3. Moss and Dor et al. Nature Comm. 2018. "Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease" PMID:30498206
  4. Sina and Trau et al. Nature Comm. 2018. "Epigenetically reprogrammed methylation landscape drives the DNA self-assembly and serves as a universal cancer biomarker" PMID:30514834
  5. Etzioni and Hartwell et al. Nature Reviews Cancer 2003. "The case for early detection" PMID:12671663