Levels of Evidence for Adult and Pediatric Cancer Treatment Studies (PDQ®)
A variety of endpoints may be measured and reported from clinical studies in oncology. These may include total mortality (or survival from the initiation of therapy), cause-specific mortality, quality of life, or indirect surrogates of these three outcomes, such as event-free survival, disease-free survival, progression-free survival, or tumor response rate. Endpoints may also be determined within study designs of varying strength, ranging from the gold standard (the randomized, double-blinded, controlled clinical trial) to case series experiences from nonconsecutive patients. The PDQ editorial boards use a formal ranking system of levels of evidence to help the reader judge the strength of evidence linked to the reported results of a therapeutic strategy. For any given therapy, results can be ranked on each of the following two scales: (1) strength of the study design and (2) strength of the endpoints. Together, the two rankings give an idea of the overall level of evidence. Depending on perspective, different expert panels, professional organizations, or individual physicians may use different cut points of overall strength of evidence in formulating therapeutic guidelines or in taking action; however, a formal description of the level of evidence provides a uniform framework for the data, leading to specific recommendations.
The PDQ Adult Treatment Editorial Board and the PDQ Pediatric Treatment Editorial Board add information on levels of evidence, described below, to the PDQ Adult Cancer Treatment Summaries and the PDQ Pediatric Cancer Treatment Summaries when appropriate.
Strength of Study Design
The various types of study design are described below in descending order of strength:
- 1. Randomized, controlled, clinical trials.
  - i. Double-blinded.
  - ii. Nonblinded treatment delivery.
  The randomized, double-blinded, controlled, clinical trial (1i) is the gold standard of study design. To achieve this ranking, the study allocation must be blinded to both the physician and the patient, both before and after the randomization and the treatment assignment take place. This design provides protection from allocation bias by the investigator and from bias in assessment of outcomes by both the investigator and the patient. Unfortunately, most clinical trials in oncology cannot be double-blinded after treatment allocation because procedures or toxic effects often vary substantially among study allocations in ways that are obvious to both the health care professional and the patient. In most cases, however, it should be possible to blind the investigator and the patient until the randomization has been made. If blinding of the therapy delivered cannot be accomplished, a rank of 1ii is assigned.
  Meta-analyses of randomized studies offer a quantitative synthesis of previously conducted studies. The strength of evidence from a meta-analysis is based on the quality of the conduct of individual studies. Moreover, meta-analyses can magnify small systematic errors in individual studies. A study comparing the results of single, large, randomized trials to those of meta-analyses of smaller trials published earlier on the same topics showed only fair agreement (kappa statistic = 0.35) (LeLorier et al., 1997). Outcomes of the large, randomized, controlled trials were not predicted accurately by the meta-analysis 35% of the time. Meta-analyses performed by different investigators to address the same clinical issue can reach contradictory conclusions. Therefore, meta-analyses of randomized studies are placed in the same category of strength of evidence as are randomized studies, not at a higher level.
  Subset analyses of randomized studies are subject to errors inherent in multiplicity (i.e., statistically significant results to be expected as a result of random variation of measured effects in multiple subsets). Therefore, subset analyses do not represent the same strength of evidence as the overall analysis of a randomized trial as designed unless explicit prospective hypotheses are made for the analyzed subset. Otherwise, subset analyses should be placed in the next lower category of study design (nonrandomized, controlled, clinical trials).
- 2. Nonrandomized, controlled, clinical trials.
  This category includes trials in which treatment allocation was made by birth date, chart number, day of clinic appointment, bed availability, or any other strategy that would make the allocation known to the investigator before informed consent is obtained from the patient. An imbalance can occur in treatment allocation under such circumstances. For the reasons given above, subset analyses within randomized trials often fall into this category of evidence.
- 3. Case series.
  - i. Population-based, consecutive series.
  - ii. Consecutive cases (not population-based).
  - iii. Nonconsecutive cases.
  These clinical experiences are the weakest form of study design, but they may be the only available or practical information in support of a therapeutic strategy, especially in the case of rare diseases or when the evolution of the therapy predates the common use of randomized study designs in medical practice. They may also provide the only practical design when treatments in study arms are radically different (e.g., amputation vs. limb-sparing surgery). Nevertheless, these experiences do not have internal controls and must look to outside experiences for comparison. This always raises the issues of patient selection and comparability with other populations. In descending order of generalizability to other populations, the three types are population-based series, consecutive series that are not population-based, and nonconsecutive cases.
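The multiplicity problem described above for subset analyses can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions that are not part of the PDQ text: each subset is tested independently, there is no true treatment effect in any subset, and each test uses a significance threshold of 0.05. The function name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(num_subsets: int, alpha: float = 0.05) -> float:
    """Chance of at least one false-positive result across independent tests.

    Assumes independent subsets and no true effect in any of them
    (illustrative assumptions, not taken from the PDQ summary).
    """
    if num_subsets < 0:
        raise ValueError("num_subsets must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # P(no false positive in any subset) = (1 - alpha) ** num_subsets
    return 1.0 - (1.0 - alpha) ** num_subsets


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:>2} subsets: {familywise_error_rate(k):.2f}")
```

Under these assumptions, the chance of at least one spurious "significant" subset rises from 5% with a single test to roughly 40% with ten subsets, which is one way to see why unplanned subset findings are ranked below the overall analysis of the trial as designed.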
References
- LeLorier J, Grégoire G, Benhaddad A, et al.: Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med 337 (8): 536-42, 1997.
- Bailar JC 3rd: The promise and problems of meta-analysis. N Engl J Med 337 (8): 559-61, 1997.
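The kappa statistic quoted above measures agreement beyond what chance alone would produce. The summary does not say which variant was used, so the sketch below assumes the standard Cohen's kappa for two methods that each give a yes/no result. The function name and the counts are hypothetical and invented for illustration; they are not data from LeLorier et al.

```python
def cohen_kappa(both_pos: int, a_only: int, b_only: int, both_neg: int) -> float:
    """Cohen's kappa for two binary classifications of the same items.

    both_pos / both_neg: items on which methods A and B agree.
    a_only / b_only: items called positive by only one of the methods.
    """
    n = both_pos + a_only + b_only + both_neg
    if n == 0:
        raise ValueError("at least one observation is required")
    observed = (both_pos + both_neg) / n           # raw agreement
    a_pos = (both_pos + a_only) / n                # A's positive rate
    b_pos = (both_pos + b_only) / n                # B's positive rate
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)  # chance agreement
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical counts only: 100 outcomes, 70 of them concordant.
    kappa = cohen_kappa(both_pos=40, a_only=10, b_only=20, both_neg=30)
    print(f"raw agreement = 0.70, kappa = {kappa:.2f}")
```

In this made-up table the two methods agree on 70% of outcomes, yet kappa is only 0.40 because half of that agreement would be expected by chance. This is why a kappa near 0.35 is read as only fair agreement.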
Strength of Endpoints
Commonly measured endpoints for adult and pediatric cancer treatment studies are listed below in descending order of strength:
- A. Total mortality (or overall survival from a defined time).
  This outcome is arguably the most important one to patients and is also the most easily defined and least subject to investigator bias.
- B. Cause-specific mortality (or cause-specific survival from a defined time).
  Although this may be of the most biologic importance in a disease-specific intervention, it is a more subjective endpoint than total mortality and more subject to investigator bias in its determination. This endpoint may also miss important effects of therapy that may actually shorten overall survival.
- C. Carefully assessed quality of life.
  This is an extremely important endpoint to patients. Careful documentation of this endpoint within a strong study design is therefore sufficient for most physicians to incorporate a treatment into their practices.
- D. Indirect surrogates.
  - i. Event-free survival.
  - ii. Disease-free survival.
  - iii. Progression-free survival.
  - iv. Tumor response rate.
  These endpoints may be subject to investigator interpretation. More importantly, they may, but do not automatically, translate into direct patient benefit such as survival or quality of life. Nevertheless, it is rational in many circumstances to use a treatment that improves these surrogate endpoints while awaiting a more definitive endpoint to support its use.
Because studies or clinical experiences are ranked both by strength of design and importance of endpoint, a given study would have a two-tiered ranking (e.g., 1iiA for a nonblinded randomized study showing a favorable outcome in overall survival and 3iiiDiv for a phase II trial of selected patients with response rate as the outcome). In addition, all recommendations must take into account other issues that cannot be so easily quantified, such as toxicity, width of confidence intervals of observations, trial size, quality assurance in the trial, and cost. Nevertheless, the PDQ ranking system provides an ordinal categorization of strength of evidence as a starting point for discussions of study results.
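The two-part codes in the examples above can be read mechanically: a design code followed by an endpoint code. The sketch below is a hypothetical lookup, not an official PDQ tool. The names `DESIGNS`, `ENDPOINTS`, and `parse_level` are illustrative, and the letter and numeral labels are inferred from the order of the two lists together with the examples 1i, 1ii, 1iiA, and 3iiiDiv given in the text.

```python
import re

# Design codes, strongest first (labels inferred from the text's examples).
DESIGNS = {
    "1i": "Randomized, controlled, clinical trial; double-blinded",
    "1ii": "Randomized, controlled, clinical trial; nonblinded treatment delivery",
    "2": "Nonrandomized, controlled, clinical trial",
    "3i": "Case series; population-based, consecutive series",
    "3ii": "Case series; consecutive cases (not population-based)",
    "3iii": "Case series; nonconsecutive cases",
}

# Endpoint codes, strongest first (labels inferred from the text's examples).
ENDPOINTS = {
    "A": "Total mortality",
    "B": "Cause-specific mortality",
    "C": "Carefully assessed quality of life",
    "Di": "Indirect surrogate; event-free survival",
    "Dii": "Indirect surrogate; disease-free survival",
    "Diii": "Indirect surrogate; progression-free survival",
    "Div": "Indirect surrogate; tumor response rate",
}


def _alternation(keys) -> str:
    # Longest keys first so that "1ii" is tried before "1i".
    return "|".join(sorted(keys, key=len, reverse=True))


_PATTERN = re.compile(rf"({_alternation(DESIGNS)})({_alternation(ENDPOINTS)})")


def parse_level(code: str) -> tuple[str, str]:
    """Split a combined code such as '1iiA' into design and endpoint text."""
    match = _PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"unrecognized level-of-evidence code: {code!r}")
    return DESIGNS[match.group(1)], ENDPOINTS[match.group(2)]


if __name__ == "__main__":
    for example in ("1iiA", "3iiiDiv"):
        design, endpoint = parse_level(example)
        print(f"{example}: {design} | {endpoint}")
```

Keeping the two scales in separate tables mirrors the point made in the text: the ranking is ordinal on each scale, and factors such as toxicity, trial size, and cost still have to be weighed separately.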