
Evidence-based medicine
Authors:
Arthur T Evans, MD, MPH
Gregory Mints, MD, FACP
Section Editor:
Mark D Aronson, MD
Deputy Editor:
Carrie Armsby, MD, MPH
Literature review current through: Dec 2022. | This topic last updated: Apr 23, 2021.

INTRODUCTION — Evidence-based medicine (EBM) is the care of patients using the best available research evidence to guide clinical decision-making (figure 1) [1,2]. The value of EBM is heightened in light of the following considerations:

The volume of evidence available to guide clinical decisions continues to grow at a rapid pace (figure 2)

Improvements in research design, clinical measurements, and methods for analyzing data have led to a better understanding of how to produce valid clinical research

Despite advances in research methods, many published study results are false or draw misleading conclusions [3]

Many clinicians, even those in good standing, do not practice medicine according to the best current research evidence

The basic elements of EBM are reviewed here. They include [1]:

Formulating a clinical question

Finding the best available evidence

Assessing the validity of the evidence (including internal and external validity)

Applying the evidence in practice, in conjunction with clinical expertise and patient preferences

The focus is upon applying the results of research involving patients and clinical outcomes, such as death, disease, symptoms, and loss of function. Other kinds of evidence, such as those obtained by personal experience and laboratory studies of the pathogenesis of disease, are also useful in the care of patients but are not usually included under "evidence-based medicine." EBM is meant to complement, not replace, clinical judgment in tailoring care to individual patients. Similarly, EBM and the delivery of culturally, socially, and individually sensitive and effective care are complementary, not contradictory (figure 1).

FORMULATING A CLINICAL QUESTION — Clinical questions are frequently complex, but it is usually wise to sharpen the focus by breaking them into simpler, answerable questions (table 1). The question must be explicitly defined before searching for the answer [4].

The search for the best answers to clinical questions begins with a tight, explicit formulation of the question [4]. For example, the question "what is the best treatment for type 2 diabetes?" is too general and broad to be answered well. For evaluating the effectiveness of an intervention, four questions should be considered (commonly referred to as "PICO") (table 2):

What is the relevant patient population?

What intervention is being considered?

What is the comparison intervention or patient population?

What outcomes are of interest?

For example, a relevant, answerable question might be: "Among obese adults with type 2 diabetes, is metformin more effective than sulfonylurea drugs in preventing death?"

The approach is similar for clinical questions involving diagnosis or prognosis (table 2).
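
To make the PICO structure concrete, the following minimal sketch assembles the four components into an answerable question. The class and method names are hypothetical illustrations, not part of any EBM tool:

```python
from dataclasses import dataclass

@dataclass
class PICOQuestion:
    """Illustrative container for the four PICO components."""
    population: str    # P: the relevant patient population
    intervention: str  # I: the intervention being considered
    comparison: str    # C: the comparison intervention or population
    outcome: str       # O: the outcome of interest

    def as_question(self) -> str:
        return (f"Among {self.population}, is {self.intervention} more effective "
                f"than {self.comparison} in {self.outcome}?")

# The diabetes example from the text:
q = PICOQuestion(
    population="obese adults with type 2 diabetes",
    intervention="metformin",
    comparison="sulfonylurea drugs",
    outcome="preventing death",
)
print(q.as_question())
```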

Patient population — The ultimate goal of EBM is to inform clinical decisions regarding individual patients. Ideally, therefore, one would seek answers from studies that enrolled research subjects who were very similar to one's patient. If the target population is defined too broadly, the study results may not apply to patients whose characteristics differ substantially from the typical study subject.

However, there is also some danger in defining the target population too narrowly. High-quality research of very specific groups of patients is often unavailable, and the alternative subgroup analysis of larger, more inclusive studies can be problematic because of serious methodologic concerns [5-14]. (See 'External validity' below.)

Intervention — In formulating the PICO question, it is important to specify the intervention being considered. A similar approach is used to evaluate questions regarding diagnosis or prognosis, in which case the question should clearly specify the diagnostic test or risk factor of interest.

As with the patient population, it is important to avoid overly narrow or broad definitions of the intervention (or test or risk factor). For questions that involve drug therapy, the dose, timing, and duration of treatment need to be considered. For example, for a middle-aged man with hypertension, one may want to know whether 81 mg of aspirin taken daily and indefinitely prevents strokes. However, good data for narrowly defined treatment schedules may be unavailable, leading to a perilous reliance on subgroup analyses. Under such circumstances, it may be worthwhile to relax the definition of the intervention to something broader, for example, "low-dose aspirin."

Comparison — In randomized treatment trials, the comparison group can be a placebo, usual care, or active treatment. Placebo-controlled trials have two distinct advantages: They facilitate blinding and control for the placebo effect (nonspecific treatment effect). However, they do not allow one to compare the effects among real-world choices [15]. It is important that the comparison intervention be clinically appropriate (ie, an alternative intervention that would realistically be under consideration).

Outcomes — It is important to consider all patient-important outcomes (including benefits and harms). It is not sufficient to think of benefit (or harm) in general terms; one must be specific about the outcomes of interest. In particular, outcomes should be well defined, measurable, reliable, sensitive to change, and actually assess clinically relevant aspects of a patient's health.

Particular issues related to the types of outcomes measured in clinical studies include:

Composite endpoints – The use of a composite of multiple combined endpoints has the advantage of increasing the study's statistical power but can be difficult to interpret. Interpretation is easy if all component outcomes are of equal importance to the patient and the intervention affects them all to the same extent. However, this is rarely the case. When an intervention's effects are not consistent across the different outcomes and the outcomes are valued differently, then interpretation of the composite is difficult. For this reason, studies that have a composite endpoint for the primary outcome should also report the results for each of the individual outcomes that make up the composite.

For example, in a study comparing coronary bypass surgery with percutaneous angioplasty and stenting for severe coronary artery disease, the main study outcome was a composite of death, stroke, myocardial infarction, or need for repeat revascularization [16]. Compared with bypass surgery, percutaneous intervention had a significantly lower risk of stroke but a significantly higher risk of repeat revascularization. Focusing on the composite endpoint in this case is not helpful.

"Soft" outcomes – Much of clinical research focuses on objective outcomes, which include the "hard" outcomes of death and disease (for example, myocardial infarction, stroke, and loss of limb). The "softer" outcomes that measure function, pain, and quality of life are less common but, for many questions, are the key outcomes of interest. It is usually easy to measure the hard outcomes without the need for special instruments. On the other hand, outcomes that require subjective interpretation by patients or clinicians demand a carefully developed and validated measurement tool. Subjective outcomes are usually more susceptible to the placebo effect or expectation bias. Strategies to mitigate these errors, such as proper blinding, become critically important. But even the hard, objective outcomes are prone to bias.

Surrogate outcomes – Sometimes, the most clinically important outcomes are difficult to measure, and a surrogate outcome becomes an easier and cheaper substitute. Surrogate outcomes are expected to predict clinical benefit or harm based on epidemiologic, pathophysiologic, or other scientific evidence [17]. The advantage of using surrogate outcomes rather than clinical outcomes is that studies can generally be done with fewer subjects and completed more quickly at lower cost. These advantages account for the prevalent use of surrogate outcomes in clinical research (45 percent of the new medications approved by the US Food and Drug Administration [FDA] between 2005 and 2012 were based on studies with surrogate outcomes) [18]. Common examples include blood pressure in trials evaluating antihypertensives and hemoglobin A1c level in trials evaluating diabetes medications.

However, the use of surrogate endpoints can lead to erroneous conclusions [19]. Furthermore, research using surrogates can be difficult to incorporate into an overall assessment of risks and benefits because these outcomes, by definition, are only indirectly important to patients. The 2010 Institute of Medicine (IOM) recommendations state that surrogate endpoints should only be used if their ability to predict clinically important outcomes is conclusively documented [17].

Even well-qualified surrogates that appear to meet the IOM standards can be problematic. A sobering example is the use of hemoglobin A1c as a surrogate for the clinically important outcomes of diabetes treatment (death, disease, and dysfunction). Several therapies that demonstrated impressive reductions in hemoglobin A1c were later found to have no effect, or a harmful effect, on clinically relevant outcomes. (See "Glycemic control and vascular complications in type 2 diabetes mellitus".)

FINDING THE EVIDENCE

Evidence-based medicine resources — Most medical information is now rapidly accessible from computers and handheld devices. However, skill is required to quickly find the desired information, while limiting irrelevant "noise." Different approaches are required depending on the reason for seeking the information:

Rapidly answering a specific clinical question, a cornerstone of EBM, requires a strategy that is fast and accurate and can be mastered by most clinicians without the need for technical sophistication.

Keeping current with developments in one's field ("knowledge management") is challenging and generally not feasible without the use of a curated resource. Answering all important clinical questions by reading, appraising, and summarizing evidence would be overwhelming and simply impossible for the individual clinician. Therefore, the bulk of these tasks must be delegated to trustworthy sources. UpToDate is a resource for this purpose; many other resources are available online. However, the fact that a resource is electronic and easily accessible does not mean it is evidence based. Wikipedia, for example, is commonly used to answer clinical questions [20]. However, Wikipedia entries can have major omissions and have been judged inadequate for the practice of EBM [21-24].

Conducting a systematic review requires an exhaustive search of the primary data using multiple search tools. This is discussed separately. (See "Systematic review and meta-analysis".)

Qualities of useful information sources for clinicians include:

Rapidly accessible (within minutes), so the information can guide clinical decisions as they arise

Targeted to the specific clinical question

Evidence-based and current

Portable

Easy to use

Within the domain of information technology, a distinction is made between a database, which is a collection of bibliographic references to medical articles (eg, Medical Literature Analysis and Retrieval System online [MEDLINE], Cumulative Index for Nursing and Allied Health Literature [CINAHL], Excerpta Medica database [EMBASE], Cochrane databases), and an access portal, which is a user interface with a built-in search engine (eg, PubMed, Ovid).

A single access portal may provide access to more than one database. Access portals may also provide options for citation management and citation maps. Citation maps are networks of citation links between articles in a database. These links may be outgoing (articles cited in the bibliography of a particular paper) or incoming (more recent reports that cite the index article). Exploring citation maps is thus a legitimate method of searching the literature, occasionally producing novel and helpful results.

Search filters (also called "hedges," "limits," "strategies," and "clinical queries") are predefined search terms designed for a specific purpose (eg, limiting searches to guidelines or randomized controlled trials). These are both portal- and database-specific. Because the filters are platform-specific, results may be very different for seemingly identical searches.
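
As a concrete illustration, the following sketch runs a filtered search against PubMed through the public NCBI E-utilities interface. The endpoint and parameter names are real; the query itself is only an example, with "randomized controlled trial[pt]" serving as a publication-type filter that restricts results to randomized trials:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Example query: restrict a clinical search to randomized controlled trials
# using the PubMed publication-type filter "randomized controlled trial[pt]".
params = {
    "db": "pubmed",
    "term": 'metformin AND "type 2 diabetes" AND randomized controlled trial[pt]',
    "retmode": "json",
    "retmax": 5,
}

with urlopen(f"{ESEARCH}?{urlencode(params)}") as resp:
    result = json.load(resp)["esearchresult"]

print(f"{result['count']} matching citations; first PMIDs: {result['idlist']}")
```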

Categories of evidence — Evidence can be summarized at three levels of complexity (figure 3) [25]:

Primary (original) research – Primary research involves data that are collected from individuals or clusters of individuals, with clusters defined by clinician, clinic, geographic region, or other factors. Within primary research, EBM practitioners should consider the hierarchy of evidence to minimize the risk of bias (figure 3). For studies evaluating therapy or harm, well-conducted randomized clinical trials are superior to observational studies, which are superior to unsystematic clinical observations [25]. Appropriate study design depends on the question being investigated (figure 4). Questions regarding benefits (and harms) of an intervention are best answered with randomized controlled trials, whereas questions regarding risk factors for disease and prognosis are best answered with prospective cohort studies.

Systematic reviews – Systematic reviews are best for answering single questions (eg, the effectiveness of tight glucose control on microvascular complications of diabetes). They are more scientifically structured than traditional reviews, being explicit about how the authors attempted to find all relevant articles, how they judged the scientific quality of each study, and how they weighed evidence from multiple studies with conflicting results. These reviews pay particular attention to including all strong research, whether or not it has been published, to avoid publication bias (positive studies are preferentially published). Systematic reviews and meta-analyses are discussed in greater detail separately. (See "Systematic review and meta-analysis".)

Summaries and guidelines – Summaries and guidelines represent the highest level of complexity. Ideally, guidelines are a synthesis of systematic reviews, original research, clinical expertise, and patient preferences. At their best, summaries and guidelines are a comprehensive synthesis of the best available evidence, from which the guidelines themselves follow. Guidelines should therefore be based on a critical appraisal of the relevant original research and systematic reviews. The quality of published guidelines is highly variable, even among those sponsored by professional organizations, with several examples of multiple guidelines on the same topic making contradictory recommendations [26]. Standards for guideline development have been put forth by several organizations including the Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group; the Institute of Medicine (IOM); and the Appraisal of Guidelines for Research & Evaluation (AGREE) Instrument [27-32]. These standards are endorsed by numerous organizations, including the United States National Heart, Lung, and Blood Institute (NHLBI); the British National Institute for Health and Care Excellence (NICE) [33]; the American College of Physicians (ACP); the Cochrane Collaboration; and UpToDate [34].

The accepted standards for guideline development include:

Rely on systematic reviews

Grade the quality of available evidence

Grade the strength of recommendations

Make an explicit connection between evidence and recommendations

UpToDate uses the GRADE working group's approach to making recommendations. Further details are provided on our Editorial Policies website.

ASSESSING THE VALIDITY OF THE EVIDENCE — Clinicians should have the skills necessary to critically evaluate research articles that are important to their practice. Critical appraisal skills enhance mastery and autonomy in the practice of medicine. In addition, critical appraisal skills can help clinicians more wisely choose which information sources they use, favoring sources with explicit standards for weighing evidence. These skills can also make informal reading more efficient by making it easier to concentrate on especially strong articles and to skip weak ones. There are many opportunities to learn critical reading skills from books [35], journal articles, courses, and special sessions of professional meetings.

A number of guidelines are available that describe standards for conducting and reporting different types of studies. The set of guidelines endorsed by the International Committee of Medical Journal Editors (ICMJE) can facilitate the critical appraisal of individual studies based on the type of study:

Systematic reviews and meta-analyses – Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [36] and PRISMA Protocols (PRISMA-P) [37]

Randomized controlled trials – Consolidated Standards of Reporting Trials (CONSORT) [38] and Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) [39]

Observational studies – Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) [40]

Diagnostic and prognostic studies – Standards for Reporting of Diagnostic Accuracy studies (STARD) [41] and Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) [42,43]

The focus of critical appraisal is judging both internal validity and generalizability (external validity) (figure 5).

Internal validity — Internal validity refers to the question of whether the results of clinical research are correct for the patients studied. Threats to internal validity include bias and chance:

Bias – Bias is any systematic error that can produce a misleading impression of the true effect. Randomized trials are performed with the aim of reducing bias, and well-conducted trials usually have a low risk of bias. However, flaws in the conduct of clinical trials can produce biased results.

Potential sources of bias in randomized trials include:

Failure to conceal random assignment to those enrolling study subjects

Failure to blind relevant individuals (including study participants, clinicians, data collectors, outcome adjudicators, and data analysts) to group assignment

Loss to follow-up (missing outcome data)

Failure to adhere to assigned intervention

Stopping early for benefit

Preferentially publishing small (underpowered) studies with statistically significant results (publication bias)

Chance – Chance is random error, inherent in all observations. The probability of chance producing erroneous results can be minimized by studying a large number of patients. P-values are commonly misinterpreted as the probability that the findings are merely due to chance. Instead, p-values describe the probability that if the null hypothesis were true, the study would find a difference as large, or larger, than the one found. (See "Proof, p-values, and hypothesis testing".)
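
To illustrate this definition, the following simulation sketch (invented numbers) estimates a two-sided p-value directly: it asks how often trials of a given size would produce a difference at least as large as an observed one when the null hypothesis is true:

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_rate = 200, 0.20   # both arms share the same event rate, ie, the null is true
observed_diff = 0.06       # hypothetical difference seen in a real trial

# Simulate 10,000 trials under the null hypothesis and record the absolute
# difference in event rates between the two arms.
diffs = np.abs(rng.binomial(n, true_rate, 10_000)
               - rng.binomial(n, true_rate, 10_000)) / n

# The simulated p-value: the fraction of null trials showing a difference at
# least as large as the one observed (~0.13 here).
print(f"simulated two-sided p ~ {np.mean(diffs >= observed_diff):.3f}")
```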

External validity — External validity refers to the question of whether the results of the study apply to patients outside of the study, particularly the specific patient (or population) being considered by the EBM practitioner. Study patients are typically highly selected, unlike patients in usual practice. Often, they have been referred to academic medical centers, meet stringent inclusion criteria, are free of potentially confounding conditions or disorders, and are willing to countenance the rigorous demands of study protocols. As a result, they may be systematically different from the patients most doctors see in practice. In particular, study subjects in treatment trials are often at low risk for the adverse study outcome of interest (death, disease, dysfunction, dissatisfaction). Because treatment benefits are typically confined to patients at higher risk, it is not unusual, therefore, for study results to apply to only a minority of study subjects who have a sufficiently high baseline risk [44,45]. Although treatment effect size is often related to baseline risk, many studies do not measure this relationship, making it more difficult to judge whether, and how, study results apply to a particular individual patient.

Indirect evidence — When a study involves a somewhat different population than is of interest to the EBM practitioner (eg, older, younger, sicker, healthier), some may be inclined to reject the evidence altogether, claiming that it "doesn't apply to my patient." In reality, this type of indirect evidence can help inform medical decision-making, particularly in the absence of direct evidence. Our confidence in the expected results, however, is generally lower than it would be with direct evidence.

Subgroup analyses — When the study does not address the specific patient population of interest, one strategy is to rely on subgroup analyses that evaluate results according to different patient characteristics (eg, age, sex, severity of illness). However, caution should be used in interpreting the results of subgroup analyses to avoid drawing false conclusions. Potential problems include:

Reporting bias – Subgroup analyses that are included in published reports may represent a select subset of all the analyses performed. The "interesting" subgroup analyses are preferentially presented, producing a positive reporting bias [9,46].

Multiple comparisons – Whether examining a multiplicity of different outcomes or different patient subgroups, the probability of finding at least one spurious statistically significant finding increases as the number of analyses increases [47] (a numeric sketch follows this list). Perhaps the most celebrated illustration of this effect was a report of a randomized trial comparing streptokinase, aspirin, both, or neither in the treatment of acute myocardial infarction [48]. The authors, tongue-in-cheek, reported a subgroup analysis by astrologic birth sign. Subjects who were born under the signs of Gemini or Libra experienced slightly higher mortality from aspirin, whereas subjects born under the other astrologic signs enjoyed a large reduction in mortality (p<0.00001).

Lower statistical power – Subgroup analyses always involve fewer subjects than the main analysis, meaning that many are underpowered to detect a true effect (false-negative results). When underpowered subgroup analyses are coupled with selective reporting, it becomes more likely that positive subgroup results are false positives and that the magnitude of effect is exaggerated [3,49-52]. (See "Proof, p-values, and hypothesis testing", section on 'Power in a negative study'.)
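
The following sketch (hypothetical numbers) quantifies both problems. The first calculation shows how the family-wise chance of at least one false-positive result grows with the number of independent tests; the second uses a simplified normal-approximation power formula to show how power falls when the same true effect is sought in a much smaller subgroup:

```python
from scipy.stats import norm

# Multiple comparisons: with k independent tests at alpha = 0.05, the chance
# of at least one spurious "significant" finding is 1 - (1 - alpha)^k.
alpha = 0.05
for k in (1, 5, 12, 20):
    print(f"{k:>2} tests -> P(>=1 false positive) = {1 - (1 - alpha) ** k:.2f}")
# 1 -> 0.05, 5 -> 0.23, 12 -> 0.46, 20 -> 0.64

# Lower statistical power: approximate power for detecting a drop in event
# rate from 10 to 7 percent, in the whole trial versus a small subgroup.
def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    pbar = (p1 + p2) / 2
    se = (2 * pbar * (1 - pbar) / n_per_arm) ** 0.5   # pooled standard error
    z_crit = norm.ppf(1 - alpha / 2)
    return 1 - norm.cdf(z_crit - abs(p1 - p2) / se)

print(f"whole trial, n=1000/arm: power = {power_two_proportions(0.10, 0.07, 1000):.2f}")  # ~0.67
print(f"subgroup,    n=150/arm:  power = {power_two_proportions(0.10, 0.07, 150):.2f}")   # ~0.15
```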

To minimize the risk of drawing false conclusions from subgroup analyses, EBM practitioners should ask the following questions [11,12,14]:

Were the subgroup analyses specified a priori, including hypotheses for the direction of the differences?

Were subgroups defined by baseline risk (an approach that is frequently useful) or by post-randomization events, such as treatment adherence or changes in variables that might be affected by treatment (an approach that is usually misleading)?

Was the number of subgroup analyses limited to only a few?

Were subgroup differences analyzed by testing for effect modification (interaction) rather than separate statistical tests for each subgroup?

If multiple analyses were performed, was an appropriate statistical technique used to account for multiple comparisons (Bayesian techniques or raising the threshold for statistical significance)?

Were subgroup analyses limited to primary outcomes? Exploring subgroup differences across all secondary outcomes exacerbates the multiplicity problem.

Was an effect seen only in the subgroup analyses when the main study results were negative? This is particularly likely to represent a false-positive finding and should be regarded with great skepticism.

Subgroup differences reported in randomized controlled trials often have deficiencies in one or more of these categories, particularly failing to specify subgroup analyses a priori and failing to test for effect modification (interaction), and very few are corroborated in subsequent meta-analyses or randomized controlled trials [53]. Nevertheless, for treatment studies, high-value clinical research should almost always include a prespecified subgroup analysis of how absolute treatment benefit varies along a continuum of a risk score determined by multiple baseline variables considered together [54]. This is because, for many treatments, the absolute treatment effect is much bigger in a high-risk minority and smaller in the lower-risk majority.

SUMMARIZING THE TREATMENT EFFECT — Once the validity of the evidence is established, the next question is the size of the treatment effect. Treatment effect sizes can be summarized using both:

Relative effect estimates (eg, relative risk [RR], odds ratio [OR], hazard ratio [HR]), and

Absolute effect estimates (eg, absolute risk reduction [ARR] or number needed to treat [NNT])

In general, it is preferable to use absolute effect estimates when counseling patients because studies have shown that using relative effect estimates tends to generate biased overenthusiasm [55,56]. However, the absolute effect of a treatment will vary depending on baseline risk and duration of follow-up [54,57,58].

In contrast, for most interventions, the RR is a more stable quantitative summary of effect size across the spectrum of baseline risk and across the duration of follow-up [54].

Unfortunately, the published literature does not always make it easy to extract the measure of effect that fits a patient's circumstances. This is especially difficult when results are reported as ORs or HRs because, when the outcome is common, these estimates are harder to convert into language that allows clinicians to explain the expected benefits to patients [59-64].
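
The following sketch (hypothetical trial counts) computes the common effect measures from a 2x2 table and applies a published conversion formula (Zhang-Yu, which requires the control event rate) to translate an OR back to an RR, illustrating why the OR overstates the relative effect when outcomes are common:

```python
def effect_measures(events_tx, n_tx, events_ctrl, n_ctrl):
    """Effect estimates from hypothetical trial counts (treatment vs control)."""
    p_tx, p_ctrl = events_tx / n_tx, events_ctrl / n_ctrl
    rr = p_tx / p_ctrl                                    # relative risk
    or_ = (p_tx / (1 - p_tx)) / (p_ctrl / (1 - p_ctrl))   # odds ratio
    arr = p_ctrl - p_tx                                   # absolute risk reduction
    nnt = 1 / arr if arr > 0 else float("inf")            # number needed to treat
    return rr, or_, arr, nnt

rr, or_, arr, nnt = effect_measures(events_tx=30, n_tx=200, events_ctrl=50, n_ctrl=200)
print(f"RR={rr:.2f}  OR={or_:.2f}  ARR={arr:.1%}  NNT={nnt:.0f}")
# RR=0.60, OR=0.53: with a common outcome (25 percent in controls), the OR
# exaggerates the relative effect compared with the RR.

def or_to_rr(odds_ratio, p_ctrl):
    """Zhang-Yu conversion of an odds ratio to a relative risk."""
    return odds_ratio / (1 - p_ctrl + p_ctrl * odds_ratio)

print(f"OR 0.53 at 25% control risk -> RR ~ {or_to_rr(0.53, 0.25):.2f}")  # ~0.60
```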

Additional details regarding different types of effect estimates are provided separately. (See "Glossary of common biostatistical and epidemiological terms", section on 'Terms used to describe the magnitude of an effect'.)

APPLYING THE EVIDENCE IN PRACTICE — The ultimate goal of EBM is to use the best evidence to improve the care of individual patients. However, applying evidence to practice is not always straightforward.

The know-do gap — There is often a gap between recommendations from the best available evidence and actual practice (sometimes referred to as the "know-do gap"). The reasons for the gap are numerous, including uncertainty whether results of large studies apply to individual patients, lack of awareness or misunderstanding of the evidence, and failure to organize care in a way that fosters use of evidence [65].

EBM is not intended to replace clinical judgment [66]. Individual patients should be cared for in light of the best available research evidence but with care tailored to their individual circumstances, including genetic makeup, past and concurrent illnesses, health-related behaviors, and personal preferences. Several studies clearly demonstrate that many clinical decisions are not made based on the best research evidence or on relevant individual patient characteristics but seem most consistent with the practice habits or practice style of the clinicians.

A substantial body of research, as well as practical experience, has demonstrated that all of us, as we care for patients, engage in systematic errors of omission or commission, relative to the best available research evidence. Prominent examples are the widespread prescription of antibiotics for acute cough or the use of radiologic tests for uncomplicated acute low back pain.

In some cases, failure to practice according to the best current evidence is due to a knowledge deficit. However, knowledge alone rarely changes behavior [66]. Behavior change usually requires a combination of interventions and influences, including time for rethinking practice habits. The table lists the possible influences on clinicians' behavior, roughly in descending order of strength, based on a growing research literature on clinician behavior change and on common sense (table 3). Usually, no single influence is strong enough to make important changes; combinations are necessary.

Differences in baseline risk — As previously mentioned, one concern that may limit application of evidence to practice is the uncertainty of whether results of large trials apply to an individual patient. Patients in clinical trials typically do not respond uniformly to a particular intervention, but rather, the effect of treatment varies from patient to patient. This is sometimes referred to as treatment effect heterogeneity [60]. Some variation is expected just from the play of chance. Confidently distinguishing between chance variation and true variation in treatment effect requires many more patients than are typically included in most clinical trials. Therefore, treatment effect heterogeneity is usually only explored during meta-analysis of multiple trials or when single trials prospectively test for differences in treatment effect across predefined subgroups. However, as discussed above, results of subgroup analyses often yield misleading conclusions. (See 'Subgroup analyses' above.)

While true differences in relative benefit across different subgroups are fairly uncommon, differences in absolute benefit across different subgroups are quite common due to variation in patients' baseline risk [45,60]. Therefore, simply confirming that a patient would have qualified for enrollment in a particular trial is insufficient justification for assuming that the study's "average" treatment effect would apply to the specific patient. For example, in the Diabetes Prevention Program study, patients in the highest quartile of baseline risk accrued large absolute benefit from either metformin or lifestyle changes [57]. However, for patients in the lowest two quartiles of baseline risk (bottom one-half), there was trivial benefit. The reported summary effect, which reflected the average benefit across the entire study population, did not accurately capture the effect in patients with high baseline risk nor those with low baseline risk. This seeming paradox is a result of the distribution of intervention benefit often being asymmetric, with a small minority at high baseline risk getting a disproportionate share of the benefit [45]. However, there are many exceptions to this general pattern [54,67].
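
As a simple numeric sketch (invented numbers) of this point, the same relative risk reduction produces very different absolute benefits, and therefore very different numbers needed to treat, across baseline risks:

```python
rr = 0.75  # a treatment that cuts relative risk by 25 percent
for baseline_risk in (0.02, 0.10, 0.40):   # low-, moderate-, and high-risk patients
    arr = baseline_risk * (1 - rr)         # absolute risk reduction
    nnt = 1 / arr                          # number needed to treat
    print(f"baseline risk {baseline_risk:.0%}: ARR = {arr:.1%}, NNT = {nnt:.0f}")
# 2%  -> ARR 0.5%,  NNT 200
# 10% -> ARR 2.5%,  NNT 40
# 40% -> ARR 10.0%, NNT 10
```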

Harms (side effects) of treatment can also have an asymmetric distribution based upon baseline risk but less so than with treatment benefits. The costs and burdens of treatment are generally uniform across all patients, regardless of baseline risk. Consequently, patients with low baseline risk for key outcomes may experience more downside from treatment than benefit [45].

Based on a heightened appreciation for differences in absolute benefits of treatment, some guideline panels have shifted towards framing recommendations based upon balancing the expected absolute benefit for a given baseline risk against the potential harms of treatment [68,69]. A challenge of this approach is accurately characterizing the patient's baseline risk, which often depends upon risk modeling and, as such, is always based on assumptions.

SUMMARY AND RECOMMENDATIONS

Evidence-based medicine (EBM) is the care of patients using the best available research evidence to guide clinical decision-making (figure 1). The basic elements of EBM include (see 'Introduction' above):

Formulating a clinical question

Finding the best available evidence

Assessing the validity of the evidence (including internal and external validity)

Applying the evidence in practice, in conjunction with clinical expertise and patient preferences

The search for the best answer to a clinical question begins with a tight definition of the question. In formulating questions regarding the effectiveness of an intervention, four components (PICO: patient population, intervention, comparison, outcomes) should be considered (table 2). (See 'Formulating a clinical question' above.)

Evidence can be summarized at three levels of complexity: primary research, systematic reviews, and summaries and guidelines (figure 3). Appropriate research study design depends on the question being investigated (figure 4). (See 'Categories of evidence' above.)

Clinicians should have the skills necessary to critically evaluate research articles that are important to their practice. Critical appraisal skills enhance mastery and autonomy in the practice of medicine. The focus of critical appraisal is judging both internal validity and generalizability (external validity) (figure 5). (See 'Assessing the validity of the evidence' above.)

Full implementation of EBM should include a realistic plan for changing clinical behavior as needed (table 3). This implementation must include, but is not limited to, access to information. (See 'Applying the evidence in practice' above.)

ACKNOWLEDGMENT — The UpToDate editorial staff acknowledges Robert H Fletcher, MD, MSc, who contributed to an earlier version of this topic review.

REFERENCES

  1. Sackett DL, Straus SE, Richardson WS, et al. Evidence-based medicine: How to practice and teach EBM, 2nd ed, Churchill Livingstone, Edinburgh 2000.
  2. Sackett DL, Rosenberg WM, Gray JA, et al. Evidence based medicine: what it is and what it isn't. BMJ 1996; 312:71.
  3. Ioannidis JP. Why most published research findings are false. PLoS Med 2005; 2:e124.
  4. Richardson WS, Wilson MC, Nishikawa J, Hayward RS. The well-built clinical question: a key to evidence-based decisions. ACP J Club 1995; 123:A12.
  5. Assmann SF, Pocock SJ, Enos LE, Kasten LE. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet 2000; 355:1064.
  6. Sun X, Briel M, Busse JW, et al. Credibility of claims of subgroup effects in randomised controlled trials: systematic review. BMJ 2012; 344:e1553.
  7. Fernandez Y Garcia E, Nguyen H, Duan N, et al. Assessing Heterogeneity of Treatment Effects: Are Authors Misinterpreting Their Results? Health Serv Res 2010; 45:283.
  8. Head SJ, Kaul S, Tijssen JG, et al. Subgroup analyses in trial reports comparing percutaneous coronary intervention with coronary artery bypass surgery. JAMA 2013; 310:2097.
  9. Kasenda B, Schandelmaier S, Sun X, et al. Subgroup analyses in randomised controlled trials: cohort study on trial protocols and journal publications. BMJ 2014; 349:g4539.
  10. Zhang S, Liang F, Li W, Hu X. Subgroup Analyses in Reporting of Phase III Clinical Trials in Solid Tumors. J Clin Oncol 2015; 33:1697.
  11. Sun X, Ioannidis JP, Agoritsas T, et al. How to use a subgroup analysis: users' guide to the medical literature. JAMA 2014; 311:405.
  12. Fletcher J. Subgroup analyses: how to avoid being misled. BMJ 2007; 335:96.
  13. Aronson D. Subgroup analyses with special reference to the effect of antiplatelet agents in acute coronary syndromes. Thromb Haemost 2014; 112:16.
  14. Rothwell PM. Treating individuals 2. Subgroup analysis in randomised controlled trials: importance, indications, and interpretation. Lancet 2005; 365:176.
  15. Vickers AJ, de Craen AJ. Why use placebos in clinical trials? A narrative review of the methodological literature. J Clin Epidemiol 2000; 53:157.
  16. Serruys PW, Morice MC, Kappetein AP, et al. Percutaneous coronary intervention versus coronary-artery bypass grafting for severe coronary artery disease. N Engl J Med 2009; 360:961.
  17. Institute of Medicine Committee on Qualification of Biomarkers and Surrogate Endpoints in Chronic Disease. Evaluation of Biomarkers and Surrogate Endpoints in Chronic Disease, Micheel CM, Ball JR (Eds), National Academies Press, Washington, DC 2010. Available at: https://www.ncbi.nlm.nih.gov/pubmedhealth/PMH0079490/ (Accessed on September 26, 2016).
  18. Downing NS, Aminawung JA, Shah ND, et al. Clinical trial evidence supporting FDA approval of novel therapeutic agents, 2005-2012. JAMA 2014; 311:368.
  19. Yudkin JS, Lipska KJ, Montori VM. The idolatry of the surrogate. BMJ 2011; 343:d7995.
  20. Allahwala UK, Nadkarni A, Sebaratnam DF. Wikipedia use amongst medical students - new insights into the digital revolution. Med Teach 2013; 35:337.
  21. Azer SA, AlSwaidan NM, Alshwairikh LA, AlShammari JM. Accuracy and readability of cardiovascular entries on Wikipedia: are they reliable learning resources for medical students? BMJ Open 2015; 5:e008187.
  22. Kräenbring J, Monzon Penza T, Gutmann J, et al. Accuracy and completeness of drug information in Wikipedia: a comparison with standard textbooks of pharmacology. PLoS One 2014; 9:e106930.
  23. Kupferberg N, Protus BM. Accuracy and completeness of drug information in Wikipedia: an assessment. J Med Libr Assoc 2011; 99:310.
  24. Hasty RT, Garbalosa RC, Barbato VA, et al. Wikipedia vs peer-reviewed medical literature for information about the 10 most costly medical conditions. J Am Osteopath Assoc 2014; 114:368.
  25. Agoritsas T, Vandvik PO, Neumann I, et al. Finding Current Best Evidence. In: Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice, 3rd Ed, Guyatt G, Rennie D, Meade MO, Cook DJ (Eds), McGraw-Hill Education, 2015. p.29.
  26. Burda BU, Norris SL, Holmer HK, et al. Quality varies across clinical practice guidelines for mammography screening in women aged 40-49 years as assessed by AGREE and AMSTAR instruments. J Clin Epidemiol 2011; 64:968.
  27. Institute of Medicine Committee on Standards for Developing Trustworthy Clinical Practice Guidelines. Clinical Practice Guidelines We Can Trust, Graham R, Mancher M, Wolman DM, et al (Eds), National Academies Press, Washington, DC 2011. Available at: https://www.ncbi.nlm.nih.gov/pubmedhealth/PMH0079468/ (Accessed on September 28, 2016).
  28. Brouwers MC, Kho ME, Browman GP, et al. Development of the AGREE II, part 1: performance, usefulness and areas for improvement. CMAJ 2010; 182:1045.
  29. Brouwers MC, Kho ME, Browman GP, et al. Development of the AGREE II, part 2: assessment of validity of items and tools to support application. CMAJ 2010; 182:E472.
  30. Neumann I, Santesso N, Akl EA, et al. A guide for health professionals to interpret and use recommendations in guidelines developed with the GRADE approach. J Clin Epidemiol 2016; 72:45.
  31. Guyatt G, Oxman AD, Akl EA, et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol 2011; 64:383.
  32. Andrews JC, Schünemann HJ, Oxman AD, et al. GRADE guidelines: 15. Going from evidence to recommendation-determinants of a recommendation's direction and strength. J Clin Epidemiol 2013; 66:726.
  33. Thornton J, Alderson P, Tan T, et al. Introducing GRADE across the NICE clinical guideline program. J Clin Epidemiol 2013; 66:124.
  34. GRADE working group. Organizations that have endorsed or that are using GRADE. Available at: http://www.gradeworkinggroup.org (Accessed on September 28, 2016).
  35. Users' Guides to the Medical Literature: A manual for evidence-based clinical practice, 3rd Ed, Guyatt G, Drummond R, Meade MO, Cook DJ (Eds), McGraw-Hill Education, 2015.
  36. Moher D, Liberati A, Tetzlaff J, et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med 2009; 151:264.
  37. Moher D, Shamseer L, Clarke M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev 2015; 4:1.
  38. Schulz KF, Altman DG, Moher D, CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomized trials. Ann Intern Med 2010; 152:726.
  39. Chan AW, Tetzlaff JM, Altman DG, et al. SPIRIT 2013 statement: defining standard protocol items for clinical trials. Ann Intern Med 2013; 158:200.
  40. Vandenbroucke JP, von Elm E, Altman DG, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Ann Intern Med 2007; 147:W163.
  41. Bossuyt PM, Reitsma JB, Bruns DE, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: The STARD Initiative. Ann Intern Med 2003; 138:40.
  42. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015; 162:55.
  43. Moons KG, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015; 162:W1.
  44. Kent DM, Hayward RA. Limitations of applying summary results of clinical trials to individual patients: the need for risk stratification. JAMA 2007; 298:1209.
  45. Vickers AJ, Kent DM. The Lake Wobegon Effect: Why Most Patients Are at Below-Average Risk. Ann Intern Med 2015; 162:866.
  46. Dwan K, Altman DG, Clarke M, et al. Evidence for the selective reporting of analyses and discrepancies in clinical trials: a systematic review of cohort studies of clinical trials. PLoS Med 2014; 11:e1001666.
  47. Schulz KF, Grimes DA. Multiplicity in randomised trials II: subgroup and interim analyses. Lancet 2005; 365:1657.
  48. Randomised trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of suspected acute myocardial infarction: ISIS-2. ISIS-2 (Second International Study of Infarct Survival) Collaborative Group. Lancet 1988; 2:349.
  49. Wittes J. On looking at subgroups. Circulation 2009; 119:912.
  50. Brookes ST, Whitley E, Peters TJ, et al. Subgroup analyses in randomised controlled trials: quantifying the risks of false-positives and false-negatives. Health Technol Assess 2001; 5:1.
  51. Reinhart A. Statistics Done Wrong: The Woefully Complete Guide, No Starch Press, San Francisco, CA 2015.
  52. Lauer MS. From hot hands to declining effects: the risks of small numbers. J Am Coll Cardiol 2012; 60:72.
  53. Wallach JD, Sullivan PG, Trepanowski JF, et al. Evaluation of Evidence of Statistical Support and Corroboration of Subgroup Claims in Randomized Clinical Trials. JAMA Intern Med 2017; 177:554.
  54. Kent DM, Nelson J, Dahabreh IJ, et al. Risk and treatment effect heterogeneity: re-analysis of individual participant data from 32 large clinical trials. Int J Epidemiol 2016; 45:2075.
  55. Chao C, Studts JL, Abell T, et al. Adjuvant chemotherapy for breast cancer: how presentation of recurrence risk influences decision-making. J Clin Oncol 2003; 21:4299.
  56. Perneger TV, Agoritsas T. Doctors and patients' susceptibility to framing bias: a randomized trial. J Gen Intern Med 2011; 26:1411.
  57. Sussman JB, Kent DM, Nelson JP, Hayward RA. Improving diabetes prevention with benefit based tailored treatment: risk based reanalysis of Diabetes Prevention Program. BMJ 2015; 350:h454.
  58. Stang A, Poole C, Bender R. Common problems related to the use of number needed to treat. J Clin Epidemiol 2010; 63:820.
  59. Spruance SL, Reid JE, Grace M, Samore M. Hazard ratio in clinical trials. Antimicrob Agents Chemother 2004; 48:2787.
  60. Kent DM, Trikalinos TA, Hill MD. Are unadjusted analyses of clinical trials inappropriately biased toward the null? Stroke 2009; 40:672.
  61. Knol MJ, Duijnhoven RG, Grobbee DE, et al. Potential misinterpretation of treatment effects due to use of odds ratios and logistic regression in randomized controlled trials. PLoS One 2011; 6:e21248.
  62. Norton EC, Dowd BE. Log Odds and the Interpretation of Logit Models. Health Serv Res 2018; 53:859.
  63. Sheldrick RC, Chung PJ, Jacobson RM. Math Matters: How Misinterpretation of Odds Ratios and Risk Ratios May Influence Conclusions. Acad Pediatr 2017; 17:1.
  64. Schmidt CO, Kohlmann T. When to use the odds ratio or the relative risk? Int J Public Health 2008; 53:165.
  65. Haynes B, Haines A. Barriers and bridges to evidence based clinical practice. BMJ 1998; 317:273.
  66. Davis DA, Thomson MA, Oxman AD, Haynes RB. Changing physician performance. A systematic review of the effect of continuing medical education strategies. JAMA 1995; 274:700.
  67. Rothwell PM. Can overall results of clinical trials be applied to all patients? Lancet 1995; 345:1616.
  68. Grundy SM, Stone NJ, Bailey AL, et al. 2018 AHA/ACC/AACVPR/AAPA/ABC/ACPM/ADA/AGS/APhA/ASPC/NLA/PCNA Guideline on the Management of Blood Cholesterol: Executive Summary: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. Circulation 2019; 139:e1046.
  69. US Preventive Services Task Force, Bibbins-Domingo K, Grossman DC, et al. Statin Use for the Primary Prevention of Cardiovascular Disease in Adults: US Preventive Services Task Force Recommendation Statement. JAMA 2016; 316:1997.
Topic 2763 Version 27.0
